[SciPy-Dev] GSoC'17: Circular statistics
Vladislav Iakovlev
2017-03-11 19:50:21 UTC
Hello!


My name is Vladislav Iakovlev. I am a master's student in the Department of
Applied Math, HSE University, Moscow, Russia. I have never contributed to
open source projects, but I would be happy to start with SciPy.


I noticed that the existing functionality for circular statistics is
insufficient, so I want to suggest an idea for GSoC: implementing it in
scipy.stats. My plan is to proceed through the following steps:

1) Develop the class rv_circular, analogous to rv_continuous but adapted
to circular statistics functions.

2) Develop derived classes for circular distributions.

3) Develop functions for point estimation and statistical tests.
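
To make steps 1) and 2) concrete, here is a minimal sketch of what such a class hierarchy might look like. Everything in it is an assumption for illustration: the names RvCircular, Cardioid, trig_moment, mean_direction and resultant_length are hypothetical and not part of scipy.stats, and the cardioid density is used only because its normalising constant is available in closed form.

```python
import cmath
import math


class RvCircular:
    """Hypothetical base class: subclasses supply pdf(theta) on [0, 2*pi)."""

    def pdf(self, theta):
        raise NotImplementedError

    def trig_moment(self, p=1, n=2000):
        """p-th trigonometric moment E[exp(i*p*theta)] by the midpoint rule."""
        h = 2 * math.pi / n
        total = 0j
        for k in range(n):
            t = (k + 0.5) * h
            total += cmath.exp(1j * p * t) * self.pdf(t)
        return total * h

    def mean_direction(self):
        """Argument of the first trigonometric moment, mapped to [0, 2*pi)."""
        return cmath.phase(self.trig_moment(1)) % (2 * math.pi)

    def resultant_length(self):
        """Modulus of the first trigonometric moment (between 0 and 1)."""
        return abs(self.trig_moment(1))


class Cardioid(RvCircular):
    """Cardioid density (1 + 2*rho*cos(theta - mu)) / (2*pi), |rho| < 1/2."""

    def __init__(self, mu, rho):
        if abs(rho) >= 0.5:
            raise ValueError("rho must satisfy |rho| < 1/2")
        self.mu = mu
        self.rho = rho

    def pdf(self, theta):
        return (1 + 2 * self.rho * math.cos(theta - self.mu)) / (2 * math.pi)


dist = Cardioid(mu=1.0, rho=0.3)
print(dist.mean_direction(), dist.resultant_length())
```

For the cardioid, the first trigonometric moment is rho * exp(i * mu), so the printed values should be close to mu and rho. A real implementation would presumably vectorise this and override the generic integration with closed forms wherever they exist.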

During the summer, I plan to implement the material from chapters 1-8 of the
book “Mardia, K. V. and Jupp, P. E. (2000), Directional Statistics, John
Wiley”, along with documentation and unit tests for it.


Is this idea interesting to the community? I would be glad to receive any feedback.
j***@gmail.com
2017-03-11 21:11:04 UTC
More general question: What should be the future of circular stats in python?

https://github.com/circstat/pycircstat is a python translation of the
matlab toolbox. It doesn't have a license (MIT is commented out in
setup.py) but the original matlab toolbox was BSD licensed on the file
exchange.

I worked on circularstats for several months in 2012 and haven't looked
at it since then. I also translated the matlab toolbox and added some
more parts based on Mardia.
http://josef-pkt.github.io/pages/packages/circularstats/circular.html
table of contents only, I never open sourced it. (*)
(Related: some work on mixtures of VonMises distributions
https://github.com/rc/dist_mixtures )
I looked into it briefly again in
https://github.com/statsmodels/statsmodels/issues/3530 but mainly with
respect to adding regression with a circular response variable.


(*) AFAIR, I didn't implement (m)any distributions in scipy
distribution style. I was looking mainly at generic constructors for
distributions, e.g. using wrapping to go from a distribution on the real
line to a distribution on the circle based on the characteristic function
and Fourier expansion.
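
As an illustration of the wrapping idea, the sketch below builds the wrapped normal density in two ways: by summing the normal pdf over all 2*pi shifts, and by a Fourier series whose coefficients are the normal characteristic function evaluated at the integers. The function names and the truncation parameters k_max and p_max are made up for this example; this is not the code from circularstats.

```python
import math


def wrapped_pdf_by_summation(theta, mu, sigma, k_max=20):
    """Wrap N(mu, sigma^2) onto the circle by summing over 2*pi shifts."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        z = (theta - mu + 2 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2 * math.pi))


def wrapped_pdf_by_fourier(theta, mu, sigma, p_max=50):
    """Same density from the characteristic function at integer arguments."""
    total = 1.0
    for p in range(1, p_max + 1):
        # Modulus of the normal characteristic function at t = p;
        # its phase p * mu is folded into the cosine term below.
        rho_p = math.exp(-0.5 * (p * sigma) ** 2)
        total += 2 * rho_p * math.cos(p * (theta - mu))
    return total / (2 * math.pi)


a = wrapped_pdf_by_summation(2.0, 0.5, 1.0)
b = wrapped_pdf_by_fourier(2.0, 0.5, 1.0)
print(a, b)
```

The two values should agree to floating-point precision. The Fourier form converges quickly for large sigma and the summation form for small sigma, which is one reason a generic constructor might want both.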

Josef
Evgeni Burovski
2017-03-16 15:46:00 UTC
Hi,
Full disclosure: I work at HSE where Vladislav is a student and we are
in touch off-line.
Post by j***@gmail.com
More general question: What should be the future of circular stats in python?
My personal take is that basic things are in scope for scipy and more
advanced things are better suited for statsmodels (e.g., a regression
w.r.t. a circular variable? maybe except the analog of
scipy.stats.linregress).

There is certainly room for an rv_circular variable
(https://github.com/scipy/scipy/issues/4598#issuecomment-77359593).
There is a series of PRs from astropy which have stalled because of backwards
compatibility: https://github.com/scipy/scipy/pull/5747, also
https://github.com/scipy/scipy/issues/6644

I personally find it a bit odd to have to install the whole of astropy
to do a one-off calculation of a circular median or something like
that. But it can be just me :-).
If astropy devs want to collaborate, all the better.
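
For a sense of how small such a one-off calculation is, here is a rough sketch using only the standard library. The function names are hypothetical, and the median is a simplification: it only considers the sample points themselves as candidates and picks the one that minimises the total arc distance to the others, so ties and even-sized samples are not handled the way a proper implementation would.

```python
import math


def arc_distance(a, b):
    """Shortest distance along the circle between two angles in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def circular_median(angles):
    """Sample point minimising the summed arc distance (simplified median)."""
    return min(angles, key=lambda c: sum(arc_distance(c, a) for a in angles))


def circular_mean(angles):
    """Direction of the resultant of the unit vectors, in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2 * math.pi)


sample = [math.radians(d) for d in (350, 10, 20)]
print(math.degrees(circular_median(sample)))
print(math.degrees(circular_mean(sample)))
```

For this sample the arithmetic mean of the raw degrees is about 126.7, which points away from all three observations, while both circular summaries land in the small arc that contains the data.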

IMO, this can be a small subpackage in scipy.stats,
scipy.stats.circular (a made-up name).
Post by j***@gmail.com
https://github.com/circstat/pycircstat is a python translation of the
matlab toolbox. It doesn't have a license (MIT is commented out in
setup.py) but the original matlab toolbox was BSD licensed on the file
exchange.
The (lack of a) license is a bit of a problem. It also seems to depend
on pandas.
Post by j***@gmail.com
I worked on circularstats for several months in 2012 and didn't look
at it since then. I also translated the matlab toolbox and added some
more parts based on Mardia.
http://josef-pkt.github.io/pages/packages/circularstats/circular.html
table of content only, I never open sourced it. (*)
Would you be interested in working on this again? (working, advising,
reviewing, some other form of participation?)

Cheers,

Evgeni

Todd
2017-03-15 22:47:57 UTC
This may be a little crazy, but would it be possible to wait for
parameterized dtypes and create a circular dtype to handle all this
automagically?
j***@gmail.com
2017-03-15 23:30:49 UTC
Post by Todd
This may be a little crazy, but would it be possible to wait for
parameterized dtypes and create a circular dtype to handle all this
automagically?
A dtype will not know statistics. AFAIU, it would be just something
like a substitute for units as in astropy; somewhere there still needs
to be the code to compute statistics on a circle or sphere or ...

(Personally, I like plain float and complex ndarrays and prefer to
leave all units, datetimes and whatever in a wrapper for the interface
but out of the algorithms.)

One design decision for rv_circular is how to handle integration on a
circle, i.e. how to map cdf on R into a cdf on a circle, and the
associated integration over subintervals.
vonmises.cdf is defined on R, which turned out to be computationally
convenient but doesn't make a "proper" cdf that is e.g. limited to [0,
1], and it doesn't define support bounds for integration.
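
One way to picture this design decision is sketched below. It assumes a convention in which the cdf extended to R grows by exactly 1 per full turn, F(x + 2*pi) = F(x) + 1; that convention and all function names here are assumptions for illustration, not a description of the scipy code. Under it, the probability of any arc is a plain difference of cdf values after unwrapping the end point, and a "proper" cdf on [0, 1] can be derived once an origin is chosen.

```python
import math

TWO_PI = 2 * math.pi


def uniform_cdf_on_line(x):
    """Circular-uniform cdf extended to R: grows by 1 per full turn."""
    return x / TWO_PI


def cardioid_cdf_on_line(x, mu=0.0, rho=0.25):
    """Antiderivative of the cardioid pdf (1 + 2*rho*cos(x - mu)) / (2*pi)."""
    return (x + 2 * rho * math.sin(x - mu)) / TWO_PI


def arc_probability(cdf_on_line, start, end):
    """Probability of the arc traversed counterclockwise from start to end."""
    start = start % TWO_PI
    end = end % TWO_PI
    if end < start:
        end += TWO_PI  # unwrap so the arc is an ordinary interval on R
    return cdf_on_line(end) - cdf_on_line(start)


def cdf_on_circle(cdf_on_line, theta, origin=0.0):
    """Cdf limited to [0, 1], measured counterclockwise from origin."""
    return arc_probability(cdf_on_line, origin, theta)


# Arc crossing the 0 / 2*pi seam: from 350 degrees to 10 degrees.
p_seam = arc_probability(uniform_cdf_on_line, math.radians(350), math.radians(10))
# Half circle centred on the cardioid mode at mu = 0.
p_half = arc_probability(cardioid_cdf_on_line, -math.pi / 2, math.pi / 2)
print(p_seam, p_half)
```

The choice of origin in cdf_on_circle is arbitrary, which is exactly the ambiguity that the line-extended form avoids. The price is that the line-extended function is not bounded by 1.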

Josef