Discussion:
[SciPy-Dev] Adding a convenient method to create ufuncs for scipy.stats
Warren Weckesser
2017-03-16 19:19:08 UTC
I'm working on an update to the Frechet distribution in scipy.stats (see
https://github.com/scipy/scipy/issues/3258 and
https://github.com/scipy/scipy/pull/3275).

Instead of jumping through the "lazy_where" hoops that are required for
conditional computations, it would be much easier to create a ufunc for the
standard PDF, CDF and possibly other required functions. Easier, that is,
if I use the ufunc generation tools that we have over in scipy.special.
Would there be any objections to this? We already have quite a few
functions for probability distributions in scipy.special:
https://docs.scipy.org/doc/scipy/reference/special.html#raw-statistical-functions

I wouldn't mind creating ufuncs for some of the other distributions, too.
A ufunc implementation is more efficient, simplifies the code in
scipy.stats, and automatically handles broadcasting.
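
To make the contrast concrete, here is a minimal NumPy-only sketch of the masked-evaluation pattern that conditional computations currently need. The helper `lazy_where_sketch` is a hypothetical stand-in written for this illustration, not the private helper that scipy.stats actually uses, and the density is the textbook standard Frechet PDF, c * x**(-c - 1) * exp(-x**(-c)) for x > 0 and 0 otherwise.

```python
import numpy as np


def lazy_where_sketch(cond, arrays, f, fillvalue):
    """Evaluate f only where cond is True; use fillvalue elsewhere.

    Hypothetical stand-in for illustration, not the scipy helper.
    """
    arrays = np.broadcast_arrays(*arrays)
    cond = np.broadcast_to(cond, arrays[0].shape)
    out = np.full(arrays[0].shape, fillvalue, dtype=float)
    # Pull out only the elements that satisfy the condition...
    selected = tuple(a[cond] for a in arrays)
    # ...and evaluate the formula on just those elements.
    out[cond] = f(*selected)
    return out


def frechet_pdf_masked(x, c):
    """Standard Frechet PDF with the support check done via a mask."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    return lazy_where_sketch(
        x > 0, (x, c),
        lambda x, c: c * x ** (-c - 1.0) * np.exp(-x ** (-c)),
        fillvalue=0.0,
    )


x = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])
c = np.array([[1.0], [2.0]])      # shape (2, 1) broadcasts against x
pdf = frechet_pdf_masked(x, c)    # shape (2, 5)
```

With a ufunc built from a scalar kernel, the mask bookkeeping would presumably collapse to a plain `if x <= 0` branch inside the kernel, and the broadcasting step would no longer need to be written by hand.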

I'm bringing this up here to see if anyone has any objections to the
expansion of the statistical functions in scipy.special.

Warren
Warren Weckesser
2017-03-16 19:39:49 UTC
In my previous email, the heading hints at an alternative that I didn't
mention in the text. The question implied in the heading is: what do folks
think about adding ufunc generation tools to scipy.stats, instead of
generating the ufuncs in scipy.special? There are a lot of conditional
computations in scipy.stats that would benefit from being implemented as
ufuncs, but probably don't need to be public functions. So instead of
adding more functions to scipy.special, perhaps we could add code in
scipy.stats for generating ufuncs, many of which would be private. Of
course, we could just generate private ufuncs in scipy.special, and only
use them in scipy.stats.

What do you think?

Warren
Robert Kern
2017-03-16 20:14:32 UTC
+1 for adding more standard PDF/CDF functions to scipy.special
as needed.

There's already precedent for putting statistics-related but not
distribution-related ufuncs into scipy.special, specifically for the
conditional operations, e.g. boxcox(). On the other hand, if the functions
you are thinking of would not be part of the public API, then I'd prefer to
implement them in scipy.stats instead of scipy.special.

What work do you think is entailed in implementing the ufuncs in
scipy.stats? Is there infrastructure we need to duplicate? Can we abstract
out the build infrastructure to a common place? I haven't looked at the
build details for scipy.special in some time.

--
Robert Kern
Joshua Wilson
2017-03-16 20:28:10 UTC
Post by Robert Kern
Can we abstract out the build infrastructure to a common place?
This should be fairly easy to do. It could be set up so that each
module that needs ufuncs could have a local config file (cribbed off
of the current `FUNCS` string in `generate_ufuncs.py`).

Though I also don't object to adding more functions to special.
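
A per-module config file along these lines might look like the sketch below. The line format (`name -- kernel: inputs->outputs -- header`) is a simplified, hypothetical format chosen for illustration; the real `FUNCS` string in `generate_ufuncs.py` may differ, and the ufunc, kernel and header names are invented.

```python
from typing import NamedTuple

# HYPOTHETICAL spec format, one ufunc per line:
#   <ufunc name> -- <kernel name>: <input codes>-><output codes> -- <header>
STATS_FUNCS = """
# private ufuncs that a stats module might declare
_frechet_pdf -- frechet_pdf: dd->d -- _frechet.pxd
_frechet_cdf -- frechet_cdf: dd->d -- _frechet.pxd
"""


class UfuncSpec(NamedTuple):
    name: str       # name of the generated ufunc
    kernel: str     # scalar kernel that does the work
    in_types: str   # one type code per input, e.g. "dd"
    out_types: str  # one type code per output, e.g. "d"
    header: str     # file that declares the kernel


def parse_specs(text):
    """Parse the hypothetical spec string into UfuncSpec records."""
    specs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, kernel_sig, header = (p.strip() for p in line.split("--"))
        kernel, sig = (p.strip() for p in kernel_sig.split(":"))
        in_types, out_types = sig.split("->")
        specs.append(UfuncSpec(name, kernel, in_types, out_types, header))
    return specs


specs = parse_specs(STATS_FUNCS)
```

Under this assumption, a shared generator could take the parsed records from any module and emit the wrapper code next to that module's own sources.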
_______________________________________________
SciPy-Dev mailing list
https://mail.scipy.org/mailman/listinfo/scipy-dev
Joshua Wilson
2017-03-16 20:38:41 UTC
ps--if we decide to abstract out the infrastructure I'd be willing to
write the code.

Warren Weckesser
2017-03-17 04:24:59 UTC
Post by Joshua Wilson
ps--if we decide to abstract out the infrastructure I'd be willing to
write the code.
Great! I know you've been doing a lot of work in scipy.special, so you
probably know the code generation parts better than most of us.

Warren
Warren Weckesser
2017-03-17 04:20:52 UTC
The code that generates the ufunc boilerplate code is in
scipy/special/generate_ufuncs.py. It generates the appropriate wrapper
code for a core scalar function that is written in Cython, C or C++. I
just submitted a pull request (https://github.com/scipy/scipy/pull/7190,
still WIP) in which I wrote the core distribution functions for the Frechet
distribution in Cython, added the signature information to the big honkin'
FUNCS string in generate_ufuncs.py, added placeholders for the docstrings
in add_newdocs.py, and then used the ufuncs in the implementation of the
`frechet` class in stats.

For the moment, the Frechet distribution ufuncs are in scipy.special, and
they are private, but a trivial change will make them public, if there is
interest. I don't have a strong opinion either way, but as you say, there
is a precedent for including them as public functions in scipy.special.
If we start converting existing distribution implementations (which I think
would be a good thing for the stats code), we'll end up with a *lot* more
functions being added somewhere.

Warren