[SciPy-dev] fast gaussian filtering

Discussion:

Sturla Molden

2008-12-19 15:59:45 UTC

Gaussian filtering is a very useful method for smoothing signals and
images, as well as for kernel density estimation. I have previously used
it to estimate the firing rates of neurons.

I've noticed that SciPy implements Gaussian filtering (in the ndimage
package) using convolution. It is implemented by loops in Python, not
even using a vectorized numpy.convolve or an FFT. For the sake of speed,
SciPy is using the worst thinkable implementation.

To make it fast:

Gaussian filtering can be implemented using recursion (Young & Vliet,
1995; Signal Processing, 44: 139-151). The recursive approximation is
very accurate and does not introduce ringing. It is anti-causal
(forward-backward) and has zero phase response. The filtering can be
done in C using scipy's signal.lfilter function.

I found a solution in plain C on the Matlab central. But there is really
no benefit from doing this in C as lfilter, flipud and fliplr does all
the hard work. An efficient Matlab version could have been constructed
similarly, without resorting to C.

http://www.mathworks.com/matlabcentral/fileexchange/5087
http://www.ph.tn.tudelft.nl/~lucas/publications/papersLJvV.html

Note that Young & Vliet (1995) also provides recursive solutions for the
first, second and third derivatives. They are equally easy to implement
using lfilter.

Regards,
Sturla Molden

from numpy import array, zeros, ones, flipud, fliplr
from scipy.signal import lfilter
from math import sqrt

def __gausscoeff(s):
if s < .5: raise ValueError, \
'Sigma for Gaussian filter must be >0.5 samples'
q = 0.98711*s - 0.96330 if s > 0.5 else 3.97156 \
- 4.14554*sqrt(1.0 - 0.26891*s)
b = zeros(4)
b[0] = 1.5785 + 2.44413*q + 1.4281*q**2 + 0.422205*q**3
b[1] = 2.44413*q + 2.85619*q**2 + 1.26661*q**3
b[2] = -(1.4281*q**2 + 1.26661*q**3)
b[3] = 0.422205*q**3
B = 1.0 - ((b[1] + b[2] + b[3])/b[0])
# convert to a format compatible with lfilter's
# difference equation
B = array([B])
A = ones(4)
A[1:] = -b[1:]/b[0]
return B,A

def Gaussian1D(signal, sigma, padding=0):
n = signal.shape[0]
tmp = zeros(n + padding)
if tmp.shape[0] < 4: raise ValueError, \
'Signal and padding too short'
tmp[:n] = signal
B,A = __gausscoeff(sigma)
tmp = lfilter(B, A, tmp)
tmp = tmp[::-1]
tmp = lfilter(B, A, tmp)
tmp = tmp[::-1]
return tmp[:n]

def Gaussian2D(image, sigma, padding=0):
n,m = image.shape[0],image.shape[1]
tmp = zeros((n + padding, m + padding))
if tmp.shape[0] < 4: raise ValueError, \
'Image and padding too small'
if tmp.shape[1] < 4: raise ValueError, \
'Image and padding too small'
B,A = __gausscoeff(sigma)
tmp[:n,:m] = image
tmp = lfilter(B, A, tmp, axis=0)
tmp = flipud(tmp)
tmp = lfilter(B, A, tmp, axis=0)
tmp = flipud(tmp)
tmp = lfilter(B, A, tmp, axis=1)
tmp = fliplr(tmp)
tmp = lfilter(B, A, tmp, axis=1)
tmp = fliplr(tmp)
return tmp[:n,:m]

Zachary Pincus

2008-12-19 17:31:41 UTC

Permalink

Hi,

Post by Sturla Molden
I've noticed that SciPy implements Gaussian filtering (in the ndimage
package) using convolution. It is implemented by loops in Python, not
even using a vectorized numpy.convolve or an FFT. For the sake of speed,
SciPy is using the worst thinkable implementation.

What makes you think this?

A brief look at the implementation in scipy/ndimage/filters.py shows
that scipy.ndimage.gaussian_filter, after doing some set-up, defers to
gaussian_filter1d to do the filtering on each dimension (as Gaussian
filtering is separable). Then gaussian_filter1d uses for-loops to do
some further set-up, like calculating the filter kernel, and then
defers to correlate1d for the actual filtering. This function does
some error-checking, and then calls _nd_image.correlate1d. The
_nd_image library is a C extension module: _nd_image.correlate1d
corresponds to Py_Correlate1D, which is defined in scipy/ndimage/src/
nd_image.c, and which calls to NI_Correlate1D to do the lifting.
NI_Correlate1D, in scipy/ndimage/src/ni_filters.c, implements the
filtering in a straightforward loop, but also takes advantage where it
can of filter-kernel symmetry or anti-symmetry.

Aside from the several thin layers of error-checking and set-up, the
code does the filtering in C in about the fastest way possible that
doesn't use recursion or FFT. (And even then, I don't know under what
circumstances the direct-convolution method might still win --
probably with small kernels.)

Post by Sturla Molden
Gaussian filtering can be implemented using recursion (Young & Vliet,
1995; Signal Processing, 44: 139-151). The recursive approximation is
very accurate and does not introduce ringing. It is anti-causal
(forward-backward) and has zero phase response. The filtering can be
done in C using scipy's signal.lfilter function.

Your implementation looks pretty straightforward. Here are some
timings (sm_gaussian is your Gaussian2D, nd_gaussian is
scipy.ndimage.gaussian_filter):

In: img = numpy.random.rand(100, 100)

In: timeit sm_gaussian(img, 2)
1000 loops, best of 3: 1.54 ms per loop

In: timeit nd_gaussian(img, 2)
1000 loops, best of 3: 1.26 ms per loop

In: timeit sm_gaussian(img, 20)
1000 loops, best of 3: 1.55 ms per loop

In: timeit nd_gaussian(img, 20)
100 loops, best of 3: 7.41 ms per loop

In: img = numpy.random.rand(1000, 1000)

In: timeit sm_gaussian(img, 2)
10 loops, best of 3: 168 ms per loop

In: timeit nd_gaussian(img, 2)
10 loops, best of 3: 92.8 ms per loop

In: timeit sm_gaussian(img, 20)
10 loops, best of 3: 167 ms per loop

In: timeit nd_gaussian(img, 20)
10 loops, best of 3: 657 ms per loop

As expected, the recursive approach is a definite win with larger
kernels, but with small kernels the constant factors dominate and the
straightforward implementation is narrowly faster. At the very least,
you should post this implementation on the cookbook page...

Zach

Sturla Molden

2008-12-19 17:45:59 UTC

Permalink

Post by Zachary Pincus
A brief look at the implementation in scipy/ndimage/filters.py shows
that scipy.ndimage.gaussian_filter, after doing some set-up, defers to
gaussian_filter1d to do the filtering on each dimension (as Gaussian
filtering is separable). Then gaussian_filter1d uses for-loops to do
some further set-up, like calculating the filter kernel, and then
defers to correlate1d for the actual filtering. This function does
some error-checking, and then calls _nd_image.correlate1d.

Ok, sorry, I did not read the code thoroughly then. :)

Post by Zachary Pincus
Aside from the several thin layers of error-checking and set-up, the
code does the filtering in C in about the fastest way possible that
doesn't use recursion or FFT. (And even then, I don't know under what
circumstances the direct-convolution method might still win --
probably with small kernels.)

Yes, for kernels smaller than 10 or so it will be faster. But a Gaussian
is actually not of finite size. So a finite sized Gaussian is actually a
misnomer; it should be a truncated Gaussian. The recursion does not
truncate the Gaussian kernel (but at some point the impulse response
will be so small that it falls into the range of rounding error).

Sturla Molden

Zachary Pincus

2008-12-19 18:51:30 UTC

Permalink

Post by Sturla Molden
Yes, for kernels smaller than 10 or so it will be faster. But a Gaussian
is actually not of finite size. So a finite sized Gaussian is
actually a
misnomer; it should be a truncated Gaussian. The recursion does not
truncate the Gaussian kernel (but at some point the impulse response
will be so small that it falls into the range of rounding error).

Aah, cool -- that's good to know. I'll definitely keep your
implementation handy. It would definitely be great up on the cookbook!

Zach