[SciPy-Dev] On the leastsq/curve

Post by Gianfranco Durin
Dear all,
I wanted to briefly show you the results of an analysis I did on the performances of the optimize.leastsq method for data fitting. I presented these results at the last Python in Physics workshop. You can download the pdf here: http://emma.inrim.it:8080/gdurin/talks.
1. The main concern is about the use of cov_x to estimate the error bar of the fitting parameters. In the docs, it is set that "this matrix must be multiplied by the residual standard deviation to get the covariance of the parameter estimates -- see curve_fits.""
Unfortunately, this is not correct, or better it is only partially correct. It is correct if there are no error bars of the input data (the sigma of curve_fit is None). But if provided, they are used as "as weights in least-squares problem" (curve_fit doc), and cov_x gives directly the covariance of the parameter estimates (i.e. the diagonal terms are the errors in the parameters). See for instance here: http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of-best-fit-parameters.html.
This means that not only the doc needs fixing, but also the curve_fit code, those estimation of the parameters' error is INDEPENDENT of the values of the data errors in the case they are constant, which is clearly wrong.
I have never provided a patch, but the fix should be quite simple, just please give me indication on how to do that.

Since this depends on what you define as weight or sigma, both are
correct but use different definitions.

Since we just had this discussion, I'm not arguing again. I just want
to have clear definitions that the "average" user can use by default.
I don't really care which it is if the majority of users are engineers
who can tell what their errors variances are before doing any
estimation.

Josef

Post by Gianfranco Durin
2. The convergence of the fit in the most difficult cases (see page 15 of my presentation) can required up to about 3000 iterations, reduced to 800 when using analytical derivatives. For quite a long time I did not realized that the fit needed more iterations that the number set by maxfev, and thus I started to think that the leastsq was not good enough for 'hard' data.
As a matter of fact,
maxfev : int
The maximum number of calls to the function. If zero, then 100*(N+1) is
the maximum where N is the number of elements in x0.
is pretty low, so I suggest to increase the prefactor 100 to 1000. A relatively huge number is not a problem, by the way, because if the system is sloppy. i.e. one parameter does not move too much, the routine stops and complains with "Both actual and predicted relative reductions in the sum of squares are at most 0.000000 and the relative error between two consecutive iterates is at most 0.000000" as in the case of boxBod at pg. 15.
By the way, we should also advice that the in case of analytical derivative this number is half, even if I personally would keep the same number for both cases.
This is all for the moment.
Many thanks for your attention, and sorry for the long mail
Gianfranco
_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

Gianfranco Durin

2011-09-26 13:57:25 UTC

----- Original Message -----

Post by Gianfranco Durin
Dear all,
I wanted to briefly show you the results of an analysis I did on
the performances of the optimize.leastsq method for data fitting.
I presented these results at the last Python in Physics workshop.
http://emma.inrim.it:8080/gdurin/talks.
1. The main concern is about the use of cov_x to estimate the error
bar of the fitting parameters. In the docs, it is set that "this
matrix must be multiplied by the residual standard deviation to
get the covariance of the parameter estimates -- see curve_fits.""
Unfortunately, this is not correct, or better it is only partially
correct. It is correct if there are no error bars of the input
data (the sigma of curve_fit is None). But if provided, they are
used as "as weights in least-squares problem" (curve_fit doc), and
cov_x gives directly the covariance of the parameter estimates
(i.e. the diagonal terms are the errors in the parameters). See
http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of-best-fit-parameters.html.
This means that not only the doc needs fixing, but also the
curve_fit code, those estimation of the parameters' error is
INDEPENDENT of the values of the data errors in the case they are
constant, which is clearly wrong.
I have never provided a patch, but the fix should be quite simple,
just please give me indication on how to do that.

Since this depends on what you define as weight or sigma, both are
correct but use different definitions.

Of course, but this is not the point. Let me explain.

In our routines, the cov_x is calculated as (J^T * J) ^-1, where J is the jacobian. The Jacobian, unless provided directly, is calculated using the definition of the residuals, which in curve_fit method are "_general_function", and "_weighted_general_function". The latter function explicitly uses the weights, so the cov_x is more precisely (J^T W J) ^-1, where W is the matrix of the weights, and J is just the matrix of the first derivatives. Thus in this case, the diagonal elements of cov_x give the variance of the parameters. No need to multiply by the residual standard deviation.
In case of W == 1, i.e. no errors in the data are provided, as reported here http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted_linear_least_squares

one uses the variance of the observations, i.e. uses the

s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))

as an estimate of the variance, as done in curve_fit.

BUT, we cannot multiply the cov_x obtained with the _weighted_general_function by s_sq. As I told, we already took it into account in the definition of the residuals. Thus:

if (len(ydata) > len(p0)) and pcov is not None:
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
pcov = pcov * s_sq
else:
pcov = inf

where func can be both "_general_function" and "_weighted_general_function", is not correct.

Post by j***@gmail.com
Since we just had this discussion, I'm not arguing again. I just want
to have clear definitions that the "average" user can use by default.
I don't really care which it is if the majority of users are
engineers
who can tell what their errors variances are before doing any
estimation.

Oh, interesting. Where did you have this discussion? On this list? I could not find it...

Here the problem is not to decide an 'average' behaviour, but to correctly calculate the parameters' error when the user does or does not provide the errors in the data.

Hope this helps
Gianfranco

Paul Kuin

2011-09-26 14:31:31 UTC

I am just a simple user, but perhaps this gives some idea of what us users
experience:

I got so fed up with leastsq not having good documentation, not being able to
set limits to the parameters to be fit, and not handling errors in
input measurements
in a transparent way, that I was very happy to replace it with the
pkfit routine
based on Craig Markwards IDL code. I am happy now, but I waisted a lot of time
because of these leastsq issues.

Anyway - I am happy now.

Paul Kuin

Post by Gianfranco Durin
----- Original Message -----

Post by Gianfranco Durin
Dear all,
I wanted to briefly show you the results of an analysis I did on
the performances of the optimize.leastsq method for data fitting.
I presented these results at the last Python in Physics workshop.
http://emma.inrim.it:8080/gdurin/talks.
1. The main concern is about the use of cov_x to estimate the error
bar of the fitting parameters. In the docs, it is set that "this
matrix must be multiplied by the residual standard deviation to
get the covariance of the parameter estimates -- see curve_fits.""
Unfortunately, this is not correct, or better it is only partially
correct. It is correct if there are no error bars of the input
data (the sigma of curve_fit is None). But if provided, they are
used as "as weights in least-squares problem" (curve_fit doc), and
cov_x gives directly the covariance of the parameter estimates
(i.e. the diagonal terms are the errors in the parameters). See
http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of-best-fit-parameters.html.
This means that not only the doc needs fixing, but also the
curve_fit code, those estimation of the parameters' error is
INDEPENDENT of the values of the data errors in the case they are
constant, which is clearly wrong.
I have never provided a patch, but the fix should be quite simple,
just please give me indication on how to do that.

Since this depends on what you define as weight or sigma, both are
correct but use different definitions.

Of course, but this is not the point. Let me explain.
In our routines, the cov_x is calculated as (J^T * J) ^-1, where J is the jacobian. The Jacobian, unless provided directly, is calculated using the definition of the residuals, which in curve_fit method are "_general_function", and "_weighted_general_function". The latter function explicitly uses the weights, so the cov_x is more precisely (J^T W J) ^-1, where W is the matrix of the weights, and J is just the matrix of the first derivatives. Thus in this case, the diagonal elements of cov_x give the variance of the parameters. No need to multiply by the residual standard deviation.
In case of W == 1, i.e. no errors in the data are provided, as reported here http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted_linear_least_squares
one uses the variance of the observations, i.e. uses the
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
as an estimate of the variance, as done in curve_fit.
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
pcov = pcov * s_sq
pcov = inf
where func can be both "_general_function" and "_weighted_general_function", is not correct.

Oh, interesting. Where did you have this discussion? On this list? I could not find it...
Here the problem is not to decide an 'average' behaviour, but to correctly calculate the parameters' error when the user does or does not provide the errors in the data.
Hope this helps
Gianfranco
_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

j***@gmail.com

2011-09-26 16:39:30 UTC

Post by Paul Kuin
I am just a simple user, but perhaps this gives some idea of what us users
I got so fed up with leastsq not having good documentation, not being able to
set limits to the parameters to be fit, and not handling errors in
input measurements
in a transparent way, that I was very happy to replace it with the
pkfit routine
based on Craig Markwards IDL code. I am happy now, but I waisted a lot of time
because of these leastsq issues.
Anyway - I am happy now.

I'm a pretty happy user of scipy.optimize.leastsq because it is just a low
level optimizer that I can use for many different models and because it is
not a curve fitting function.

Josef

Post by Paul Kuin
Paul Kuin

Post by Gianfranco Durin
----- Original Message -----

http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of-best-fit-parameters.html
.

Post by Gianfranco Durin
This means that not only the doc needs fixing, but also the
curve_fit code, those estimation of the parameters' error is
INDEPENDENT of the values of the data errors in the case they are
constant, which is clearly wrong.
I have never provided a patch, but the fix should be quite simple,
just please give me indication on how to do that.

Since this depends on what you define as weight or sigma, both are
correct but use different definitions.

Of course, but this is not the point. Let me explain.
In our routines, the cov_x is calculated as (J^T * J) ^-1, where J is the

jacobian. The Jacobian, unless provided directly, is calculated using the
definition of the residuals, which in curve_fit method are
"_general_function", and "_weighted_general_function". The latter function
explicitly uses the weights, so the cov_x is more precisely (J^T W J) ^-1,
where W is the matrix of the weights, and J is just the matrix of the first
derivatives. Thus in this case, the diagonal elements of cov_x give the
variance of the parameters. No need to multiply by the residual standard
deviation.

Post by Gianfranco Durin
In case of W == 1, i.e. no errors in the data are provided, as reported

here
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted_linear_least_squares

Post by Gianfranco Durin
one uses the variance of the observations, i.e. uses the
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
as an estimate of the variance, as done in curve_fit.
BUT, we cannot multiply the cov_x obtained with the

_weighted_general_function by s_sq. As I told, we already took it into

Post by Gianfranco Durin
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
pcov = pcov * s_sq
pcov = inf
where func can be both "_general_function" and

"_weighted_general_function", is not correct.

Post by j***@gmail.com
Since we just had this discussion, I'm not arguing again. I just want
to have clear definitions that the "average" user can use by default.
I don't really care which it is if the majority of users are engineers
who can tell what their errors variances are before doing any
estimation.

Oh, interesting. Where did you have this discussion? On this list? I

could not find it...

Post by Gianfranco Durin
Here the problem is not to decide an 'average' behaviour, but to

correctly calculate the parameters' error when the user does or does not
provide the errors in the data.

Post by Gianfranco Durin
Hope this helps
Gianfranco
_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

j***@gmail.com

2011-09-26 16:19:02 UTC

Post by Gianfranco Durin
----- Original Message -----

http://www.gnu.org/s/gsl/manual/html_node/Computing-the-covariance-matrix-of-best-fit-parameters.html
.

Since this depends on what you define as weight or sigma, both are
correct but use different definitions.

Of course, but this is not the point. Let me explain.
In our routines, the cov_x is calculated as (J^T * J) ^-1, where J is the
jacobian. The Jacobian, unless provided directly, is calculated using the
definition of the residuals, which in curve_fit method are
"_general_function", and "_weighted_general_function". The latter function
explicitly uses the weights, so the cov_x is more precisely (J^T W J) ^-1,
where W is the matrix of the weights, and J is just the matrix of the first
derivatives. Thus in this case, the diagonal elements of cov_x give the
variance of the parameters. No need to multiply by the residual standard
deviation.
In case of W == 1, i.e. no errors in the data are provided, as reported
here
http://en.wikipedia.org/wiki/Linear_least_squares_%28mathematics%29#Weighted_linear_least_squares
one uses the variance of the observations, i.e. uses the
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
as an estimate of the variance, as done in curve_fit.
BUT, we cannot multiply the cov_x obtained with the
_weighted_general_function by s_sq. As I told, we already took it into
s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
pcov = pcov * s_sq
pcov = inf
where func can be both "_general_function" and
"_weighted_general_function", is not correct.

*M* = *ó*2*I ok unit weights

M = W^(-1) your case, W has the estimates of the error covariance

M = **ó*2* **W^(-1) I think this is what curve_fit uses, and *what is in
(my) textbooks defined as weighted least squares

Do we use definition 2 or 3 by default? both are reasonable

3 is what I expected when I looked at curve_fit
2 might be more useful for two stage estimation, but I didn't have time to
check the details

Oh, interesting. Where did you have this discussion? On this list? I could not find it...

Sorry, I needed to find it

http://groups.google.com/group/scipy-user/browse_thread/thread/55497657b2f11c14?hl=en

Post by Gianfranco Durin
Here the problem is not to decide an 'average' behaviour, but to correctly
calculate the parameters' error when the user does or does not provide the
errors in the data.

correctness depends on the definitions, and definitions should be made
clearer in the documentation, but it looks like the default interpretation
(for users that don't read the small print) will depend on the background.

Josef

Post by Gianfranco Durin
Hope this helps
Gianfranco
_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

Gianfranco Durin

2011-09-27 09:17:40 UTC

Post by Gianfranco Durin
where func can be both "_general_function" and
"_weighted_general_function", is not correct.
M = σ 2 I ok unit weights
M = W^(-1) your case, W has the estimates of the error covariance
M = σ 2 W^(-1) I think this is what curve_fit uses, and what is in
(my) textbooks defined as weighted least squares
Do we use definition 2 or 3 by default? both are reasonable
3 is what I expected when I looked at curve_fit
2 might be more useful for two stage estimation, but I didn't have
time to check the details

Ehmm, no, curve_fit does not use def 3, as the errors would scale with W, but they don't. By the way, it does not have the correct units.

Curve_fit calculates

M = W \sigma^2 W^(-1) = \sigma^2

so it gives exactly the same results of case 1, irrespective the W's. This is why the errors do not scale with W.

Gianfranco

j***@gmail.com

2011-09-27 14:20:07 UTC

Post by Gianfranco Durin
where func can be both "_general_function" and
"_weighted_general_function", is not correct.
M = Ï 2 I ok unit weights
M = W^(-1) your case, W has the estimates of the error covariance
M = Ï 2 W^(-1) I think this is what curve_fit uses, and what is in
(my) textbooks defined as weighted least squares
Do we use definition 2 or 3 by default? both are reasonable
3 is what I expected when I looked at curve_fit
2 might be more useful for two stage estimation, but I didn't have
time to check the details

Ehmm, no, curve_fit does not use def 3, as the errors would scale with W,
but they don't. By the way, it does not have the correct units.

http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node31.html

'''
The minimization of [image: $ \chi^{2}_{}$] above is sometimes called *weighted
least squares* in which case the inverse quantities 1/*e*2 are called the
weights. Clearly this is simply a different word for the same thing, but in
practice the use of these words sometimes means that the interpretation of *
e*2 as variances or squared errors is not straightforward. The word weight
often implies that only the relative weights are known (``point two is twice
as important as point one'') in which case there is apparently an unknown
overall normalization factor. Unfortunately the parameter errors coming out
of such a fit will be proportional to this factor, and the user must be
aware of this in the formulation of his problem.
'''
(I don't quite understand the last sentence.)

M = Ï^2 W^(-1), where Ï^2 is estimated by residual sum of squares from
weighted regression.

W only specifies relative errors, the assumption is that the covariance
matrix of the errors is *proportional* to W. The scaling is arbitrary. If
the scale of W changes, then the estimated residual sum of squares from
weighted regression will compensate for it. So, rescaling W has no effect on
the covariance of the parameter estimates.

I checked in Greene: Econometric Analysis, and briefly looked at the SAS
description. It looks like weighted least squares is always with automatic
scaling, W is defined only as relative weights.

All I seem to be able to find is weighted least squares with automatic
scaling (except for maybe some two-step estimators).

Post by Gianfranco Durin
Curve_fit calculates
M = W \sigma^2 W^(-1) = \sigma^2

If I remember correctly (calculated from the transformed model) it should
be:

the cov of the parameter estimates is s^2 (X'WX)^(-1)
error estimates should be s^2 * W

where W = diag(1/curvefit_sigma**2) unfortunate terminology for
curve_fit's sigma or intentional ? (as I mentioned in the other thread)

Josef

Post by Gianfranco Durin
so it gives exactly the same results of case 1, irrespective the W's. This
is why the errors do not scale with W.
Gianfranco
_______________________________________________
SciPy-Dev mailing list
http://mail.scipy.org/mailman/listinfo/scipy-dev

Bruce Southey

2011-09-27 16:53:52 UTC

Post by Gianfranco Durin
where func can be both "_general_function" and
"_weighted_general_function", is not correct.
M = ó 2 I ok unit weights
M = W^(-1) your case, W has the estimates of the error covariance
M = ó 2 W^(-1) I think this is what curve_fit uses, and what is in
(my) textbooks defined as weighted least squares
Do we use definition 2 or 3 by default? both are reasonable
3 is what I expected when I looked at curve_fit
2 might be more useful for two stage estimation, but I didn't have
time to check the details

Ehmm, no, curve_fit does not use def 3, as the errors would scale with W,
but they don't. By the way, it does not have the correct units.

http://wwwasdoc.web.cern.ch/wwwasdoc/minuit/node31.html
'''
The minimization of [image: $ \chi^{2}_{}$] above is sometimes called *weighted
least squares* in which case the inverse quantities 1/*e*2 are called the
weights. Clearly this is simply a different word for the same thing, but in
practice the use of these words sometimes means that the interpretation of
*e*2 as variances or squared errors is not straightforward. The word
weight often implies that only the relative weights are known (``point two
is twice as important as point one'') in which case there is apparently an
unknown overall normalization factor. Unfortunately the parameter errors
coming out of such a fit will be proportional to this factor, and the user
must be aware of this in the formulation of his problem.
'''
(I don't quite understand the last sentence.)
M = ó^2 W^(-1), where ó^2 is estimated by residual sum of squares from
weighted regression.
W only specifies relative errors, the assumption is that the covariance
matrix of the errors is *proportional* to W. The scaling is arbitrary. If
the scale of W changes, then the estimated residual sum of squares from
weighted regression will compensate for it. So, rescaling W has no effect on
the covariance of the parameter estimates.
I checked in Greene: Econometric Analysis, and briefly looked at the SAS
description. It looks like weighted least squares is always with automatic
scaling, W is defined only as relative weights.
All I seem to be able to find is weighted least squares with automatic
scaling (except for maybe some two-step estimators).

Post by Gianfranco Durin
Curve_fit calculates
M = W \sigma^2 W^(-1) = \sigma^2

If I remember correctly (calculated from the transformed model) it should
the cov of the parameter estimates is s^2 (X'WX)^(-1)
error estimates should be s^2 * W
where W = diag(1/curvefit_sigma**2) unfortunate terminology for
curve_fit's sigma or intentional ? (as I mentioned in the other thread)
Josef

Post by Gianfranco Durin
so it gives exactly the same results of case 1, irrespective the W's. This
is why the errors do not scale with W.
Gianfranco
_______________________________________________

Gianfranco,
Can you please provide some Python and R (or SAS) code to show what you
mean?

In the linear example below, I get agreement between SAS and curve_fit with
and without defining a weight. I do know that the immediate result from
leastsq does not agree with values from SAS. Thus, with my limited
knowledge, I consider that the documentation is correct.

import numpy as np
from scipy.optimize import curve_fit
from scipy.optimize import leastsq
x=np.array([200, 400, 300, 400, 200, 300, 300, 400, 200, 400, 200, 300],
dtype=float)
y=np.array([28, 75, 37, 53, 22, 58, 40, 96, 34, 52, 30, 69], dtype=float)
w=np.array([0.001168822, 0.000205547, 0.000408122, 0.000205547, 0.001168822,
0.000408122, 0.000408122, 0.000205547, 0.001168822, 0.000205547,
0.001168822, 0.000408122])
sw=1/np.sqrt(w)

def linfunc(x, a, b):
return a + b*x
popt, pcov = curve_fit(linfunc, x, y)
wopt, wcov = curve_fit(linfunc, x, y, sigma=sw)

Output:
No weighted form usage of curve_fit which matches SAS:
a= -11.25 with SE= 15.88585587
b=0.2025 with SE=0.05109428

Weighted form if I understand the difference between how weights need to be
specified so the results match SAS:
a= -12.91438 with SE= 11.09325
b=0.20831 with SE= 0.04342

Bruce

Martin Teichmann

2011-10-01 21:31:52 UTC

Hello everybody,

I just saw that there is some discussion about leastsq, and
guessed that this is a good time to add some ideas.

I have been working on an all-python fitter, which is just
the minpack fitter translated to python. This is why I was
bugging yall for the qr_multiply stuff, for those who remember.

Having the fitter in python is quite useful, because this
calling python from FORTRAN sometimes was ugly: most
importantly because exceptions often didn't find their way
through FORTRAN.

I added some new stuff as well: if the fitted function realizes
that it's parameters are running away, it may raise an
InvalidParameter exception, and the fitting algorithm will
choose a smaller step size on this parameter.

You can find the code at https://github.com/scipy/scipy/pull/88

Greetings

Martin

Gianfranco Durin

2011-10-03 15:06:11 UTC

<snip>

The minimization of $ \chi^{2}_{}$above is sometimes called weighted
least squares in which case the inverse quantities 1/ e 2 are called
the weights. Clearly this is simply a different word for the same
thing, but in practice the use of these words sometimes means that
the interpretation of e 2 as variances or squared errors is not
straightforward. The word weight often implies that only the
relative weights are known (``point two is twice as important as
point one'') in which case there is apparently an unknown overall
normalization factor. Unfortunately the parameter errors coming out
of such a fit will be proportional to this factor, and the user must
be aware of this in the formulation of his problem.
'''
(I don't quite understand the last sentence.)
M = σ^2 W^(-1), where σ^2 is estimated by residual sum of squares
from weighted regression.
W only specifies relative errors, the assumption is that the
covariance matrix of the errors is *proportional* to W. The scaling
is arbitrary. If the scale of W changes, then the estimated residual
sum of squares from weighted regression will compensate for it. So,
rescaling W has no effect on the covariance of the parameter
estimates.
I checked in Greene: Econometric Analysis, and briefly looked at the
SAS description. It looks like weighted least squares is always with
automatic scaling, W is defined only as relative weights.
All I seem to be able to find is weighted least squares with
automatic scaling (except for maybe some two-step estimators).
Curve_fit calculates
M = W \sigma^2 W^(-1) = \sigma^2
If I remember correctly (calculated from the transformed model) it
the cov of the parameter estimates is s^2 (X'WX)^(-1)
error estimates should be s^2 * W
where W = diag(1/curvefit_sigma**2) unfortunate terminology for
curve_fit's sigma or intentional ? (as I mentioned in the other
thread)
Josef
_______________________________________________
Gianfranco,
Can you please provide some Python and R (or SAS) code to show what
you mean?
...
Bruce

Bruce, Josef and the others,
after reading a few books, searching over many website, and tried many different software packages (and, with the help of many colleagues), I came to a conclusion which should fix the question:

Weights and data errors (or uncertainties) CAN be different concepts, as written above. It depends on the user...
In particular:
1. If he/she thinks the sigma JUST as weights and NOT as standard-deviation of ydata, cov_x MUST be multiplied by the residual standard deviation
2. If the user thinks the sigma as standard-deviation of ydata (i.e. measurement errors), which are by the way ALSO good to weight the data themself, then cov_x DOES NOT NEED to be multiplied by the residual standard deviation.

I tried a few packages, and found they assume one of the two options by default without 'asking' (or making aware of) the user. In particular (check using the very same data and errors):
Option 1: SAS, R, gnuplot, octave...
Option 2: Profit, Origin, ...

And mathematica? In the HOWTO: Fit Models with Measurement Errors (see below), mathematica makes the difference between weights and measurement errors, so the user can decide how to use his/her sigma.

I think we should make this distinction explicit also in our curve_fit, and report it on the leastsq doc.

Gianfranco

============================================================
from Mathematica:
"Particularly in the physical sciences,it is common to use measurement errors as weights to incorporate measured variation into the fitting. Weights have a relative effect on the parameter estimates, but an error variance still needs to be estimated in weighted regression, and this impacts error estimates for results."

1. When using Weights alone, the variance scale is estimated using the default method [i.e. our s_sq]. Error estimates will depend on both the weights and the estimated variance scale. However, if the weights are from measurement errors, you would want error estimates to depend solely on the weights.
It is important to note that weights do not change the fitting or error estimates. For example, multiplying all weights by a constant increases the estimated variance,but does not change the parameter estimates or standard errors. (Ps. This is what I meant saying that the parameters' errors
....

2. For measurements errors, you want standard errors to be computed only from the weights.... While weights have an impact on parameter estimates, the variance estimate itself does not."

Bruce Southey

2011-10-03 15:51:19 UTC