Root Mean Squared Error

Published: Sat, 26 Apr 2008 22:54:00 GMT; 5 Comments

I am working on a regression problem using the regstats

function. I take the square root of the mean squared error (mse) it

returns and compare it to my own calculation of the RMS error, and

the two differ. The MATLAB mse divides the sum of squared errors by

(n - var - 1), where n is the number of data points and var is the

number of variables. Why does MATLAB subtract the number of

variables from the number of data points? Thanks.
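
The two calculations being compared can be sketched as follows. This is a language-neutral illustration in plain Python, not a call to regstats: the data are made up, the model is a straight line with one predictor (var = 1), and `fit_line` is a hypothetical helper written for this example.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data: n = 5 points, one predictor (var = 1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, num_vars = len(x), 1

b0, b1 = fit_line(x, y)

# Sum of squared residuals of the fitted line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

rms_hand = math.sqrt(sse / n)                   # hand calculation: divide by n
rms_dof = math.sqrt(sse / (n - num_vars - 1))   # divide by n - var - 1

print("RMS with n in the denominator:          ", rms_hand)
print("RMS with n - var - 1 in the denominator:", rms_dof)
```

The second value is always the larger one, by a factor of sqrt(n / (n - var - 1)), which is the kind of gap described in the question.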

All Comments

  • 5 Comments
    • In article <ef56432.-1.matlab.todaysummary.com.webcrossing.raydaftYaTP>,

      John Sheldon <john.sheldon.matlab.todaysummary.com.gmail.com> wrote:

      >I am working on a regression problem and am using the regstats

      >function. I am returning the mean squared error value of my result

      >(and taking the square root) and comparing it to my own calculation

      >of the RMS error. I am finding a difference between the two. The

      >matlab mse is dividing the squared error by (n - var - 1) where n is

      >the number of data points and var is the number of variables. My

      >question is why does matlab subtract the number of variables from the

      >number of data points? Thanks.

      In my (limited) experience, when the number of variables is subtracted out,

      it usually has to do with "degrees of freedom".

      --

      I was very young in those days, but I was also rather dim.

      -- Christopher Priest

      #1; Sat, 26 Apr 2008 22:56:00 GMT
    • John Sheldon wrote:

      > I am working on a regression problem and am using the regstats

      > function. I am returning the mean squared error value of my result

      > (and taking the square root) and comparing it to my own calculation

      > of the RMS error. I am finding a difference between the two. The

      > matlab mse is dividing the squared error by (n - var - 1) where n is

      > the number of data points and var is the number of variables. My

      > question is why does matlab subtract the number of variables from the

      > number of data points? Thanks.

      John, that's the standard definition for the MSE as an estimator of the

      population variance sigma^2. If it helps, imagine what it would be if

      you had no vars, only a constant -- the usual (unbiased) estimator of

      the variance for a normal distribution. Hope this helps.

      - Peter Perkins

      The MathWorks, Inc.

      #2; Sat, 26 Apr 2008 22:57:00 GMT
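
Written out, the estimator described in this reply looks roughly like the following. The symbols are assumptions borrowed from elsewhere in the thread rather than from this post: SSE is the residual sum of squares, n the number of observations, and p the number of predictors in addition to the constant.

```latex
\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n - p - 1}
  = \frac{1}{n - p - 1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 ,
\qquad
p = 0:\quad
\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2 .
```

With only a constant in the model, the fitted value for every point is the sample mean, so the right-hand expression is the familiar sample variance.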
    • On May 8, 4:13 pm, "John Sheldon" <john.shel....matlab.todaysummary.com.gmail.com> wrote:

      > I am working on a regression problem and am using the regstats

      > function. I am returning the mean squared error value of my result

      > (and taking the square root) and comparing it to my own calculation

      > of the RMS error. I am finding a difference between the two. The

      > matlab mse is dividing the squared error by (n - var - 1) where n is

      > the number of data points and var is the number of variables. My

      > question is why does matlab subtract the number of variables from the

      > number of data points? Thanks.

      If you estimate the MSE of a model with p+1 parameters using n

      observations that were not used to estimate the parameters, then

      MSE = SSE/n

      yields an unbiased estimate.

      However, that formula will yield a biased estimate if those n

      observations were used to estimate the p+1 parameters.

      In the latter case the residuals have only n-(p+1) degrees of

      freedom, since p+1 parameters were fitted to the same data, and

      the formula for an unbiased estimate is

      MSE = SSE/(n-p-1).

      In the simplest case where the model is just the mean value,

      p = 0 and the MSE is just the unbiased estimate of the

      variance (the sample variance, with n-1 in the denominator).

      It is assumed that n > p+1 and the system of n equations

      for p+1 variables is overdetermined.

      If n = p+1 then the system of n equations for n variables

      should have an exact solution. Therefore SSE = 0,

      the ratio SSE/(n-p-1) is indeterminate (0/0), and MSE

      is undefined.

      Hope this helps.

      Greg

      #3; Sat, 26 Apr 2008 22:58:00 GMT
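
The bias described in this reply can be checked with a small simulation. The sketch below is plain Python rather than MATLAB, and every specific in it is an assumption chosen for illustration: a straight-line model (p = 1), n = 10 points, true noise variance 1, and a hypothetical helper `sse_of_line_fit`. It covers only the case where the same n observations are used both to fit the parameters and to compute the SSE.

```python
import random

def sse_of_line_fit(x, y):
    """Fit y = b0 + b1*x by least squares and return the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

random.seed(0)                 # fixed seed so the run is repeatable
n, p, sigma = 10, 1, 1.0       # 10 points, 1 predictor, true variance sigma^2 = 1
x = [float(i) for i in range(n)]
trials = 20000

sum_naive = 0.0
sum_dof = 0.0
for _ in range(trials):
    # Simulate data from a known line plus Gaussian noise, then refit.
    y = [1.0 + 2.0 * xi + random.gauss(0.0, sigma) for xi in x]
    sse = sse_of_line_fit(x, y)
    sum_naive += sse / n            # SSE/n
    sum_dof += sse / (n - p - 1)    # SSE/(n-p-1)

mean_naive = sum_naive / trials
mean_dof = sum_dof / trials

# Theory: E[SSE/n] = sigma^2 * (n-p-1)/n = 0.8 here, E[SSE/(n-p-1)] = sigma^2 = 1.
print("average of SSE/n:      ", mean_naive)
print("average of SSE/(n-p-1):", mean_dof)
```

Averaged over many simulated data sets, SSE/n comes out systematically below the true variance, while SSE/(n-p-1) averages close to it.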
    • On May 9, 8:56 am, Greg Heath <h....matlab.todaysummary.com.alumni.brown.edu> wrote:

      > If n = p+1 then the system of n equations for n variables

      > should have an exact solution. Therefore SSE = 0,

      > the ratio SSE/(n-p-1) is indeterminate (0/0), and MSE

      > is undefined.

      If n = p+1 then the system of n equations for n variables

      should have an exact solution. Therefore SSE = 0,

      and the biased estimate SSE/n = 0. However, the ratio

      SSE/(n-p-1) is indeterminate (0/0), and the unbiased

      estimate for MSE is undefined.

      Hope this helps.

      Greg

      #4; Sat, 26 Apr 2008 22:59:00 GMT
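
The n = p+1 case is easy to reproduce with two points and a straight line. The sketch below is plain Python with made-up data, and `mse_estimates` is a hypothetical helper. One assumption to note: Python raises ZeroDivisionError on 0.0/0 where IEEE arithmetic (and MATLAB) would give NaN, so the helper returns NaN explicitly when there are no residual degrees of freedom.

```python
import math

def mse_estimates(sse, n, p):
    """Return (SSE/n, SSE/(n-p-1)); the second is NaN when n - p - 1 is not positive."""
    dof = n - p - 1
    biased = sse / n
    unbiased = sse / dof if dof > 0 else float("nan")
    return biased, unbiased

# Two points and a straight line: n = 2 observations, p = 1 predictor, so n = p + 1.
x = [1.0, 3.0]
y = [2.0, 8.0]

# The line through two points fits them exactly.
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

biased, unbiased = mse_estimates(sse, n=len(x), p=1)
print("SSE:        ", sse)
print("SSE/n:      ", biased)
print("SSE/(n-p-1):", unbiased)
```

The exact fit leaves SSE = 0, so SSE/n is 0 while SSE/(n-p-1) has no defined value.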
    • Greg Heath wrote:

      > If n = p+1 then the system of n equations for n variables

      > should have an exact solution. Therefore SSE = 0,

      > and the biased estimate SSE/n = 0. However, the ratio

      > SSE/(n-p-1) is indeterminate (0/0), and the unbiased

      > estimate for MSE is undefined.

      > Hope this helps.

      > Greg

      >

      Thanks Greg,

      Your explanation was very clear and helpful. I thought it was

      something simple, I just couldn't put my finger on it at the time.

      Cheers,

      John

      #5; Sat, 26 Apr 2008 23:00:00 GMT