Multivariate Regression in Calc

Recently I added the support for doing multivariate regression in LibreOffice Calc via its regression tool along with other related improvements. Now the regression analysis output includes a lot more statistical measures including confidence interval for all estimated parameters.

The improved regression tool in Calc can be accessed via Data > Statistics > Regression
Here is a screenshot of the regression dialog box showing the new features highlighted with red boxes.

regression-calc

Lets go through each of these new features –

A.  Now you can enter a single range that contains multiple X variable observations (along columns or rows). In this example the data-set contains three X variables namely My_X1, My_X2 and My_X3 and the dependent variable My_Y. To do a regression on this data, enter the range A1:C11 to the “Independent variable(s) (X) range” box and the range D1:D11 to the “Dependent variable (Y) range” box as shown above in the figure. With this data we want to estimate the slopes and intercept of a linear function relating the Y and X variables given by :-

My_Y  =  Slope_1 * My_X1   +   Slope_2 * My_X2   +   Slope_3 * My_X3    +    Intercept

B.  The user can now let the regression tool know that the data ranges for X and Y have text labels (or variable names) in them.

C.  In earlier versions, the regression tool only computed the slope and intercepts, so there was no way of knowing how uncertain these numbers are. Now the users can specify confidence level as a percentage and the tool will compute the corresponding confidence intervals for each of the estimates (namely the slopes and intercept).

D.  Finally now there is a way to opt out of computing the residuals for all observations. This is beneficial to someone who is only interested in the slopes and intercept estimates and their statistics.

Following screenshot shows the output of the regression tool.

regression-output

So, for our toy dataset the estimates for Slope_1 = 0.0075 (Cell B42), Slope_2 = 0.0343 (Cell B43), Slope_3 = 1.0663 (Cell B44) and the Intercept = 101.4513 (Cell B41). The 95% confidence interval for these estimates are in the columns F and G. For example Intercept’s 95% confidence interval is [98.6231, 104.2795].

Note that the above is the result of a linear regression. We can do logarithmic and power regression with multiple X variables as well in the same way.

These new features can be seen in the current development build of LibreOffice Calc and will be available in the 6.2 release. The source code patch for these features can be seen at https://bit.ly/2KI3aVk

I would like to thank Collabora Productivity for their continuous support and encouragement to work on this feature.

Advertisements