Multivariate Regression in Calc

Recently I added the support for doing multivariate regression in LibreOffice Calc via its regression tool along with other related improvements. Now the regression analysis output includes a lot more statistical measures including confidence interval for all estimated parameters.

The improved regression tool in Calc can be accessed via Data > Statistics > Regression
Here is a screenshot of the regression dialog box showing the new features highlighted with red boxes.

regression-calc

Lets go through each of these new features –

A.  Now you can enter a single range that contains multiple X variable observations (along columns or rows). In this example the data-set contains three X variables namely My_X1, My_X2 and My_X3 and the dependent variable My_Y. To do a regression on this data, enter the range A1:C11 to the “Independent variable(s) (X) range” box and the range D1:D11 to the “Dependent variable (Y) range” box as shown above in the figure. With this data we want to estimate the slopes and intercept of a linear function relating the Y and X variables given by :-

My_Y  =  Slope_1 * My_X1   +   Slope_2 * My_X2   +   Slope_3 * My_X3    +    Intercept

B.  The user can now let the regression tool know that the data ranges for X and Y have text labels (or variable names) in them.

C.  In earlier versions, the regression tool only computed the slope and intercepts, so there was no way of knowing how uncertain these numbers are. Now the users can specify confidence level as a percentage and the tool will compute the corresponding confidence intervals for each of the estimates (namely the slopes and intercept).

D.  Finally now there is a way to opt out of computing the residuals for all observations. This is beneficial to someone who is only interested in the slopes and intercept estimates and their statistics.

Following screenshot shows the output of the regression tool.

regression-output

So, for our toy dataset the estimates for Slope_1 = 0.0075 (Cell B42), Slope_2 = 0.0343 (Cell B43), Slope_3 = 1.0663 (Cell B44) and the Intercept = 101.4513 (Cell B41). The 95% confidence interval for these estimates are in the columns F and G. For example Intercept’s 95% confidence interval is [98.6231, 104.2795].

Note that the above is the result of a linear regression. We can do logarithmic and power regression with multiple X variables as well in the same way.

These new features can be seen in the current development build of LibreOffice Calc and will be available in the 6.2 release. The source code patch for these features can be seen at https://bit.ly/2KI3aVk

I would like to thank Collabora Productivity for their continuous support and encouragement to work on this feature.

Advertisements

Dart polymer datatable widget

I have created a polymer widget that draws a datatable from a data source url. The code is at https://github.com/dennisfrancis/polymer-dart-datatable-example

Here is the template of the polymer widget

<polymer-element name="dennis-datatable" attributes="dataSrc,jsonArgs,refreshInterval,method">
 
  <template>
  <link rel="stylesheet" href="dennis_datatable.css">

    <div>
      <button id="refresh_button" on-click="{{fetch_data}}">Refresh</button>
      <table class="bordered" id="datatable_table">
        <tr>
          <th template repeat="{{col in cols}}">
              {{col}}  
          </th>
        </tr>
        <tr template repeat="{{row in rows}}">

            <td template repeat="{{col in cols}}">
              {{row[col]}}  
            </td>
        </tr>
      </table>
    </div>
  </template>
  <script type="application/dart" src="dennis_datatable.dart"></script>
</polymer-element>

And here is the dart code of the widget that also prints the rendering latency to the Dartium console.

import 'package:polymer/polymer.dart';
import 'dart:async';
import 'dart:html';
import 'dart:convert';

/**
 * A Polymer datatable element.
 */
@CustomTag('dennis-datatable')
class DataTable extends PolymerElement {
  @published String dataSrc;
  @published String jsonArgs = "{}";  // Sorting can be done on server side by passing something like sortcol:"colname"
  @published int refreshInterval = 0;
  @published String method = "GET";
  
  @observable int numrows = 0;
  @observable int numcols = 0;
  @observable List<String> cols;
  @observable List<Map<String,dynamic>> rows;
  
  ButtonElement refreshButton;
  bool doing_refresh = false;
  
  Timer fetch_timer;
  
  String _json_response = "";
  
  MutationObserver observer;
  
  DateTime _start_render;
  DateTime _stop_render;

 
  DataTable.created() : super.created() {
    
    observer = new MutationObserver(_onMutation);
    observer.observe(getShadowRoot('dennis-datatable').querySelector('#datatable_table'), childList: true, subtree: true);

    refreshButton = this.shadowRoot.querySelector("#refresh_button");
    
    fetch_data();
    
    if (refreshInterval != 0) {
      fetch_timer = new Timer.periodic(new Duration(seconds : refreshInterval), fetch_data);
    }
  }
  
  void _onMutation(List<MutationRecord> mutations, MutationObserver observer) {
    print('${mutations.length} mutations occurred, the first to ${mutations[0].target}');
    _prn_stat();
  }
  
  void fetch_data([Timer _]) {

    if (doing_refresh) { return; }
    
    refreshButton.disabled = true;
    
    doing_refresh = true;

 
    if (dataSrc.isEmpty) { return; }

    HttpRequest request = new HttpRequest(); // create a new XHR
    
    // add an event handler that is called when the request finishes
    request.onReadyStateChange.listen((_) {
      if (request.readyState == HttpRequest.DONE &&
          (request.status == 200 || request.status == 0)) {
        // data saved OK.
        //print(request.responseText); // output the response from the server
        if (_json_response != request.responseText) {  // Avoid json parsing and rendering if same data
          update_data(request.responseText);
          _json_response = request.responseText;
        } else {
          print("No new data");
        }
        
      }
      doing_refresh = false;
      refreshButton.disabled = false;
    });

    // POST the data to the server
    request.open(method, dataSrc);

    if(method == "POST") {
      request.send(jsonArgs); // perform the async POST
    } else {
      request.send();
    }
    
  }
    
  void update_data(String jsonString) {
    
    Map data = JSON.decode(jsonString);
    
    numrows = data["numrows"];
    numcols = data["numcols"];
    cols    = data["cols"];
    rows    = data["rows"];
    
    _start_render = new DateTime.now();
    
  }
  
  void _prn_stat() {

    if (_start_render == null) { return; }
    
    _stop_render = new DateTime.now();
    int diff = _stop_render.millisecondsSinceEpoch - _start_render.millisecondsSinceEpoch;
    
    print("render time = $diff ms");
    
  }
  
}

The sad part about this widget is that the rendering time is about 20 times worse than what it would take a raw dart program to generate a table with the same data.
See my previous post about rendering latency without using polymer widgets.

I decided to move the table rendering logic from templates to a dart function, until polymer template rendering speed improves. I created a new git branch called min_templates for the same project at github.

Now the rendering latency is competitive to that of the non-polymer dart table benchmark

 

Table rendering performance with dart, dart2js and handwritten js

The data for creating the table is a matrix of 4000 rows x 10 cols of random strings.

I tested the table rendering performance for the following scenarios with the same data

  1. Dart code generating the table and run in Dartium (Chromium browser with Dart VM built-in) with checked mode disabled
  2. Ran dart2js on the same dart code and ran it on Chromium
  3. Hand-written JS program with no table libraries
  4. Hand-written JS program with Google table library.
I have posted the code for all these tests at github. [https://github.com/dennisfrancis/table-rendering-benchmark]

Test pseudocode


fetch_test_data()
run_table_render()
run_table_render()
start_time = now()
loop 20 times {
cleanup_table_html()
run_table_render()
}
stop_time = now()
avg_time = (stop_time - start_time)/20
report avg_time in milliseconds

Software versions
Dart Editor version 1.0.0_r30798 (STABLE)
Dart SDK version 1.0.0.10_r30798

Dartium Version 31.0.1650.39 (1593)
Chromium Version 27.0.1453.93 Built from source for Fedora release 19 (Schrödinger’s Cat) (200836)

Results

Table rendering performance (Smaller is better)