Discrete Fourier Transform in Calc

Recently I implemented FOURIER() formula for LibreOffice Calc that computes Discrete Fourier Transform [DFT] of a real/complex data sequence. Computation is done using a couple of Fast Fourier Transform algorithms (all implemented from scratch). I’d like to thank Collabora Productivity for a fully funded hack week and lots of encouragement that enabled me to work on this feature.

The syntax of the formula is :

FOURIER(Array, GroupedByColumns, Inverse, Polar, MinimumMagnitude)

First and second arguments : First argument is the data array range. There are some restrictions on its shape. If the array is grouped by columns(rows), you need to indicate that by setting the second argument GroupedByColumns = TRUE(FALSE). In case of grouped by columns(rows) the “Array” can contain 1 or 2 columns(rows), where the first column(row) should contain the real part of the input sequence and second column(row) if present should contain the imaginary part of the input sequence. If there is only 1 column(row), the input sequence is assumed to be purely real. There is no restriction on the length of the input sequence. For example the length need not be an exact power of 2 like MS Excel’s Fourier analysis tool requires. The DFT computed by FOURIER formula is exact for the given length of the input sequence meaning it does not pad zeroes to the end of the input sequence to make the length an even power of 2.

The third argument “Inverse” is a boolean flag to indicate whether an inverse DFT needs to be computed. This argument is optional and the default value is FALSE.

The fourth argument “Polar” is a boolean flag to indicate whether the final output needs to be in polar coordinates. This argument is optional and the default value is FALSE.

The fifth argument “MinimumMagnitude” is a numeric argument and is used only if Polar=TRUE. All output components(frequency components if Inverse=FALSE) with magnitude less than MinimumMagnitude will be suppressed by a zero magnitude-phase entry. This is very useful when looking at the magnitude-phase spectrum of a signal (especially when its spectrum is sparse) because there is always some very tiny amount of rounding error when doing FFT algorithms and results in incorrect non-zero phase for some non-existent frequencies. By providing a suitable value to this parameter, these non-existent frequency components can be suppressed. The default value of this parameter is 0.0, so no suppression is done by default.

The result of this formula consists of two columns – first column contains the real parts (or the magnitudes if Polar=TRUE) and second column contains the imaginary parts (or the phases if Polar=TRUE).

Below is a screenshot of a sample usage of FOURIER() formula.

fourier-FM-trimmed

The data x[n] column was generated by the formula “=10*COS(2*PI()*COS(2*PI()*A2:A257/128))” (a typical frequency modulation example). Here a Phase-Magnitude spectrum is computed using FOURIER formula

=FOURIER(B2:B257,1,0,1, 1E-10)

Note that the Polar argument is set to 1, and MinimumMagnitude is set to 1E-10, to suppress the non-existent frequency components due to rounding errors. The plots here just visualize the data. The top plot shows the time-domain waveform and the next two plots represent the magnitude and phase spectra.

Fourier Analysis Tool

A Fourier analysis tool is also added to Statistics menu. All features of FOURIER formula are available graphically in this tool. Here is a screenshot of Fourier Analysis tool in action.

fourier-analysis-tool

Implementation details of FOURIER formula

Basically two different Fast Fourier Transform (FFT) algorithms are implemented.

  1. Radix-2 Decimation in Time FFT :    This  is used when the length of input sequence is an even power of 2.
  2. Bluestein’s algorithm : This algorithm is used when the length of data sequence is not an even power of 2.

Both algorithms have asymptotic time complexity of O(N lgN), but Bluestein’s algorithm is much slower in practice. However many optimizations are in place to make the computation faster, especially when the input sequence is purely real.

Below are the patches that shows the evolution of the implementation of FOURIER formula over time.

  1. tdf#74664 : Adds FOURIER() formula
  2. tdf#74664 : Compute the phase correctly using atan2
  3. tdf#74664 : optimize the computation of twiddle factors.
  4. tdf#74664 : Unit test for FOURIER formula
  5. FOURIER : use Bluestein’s algorithm for N != 2^k
  6. More test cases for FOURIER formula
  7. tdf#74664: FOURIER: add 5th optional parameter MinimumMagnitude

Performance

Below are some perf measurements for various input cases. These numbers are only for the DFT computation part and does not include the time for writing data to the spreadsheet model. We don’t use multi-threading yet for FOURIER, so these are numbers for just one cpu-core.

Case#1 :  Data length is a power of 2

Real Input length Time (ms)
32768 2
65536 4
262144 21
1048576 185
Complex Input length Time (ms)
32768 2
65536 5
262144 30
1048576 400

Case#2 : Data length is even but not a power of 2

Real Input length Time (ms)
32766 8
65534 20
262142 105
1048574 1440
Complex Input length Time (ms)
32766 15
65534 36
262142 313
1048574 3332

Case#3 : Data length is odd but not a power of 2

Real Input length Time (ms)
32767 16
65535 39
262143 313
1048575 2729
Complex Input length Time (ms)
32767 16
65535 38
262143 309
1048575 2703

 

 

Multivariate Regression in Calc

Recently I added the support for doing multivariate regression in LibreOffice Calc via its regression tool along with other related improvements. Now the regression analysis output includes a lot more statistical measures including confidence interval for all estimated parameters.

The improved regression tool in Calc can be accessed via Data > Statistics > Regression
Here is a screenshot of the regression dialog box showing the new features highlighted with red boxes.

regression-calc

Lets go through each of these new features –

A.  Now you can enter a single range that contains multiple X variable observations (along columns or rows). In this example the data-set contains three X variables namely My_X1, My_X2 and My_X3 and the dependent variable My_Y. To do a regression on this data, enter the range A1:C11 to the “Independent variable(s) (X) range” box and the range D1:D11 to the “Dependent variable (Y) range” box as shown above in the figure. With this data we want to estimate the slopes and intercept of a linear function relating the Y and X variables given by :-

My_Y  =  Slope_1 * My_X1   +   Slope_2 * My_X2   +   Slope_3 * My_X3    +    Intercept

B.  The user can now let the regression tool know that the data ranges for X and Y have text labels (or variable names) in them.

C.  In earlier versions, the regression tool only computed the slope and intercepts, so there was no way of knowing how uncertain these numbers are. Now the users can specify confidence level as a percentage and the tool will compute the corresponding confidence intervals for each of the estimates (namely the slopes and intercept).

D.  Finally now there is a way to opt out of computing the residuals for all observations. This is beneficial to someone who is only interested in the slopes and intercept estimates and their statistics.

Following screenshot shows the output of the regression tool.

regression-output

So, for our toy dataset the estimates for Slope_1 = 0.0075 (Cell B42), Slope_2 = 0.0343 (Cell B43), Slope_3 = 1.0663 (Cell B44) and the Intercept = 101.4513 (Cell B41). The 95% confidence interval for these estimates are in the columns F and G. For example Intercept’s 95% confidence interval is [98.6231, 104.2795].

Note that the above is the result of a linear regression. We can do logarithmic and power regression with multiple X variables as well in the same way.

These new features can be seen in the current development build of LibreOffice Calc and will be available in the 6.2 release. The source code patch for these features can be seen at https://bit.ly/2KI3aVk

I would like to thank Collabora Productivity for their continuous support and encouragement to work on this feature.

Dart polymer datatable widget

I have created a polymer widget that draws a datatable from a data source url. The code is at https://github.com/dennisfrancis/polymer-dart-datatable-example

Here is the template of the polymer widget

<polymer-element name="dennis-datatable" attributes="dataSrc,jsonArgs,refreshInterval,method">
 
  <template>
  <link rel="stylesheet" href="dennis_datatable.css">

    <div>
      <button id="refresh_button" on-click="{{fetch_data}}">Refresh</button>
      <table class="bordered" id="datatable_table">
        <tr>
          <th template repeat="{{col in cols}}">
              {{col}}  
          </th>
        </tr>
        <tr template repeat="{{row in rows}}">

            <td template repeat="{{col in cols}}">
              {{row[col]}}  
            </td>
        </tr>
      </table>
    </div>
  </template>
  <script type="application/dart" src="dennis_datatable.dart"></script>
</polymer-element>

And here is the dart code of the widget that also prints the rendering latency to the Dartium console.

import 'package:polymer/polymer.dart';
import 'dart:async';
import 'dart:html';
import 'dart:convert';

/**
 * A Polymer datatable element.
 */
@CustomTag('dennis-datatable')
class DataTable extends PolymerElement {
  @published String dataSrc;
  @published String jsonArgs = "{}";  // Sorting can be done on server side by passing something like sortcol:"colname"
  @published int refreshInterval = 0;
  @published String method = "GET";
  
  @observable int numrows = 0;
  @observable int numcols = 0;
  @observable List<String> cols;
  @observable List<Map<String,dynamic>> rows;
  
  ButtonElement refreshButton;
  bool doing_refresh = false;
  
  Timer fetch_timer;
  
  String _json_response = "";
  
  MutationObserver observer;
  
  DateTime _start_render;
  DateTime _stop_render;

 
  DataTable.created() : super.created() {
    
    observer = new MutationObserver(_onMutation);
    observer.observe(getShadowRoot('dennis-datatable').querySelector('#datatable_table'), childList: true, subtree: true);

    refreshButton = this.shadowRoot.querySelector("#refresh_button");
    
    fetch_data();
    
    if (refreshInterval != 0) {
      fetch_timer = new Timer.periodic(new Duration(seconds : refreshInterval), fetch_data);
    }
  }
  
  void _onMutation(List<MutationRecord> mutations, MutationObserver observer) {
    print('${mutations.length} mutations occurred, the first to ${mutations[0].target}');
    _prn_stat();
  }
  
  void fetch_data([Timer _]) {

    if (doing_refresh) { return; }
    
    refreshButton.disabled = true;
    
    doing_refresh = true;

 
    if (dataSrc.isEmpty) { return; }

    HttpRequest request = new HttpRequest(); // create a new XHR
    
    // add an event handler that is called when the request finishes
    request.onReadyStateChange.listen((_) {
      if (request.readyState == HttpRequest.DONE &&
          (request.status == 200 || request.status == 0)) {
        // data saved OK.
        //print(request.responseText); // output the response from the server
        if (_json_response != request.responseText) {  // Avoid json parsing and rendering if same data
          update_data(request.responseText);
          _json_response = request.responseText;
        } else {
          print("No new data");
        }
        
      }
      doing_refresh = false;
      refreshButton.disabled = false;
    });

    // POST the data to the server
    request.open(method, dataSrc);

    if(method == "POST") {
      request.send(jsonArgs); // perform the async POST
    } else {
      request.send();
    }
    
  }
    
  void update_data(String jsonString) {
    
    Map data = JSON.decode(jsonString);
    
    numrows = data["numrows"];
    numcols = data["numcols"];
    cols    = data["cols"];
    rows    = data["rows"];
    
    _start_render = new DateTime.now();
    
  }
  
  void _prn_stat() {

    if (_start_render == null) { return; }
    
    _stop_render = new DateTime.now();
    int diff = _stop_render.millisecondsSinceEpoch - _start_render.millisecondsSinceEpoch;
    
    print("render time = $diff ms");
    
  }
  
}

The sad part about this widget is that the rendering time is about 20 times worse than what it would take a raw dart program to generate a table with the same data.
See my previous post about rendering latency without using polymer widgets.

I decided to move the table rendering logic from templates to a dart function, until polymer template rendering speed improves. I created a new git branch called min_templates for the same project at github.

Now the rendering latency is competitive to that of the non-polymer dart table benchmark

 

Table rendering performance with dart, dart2js and handwritten js

The data for creating the table is a matrix of 4000 rows x 10 cols of random strings.

I tested the table rendering performance for the following scenarios with the same data

  1. Dart code generating the table and run in Dartium (Chromium browser with Dart VM built-in) with checked mode disabled
  2. Ran dart2js on the same dart code and ran it on Chromium
  3. Hand-written JS program with no table libraries
  4. Hand-written JS program with Google table library.
I have posted the code for all these tests at github. [https://github.com/dennisfrancis/table-rendering-benchmark]

Test pseudocode


fetch_test_data()
run_table_render()
run_table_render()
start_time = now()
loop 20 times {
cleanup_table_html()
run_table_render()
}
stop_time = now()
avg_time = (stop_time - start_time)/20
report avg_time in milliseconds

Software versions
Dart Editor version 1.0.0_r30798 (STABLE)
Dart SDK version 1.0.0.10_r30798

Dartium Version 31.0.1650.39 (1593)
Chromium Version 27.0.1453.93 Built from source for Fedora release 19 (Schrödinger’s Cat) (200836)

Results

Table rendering performance (Smaller is better)