Mocha tests in Collabora Online

We have many data structure classes in Collabora Online. Most of them are complex enough to warrant unit tests for their correctness. We already have cypress tests in Online which does integration testing, but we need unit tests for the internal data structure classes too. Mocha testing framework is a good match for this. With this we can do asserts via its BDD interface. In fact we could use any assertion library like Chai or Expect.js. You can now add Mocha tests in Online to test its any existing typescript classes or functions.

Here is a sample mocha test written for CPointSet typescript class:

/// <reference path="../src/layer/vector/CPoint.ts" />
/// <reference path="../src/layer/vector/CPointSet.ts" />

var assert = require('assert');

describe('CPointSet empty() tests', function () {

	describe('new CPointSet()', function () {
		it('should be empty', function () {
			assert.ok((new
                           CPointSet()).empty());
		});
	});
});

How to add a new Mocha test to Online?

Lets write a new test for the function CBounds.parse() as an example:

  • Create a new test file for tests related to CBounds class under loleaflet/mocha_tests/. Lets name it CBounds.test.ts
  • Reference the needed types from CBounds.ts by adding these lines.
/// <reference path="../src/layer/vector/CPoint.ts" />
/// <reference path="../src/layer/vector/CBounds.ts" />
  • Import Nodejs assert.
var assert = require('assert').strict;
  • Add a BDD style description of the set of tests we are going to write for CBounds.parse().
describe('CBounds parse() tests', function () {
    // Write unit tests here.
});
  • Inside that add a few test cases with descriptions like below:
describe('CBounds.parse() call with an empty string argument',
    function () {
	it('should return undefined', function () {
		assert.equal(CBounds.parse(''), undefined);
	});
    });
  • Now run ‘make check’ inside the loleaflet/ dir to see these tests in action. In current master branch you should see something like below along with the results for the new test you have added.

If interested, do take a look at the commit 8b2eae423cf77e7c725843061898c3968a72c547 that integrated Mocha framework to Online.

Discrete Fourier Transform in Calc

Recently I implemented FOURIER() formula for LibreOffice Calc that computes Discrete Fourier Transform [DFT] of a real/complex data sequence. Computation is done using a couple of Fast Fourier Transform algorithms (all implemented from scratch). I’d like to thank Collabora Productivity for a fully funded hack week and lots of encouragement that enabled me to work on this feature.

The syntax of the formula is :

FOURIER(Array, GroupedByColumns, Inverse, Polar, MinimumMagnitude)

First and second arguments : First argument is the data array range. There are some restrictions on its shape. If the array is grouped by columns(rows), you need to indicate that by setting the second argument GroupedByColumns = TRUE(FALSE). In case of grouped by columns(rows) the “Array” can contain 1 or 2 columns(rows), where the first column(row) should contain the real part of the input sequence and second column(row) if present should contain the imaginary part of the input sequence. If there is only 1 column(row), the input sequence is assumed to be purely real. There is no restriction on the length of the input sequence. For example the length need not be an exact power of 2 like MS Excel’s Fourier analysis tool requires. The DFT computed by FOURIER formula is exact for the given length of the input sequence meaning it does not pad zeroes to the end of the input sequence to make the length an even power of 2.

The third argument “Inverse” is a boolean flag to indicate whether an inverse DFT needs to be computed. This argument is optional and the default value is FALSE.

The fourth argument “Polar” is a boolean flag to indicate whether the final output needs to be in polar coordinates. This argument is optional and the default value is FALSE.

The fifth argument “MinimumMagnitude” is a numeric argument and is used only if Polar=TRUE. All output components(frequency components if Inverse=FALSE) with magnitude less than MinimumMagnitude will be suppressed by a zero magnitude-phase entry. This is very useful when looking at the magnitude-phase spectrum of a signal (especially when its spectrum is sparse) because there is always some very tiny amount of rounding error when doing FFT algorithms and results in incorrect non-zero phase for some non-existent frequencies. By providing a suitable value to this parameter, these non-existent frequency components can be suppressed. The default value of this parameter is 0.0, so no suppression is done by default.

The result of this formula consists of two columns – first column contains the real parts (or the magnitudes if Polar=TRUE) and second column contains the imaginary parts (or the phases if Polar=TRUE).

Below is a screenshot of a sample usage of FOURIER() formula.

fourier-FM-trimmed

The data x[n] column was generated by the formula “=10*COS(2*PI()*COS(2*PI()*A2:A257/128))” (a typical frequency modulation example). Here a Phase-Magnitude spectrum is computed using FOURIER formula

=FOURIER(B2:B257,1,0,1, 1E-10)

Note that the Polar argument is set to 1, and MinimumMagnitude is set to 1E-10, to suppress the non-existent frequency components due to rounding errors. The plots here just visualize the data. The top plot shows the time-domain waveform and the next two plots represent the magnitude and phase spectra.

Fourier Analysis Tool

A Fourier analysis tool is also added to Statistics menu. All features of FOURIER formula are available graphically in this tool. Here is a screenshot of Fourier Analysis tool in action.

fourier-analysis-tool

Implementation details of FOURIER formula

Basically two different Fast Fourier Transform (FFT) algorithms are implemented.

  1. Radix-2 Decimation in Time FFT :    This  is used when the length of input sequence is an even power of 2.
  2. Bluestein’s algorithm : This algorithm is used when the length of data sequence is not an even power of 2.

Both algorithms have asymptotic time complexity of O(N lgN), but Bluestein’s algorithm is much slower in practice. However many optimizations are in place to make the computation faster, especially when the input sequence is purely real.

Below are the patches that shows the evolution of the implementation of FOURIER formula over time.

  1. tdf#74664 : Adds FOURIER() formula
  2. tdf#74664 : Compute the phase correctly using atan2
  3. tdf#74664 : optimize the computation of twiddle factors.
  4. tdf#74664 : Unit test for FOURIER formula
  5. FOURIER : use Bluestein’s algorithm for N != 2^k
  6. More test cases for FOURIER formula
  7. tdf#74664: FOURIER: add 5th optional parameter MinimumMagnitude

Performance

Below are some perf measurements for various input cases. These numbers are only for the DFT computation part and does not include the time for writing data to the spreadsheet model. We don’t use multi-threading yet for FOURIER, so these are numbers for just one cpu-core.

Case#1 :  Data length is a power of 2

Real Input length Time (ms)
32768 2
65536 4
262144 21
1048576 185
Complex Input length Time (ms)
32768 2
65536 5
262144 30
1048576 400

Case#2 : Data length is even but not a power of 2

Real Input length Time (ms)
32766 8
65534 20
262142 105
1048574 1440
Complex Input length Time (ms)
32766 15
65534 36
262142 313
1048574 3332

Case#3 : Data length is odd but not a power of 2

Real Input length Time (ms)
32767 16
65535 39
262143 313
1048575 2729
Complex Input length Time (ms)
32767 16
65535 38
262143 309
1048575 2703

 

 

Multivariate Regression in Calc

Recently I added the support for doing multivariate regression in LibreOffice Calc via its regression tool along with other related improvements. Now the regression analysis output includes a lot more statistical measures including confidence interval for all estimated parameters.

The improved regression tool in Calc can be accessed via Data > Statistics > Regression
Here is a screenshot of the regression dialog box showing the new features highlighted with red boxes.

regression-calc

Lets go through each of these new features –

A.  Now you can enter a single range that contains multiple X variable observations (along columns or rows). In this example the data-set contains three X variables namely My_X1, My_X2 and My_X3 and the dependent variable My_Y. To do a regression on this data, enter the range A1:C11 to the “Independent variable(s) (X) range” box and the range D1:D11 to the “Dependent variable (Y) range” box as shown above in the figure. With this data we want to estimate the slopes and intercept of a linear function relating the Y and X variables given by :-

My_Y  =  Slope_1 * My_X1   +   Slope_2 * My_X2   +   Slope_3 * My_X3    +    Intercept

B.  The user can now let the regression tool know that the data ranges for X and Y have text labels (or variable names) in them.

C.  In earlier versions, the regression tool only computed the slope and intercepts, so there was no way of knowing how uncertain these numbers are. Now the users can specify confidence level as a percentage and the tool will compute the corresponding confidence intervals for each of the estimates (namely the slopes and intercept).

D.  Finally now there is a way to opt out of computing the residuals for all observations. This is beneficial to someone who is only interested in the slopes and intercept estimates and their statistics.

Following screenshot shows the output of the regression tool.

regression-output

So, for our toy dataset the estimates for Slope_1 = 0.0075 (Cell B42), Slope_2 = 0.0343 (Cell B43), Slope_3 = 1.0663 (Cell B44) and the Intercept = 101.4513 (Cell B41). The 95% confidence interval for these estimates are in the columns F and G. For example Intercept’s 95% confidence interval is [98.6231, 104.2795].

Note that the above is the result of a linear regression. We can do logarithmic and power regression with multiple X variables as well in the same way.

These new features can be seen in the current development build of LibreOffice Calc and will be available in the 6.2 release. The source code patch for these features can be seen at https://bit.ly/2KI3aVk

I would like to thank Collabora Productivity for their continuous support and encouragement to work on this feature.