### Reconcile tickerplant feeds using longest common subsequence matching

Sanket Agrawal just posted some very cool code to the k4 mailing list for finding longest common subsequences. It solves a problem some colleagues had: they wanted to combine two tickerplant feeds that never seemed to match up. Here’s an example:

```
q)/ assuming t is our perfect knowledge table
q)t
time  sym size price
------------------------
09:00 A   8    7.343062
09:01 A   0    5.482385
09:02 B   5    8.847715
09:03 A   1    6.78881
09:04 B   5    3.432312
09:05 A   0    0.2801381
09:06 A   2    3.775222
09:07 B   3    1.676582
09:08 B   7    7.163578
09:09 B   4    3.300548
```

Let us now create two tables, u and v, neither of which contains all the data.

```
q)u:t except t 7 8
q)v:t except t 1 2 3 4
q)u
time  sym size price
------------------------
09:00 A   8    7.343062
09:01 A   0    5.482385
09:02 B   5    8.847715
09:03 A   1    6.78881
09:04 B   5    3.432312
09:05 A   0    0.2801381
09:06 A   2    3.775222
09:09 B   4    3.300548
q)v
time  sym size price
------------------------
09:00 A   8    7.343062
09:05 A   0    0.2801381
09:06 A   2    3.775222
09:07 B   3    1.676582
09:08 B   7    7.163578
09:09 B   4    3.300548
```

We can find the indices that differ using Sanket's diffTables function:

```
q)p:diffTables[u;v;t `sym;`price`size]
q)show p 0; / rows in u that are not in v
1 2 3 4
q)show p 1; / rows in v that are not in u
3 4
```

```
/ combine together again
q)`time xasc (update src:`u from u),update src:`v from v p 1
time  sym size price     src
----------------------------
09:00 A   8    7.343062  u
09:01 A   0    5.482385  u
09:02 B   5    8.847715  u
09:03 A   1    6.78881   u
09:04 B   5    3.432312  u
09:05 A   0    0.2801381 u
09:06 A   2    3.775222  u
09:07 B   3    1.676582  v
09:08 B   7    7.163578  v
09:09 B   4    3.300548  u
q)t~`time xasc u,v p 1
1b
```

The code can be downloaded at:
http://code.kx.com/wsvn/code/contrib/sagrawal/lcs/miller.q

Thanks Sanket.

Algorithm details:

Myers O(ND): http://www.xmailserver.org/diff2.pdf

Miller O(NP):  http://www.bookoff.co.jp/files/ir_pr/6c/np_diff.pdf
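
For intuition about what an LCS-based diff computes, here is a minimal sketch in q. It is not Sanket's code and it is not either of the algorithms linked above: it is the textbook O(NM) dynamic programme, which is much slower on long feeds. The names `lcsIdx` and `diffIdx` are made up for this illustration, and it works on two simple lists of atoms (here the time columns of u and v) rather than on whole tables.

```q
/ lcsIdx: indices into x and y of one longest common subsequence (hypothetical helper)
/ naive O(count[x]*count[y]) dynamic programme, for illustration only
lcsIdx:{[x;y]
  init:(1+count y)#0;
  / T[i;j] is the LCS length of the first i items of x and the first j items of y
  T:enlist[init],{[y;prev;xi] maxs prev|0,(xi=y)*1+ -1_prev}[y]\[init;x];
  i:count x; j:count y; rx:`long$(); ry:`long$();
  / walk back from the bottom-right corner, collecting matched positions
  while[(i>0)&j>0;
    $[x[i-1]~y[j-1]; [rx,:i-1; ry,:j-1; i-:1; j-:1];
      T[i-1;j]>=T[i;j-1]; i-:1;
      j-:1]];
  (reverse rx;reverse ry)}

/ diffIdx: (indices of x missing from y; indices of y missing from x)
diffIdx:{[x;y] m:lcsIdx[x;y]; (til[count x] except m 0;til[count y] except m 1)}

/ the time columns of u and v from the example above
ut:09:00 09:01 09:02 09:03 09:04 09:05 09:06 09:09
vt:09:00 09:05 09:06 09:07 09:08 09:09
show diffIdx[ut;vt]
```

On these two lists the sketch should report rows 1 2 3 4 of u and rows 3 4 of v, the same indices as `p 0` and `p 1` above. The O(ND) and O(NP) algorithms find the same kind of answer while doing work proportional to how much the two feeds differ, which is what makes them practical for tables that mostly agree.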
