Final Results (Kind of) |

The wiki page has now been updated with the final set of results for small pattern sizes. I’ve upped the number of repeats for each test from 3 to 7, so everything’s taking a little longer, but I’ll set up some new tests for larger pattern sizes to run over the weekend. Fortunately, I’ve now modified the test harness to output a data file with which I can plot a graph with gnuplot nice and quickly. Not quite sure why I wasn’t doing that before to be honest. This also has the bonus of producing SVG output.

The latest set of results include real-to-real transforms, the $O(n \log m)$ algorithm with a minimum sub-string size, a variation where no wild cards are allowed in the text and combinations of the above. To summarize some results:

All methods seem to be performing better wrt to the naive approach. I can only assume this is relation to improvements in the compiler after I updated earlier in the week.
The absolute best seems to be a combination of the above (using every trick in the book you might say): we use the $O(n \log m)$ algorithm, with a minimum sub-string size, disallowing wild-cards in the text, and use the real-to-real transform. This beats the naive method on all but the first sample point at $m=32$ .
Forcing a minimum size has completely gotten rid of the erratic behaviour for small $m$
If we allow wild cards in the text, the $O(n \log n)$ and $O(n \log m)$ versions become faster than the naive approach when $m$ is greater than approximately 300 and 80 respectively.
These results seem to hold fairly well for different $n$ — i’m currently running a test with $n=2^{21}=2097152$ which should verify that.
FLINT is the exception to the consistency rule — it gets noticeably worse wrt to the naive method as $m$ increases (ranging from being faster for $m > 600$ to $m > 900$ so far)

There’s plenty of graphs on the wiki page here, as well as updated results. The FFT page has also been updated.

So, still to do:

Produce final results for large m
Put the library of methods together so it’s a bit more usable and upload it to the wiki
Maybe give some consideration to memory usage

Ben Smithers

Rambling thoughts on Programming / Bioinformatics / Personal Life

Final Results (Kind of)

Leave a Reply Cancel reply