I’ve been meaning to get around to benchmarking a couple of implementations that solve the “matching with don’t cares” problem by using some randomisation. This reduces the number of convolutions we need at the sacrifice of the small chance of a false-positive. More specifically, we can reduce the number of convolutions from 3 to 2, or even 1 if we consider the case where wild-cards are only allowed in the pattern.
The implementation allowing wild-cards in either follows straight from Adam Kalai’s paper; the method only allowing wild-cards in the pattern is a little different though follows the same broad idea. If I get time I guess I should probably implement Kalai’s method for this for completeness.
Anywho, onto the results which are quite pleasing:
- In both cases, the point at which the algorithms become faster than the naive method is somewhere between a half and two thirds of the deterministic equivalent. This follows nicely from the reduction in the number of convolutions.
- With wild cards in the text and pattern, this is a reduction from around 80 to 45-50.
- With no wild cards in the text, we get a reduction from about 50 to 25.
You can see some graphs of this either in SVG format or in PNG format.
These results were repeats for a few different text sizes in the range of 1-4 million; I might repeat it overnight with a larger text size but I don’t expect to see any difference. You can read more about the testing and results over at the page on StringPedia.
The wiki page has now been updated with the final set of results for small pattern sizes. I’ve upped the number of repeats for each test from 3 to 7, so everything’s taking a little longer, but I’ll set up some new tests for larger pattern sizes to run over the weekend. Fortunately, I’ve now modified the test harness to output a data file with which I can plot a graph with gnuplot nice and quickly. Not quite sure why I wasn’t doing that before to be honest. This also has the bonus of producing SVG output.
The latest set of results include real-to-real transforms, the algorithm with a minimum sub-string size, a variation where no wild cards are allowed in the text and combinations of the above. To summarize some results:
- All methods seem to be performing better wrt to the naive approach. I can only assume this is relation to improvements in the compiler after I updated earlier in the week.
- The absolute best seems to be a combination of the above (using every trick in the book you might say): we use the algorithm, with a minimum sub-string size, disallowing wild-cards in the text, and use the real-to-real transform. This beats the naive method on all but the first sample point at .
- Forcing a minimum size has completely gotten rid of the erratic behaviour for small
- If we allow wild cards in the text, the and versions become faster than the naive approach when is greater than approximately 300 and 80 respectively.
- These results seem to hold fairly well for different — i’m currently running a test with which should verify that.
- FLINT is the exception to the consistency rule — it gets noticeably worse wrt to the naive method as increases (ranging from being faster for to so far)
There’s plenty of graphs on the wiki page here, as well as updated results. The FFT page has also been updated.
So, still to do:
- Produce final results for large m
- Put the library of methods together so it’s a bit more usable and upload it to the wiki
- Maybe give some consideration to memory usage
So most of the day has been spent updating the wiki with the findings of the last set of tests. The page for exact matching with don’t cares and all the pages it links to are starting to look a lot more complete. It’s taken a bit longer than I’d hoped, but I think there’s a few interesting results. There’s still more to be updated on the wiki; I think I may just try to add a little more to the section on comparing all the methods and then update the rest side-by-side with whatever I start to look at next…it’s quite hard and time consuming trying to write clearly and concisely for the wiki, but mixing it in with some research and coding should provide a nice break.
The tests that run last night showed that the naive method bettered the version of the algorithm for pattern sizes up to somewhere between 800 and 1500 (depending on text size), whilst the version ran faster with m greater than about 750-800:
However, after running those tests, FFTW’s “wisdom” generator run overnight too (which basically produces the information required for plans of the specified sizes), and we can get even greater speed-ups from FFTW. So bad news for the naive method and for FLINT, which has only occasionally bettered FFTW, but only when the pattern size has been so small that the naive method wins by far anyway.
So I’ve set up some even more thorough tests to run over the weekend which should give us a really detailed view of both small pattern sizes and how things scale (though I’m not bothering to run the naive method on large pattern sizes — it takes too long!). That should give us the final results. In particular, it’s still not that clear when the version (padded to a power of 2 or otherwise) beats the version. This is a particularly tricky question given that the run times for the version are quite unstable as m (or the transform size, in the padded case) varies. Fortunately, I think I may have found the reason for this variation and the strangely long run-times for small m: FFTWs own benchmarks show a great speed variation too!
They have a whole set of benchmarks on various platforms and for various transforms here. Of note in particular is that for transforms that are powers of 2, FFTW (and most other FFT implementations) are quite slow for small-sized transforms in comparison to larger sized transforms, as the following image shows:
So in the case of the version, when m is small, we’re doing lots and lots of quite slow transforms! In addition, when the size of the transform isn’t a power of 2, there’s a huge variation in the speed of the transform:
Both of these results occur on different platforms but in slightly different ways, so I think this probably goes a long way to explain the occasional oddities in run-times.
The plan now then is to produce some final graphs on Monday and that should hopefully wrap things up for this particular problem.
So I think I’m nearly there with the matching with don’t care’s algorithms, though I’ll need to do work work to the wiki pages once I’ve produced some statistics from the tests I’ve got running overnight. There’s still the unanswered question of whether or not using floating-point arithmetic is going to cause a problem (with false positives, for example), but it’s probably time to move on. If there’s an opportunity to look at this again later at some point that might be nice though…especially since the new number theory library (FLINT) seems slower than FFTW.
Having gotten to the bottom of some of the oddities in the timing (though the version without any padding is still throwing up a few surprising results) it really does highlight the need to be careful when implementing some of the theory; all the stuff we ignore or “hide” in the asymptotic notation suddenly becomes a problem.
Also, we now have latex on the wiki!
So after having spent a while fixing my implementation of the Fourier Transform, a couple of early results seem to be:
+Using either FFTW or my own Fourier Transform, the run time is remarkably unaffected by the pattern size. I guess the theory tells us this much, but it’s still surprising to see an average run-time of 0.36 across about 7 different pattern sizes!
+Using my own Fourier Transform, the O(n log m) version is always quicker than the O(n log m) version and the larger n/m is, the greater the difference
+Using FFTW, this doesn’t quite follow. A graph of the the O (n log m) version would show a curve, starting and finishing higher (as m varies) than the O(n log n) version, but it does appear to dip below in the middle.
Some more tests will run overnight which should firm up these initial findings. It would seem then that the overhead is with FFTW and not the Fourier Transform per se. Though FFTW does, unsurprisingly, out-performs my code by a very long way, except for when the pattern is very very short. I hope to get to the bottom of this bizarre time increase for very small patterns tomorrow.
It really does seem like there is some-kind of sweet spot in which the O(n log m) version of the FFT algorithm is faster than the O(n log n). Perhaps this make sense…if m is very small in comparison to n, the sheer number of Fourier Transforms and their associated overhead outweighs the asymptotic benefit, but if m is large compared to n, O(n log m) is roughly the same as O(n log n) and the increased complexity means the former takes longer. In any case, it’s back to basics and I’m currently implementing the Fourier Transform myself for two reasons: firstly so that I can verify the above results aren’t due to FFTW’s “plans” and secondly as it should providing a starting point for implementing the NTT myself, which I think may be necessary in order to fully understand what’s going on. Some more research on the NTT has yielded some potential bounds on the input in order to prevent any overflow, but the more I read the less clear the workings of the library become. Understanding the NTT has also required a bit of a “back to basics” approach…much of this morning was spent trying to get some basic number theory concepts clear in my head, which should hopefully help with the NTT as a whole.
So at the end of the first week, I feel it’s been a bit up and down. At times you seem to be getting a lot done but there are also times when it’s hard going. I guess that’s the nature of anything like this.
Plan for next week: get some definitive results on the algorithms for exact pattern matching with don’t cares.
Progress made in all of the intended areas today. Managed to get something working with NTTs using the library, though there’s a question over false positives. I do seem to be getting some of those now, but that may be an implementation error. Something to look into tomorrow I think.
I’ve been putting together a test harness today as well in order to run the algorithms and benchmark them. Early results are a little unexpected. So far it seems that the theoretically faster O(n log m) FFT-based algorithm is generally slower than the O(n log n) version (though it does have some advantages — the smaller memory footprint means I’ve been able to test a few larger inputs). However, there do seem to be *some* input sizes that reverse the trend. Perhaps there’s a sweet spot of pattern size in relation to the text size. More work is needed, but I think the problem may be down to the sheer number of Fourier transforms being performed. Perhaps some tweaking with FFTW is required. Annoyingly I thought I’d realised my mistake…I was re-computing the Fourier Transforms of the pattern for each substring of the text (as I was using the original algorithm as a black box)…but after fixing there was only a fairly modest improvement.
The NTT is proving to be quite hard. Despite finding a handy looking library (Finite Transform Library), it’s been causing me a bit of a headache all day. The problem seems to be that, because all arithmetic is done modulo some prime, the calculated results are obviously modulo that prime. It seems that it should be possible to get round this, perhaps by picking some prime big enough, but no luck thus far.
On the plus side, the web hosts have been great again and the equations in the wiki are now being generated by LaTeX on the server and they look much nicer. Also, the streaming model stuff is looking very interesting so I look forward to getting onto that.
Plan for tomorrow: re-organise the existing implementations into a more usable form and begin to devise and run some tests. Followed by another look at this NTT…
Having spent the last few months pretty much just using Java, my C skills seemed a bit rusty. Nevertheless, two initial implementations now exist for the “exact pattern matching with don’t cares” problem, the naive version and the FFT-based one. FFTW was quite straightforward to use in the end.
I’m now just beginning to get a sense of the scope of this project — there’s so much to do it’s hard to know where I should focus next…I need to add details for these implementations, come up with some way of testing and benchmarking their speeds, investigate the Number Theoretic Transform and make a list of other problems to solve. It’s quite exciting but a little daunting too! Onwards and upwards I guess.