Wisdom

The tests that ran last night showed that the naive method bettered the O(n \log n) version of the algorithm for pattern sizes up to somewhere between 800 and 1,500 (depending on text size), whilst the O(n \log m) version ran faster once m exceeded roughly 750-800:

Graph showing run-time of various algorithms for n=524,288 and varying pattern size

However, after running those tests, FFTW’s “wisdom” generator ran overnight too (it essentially precomputes the information required for plans of the specified sizes), and with that wisdom in place we can get even greater speed-ups from FFTW. So: bad news for the naive method, and for FLINT, which has only occasionally bettered FFTW, and then only when the pattern size was so small that the naive method wins by far anyway.
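For anyone unfamiliar with wisdom: FFTW times many candidate strategies for a given transform size and records the winner, so that later plans of the same size are created quickly (the overnight run was presumably the fftw-wisdom utility that ships with the library). A minimal sketch of how wisdom looks from C — the file name, flags and size here are illustrative, not my actual test setup:

```c
#include <stdio.h>
#include <fftw3.h>

int main(void)
{
    const int n = 1 << 19;  /* e.g. a 524,288-point transform */
    double *in = fftw_malloc(sizeof(double) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1));

    /* Reuse any previously generated wisdom before planning. */
    FILE *w = fopen("fftw.wisdom", "r");
    if (w) { fftw_import_wisdom_from_file(w); fclose(w); }

    /* FFTW_PATIENT (or FFTW_EXHAUSTIVE) makes planning slow but thorough;
       this search is the expensive part the overnight run performs. */
    fftw_plan plan = fftw_plan_dft_r2c_1d(n, in, out, FFTW_PATIENT);

    /* Save the accumulated wisdom so later runs plan almost instantly. */
    w = fopen("fftw.wisdom", "w");
    if (w) { fftw_export_wisdom_to_file(w); fclose(w); }

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}
```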

So I’ve set up some even more thorough tests to run over the weekend, which should give us a really detailed view of both small pattern sizes and how things scale (though I’m not bothering to run the naive method on large pattern sizes; it takes too long!). That should give us the final results. In particular, it’s still not clear when the O(n \log m) version (whether padded to a power of 2 or not) beats the O(n \log n) version. This is a particularly tricky question given that the run-times for the O(n \log m) version are quite unstable as m (or, in the padded case, the transform size) varies. Fortunately, I think I may have found the reason for this variation and for the strangely long run-times for small m: FFTW’s own benchmarks show great speed variation too!

They have a whole set of benchmarks on various platforms and for various transforms here. Of particular note is that for power-of-2 sizes, FFTW (like most other FFT implementations) is quite slow on small transforms in comparison to larger ones, as the following image shows:

FFTW’s benchmarks for single-precision, real 1D data

So in the case of the O(n \log m) version, when m is small we’re doing lots and lots of quite slow transforms! In addition, when the size of the transform isn’t a power of 2, there’s a huge variation in the speed of the transform (again, clearly visible in FFTW’s benchmarks).

Both of these effects show up on different platforms, though in slightly different ways, so I think this goes a long way towards explaining the occasional oddities in run-times.
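To make the first point concrete, here’s a back-of-the-envelope sketch (names illustrative, details elided) of how many transforms the blockwise O(n \log m) method performs: it handles the text in overlapping chunks of length 2m, one chunk per m alignments, so the count grows like n/m as m shrinks. For n = 524,288 and m = 16, that’s 32,768 transforms, each tiny.

```c
#include <stddef.h>

/* Sketch: the blockwise O(n log m) method processes the text in
   overlapping chunks of length 2m, one chunk per m alignments, so the
   number of (small) forward transforms grows like n/m as m shrinks. */
size_t block_transform_count(size_t n, size_t m)
{
    size_t count = 0;
    for (size_t start = 0; start + m <= n; start += m)
        count++;               /* one forward transform per chunk */
    return count;              /* roughly n/m */
}
```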

The plan now, then, is to produce some final graphs on Monday, which should hopefully wrap things up for this particular problem.

Nearly There

So I think I’m nearly there with the matching-with-don’t-cares algorithms, though I’ll need to do more work on the wiki pages once I’ve produced some statistics from the tests I’ve got running overnight. There’s still the unanswered question of whether or not using floating-point arithmetic is going to cause a problem (false positives, for example), but it’s probably time to move on. If there’s an opportunity to look at this again later, that might be nice…especially since the new number theory library (FLINT) seems slower than FFTW.
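For what it’s worth, the obvious stop-gap is to exploit the fact that the match scores are integers in exact arithmetic and allow some slack around zero. A sketch only — the tolerance here is an assumed guess, not a derived bound:

```c
#include <math.h>
#include <stdbool.h>

/* Sketch only: in exact arithmetic the match score is a non-negative
   integer that is zero exactly at a match, so tolerate a little
   floating-point noise around zero. EPS is a guess, not a proven bound;
   a rigorous bound would depend on n, m and the alphabet size. */
#define EPS 1e-6

static bool is_match(double score)
{
    return fabs(score) < EPS;
}
```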

Having gotten to the bottom of some of the oddities in the timing (though the O(n \log m) version without any padding is still throwing up a few surprising results), I’d say this really does highlight the need to be careful when implementing some of the theory; all the stuff we ignore or “hide” in the asymptotic notation suddenly becomes a problem.

Also, we now have LaTeX on the wiki!

Or maybe it’s to do with FFTW

So after having spent a while fixing my implementation of the Fourier Transform, a couple of early results seem to be:

+ Using either FFTW or my own Fourier Transform, the run-time is remarkably unaffected by the pattern size. I guess the theory tells us this much, but it’s still surprising to see an average run-time of 0.36 across about 7 different pattern sizes!

+ Using my own Fourier Transform, the O(n \log m) version is always quicker than the O(n \log n) version, and the larger n/m is, the greater the difference.

+ Using FFTW, this doesn’t quite follow. A graph of the O(n \log m) version would show a curve, starting and finishing higher (as m varies) than the O(n \log n) version, but dipping below it in the middle.

Some more tests will run overnight, which should firm up these initial findings. It would seem, then, that the overhead is with FFTW and not the Fourier Transform per se, though FFTW does, unsurprisingly, outperform my code by a very long way, except when the pattern is very, very short. I hope to get to the bottom of this bizarre time increase for very small patterns tomorrow.

Back To Basics

It really does seem like there is some kind of sweet spot in which the O(n \log m) version of the FFT algorithm is faster than the O(n \log n) one. Perhaps this makes sense…if m is very small in comparison to n, the sheer number of Fourier Transforms and their associated overhead outweighs the asymptotic benefit, but if m is large compared to n, O(n \log m) is roughly the same as O(n \log n) and the increased implementation complexity means the former takes longer. In any case, it’s back to basics and I’m currently implementing the Fourier Transform myself, for two reasons: firstly so that I can verify the above results aren’t due to FFTW’s “plans”, and secondly because it should provide a starting point for implementing the NTT myself, which I think may be necessary in order to fully understand what’s going on. Some more research on the NTT has yielded some potential bounds on the input to prevent any overflow, but the more I read, the less clear the workings of the library become. Understanding the NTT has also required a bit of a “back to basics” approach…much of this morning was spent trying to get some basic number theory concepts clear in my head, which should hopefully help with the NTT as a whole.
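For the curious, the sort of thing I mean is the textbook radix-2 approach. A minimal sketch (not necessarily my exact code, and without the real-input optimisations FFTW uses):

```c
#include <complex.h>
#include <math.h>
#include <stdlib.h>

/* Minimal recursive radix-2 Cooley-Tukey FFT; assumes n is a power of 2. */
static void fft(double complex *x, size_t n)
{
    if (n < 2)
        return;

    /* Split into even- and odd-indexed halves. */
    double complex *even = malloc(n / 2 * sizeof *even);
    double complex *odd  = malloc(n / 2 * sizeof *odd);
    for (size_t i = 0; i < n / 2; i++) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }

    fft(even, n / 2);
    fft(odd,  n / 2);

    /* Combine the halves using the twiddle factors e^(-2*pi*i*k/n). */
    const double pi = acos(-1.0);
    for (size_t k = 0; k < n / 2; k++) {
        double complex w = cexp(-2.0 * pi * I * (double)k / (double)n);
        x[k]         = even[k] + w * odd[k];
        x[k + n / 2] = even[k] - w * odd[k];
    }

    free(even);
    free(odd);
}
```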

So at the end of the first week, I feel it’s been a bit up and down. At times you seem to be getting a lot done but there are also times when it’s hard going. I guess that’s the nature of anything like this.

Plan for next week: get some definitive results on the algorithms for exact pattern matching with don’t cares.

Not so fast

Progress made in all of the intended areas today. Managed to get something working with NTTs using the library, though there’s a question over false positives: I do seem to be getting some now, but that may be an implementation error. Something to look into tomorrow, I think.

I’ve also been putting together a test harness today in order to run the algorithms and benchmark them. Early results are a little unexpected. So far it seems that the theoretically faster O(n \log m) FFT-based algorithm is generally slower than the O(n \log n) version (though it does have some advantages: the smaller memory footprint means I’ve been able to test a few larger inputs). However, there do seem to be *some* input sizes that reverse the trend. Perhaps there’s a sweet spot of pattern size in relation to the text size. More work is needed, but I think the problem may be down to the sheer number of Fourier Transforms being performed; perhaps some tweaking of FFTW is required. Annoyingly, I thought I’d realised my mistake…I was re-computing the Fourier Transforms of the pattern for each substring of the text (as I was using the original algorithm as a black box)…but after fixing it there was only a fairly modest improvement.
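The shape of that fix, as a sketch (the names and details are illustrative, not my actual code): compute the pattern’s transform once, outside the per-chunk loop, and reuse it for every chunk of the text.

```c
#include <fftw3.h>
#include <stddef.h>

/* Transform the zero-padded pattern once; the caller then reuses the
   result for every text chunk instead of re-computing it per chunk.
   FFTW_ESTIMATE is used so planning doesn't overwrite the input. */
fftw_complex *pattern_transform(double *padded_pattern, size_t chunk_len)
{
    fftw_complex *hat =
        fftw_malloc(sizeof(fftw_complex) * (chunk_len / 2 + 1));
    fftw_plan p = fftw_plan_dft_r2c_1d((int)chunk_len, padded_pattern,
                                       hat, FFTW_ESTIMATE);
    fftw_execute(p);           /* done once, outside the chunk loop */
    fftw_destroy_plan(p);
    return hat;
}
```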

Number Theoretic Transforms…

The NTT is proving to be quite hard. Despite finding a handy-looking library (the Finite Transform Library), it’s been causing me a headache all day. The problem is that, because all arithmetic is done modulo some prime, the calculated results are only correct modulo that prime. It should be possible to get round this, perhaps by picking a big enough prime, but no luck thus far.
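My current understanding of the way round this, sketched with an assumed integer alphabet of values in [0, \sigma): each correlation is a sum of m products, so

\sum_{i=0}^{m-1} a_i \, b_{i+j} \;<\; m \sigma^2 ,

and any prime p > m\sigma^2 means the modular result equals the true integer result. The catch is that an NTT of length L also needs an L-th root of unity modulo p, i.e. L must divide p - 1, which is presumably why the library is fussy about which primes it accepts.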

On the plus side, the web hosts have been great again and the equations in the wiki are now being generated by LaTeX on the server and they look much nicer. Also, the streaming model stuff is looking very interesting so I look forward to getting onto that.

Plan for tomorrow: re-organise the existing implementations into a more usable form and begin to devise and run some tests. Followed by another look at this NTT…

A Return To Coding in C

Having spent the last few months pretty much just using Java, I found my C skills a bit rusty. Nevertheless, two initial implementations now exist for the “exact pattern matching with don’t cares” problem: the naive version and the FFT-based one. FFTW was quite straightforward to use in the end.
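For reference, the naive version is the obvious quadratic scan. A minimal sketch (the wildcard encoding and names here are illustrative, not necessarily my actual code):

```c
#include <stddef.h>
#include <stdbool.h>

/* Sketch of the naive method for exact matching with don't cares:
   WILDCARD marks a don't-care in either string, and positions where
   either symbol is a wildcard are skipped. O(nm) in the worst case. */
#define WILDCARD '*'

/* Report (via out[], of length n - m + 1) whether the pattern
   matches the text at each alignment. */
void naive_match(const char *text, size_t n,
                 const char *pattern, size_t m,
                 bool *out)
{
    for (size_t j = 0; j + m <= n; j++) {
        bool ok = true;
        for (size_t i = 0; i < m && ok; i++) {
            if (pattern[i] != WILDCARD && text[j + i] != WILDCARD
                && pattern[i] != text[j + i])
                ok = false;
        }
        out[j] = ok;
    }
}
```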

I’m now just beginning to get a sense of the scope of this project — there’s so much to do it’s hard to know where I should focus next…I need to add details for these implementations, come up with some way of testing and benchmarking their speeds, investigate the Number Theoretic Transform and make a list of other problems to solve. It’s quite exciting but a little daunting too! Onwards and upwards I guess.

StringPedia is born

One of the purposes of this blog is to document the progress of my summer research project, StringPedia. The project is funded by a research bursary from the EPSRC, and I am working under the supervision of Dr Raphael Clifford.

StringPedia is (as the name suggests) a wiki-based project. The idea is to collect together a variety of string-based pattern matching algorithms and analyse them. A wiki is being used in the hope that the project may grow enough to get other people involved. It also means no-one can complain if I don’t add their algorithm as they can do it themselves!

Job 1 has been to set up this blog and MediaWiki, which has proved relatively straightforward. Onwards to something more technical now: I’m going to dive in and put together an implementation of the FFT-based algorithm for solving pattern matching with don’t cares. Better get to grips with FFTW first though…
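For anyone wondering what the FFT has to do with wildcards: the algorithm I have in mind is (I believe) the standard correlation-based one. Encode each don’t care as 0 and every other symbol as a positive integer; then for each alignment j compute

\sum_{i=0}^{m-1} p_i \, t_{i+j} \, (p_i - t_{i+j})^2 \;=\; \sum_i p_i^3 t_{i+j} \;-\; 2 \sum_i p_i^2 t_{i+j}^2 \;+\; \sum_i p_i t_{i+j}^3 ,

which is zero exactly when the pattern matches at j: every term is non-negative, and a term vanishes precisely when the two symbols agree or either one is a don’t care. Each of the three sums is a cross-correlation, so a handful of FFTs gives the values for all alignments at once in O(n \log n) time.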