First Results For K-mismatches

I’ve just finished putting together some code to time a few of the different methods for solving the k-mismatches problem. It looks like the results could be quite interesting. There are quite a few variables, and it will be interesting to see how they affect the relative performance of the methods. It already looks as though, for randomly generated texts and patterns, the naive method might be hard to beat. I guess this was also the case for the “exact matching with don’t cares” problem, though I’ve not really run any proper tests with randomized inputs for that.
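For reference, here is a minimal sketch of what "the naive method" could look like. It is an illustration rather than the code being timed: the function name `naive_k_mismatches` and the dictionary return type are assumptions made for the example. It compares the pattern against every alignment of the text and gives up on an alignment as soon as more than $k$ mismatches have been seen.

```python
def naive_k_mismatches(text, pattern, k):
    """Return {alignment: mismatch count} for every alignment of `pattern`
    in `text` that has at most k mismatches.

    Hypothetical helper for illustration only; worst case O(nm) time.
    """
    n, m = len(text), len(pattern)
    matches = {}
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches: abandon this alignment early
        else:
            # The inner loop finished without breaking, so at most k mismatches.
            matches[i] = mismatches
    return matches


print(naive_k_mismatches("abcabd", "abd", 1))
```

The early exit is probably why random inputs favour this approach: when text and pattern are random, mismatches turn up after only a few comparisons, so most alignments are rejected long before all $m$ characters have been examined.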

It may be worthwhile putting more effort into the LCE generation. At the moment the pre-processing time is $O(n + m^2)$, so it is taking a fair while. But I’m seeing results that show that, if I exclude the pre-processing from the timing, the $O(n\sqrt{k}\log{m})$ method runs quite a lot faster than the $O(n\sqrt{m\log{m}})$ algorithm with $k=1$. I guess I should wait till morning and see more results before drawing too many conclusions though! In any case, it will be interesting to time the creation of the suffix tree; in particular, I’ll be interested to see how creating a suffix tree for both text and pattern compares with creating the suffix tree for the pattern alone and using the “p-representation” for the text.
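To make the $m^2$ term concrete, here is one simple way a quadratic LCE (longest common extension) table for the pattern can be built. This is a sketch under assumptions and not necessarily how the pre-processing above is implemented: the function name `pattern_lce_table` is made up for the example, and it only covers the pattern-against-pattern part, not the $O(n)$ work on the text.

```python
def pattern_lce_table(pattern):
    """Build lce[i][j] = length of the longest common prefix of
    pattern[i:] and pattern[j:], using O(m^2) time and space.

    Hypothetical helper for illustration only.
    """
    m = len(pattern)
    # One extra row and column of zeros so that lce[i + 1][j + 1] is always valid.
    lce = [[0] * (m + 1) for _ in range(m + 1)]
    # Fill from the end of the pattern backwards: a match at (i, j)
    # extends whatever common prefix starts at (i + 1, j + 1).
    for i in range(m - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if pattern[i] == pattern[j]:
                lce[i][j] = lce[i + 1][j + 1] + 1
    return lce


table = pattern_lce_table("abab")
print(table[0][2])  # common prefix of "abab" and "ab"
```

With a table like this, each LCE query on the pattern is a constant-time lookup, which is the kind of query that a suffix tree with lowest-common-ancestor support would answer without the quadratic pre-processing.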

Ben Smithers

Rambling thoughts on Programming / Bioinformatics / Personal Life
