Working, but slow |

After some work, I’ve now got the new library working — I’m using LCA queries on the suffix tree in order to calculate pattern-pattern LCEs. In addition, I’ve now converted the p-representation generated to work with the new library.

After all that, I was a bit disappointed to find out that new library isn’t performing so well; the old library builds the suffix tree quite a bit faster and is much much faster at creating the p-representations of the text. Things aren’t quite as bad as I first found: it turns out that the new library is affected quite heavily by optimization levels (perhaps unsurprising, since it’s OO and C++ so inlining, unrolling etc are going to be needed for speed) and also that I was using the slower of the two suffix tree methods. I’m currently investigating ways of making if faster : for example, it provides various implementations of suffix arrays, one of which is uncompressed, so I believe should be faster than the default, compressed array. I’m also expecting an update on the library from the author, who tells me that the latest version is faster than the one I’m currently using.

In general, I think it may always be a little slower than the old library: it provides more functionality and focuses on size as well, but I’m still surprised at how much slower it is. The best I’ve seen from it is still 6-7x slower at generating suffix trees and the p-representation generated is even worse.

Let’s hope I can fix this otherwise the best I’ll be able to do is to use the old library to create the p-representations and the new library for LCE calculations. That would involve creating two suffix trees though, so it seems a bit pointless!

Ben Smithers

Rambling thoughts on Programming / Bioinformatics / Personal Life

Working, but slow

Leave a Reply Cancel reply