Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

We are talking about a brute force, exhaustive search method versus algorithms which have significantly better time complexities. I don't understand how the word "approximate" applies to these algorithms which are largely or entirely deterministic, but sequence alignment is an inherently approximate question. You want your answer to be approximate, because mismatches and gaps are real things. In fact, finding the "approximate" answers are what the OP is after - places in the genome that look like, but aren't exactly, the query sequence. And as soon as you expand your query length or edit distance even a little, finding that becomes completely infeasible. In fact, the OP lucked out because this is a very special circumstance (this method is completely computationally infeasible for 99/100 applications of this problem).


The method they accidentally found is not as bad as you make it to be.

https://en.wikipedia.org/wiki/Rabin–Karp_algorithm

Rabin karp with rolling hashes is actually (not exactly but almost) what tools like rsync, or bittorent use to find chunks and differences in files. So it scales really well.

The algorithms you cite come from a time when computers looked rather different in their architectures.

Linearly scanning the entire genome from ram for example, could be significantly better than performing multiple index lookups from disk.

Many problems of bioinformatics (the text processing) aren't that hard or special anymore, a genome is tiny compared to the amount of data we have lying around elsewhere.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: