A benchmark study of k-mer counting methods for high-throughput sequencing.
Authors of this article are:
Manekar SC, Sathe SR.
A summary of the article is shown below:
The rapid development of high-throughput sequencing technologies means that hundreds of gigabytes of sequencing data can be produced in a single study. Many bioinformatics tools require counts of substrings of length k (k-mers) in DNA/RNA sequencing reads, for applications such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection. Recently, several techniques have been developed to count k-mers in large sequencing datasets, with a trade-off between the time and memory required to perform this function. We assessed several k-mer counting programs and evaluated their relative performance, primarily on the basis of runtime and memory usage. We also considered additional parameters, such as disk usage, accuracy, and parallelism, as well as the impact of compressed input, performance when counting large k values, and the scalability of the application to larger datasets. We make specific recommendations for the set-up of a current state-of-the-art program, and suggestions for further development.
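For readers new to the topic, the following is a minimal sketch of what "counting k-mers" means, written in Python. It is not one of the programs benchmarked in the article; the function name count_kmers, the example reads, and the choice to skip k-mers containing the ambiguous base N are assumptions made purely for illustration. It keeps every count in an in-memory dictionary, which is the naive approach whose memory cost the specialised tools are designed to avoid.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across an iterable of reads.

    Illustrative only: all counts are held in memory, so this does not
    scale to the dataset sizes discussed in the summary above.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: skip k-mers that contain the ambiguous base N.
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Hypothetical toy input; real reads would come from a FASTA/FASTQ file.
reads = ["ACGTACGT", "CGTACG"]
counts = count_kmers(reads, 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

The number of distinct k-mers can grow very large in real sequencing datasets, which is why runtime and memory usage are the main criteria in the comparison described above.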
Check out the article’s website on Pubmed for more information:
This article is a good source of information and a good way to become familiar with topics such as: