Overview

UrQt: Unsupervised Quality trimming for NGS data

Quality check (QC) is a necessary step of every Next Generation Sequencing (NGS) analysis. Although customary, this preprocessing of the data still requires manual intervention to choose tuning parameters empirically from various quality statistics. Making these choices requires substantial experience in preprocessing NGS data, which can be an obstacle to the reproducibility of an experiment. Moreover, although manual QC should yield a data set of good quality, these procedures can also remove a large number of good-quality nucleotides.

To overcome these common drawbacks of QC, we present a new method for the unsupervised quality trimming of NGS data, implemented in the UrQt software (Unsupervised read Quality trimming). In this software, the trimming procedure relies on a well-defined statistical framework to detect the best possible segmentation between a segment of good-quality nucleotides and segments of unreliable-quality nucleotides at the head and tail of each read from an NGS experiment. The unsupervised aspect of the method removes the need for manual expertise, while its maximum likelihood aspect aims to minimize the number of good-quality nucleotides removed. By eliminating manual intervention from data preprocessing, we also ensure that it is highly reproducible.
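To make the idea of a maximum likelihood segmentation concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not UrQt's actual statistical model or code: each base is reduced to "passes" or "fails" a Phred threshold, and the values `threshold`, `p_good` and `p_bad`, as well as the function names `ml_cut` and `trim_read`, are hypothetical. Under this toy model, the cut point that maximizes the likelihood is the prefix with the largest cumulative log-likelihood ratio.

```python
import math


def ml_cut(quals, threshold=20, p_good=0.9, p_bad=0.1):
    """Return k such that quals[:k] is best explained as 'good' and
    quals[k:] as 'unreliable' under a toy two-segment model.

    Assumptions (illustrative only): a base passes the threshold with
    probability p_good in the good segment and p_bad in the bad one.
    """
    gain = math.log(p_good / p_bad)              # contribution of a passing base
    loss = math.log((1 - p_good) / (1 - p_bad))  # contribution of a failing base
    best_k, best_score, score = 0, 0.0, 0.0
    for k, q in enumerate(quals, start=1):
        score += gain if q >= threshold else loss
        # '>=' keeps the longer segment when two cut points tie
        if score >= best_score:
            best_k, best_score = k, score
    return best_k


def trim_read(seq, quals, **params):
    """Trim the tail, then the head, of a read using ml_cut."""
    end = ml_cut(quals, **params)
    # Trim the head by running the same search on the reversed kept segment
    start = end - ml_cut(quals[:end][::-1], **params)
    return seq[start:end], quals[start:end]


read = "ACGTACGTAC"
phred = [5, 8, 30, 35, 36, 38, 37, 12, 6, 2]
print(trim_read(read, phred))  # ('GTACG', [30, 35, 36, 38, 37])
```

No per-read tuning is involved: once the model parameters are fixed, the cut points follow from the quality values alone, which is what makes such a procedure reproducible.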

Moreover, integrating the UrQt software into existing analysis pipelines could complete their automation by allowing them to process raw NGS data instead of requiring preprocessed data. Finally, we adapted the proposed method for the unsupervised trimming of homopolymers at either end of the reads, which can be of interest, for example, to remove polyA tails from RNA-Seq data. We implemented this procedure in C++ to obtain fast, parallelized software with a low memory footprint. UrQt source files (under the GPLv3 licence) and binary executables for different operating systems are freely available.