Document Type
Article
Publication Date
3-2020
Abstract
Motivation
Microbiome analyses of clinical samples with low microbial biomass are challenging for three reasons: the very small quantity of microbial DNA relative to host DNA, the ubiquitous presence of contaminating DNA in sequencing experiments, and the large and rapidly growing size of microbial reference databases.
Results
We present computational subtraction-based microbiome discovery (CSMD), a bioinformatics pipeline developed specifically to generate accurate species-level microbiome profiles for clinical samples with low microbial loads. CSMD applies strategies that maximize the elimination of host sequences while minimizing the loss of microbial signal, and it uses a stepwise convergent solution to detect the microorganisms present in a sample with minimal false positives. CSMD was benchmarked against other widely used tools on previously published, well-characterized datasets. It showed higher sensitivity and specificity in host sequence removal and higher specificity in microbial identification, which led to more accurate abundance estimation. All these features are integrated into a free and easy-to-use tool. Additionally, applying CSMD to cell-free plasma DNA showed that the microbial diversity within these samples is substantially broader than previously believed.
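The abstract evaluates host removal by its sensitivity and specificity. The general idea of computational subtraction and of those two metrics can be illustrated with a toy example. The following is a minimal sketch under stated assumptions and is NOT the CSMD implementation. It assumes exact k-mer matching against a host reference indexed on both strands, and every name in it (`build_host_index`, `subtract_host`, `removal_metrics`, `k`, `min_shared`) is hypothetical. Production pipelines typically align reads to a full host genome and tolerate sequencing errors.

```python
from __future__ import annotations

# Toy illustration of computational (host) subtraction by exact k-mer matching.
# Hypothetical sketch: NOT the CSMD algorithm; k, min_shared and all names are assumptions.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_refs: list[str], k: int) -> set[str]:
    """Index k-mers from both strands of every host reference sequence."""
    index: set[str] = set()
    for ref in host_refs:
        index |= kmers(ref, k) | kmers(revcomp(ref), k)
    return index


def subtract_host(reads: list[tuple[str, str]], host_index: set[str],
                  k: int, min_shared: int = 1) -> tuple[set[str], set[str]]:
    """Split read IDs into (host, non_host) by how many k-mers hit the host index."""
    host, non_host = set(), set()
    for read_id, seq in reads:
        shared = len(kmers(seq, k) & host_index)
        (host if shared >= min_shared else non_host).add(read_id)
    return host, non_host


def removal_metrics(removed: set[str], truly_host: set[str],
                    all_ids: set[str]) -> tuple[float, float]:
    """Sensitivity and specificity of host removal against known read labels."""
    tp = len(removed & truly_host)          # host reads correctly removed
    fn = len(truly_host - removed)          # host reads left behind
    fp = len(removed - truly_host)          # microbial reads wrongly removed
    tn = len(all_ids - truly_host - removed)  # microbial reads correctly kept
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    host_ref = "ACGTACGTTGCAAGGCTTACCGATGCA"
    k = 8
    reads = [
        ("r1", "ACGTTGCAAGGC"),           # forward-strand host read
        ("r2", "TTTTGGGGCCCCAAAA"),       # non-host read
        ("r3", revcomp("GCTTACCGATGC")),  # reverse-strand host read
    ]
    index = build_host_index([host_ref], k)
    host, non_host = subtract_host(reads, index, k)
    print(sorted(host), sorted(non_host))  # ['r1', 'r3'] ['r2']
    print(removal_metrics(host, {"r1", "r3"}, {"r1", "r2", "r3"}))  # (1.0, 1.0)
```

Indexing both strands lets reads sequenced from either strand of the host be caught. Raising the hypothetical `min_shared` threshold trades some sensitivity for specificity, mirroring the balance the abstract describes between removing host reads and keeping microbial signal.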
Availability and implementation
CSMD is freely available at https://github.com/liuyu8721/csmd.
Recommended Citation
Liu Y, Bible PW, Zou B, Liang Q, Dong C, Wen X, Li Y, Ge X, Li X, Deng X, Ma R. CSMD: A computational subtraction-based microbiome discovery pipeline for species-level characterization of clinical metagenomic samples. Bioinformatics. 2019 Oct 18. doi: 10.1093/bioinformatics/btz790
Comments
Funding: This work was supported by the National Basic Research Program of China [2015CB964601 to L.W.]; the National Natural Science Foundation of China [81570828 to L.W., 11771462 to X.Q.W.]; and the International Science and Technology Cooperation Program of Guangdong [2016B050502007 to X.Q.W.].