Summary

NG-Tax 2.0 workflow

NG-Tax 2.0 is a semantic framework for FAIR high-throughput analysis and classification of marker gene amplicon sequences, including bacterial and archaeal 16S, and eukaryotic 18S and ITS rRNA sequences. It employs a fast de novo OTU-picking algorithm to generate Amplicon Sequence Variants (ASVs). Using the RDF data model, ASVs can be automatically stored in a graph database as objects that link ASV sequences with the full data-wise and element-wise provenance, thereby achieving the level of interoperability required to use such data to their full potential. The graph database can be queried directly, allowing comparative analyses over thousands of samples, and is connected to an interactive Rshiny toolbox for analyzing and visualizing (meta)data. Additionally, NG-Tax 2.0 exports an extended BIOM 1.0 (JSON) file as a starting point for further analyses by other means. The extended BIOM file is backwards compatible and contains new attribute types that record the command arguments used, the sequences of the ASVs formed, and classification confidence scores.
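Because the exported file is BIOM 1.0 JSON, any JSON parser can read it. The following is a minimal sketch that turns the sparse count matrix of a BIOM 1.0 table into per-sample ASV counts. It uses only standard BIOM 1.0 fields (`rows`, `columns`, `data`); the names of the NG-Tax-specific attributes are not given in this summary, so they are not shown. The inline table, the ASV and sample identifiers, and the function name `sparse_biom_to_counts` are invented for illustration.

```python
import json

# Tiny, made-up BIOM 1.0 document (sparse matrix); not real NG-Tax output.
biom_text = """
{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0",
  "format_url": "http://biom-format.org",
  "type": "OTU table",
  "generated_by": "example",
  "date": "2020-01-01T00:00:00",
  "rows": [
    {"id": "ASV_1", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
    {"id": "ASV_2", "metadata": {"taxonomy": ["k__Bacteria", "p__Bacteroidetes"]}}
  ],
  "columns": [
    {"id": "sample_A", "metadata": null},
    {"id": "sample_B", "metadata": null}
  ],
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [2, 2],
  "data": [[0, 0, 120], [0, 1, 30], [1, 1, 50]]
}
"""


def sparse_biom_to_counts(table):
    """Return {sample_id: {observation_id: count}} from a sparse BIOM 1.0 table."""
    row_ids = [row["id"] for row in table["rows"]]
    col_ids = [col["id"] for col in table["columns"]]
    counts = {col_id: {} for col_id in col_ids}
    # Each sparse entry is [row_index, column_index, value].
    for row_idx, col_idx, value in table["data"]:
        counts[col_ids[col_idx]][row_ids[row_idx]] = value
    return counts


table = json.loads(biom_text)
counts = sparse_biom_to_counts(table)
print(counts)
```

Additional per-ASV attributes in an extended file would be expected alongside the standard fields, so a reader that ignores unknown keys, as this one does, should keep working.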

The performance of NG-Tax 2.0 was compared with that of DADA2, run as a plugin in the QIIME 2 analysis pipeline. Ten 16S rRNA gene amplicon mock community samples were obtained from the Mockrobiota resource and evaluated. The precision of NG-Tax 2.0 was significantly higher, averaging 0.95 versus 0.62 for DADA2-QIIME 2, while recall was comparable, averaging 0.85 and 0.86, respectively.
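For readers unfamiliar with the two metrics, the sketch below shows how precision and recall can be computed for a single mock community. It assumes a simple set-based definition over taxon names: precision is the fraction of reported taxa that are truly in the mock community, and recall is the fraction of mock community taxa that were reported. The summary does not state the exact definition or taxonomic level used in the comparison, and the taxa and the function name `precision_recall` are illustrative only.

```python
def precision_recall(expected, observed):
    """Set-based precision and recall of observed taxa against expected taxa."""
    expected = set(expected)
    observed = set(observed)
    true_positives = len(expected & observed)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(observed) if observed else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Illustrative data: known mock community composition vs. taxa a tool reported.
expected_taxa = {"Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"}
observed_taxa = {"Bacteroides", "Escherichia", "Lactobacillus", "Pseudomonas"}

precision, recall = precision_recall(expected_taxa, observed_taxa)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, a tool that reports many spurious taxa lowers its precision without affecting its recall, which is the pattern the reported averages describe.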

NG-Tax 2.0 is written in Java. The code, the ontology, a Galaxy platform implementation, the analysis toolbox, and tutorials are freely available under the MIT License.

Laboratory of Systems and Synthetic Biology - Wageningen University & Research