Standalone JAR file

To use the command line, download the standalone JAR file.
The latest NG-Tax release is available at http://download.systemsbiology.nl/ngtax/ (you can sort by date to find the newest version). The same site hosts the datasets used for the exercises, the SILVA databases and, if you dare to try the latest updates, the development version of NG-Tax.

Requirements

Java (JDK) version: 1.8
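
A quick way to check which Java version is installed before running the JAR is sketched below. The helper name java_major is hypothetical, and the parsing assumes the usual version strings ("1.8.0_292" for Java 8, "11.0.2" for Java 11). Whether NG-Tax also runs on versions newer than 1.8 is not stated here.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_292" (Java 8) or "11.0.2" (Java 11).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;   # legacy scheme: 1.8.x -> 8
    *)   echo "${v%%[.+-]*}" ;;            # modern scheme: 11.0.2 -> 11
  esac
}

if command -v java >/dev/null 2>&1; then
  # "java -version" prints to stderr; the version is the first quoted token.
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Detected Java ${raw} (major $(java_major "$raw"))"
else
  echo "java not found on PATH"
fi
```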

Command line options

To run NG-Tax on the command line, the following options are required:

  • FastQ files: -fS, -fastQsets
  • Mapping file: -mapFile
  • Forward primer sequence: -for_p, -sequence_forward
  • Reverse primer sequence: -rev_p, -sequence_reverse
  • Reference database: -refdb, -referencedatabase
  • Biom file name: -b, -biomfile

Be aware of -for_read_len and -rev_read_len: both default to 70, and reads shorter than the specified length are discarded.
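
To get a rough idea of how many reads might fall below the read length setting, the raw sequence lengths in a FASTQ file can be counted before running NG-Tax. This is a minimal sketch: the helper name count_short_reads is hypothetical, it assumes the common four-lines-per-record FASTQ layout, and it measures raw line length only. Whether NG-Tax measures length before or after barcode and primer removal is not stated here, so treat the count as an indication rather than an exact prediction.

```shell
# Count reads in a FASTQ file that are shorter than a minimum length.
# Assumes 4 lines per record (header, sequence, "+", qualities).
# The second argument defaults to 70, the default of -for_read_len.
count_short_reads() {
  local fastq="$1" min_len="${2:-70}"
  awk -v min="$min_len" \
    'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }' "$fastq"
}

# Example call (file name taken from the examples in this document):
# count_short_reads ./small_1.fastq 70
```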

The next sections are examples of command line usage of NG-Tax 2.0 using the following input files.

Paired-end reads

java -jar NGTax.jar -fS ./small_1.fastq,./small_2.fastq -mapFile small_mapping.txt -for_p "[AG]GGATTAGATACCC" -rev_p "CGAC[AG][AG]CCATGCA[ACGT]CACCT" -refdb ./SILVA_128_SSURef_tax_silva.fasta.gz -b paired_end.biom
  • Use a SILVA FASTA file in gzipped format as the database input.
  • Forward and reverse fastq files are separated by a comma (","), while multiple libraries are separated by a space (" ").

For example: ./project/fastq1_1.fastq,./project/fastq1_2.fastq ./project/fastq2_1.fastq,./project/fastq2_2.fastq
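
With many libraries, typing the list by hand is error-prone. The sketch below builds the value for -fS from a folder of paired files. The helper name build_fastq_sets is hypothetical, and it assumes forward files end in _1.fastq and reverse files in _2.fastq, as in the example above. Paths containing spaces are not supported.

```shell
# Build the -fS value from a folder of paired FASTQ files.
# Assumption: forward files end in _1.fastq, reverse files in _2.fastq.
build_fastq_sets() {
  local dir="$1" sets="" fwd rev
  for fwd in "$dir"/*_1.fastq; do
    [ -e "$fwd" ] || continue            # no match: the glob stays literal
    rev="${fwd%_1.fastq}_2.fastq"
    if [ ! -e "$rev" ]; then
      echo "missing reverse file for $fwd" >&2
      return 1
    fi
    # Comma within a pair, space between libraries.
    sets="${sets:+$sets }$fwd,$rev"
  done
  printf '%s\n' "$sets"
}

# Example call:
# build_fastq_sets ./project
```

The printed string can then be supplied after -fS. The built-in help shows the list as one quoted string; whether the libraries may also be given as separate unquoted arguments is not stated here.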

Demultiplexed paired-end reads

# Move the test fastq files into the `my_demultiplexed_folder` folder.
# mkdir my_demultiplexed_folder
# cp small_*.fastq my_demultiplexed_folder
java -jar NGTax.jar -folder ./my_demultiplexed_folder/ -mapFile mapping_file_test.txt -for_p "[AG]GGATTAGATACCC" -rev_p "CGAC[AG][AG]CCATGCA[ACGT]CACCT" -refdb ./SILVA_128_SSURef_tax_silva.fasta.gz -b demultiplexed.biom
  • The input folder should contain only the input fastq files.
  • Paired read files should be named such that the two files of a pair follow one another.
  • The mapping file is generated automatically and saved to the -mapFile location.
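
Before pointing -folder at a directory, it can help to confirm that it holds nothing but the input files. The sketch below lists every entry that does not look like a FASTQ file. The helper name list_non_fastq is hypothetical, and the accepted extensions (.fastq and .fastq.gz) are an assumption, since the extensions NG-Tax recognises are not listed here.

```shell
# Print every entry in a folder that is not a FASTQ file.
# Assumption: input files end in .fastq or .fastq.gz.
list_non_fastq() {
  local dir="$1" f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue              # empty folder: the glob stays literal
    case "$f" in
      *.fastq|*.fastq.gz) ;;             # expected input, nothing to report
      *) printf '%s\n' "$f" ;;           # anything else should be moved away
    esac
  done
}

# Example call:
# list_non_fastq ./my_demultiplexed_folder
```

An empty result means the folder is clean under these assumptions. Running ls on the same folder shows the files in name order, which makes it easy to see whether the two files of each pair sit next to each other.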

Single-end reads

java -jar NGTax.jar -single -fS ./small_1.fastq -mapFile small_mapping.txt -for_p "[AG]GGATTAGATACCC" -rev_p "CGAC[AG][AG]CCATGCA[ACGT]CACCT" -refdb ./SILVA_128_SSURef_tax_silva.fasta.gz -b single_end.biom

Overview

To obtain an overview of all possibilities, run java -jar NGTax.jar without any options:

java -jar NGTax.jar 
Usage: <main class> [options]
  Options:
    -barcodeSplitting
      Raw data is split into individual fastq files based on the given barcode 
      in the mapping file
      Default: false
    -biom2rdf
      Directly convert existing BIOM file(s) into RDF format, space separated. 
      EX. 'biom1 biom2 biom3
      Default: false
    -db2rdf
      Convert reference database to an RDF reference database
      Default: false
    -demultiplex
      Demultiplex data for submission
      Default: false
    -exportFasta
      Export fasta
      Default: false
    -help

    -ngtax
      Runs the NG-Tax pipeline
      Default: false
    -otu2fasta
      Conversion of ASV's from Biom file to FASTA
      Default: false
    -reClassify
      ReClassify the existing NG-Tax's BIOM file using a new database.
      Default: false
    -remultiplex
      Remultiplex the data to conveniently analyse the demultiplexed samples.
      Default: false

  * required parameter

To get help for the core of NG-Tax (ASV identification and classification), run java -jar NGTax.jar -ngtax:

java -jar NGTax.jar -ngtax
The following options are required: [-b | -biomfile], [-for_p | -sequence_forward]
Usage: <main class> [options]
  Options:
    --help

  * -b, -biomfile
      BiomFile location
    -cR, -chimeraRatio
      ratio otu_parent_abundance/otu_chimera_abundance (recommended 2, both 
      otu parents must be two times more abundant than the otu chimera)
      Default: 2.0
    -clR, -classifyRatio
      the minimum ratio that a taxon needs to be compared to others
      Default: 0.8
    -email
      User email (Galaxy Only)
    -errorCorr, -errorCorrClusPer
      Number of mismatch(es) allowed for each ASV clusters (only one mismatch 
      recommended, input: 1)
      Default: 1
    -fS, -fastQ
      Either a set of fastQfiles seperated by a space 'fastQF1,fastQR1 
      fastQF2,fastQR2' 
    -fQF, -fastQFiles
      Create fastQ files for each library
      Default: false
    -fastaFileLocation
      output Fasta file location
    -folder
      Folder location of the fastQfiles, should be a clean folder with only 
      the input (fastq) files
    -for_read_len
      Forward read length
      Default: 70
    -identLvl, -identityLevel
      identity level between parents and chimera (recommended 100, no error 
      allowed, chimera as perfect combination of two otus)
      Default: 100.0
    -log
      Location of the log file
    -mapFile
      Mapping file containing metadata [txt]
    -m, -markIfMoreThen1
      Mark the classification with *~ if there are more then 1 possible 
      spieces 
      Default: false
    -OTUSizeT, -minimumOTUSize
      Minimum size of an ASV before minimum threshold filtering
      Default: 100
    -minPerT, -minimumThreshold
      Minimum threshold detectable, expressed in percentage
      Default: 0.1
    -nomismatch
      Primers are not allowed to have a mismatch when a database is created 
      using a simple FASTA file
      Default: false
    -prefixId
      Prefix for the id of the otu's
    -primersRemoved
      Are the primers already removed?
      Default: false
    -project
      Project description (Galaxy Only)
    -refdb, -referencedatabase
      The reference fasta file (Aligned or not aligned)
    -rja, -rejectedASVAnnotation
      The number of rejected ASVs to be classified according to its abundance 
      (-1 to classify everything)
      Default: 100
    -rev_read_len
      Reverse read length
      Default: 70
  * -for_p, -sequence_forward
      Forward primer sequence (degenerate positions between brackets or use 
      degenerate letters)
    -rev_p, -sequence_reverse
      Reverse primer sequence (degenerate positions between brackets or use 
      degenerate letters)
    -shannon, -shannonEvenness
      Minimum threshold based upon the Shannons equitability method 
      (dynamically estimated assuming that each ASV represents a species)
      Default: false
    -single
      Single end reads
      Default: false
    -skipFiltering
      Skip filtering step reuse data already generated
      Default: false
    -t, -ttl
      Generate a BIOM RDF file

  * required parameter

Laboratory of Systems and Synthetic Biology - Wageningen University & Research