NG-Tax file layout

The following files are required to use NG-Tax:

  • A mapping file
  • Paired-end or single-end FASTQ files

Mapping file

The mapping file should be in tab-delimited format and contain columns with the following header names:

#sampleID, forwardBarcodeSequence, reverseBarcodeSequence, LibraryNumber, Direction, LibraryName, ProjectName, Region, Location, and Description.

Any columns after Description can be used for additional metadata, if available.
An example of the layout:

Example of a paired-end FASTQ file:

Example of a demultiplexed paired-end FASTQ file:

Explaining the headers:

#sampleID: This is a unique name for that sample.
forwardBarcodeSequence: This is the barcode sequence for the forward FASTQ file. *
reverseBarcodeSequence: This is the barcode sequence for the reverse FASTQ file. **
LibraryNumber: This is a positive number, usually starting at 1, and increases by one for each library. ***
Direction: Use ‘p’ for paired end sequences and ‘s’ for single end sequences.
LibraryName: Name of the FASTQ files that make up the library.
ProjectName: The name of the project.
Region: The 16S region sequenced.
Location: If known, give the location.
Description: This column can be added as the last column to give a short description of the sample.

Note
* If the forward barcode is not known, the column is still compulsory but the content can be empty.
** If the reverse barcode is not known, the column is not compulsory. A reverse barcode sequence cannot be used on its own, without a forward barcode.
*** Barcodes in each library must be unique.
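The rules above can be checked before running NG-Tax. The following is a minimal Python sketch, not part of NG-Tax: the column spellings are taken from the header descriptions, and treating the combination of forward and reverse barcode as the unit that must be unique within a library is an assumption.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the header descriptions above; the exact spelling
# NG-Tax expects should be checked against its own example mapping file.
REQUIRED = ["#sampleID", "forwardBarcodeSequence", "LibraryNumber",
            "Direction", "LibraryName", "ProjectName", "Region",
            "Location", "Description"]


def check_mapping(text):
    """Return a list of problems found in a tab-delimited mapping file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        return [f"missing column: {col}" for col in missing]

    problems = []
    seen_ids = set()
    seen_barcodes = defaultdict(set)  # LibraryNumber -> (forward, reverse) pairs
    for row in reader:
        where = f"line {reader.line_num}"
        sample = row["#sampleID"]
        if sample in seen_ids:
            problems.append(f"{where}: duplicate sample ID {sample!r}")
        seen_ids.add(sample)

        if row["Direction"] not in ("p", "s"):
            problems.append(f"{where}: Direction must be 'p' or 's'")

        library = row["LibraryNumber"] or ""
        if not library.isdigit() or int(library) < 1:
            problems.append(f"{where}: LibraryNumber must be a positive integer")

        # reverseBarcodeSequence is optional, so the column may be absent.
        fwd = row["forwardBarcodeSequence"] or ""
        rev = row.get("reverseBarcodeSequence") or ""
        if rev and not fwd:
            problems.append(f"{where}: reverse barcode without a forward barcode")
        if fwd:
            # Assumption: the forward/reverse combination must be unique per library.
            pair = (fwd, rev)
            if pair in seen_barcodes[library]:
                problems.append(f"{where}: barcodes {pair} reused in library {library}")
            seen_barcodes[library].add(pair)
    return problems
```

Rows with an empty forward barcode (for example demultiplexed data) are skipped by the uniqueness check, since there is nothing to compare.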


Paired-end FASTQ files

  • The data should have a high read count.
  • Each read must be in standard four-line FASTQ format:
    • A header line starting with ‘@’.
    • The sequence.
    • A line starting with ‘+’.
    • The quality scores, one character per base.
  • Paired-end reads must be supplied as two separate files, a forward file and a reverse file. Corresponding reads in the two files share the same header, apart from an identifier marking them as forward or reverse.
  • The paired files must be the same length, i.e. contain the same number of reads.
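The pairing rules above can be checked with a short Python sketch. It is not part of NG-Tax and makes two assumptions: corresponding reads appear in the same order in both files, and the read name is the first whitespace-delimited token of the header, optionally ending in /1 or /2. Other header conventions would need a different `read_name`.

```python
import io
from itertools import islice, zip_longest


def headers(handle):
    """Yield the header line of each record (every fourth line of the file)."""
    for line in islice(handle, 0, None, 4):
        yield line.rstrip("\r\n")


def read_name(header):
    # Assumption: the read name is the first whitespace-delimited token,
    # optionally ending in /1 (forward) or /2 (reverse).
    tokens = header[1:].split()
    name = tokens[0] if tokens else ""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pairs(forward, reverse):
    """Return the number of read pairs, or raise ValueError on a mismatch."""
    pairs = 0
    for fwd, rev in zip_longest(headers(forward), headers(reverse)):
        if fwd is None or rev is None:
            raise ValueError("forward and reverse files differ in read count")
        if read_name(fwd) != read_name(rev):
            raise ValueError(f"reads do not pair up: {fwd!r} vs {rev!r}")
        pairs += 1
    return pairs


# Hypothetical two-read example; real files would be opened with open(path).
forward = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n")
reverse = io.StringIO("@read1/2\nTTGC\n+\nIIII\n@read2/2\nCCAA\n+\nIIII\n")
print(check_pairs(forward, reverse))  # 2
```

Only the header lines are read, so record contents are not validated here. For gzip-compressed input, `gzip.open(path, "rt")` yields text lines in the same way.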

Example of a forward FASTQ file:

Example of a reverse FASTQ file:

Single-end FASTQ files

  • The data should have a high read count.
  • Each read must be in standard four-line FASTQ format:
    • A header line starting with ‘@’.
    • The sequence.
    • A line starting with ‘+’.
    • The quality scores, one character per base.
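A malformed record, such as a FASTA-style ‘>’ header, is easiest to catch before processing starts. The following Python sketch is not part of NG-Tax; it checks the four-line layout described above and returns the read count. It reads the file as a stream, so large files are not loaded into memory, and it treats any extra blank line as an error.

```python
import io


def validate_fastq(handle):
    """Check each four-line FASTQ record and return the number of reads."""
    reads = 0
    while True:
        header = handle.readline()
        if not header:  # end of file
            return reads
        record = [header] + [handle.readline() for _ in range(3)]
        header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
        n = reads + 1
        if not header.startswith("@"):
            raise ValueError(f"read {n}: header must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"read {n}: third line must start with '+'")
        # A truncated record leaves seq or qual empty, which fails here too.
        if not seq or len(seq) != len(qual):
            raise ValueError(f"read {n}: quality string must match sequence length")
        reads = n


# Hypothetical example input; a real file would be opened with open(path).
single = io.StringIO("@read1\nACGTAC\n+\nIIIIII\n@read2\nGGTT\n+\nIIII\n")
print(validate_fastq(single))  # 2
```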

Example of a single-end FASTQ file:

Database

The SILVA database is generally used for taxonomic assignment. NG-Tax supports both aligned and unaligned file formats.

SILVA database release 132 (the latest at the time of writing): unaligned and aligned versions.

Note: a custom database in FASTA format can also be used for taxonomic classification. Its sequences must include the primers for the hypervariable region of interest. Each header must consist of an identifier followed by a space, then a taxonomic lineage of at most seven ranks separated by ‘;’. The lineage cannot start with ‘Eukaryota’. Example: “>Identifier Kingdom;Phylum;Class;Order;Family;Genus;Species”
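The header rules for a custom database can be expressed as a small parser. The following Python sketch is an illustration, not NG-Tax's own reader. It splits a header into its identifier and lineage and rejects headers that break the rules above.

```python
def parse_header(header, max_ranks=7):
    """Split a custom-database FASTA header into (identifier, list of ranks)."""
    if not header.startswith(">"):
        raise ValueError("FASTA headers start with '>'")
    identifier, sep, lineage = header[1:].partition(" ")
    if not identifier or not sep or not lineage:
        raise ValueError("expected '>Identifier' followed by a space and a lineage")
    ranks = lineage.strip().split(";")
    if len(ranks) > max_ranks:
        raise ValueError(f"at most {max_ranks} ranks are allowed, got {len(ranks)}")
    if ranks[0] == "Eukaryota":
        raise ValueError("lineage cannot start with 'Eukaryota'")
    return identifier, ranks


ident, ranks = parse_header(">Identifier Kingdom;Phylum;Class;Order;Family;Genus;Species")
print(ident, len(ranks))  # Identifier 7
```

The check only covers the header line. Whether each sequence actually contains the primers for the region of interest would need a separate check.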

Laboratory of Systems and Synthetic Biology - Wageningen University & Research