The Galaxy environment works best with amplicon libraries, that is, multiple barcoded samples in a single FASTQ file or a pair of FASTQ files (the reads may be with or without primers).

If you have many demultiplexed samples, we suggest using the command line version of NG-Tax 2.0.

## NG-Tax 2.0 in Galaxy

NG-Tax is available as a standalone software package or within Galaxy through Docker. Docker is a container platform that allows you to easily deploy a large variety of applications.

To start Galaxy with NG-Tax, install Docker and then run the following command:

    # To enable the interactive environment within Galaxy we use the --privileged option.
    docker run -p 8080:80 -p 8021:21 -p 8800:8800 --privileged=true -v ~/ngtax_storage/:/export/ wurssb/ngtax

The container ports 80, 21 and 8800 are made available on host ports 8080, 8021 and 8800 respectively. You can therefore access the NG-Tax Galaxy environment via [http://localhost:8080](http://localhost:8080), or via the name of the server it runs on.
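If port 8080 is already in use on your machine, only the host side of the first -p mapping needs to change. The following is a minimal sketch, not part of NG-Tax: the function name `ngtax_run_cmd` and its two arguments are hypothetical, and it only prints the command so that you can inspect it before running it yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker command from this tutorial with a
# configurable host HTTP port and storage folder. Nothing is executed.
ngtax_run_cmd() {
  local http_port="${1:-8080}"               # host port mapped to container port 80
  local storage="${2:-$HOME/ngtax_storage}"  # host folder mounted at /export/
  printf 'docker run -p %s:80 -p 8021:21 -p 8800:8800 --privileged=true -v %s/:/export/ wurssb/ngtax\n' \
    "$http_port" "$storage"
}

# Example: Galaxy would then be reachable at http://localhost:9090 instead of 8080.
ngtax_run_cmd 9090
```

Only the number before the colon changes; the container side (80, 21, 8800) stays as in the original command.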

The first time you start this Docker image, Galaxy copies its internal files to the ngtax_storage folder in your home directory. Because a Docker container does not keep its data by default, this ensures that your result files and accounts are not lost when you restart your computer or the Docker instance.

Input files can be uploaded via the Galaxy interface through the “Get data” menu or the shortcut icon. Once the white upload box appears, add your files either by browsing your computer or by dragging and dropping them into the box. Then click Start.

In this tutorial we use a paired-end dataset of two FASTQ files and a mapping file containing the library information.

All the files used in the tutorial can be found here: http://download.systemsbiology.nl/ngtax/dataset/

Once a job has been completely processed, as indicated by its status, its files can be viewed and downloaded. You can view the results by clicking the eye icon at the top, or download them by clicking the name of the job so that it unfolds and then selecting the floppy disk icon, as shown below.

## Using NG-Tax

Once you have finished uploading the files, unfold NG-Tax in the left panel and select NG-Tax. This gives you an overview of all the options; the next sections are a step-by-step guide through each of them.

## Identifier & Description

In this first section you can give a short identifier, which Galaxy uses in the history so that jobs are easier to find, and a description of the project.

For instance, you can use demo as the identifier and demo project as the project description.

## Mapping file

Select the mapping file in the right section that you have just uploaded. The exact layout of this mapping file is specified in the files usage section; it is best to open that section in another tab so that you do not lose the current view.

## ASV taxonomic classification

Two options are available for ASV classification:

• “yes”: to classify the ASV into a taxonomic lineage using the provided database.
• “no”: do not classify the ASV into a taxonomic lineage.

In this demo we choose yes for classification and use Silva 132 as the database

## Input FASTQ/FASTA files

The input FASTQ/FASTA files can be either paired or single end reads.

In this demo we choose paired-end

### FASTQ/FASTA sets

A FASTQ/FASTA set is a single-end FASTQ/FASTA file or a pair of paired-end FASTQ/FASTA files. Such a set can consist of multiple samples and is often known as an amplicon library.

If you have many demultiplexed files, we suggest using the command line version with the -folder option, which automatically analyses the demultiplexed files stored within a given folder.

To pass the FASTQ/FASTA files to the pipeline, press the “Insert FASTQ/FASTA sets” button. This opens a new section where you can select the forward and the reverse files for paired-end reads, or a single file for single-end reads.

In this demo we click insert FastQ sets once and add small_1.fastq.txt as fastQ file.1 and small_2.fastq.txt as fastQ file.2

Please note: As can be seen in the figure above, Galaxy tries to automatically add the FASTQ/FASTA files from your history, but in doing so it only adds the last file (often twice). Therefore, always double-check that the input files are correct!

You can add as many pairs of FASTQ files as needed by pressing the button again.
To remove a set of FASTQ files, press the garbage icon in the yellow region.

Select the preferred read length for the forward and reverse sequences used in the analysis. The default is 70, which is also advised for optimal results.

In this demo we keep the default settings
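Before settling on a read length, it can help to know how long the shortest read in each file is. The following is a minimal sketch, not part of NG-Tax: the function name `min_read_length` is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the length of the shortest read in a FASTQ file.
# Assumes plain (uncompressed) FASTQ with exactly four lines per record.
min_read_length() {
  awk 'NR % 4 == 2 {                       # sequence lines are lines 2, 6, 10, ...
         n = length($0)
         if (min == "" || n < min) min = n
       }
       END { if (min != "") print min }' "$1"
}

# Example with a tiny two-read file (reads of length 8 and 5).
tmp=$(mktemp)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' > "$tmp"
min_read_length "$tmp"   # prints 5
rm -f "$tmp"
```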

### Primer sequences

Add the primer sequences for the forward and the reverse primers.
The degenerate positions can also be filled in with the corresponding letters.
If you don’t know the degenerate letters, you can use square brackets ‘[ ]’, as in the example below.

Please note: Primers are obligatory as they are required to build the lookup database.

In this demo we keep the default settings
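The degenerate letters mentioned above are the standard IUPAC nucleotide codes, and each one stands for a fixed set of bases. The following sketch shows the correspondence by expanding the letters into bracketed sets. It is an illustration only: the function name `to_bracket_notation` is hypothetical, the primer is just an example sequence, and the exact bracket syntax (for instance [CT] for Y) is an assumption that should be checked against the example in the tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite IUPAC degenerate codes as bracketed base sets.
# Assumption: a degenerate position is written as e.g. [CT] for Y.
to_bracket_notation() {
  printf '%s\n' "$1" | sed \
    -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'  -e 's/W/[AT]/g' \
    -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
    -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Example primer with two degenerate positions (Y and M).
to_bracket_notation GTGYCAGCMGCCGCGGTAA   # prints GTG[CT]CAGC[AC]GCCGCGGTAA
```

The substitutions can be chained safely because the replacements only contain A, C, G and T, none of which is itself a degenerate code.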

## Other settings

In this demo we keep the default settings

Primer removed:
Select “yes” if your data is demultiplexed, meaning that the barcode and the primer have been removed.

Ratio ASV abundance:
Used for chimera checking. Select the ratio by which the parents need to be more abundant than the chimera.

Classify ratio:
Abundance ratio used as the threshold for the taxonomic classification of an ASV.

Minimum percentage threshold:
Select the minimum relative abundance, as a percentage of all the reads in the sample, that an ASV needs to reach.

Identity level (%):
Identity level between the parents and the chimera (recommended: 100%, i.e. no errors allowed, so that a chimera is a perfect combination of two ASVs).

Error correction:
Select the number of mismatches allowed for grouping input sequences into ASVs. It is strongly recommended to allow only 1 mismatch.

Show if there are more taxonomies, if applicable:
Select whether to highlight ASVs for which more than one taxonomy is possible (for example more than one species), i.e. when the classification confidence is <100%.

Creates a second output file in turtle format.
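As a worked example of the minimum percentage threshold described above, the following sketch compares the relative abundance of an ASV with a threshold. It is an illustration under stated assumptions, not the NG-Tax implementation: the function name `passes_min_percentage` is hypothetical, the numbers are made up, and treating a value exactly equal to the threshold as passing is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does an ASV reach a minimum relative abundance?
# Arguments: ASV read count, total reads in the sample, threshold in percent.
passes_min_percentage() {
  awk -v c="$1" -v n="$2" -v t="$3" 'BEGIN { exit !(100 * c / n >= t) }'
}

# Illustrative numbers only, with a threshold of 0.1 %.
# 12 of 20000 reads is 0.06 %, 40 of 20000 reads is 0.2 %.
if passes_min_percentage 12 20000 0.1; then echo kept; else echo rejected; fi   # prints rejected
if passes_min_percentage 40 20000 0.1; then echo kept; else echo rejected; fi   # prints kept
```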

## Run

If all the sections are filled in, press ‘Execute’.

On a standard laptop/desktop it takes approximately 10 minutes to analyse this test set.

## Downstream analysis

Once the output files are generated, the BIOM file can be analysed using standard methods such as those mentioned at http://biom-format.org. In addition, the BIOM or RDF file can be analysed using the NG-Tax toolbox, and the RDF file, which is exported in Turtle format (extension .ttl), can be loaded directly into a triple store such as GraphDB. See the RDF tutorial for more details.

## Galaxy job status

### Statuses

A job in the Galaxy user history can be in one of 5 states. More information can be found here

### Completed jobs & Data retrieval

A job is completed when its color is green and the icon next to its name is gone.

When the job is completed, you can view the results by clicking the eye icon at the top, or download them by clicking the name of the job so that it unfolds and then selecting the download icon (the floppy disk), as shown below.

### Failed

A job has failed when its color is red and there is an ‘X’ next to its name.

To see why the job failed, select the job and press the ‘i’ icon on the left.

## File preparation

Minimal requirements
- Mapping file.
- One or two FASTQ/FASTA file(s) containing the amplicon sequences.
- The primers used (needed to create the classification database). This also applies to demultiplexed data from which the primers have been removed.

For the mapping file
* If the forward barcode is removed, the column is still compulsory but the content can be empty.
** If the reverse barcode is not known, the column is not compulsory. A reverse barcode cannot be used on its own, without a forward barcode.
*** Barcodes in each library must be unique.
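The uniqueness requirement can be checked before uploading the mapping file. The following is a minimal sketch, not part of NG-Tax: the function name `check_unique_barcodes` is hypothetical, and the tab-separated layout with the forward barcode, reverse barcode and library number in columns 2, 3 and 4 is an assumption. Pass other column numbers if your file is laid out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report barcode pairs that occur more than once within
# one library. The default column positions are assumptions; adjust as needed.
# Usage: check_unique_barcodes FILE [FWD_COL] [REV_COL] [LIBRARY_COL]
check_unique_barcodes() {
  local file="$1" fwd_col="${2:-2}" rev_col="${3:-3}" lib_col="${4:-4}"
  awk -F'\t' -v f="$fwd_col" -v r="$rev_col" -v l="$lib_col" '
    BEGIN { dup = 0 }
    /^#/ || NF == 0 { next }               # skip header/comment and empty lines
    {
      key = $l SUBSEP $f SUBSEP $r         # same barcodes in another library are fine
      if (seen[key]++) {
        printf "library %s: barcode pair %s/%s is used more than once\n", $l, $f, $r
        dup = 1
      }
    }
    END { exit dup }' "$file"
}

# Example: check_unique_barcodes mapping.txt || echo "please fix the mapping file"
```

The function exits with a non-zero status when a duplicate is found, so it can be used directly in a condition.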

Demultiplexed data
We advise trying the command line version of NG-Tax when you work with many samples that have already been demultiplexed into separate FASTQ files. When using Galaxy, each FASTQ set is a sample and corresponds to a new library (increment the library number), meaning that 10 samples give 10 entries, from library 1 up to library 10.
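The library numbering for demultiplexed data can be sketched as follows. This is an illustration only, not part of NG-Tax: the function name `list_libraries` is hypothetical, and it assumes that the paired files are named `<sample>_1.fastq` and `<sample>_2.fastq`. It prints one line per FASTQ set with an incrementing library number, which can serve as a starting point for filling in the mapping file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list paired FASTQ files in a folder and number them as
# libraries 1..N. Assumes files are named <sample>_1.fastq / <sample>_2.fastq.
list_libraries() {
  local dir="$1" n=0 fwd rev
  for fwd in "$dir"/*_1.fastq; do
    [ -e "$fwd" ] || continue                # the folder has no matching files
    rev="${fwd%_1.fastq}_2.fastq"
    [ -e "$rev" ] || { echo "missing mate for $fwd" >&2; continue; }
    n=$((n + 1))                             # each complete pair is a new library
    printf '%d\t%s\t%s\n' "$n" "$(basename "$fwd")" "$(basename "$rev")"
  done
}

# Example: list_libraries ~/my_demultiplexed_run
```

Files are visited in the shell's sorted glob order, so the numbering is stable between runs as long as the folder content does not change.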

Example of a paired-end mapping file:

Example of a demultiplexed paired-end mapping file:

The Docker image can be downloaded in advance with:

    docker pull wurssb/ngtax