Database

Currently, the application supports three types of databases: Sequence, Taxonomy, and kraken2. Note that if the user chooses a Sequence database, they must also select a Taxonomy database; otherwise, the data analysis will be incorrect.
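The pairing rule above can be expressed as a simple validation step. The Python sketch below assumes a hypothetical `DatabaseSelection` record with one optional field per database type; the field names and the `check_selection` function are illustrative only and are not part of the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseSelection:
    """Hypothetical record of the databases a user picked (None = not selected)."""
    sequence: Optional[str] = None
    taxonomy: Optional[str] = None
    kraken2: Optional[str] = None


def check_selection(selection: DatabaseSelection) -> None:
    """Raise ValueError when a Sequence database is chosen without a Taxonomy database."""
    if selection.sequence and not selection.taxonomy:
        raise ValueError(
            f"Sequence database {selection.sequence!r} requires a Taxonomy database; "
            "otherwise the analysis results will be incorrect."
        )


# Usage: a complete Sequence + Taxonomy pair passes silently.
check_selection(DatabaseSelection(sequence="silva_132_99_16S_sequence.qza",
                                  taxonomy="silva_132_99_16S_taxonomy.qza"))
```

A kraken2-only selection also passes this check, because the sketch only enforces the Sequence/Taxonomy rule.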

For the Sequence database, we provide the following options:

Database name	File	Size	Download link	Description
2022.10.seqs.fna.qza	2022.10.seqs.fna.qza	286.9 MB	https://ftp.microbio.me/greengenes_release/2022.10/	Greengenes2 (release 2022.10) reference sequences, packaged as a QIIME 2 artifact (.qza).
Grenegenes2022.10.backbone.full-length.fna.qza	2022.10.backbone.full-length.fna.qza	61.7 MB	https://ftp.microbio.me/greengenes_release/2022.10/	Greengenes2 contains over 20,000,000 16S rRNA V4 amplicon sequencing fragments, derived from a very large collection of public and private microbiome samples in Qiita and representing a broad cross-section of environment types.
ref-97_otus.qza		28.3 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
sh_refs_qiime_ver9_97_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	12.5 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME is a bioinformatics data science platform, originally developed for analysis of high-throughput microbiome marker gene (e.g., 16S or 18S rRNA genes) amplicon sequencing data. There have been two major versions of the QIIME platform, QIIME 1 and QIIME 2.
sh_refs_qiime_ver9_99_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	17.8 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME is a bioinformatics data science platform, originally developed for analysis of high-throughput microbiome marker gene (e.g., 16S or 18S rRNA genes) amplicon sequencing data. There have been two major versions of the QIIME platform, QIIME 1 and QIIME 2.
sh_refs_qiime_ver9_dynamic_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	15.9 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME is a bioinformatics data science platform, originally developed for analysis of high-throughput microbiome marker gene (e.g., 16S or 18S rRNA genes) amplicon sequencing data. There have been two major versions of the QIIME platform, QIIME 1 and QIIME 2.
silva-138-99-seqs-515-806.qza	Greengenes 13_8 SEPP reference database	13.9 MB	https://docs.qiime2.org/2022.8/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier	The SSU Ref NR 99 138.1 dataset is based on the full SSU Ref 138.1 dataset and encompasses 510,508 sequences in total. Highly similar sequences were removed by applying a 99% identity criterion with the vsearch tool, using a custom sequence order based first on presence in the previous release's Ref NR 99 and second on a combination of sequence length (weighted twofold) and quality.
silva-138-99-seqs.qza	Greengenes 13_8 SEPP reference database	92.6 MB	https://docs.qiime2.org/2022.8/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier	The SSU Ref NR 99 138.1 dataset is based on the full SSU Ref 138.1 dataset and encompasses 510,508 sequences in total. Highly similar sequences were removed by applying a 99% identity criterion with the vsearch tool, using a custom sequence order based first on presence in the previous release's Ref NR 99 and second on a combination of sequence length (weighted twofold) and quality. For sorting, sequence quality is determined by ambiguities (50%), overall alignment quality (45%), and homopolymers (5%).
silva_132_90_16S_sequence.qza	Silva_132_release.zip	10.8 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_90_18S_sequence.qza	Silva_132_release.zip	13.2 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_94_16S_sequence.qza	Silva_132_release.zip	10.8 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_94_18S_sequence.qza	Silva_132_release.zip	5.5 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_97_16S_sequence.qza	Silva_132_release.zip	47.3 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_99_16S_sequence.qza	Silva_132_release.zip	89.9 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_99_18S_sequence.qza	Silva_132_release.zip	15.5 MB	https://www.arb-silva.de/download/archive/qiime	A single sequence file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
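All Sequence databases above are QIIME 2 artifacts (`.qza`). An artifact is a ZIP archive whose top-level directory, named after the artifact's UUID, contains a `metadata.yaml` file recording the artifact's semantic type. As a minimal sketch, the hypothetical helper below reads that type with the Python standard library (not the QIIME 2 API), so a downloaded file can be sanity-checked before use; for sequence data we would expect a type such as `FeatureData[Sequence]`.

```python
import zipfile


def qza_semantic_type(path: str) -> str:
    """Return the semantic type stored in a QIIME 2 artifact's metadata.yaml.

    Hypothetical helper: it parses the 'type:' line directly instead of using
    the QIIME 2 API, so it needs only the standard library.
    """
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only "<uuid>/metadata.yaml"; skip the copies under provenance/.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = archive.read(name).decode("utf-8")
                for line in text.splitlines():
                    if line.startswith("type:"):
                        return line.split(":", 1)[1].strip()
    raise ValueError(f"{path} does not look like a QIIME 2 artifact")


# Usage (point the path at a downloaded file):
# print(qza_semantic_type("silva_132_99_16S_sequence.qza"))
```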

For the Taxonomy database, we provide the following options:

Database name	File	Size	Download link	Description
2022.10.backbone.tax.qza	2022.10.backbone.tax.qza	4.5 MB	https://ftp.microbio.me/greengenes_release/2022.10/	The Greengenes database, fully redesigned from the ground up and backed by whole genomes, with a focus on harmonizing 16S rRNA and shotgun metagenomic datasets.
Grenegenes2022.10.taxonomy.md5.tsv.qza	2022.10.taxonomy.md5.tsv.qza	424.3 MB	https://ftp.microbio.me/greengenes_release/2022.10/	The Greengenes database, fully redesigned from the ground up and backed by whole genomes, with a focus on harmonizing 16S rRNA and shotgun metagenomic datasets.
ref-taxonomy.qza		1.2 MB	https://www.arb-silva.de/download/archive/qiime	QIIME 2 is a software platform used for microbiome analysis, particularly for processing and analyzing DNA sequence data derived from microbial communities.
sh_taxonomy_qiime_ver9_97_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	1.9 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME 2 is a software platform used for microbiome analysis, particularly for processing and analyzing DNA sequence data derived from microbial communities.
sh_taxonomy_qiime_ver9_99_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	3 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME 2 is a software platform used for microbiome analysis, particularly for processing and analyzing DNA sequence data derived from microbial communities.
sh_taxonomy_qiime_ver9_dynamic_29.11.2022.qza	sh_qiime_release_29.11.2022.tgz	2.7 MB	https://doi.plutof.ut.ee/doi/10.15156/BIO/2483915	QIIME 2 is a software platform used for microbiome analysis, particularly for processing and analyzing DNA sequence data derived from microbial communities.
silva-138-99-tax-515-806.qza	Greengenes 13_8 SEPP reference database	5.3 MB	https://docs.qiime2.org/2022.8/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier	The SSU Ref NR 99 138.1 dataset is based on the full SSU Ref 138.1 dataset and encompasses 510,508 sequences in total. Highly similar sequences were removed by applying a 99% identity criterion with the vsearch tool, using a custom sequence order based first on presence in the previous release's Ref NR 99 and second on a combination of sequence length (weighted twofold) and quality. For sorting, sequence quality is determined by ambiguities (50%), overall alignment quality (45%), and homopolymers (5%).
silva_132_90_16S_taxonomy.qza	Silva_132_release.zip	856.1 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_90_18S_taxonomy.qza	Silva_132_release.zip	286.9 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_94_18S_taxonomy.qza	Silva_132_release.zip	286.9 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_97_16S_taxonomy.qza	Silva_132_release.zip	4.1 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_97_18S_taxonomy.qza	Silva_132_release.zip	848.6 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_99_16S_taxonomy.qza	Silva_132_release.zip	8.6 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
silva_132_99_18S_taxonomy.qza	Silva_132_release.zip	1.6 MB	https://www.arb-silva.de/download/archive/qiime	A single taxonomy file. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
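Because a Sequence database must be paired with a Taxonomy database, it can help to suggest a taxonomy file based on the sequence file's name. The sketch below is a hypothetical naming heuristic, not a rule defined by the application: it covers only the `silva_132_*`, `silva-138-99-*-515-806` and `sh_*` name patterns in the tables above, and it returns `None` when no listed taxonomy file matches, in which case the user chooses one manually.

```python
from typing import Optional

# Taxonomy database names exactly as listed in the Taxonomy table.
TAXONOMY_DATABASES = {
    "2022.10.backbone.tax.qza",
    "Grenegenes2022.10.taxonomy.md5.tsv.qza",
    "ref-taxonomy.qza",
    "sh_taxonomy_qiime_ver9_97_29.11.2022.qza",
    "sh_taxonomy_qiime_ver9_99_29.11.2022.qza",
    "sh_taxonomy_qiime_ver9_dynamic_29.11.2022.qza",
    "silva-138-99-tax-515-806.qza",
    "silva_132_90_16S_taxonomy.qza",
    "silva_132_90_18S_taxonomy.qza",
    "silva_132_94_18S_taxonomy.qza",
    "silva_132_97_16S_taxonomy.qza",
    "silva_132_97_18S_taxonomy.qza",
    "silva_132_99_16S_taxonomy.qza",
    "silva_132_99_18S_taxonomy.qza",
}

# Hypothetical naming rules: (fragment in sequence name, fragment in taxonomy name).
NAME_RULES = [
    ("_sequence.qza", "_taxonomy.qza"),  # silva_132_* files
    ("-seqs-", "-tax-"),                 # silva-138-99-seqs-515-806.qza
    ("sh_refs_", "sh_taxonomy_"),        # sh_refs_qiime_ver9_* files
]


def suggest_taxonomy(sequence_db: str) -> Optional[str]:
    """Suggest a listed Taxonomy database for a Sequence database, or None."""
    for seq_part, tax_part in NAME_RULES:
        if seq_part in sequence_db:
            candidate = sequence_db.replace(seq_part, tax_part)
            if candidate in TAXONOMY_DATABASES:
                return candidate
    return None
```

For example, `silva_132_94_16S_sequence.qza` gets no suggestion, because `silva_132_94_16S_taxonomy.qza` does not appear in the Taxonomy table.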

For the kraken2 database, we provide the following options:

Database name	File	Size	Download link	Description
16S_Greengenes13.5_20200326.tgz	Greengenes 13.5.tar.gz	73.2 MB	https://benlangmead.github.io/aws-indexes/k2	A single Kraken 2 database built from the Greengenes 13.5 16S rRNA reference.
16S_RDP11.5_20200326.tgz	16S_RDP11.5_20200326.tgz	167.9 MB	https://benlangmead.github.io/aws-indexes/k2	All packages contain a Kraken 2 database along with Bracken databases built for 100-mers, 150-mers, and 200-mers. Kraken 2 is a fast and memory-efficient tool for taxonomic assignment of metagenomic sequencing reads. Bracken is a related tool that additionally estimates the relative abundance of species or genera in a sample.
16S_Silva132_20200326.tgz	Silva_132_release.zip	116.9 MB	https://www.arb-silva.de/download/archive/qiime	A single Kraken 2 database. SILVA is a database of both small-subunit (SSU; 16S/18S) and large-subunit (LSU; 23S/28S) ribosomal RNA (rRNA) sequences. To prepare the QIIME-compatible SILVA 132 release, full-length 16S and 18S rRNA sequences, each labeled as belonging to a specific taxonomic unit, were downloaded from SILVA.
16S_Silva138_20200326.tgz	Greengenes 13_8 SEPP reference database	112.5 MB	https://docs.qiime2.org/2022.8/data-resources/#taxonomy-classifiers-for-use-with-q2-feature-classifier	The SSU Ref NR 99 138.1 dataset is based on the full SSU Ref 138.1 dataset and encompasses 510,508 sequences in total. Highly similar sequences were removed by applying a 99% identity criterion with the vsearch tool, using a custom sequence order based first on presence in the previous release's Ref NR 99 and second on a combination of sequence length (weighted twofold) and quality. For sorting, sequence quality is determined by ambiguities (50%), overall alignment quality (45%), and homopolymers (5%).
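The kraken2 databases are `.tgz` archives. A Kraken 2 database directory normally contains the index files `hash.k2d`, `opts.k2d` and `taxo.k2d`, so a quick check after download is to confirm that all three are present in the archive. The function below is a minimal Python sketch of that check using the standard `tarfile` module; its name is hypothetical, and it does not validate the contents of the files.

```python
import tarfile
from typing import Set

# Index files that a Kraken 2 database directory normally contains.
REQUIRED_K2D_FILES = {"hash.k2d", "opts.k2d", "taxo.k2d"}


def missing_kraken2_files(archive_path: str) -> Set[str]:
    """Return the required Kraken 2 index files that are absent from the archive."""
    # "r:*" lets tarfile detect the compression (gzip for .tgz / .tar.gz).
    with tarfile.open(archive_path, "r:*") as archive:
        # Compare base names, so files inside a top-level folder still count.
        present = {
            member.name.rsplit("/", 1)[-1]
            for member in archive.getmembers()
            if member.isfile()
        }
    return REQUIRED_K2D_FILES - present


# Usage (the file name is one of the archives listed above):
# missing = missing_kraken2_files("16S_Silva132_20200326.tgz")
# print("OK" if not missing else f"Missing: {sorted(missing)}")
```

Once the check passes, the archive can be extracted (for example with `tar -xzf 16S_Silva132_20200326.tgz`) and the resulting directory passed to Kraken 2 through its `--db` option.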