DRAM

DRAM (Distilled and Refined Annotation of Metabolism) operates in two main stages, annotation and distillation, and can also be integrated into larger pipelines for microbiome analysis. Its workflow breaks down as follows:

Stage 1: Annotation

  1. Input: DRAM takes metagenome-assembled genomes (MAGs) and, optionally, viral contigs identified by VirSorter as input.
  2. Database Search: DRAM searches several biological databases to annotate the genes within the MAGs and viral contigs. These include:
    • KEGG (Kyoto Encyclopedia of Genes and Genomes) (if provided by the user)
    • UniRef90 (a protein sequence database)
    • PFAM (protein family database)
    • dbCAN (Carbohydrate-Active Enzymes database)
    • RefSeq viral database
    • VOGDB (Virus Orthologous Groups database)
    • MEROPS peptidase database
    • User-defined databases (optional)
  3. Gene Prediction: If genes have not already been called on the input contigs, DRAM predicts them itself so that they can be searched against the databases.
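
As a rough illustration of how the annotation stage might be launched from a script, the sketch below only assembles a command line and prints it. The `DRAM.py annotate` subcommand and the `-i`, `-o`, and `--threads` options are assumed here from DRAM's command-line interface and should be checked against `DRAM.py annotate --help` for the installed version. The function name `build_annotate_command` and the paths `my_bins/*.fa` and `annotation` are hypothetical.

```python
import shlex
from typing import List, Optional


def build_annotate_command(
    input_fasta_glob: str,
    output_dir: str,
    threads: Optional[int] = None,
) -> List[str]:
    """Build (but do not run) a DRAM annotation command line.

    Assumption: the installed DRAM exposes `DRAM.py annotate` with
    `-i`, `-o`, and `--threads`; verify with `DRAM.py annotate --help`.
    """
    # The glob is kept as one argument; this sketch assumes DRAM
    # expands the pattern itself rather than relying on the shell.
    cmd = ["DRAM.py", "annotate", "-i", input_fasta_glob, "-o", output_dir]
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be a positive integer")
        cmd += ["--threads", str(threads)]
    return cmd


if __name__ == "__main__":
    # Hypothetical input pattern and output directory.
    command = build_annotate_command("my_bins/*.fa", "annotation", threads=8)
    # shlex.join quotes the glob so it can be pasted into a shell safely.
    print(shlex.join(command))
```

Returning a list instead of a single string makes it straightforward to hand the command to `subprocess.run` later without any shell quoting concerns.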

Stage 2: Distillation

  1. Refining Annotations: DRAM refines the initial annotations by analyzing gene co-occurrence patterns and metabolic pathway completeness.
  2. Output Generation: DRAM generates two main outputs:
    • Distillate: Tables summarizing each genome, including genome statistics and the metabolic pathways and potential functions identified.
    • Product: An interactive heatmap summarizing key metabolic functions across genomes. For viral contigs (analyzed with DRAM-v), the distillation outputs summarize potential Auxiliary Metabolic Genes (AMGs).
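
The distillation stage consumes the annotation table produced in Stage 1. The sketch below shows one way the corresponding command line might be assembled. It assumes a `DRAM.py distill` subcommand with `-i`, `-o`, `--trna_path`, and `--rrna_path` options and an annotation directory containing `annotations.tsv`; these names are assumptions to confirm against `DRAM.py distill --help`. The function name and directory names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def build_distill_command(
    annotation_dir: str,
    output_dir: str,
    trna_path: Optional[str] = None,
    rrna_path: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) a DRAM distill command line.

    Assumption: the annotation stage wrote `annotations.tsv` into
    `annotation_dir`, and the installed DRAM accepts the flags below.
    """
    annotations = Path(annotation_dir) / "annotations.tsv"
    cmd = ["DRAM.py", "distill", "-i", str(annotations), "-o", output_dir]
    # tRNA and rRNA tables are passed only when the caller supplies them.
    if trna_path is not None:
        cmd += ["--trna_path", trna_path]
    if rrna_path is not None:
        cmd += ["--rrna_path", rrna_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical directory and file names.
    print(build_distill_command(
        "annotation",
        "genome_summaries",
        trna_path="annotation/trnas.tsv",
        rrna_path="annotation/rrnas.tsv",
    ))
```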

Integration into Pipelines

DRAM can be used as part of a larger microbiome analysis pipeline. Here's a possible workflow:

  1. Assembly: Raw sequencing reads are assembled into contigs, potentially using tools like MEGAHIT or SPAdes.
  2. Binning: MAGs are recovered from the assembled contigs using tools like MetaBAT or CONCOCT.
  3. Viral Contig Identification: Tools like VirSorter can be used to identify viral contigs within the assembled data.
  4. DRAM Analysis: DRAM is applied to the MAGs and viral contigs to annotate genes and predict metabolic functions.
  5. Downstream Analysis: The results from DRAM (distillates and product) can be used for further analysis, such as comparing metabolic capabilities across different samples or exploring the functional potential of the microbial community.

This is a simplified example, and the specific pipeline will depend on the research goals and the chosen tools for each step.
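
To make the downstream comparison step more concrete, here is a minimal sketch that reads a tab-separated annotation table and reports which function identifiers each genome carries and which are shared by all genomes. The column names `fasta` (genome) and `ko_id` (function identifier) are assumptions, as column names can differ between DRAM versions, so check the header of your own annotation file. The table in the example is made-up toy data, not real DRAM output.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def functions_per_genome(
    handle: TextIO,
    genome_col: str = "fasta",  # assumed column name
    id_col: str = "ko_id",      # assumed column name
) -> Dict[str, Set[str]]:
    """Collect the set of function identifiers annotated in each genome."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        genome = (row.get(genome_col) or "").strip()
        func_id = (row.get(id_col) or "").strip()
        # Skip genes that received no identifier from this database.
        if genome and func_id:
            found[genome].add(func_id)
    return dict(found)


def shared_functions(found: Dict[str, Set[str]]) -> Set[str]:
    """Return the identifiers present in every genome."""
    sets = list(found.values())
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Toy table for illustration only.
    toy_table = (
        "gene\tfasta\tko_id\n"
        "g1\tbin_A\tK00001\n"
        "g2\tbin_A\tK00002\n"
        "g3\tbin_B\tK00001\n"
        "g4\tbin_B\t\n"
    )
    per_genome = functions_per_genome(io.StringIO(toy_table))
    for genome in sorted(per_genome):
        print(genome, sorted(per_genome[genome]))
    print("shared:", sorted(shared_functions(per_genome)))
```

The same per-genome sets can be grouped by sample to compare metabolic potential across samples.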

System Requirements

DRAM has a large memory footprint and is designed to run on high-performance computers. It annotates against a wide variety of databases, all of which must be processed and stored.

Database setup requirements depend on the databases chosen:
  • With KEGG Genes and UniRef90: ~500 GB of storage after processing and ~512 GB of RAM during processing.
  • With KOfam and without UniRef90: ~30 GB of disk for all processed databases and ~128 GB of RAM during processing.

Memory usage during annotation also depends on the databases used:
  • With UniRef90: around 220 GB of RAM.
  • With the KEGG gene database and without UniRef90: around 100 GB of RAM.
  • With KOfam for KEGG annotation and without UniRef90: less than 50 GB of RAM.

DRAM can be run with any number of processors on a single node.
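
The annotation memory figures quoted above can be turned into a quick feasibility check before a job is submitted. The sketch below is only a planning aid built on those approximate numbers. The configuration labels and the function name are hypothetical, and the "less than 50 GB" KOfam figure is treated conservatively as a 50 GB requirement.

```python
from typing import List, Tuple

# Approximate annotation RAM needs in GB, taken from the figures above,
# ordered from most to least demanding.
ANNOTATION_RAM_GB: List[Tuple[str, int]] = [
    ("UniRef90", 220),
    ("KEGG genes, no UniRef90", 100),
    ("KOfam, no UniRef90", 50),  # source says "less than 50 GB"
]


def feasible_configs(available_ram_gb: float) -> List[str]:
    """List the database configurations that fit in the available RAM."""
    return [
        label
        for label, needed_gb in ANNOTATION_RAM_GB
        if available_ram_gb >= needed_gb
    ]


if __name__ == "__main__":
    for ram in (32, 128, 256):
        options = feasible_configs(ram)
        print(f"{ram} GB RAM:", options if options else "no listed configuration fits")
```

This covers annotation only. Database setup has its own, higher RAM requirements, so a node that can annotate may still be too small to process the databases.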