atlas run qc #or atlas run all
Runs quality control on single- or paired-end reads and summarizes the main QC statistics in reports/QC_report.html.
Per sample it generates:
- Various quality stats in
When the input is paired-end, the reads are written out in three fractions: R1, R2 and se. The se fraction contains the paired-end reads that lost their mate during filtering. The se reads are seamlessly integrated into the next steps.
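To illustrate how the three fractions arise, here is a minimal Python sketch. It is not the filtering code used by Atlas: the function `split_fractions`, the toy reads and the quality cutoff of 20 are all hypothetical and only show the principle that a read whose mate is discarded ends up in the se fraction.

```python
def split_fractions(pairs, keep):
    """Split read pairs into R1, R2 and se fractions.

    pairs: list of (read1, read2) tuples
    keep:  function returning True if a single read passes filtering
    """
    r1_out, r2_out, se_out = [], [], []
    for read1, read2 in pairs:
        ok1, ok2 = keep(read1), keep(read2)
        if ok1 and ok2:
            # Both mates survive: the pair stays together in R1/R2.
            r1_out.append(read1)
            r2_out.append(read2)
        elif ok1:
            # Only one mate survives: it becomes a single-end (se) read.
            se_out.append(read1)
        elif ok2:
            se_out.append(read2)
        # If neither mate survives, the pair is dropped entirely.
    return r1_out, r2_out, se_out


# Toy reads as (name, mean quality); the cutoff of 20 is an assumption.
pairs = [
    (("a/1", 35), ("a/2", 34)),
    (("b/1", 36), ("b/2", 8)),
    (("c/1", 5), ("c/2", 7)),
]
r1, r2, se = split_fractions(pairs, keep=lambda read: read[1] >= 20)
```

In this toy input, pair a stays paired, read b/1 moves to the se fraction because its mate fails the filter, and pair c is removed.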
atlas run assembly #or atlas run all
Besides reports/assembly_report.html, this rule outputs the following files per sample:
atlas run binning #or atlas run all
When you use different binners (e.g. metabat, maxbin) and a binner-reconciliator (e.g. DAS Tool), Atlas produces for each binner and sample:
which shows the attribution of contigs to bins. For the final_binner it produces the
See an example as a summary of the quality of all bins.
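The attribution of contigs to bins can be thought of as a two-column table. The sketch below assumes a tab-separated layout with one contig name and one bin name per line; this layout, the function `contigs_per_bin` and the example names are assumptions made for illustration, so check the actual file before relying on it.

```python
import csv
import io
from collections import defaultdict


def contigs_per_bin(handle):
    """Group contigs by bin from a two-column, tab-separated table."""
    bins = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            # Skip blank or malformed lines instead of failing.
            continue
        contig, bin_id = row
        bins[bin_id].append(contig)
    return dict(bins)


# Hypothetical content standing in for a real attribution file.
example = io.StringIO("contig_1\tbin_A\ncontig_2\tbin_A\ncontig_3\tbin_B\n")
attribution = contigs_per_bin(example)
```

With a real file, the `io.StringIO` object would be replaced by an opened file handle.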
atlas run genomes #or atlas run all
Because binning can recover the same genome several times, it is recommended to de-replicate these genomes. For now we use DeRep to filter and de-replicate the genomes. The metagenome-assembled genomes are then renamed, but we keep mapping files.
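The idea of de-replication followed by renaming can be sketched as a greedy procedure. This is not how DeRep works internally: the function `dereplicate`, the similarity threshold of 0.95, the quality scores and the new names of the form MAG1, MAG2 are all hypothetical and only illustrate the concept of keeping one representative per group plus a mapping back to the original bins.

```python
def dereplicate(quality, similarity, threshold=0.95):
    """Greedy de-replication: best-quality genomes become representatives.

    quality:    dict mapping genome name -> quality score
    similarity: function (a, b) -> similarity between 0 and 1
    Returns the list of representatives and a genome -> representative map.
    """
    representatives = []
    mapping = {}
    # Visit genomes from best to worst quality (name breaks ties).
    for genome in sorted(quality, key=lambda g: (-quality[g], g)):
        for rep in representatives:
            if similarity(genome, rep) >= threshold:
                mapping[genome] = rep  # redundant with an existing representative
                break
        else:
            representatives.append(genome)
            mapping[genome] = genome
    return representatives, mapping


# Hypothetical bins from two samples; only one pair is near-identical.
quality = {"s1_bin1": 92.0, "s2_bin4": 88.5, "s1_bin2": 75.0}
pairwise = {frozenset(("s1_bin1", "s2_bin4")): 0.99}


def similarity(a, b):
    return pairwise.get(frozenset((a, b)), 0.0)


reps, mapping = dereplicate(quality, similarity)
# Rename the representatives while keeping the old -> new mapping.
new_names = {old: f"MAG{i}" for i, old in enumerate(reps, start=1)}
```

Here s2_bin4 is folded into s1_bin1 because the two are nearly identical and s1_bin1 has the higher quality score.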
The fasta sequence of the dereplicated and renamed genomes can be found in
and their quality estimates are in
The quantification of the genomes can be found in:
See the Atlas example for how to analyze these abundances.
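As a generic first step when working with genome abundances, raw counts are often converted to relative abundances per sample. The sketch below is not taken from the Atlas example: the function `relative_abundance`, the nested-dictionary layout and the numbers are assumptions used only to show the normalization.

```python
def relative_abundance(counts):
    """Convert raw counts to per-sample fractions.

    counts: dict mapping sample -> dict mapping genome -> raw count
    """
    result = {}
    for sample, per_genome in counts.items():
        total = sum(per_genome.values())
        result[sample] = {
            # Guard against samples with no mapped reads at all.
            genome: (value / total if total else 0.0)
            for genome, value in per_genome.items()
        }
    return result


# Hypothetical counts for two genomes in two samples.
counts = {
    "sample1": {"MAG1": 30, "MAG2": 10},
    "sample2": {"MAG1": 0, "MAG2": 50},
}
rel = relative_abundance(counts)
```

After normalization, the values of each sample sum to 1, which makes samples with different sequencing depths comparable.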
The predicted genes and translated protein sequences are in
Different annotations can be turned on and off in the config file under the heading annotations:

annotations:
  - gtdb_tree
  - gtdb_taxonomy
  - checkm_tree
  - checkm_taxonomy
A taxonomy for the dereplicated genomes is proposed by GTDB.
The results can be found in
The genomes are placed in a phylogenetic tree separately for bacteria and archaea (if there are any) using the GTDB markers.
In addition, a tree for bacteria and archaea can be generated based on the checkm markers.
All trees are rooted at the midpoint. The files can be found in
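For readers unfamiliar with midpoint rooting, the following sketch shows the principle on a toy tree: the root is placed halfway along the longest path between two leaves. This is an illustration under stated assumptions, not the code Atlas uses; the functions `farthest_leaf` and `midpoint`, the adjacency-dictionary representation and the branch lengths are hypothetical.

```python
def farthest_leaf(tree, start):
    """Return (leaf, distance, path) for the leaf farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        neighbours = [n for n in tree[node] if n != parent]
        if not neighbours and dist > best[1]:
            best = (node, dist, path)  # reached a leaf that is farther away
        for nxt in neighbours:
            stack.append((nxt, node, dist + tree[node][nxt], path + [nxt]))
    return best


def midpoint(tree):
    """Locate the midpoint of the longest leaf-to-leaf path.

    tree: undirected tree as dict node -> {neighbour: branch length}
    Returns (u, v, offset): the root lies on branch u-v, offset away from u.
    """
    any_leaf = next(n for n in tree if len(tree[n]) == 1)
    leaf_a, _, _ = farthest_leaf(tree, any_leaf)
    _, longest, path = farthest_leaf(tree, leaf_a)
    half = longest / 2
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length


# Toy tree: three leaves A, B, C attached to one internal node X.
tree = {
    "A": {"X": 1.0},
    "B": {"X": 3.0},
    "C": {"X": 2.0},
    "X": {"A": 1.0, "B": 3.0, "C": 2.0},
}
root_position = midpoint(tree)
```

In this toy tree the longest path runs from B to C with a length of 5.0, so the root is placed on the branch between B and X, at a distance of 2.5 from B.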
atlas run all # or atlas run genecatalog
The gene catalog takes either genes predicted from the genomes or all genes predicted on the contigs and clusters them according to the configuration. This rule produces the following output file for the whole dataset.
atlas run all