Install
1a. Create conda environment
You need to install Anaconda or Miniconda first. We recommend creating a dedicated conda environment:
conda create -n atlasenv
conda activate atlasenv
Then install metagenome-atlas:
conda install -y -c bioconda -c conda-forge metagenome-atlas
1b. Install the development version from GitHub
Atlas is still under active development, so you may want to install the latest version from GitHub.
Create a conda environment with all primary dependencies. All further dependencies are installed on the fly:
conda create -n atlasenv -c bioconda -c conda-forge "python>=3.6" snakemake pandas bbmap=37.78 click=7 ruamel.yaml biopython
Load the environment:
conda activate atlasenv
Copy the code from GitHub and install it:
git clone https://github.com/metagenome-atlas/atlas.git
cd atlas
pip install --editable .
Now you should be able to run atlas:
atlas init --db-dir databases path/to/fastq/files
atlas run
2. Download all databases first
You may want to make sure that all databases are downloaded correctly before running the pipeline. Simply run:
atlas download --db-dir path/to/databases
Most of the databases are verified with MD5 checksums after download. The downloads use approximately 30 GB of disk space.
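Before starting the download, it can help to confirm that the target filesystem has enough free space. The following is a minimal sketch, assuming a POSIX `df` is available; the helper name `has_free_space` is hypothetical and not part of atlas.

```shell
# Hypothetical helper (not part of atlas): succeed if the filesystem that
# holds DIR has at least NEED_GB gigabytes available.
has_free_space() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example use (DIR must already exist, so the current directory is checked here):
#   has_free_space . 30 && atlas download --db-dir path/to/databases
```

The threshold of 30 in the example is taken from the estimate above; adjust it if your setup needs more room.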
Usage
Now let’s apply atlas to your data.
atlas init
Usage: atlas init [OPTIONS] PATH_TO_FASTQ
Write the file CONFIG and complete the sample names and paths for all
FASTQ files in PATH.
PATH is traversed recursively and adds any file with '.fastq' or '.fq' in
the file name with the file name minus extension as the sample ID.
Options:
-d, --db-dir PATH location to store databases (need ~50GB)
[default: /Users/silas/Documents/Debug_atlas
/databases]
-w, --working-dir PATH location to run atlas
--assembler [megahit|spades] assembler [default: megahit]
--data-type [metagenome|metatranscriptome]
sample data type [default: metagenome]
--threads INTEGER number of threads to use per multi-threaded
job
-h, --help Show this message and exit.
This command creates a samples.tsv and a config.yaml in the working directory.
Open them in a plain text editor and check that the sample names were inferred correctly. Sample names should be alphanumeric and can be dash-delimited. Underscores should be fine too.
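The naming rule above can be checked before running atlas init. The sketch below is illustrative only: both helper names are hypothetical, the regular expression is one assumed reading of "alphanumeric, dash-delimited, underscores allowed", and the way atlas itself infers sample IDs may handle additional cases.

```shell
# Hypothetical helper: succeed if NAME consists of alphanumeric parts,
# optionally joined by single dashes or underscores.
is_valid_sample_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9]+([-_][A-Za-z0-9]+)*$'
}

# Hypothetical helper: strip the directory and everything from the first
# ".fastq" or ".fq" onwards, approximating "file name minus extension".
sample_id_from_file() {
    base=${1##*/}
    case $base in
        *.fastq*) printf '%s\n' "${base%%.fastq*}" ;;
        *.fq*)    printf '%s\n' "${base%%.fq*}" ;;
        *)        printf '%s\n' "$base" ;;
    esac
}

# Example use: report FASTQ files whose derived ID breaks the naming rule.
#   find path/to/fastq/files -type f \( -name '*.fastq*' -o -name '*.fq*' \) |
#   while read -r f; do
#       id=$(sample_id_from_file "$f")
#       is_valid_sample_name "$id" || echo "check sample name: $id ($f)"
#   done
```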
See the example sample table
The BinGroup parameter is used during genomic binning. In short: all samples in which you expect the same strain to be found should belong to the same group, e.g. all metagenome samples from mice in the same cage.
You should also check the config.yaml file, especially:
- You may want to change the resources configuration, depending on the system you run atlas on.
- You may want to add host genomes to be removed.
Details about the parameters can be found in the section Configure Atlas.
atlas run
Usage: atlas run [OPTIONS] [[qc|assembly|genomes|genecatalog|None|all]]
[SNAKEMAKE_ARGS]...
Runs the ATLAS pipline
By default all steps are executed but a sub-workflow can be specified.
Needs a config-file and expects to find a sample table in the working-
directory. Both can be generated with 'atlas init'
Most snakemake arguments can be appended to the command for more info see
'snakemake --help'
For more details, see: https://metagenome-atlas.readthedocs.io
Options:
-w, --working-dir PATH location to run atlas.
-c, --config-file PATH config-file generated with 'atlas init'
-j, --jobs INTEGER use at most this many jobs in parallel (see cluster
submission for mor details). [default: 8]
--no-conda do not use conda environments. good luck! [default:
False]
-n, --dryrun Test execution. [default: False]
-h, --help Show this message and exit.
atlas run needs to know the working directory with a samples.tsv inside it.
Take note of the --dryrun parameter; see the section Snakemake for other handy snakemake arguments.
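One common pattern is to dry-run first and start the real run only if the dry run succeeds. The wrapper below is a sketch of that idea: the function name `atlas_checked_run` and the `ATLAS_BIN` variable are hypothetical, and only the `run`, `--working-dir`, and `--dryrun` arguments are taken from the help text above.

```shell
# Hypothetical wrapper (not part of atlas): dry-run the workflow, and start
# the real run only if the dry run exits successfully.
# ATLAS_BIN defaults to "atlas"; it exists only so the wrapper can be tried
# with a stand-in command.
atlas_checked_run() {
    workdir=$1
    "${ATLAS_BIN:-atlas}" run --working-dir "$workdir" --dryrun || return 1
    "${ATLAS_BIN:-atlas}" run --working-dir "$workdir"
}

# Example use:
#   atlas_checked_run path/to/working-dir
```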
If you want to run atlas on a cluster system, read the execution_system section.