Components

These are the currently available FlowCraft components, each with a short description of its task. For more detailed information, follow the link for each component.

Download

reads_download: Downloads reads from the public SRA/ENA databases, given a list of accessions.
fasterq_dump: Downloads reads from the public SRA database, given a list of accessions, using fasterq-dump.

Reads Quality Control

check_coverage: Estimates the coverage for each sample and filters FastQ files according to a specified minimum coverage threshold.
fastqc: Runs FastQC on paired-end FastQ files.
fastqc_trimmomatic: Runs Trimmomatic on paired-end FastQ files, with trimming informed by the FastQC report.
filter_poly: Runs PrinSeq on paired-end FastQ files to remove low complexity sequences.
integrity_coverage: Tests the integrity of the provided FastQ files, optionally filters them based on the expected assembly coverage, and reports the maximum read length and the sequence encoding.
trimmomatic: Runs Trimmomatic on paired-end FastQ files.
downsample_fastq: Subsamples FastQ files to a target coverage depth.
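The coverage checks in check_coverage and integrity_coverage, and the target depth in downsample_fastq, rest on the same arithmetic: expected depth is the number of sequenced bases divided by the expected genome size. The Python sketch below illustrates that calculation only. The function names and parameters are hypothetical, it assumes plain 4-line FastQ records, and it is not FlowCraft's own implementation.

```python
from typing import Iterable


def count_bases(fastq_lines: Iterable[str]) -> int:
    """Sum the lengths of the sequence lines (2nd line of each 4-line FastQ record).

    Assumes unwrapped FastQ, i.e. exactly four lines per record.
    """
    total = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:
            total += len(line.strip())
    return total


def estimate_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Expected depth of coverage = sequenced bases / expected genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


def passes_min_coverage(total_bases: int, genome_size_bp: int, min_coverage: float) -> bool:
    """True when the sample reaches the (hypothetical) minimum coverage threshold."""
    return estimate_coverage(total_bases, genome_size_bp) >= min_coverage


def downsample_fraction(total_bases: int, genome_size_bp: int, target_depth: float) -> float:
    """Fraction of reads to keep so that the expected depth drops to target_depth.

    Returns 1.0 (keep everything) when the sample is already at or below the target.
    """
    depth = estimate_coverage(total_bases, genome_size_bp)
    if depth == 0:
        return 1.0
    return min(1.0, target_depth / depth)


if __name__ == "__main__":
    # Hypothetical sample: 300 Mb of sequence (both mates) for a 5 Mb genome.
    bases = 300_000_000
    print(estimate_coverage(bases, 5_000_000))        # 60.0
    print(downsample_fraction(bases, 5_000_000, 30))  # 0.5
```

For paired-end data, the bases of both mates are summed before dividing. Actually drawing the subsample (for example, keeping each read pair with the computed probability) is a separate step that this sketch leaves out.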

Assembly

megahit: Assembles metagenomic paired-end FastQ files using megahit.
metaspades: Assembles metagenomic paired-end FastQ files using metaSPAdes.
skesa: Assembles paired-end FastQ files using skesa.
spades: Assembles paired-end FastQ files using SPAdes.

Post-assembly

pilon: Corrects and filters assemblies using Pilon.
process_skesa: Processes the assembly output from Skesa and filters it based on the quality criteria of GC content, k-mer coverage and read length.
process_spades: Processes the assembly output from Spades and filters it based on the quality criteria of GC content, k-mer coverage and read length.
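As an illustration of the filtering done by process_spades, the Python sketch below keeps only contigs that pass length, k-mer coverage and GC content thresholds. It assumes SPAdes-style contig headers of the form NODE_<n>_length_<len>_cov_<cov>, from which the contig length and k-mer coverage are read, and it applies the length threshold to contig length. The thresholds are hypothetical parameters rather than FlowCraft's defaults, and SKESA output would need a different header parser.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed SPAdes naming convention, e.g. "NODE_1_length_1234_cov_56.7".
SPADES_HEADER = re.compile(r"NODE_\d+_length_(\d+)_cov_([\d.]+)")


@dataclass
class Contig:
    header: str
    sequence: str


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(contig: Contig, min_length: int, min_kmer_cov: float,
                   gc_range: Tuple[float, float]) -> bool:
    """Check one contig against the length, k-mer coverage and GC thresholds."""
    match = SPADES_HEADER.search(contig.header)
    if match is None:
        raise ValueError(f"unexpected contig header: {contig.header}")
    length = int(match.group(1))
    kmer_cov = float(match.group(2))
    gc_low, gc_high = gc_range
    return (length >= min_length
            and kmer_cov >= min_kmer_cov
            and gc_low <= gc_fraction(contig.sequence) <= gc_high)


def filter_contigs(contigs: List[Contig], min_length: int, min_kmer_cov: float,
                   gc_range: Tuple[float, float]) -> List[Contig]:
    """Keep only the contigs that pass all three (hypothetical) thresholds."""
    return [c for c in contigs if passes_filters(c, min_length, min_kmer_cov, gc_range)]
```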

Binning

maxbin2: Bins metagenomic sequences using MaxBin2, an automatic binning tool.

Annotation

abricate: Performs antimicrobial gene screening using abricate.
card_rgi: Performs antimicrobial resistance gene screening using CARD RGI (with contigs as input).
prokka: Performs assembly annotation using prokka.

Distance Estimation

mash_dist: Runs mash dist against a reference plasmid database index and generates a JSON input file for pATLAS. This component calculates pairwise distances between the query sequence and each sequence in the database. If a different database is provided, mash dist can be used for other purposes.
mash_screen: Runs mash screen against a reference plasmid database index and generates a JSON input file for pATLAS. This component tests the containment of the database sequences in the read sequencing data. If a different database is provided, mash screen can be used for other purposes.
fast_ani: Performs pairwise comparisons between fastas, given a multifasta as input for fastANI. It splits the multifasta into single fastas, which are then provided as a matrix. The output contains all the pairwise comparisons that pass the minimum of 50 aligned sequences with a default length of 200 bp.

mash_sketch_fasta: Performs mash sketch for fasta files.
mash_sketch_fastq: Performs mash sketch for fastq files.
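The distance reported by mash_dist is the Mash distance, which Mash derives from a Jaccard index j estimated on MinHash sketches of the k-mers: D = -(1/k) ln(2j / (1 + j)). The Python sketch below reproduces that idea on toy sequences (bottom-s sketch of canonical k-mers, Jaccard estimate, then distance). It swaps the MurmurHash3 hash used by Mash for BLAKE2b from the standard library, so its sketches will not match real Mash output; the defaults k = 21 and sketch size 1000 mirror Mash's defaults.

```python
import hashlib
import math
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _hash64(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b stands in for Mash's MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def _canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def sketch(sequence: str, k: int = 21, size: int = 1000) -> List[int]:
    """Bottom-`size` MinHash sketch: the smallest hashes of the canonical k-mers."""
    seq = sequence.upper()
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            hashes.add(_hash64(_canonical(kmer)))
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a: List[int], sketch_b: List[int], size: int = 1000) -> float:
    """Share of the bottom-`size` hashes of the union that occur in both sketches."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:size]
    if not union_bottom:
        return 0.0
    shared = a & b
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance from a Jaccard estimate; 1.0 when the sketches share nothing."""
    if jaccard == 0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)
```

Because k-mers are canonicalised, a sequence and its reverse complement produce the same sketch and therefore a distance of 0.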

Mapping

assembly_mapping: Maps the FastQ reads back to their assembly and filters the assembly based on the quality criteria of read coverage and genome size.
bowtie: Aligns short paired-end sequencing reads to long reference sequences.
mapping_patlas: Performs read mapping and generates a JSON input file for pATLAS.
remove_host: Performs read mapping with bowtie2 against the target host genome (default hg19) and removes the reads that map to it.
retrieve_mapped: Retrieves the mapped reads of a previous bowtie2 mapping process.
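remove_host and retrieve_mapped both come down to splitting reads by whether they aligned. In SAM output that split can be read from the FLAG bits 0x4 (segment unmapped) and 0x8 (mate unmapped). The Python sketch below assumes, for illustration only, that a pair counts as unmapped only when both mates are unmapped; it parses uncompressed SAM text and does not reproduce FlowCraft's actual bowtie2/samtools commands.

```python
from typing import Iterable, Set, Tuple

SEGMENT_UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8     # SAM FLAG bit: the mate is unmapped


def split_by_mapping(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (unmapped_pairs, mapped_pairs) as sets of read names (QNAME)."""
    unmapped, mapped = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & SEGMENT_UNMAPPED and flag & MATE_UNMAPPED:
            unmapped.add(qname)
        else:
            mapped.add(qname)
    return unmapped, mapped
```

Under this assumption, host removal keeps the reads in the first set, while retrieving mapped reads keeps those in the second.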

Taxonomic Profiling

kraken: Performs taxonomic identification with kraken on FastQ files (minikrakenDB2017 as the default database).
kraken2: Performs taxonomic identification with kraken2 on FastQ files (minikraken2_v1_8GB as the default database).
midas_species: Performs species-level taxonomic identification on FastQ files with midas (requires a database).
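kraken and kraken2 write one tab-separated line per read, whose first three columns are the classification status (C or U), the read ID and the assigned taxonomy ID. As an illustration of reading that output, the Python sketch below computes the fraction of classified reads and the number of reads per taxonomy ID. It assumes numeric taxonomy IDs, i.e. output written without Kraken 2's --use-names option, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_kraken(lines: Iterable[str]) -> Tuple[float, Counter]:
    """Return (classified fraction, reads per taxonomy ID) from Kraken-style output."""
    per_taxid: Counter = Counter()
    total = classified = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        status, taxid = fields[0], fields[2]
        if status == "C":
            classified += 1
            per_taxid[taxid] += 1
    fraction = classified / total if total else 0.0
    return fraction, per_taxid
```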

Typing

chewbbaca: Performs a core-genome/whole-genome Multilocus Sequence Typing analysis on an assembly using ChewBBACA.
metamlst: Checks the Sequence Type of metagenomic reads using Multilocus Sequence Typing.
mlst: Checks the Sequence Type of an assembly using Multilocus Sequence Typing.
patho_typing: Performs in silico pathogenic typing from raw Illumina reads.
seq_typing: Determines the type of a given sample from a set of reference sequences.
sistr: Predicts serovars from whole-genome sequence assemblies by determining antigen gene and cgMLST gene alleles.
momps: Performs multilocus sequence typing of Legionella pneumophila from assemblies and reads.
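Classic MLST, as used by mlst and metamlst, ends with a lookup: the allele numbers called at each locus of a scheme are matched against a table of allelic profiles, each of which defines a Sequence Type (ST). The Python sketch below shows only that lookup step. The loci, allele numbers and STs are made up for illustration, and allele calling itself (matching sequences to known alleles) is not shown.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical toy scheme: real MLST schemes define their own loci and profiles.
LOCI: Tuple[str, ...] = ("locus_a", "locus_b", "locus_c")
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,  # allele numbers at (locus_a, locus_b, locus_c) -> ST
    (1, 2, 3): 2,
    (4, 2, 1): 3,
}


def assign_sequence_type(allele_calls: Dict[str, int],
                         loci: Sequence[str] = LOCI,
                         profiles: Dict[Tuple[int, ...], int] = PROFILES) -> Optional[int]:
    """Look up the ST matching the allele numbers called at every locus.

    Returns None when a locus is missing or the combination is not in the
    table (an incomplete or novel profile).
    """
    try:
        key = tuple(allele_calls[locus] for locus in loci)
    except KeyError:
        return None
    return profiles.get(key)
```

Keying the table on the ordered tuple of allele numbers makes each lookup a single dictionary access; a None result is where a real typing tool would report a novel or partial profile.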