flowcraft.templates.megahit module

Purpose

This module is intended to execute megahit on paired-end FastQ files.

Expected input

The following variables are expected whether using Nextflow or the main() executor.

  • sample_id : Sample Identification string.
    • e.g.: 'SampleA'
  • fastq_pair : Pair of FastQ file paths.
    • e.g.: 'SampleA_1.fastq.gz SampleA_2.fastq.gz'
  • kmers : Setting for megahit kmers. Can be either 'auto', 'default' or a user-provided list of space-separated values. All values must be odd, in the range 15-255, with an increment <= 28.
    • e.g.: 'auto' or 'default' or '55 77 99 113 127'
  • clear : If 'true', remove the input fastq files at the end of the component run, but only if the files are in the work directory.
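
The constraints on a user-provided kmers list can be checked before megahit is launched. The following is a minimal sketch, not part of this module: validate_kmers is a hypothetical helper, and it assumes that "increment" refers to the difference between consecutive values and that the list is given in ascending order.

```python
def validate_kmers(kmer_opt):
    """Parse a space-separated k-mer string and check the documented rules.

    Hypothetical helper for illustration; it is not part of flowcraft.
    """
    # int() raises ValueError on non-numeric tokens
    kmers = [int(k) for k in kmer_opt.split()]
    if not kmers:
        raise ValueError("no k-mer values provided")

    for k in kmers:
        if k % 2 == 0:
            raise ValueError("k-mer {} is not odd".format(k))
        if not 15 <= k <= 255:
            raise ValueError("k-mer {} is outside the range 15-255".format(k))

    # ASSUMPTION: values are ascending and 'increment' is the gap
    # between consecutive values.
    for prev, cur in zip(kmers, kmers[1:]):
        if not 0 < cur - prev <= 28:
            raise ValueError(
                "increment from {} to {} is not in 1-28".format(prev, cur))

    return kmers


print(validate_kmers("55 77 99 113 127"))
```

The special values 'auto' and 'default' are not handled here; they would need to be intercepted before a check like this is applied.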

Generated output

  • contigs.fa : Main output of megahit with the assembly
    • e.g.: contigs.fa
  • megahit_status : Stores the status of the megahit run. If it was successfully executed, it stores 'pass'. Otherwise, it stores the STDERR message.
    • e.g.: 'pass'
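
The rule for megahit_status can be sketched as follows. This is an illustration only: write_status is a hypothetical helper, and it assumes that a successful run is identified by an exit code of 0, which the description above does not state.

```python
def write_status(returncode, stderr, status_path="megahit_status"):
    """Write 'pass' on success, otherwise the STDERR message.

    Hypothetical helper for illustration; it is not part of flowcraft.
    """
    # ASSUMPTION: exit code 0 means megahit ran successfully.
    message = "pass" if returncode == 0 else stderr.strip()
    with open(status_path, "w") as fh:
        fh.write(message)
    return message
```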

Code documentation

flowcraft.templates.megahit.is_odd(k_mer)

flowcraft.templates.megahit.set_kmers(kmer_opt, max_read_len)

Returns a kmer list based on the provided kmer option and max read len.

Parameters:
kmer_opt : str

The k-mer option. Can be either 'auto', 'default' or a sequence of space-separated integers, e.g. '23 45 67'.

max_read_len : int

The maximum read length of the current sample.

Returns:
kmers : list

List of k-mer values that will be provided to megahit.
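
The dispatch between the three kinds of kmer_opt might look like the sketch below. It is not the module's implementation. The name set_kmers_sketch, the read-length threshold and both preset lists are placeholders chosen for illustration, and returning an empty list for 'default' (leaving megahit to use its own built-in k-mers) is likewise an assumption.

```python
# ASSUMPTION: illustrative placeholders, not values taken from flowcraft.
LONG_READ_THRESHOLD = 175
LONG_READ_KMERS = [55, 77, 99, 113, 127]
SHORT_READ_KMERS = [21, 33, 55, 67, 77]


def set_kmers_sketch(kmer_opt, max_read_len):
    """Return a k-mer list for the given option and maximum read length."""
    if kmer_opt == "auto":
        # Pick a preset according to the longest read in the sample
        if max_read_len >= LONG_READ_THRESHOLD:
            return list(LONG_READ_KMERS)
        return list(SHORT_READ_KMERS)

    if kmer_opt == "default":
        # ASSUMPTION: an empty list means "pass no k-mer option to megahit"
        return []

    # Otherwise treat the option as space-separated integers
    return [int(k) for k in kmer_opt.split()]


print(set_kmers_sketch("23 45 67", 150))
```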

flowcraft.templates.megahit.fix_contig_names(asseembly_path)

Removes whitespace from the assembly contig names

Parameters:
asseembly_path : str

Path to the assembly file.

Returns:
str:

Path to new assembly file with fixed contig names
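
A minimal sketch of this kind of header clean-up is shown below. It is not the module's code: fix_contig_names_sketch and the ".fixed" output suffix are hypothetical, and replacing each run of whitespace with an underscore (rather than deleting it) is an assumption.

```python
import re


def fix_contig_names_sketch(assembly_path, fixed_path=None):
    """Write a copy of a FASTA file whose headers contain no whitespace."""
    if fixed_path is None:
        # ASSUMPTION: hypothetical naming scheme for the new file
        fixed_path = assembly_path + ".fixed"

    with open(assembly_path) as src, open(fixed_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # ASSUMPTION: whitespace runs become a single underscore
                dst.write(re.sub(r"\s+", "_", line.strip()) + "\n")
            else:
                # Sequence lines are copied unchanged
                dst.write(line)

    return fixed_path
```

Writing to a new file instead of editing in place keeps the original megahit output available for inspection.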

flowcraft.templates.megahit.clean_up(fastq)

Cleans the temporary fastq files. If they are symlinks, the file they point to is removed.

Parameters:
fastq : list

List of fastq files.
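
The clean-up behaviour, combined with the work-directory restriction described for the clear option, can be sketched as follows. This is an illustration under stated assumptions, not the module's code: clean_up_sketch and its work_dir parameter are hypothetical, the work directory is assumed to default to the current directory, and POSIX-style paths are assumed.

```python
import os


def clean_up_sketch(fastq, work_dir=None):
    """Remove the real files behind the given paths if they are in work_dir.

    Returns the list of paths that were removed.
    """
    # ASSUMPTION: the work directory defaults to the current directory
    work_dir = os.path.realpath(work_dir or os.getcwd())
    removed = []

    for fq in fastq:
        # realpath follows symlinks, so this is the file a link points to
        target = os.path.realpath(fq)

        # Skip anything that does not live under the work directory
        if os.path.commonpath([work_dir, target]) != work_dir:
            continue

        if os.path.isfile(target):
            os.remove(target)
            removed.append(target)

    return removed
```

In this sketch a symlink itself is left in place, dangling, once the file it points to has been removed.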