flowcraft.templates.assembly_report module
Purpose
This module is intended to provide a summary report for a given assembly in Fasta format.
Expected input
The following variables are expected whether using NextFlow or the main() executor.
- sample_id : Sample identification string.
    - e.g.: 'SampleA'
- assembly : Path to assembly file in Fasta format.
    - e.g.: 'assembly.fasta'
Generated output
- ${sample_id}_assembly_report.csv : CSV with summary information of the assembly.
    - e.g.: 'SampleA_assembly_report.csv'
Code documentation
class flowcraft.templates.assembly_report.Assembly(assembly_file, sample_id)
Class that parses and filters an assembly file in Fasta format.
This class parses an assembly file, collects a number of summary statistics and metadata from the contigs, and reports them.
Parameters:
- assembly_file : str
    Path to assembly file.
- sample_id : str
    Name of the sample for the current assembly.
Methods
- get_coverage_sliding(self, coverage_file[, …])
- get_gc_sliding(self[, window]) : Calculates a sliding window of the GC content for the assembly
- get_summary_stats(self[, output_csv]) : Generates a CSV report with summary statistics about the assembly
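As an illustration of what a windowed GC calculation can look like, here is a minimal sketch. It is not the code behind get_gc_sliding: the function name gc_sliding, the default window of 2000, the use of consecutive non-overlapping windows and the rounding to two decimals are all assumptions made for the example.

```python
from collections import OrderedDict


def gc_sliding(contigs, window=2000):
    """Return the GC proportion of each consecutive window of the assembly.

    Hypothetical helper: `window=2000` and the non-overlapping windows
    are assumptions, not values taken from the documentation.
    """
    # Concatenate all contigs in order, as if they formed a single genome.
    genome = "".join(contigs.values()).upper()
    gc_values = []
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window]
        gc_count = chunk.count("G") + chunk.count("C")
        gc_values.append(round(gc_count / len(chunk), 2))
    return gc_values


contigs = OrderedDict([("contig_1", "GGCCAATT"), ("contig_2", "GCGC")])
print(gc_sliding(contigs, window=4))  # prints [1.0, 0.0, 1.0]
```

The last window may be shorter than the others, so its proportion is computed over the bases it actually contains.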
summary_info = None
OrderedDict: Initialize summary information dictionary. Contains keys:
- ncontigs : Number of contigs
- avg_contig_size : Average size of contigs
- n50 : N50 metric
- total_len : Total assembly length
- avg_gc : Average GC proportion
- missing_data : Count of missing data characters
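The keys above can be derived from a mapping of contig headers to sequences. The sketch below shows one way to do so. The function name summary_stats is hypothetical, and treating "missing data" as the character N is an assumption; the module itself may count other characters.

```python
from collections import OrderedDict


def summary_stats(contigs):
    """Compute the six summary values from a header -> sequence mapping."""
    lengths = sorted((len(seq) for seq in contigs.values()), reverse=True)
    total_len = sum(lengths)

    # N50: length of the contig at which the running total, taken from the
    # longest contig down, first reaches half of the assembly length.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running >= total_len / 2:
            n50 = length
            break

    joined = "".join(contigs.values()).upper()
    gc_count = joined.count("G") + joined.count("C")

    return OrderedDict([
        ("ncontigs", len(lengths)),
        ("avg_contig_size", total_len / len(lengths) if lengths else 0),
        ("n50", n50),
        ("total_len", total_len),
        ("avg_gc", gc_count / total_len if total_len else 0),
        # Assumption: missing data is represented by the character N.
        ("missing_data", joined.count("N")),
    ])


example = OrderedDict([("c1", "GGCCAATTNN"), ("c2", "GCGCAT"), ("c3", "AT")])
stats = summary_stats(example)
print(stats["ncontigs"], stats["n50"], stats["total_len"])  # prints 3 10 18
```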
contigs = None
OrderedDict: Object that maps the contig headers to the corresponding sequence
contig_coverage = None
OrderedDict: Object that maps the contig headers to the corresponding list of per-base coverage
sample = None
str: Sample id
contig_boundaries = None
dict: Maps the boundaries of each contig in the genome
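To make the shape of these attributes concrete, the following sketch fills a contigs mapping from Fasta text and derives a boundaries mapping from it. The function names parse_fasta and contig_boundaries are hypothetical, and the [start, end) offset layout for the boundaries is an assumption; the documentation does not state how the class stores them.

```python
from collections import OrderedDict
import io


def parse_fasta(handle):
    """Map each Fasta header (without '>') to its full sequence."""
    parts = OrderedDict()
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            # Sequences may span several lines; collect them all.
            parts[header].append(line)
    return OrderedDict((h, "".join(chunks)) for h, chunks in parts.items())


def contig_boundaries(contigs):
    """Assumed layout: header -> [start, end) offsets in the joined genome."""
    boundaries = {}
    offset = 0
    for header, seq in contigs.items():
        boundaries[header] = [offset, offset + len(seq)]
        offset += len(seq)
    return boundaries


fasta = io.StringIO(">contig_1\nACGT\nAC\n>contig_2\nGGG\n")
contigs = parse_fasta(fasta)
bounds = contig_boundaries(contigs)
print(bounds)  # prints {'contig_1': [0, 6], 'contig_2': [6, 9]}
```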
get_summary_stats(self, output_csv=None)
Generates a CSV report with summary statistics about the assembly.
The calculated statistics are:
- Number of contigs
- Average contig size
- N50
- Total assembly length
- Average GC content
- Amount of missing data
Parameters:
- output_csv : str
    Name of the output CSV file.
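A report of this kind can be written with the standard csv module. The sketch below is only an illustration: the helper names report_name and write_summary_csv are hypothetical, the one-header-row, one-value-row layout is an assumption about the file format, and the statistics are made-up example values.

```python
import csv
import io


def report_name(sample_id):
    """Build the output name following the ${sample_id}_assembly_report.csv pattern."""
    return "{}_assembly_report.csv".format(sample_id)


def write_summary_csv(sample_id, stats, handle):
    """Write a header row, then one row with the sample's values.

    Assumed layout; the real report may order or label columns differently.
    """
    writer = csv.writer(handle)
    writer.writerow(["sample"] + list(stats))
    writer.writerow([sample_id] + list(stats.values()))


# Made-up values, used only to show the resulting rows.
example_stats = {"ncontigs": 3, "n50": 10}
buffer = io.StringIO()
write_summary_csv("SampleA", example_stats, buffer)
print(report_name("SampleA"))  # prints SampleA_assembly_report.csv
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file opened with open(report_name(sample_id), "w", newline="") would write the same rows to disk.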