Michael G. Campana & Ellie E. Armstrong, 2019-2024
Smithsonian Institution
Stanford University
- calc_denovo_mutation_rate
- configure.sh
- denovolib
- dnm_summary_stats
- filterGM
- indels2bed
- kochDNp
- logstats
- nextflow_split
- nextflow.config
- plotDPGQ
- ratestools
- RM2bed
- simplify_bed
- simplify_sorted_bed
- summarize_denovo
- Deprecated
Fixed minor spacing bug in output of command line options
Add DNM_GT tag for genotypes used after minAF and minAD1 processing
Fixed bug that doubled double-forward mutations in the rate estimates
Fixed bugs in calculating the bootstrapped 95% confidence interval
Fixed missing bootstrap parameter in command-line output in log
Update file headers
Handling for selfing parent
Applying both --minAD1 and --minAF no longer defaults to --minAD1. --minAD1 is applied to parents, --minAF to offspring.
Fixed GATK4 total AD = 0 bug
More accurate minAD1 help description
Fixed snp_record (line 239) glitch
Fixed missing kochDNp require_relative statement
Fixed cutoff instead of $options.cutoff bug
Fixed bug that called whole offspring_pl hash rather than individual PL array
Added handling for Koch DNp statistic filter
Generalized ad_exit for all filtering errors (method filter_exit) and moved to denovolib
Fixed double-forward mutation count bug
Improved gz_file_open using yield
Commented code
conf95 uses ".." to separate CI values to permit negative exponents
--minAF filter
--minAD1 and --parhom now shown in interpreted command output
--minAD1 regenotyping now applies to offspring too
--parhom flag fixed
Fixed minAD1 option
Alleles no longer includes "." as valid allele
Added minAD1 and parhom options
Corrected site denominator for diploidy in bootstrap_results
print_results moved to denovolib
Redundant methods (Parser, gz_open_file) moved to denovolib
Prints progress updates to STDERR
Can specify minimum bootstrapping windows to perform analysis
Can read gzipped input VCFs (Imported from BaitsTools 1.6.5)
Outputs VCF lines of identified de novo mutations
Random number seed setting
Interpreted options printed at start-up
Bootstrapping
Initial script to calculate point estimate de novo mutation rate
Updated ReadPosRankSum default filter
Removal of redundant dnm_bootstraps parameter
Updates for revised dnm_summary_stats script
trio-phasing configuration
Updated for RatesTools v. 1.0.0
Update file headers
Contig length filtration
Conda installation handling
$launchDir instead of $baseDir
Removed thread parameters
Fixed bug that doubled double-forward mutations in the rate estimates
Added binomial confidence intervals using Hmisc
Update file headers
split_vcf now uses bgzip rather than gzip for GATK site filtering
filter_exit returns exit status 1
Added handling for Koch DNp statistic filter
Generalized ad_exit from calc_denovo_mutation_rate as denovolib method filter_exit
Removed parsing options for deprecated parallel_denovo
Fixed double-forward mutation count bug
split_vcf retains complete VCF header for bedtools intersect
Improved gz_file_open using yield
Commented code
Fixed missing ')' in help for -q option
Fixed 'homozogyous' typo in --parhom option description
split_vcf moved to denovolib
--minAF filter
Site denominator bug fix
Added minAD1 and parhom options
Corrected site denominator for diploidy in print_results
Option '--nosubmit' splits VCFs and writes jobs, but does not submit them
Option '--submit' submits previously generated jobs and split VCFs
Method 'format_splash' simplifies basic script method output
print_results moved to denovolib
Redundant methods (Parser, gz_open_file) moved to denovolib
gz_open_file improved to version in BaitsTools 1.6.6
Removal of redundant dnm_bootstraps parameter
Uses DNM_GT tag to correctly count number of sites after filtering
Correctly counts double-forward sites as two mutations
Calculates binomial confidence intervals using R package Hmisc
Bug fixes in number of all-sites removed
Outputs a TSV list of sites retained to standard error for VCF processing
Outputs filtered mutational spectra
Fixes bug that incorrectly calculated adjusted standard errors
Automatically removes clumped DNM candidates
Recalculates adjusted confidence intervals
Updated for ratestools 1.0.0
Added handling for more mutation classes
Added code to identify shared mutations between offspring and recalculate mutation rates
Update file headers
First script to calculate summary statistics
Update file headers
Improved gz_file_open using yield
Code commented
Can read gzipped files
Optional "exclude" parameter produces inverse BED file for exclusion
Uses denovolib 'format_splash' method
Script capitalization consistency correction
Filters GenMap output by specified mappabilities
Update file headers
Improved gz_file_open using yield
Code commented
Can read gzipped input VCFs
Identifies deletions coded by * character
Identifies indels from allele lengths without the need for a prior external filter
Uses denovolib 'format_splash' method
Outputs headerless BED rather than incorrect BED header
Output BED header line starts with # as appropriate
Initial script to convert VCF of indels into BED file for variant site exclusion
Update file headers
Fixed bug that ran command-line parser when run from calc_denovo_mutation_rate
Calculates and filters by Koch DNp statistic
Update file headers
Fixed bug where $options.writecycles was not defined
Prints help when no options/-h
Code commented
Requires denovolib rather than split_vcf
Accessor script for split_vcf for nextflow
Added error handling when the run time line is not recorded
Fixed bug that discarded the unfiltered site total for contigs where all sites were removed
Update file headers
Contig length filtration
Improved sanity checks
Script to correct vcftools site overflow glitch using bcftools stats
Added BCFtools to conda configuration
Updated ReadPosRankSum default filter
Updated R channel to conda-forge
Removal of redundant dnm_bootstraps parameter
Update conda dependencies for generateSummaryStats, calcDNMRate, summarizeDNM
DNM clump option
phaseTrio process configuration
Updated for RatesTools v. 1.0.0
Set repeatModeler task.cpus = 2 for standard and conda profiles
Added conda.enabled = true
Update file headers
Contig length filtration
Conda profile and handling
Removed 'hydra' example profile
Removed storeDir and maxForks from local profile
Removal of thread parameters
$launchDir instead of $baseDir
Updated summarizeDNM modules for VCF output
process sanityCheckLogs for logstats.sh
Modules updated for logstats.sh
Fixed local profile RepeatModeler -> repeatModeler
Closures rather than objects for Nextflow 22.04.04 compatibility
GATK4 compatibility
Email errors/completion status
Variable indel padding
Split site_filters into vcftools_site_filters and gatk_site_filters
Updated process list in both profiles for new processes
Fixed some missing processes from v0.3.0 in default profile
Fixed RepeatModeler additional thread glitch
Added module options for bedtools, bcftools, bgzip, tabix
Updated process defaults
Update file headers
Fixed bug that misread sample names
Corrected output so that all PNGs properly named
Fixed table coding so that all individual samples calculated in CSVs
First version integrated into pipeline
Initial script (plotDPGQ-clean.R)
Fixed region filter bypass bgzip bug
Removal of redundant dnm_bootstraps parameter
Fixed header bug on candidates VCF
Outputs clump- and sibling-filtered candidates VCF
Automatic DNM clump removal
Automatic recalculation of confidence intervals
Output summary stats includes file prefix
Haplotype phasing using WhatsHap
Fixed bug in repeatMaskRM that did not retain new masked files
Upgraded to nextflow DSL2
Readgroups added and alignment sorted during alignment process
Input reads specified via a CSV
Automated sample library merging
Added samtools markdup as a duplicate option
Bumped pipeline software versions
Added GenMap mapping option parameter
Added RepeatMasker and RepeatModeler options parameters
BED simplification now uses BEDtools
simplify_bed and simplify_sorted_bed removed
Added option to ignore region filtration
errorStrategy calls moved to configuration file for easier manipulation
Added option to bypass BAM filtering
Conda installs now use Mamba by default
RepeatModeler process updated for RepeatMasker 2.0.5
Fixed bug in GATK4 pipeline that did not apply GATK site filters appropriately
Added handling for RepeatModeler non-even positive thread requests
Added tbl file of first round of RepeatMasker to retained output
Fixed RM2bed concatenation glitch in repeatMaskRM
Reduced hard-disk footprint of gatkFilterSites temporary files
RepeatMasker soft-masks rather than hard-masks
Fixed glitch in summarizeDNM when no candidate sites found
Update file headers
Contig length filtration
Handling for selfing parent in calc_denovo_mutation_rate.rb
Conda handling
Use touch rather than mkfile for RepeatModeler dummy file
DSL1 specified
Used task.cpus rather than thread parameters
Fixed bug in region filtration when empty region bed
bedtools subtract rather than intersect for second attempt
summarizeDNM outputs VCF of candidate DNMs
Fixed logging glitches that copied VCF output into downstream results
Fixed glitch in BCFtools region-filtering that removed sites unexpectedly
bedtools subtract rather than intersect -v
Fixed glitch where bedtools error was not detected and region-filtered VCFs were truncated
process sanityCheckLogs for logstats.sh
Site count logging uses logstats.sh to address VCFtools overflow
GATK4 compatibility
Improved use of symbolic links/writing to disk
Completion messages and emailing of completion status/errors
Variable indel padding
Use path rather than file qualifiers
Site count logging and summary statistics
Updated plotDPGQ output PNGs
Split filterSites into vcftoolsFilterSites and gatkFilterSites
Added 'NULL' option to site filtrations to bypass unnecessary processing
Divided filterChr into filterChr and splitTrios
filterChr (new version) now only does chromosome filtering
splitTrios does trio splitting after chromosome filtration
Fixed pullDPGQ input so that all individuals calculated from chr-filtered VCFs
Numbered output directories
Region filtration now uses BEDTools intersect first, then BCFtools, then VCFtools
Updated some steps to use bgzip when needed by BCFtools
Fixed RepeatModeler additional thread glitch
Added pullDPGQ process
Reordered vcf filtration so that split_vcf occurs first to maximize parallelization
Fixed major bug in fixReadGroups that assigned read groups to the incorrect BAM file
Alignments all use refseq simpleName rather than baseName for consistency of file naming
Removed duplicate BuildBamIndex step in callVariants
Nextflow pipeline for de novo mutation discovery. Previously panthera_dnm.nf during construction
Update file headers
Improved gz_file_open using yield
Code commented
Can read gzipped files
Uses denovolib 'format_splash' method
Outputs headerless BED rather than incorrect BED header
Output BED header line starts with # as appropriate
Outputs usage if no parameters specified
Initial script to convert RepeatMasker output to BED format
Update file headers
Code commented
Site denominator returned to previous value as bootstrapped estimates are already multiplied by two
Resolved issue with mean_ci confidence interval splits using - as separator
Corrected site denominator for diploidy in mean_ci
Retained SNP collation bug fixed
Fixed collect_offspring bug
Uses denovolib 'format_splash' method
Redundant print_summary code moved to denovolib print_results
Unbootstrapped contigs are now excluded from calculations of bootstrap sequence lengths
Concatenates headerless VCF results of identified de novo mutations
Initial script to summarize parallelized de novo rate calculations
Removed in RatesTools version 1.0.0
Update file headers
Improved gz_file_open using yield
Complete rewrite of sorting/merging algorithm to use intervals (~8x faster)
Status output to stderr
Redundant sites not added to array, which may improve performance
Code commented
Initial script to combine overlapping BED entries for region inclusion/exclusion
Removed in RatesTools version 1.0.0
Update file headers
Improved gz_file_open using yield
Script to merge adjacent BED regions in a coordinate-sorted BED file to reduce simplify_bed memory use and compute time
Removed in RatesTools version 0.5.7
Updated makefile for Linux and macOS
Removed in RatesTools pipeline version 0.3
Code commented
split_vcf.rb no longer required
split_vcf moved to split_vcf.rb for separate access by nextflow
--minAF filter
--parhom and --minAD1 options added to write_qsub
'write_qsub' calc_denovo_mutation_rate.job file now gzips analyzed split VCFs
Option '--nosubmit' splits VCFs and writes jobs, but does not submit them
Option '--submit' submits previously generated jobs and split VCFs
Redundant methods (Parser, gz_open_file) moved to denovolib
summarize_denovo automatically called upon completion of all split jobs on cluster
Template qsub written once and arguments passed to it
Can restart processing from a previously split VCF
Can specify the number of variants to read while splitting before writing to disk
Can specify minimum bootstrapping windows to perform analysis
Can read gzipped input VCFs (Imported from BaitsTools 1.6.5)
Initial script to parallelize de novo mutation rate calculations on Grid Engine HPC
Added option to gzip output files
split_vcf moved to split_vcf.rb for separate access by nextflow