This repo shares scripts used to analyze whole-genome sequences of elms.
SNP-density plot for elm samples using CMplot.
For details about the project, see the project page at the Center for Forest Protection.
- Verify the checksums of the downloaded files using `md5sum` or a similar tool.
- Run FastQC on the raw files for quality control.
- Trim the raw files using Trimmomatic, fastp, or Cutadapt.
- Run FastQC on the trimmed files.
- Run MultiQC to aggregate the FastQC results into a single HTML report (one for the raw files, one for the trimmed files) so they are easier to inspect. Check “Per sequence GC content”, “Overrepresented sequences”, “Adapter Content”, etc., and exclude any erroneous sample(s).
- Map the raw reads against the Ulmus americana reference genome (GCA_010015005.3) using the `mem` algorithm of BWA version 2.
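
The checksum step at the top of this list can also be done without `md5sum`, for example on a system where it is not installed. Below is a minimal Python sketch, not a script from this repo. It assumes the sequencing provider shipped an `md5sum`-style text file (one `<digest>  <filename>` pair per line) in the same directory as the downloaded files. The function names `md5_of` and `verify_checksums` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Check every file listed in an md5sum-style checksum file.

    Assumption: listed file names are relative to the directory that
    contains the checksum file.
    Returns a dict mapping file name -> True (digest matches) or
    False (digest differs or the file is missing).
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
        target = checksum_file.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Calling `verify_checksums("raw_reads/md5.txt")` (a hypothetical path) returns one entry per listed file. Any file mapped to `False` should be downloaded again before running the quality-control steps.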