VCF CDS_position parsing #271

Open
jberg1999 opened this issue Feb 18, 2025 · 0 comments
Labels
bug Something isn't working


Description of the bug

Hi nf-core team,

I am having issues running my VCF files through the epitope prediction pipeline. Specifically, I am using VEP-annotated VCF files from sarek, and the pipeline errors out whenever the CDS_position value in the CSQ field is irregular. I have seen this both when CDS_position contains a "?" and when it is empty. Based on the error log below, I believe the offending line is line 283 of epaa.py:

tpos = int(cds_pos.split("/")[0].split("-")[0]) - 1

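This fails whenever the start position extracted from cds_pos is not an integer: int() raises a ValueError on an empty string (the case in the traceback below) and on "?" when the start position is unknown.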
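For illustration, a defensive version of that parsing step might look like the sketch below. The helper name parse_cds_start is hypothetical and not part of epaa.py. It assumes CDS_position values of the form 283, 283-285 or 283-285/1500 (the shapes implied by the existing split on "/" and "-"), with "?" for an unknown boundary and an empty string when the consequence has no CDS position. Instead of raising, it returns None so the caller could skip that consequence, for example with a logged warning, rather than abort the whole run.

```python
from typing import Optional


def parse_cds_start(cds_pos: str) -> Optional[int]:
    """Return the 0-based CDS start from a VEP CDS_position value, or None.

    Hypothetical helper (not part of epaa.py). It mirrors the split used on
    line 283 but returns None instead of raising when the start position is
    missing (empty field) or unknown ("?").
    """
    # Drop an optional "/length" suffix, then an optional "-end" part.
    start = cds_pos.split("/")[0].split("-")[0].strip()
    try:
        return int(start) - 1
    except ValueError:
        # "" or "?" cannot be converted: signal "no usable CDS position".
        return None


if __name__ == "__main__":
    for value in ["283/1500", "283-285/1500", "?-12/1500", ""]:
        print(repr(value), parse_cds_start(value))
```

With this helper, "283/1500" and "283-285/1500" both give 282, while "?-12/1500" and "" give None instead of crashing.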

Thanks for the help!

Command used and terminal output

CONFIG=/michorlab/jacobg/nf-core/nextflow.config
SAMPLESHEET=samplesheet_no_bad_samples.csv
OUTDIR=/michorlab/jacobg/multiomics/epitope/results
GENOME=grch38
TOOLS=netmhcpan-4.1
MINPEPTIDELENGTH=8
MAXPEPTIDELENGTH=11
NETMHCPATH=/michorlab/jacobg/multiomics/epitope/netMHCpan-4.1b.Linux.tar.gz
VERSION=2.3.1

nextflow run nf-core/epitopeprediction -profile singularity -c $CONFIG --input $SAMPLESHEET --outdir $OUTDIR --genome_reference $GENOME --tools $TOOLS  --min_peptide_length $MINPEPTIDELENGTH  --max_peptide_length $MAXPEPTIDELENGTH --netmhcpan_path $NETMHCPATH --fasta_output -r $VERSION 



Error executing process > 'NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (26)'

Caused by:
  Process `NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR (26)` terminated with an error exit status (1)


Command executed:

  # create folder for MHCflurry downloads to avoid permission problems when running pipeline with docker profile and mhcflurry selected
  mkdir -p mhcflurry-data
  export MHCFLURRY_DATA_DIR=./mhcflurry-data
  # specify MHCflurry release for which to download models, need to be updated here as well when MHCflurry will be updated
  export MHCFLURRY_DOWNLOADS_CURRENT_RELEASE=1.4.0
  # Add non-free software to the PATH
  shopt -s nullglob
  IFS=',' read -r -a netmhc_paths_string <<< "/michorlab/jacobg/multiomics/epitope/work/99/48181a933e1ac9db9b830c9f09023e/netmhcpan"
  for p in "${netmhc_paths_string[@]}"; do
          export PATH="$(realpath -s "$p"):$PATH";
      done
  shopt -u nullglob
  
  epaa.py --identifier DDC_vs_DDWB.mutect2.filtered_VEP.ann.chr1         --alleles 'A*03:01;A*24:02;B*07:02;B*14:02;C*08:02;C*07:02'         --tools 'netmhcpan-4.1'         --max_length 11         --min_length 8         --versions versions.csv         --fasta_output --genome_reference 'https://www.ensembl.org' --somatic_mutation DDC_vs_DDWB.mutect2.filtered_VEP.ann.chr1.vcf
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_EPITOPEPREDICTION:EPITOPEPREDICTION:EPYTOPE_PEPTIDE_PREDICTION_VAR":
      python: $(python --version 2>&1 | sed 's/Python //g')
      epytope: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('epytope').version)")
      pandas: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
      pyvcf: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('PyVCF3').version)")
      mhcflurry: $(mhcflurry-predict --version 2>&1 | sed 's/^mhcflurry //; s/ .*$//')
      mhcnuggets: $(python -c "import pkg_resources; print(pkg_resources.get_distribution('mhcnuggets').version)")
  END_VERSIONS

Command exit status:
  1

Command output:
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,950 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,951 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,952 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PS not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PID not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDC. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry PL not defined for DDT_DDWB. Skipping.
  2025-02-16 15:15:42,953 - __main__ - WARNING - FORMAT entry GQ not defined for DDT_DDWB. Skipping.

Command error:
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PGT not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PS not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PID not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDC. Skipping.
  WARNING:__main__:FORMAT entry PL not defined for DDT_DDWB. Skipping.
  WARNING:__main__:FORMAT entry GQ not defined for DDT_DDWB. Skipping.
  Traceback (most recent call last):
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1310, in <module>
      __main__()
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 1059, in __main__
      variant_list, transcripts, metadata = read_vcf(args.somatic_mutations)
    File "/homes9/jacobg/.nextflow/assets/nf-core/epitopeprediction/bin/epaa.py", line 283, in read_vcf
      tpos = int(cds_pos.split("/")[0].split("-")[0]) - 1
  ValueError: invalid literal for int() with base 10: ''

Work dir:
  /michorlab/jacobg/multiomics/epitope/work/a5/38e34c091b21a3a07a91b6bd59e7e7

Relevant files

nextflow.log

I can't attach the full VCF for data privacy reasons, so I am attaching the header instead.

vcf_header.txt
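Since the full VCF can't be shared, a small standard-library script like the one below could list the affected records locally, which might help narrow the problem down to a minimal example. It is a hypothetical sketch, not part of the pipeline. It assumes the VEP CSQ header line lists the subfield names after "Format: ", separated by "|", and it flags every CSQ entry whose CDS_position is empty or contains "?". Empty values are also normal for non-coding consequences such as intron variants, so the list may be long.

```python
import gzip
import sys
from typing import Iterable, Iterator, Optional, Tuple


def open_vcf(path: str):
    # Handle both plain and gzip/bgzip-compressed VCFs.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def irregular_cds_positions(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (CHROM, POS, CDS_position) for each CSQ entry whose
    CDS_position is empty or contains '?'."""
    idx: Optional[int] = None  # position of CDS_position inside a CSQ entry
    for line in lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            # Assumed VEP layout: ...Format: Allele|Consequence|...">
            fmt = line.split("Format: ", 1)[1].split('"', 1)[0]
            idx = fmt.split("|").index("CDS_position")
            continue
        if line.startswith("#"):
            continue  # other meta-information lines and the #CHROM header
        if idx is None:
            raise ValueError("no CSQ header line with CDS_position found")
        cols = line.rstrip("\r\n").split("\t")
        chrom, pos, info = cols[0], cols[1], cols[7]
        for entry in info.split(";"):
            if not entry.startswith("CSQ="):
                continue
            # CSQ holds one comma-separated entry per allele/transcript consequence.
            for consequence in entry[len("CSQ="):].split(","):
                values = consequence.split("|")
                cds = values[idx] if idx < len(values) else ""
                if cds == "" or "?" in cds:
                    yield chrom, pos, cds


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for chrom, pos, cds in irregular_cds_positions(handle):
            print(f"{chrom}\t{pos}\t{cds!r}")
```

Saved under a hypothetical name such as find_cds_issues.py, it would be run as python find_cds_issues.py DDC_vs_DDWB.mutect2.filtered_VEP.ann.chr1.vcf. The parsing function takes an iterable of lines rather than a path, so it can be tested on a few hand-written records.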

System information

nextflow version: 24.04.4
HPC
slurm executor
singularity
OS: CentOS Linux
nf-core/epitopeprediction 2.3.1

jberg1999 added the bug label on Feb 18, 2025