Use ClinVar submissions file (3/N)#277

Open
pj-sullivan wants to merge 14 commits into main from pj-sullivan/filter-clinvar

@pj-sullivan pj-sullivan commented Jan 30, 2026

Purpose/implementation Section

What feature is being added or bug is being addressed?

Replacing the input ClinVar file with the ClinVar submissions file that has been filtered and cleaned in a prior step.

What was your approach?

Combined the cavatica and custom input scripts into one as the differences between them no longer exist.

What GitHub issue does your pull request address?

Closes #275

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Check the output for any new columns we don't want, or for missing columns. I'm aware that the ClinVar star value is currently missing because it is not supplied in the submissions file, but I can add it in the select submissions script.
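
One way to do the column check is to diff the header rows of a report produced on main against one produced on this branch. This is only a rough sketch: the helper names `header_columns` and `compare_columns` are hypothetical and not part of AutoGVP, and it assumes both reports are tab-separated with a single header line.

```shell
# Hypothetical helpers (not part of AutoGVP) for comparing report headers.

# Print the header of a TSV as one column name per line, sorted.
header_columns() {
  head -n 1 "$1" | tr -d '\r' | tr '\t' '\n' | LC_ALL=C sort
}

# Print "missing<TAB>name" for columns only in the first report and
# "new<TAB>name" for columns only in the second report.
compare_columns() {
  old_cols=$(mktemp)
  new_cols=$(mktemp)
  header_columns "$1" > "$old_cols"
  header_columns "$2" > "$new_cols"
  LC_ALL=C comm -23 "$old_cols" "$new_cols" | awk '{print "missing\t" $0}'
  LC_ALL=C comm -13 "$old_cols" "$new_cols" | awk '{print "new\t" $0}'
  rm -f "$old_cols" "$new_cols"
}
```

Called as `compare_columns <report_from_main> <report_from_branch>`, it prints nothing when the two headers contain the same set of columns. Column order is ignored on purpose, so a reordered header does not show up as a difference.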

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Documentation Checklist

  • The function has examples to showcase the usage
  • Added a vignette

@pj-sullivan pj-sullivan requested a review from rjcorb January 30, 2026 21:28
@pj-sullivan pj-sullivan self-assigned this Jan 30, 2026
@pj-sullivan pj-sullivan changed the base branch from main to pj-sullivan/select-submissions January 30, 2026 21:29
@pj-sullivan pj-sullivan changed the title Use ClinVar submissions file Use ClinVar submissions file (3/N) Jan 30, 2026
Base automatically changed from pj-sullivan/select-submissions to main February 2, 2026 15:36
@pj-sullivan pj-sullivan marked this pull request as ready for review February 2, 2026 20:17
@jharenza jharenza requested a review from rebkau February 5, 2026 18:06

@rjcorb rjcorb left a comment

I've done a few test runs on PBTA samples and only noticed minor discrepancies between variant calls that should be resolved with ticket #278. I pushed a few changes I made during my test runs. I think we'll just want to double-check variant names in these scripts, and will also need to update the README with new example commands.

Comment on lines 12 to 13
## default files
variant_summary_file="$BASEDIR/data/ClinVar-selected-submissions.tsv"

Suggested change


Suggested change
output_tab_abr_file <- paste0(output_name, ".annotations_report.abridged.tsv")

@rjcorb rjcorb self-requested a review February 24, 2026 14:52

@rjcorb rjcorb left a comment

I tested this again using some PBTA samples, and, as before, we only see discrepancies due to including the P/LP and B/LB classifications. I left one minor suggested change, and I also just noticed that there is a typo in the resolve-clinvar-intepretations.R script name :).

I will approve with merge contingent on updates above, and I also think we should have @rebkau review. @rebkau would you be able to test this on some of the NBL samples? Here was my process:

  1. Run sample(s) through AutoGVP on the main branch:
bash run_autogvp.sh --vcf=<vcf_file> \
--filter_criteria='<filtering_criteria>' \
--clinvar=data/clinvar_20260104.vcf.gz \
--intervar=<intervar_path> \
--multianno=<multianno_path> \
--autopvs1=<autopvs1_path> \
--outdir=results \
--out=<name> \
--selected_clinvar_submissions=refs/ClinVar-selected-submissions.tsv
  • Note: you need to run the select-clinVar-submissions.R script beforehand:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary_20260104.txt.gz --submission_summary data/submission_summary_20260104.txt.gz --conceptID_list refs/clinvar_cancer_concept_ids_20260130.txt --outdir refs/
  2. Switch to this branch and run the new resolve-clinvar-intepretations.R script:
Rscript scripts/resolve-clinvar-intepretations.R --variant_summary data/variant_summary_20260104.txt.gz --submission_summary data/submission_summary_20260104.txt.gz --conceptID_list refs/clinvar_cancer_concept_ids_20260130.txt --outdir refs/
  3. Run AutoGVP on this branch:
bash run_autogvp.sh --vcf=<vcf_file> \
--filter_criteria='<filtering_criteria>' \
--intervar=<intervar_path> \
--multianno=<multianno_path> \
--autopvs1=<autopvs1_path> \
--outdir=results \
--out=<name> \
--sample_id=<sample_id> \
--resolved_clinvar=refs/resolved-clinvar-interpretations.tsv
  4. Compare pathogenicity calls between runs. The only differences I noticed were cases where a variant was originally called P, LP, B, or LB (in main) but is called P/LP or B/LB in the branch results. That is an expected change, since we no longer try to resolve these calls and instead take the call as is when it is reported in ClinVar.
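
For the comparison step, something like the following could be used to list the variants whose call differs between the two runs and to flag the differences that are expected. This is a sketch under assumptions, not what I actually ran: `compare_calls` is a hypothetical helper, the ID and call column names are passed in because I have not confirmed them here, and the labels P, LP, B, LB, P/LP, and B/LB are the abbreviations used in this comment, so they may need to be swapped for the values that actually appear in the reports.

```shell
# Hypothetical helper (not part of AutoGVP): report variants whose call changed.
#   $1 = report from main, $2 = report from this branch (both tab-separated)
#   $3 = ID column name, $4 = call column name (assumed; adjust to the real headers)
compare_calls() {
  awk -F '\t' -v id_col="$3" -v call_col="$4" '
    # On each file header, locate the ID and call columns by name.
    FNR == 1 {
      id_idx = 0; call_idx = 0
      for (i = 1; i <= NF; i++) {
        if ($i == id_col) id_idx = i
        if ($i == call_col) call_idx = i
      }
      if (!id_idx || !call_idx) {
        print "column not found in " FILENAME > "/dev/stderr"
        exit 2
      }
      next
    }
    # First file: remember the call made on main for each variant.
    NR == FNR { main_call[$id_idx] = $call_idx; next }
    # Second file: print variants present in both runs whose call changed.
    ($id_idx in main_call) && main_call[$id_idx] != $call_idx {
      old = main_call[$id_idx]
      status = "unexpected"
      # Labels below are assumed abbreviations; edit to match the reports.
      if ($call_idx == "P/LP" && (old == "P" || old == "LP")) status = "expected"
      if ($call_idx == "B/LB" && (old == "B" || old == "LB")) status = "expected"
      print $id_idx "\t" old "\t" $call_idx "\t" status
    }
  ' "$1" "$2"
}
```

Each output line is the variant ID, the call on main, the call on this branch, and whether the change matches the P/LP or B/LB pattern described above. Variants that appear in only one of the two reports are not listed, so those would need a separate check.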

Development

Successfully merging this pull request may close these issues.

Remove flagged ClinVar variants from ClinVar file

3 participants