rjcorb
left a comment
I've done a few test runs on PBTA samples and only noticed minor discrepancies between variant calls that should be resolved with ticket #278. I pushed a few changes I made during my test runs. I think we'll just want to double-check variant names in these scripts, and will also need to update the README with new example commands.
## default files
variant_summary_file="$BASEDIR/data/ClinVar-selected-submissions.tsv"
scripts/02-annotate_variants.R
output_tab_abr_file <- paste0(output_name, ".annotations_report.abridged.tsv")
Take ClinSig when not conflicting (4/N)
rjcorb
left a comment
I tested this again using some PBTA samples, and, as before, we only see discrepancies due to including the P/LP and B/LB classifications. I left one minor suggested change, and I also just noticed that there is a typo in the resolve-clinvar-intepretations.R script name :).
I will approve with merge contingent on updates above, and I also think we should have @rebkau review. @rebkau would you be able to test this on some of the NBL samples? Here was my process:
- Run sample(s) through AutoGVP on main branch:
bash run_autogvp.sh --vcf=<vcf_file> \
--filter_criteria='<filtering_criteria>' \
--clinvar=data/clinvar_20260104.vcf.gz \
--intervar=<intervar_path> \
--multianno=<multianno_path> \
--autopvs1=<autopvs1_path> \
--outdir=results \
--out=<name> \
--selected_clinvar_submissions=refs/ClinVar-selected-submissions.tsv
- Note that you would need to run the select-clinVar-submissions.R Rscript beforehand:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary_20260104.txt.gz --submission_summary data/submission_summary_20260104.txt.gz --conceptID_list refs/clinvar_cancer_concept_ids_20260130.txt --outdir refs/
- Switch to this branch and run the new resolve-clinvar-intepretations.R script:
Rscript scripts/resolve-clinvar-intepretations.R --variant_summary data/variant_summary_20260104.txt.gz --submission_summary data/submission_summary_20260104.txt.gz --conceptID_list refs/clinvar_cancer_concept_ids_20260130.txt --outdir refs/
- Run AutoGVP on this branch:
bash run_autogvp.sh --vcf=<vcf_file> \
--filter_criteria='<filtering_criteria>' \
--intervar=<intervar_path> \
--multianno=<multianno_path> \
--autopvs1=<autopvs1_path> \
--outdir=results \
--out=<name> \
--sample_id=<sample_id> \
--resolved_clinvar=refs/resolved-clinvar-interpretations.tsv
- Compare pathogenicity calls between runs. The only differences I noticed were cases where a variant was originally called P, LP, B, or LB (in main) but is called P/LP or B/LB in the branch results. That is an expected change, since we no longer try to resolve these calls and instead take the call as is when it is reported in ClinVar.
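For the comparison step, something like the following could be used. This is only a rough sketch, not part of AutoGVP: the function name compare_calls is hypothetical, and it assumes each run's output has first been reduced to a two-column, tab-separated file with a header row (variant ID, then call). The real AutoGVP column names and layout may differ.

```shell
# Hypothetical helper: compare_calls MAIN.tsv BRANCH.tsv
# Assumes both files are tab-separated with a header row and two columns:
# a variant ID and a pathogenicity call. Variants present in only one file
# are not reported.
compare_calls() {
  awk -F'\t' '
    # First file (main branch): remember the call for each variant ID.
    NR == FNR { if (FNR > 1) old[$1] = $2; next }

    # Second file (this branch): report variants whose call changed.
    FNR > 1 && ($1 in old) && old[$1] != $2 {
      p_merge = ($2 == "P/LP" && (old[$1] == "P" || old[$1] == "LP"))
      b_merge = ($2 == "B/LB" && (old[$1] == "B" || old[$1] == "LB"))
      label = (p_merge || b_merge) ? "expected" : "unexpected"
      print $1 "\t" old[$1] "\t" $2 "\t" label
    }
  ' "$1" "$2"
}
```

Each output line lists the variant ID, the call on main, the call on this branch, and whether the change matches the expected P or LP to P/LP (or B or LB to B/LB) pattern. Any line labeled unexpected would be worth a closer look.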
Purpose/implementation Section
What feature is being added or bug is being addressed?
Replacing the input ClinVar file with the ClinVar submissions file that has been filtered and cleaned in a prior step.
What was your approach?
Combined the cavatica and custom input scripts into one as the differences between them no longer exist.
What GitHub issue does your pull request address?
Closes #275
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Check the output for any new columns we don't want, or any missing columns. I'm aware that the ClinVar star value is currently missing, since it is not supplied in the submissions file, but I can add it in the select submissions script.
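One possible way to do the column check is to compare the header rows of an output file from main and one from this branch. The sketch below is illustrative only: diff_columns is a hypothetical helper, and it assumes both outputs are tab-separated files whose first line is the header.

```shell
# Hypothetical helper: diff_columns MAIN_OUTPUT.tsv BRANCH_OUTPUT.tsv
# Assumes both files are tab-separated and that the first line of each
# is a header row ending in a newline.
diff_columns() {
  { head -n 1 "$1"; head -n 1 "$2"; } | awk -F'\t' '
    # Line 1: header of the main-branch output.
    NR == 1 {
      n_old = split($0, old_order, FS)
      for (i = 1; i <= n_old; i++) old[old_order[i]] = 1
    }
    # Line 2: header of the output from this branch.
    NR == 2 {
      for (i = 1; i <= NF; i++) {
        seen[$i] = 1
        if (!($i in old)) print "new\t" $i
      }
      for (i = 1; i <= n_old; i++)
        if (!(old_order[i] in seen)) print "missing\t" old_order[i]
    }
  '
}
```

Columns that appear only in the branch output are printed with the prefix new, and columns that were in the main output but are gone are printed with the prefix missing. If the star value is dropped as described, its column would be expected to show up as missing.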
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Documentation Checklist