Add QC explained by Susie credible set in the region #3464

Open
d0choa opened this issue Sep 13, 2024 · 1 comment · May be fixed by opentargets/gentropy#780
Labels: Data (relates to Open Targets data team), Gentropy (relates to the genetics ETL)

Comments

d0choa commented Sep 13, 2024

For the final credible set datasets, we need to decide which credible sets are included in the outputs and which are discarded. For example, the same locus/region might have been fine-mapped with different methods, so we need an algorithm that decides which credible set goes into the final set.

Because inclusion and validation both involve filtering rows in credible sets, we want to reuse the validation step to implement this logic. Also, parts of the validation (e.g., duplicate studyLocusId) only make sense after the inclusion filter.

This ticket only covers one of the required QCs. PICS disambiguation would require a different QC flag.

QC - "Explained by SuSiE credible set in the region"

  • For every credible set
    • if there is another credible set in the same STUDY and in the REGION containing this credible set's LEADVARIANT
      • if the lead does NOT have a flag reporting that the variant is not in the LD matrix
        • flag the credible set as "Explained by SuSiE credible set in the region"
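
The rule above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the gentropy implementation: the `CredibleSet` fields, the `QCReason` member names, and the method label `"SuSiE"` are all hypothetical, and "another credible set" is read here as "another SuSiE credible set whose fine-mapped region contains the lead variant".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set, Tuple


class QCReason(Enum):
    # Hypothetical members; the real enum lives in the gentropy codebase.
    NOT_IN_LD_MATRIX = "Lead variant not found in LD matrix"
    EXPLAINED_BY_SUSIE = "Explained by SuSiE credible set in the region"


@dataclass
class CredibleSet:
    # Hypothetical, simplified stand-in for a StudyLocus row.
    study_locus_id: str
    study_id: str
    chromosome: str
    lead_position: int
    method: str  # illustrative values: "SuSiE", "PICS"
    region: Optional[Tuple[int, int]] = None  # fine-mapped window (start, end)
    qc: Set[QCReason] = field(default_factory=set)


def flag_explained_by_susie(credible_sets: List[CredibleSet]) -> None:
    """Add EXPLAINED_BY_SUSIE in place, following the rule as written in the ticket."""
    susie_sets = [
        cs for cs in credible_sets if cs.method == "SuSiE" and cs.region is not None
    ]
    for cs in credible_sets:
        # Leads that are missing from the LD matrix are never flagged.
        if QCReason.NOT_IN_LD_MATRIX in cs.qc:
            continue
        for other in susie_sets:
            if other is cs:
                continue  # a credible set cannot explain itself
            start, end = other.region
            if (
                other.study_id == cs.study_id
                and other.chromosome == cs.chromosome
                and start <= cs.lead_position <= end
            ):
                cs.qc.add(QCReason.EXPLAINED_BY_SUSIE)
                break
```

Taken literally, the rule also flags a SuSiE credible set when a second SuSiE credible set covers the same region, so a real implementation would probably need to exclude SuSiE-based sets from being flagged.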

The QC method needs to be added to StudyLocus, and the QC reason to the enum.

The method must be called from the study_locus_validation step before the check for duplicated studyLocusIds.
The orchestration repo will then need an additional flag to run this validation.

@d0choa d0choa added Data Relates to Open Targets data team Gentropy Relates to the genetics ETL labels Sep 13, 2024
@addramir addramir self-assigned this Sep 16, 2024
@d0choa d0choa assigned d0choa and unassigned d0choa and addramir Sep 20, 2024

d0choa commented Sep 20, 2024

The implemented logic flags a credible set for later filtering if:

  • any variant in the credible set overlaps with a locus fine-mapped using SuSiE
  • the credible set is NOT based on SuSiE-Inf
  • the credible set is NOT flagged as UNRESOLVED_LD (by LD clumping)
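
As a rough illustration, the three conditions can be combined into a single predicate. This is a sketch under assumptions rather than the code in the linked pull request: `Region`, `is_explained_by_susie`, and all parameter names are hypothetical, "overlaps" is taken to mean that a variant position falls inside the window of a SuSiE fine-mapped locus, and the same-study restriction is carried over from the original proposal.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Region(NamedTuple):
    # Hypothetical description of a locus fine-mapped with SuSiE.
    study_id: str
    chromosome: str
    start: int
    end: int


def is_explained_by_susie(
    study_id: str,
    method: str,
    qc_flags: Set[str],
    variants: Iterable[Tuple[str, int]],  # (chromosome, position) pairs
    susie_regions: List[Region],
) -> bool:
    """Return True when all three conditions listed above hold."""
    # Condition 2: credible sets based on SuSiE-Inf are never flagged.
    if method == "SuSiE-Inf":
        return False
    # Condition 3: credible sets flagged as UNRESOLVED_LD are never flagged.
    if "UNRESOLVED_LD" in qc_flags:
        return False
    # Condition 1: at least one variant falls inside a SuSiE locus
    # (restricted to the same study, which is an assumption of this sketch).
    return any(
        region.study_id == study_id
        and region.chromosome == chromosome
        and region.start <= position <= region.end
        for chromosome, position in variants
        for region in susie_regions
    )
```

For example, a PICS credible set with one variant inside a SuSiE locus of the same study would be flagged, while the same credible set carrying UNRESOLVED_LD would not.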

I'm still unsure about the last condition, the one regarding UNRESOLVED_LD. I'd like to see how things look after applying this filter to a realistic set of credible sets. We might have variants with an R^2 of 0.4 that are not in LDIndex, yet fine-mapping decides they are part of the credible set. In these not-so-extreme cases we could generate duplicates.

@addramir watch this space
