Ancestry matching

Scripts for building cohorts of similar genetic ancestry across biobanks

Repository structure

project-recruitment
│
│-- README.md
│
│-- ancestry-matching
│   │-- pca-classification.py
│   │-- ancestry-matching.py
│   │-- aou-ancestry-matching.breakdown.pdf
│   │-- aou-ancestry-matching.pc1x2.pdf

Motivation

Ancestry matching using principal components (PCs) built from genotyping data allows us to identify sample sets that share genetic background. Matching is done on genetic similarity instead of racial categories, per this report from The National Academies of Sciences, Engineering, and Medicine. Self-reported ethnicities are used only for validation: we check that the matched UKB and AoU samples have similar majority self-reported ethnicities (in our case 'White-British' and 'White'). They are not used to filter individuals for inclusion in the matched AoU sample.

Steps

1. Project both UKB and AoU datasets into the GBMI PC space. Instructions here (requires PLINK). Note that UKB uses reference genome GRCh37 and AoU uses reference genome GRCh38.
2. Train a machine learning model to classify individuals as either ancestry-matched to your UKB sample (1) or not ancestry-matched (0) using the --train function of the pca-classification.py script.
   Input is a file with the PCA projections for the UKB individuals plus their classification as either in or out of the sample you are matching to.
   Output is the pickle files for the models.
3. Use the best model from 2. to predict the classification of the AoU individuals using the --predict function of the pca-classification.py script.
   Inputs are the PCA projections for the AoU individuals and the pickle file for the model you want to use for prediction.
   Output is the classifications for the AoU individuals.
4. Validate that the ancestry matching performed as expected (that is, it is largely in line with self-reported ethnicities) using the ancestry-matching.py script.
   Inputs are the PC projections (from 1.) and the classifications (from 3.) for the AoU sample.
   Output is a figure showing AoU PC1 x PC2 colored by self-reported ethnicity (e.g. aou-ancestry-matching.pc1x2.pdf) and a stacked bar plot of self-reported ethnicity vs. classification by the ML model (e.g. aou-ancestry-matching.breakdown.pdf).
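
The train, predict, and validate steps can be illustrated with a small standard-library sketch. It is not the code in pca-classification.py or ancestry-matching.py: the nearest-centroid classifier stands in for whichever models the script actually trains, the function names (fit_centroids, predict, breakdown) are hypothetical, and the PC values and ethnicity labels are made-up toy data. It only shows the data flow: fit on labelled UKB projections, pickle the model, classify AoU projections, then cross-tabulate predictions against self-reported ethnicity.

```python
import pickle
from collections import Counter


def fit_centroids(pcs, labels):
    """Return {label: centroid}, where each centroid is the per-PC mean."""
    sums, counts = {}, Counter()
    for row, lab in zip(pcs, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, pcs):
    """Assign each individual the label of the closest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(row, centroids[lab]))
            for row in pcs]


def breakdown(ethnicities, predictions):
    """Count individuals per (self-reported ethnicity, predicted class)."""
    return Counter(zip(ethnicities, predictions))


# Toy UKB projections (PC1, PC2); 1 = in the sample being matched to, 0 = out.
ukb_pcs = [[0.10, 0.20], [0.12, 0.18], [0.90, 0.80], [0.85, 0.95]]
ukb_labels = [1, 1, 0, 0]

# Step 2: train, then serialise the model (in memory here instead of a file).
model_bytes = pickle.dumps(fit_centroids(ukb_pcs, ukb_labels))

# Step 3: load the model and classify toy AoU projections.
model = pickle.loads(model_bytes)
aou_pcs = [[0.11, 0.21], [0.88, 0.90], [0.15, 0.15]]
aou_pred = predict(model, aou_pcs)

# Step 4: compare predictions with self-reported ethnicity (validation only).
aou_ethnicity = ["White", "Other", "White"]
table = breakdown(aou_ethnicity, aou_pred)

print(aou_pred)
print(dict(table))
```

The counts in table are the kind of numbers a stacked bar plot of self-reported ethnicity vs. classification would display. Note that the ethnicity labels enter only at the validation step and never influence which individuals are classified as matched.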
