
snpInsideRunsCpp(runsChrom, mapKrom, genotypeFile) : Not compatible with requested type: [type=character; target=integer] #33

Open
aigulsharip opened this issue Apr 21, 2022 · 5 comments

@aigulsharip

Good day,

I am using detectRUNS to identify ROH in five human whole-genome sequencing (WGS) samples.

slidingRuns <- slidingRUNS.run(
genotypeFile = "MS_KAZWG_5.no-tabs.ped",
mapFile = "MS_KAZWG_5.map",
windowSize = 15,
threshold = 0.05,
minSNP = 20,
ROHet = FALSE,
maxOppWindow = 1,
maxMissWindow = 1,
maxGap = 10^6,
minLengthBps = 250000,
minDensity = 1/10^3,
maxOppRun = NULL,
maxMissRun = NULL
)
This command works well. But when I try to run summaryRuns() and plot_SnpsInRuns(), an error comes up: snpInsideRunsCpp(runsChrom, mapKrom, genotypeFile) : Not compatible with requested type: [type=character; target=integer]. I know this issue is not unique and that other people have already reported it. I have read all the related issues and tried to fix it. I realize that I need to convert the character chromosome names to numeric values, but I don't know how to do it. Maybe I should modify the .map or .ped file, but how?

I would appreciate it if you could help.

summaryList <- summaryRuns(
runs = slidingRuns, mapFile = "MS_KAZWG_5.map", genotypeFile = "MS_KAZWG_5.no-tabs.ped",
Class = 6, snpInRuns = TRUE)
Checking files...
Using class: 6
Total genome length: 3099072722
calculating Froh on all genome
Total genome length: 3099072722
calculating Froh chromosome by chromosome
Total genome length: 3099072722
calculating Froh by Class
[1] "Class used: >0"
[1] "Class used: >6"
[1] "Class used: >12"
[1] "Class used: >24"
[1] "Class used: >48"
Class created: 0-6Class created: 6-12Class created: 12-24Class created: 24-48Class created: >48
Calculating SNPs inside ROH
Calculation % SNP in ROH
Chromosome founds: 24
|          |   0%
Error in snpInsideRunsCpp(runsChrom, mapKrom, genotypeFile) :
Not compatible with requested type: [type=character; target=integer].

genotypeFilePath = "MS_KAZWG_5.no-tabs.ped"
mapFilePath = "MS_KAZWG_5.map"

Another error, from plot_SnpsInRuns():
plot_SnpsInRuns(
runs = slidingRuns[slidingRuns$chrom==2,], genotypeFile = genotypeFilePath,
mapFile = mapFilePath)
[1] "Chromosome is: 2"
[1] "N. of runs: 26"
[1] "N.of SNP is 870355"
Error in snpInsideRunsCpp(runsChrom, mapChrom, genotypeFile) :
Not compatible with requested type: [type=character; target=integer].

I tried:

  1. slidingRuns$chrom = as.character(slidingRuns$chrom)
  2. sed 's/./:/' MS_KAZWG_5.map >> MS_KAZWG_5.fix.map
@bunop
Contributor

bunop commented Apr 21, 2022

Dear @aigulsharip,

Thank you for your interest in detectRUNS. Since I don't have a sample of your input files, I'll try to guess where the problem is. Since you are working on the human genome, the problem may be the chromosome names: please check whether you have chromosomes like X or Y. The package version 0.9.6 available from CRAN cannot handle character chromosome names. If that is the case, you have to replace the character chromosomes with numbers, for instance 23 for X and 24 for Y. You don't need to cast your runs' chromosomes with as.character(); you have to replace the chromosomes in the map file, for example:

sed -e 's/^X\([[:space:]]\)/23\1/' -e 's/^Y\([[:space:]]\)/24\1/' MS_KAZWG_5.map > MS_KAZWG_5.fix.map
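A minimal shell sketch of the check described above, assuming a standard whitespace-delimited PLINK .map file with the chromosome code in the first column. The helper name list_nonnumeric_chroms is hypothetical (it is not part of detectRUNS or PLINK); it prints each non-numeric chromosome code once, followed by the number of SNPs that carry it:

```shell
# Hypothetical helper (not part of detectRUNS or PLINK): list chromosome codes
# in a PLINK .map file that are not plain integers (e.g. X, Y, MT), each
# followed by its SNP count. Assumes column 1 holds the chromosome code and
# fields are whitespace-separated; blank lines are ignored.
list_nonnumeric_chroms() {
  awk 'NF && $1 !~ /^[0-9]+$/ { n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}
```

For example, `list_nonnumeric_chroms MS_KAZWG_5.fix.map` should print nothing once every chromosome has been recoded to a number.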

The alternative is to install detectRUNS directly from GitHub, since the master branch of this project can handle character chromosomes but has not been published on CRAN yet. You need to remove your installed detectRUNS package and install it from GitHub using devtools:

remove.packages("detectRUNS")
library(devtools)
install_github("bioinformatics-ptp/detectRUNS", subdir="detectRUNS", ref="master")

You should see the message "Using detectRUNS 0.9.6.9000" when loading detectRUNS installed from GitHub. Please let us know whether this works.

@bunop bunop added the question label Apr 21, 2022
@aigulsharip
Author

aigulsharip commented Apr 21, 2022

Thanks a lot for the quick response and help. Your recommendation helped me resolve the issue: I re-installed detectRUNS from GitHub.

I have another question. As an example, you used the sheep dataset. I tried slidingRUNS.run with the same parameters as in the example. The number of ROH for my human samples was quite small (using slidingRuns and summaryList, shown below), and for consecutiveRuns (summaryListConsRuns) it was even smaller. So I think different parameters should be used for human WGS data. Do you have any suggestions on parameters and values for healthy human WGS data?

When I run PLINK with default parameters, and with some modifications, the discrepancy in the number of ROH is huge.

summaryList$summary_ROH_count
    KAZ_WG2 KAZ_WG4 KAZ_WG5 KAZ_WG6 KAZ_WG7
0-6     229      73     144     175      88

summaryListConsRuns$summary_ROH_count
    KAZ_WG2 KAZ_WG5 KAZ_WG6 KAZ_WG7
0-6       2       3       1       3

genotypeFilePath = "MS_KAZWG_5.no-tabs.ped"
mapFilePath = "MS_KAZWG_5.map"

slidingRuns <- slidingRUNS.run(
genotypeFile = genotypeFilePath,
mapFile = mapFilePath,
windowSize = 15,
threshold = 0.05,
minSNP = 20,
ROHet = FALSE,
maxOppWindow = 1,
maxMissWindow = 1,
maxGap = 10^6,
minLengthBps = 250000,
minDensity = 1/10^3,
maxOppRun = NULL,
maxMissRun = NULL
)

@bunop
Contributor

bunop commented Apr 22, 2022

Dear @aigulsharip,

Unfortunately, we have little or no experience with human WGS data: we work mainly with livestock, which are subject to different genetic pressures than humans (selection, inbreeding, ...). Moreover, this package was developed for SNP chip arrays, where the number of SNPs is limited (50K-600K) and the SNPs are spread more evenly across the genome than in WGS data.

My colleagues have recently started working with goat WGS data, but unfortunately we haven't yet identified which parameters suit this type of data.

My suggestion is to increase parameters such as maxOppWindow and maxMissWindow, since I expect WGS data to be noisier than chip data. Options such as maxOppRun and maxMissRun can also have an effect, since they regulate the tolerance inside a run. You should check the SNP missing rate to figure out which maxMiss* values can work. You could start by relaxing the parameters (including minLengthBps) to produce many runs, and then tighten them progressively to filter out noise until you see meaningful results.
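To put a number on the missing rate before choosing maxMiss* values, here is a hedged shell sketch that reports the per-sample genotype missing rate from a .ped file. The function name ped_missing_rate is hypothetical, and it assumes the standard six leading .ped columns, two allele columns per SNP, and missing alleles coded as 0 (the PLINK default). If PLINK is available, its --missing report gives the same information per sample and per SNP.

```shell
# Hypothetical helper: per-sample genotype missing rate from a PLINK .ped file.
# Assumes 6 leading columns (FID IID PAT MAT SEX PHENO), then two allele
# columns per SNP, with missing alleles coded as 0 (PLINK's default).
# A genotype counts as missing if either of its two alleles is 0.
ped_missing_rate() {
  awk '{
    nsnp = (NF - 6) / 2
    miss = 0
    for (i = 7; i < NF; i += 2)
      if ($i == "0" || $(i + 1) == "0") miss++
    rate = (nsnp > 0) ? miss / nsnp : 0
    printf "%s %.4f\n", $2, rate
  }' "$1"
}
```

For example, `ped_missing_rate MS_KAZWG_5.no-tabs.ped` prints one line per sample (IID and missing rate). As a rough guide, a sample with a 2% missing rate has on average 0.02 × 15 = 0.3 missing SNPs per 15-SNP window, so windows exceeding maxMissWindow = 1 should be uncommon unless missing calls cluster.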

Hope it helps

@aigulsharip
Author

aigulsharip commented Apr 26, 2022

Thank you very much for suggestions!

@bunop
Contributor

bunop commented Apr 27, 2022

Dear @aigulsharip,

I saw you edited your last post before I had time to reply. Regarding reading external data with readExternalRuns, you have to specify the software that produced the file with the program option, for example:

newdata <- readExternalRuns(inputFile="plink_roh.hom", program='plink')

Regarding PLINK parameters, here you can find help about the ROH parameters. With the parameters you provided, you can obtain a similar result using detectRUNS. Here's a table comparing the corresponding parameters:

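| PLINK option | PLINK value | detectRUNS parameter | detectRUNS value |
|---|---|---|---|
| --homozyg | (flag) | ROHet | FALSE |
| --homozyg-snp | 500 | minSNP | 500 |
| --homozyg-kb | 100 (kb) | minLengthBps | 100000 (bp) |
| --homozyg-window-het | 1 | maxOppWindow | 1 |
| --homozyg-window-threshold | 0.05 | threshold | 0.05 |
| --homozyg-density | 50 (kb/SNP) | minDensity | 0.02 (SNP/kb) |
| --homozyg-gap | 1000 (kb) | maxGap | 10^6 (bp) |
| --homozyg-window-snp | 50 | windowSize | 50 |
| --homozyg-window-missing | 50 | maxMissWindow | 50 |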
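The unit conversions behind the length, density and gap rows can be written out as a small shell sketch. The helper name plink_to_detectruns is hypothetical; it assumes PLINK takes --homozyg-kb and --homozyg-gap in kb and --homozyg-density as kb per SNP, while detectRUNS takes minLengthBps and maxGap in bp and minDensity as SNPs per kb, so the density value is inverted.

```shell
# Hypothetical helper: convert PLINK --homozyg-kb, --homozyg-density and
# --homozyg-gap values into the matching detectRUNS arguments.
#   --homozyg-kb      (kb)     -> minLengthBps (bp)
#   --homozyg-density (kb/SNP) -> minDensity   (SNP/kb), i.e. its inverse
#   --homozyg-gap     (kb)     -> maxGap       (bp)
plink_to_detectruns() {
  awk -v kb="$1" -v density="$2" -v gap="$3" 'BEGIN {
    printf "minLengthBps = %d\n", kb * 1000
    printf "minDensity = %g\n", 1 / density
    printf "maxGap = %d\n", gap * 1000
  }'
}
```

With the values from the table, `plink_to_detectruns 100 50 1000` prints minLengthBps = 100000, minDensity = 0.02 and maxGap = 1000000.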

I ran a simple test on one WGS chromosome with PLINK and detectRUNS using the parameters above, and the numbers of ROHs were pretty much the same.
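To reproduce this kind of comparison, the per-sample ROH counts on the PLINK side can be taken straight from the .hom file (for example plink_roh.hom). This is a shell sketch; the helper count_plink_roh is hypothetical and assumes PLINK's standard .hom layout, with a header line, IID in column 2 and CHR in column 4. Samples without any ROH do not appear in its output.

```shell
# Hypothetical helper: count ROH per individual in a PLINK .hom file,
# optionally restricted to one chromosome (second argument).
# Assumes the standard .hom header (FID IID PHE CHR ...): IID is column 2,
# CHR is column 4; the header line is skipped.
count_plink_roh() {
  awk -v chr="${2:-}" 'NR > 1 && (chr == "" || $4 == chr) { n[$2]++ }
    END { for (id in n) print id, n[id] }' "$1" | sort
}
```

For example, `count_plink_roh plink_roh.hom 2` gives the PLINK counts for chromosome 2, which can be set against the number of detectRUNS runs on the same chromosome.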

All the best,

Paolo
