Warning during analysis, and no output written #6

Open
CAWarmerdam opened this issue Oct 8, 2019 · 8 comments
Comments

@CAWarmerdam

Hi,

As can be derived from my previous 'issue', I am fairly new to the HASE framework, so further problems are likely to be caused by my lack of experience. However, I would be very grateful if someone could help me resolve the following issue.

After successfully preparing an experiment with converted and encoded data, creating mapper files, and calculating partial derivatives, the meta-stage analysis gives the following warnings:

time to read and merge genotype 0.131412982941s
hase/hdgwas/hdregression.py:120: RuntimeWarning: invalid value encountered in divide
  t_stat = np.sqrt(DF) * np.divide(A1_B_full[:, (N_con):N_con+1, :], a44_C_BTA1B)
time to compute GWAS for 1000 phenotypes and 5000 SNPs .... 0.754463911057 sec
Read 5000, processed 5000, total 53453
hase/hdgwas/tools.py:189: RuntimeWarning: invalid value encountered in greater
  mask = np.where(np.abs(self.t_stat) > t_threshold)

Regular regression analysis using just the converted genotypes, the phenotypes and the covariates shows the same warning message.

After these warnings the program proceeds without further error messages, but there are no .npy output files in the output directory. Some testing made it clear that the A_inverse function returns a matrix of zeros, unlike with the example data.

Could someone help me resolve this problem?

@roshchupkin
Owner

Hi @CAWarmerdam, this warning appears when you have "NaN" values for t-statistics. It is difficult to say why, given the information I have, but most probably you have to check your phenotype and covariates files. They should not contain any missing values or NaNs! HASE does not clean datasets and uses all available data. Therefore, if you have missing values, you can get such a warning. That is also why HASE doesn't save any results: there are no valid t-values.
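
A check along these lines could look like the minimal sketch below. It assumes the phenotype or covariate table has already been loaded into a pandas DataFrame containing only the value columns; the helper name `report_missing` and the toy data are hypothetical and not part of HASE.

```python
import numpy as np
import pandas as pd


def report_missing(table: pd.DataFrame) -> pd.Series:
    """Count non-finite entries (NaN, inf, blanks, stray text) per column."""
    # Coerce every column to numbers; blanks and non-numeric text become NaN.
    numeric = table.apply(pd.to_numeric, errors="coerce")
    bad = ~np.isfinite(numeric.to_numpy(dtype=float))
    counts = pd.Series(bad.sum(axis=0), index=table.columns)
    # Keep only the columns that actually contain problems.
    return counts[counts > 0]


# Toy covariate table (hypothetical data): one blank cell and one NaN.
covariates = pd.DataFrame(
    {"sex": ["1", "2", " "], "batch": [1.0, np.nan, 3.0], "ancestry": [1, 1, 3]}
)
missing = report_missing(covariates)
print(missing)
```

An empty result means every cell parsed as a finite number; any listed column is worth inspecting before running the analysis.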

@CAWarmerdam
Author

Hi @roshchupkin, thank you very much for your prompt response! Some thorough testing has shown that neither the phenotype file nor the covariates file contains NaN values. I suppose discrete variables must be coded as integer values in the covariates file?

@CAWarmerdam
Author

I would like to emphasize again how much I appreciate your help. Running the analysis with randomly generated covariates (continuous values around 1-100) does not trigger a warning. Unfortunately, there are still no output files generated.

@roshchupkin
Owner

@CAWarmerdam could you please write here how you submit the script (which parameters)?
This can happen if you set a t-stat threshold and nothing is significant, in which case HASE doesn't save any results.
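
As an illustration of why both situations end with nothing to save, here is a small sketch with made-up t-statistics. It mirrors the `np.where(np.abs(t_stat) > t_threshold)` filter quoted in the warning at the top of the issue; the arrays are invented for the example and are not HASE output. Comparisons against NaN always evaluate to False, so NaN t-statistics can never pass a threshold.

```python
import numpy as np

# Made-up t-statistics for illustration; not HASE output.
t_valid = np.array([0.3, -0.8, 0.5])
t_nan = np.full(3, np.nan)
t_threshold = 1.0

# Valid but non-significant values: nothing exceeds the threshold.
hits_valid = np.where(np.abs(t_valid) > t_threshold)[0]

# NaN values: every comparison with NaN evaluates to False.
with np.errstate(invalid="ignore"):
    hits_nan = np.where(np.abs(t_nan) > t_threshold)[0]

print(hits_valid.size, hits_nan.size)  # 0 0
```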

@CAWarmerdam
Author

For now I am testing this dataset with the regular regression mode, with a command formatted as follows:

python hase.py -g <converted genotype data> -study_name <study name> -o <output directory> -ph <phenotype data> -mode regression -cov <random covariate data> -th 1

Currently I am working with a dataset consisting of just over 50k SNPs, 358 samples and over 21k phenotype variables.

Indeed, removing the t-threshold does result in output with the random covariate data, so thank you for your suggestion. However, it does not fix the analysis with the discrete covariates (sex, batch, ancestry), which still does not return any output.

@roshchupkin
Owner

Well, if there are no NaNs, missing values, or spaces in the dataset, I can't really think what the reason is.
I suggest starting with one covariate (sex) and adding the others one by one to check which one causes the issue.
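
The one-by-one approach can also be tried outside HASE. The sketch below is only an illustration under stated assumptions: it builds a design matrix starting from an intercept column, appends each covariate in turn, and flags any covariate that fails to increase the matrix rank. The helper `first_rank_deficient` and the toy values are hypothetical.

```python
import numpy as np


def first_rank_deficient(covariates: dict) -> list:
    """Add covariates one at a time; return names that do not raise the rank."""
    n = len(next(iter(covariates.values())))
    design = np.ones((n, 1))  # start with the intercept column
    flagged = []
    for name, values in covariates.items():
        candidate = np.column_stack([design, np.asarray(values, dtype=float)])
        if np.linalg.matrix_rank(candidate) < candidate.shape[1]:
            flagged.append(name)  # collinear with columns already present
        else:
            design = candidate
    return flagged


# Toy covariates (hypothetical): "site" is constant, so it duplicates the intercept.
toy = {
    "sex": [1, 2, 2, 1, 1, 2],
    "batch": [1, 5, 2, 3, 1, 2],
    "site": [3, 3, 3, 3, 3, 3],
}
flagged = first_rank_deficient(toy)
print(flagged)  # ['site']
```

A flagged covariate is a linear combination of the intercept and the covariates added before it, which makes the resulting cross-product matrix singular.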

@CAWarmerdam
Author

Thank you very much for your suggestions so far. A lot of problems are now resolved. Unfortunately I have not yet come up with a solution for the particular problem stated above. However, I have tried to find the origin of this issue.

As can be seen in the warning output at the start of the issue, the division warning occurs on what is actually line 121 within the HASE function. The invalid values appear to be zeros (the inverse of matrix A consists exclusively of zeros).

Do you think this problem could occur because the inv matrix created on lines 76-79 of the same file is singular and therefore cannot be inverted, so that the matrix of zeros created on line 83 is returned instead? Unfortunately, I am not very comfortable with the mathematics involved, so it would be great if you could comment on this.

We would greatly appreciate it if someone could look into this issue. My colleagues and I would really like to see whether this method will work for a large multi-site study.

Thank you very much.

@CAWarmerdam
Author

CAWarmerdam commented Nov 7, 2019

After looking into how the A matrix is constructed, it appears that this particular error was partly caused by genotype data in which all individuals have the same genotype (2 for every individual). Unfortunately, these identical genotypes were caused by an incorrect conversion of the VCF file. I suppose another part of the problem was too little variance in the covariates.

The genotype data is as follows:

[[2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 ...
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]
 [2. 2. 2. ... 2. 2. 2.]]
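
Constant genotypes like these can be caught before running the analysis. The following is a minimal sketch, assuming the genotypes are held in a NumPy array with one row per SNP and one column per individual (the orientation is an assumption, and `constant_snps` is a hypothetical helper, not a HASE function).

```python
import numpy as np


def constant_snps(genotypes: np.ndarray) -> np.ndarray:
    """Return indices of SNP rows whose genotype is identical for all individuals."""
    # Assumes rows are SNPs and columns are individuals.
    return np.where(genotypes.max(axis=1) == genotypes.min(axis=1))[0]


# Toy genotype matrix (hypothetical): SNPs 0 and 2 have no variation.
toy_genotypes = np.array([
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 1.0, 2.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
])
constant = constant_snps(toy_genotypes)
print(constant)  # [0 2]
```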

The covariates are as shown below:

id_0	2	2	1
id_1	2	1	1
id_2	1	5	1
id_3	1	2	1
...	...	...	...
id_n-4	2	5	3
id_n-3	1	3	3
id_n-2	1	1	3
id_n-1	1	2	3

Matrix A is then equal to:

[[ 358.  529. 1169.  899.  716.]
 [ 529.  871. 1774. 1335. 1058.]
 [1169. 1774. 5185. 2902. 2338.]
 [ 899. 1335. 2902. 2697. 1798.]
 [ 716. 1058. 2338. 1798. 1432.]]
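
The posted numbers are consistent with this explanation. Assuming the first row and column of A correspond to the intercept (358 matches the number of samples mentioned earlier) and the last row and column to the genotype, a genotype of 2 for every individual makes the last column exactly twice the first, so A has linearly dependent columns and cannot be inverted. The short check below only verifies this on the matrix as posted; the interpretation of the rows and columns is an assumption.

```python
import numpy as np

# Matrix A exactly as posted above.
A = np.array([
    [358.0, 529.0, 1169.0, 899.0, 716.0],
    [529.0, 871.0, 1774.0, 1335.0, 1058.0],
    [1169.0, 1774.0, 5185.0, 2902.0, 2338.0],
    [899.0, 1335.0, 2902.0, 2697.0, 1798.0],
    [716.0, 1058.0, 2338.0, 1798.0, 1432.0],
])

# The last column equals twice the first column, entry by entry.
last_is_double_first = np.array_equal(A[:, 4], 2 * A[:, 0])

# A matrix with linearly dependent columns has rank below its size.
rank = np.linalg.matrix_rank(A)
is_rank_deficient = rank < A.shape[0]

print(last_is_double_first, is_rank_deficient)
```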
