@yutuyt01
Collaborator

I wrote this quickly before I left Friday afternoon and forgot to submit the pull request, sorry.

We talked about making a more general solution possible, so I believe these changes keep the original functionality while adding the functionality I need in BEELINE. I also lowered the Python dependencies to versions BEELINE allows. I'm not sure whether this breaks any other function; to my knowledge it doesn't, based on some limited testing of the splitting, verification, and negative sample generation. I removed the parameters for an undirected graph and the source column, but can put those back in if you're planning to implement that as a feature.

The changes also fix what I believe to be a minor bug in the random selection of negative samples. The sampling uses a set seed for reproducibility, and that seed never changes, so two edges with exactly the same target set will always draw the same negative targets. For example, if TFs a and b each occur once in a dataset and both target gene c, the negative samples generated for them will always use the same random gene: (a, random) = (b, random). This is unlikely to be a problem in most datasets, but I changed the seed for each gene pair iterated over. The results are still reproducible; the change should just ensure "randomness" across pairs.
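
To illustrate the seeding issue, here is a minimal sketch and not the code in this pull request. The names `sample_negatives`, `base_seed`, and `per_pair_seed` are hypothetical, and deriving the seed as `base_seed + index` is only one assumed way to vary the seed per gene pair.

```python
import random


def sample_negatives(edges, all_genes, base_seed=42, per_pair_seed=True):
    """Draw one negative target for each (tf, target) edge.

    Hypothetical helper for illustration only; it is not the package's API.
    """
    # Collect the known targets of every TF so they can be excluded.
    known_targets = {}
    for tf, gene in edges:
        known_targets.setdefault(tf, set()).add(gene)

    negatives = []
    for index, (tf, _gene) in enumerate(edges):
        # Sort the candidates so the draw does not depend on set ordering.
        candidates = sorted(set(all_genes) - known_targets[tf])
        if not candidates:
            continue
        # Fixed seed: every pair with the same candidates draws the same gene.
        # Per-pair seed: the seed changes with the pair index, yet the
        # overall result is still reproducible from base_seed.
        seed = base_seed + index if per_pair_seed else base_seed
        rng = random.Random(seed)
        negatives.append((tf, rng.choice(candidates)))
    return negatives


edges = [("a", "c"), ("b", "c")]
genes = ["c", "d", "e", "f", "g"]

# Old behaviour: a and b share a target set, so they get the same negative.
print(sample_negatives(edges, genes, per_pair_seed=False))
# New behaviour: the seed differs per pair, and reruns give identical output.
print(sample_negatives(edges, genes))
```

With `per_pair_seed=False`, a and b are guaranteed to draw the same gene because they have the same candidate list and the same seed. With a per-pair seed the two draws are independent, so they can still coincide by chance but are no longer forced to.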

Let me know if this works; I can make any changes needed. I was also thinking it may be a good idea to set up automatic publishing to PyPI with a GitHub Action, and I can look into doing that if you think it would make the package easier to maintain. Thanks for the help with these scripts!
Tim
