What is an input vector file? And workflow diagram confusion. #20

Open
Rolands-Laucis opened this issue Apr 5, 2021 · 0 comments

Hello, I am using your paper for my own thesis research. Looking at the workflow.jpg diagram, I find the arrows a bit confusing. I am trying to use the Skip-Gram ngram-ngram method. From what I understand, I have to go through the steps corpus2vocab -> corpus2pairs -> pairs2sgns. But pairs2sgns requires an "--input_vector_file" argument. I don't know what that is, and none of the earlier steps generated one. I assume it is a file containing the resulting word embedding vectors, but if I already had that, I wouldn't need the tool. Do I have to run the original word2vec SG method, save a .vec model, and use it here? I read the research paper and didn't find an answer to this either. I also tried pairs2vocab, but it doesn't generate the input vector file either.

A separate issue is with corpus2pairs: when I pass the argument "--pairs_file ./pairs.txt", it generates 4 different files (pairs.txt_0, pairs.txt_1, pairs.txt_2, pairs.txt_3). Do I then have to run pairs2sgns on each of these pairs files? Do I generate a separate output vector file for each one? Do the vector files get overwritten or appended to?
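
To make the second question concrete, here is a minimal Python sketch of how the sharded pairs files could be listed. It assumes only the `<prefix>_<number>` naming I observed in the output. `find_pair_shards` is a hypothetical helper of my own, not part of the toolkit, and it says nothing about whether the training step should be run once per shard.

```python
import os
import re


def find_pair_shards(prefix):
    """Return paths named '<prefix>_<number>', sorted by shard number.

    Hypothetical helper: it relies only on the observed naming scheme
    (pairs.txt_0, pairs.txt_1, ...), not on any documented behavior.
    """
    directory = os.path.dirname(prefix) or "."
    base = os.path.basename(prefix)
    # Match exactly '<base>_<digits>' so unrelated files are skipped.
    pattern = re.compile(re.escape(base) + r"_(\d+)$")
    shards = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            shards.append((int(match.group(1)), os.path.join(directory, name)))
    # Sort numerically so that shard 10 comes after shard 2.
    return [path for _, path in sorted(shards)]


if __name__ == "__main__":
    for shard in find_pair_shards("./pairs.txt"):
        print(shard)
```

Running it in the directory that holds the output would print one shard path per line, in numeric order.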
