UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte #12
Of course the vectors should be saved using the proper codec; it seems the model was trained in a different encoding environment. Can you check that?
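For what it's worth, the byte named in the issue title really cannot start a UTF-8 character, so any attempt to decode a binary model file as UTF-8 text fails immediately; a minimal, self-contained sketch:

```python
# 0x80 is a UTF-8 continuation byte: it can never begin a character,
# so decoding binary model data as UTF-8 text fails at position 0.
try:
    b'\x80\x02'.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(failed)   # True
print(message)  # names byte 0x80 at position 0 as an invalid start byte
```

This is why loaders that open the model file in text mode hit this error on the very first byte.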
I have come across the same error. Can anybody help? Thank you~
I came across the same error as well. I changed:
into
It turns out that
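The fixes reported in this thread generally amount to loading the file in binary mode. The reason: the word2vec C binary format starts with an ASCII header line (`<vocab_size> <dim>`), then stores each word as UTF-8 text followed by its vector as raw little-endian float32 bytes, which is not valid UTF-8. A minimal round-trip sketch of that layout (the `write_w2v_binary`/`read_w2v_binary` helpers are made up for illustration and are not part of gensim):

```python
import io
import struct

def write_w2v_binary(words_vecs, dim):
    # Header is plain ASCII; each vector is raw little-endian float32 bytes.
    buf = io.BytesIO()
    buf.write(f"{len(words_vecs)} {dim}\n".encode("utf-8"))
    for word, vec in words_vecs:
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack(f"<{dim}f", *vec))
        buf.write(b"\n")
    return buf.getvalue()

def read_w2v_binary(raw):
    buf = io.BytesIO(raw)
    n, dim = map(int, buf.readline().split())
    vectors = {}
    for _ in range(n):
        word = b""
        while (ch := buf.read(1)) != b" ":  # word is text, up to the space
            word += ch
        vectors[word.decode("utf-8")] = struct.unpack(f"<{dim}f", buf.read(4 * dim))
        buf.read(1)  # skip the trailing newline after the vector bytes
    return vectors

data = write_w2v_binary([("hello", [1.0, 2.0]), ("world", [3.0, 4.0])], 2)
vecs = read_w2v_binary(data)
print(vecs["hello"])  # (1.0, 2.0)
```

Because everything after each word is raw float bytes, a loader that tries to decode the whole file as UTF-8 text fails exactly as in this issue.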
@galuhsahid Thank you so much, it works now. :)
I have tried to read the files as you pointed out, but I got the following error:
:(
Same error as @anavaldi. Any solution?
I solved this error by running the .sh file on my own word embeddings.
I have come across the same error. I changed
@hinamu it works, thanks!
What do you mean?
I solved this issue by downgrading my gensim version from 3.6 to 3.0.
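If you want to try the same workaround, the downgrade is just a version pin (3.0.0 is assumed here as the matching 3.0 release):

```shell
pip install gensim==3.0.0
```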
UnpicklingError Traceback (most recent call last)
@kusumlata123 I am also getting that UnpicklingError.
I am also getting the unpickling error...

```python
chinese_model = gensim.models.Word2Vec.load(os.path.join(desktop, 'cc.zh.300.bin.gz'))
```
I also tried saving the text file and loading it via the function provided on the official fastText site. I first changed the file extension to .txt:

```python
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

model = load_vectors(os.path.join(desktop, 'cc.zh.300.vec.txt'))
```

However, I got the following errors:
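As a sanity check, that loader does work on a well-formed text-format .vec file; here is a self-contained run against a tiny two-word file written to a temp directory (the file name and contents are made up for the demo):

```python
import io
import os
import tempfile

def load_vectors(fname):
    # Same loader as above: first line is "<count> <dim>", then one word per line.
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = list(map(float, tokens[1:]))  # list() so values are reusable
    return data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tiny.vec')
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write('2 3\n')
        f.write('hello 0.1 0.2 0.3\n')
        f.write('world 0.4 0.5 0.6\n')
    vectors = load_vectors(path)

print(vectors['hello'])  # [0.1, 0.2, 0.3]
```

If this works but the real file fails, the file on disk is likely still gzip-compressed or in the binary format, not plain UTF-8 text.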
I tried the above solution but I am getting an error with:

```python
word2vec_path = 'GoogleNews-vectors-negative300.bin.gz.2'
word2vec = models.KeyedVectors.load(word2vec_path)
```
When I tried this, I am getting:
For the Korean language, I got this error:
I get the same error after using:
What am I doing wrong?
Hi,
I am trying to load the Chinese pretrained word2vec:

```python
word_vectors = KeyedVectors.load_word2vec_format(path, binary=True)  # C binary format
```

and it throws this error.