Hi Team, I am trying to test the DRAMA model sample code - link. However, I see a mismatch between the expected output and the actual output.
Sample code:
import torch  # needed for torch.cuda.is_available() below
from transformers import AutoTokenizer, AutoModel
queries = [
'What percentage of the Earth\'s atmosphere is oxygen?',
'意大利首都是哪里?',
]
documents = [
"The amount of oxygen in the atmosphere has fluctuated over the last 600 million years, reaching a peak of 35% during the Carboniferous period, significantly higher than today's 21%.",
"羅馬是欧洲国家意大利首都和罗马首都广域市的首府及意大利全国的政治、经济、文化和交通中心,位于意大利半島中部的台伯河下游平原地,建城初期在七座小山丘上,故又名“七丘之城”。按城市范围内的人口计算,罗马是意大利人口最多的城市,也是欧盟人口第三多的城市。",
]
model_name = "facebook/drama-base"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(device)
query_embs = model.encode_queries(tokenizer, queries)
doc_embs = model.encode_documents(tokenizer, documents)
scores = query_embs @ doc_embs.T
print(scores.tolist())
Expected output: [[0.5310, 0.0821], [0.1298, 0.6181]]
Actual output: [[0.4584735929965973, 0.24322254955768585], [0.12728893756866455, 0.5092089176177979]]
Colab Notebook link - https://colab.research.google.com/drive/1FkJMGEJBX7BGsoLeGiJdxCKnMBmMG19n?usp=sharing
What's causing this mismatch? I also tested the sample code with new values:
queries = [
'iphone', 'cat food'
]
documents = [
'iphone 16 pro max',
'best cat food'
]
Output: [[0.40802454948425293, 0.26841771602630615], [0.27385222911834717, 0.5687180757522583]]
Is this the correct behavior? The scores for the matching pairs seem quite low, so the relevance looks poor.
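To separate "the absolute scores differ" from "the ranking is wrong", here is a minimal sketch that uses only the score matrices quoted in this post. The helper names `top_doc_per_query` and `margins` are my own and are not part of the DRAMA or transformers API; the sketch assumes rows are queries and columns are documents, as in `query_embs @ doc_embs.T`.

```python
# Score matrices copied from this post (rows = queries, columns = documents).
expected = [[0.5310, 0.0821], [0.1298, 0.6181]]
actual = [[0.4584735929965973, 0.24322254955768585],
          [0.12728893756866455, 0.5092089176177979]]
custom = [[0.40802454948425293, 0.26841771602630615],
          [0.27385222911834717, 0.5687180757522583]]


def top_doc_per_query(scores):
    """Return the index of the highest-scoring document for each query row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def margins(scores):
    """Return the gap between the best and second-best score for each query."""
    gaps = []
    for row in scores:
        ordered = sorted(row, reverse=True)
        gaps.append(ordered[0] - ordered[1])
    return gaps


for name, scores in [("expected", expected), ("actual", actual), ("custom", custom)]:
    print(name, top_doc_per_query(scores), [round(m, 4) for m in margins(scores)])
```

In all three matrices the top-ranked document is the matching one (index 0 for the first query, index 1 for the second), so what differs is the size of the score values and margins, not the ranking order.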
Thanks