Status: Open
Labels: bug (Something isn't working), module-testsetgen (Module testset generation)
Description
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
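Following the testset generation guide (https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/) with Azure OpenAI models, generate_with_langchain_docs() runs the transforms and generates personas, then fails at the "Generating Scenarios" step with ValueError: No nodes found with the `entities` property.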
Ragas version: 0.2.5
Python version: 3.12
Code to Reproduce
import os
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
os.environ["AZURE_OPENAI_API_KEY"] = "xx"
# Endpoint and deployment names are redacted as "xx"
azure_configs = {
    "base_url": "xx",
    "model_deployment": "xx",
    "model_name": "gpt-4o",
    "api_version": "2023-03-15-preview",
    "embedding_deployment": "xx",
    "embedding_name": "text-embedding-3-large",
}
# LLM used by the TestsetGenerator
generator_llm = LangchainLLMWrapper(AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
))
# Embeddings passed to the knowledge-graph transforms via transforms_embedding_model
generator_embeddings = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
))
loader = PyPDFLoader(file_path="./data/sasfd.pdf")
docs = loader.load()
generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=generator_embeddings)
dataset.to_pandas()
Error trace
D:\software\Anaconda3\envs\ragas\python.exe E:\workspace\tensorlib\EC\ragas\doc_demo_local.py
Applying SummaryExtractor: 10%|█ | 1/10 [00:02<00:23, 2.63s/it]Property 'summary' already exists in node 'bee747'. Skipping!
Applying SummaryExtractor: 50%|█████ | 5/10 [00:03<00:02, 1.98it/s]Property 'summary' already exists in node '04db4c'. Skipping!
Applying SummaryExtractor: 70%|███████ | 7/10 [00:04<00:01, 2.13it/s]Property 'summary' already exists in node '301267'. Skipping!
Property 'summary' already exists in node '79fa1a'. Skipping!
Applying SummaryExtractor: 90%|█████████ | 9/10 [00:04<00:00, 2.97it/s]Property 'summary' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 0%| | 0/32 [00:00<?, ?it/s]Property 'summary_embedding' already exists in node '2560fc'. Skipping!
Property 'summary_embedding' already exists in node '301267'. Skipping!
Property 'summary_embedding' already exists in node '04db4c'. Skipping!
Property 'summary_embedding' already exists in node '79fa1a'. Skipping!
Property 'summary_embedding' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 47%|████▋ | 15/32 [00:09<00:06, 2.53it/s]Property 'themes' already exists in node 'dbda6b'. Skipping!
Property 'themes' already exists in node '301267'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 66%|██████▌ | 21/32 [00:09<00:02, 4.59it/s]Property 'themes' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 75%|███████▌ | 24/32 [00:09<00:01, 4.76it/s]Property 'themes' already exists in node '4a1675'. Skipping!
Property 'themes' already exists in node '04db4c'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 81%|████████▏ | 26/32 [00:10<00:01, 4.87it/s]Property 'themes' already exists in node '79fa1a'. Skipping!
Property 'themes' already exists in node 'e768d8'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 88%|████████▊ | 28/32 [00:10<00:00, 5.78it/s]Property 'themes' already exists in node '23b252'. Skipping!
Property 'themes' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 94%|█████████▍| 30/32 [00:10<00:00, 5.98it/s]Property 'themes' already exists in node '1403d7'. Skipping!
Property 'themes' already exists in node 'e584a5'. Skipping!
Generating personas: 100%|██████████| 3/3 [00:05<00:00, 1.83s/it]
Generating Scenarios: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "E:\workspace\tensorlib\EC\ragas\doc_demo_local.py", line 36, in <module>
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=embeddings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 180, in generate_with_langchain_docs
return self.generate(
^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 396, in generate
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 393, in generate
scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 200, in results
results = asyncio.run(self._process_jobs())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 140, in _process_jobs
result = await future
^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\tasks.py", line 631, in _wait_for_one
return f.result() # May raise f.exception().
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\single_hop\specific.py", line 74, in _generate_scenarios
raise ValueError("No nodes found with the `entities` property.")
ValueError: No nodes found with the `entities` property.
Task exception was never retrieved
future: <Task finished name='Task-326' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ValueError('No clusters found in the knowledge graph. Use a different Synthesizer.')>
Traceback (most recent call last):
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\abstract.py", line 74, in _generate_scenarios
raise ValueError(
ValueError: No clusters found in the knowledge graph. Use a different Synthesizer.
Task exception was never retrieved
future: <Task finished name='Task-327' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ZeroDivisionError('division by zero')>
Traceback (most recent call last):
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\specific.py", line 83, in _generate_scenarios
num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
Process finished with exit code 1
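The first traceback appears to mean that none of the knowledge-graph nodes the single-hop synthesizer considers carries an `entities` property, even though the log shows NERExtractor being applied. The sketch below reproduces only that precondition in plain Python so it can be checked in isolation. The `Node` dataclass and the `nodes_with_property` / `check_single_hop_inputs` helpers are hypothetical stand-ins, not ragas API; the property names are taken from the log and error messages above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical stand-in for a knowledge-graph node: an id plus a property dict."""
    node_id: str
    properties: dict[str, Any] = field(default_factory=dict)


def nodes_with_property(nodes: list[Node], name: str) -> list[Node]:
    """Return the nodes that have the given property set (not None)."""
    return [n for n in nodes if n.properties.get(name) is not None]


def check_single_hop_inputs(nodes: list[Node]) -> list[Node]:
    """Mirror the precondition behind the first traceback (hypothetical helper).

    Raises the same ValueError message when no node has an `entities` property.
    """
    candidates = nodes_with_property(nodes, "entities")
    if not candidates:
        raise ValueError("No nodes found with the `entities` property.")
    return candidates
```

Nodes that only received `summary`, `summary_embedding` or `themes` (as in the log above) would fail this check; at least one node with `entities` is needed.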
Expected behavior
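generate_with_langchain_docs() should build a testset of 10 samples from the PDF, which dataset.to_pandas() then returns as a DataFrame, instead of raising an exception.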
Additional context
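Besides the ValueError above, the same run logs two "Task exception was never retrieved" errors from the multi-hop synthesizers: ValueError: No clusters found in the knowledge graph (multi_hop/abstract.py) and ZeroDivisionError: division by zero (multi_hop/specific.py).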
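The ZeroDivisionError in the last traceback comes from `int(np.ceil(n / len(node_clusters)))` in `multi_hop/specific.py`, which divides by the number of clusters without first checking that any were found; the abstract multi-hop synthesizer raises a ValueError for the same empty-cluster condition instead. A minimal sketch of that arithmetic with an explicit guard is below. `samples_per_cluster` is a hypothetical helper, not ragas code, and `math.ceil` stands in for `np.ceil` so the example has no NumPy dependency.

```python
import math


def samples_per_cluster(n: int, num_clusters: int) -> int:
    """Split n requested samples across clusters, rounding up (hypothetical helper).

    Raises a descriptive ValueError instead of ZeroDivisionError
    when the knowledge graph produced no clusters.
    """
    if num_clusters == 0:
        raise ValueError(
            "No clusters found in the knowledge graph. Use a different Synthesizer."
        )
    return math.ceil(n / num_clusters)
```

For example, 10 samples over 3 clusters gives 4 per cluster, while 0 clusters produces the same ValueError as the abstract synthesizer rather than a bare ZeroDivisionError.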
Thank you!