Description
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
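Following the RAG testset generation guide linked below, generate_with_langchain_docs fails at the "Generating Scenarios" step with ValueError: No nodes found with the `entities` property.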
https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/
Ragas version: 0.2.5
Python version: 3.12
Code to Reproduce
import os
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader
os.environ["AZURE_OPENAI_API_KEY"] = "xx"
azure_configs = {
    "base_url": "xx",
    "model_deployment": "xx",
    "model_name": "gpt-4o",
    "api_version": "2023-03-15-preview",
    "embedding_deployment": "xx",
    "embedding_name": "text-embedding-3-large",
}
generator_llm = LangchainLLMWrapper(AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
))
# init the embeddings used by the testset generation transforms
generator_embeddings = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
))
loader = PyPDFLoader(file_path="./data/sasfd.pdf")
docs = loader.load()
generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=generator_embeddings)
dataset.to_pandas()
Error trace
D:\software\Anaconda3\envs\ragas\python.exe E:\workspace\tensorlib\EC\ragas\doc_demo_local.py
Applying SummaryExtractor: 10%|█ | 1/10 [00:02<00:23, 2.63s/it]Property 'summary' already exists in node 'bee747'. Skipping!
Applying SummaryExtractor: 50%|█████ | 5/10 [00:03<00:02, 1.98it/s]Property 'summary' already exists in node '04db4c'. Skipping!
Applying SummaryExtractor: 70%|███████ | 7/10 [00:04<00:01, 2.13it/s]Property 'summary' already exists in node '301267'. Skipping!
Property 'summary' already exists in node '79fa1a'. Skipping!
Applying SummaryExtractor: 90%|█████████ | 9/10 [00:04<00:00, 2.97it/s]Property 'summary' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 0%| | 0/32 [00:00<?, ?it/s]Property 'summary_embedding' already exists in node '2560fc'. Skipping!
Property 'summary_embedding' already exists in node '301267'. Skipping!
Property 'summary_embedding' already exists in node '04db4c'. Skipping!
Property 'summary_embedding' already exists in node '79fa1a'. Skipping!
Property 'summary_embedding' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 47%|████▋ | 15/32 [00:09<00:06, 2.53it/s]Property 'themes' already exists in node 'dbda6b'. Skipping!
Property 'themes' already exists in node '301267'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 66%|██████▌ | 21/32 [00:09<00:02, 4.59it/s]Property 'themes' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 75%|███████▌ | 24/32 [00:09<00:01, 4.76it/s]Property 'themes' already exists in node '4a1675'. Skipping!
Property 'themes' already exists in node '04db4c'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 81%|████████▏ | 26/32 [00:10<00:01, 4.87it/s]Property 'themes' already exists in node '79fa1a'. Skipping!
Property 'themes' already exists in node 'e768d8'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 88%|████████▊ | 28/32 [00:10<00:00, 5.78it/s]Property 'themes' already exists in node '23b252'. Skipping!
Property 'themes' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]: 94%|█████████▍| 30/32 [00:10<00:00, 5.98it/s]Property 'themes' already exists in node '1403d7'. Skipping!
Property 'themes' already exists in node 'e584a5'. Skipping!
Generating personas: 100%|██████████| 3/3 [00:05<00:00, 1.83s/it]
Generating Scenarios: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "E:\workspace\tensorlib\EC\ragas\doc_demo_local.py", line 36, in <module>
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=embeddings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 180, in generate_with_langchain_docs
return self.generate(
^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 396, in generate
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 393, in generate
scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 200, in results
results = asyncio.run(self._process_jobs())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 140, in _process_jobs
result = await future
^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\tasks.py", line 631, in _wait_for_one
return f.result() # May raise f.exception().
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\single_hop\specific.py", line 74, in _generate_scenarios
raise ValueError("No nodes found with the `entities` property.")
ValueError: No nodes found with the `entities` property.
Task exception was never retrieved
future: <Task finished name='Task-326' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ValueError('No clusters found in the knowledge graph. Use a different Synthesizer.')>
Traceback (most recent call last):
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\abstract.py", line 74, in _generate_scenarios
raise ValueError(
ValueError: No clusters found in the knowledge graph. Use a different Synthesizer.
Task exception was never retrieved
future: <Task finished name='Task-327' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ZeroDivisionError('division by zero')>
Traceback (most recent call last):
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
return await coro
^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
raise e
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
result = await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
scenarios = await self._generate_scenarios(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\specific.py", line 83, in _generate_scenarios
num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero
Process finished with exit code 1
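The last failure in the trace (ZeroDivisionError in multi_hop\specific.py) appears to come from dividing by len(node_clusters) when no clusters were found. Below is a minimal sketch of that arithmetic with an explicit guard. The helper name `samples_per_cluster` and the guard itself are hypothetical illustrations, not ragas code, and `math.ceil` stands in for the `np.ceil` call shown in the trace.

```python
import math


def samples_per_cluster(n: int, node_clusters: list) -> int:
    """Hypothetical helper mirroring int(np.ceil(n / len(node_clusters)))."""
    # Guard: an empty cluster list would otherwise raise ZeroDivisionError.
    if not node_clusters:
        raise ValueError("No clusters found in the knowledge graph.")
    return math.ceil(n / len(node_clusters))


# Example: 10 requested samples spread over 3 clusters.
print(samples_per_cluster(10, ["c1", "c2", "c3"]))
```

With a guard like this, the empty-cluster case would surface as a descriptive ValueError, matching the message the multi_hop\abstract.py synthesizer already raises in the trace, instead of a bare division by zero.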
Expected behavior
generate_with_langchain_docs should finish without errors and return a testset of 10 samples that dataset.to_pandas() can convert to a DataFrame.
Additional context
Thank you!