The example in 'https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/' does not work #1677

Open
@Createsnow

Description

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
The testset generation example from the documentation page below does not run: it fails while generating scenarios (see the error trace).

https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/

Ragas version: 0.2.5
Python version: 3.12

Code to Reproduce

import os
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader


os.environ["AZURE_OPENAI_API_KEY"] = "xx"

# "xx" marks values redacted by the reporter
azure_configs = {
    "base_url": "xx",
    "model_deployment": "xx",
    "model_name": "gpt-4o",
    "api_version": "2023-03-15-preview",
    "embedding_deployment": "xx",
    "embedding_name": "text-embedding-3-large",
}

generator_llm = LangchainLLMWrapper(AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
))

# init the embeddings passed to the testset generation transforms
generator_embeddings = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
))


loader = PyPDFLoader(file_path="./data/sasfd.pdf")
docs = loader.load()

generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=generator_embeddings)

dataset.to_pandas()

Error trace

D:\software\Anaconda3\envs\ragas\python.exe E:\workspace\tensorlib\EC\ragas\doc_demo_local.py 
Applying SummaryExtractor:  10%|█         | 1/10 [00:02<00:23,  2.63s/it]Property 'summary' already exists in node 'bee747'. Skipping!
Applying SummaryExtractor:  50%|█████     | 5/10 [00:03<00:02,  1.98it/s]Property 'summary' already exists in node '04db4c'. Skipping!
Applying SummaryExtractor:  70%|███████   | 7/10 [00:04<00:01,  2.13it/s]Property 'summary' already exists in node '301267'. Skipping!
Property 'summary' already exists in node '79fa1a'. Skipping!
Applying SummaryExtractor:  90%|█████████ | 9/10 [00:04<00:00,  2.97it/s]Property 'summary' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/32 [00:00<?, ?it/s]Property 'summary_embedding' already exists in node '2560fc'. Skipping!
Property 'summary_embedding' already exists in node '301267'. Skipping!
Property 'summary_embedding' already exists in node '04db4c'. Skipping!
Property 'summary_embedding' already exists in node '79fa1a'. Skipping!
Property 'summary_embedding' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  47%|████▋     | 15/32 [00:09<00:06,  2.53it/s]Property 'themes' already exists in node 'dbda6b'. Skipping!
Property 'themes' already exists in node '301267'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  66%|██████▌   | 21/32 [00:09<00:02,  4.59it/s]Property 'themes' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  75%|███████▌  | 24/32 [00:09<00:01,  4.76it/s]Property 'themes' already exists in node '4a1675'. Skipping!
Property 'themes' already exists in node '04db4c'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  81%|████████▏ | 26/32 [00:10<00:01,  4.87it/s]Property 'themes' already exists in node '79fa1a'. Skipping!
Property 'themes' already exists in node 'e768d8'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  88%|████████▊ | 28/32 [00:10<00:00,  5.78it/s]Property 'themes' already exists in node '23b252'. Skipping!
Property 'themes' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  94%|█████████▍| 30/32 [00:10<00:00,  5.98it/s]Property 'themes' already exists in node '1403d7'. Skipping!
Property 'themes' already exists in node 'e584a5'. Skipping!
Generating personas: 100%|██████████| 3/3 [00:05<00:00,  1.83s/it]
Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "E:\workspace\tensorlib\EC\ragas\doc_demo_local.py", line 36, in <module>
    dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=embeddings)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 180, in generate_with_langchain_docs
    return self.generate(
           ^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 396, in generate
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 393, in generate
    scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
                                                         ^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 200, in results
    results = asyncio.run(self._process_jobs())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 140, in _process_jobs
    result = await future
             ^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\tasks.py", line 631, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\single_hop\specific.py", line 74, in _generate_scenarios
    raise ValueError("No nodes found with the `entities` property.")
ValueError: No nodes found with the `entities` property.
Task exception was never retrieved
future: <Task finished name='Task-326' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ValueError('No clusters found in the knowledge graph. Use a different Synthesizer.')>
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\abstract.py", line 74, in _generate_scenarios
    raise ValueError(
ValueError: No clusters found in the knowledge graph. Use a different Synthesizer.
Task exception was never retrieved
future: <Task finished name='Task-327' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ZeroDivisionError('division by zero')>
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\specific.py", line 83, in _generate_scenarios
    num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
                                         ^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero

Process finished with exit code 1
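
The last traceback ends at `int(np.ceil(n / len(node_clusters)))`, which divides by zero when `node_clusters` is empty. The following is only a minimal sketch of that failure mode and of one possible guard, not the ragas source: the function name `samples_per_cluster` is hypothetical, and `math.ceil` stands in for `np.ceil` so the snippet has no third-party dependencies.

```python
import math


def samples_per_cluster(n, node_clusters):
    """Hypothetical helper mirroring the expression shown in the traceback."""
    # With an empty cluster list, n / len(node_clusters) raises ZeroDivisionError.
    # Checking first lets the caller see a descriptive error instead.
    if not node_clusters:
        raise ValueError("No clusters found in the knowledge graph.")
    # math.ceil is used here in place of np.ceil (an assumption for portability).
    return int(math.ceil(n / len(node_clusters)))


if __name__ == "__main__":
    print(samples_per_cluster(10, [["a"], ["b"], ["c"]]))
    try:
        samples_per_cluster(10, [])
    except ValueError as exc:
        print(f"guarded: {exc}")
```

With a guard of this kind, an empty cluster list would surface as the same kind of `ValueError` the other two synthesizers raise in the trace above, rather than as a bare `ZeroDivisionError`.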

Expected behavior
The example should run to completion and `dataset.to_pandas()` should return the generated testset.

Thank you!

Labels: bug (Something isn't working), module-testsetgen (Module testset generation)