The example in 'https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/' is not available #1677

Open
Createsnow opened this issue Nov 14, 2024 · 1 comment
Labels
bug Something isn't working module-testsetgen Module testset generation

Comments

Createsnow commented Nov 14, 2024

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
A clear and concise description of what the bug is.

https://docs.ragas.io/en/latest/getstarted/rag_testset_generation/

Ragas version: 0.2.5
Python version: 3.12

Code to Reproduce

import os
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders.pdf import PyPDFLoader


os.environ["AZURE_OPENAI_API_KEY"] = "xx"

# "xx" marks values redacted by the reporter
azure_configs = {
    "base_url": "xx",
    "model_deployment": "xx",
    "model_name": "gpt-4o",
    "api_version": "2023-03-15-preview",
    "embedding_deployment": "xx",
    "embedding_name": "text-embedding-3-large",
}

generator_llm = LangchainLLMWrapper(AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
))

# init the embeddings passed to the testset generator as transforms_embedding_model
generator_embeddings = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
))


loader = PyPDFLoader(file_path="./data/sasfd.pdf")
docs = loader.load()

generator = TestsetGenerator(llm=generator_llm)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=generator_embeddings)

dataset.to_pandas()

Error trace

D:\software\Anaconda3\envs\ragas\python.exe E:\workspace\tensorlib\EC\ragas\doc_demo_local.py 
Applying SummaryExtractor:  10%|█         | 1/10 [00:02<00:23,  2.63s/it]Property 'summary' already exists in node 'bee747'. Skipping!
Applying SummaryExtractor:  50%|█████     | 5/10 [00:03<00:02,  1.98it/s]Property 'summary' already exists in node '04db4c'. Skipping!
Applying SummaryExtractor:  70%|███████   | 7/10 [00:04<00:01,  2.13it/s]Property 'summary' already exists in node '301267'. Skipping!
Property 'summary' already exists in node '79fa1a'. Skipping!
Applying SummaryExtractor:  90%|█████████ | 9/10 [00:04<00:00,  2.97it/s]Property 'summary' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/32 [00:00<?, ?it/s]Property 'summary_embedding' already exists in node '2560fc'. Skipping!
Property 'summary_embedding' already exists in node '301267'. Skipping!
Property 'summary_embedding' already exists in node '04db4c'. Skipping!
Property 'summary_embedding' already exists in node '79fa1a'. Skipping!
Property 'summary_embedding' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  47%|████▋     | 15/32 [00:09<00:06,  2.53it/s]Property 'themes' already exists in node 'dbda6b'. Skipping!
Property 'themes' already exists in node '301267'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  66%|██████▌   | 21/32 [00:09<00:02,  4.59it/s]Property 'themes' already exists in node '2560fc'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  75%|███████▌  | 24/32 [00:09<00:01,  4.76it/s]Property 'themes' already exists in node '4a1675'. Skipping!
Property 'themes' already exists in node '04db4c'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  81%|████████▏ | 26/32 [00:10<00:01,  4.87it/s]Property 'themes' already exists in node '79fa1a'. Skipping!
Property 'themes' already exists in node 'e768d8'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  88%|████████▊ | 28/32 [00:10<00:00,  5.78it/s]Property 'themes' already exists in node '23b252'. Skipping!
Property 'themes' already exists in node 'bee747'. Skipping!
Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:  94%|█████████▍| 30/32 [00:10<00:00,  5.98it/s]Property 'themes' already exists in node '1403d7'. Skipping!
Property 'themes' already exists in node 'e584a5'. Skipping!
Generating personas: 100%|██████████| 3/3 [00:05<00:00,  1.83s/it]
Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "E:\workspace\tensorlib\EC\ragas\doc_demo_local.py", line 36, in <module>
    dataset = generator.generate_with_langchain_docs(docs, testset_size=10, transforms_embedding_model=embeddings)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 180, in generate_with_langchain_docs
    return self.generate(
           ^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 396, in generate
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\generate.py", line 393, in generate
    scenario_sample_list: t.List[t.List[BaseScenario]] = exec.results()
                                                         ^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 200, in results
    results = asyncio.run(self._process_jobs())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 140, in _process_jobs
    result = await future
             ^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\asyncio\tasks.py", line 631, in _wait_for_one
    return f.result()  # May raise f.exception().
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\single_hop\specific.py", line 74, in _generate_scenarios
    raise ValueError("No nodes found with the `entities` property.")
ValueError: No nodes found with the `entities` property.
Task exception was never retrieved
future: <Task finished name='Task-326' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ValueError('No clusters found in the knowledge graph. Use a different Synthesizer.')>
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\abstract.py", line 74, in _generate_scenarios
    raise ValueError(
ValueError: No clusters found in the knowledge graph. Use a different Synthesizer.
Task exception was never retrieved
future: <Task finished name='Task-327' coro=<as_completed.<locals>.sema_coro() done, defined at D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py:43> exception=ZeroDivisionError('division by zero')>
Traceback (most recent call last):
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 45, in sema_coro
    return await coro
           ^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 96, in wrapped_callable_async
    raise e
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\executor.py", line 92, in wrapped_callable_async
    result = await callable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\base.py", line 94, in generate_scenarios
    scenarios = await self._generate_scenarios(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\software\Anaconda3\envs\ragas\Lib\site-packages\ragas\testset\synthesizers\multi_hop\specific.py", line 83, in _generate_scenarios
    num_sample_per_cluster = int(np.ceil(n / len(node_clusters)))
                                         ^^^^^^^^^^^^^^^^^^^^^^
ZeroDivisionError: division by zero

Process finished with exit code 1
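
For reference, the last traceback comes down to the expression `int(np.ceil(n / len(node_clusters)))` being evaluated with an empty `node_clusters`. Below is a minimal sketch of that failure and one possible guard. It is not the Ragas implementation: the helper name `samples_per_cluster` is hypothetical, and `math.ceil` stands in for `np.ceil` so the snippet needs no extra packages.

```python
import math


def samples_per_cluster(n, node_clusters):
    """Hypothetical helper mirroring the failing expression from the traceback."""
    if not node_clusters:
        # With no clusters, n / len(node_clusters) would divide by zero,
        # so fail early with a clearer message instead.
        raise ValueError("No clusters found; cannot split samples across clusters.")
    return int(math.ceil(n / len(node_clusters)))


# Unguarded form of the same expression with an empty cluster list
# (math.ceil used in place of np.ceil; this is an assumption for illustration).
try:
    int(math.ceil(10 / len([])))
    unguarded_error = None
except ZeroDivisionError as exc:
    unguarded_error = exc
```

With a guard like this, an empty knowledge-graph clustering would surface as a descriptive `ValueError` rather than a bare `ZeroDivisionError`.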

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

thank you!

@Createsnow Createsnow added the bug Something isn't working label Nov 14, 2024
@dosubot dosubot bot added the module-testsetgen Module testset generation label Nov 14, 2024
Member

jjmachan commented Nov 14, 2024

We have fixed this with #1661, and the fix will be released in a couple of hours 🙂

I'll ping you once it is out!
