LLM test using JS #1

Merged · 13 commits merged into master on Mar 5, 2025

Conversation

@sathish-paramasivam (Contributor) commented Feb 26, 2025

First test that tests LLMs with JavaScript using the Cypress tool.
resolves nearform/hub-draft-issues#515
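
For illustration, a minimal sketch of the style of test this adds, with Cypress acting as a plain HTTP client against a hypothetical /api/chat endpoint (the real URL, request shape, and assertions live in the diff):

describe('LLM smoke test', () => {
  it('answers a simple factual question', () => {
    cy.request({
      method: 'POST',
      url: '/api/chat', // hypothetical endpoint, for illustration only
      body: { prompt: 'What is the capital of France?' }
    }).then((response) => {
      expect(response.status).to.eq(200);
      // Response shape is assumed; the generated answer should mention the expected fact
      expect(response.body.answer.toLowerCase()).to.contain('paris');
    });
  });
});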

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged

@simoneb (Member) left a comment

So far so good, but surely we will need to expand more on this; it's way too simple for anything that we may want to turn into a blog post eventually.

Let's think about more advanced scenarios. At the moment this is only doing something very obvious, and in fact it's not even using Cypress in the first place, except as an HTTP client, which is not its native purpose either.

@sathish-paramasivam (Contributor, Author) commented Feb 26, 2025

> So far so good, but surely we will need to expand more on this; it's way too simple for anything that we may want to turn into a blog post eventually.
>
> Let's think about more advanced scenarios. At the moment this is only doing something very obvious, and in fact it's not even using Cypress in the first place, except as an HTTP client, which is not its native purpose either.

Absolutely. This initial commit serves as a foundation, demonstrating the connection and basic testing scenarios for the LLM. I will develop more advanced test scenarios.

@simoneb (Member) left a comment

Let's expand this a little bit before even merging this PR.

Here are some ideas generated via a quick interaction with Perplexity.

Ragas for Model-Based Evaluation

The Ragas framework enables quantitative assessment of RAG outputs through LLM-judged metrics (https://langfuse.com/guides/cookbook/evaluation_of_rag_with_ragas):

// Note: this API shape is as sketched by Perplexity; Ragas itself ships as a Python framework
import { evaluate, faithfulness, answer_relevancy } from 'ragas';

const scores = await evaluate({
  query: 'JavaScript RAG best practices',
  contexts: retrievedDocuments,
  answer: generatedResponse,
  metrics: [faithfulness, answer_relevancy]
});

console.log(`Faithfulness: ${scores.faithfulness}`);
console.log(`Answer relevancy: ${scores.answer_relevancy}`);

Key metrics include faithfulness (whether the generated answer is grounded in the retrieved contexts) and answer relevancy (how directly the answer addresses the original query).

Unit Testing with Jest and DeepEval

Jest provides foundational testing capabilities through snapshot testing and mock implementations (https://jestjs.io/docs/snapshot-testing):

// Verify document chunking logic
test('generates proper text chunks', () => {
  const chunks = splitDocument(testText, 512);
  expect(chunks).toMatchSnapshot();
});

// Validate retriever query formatting
test('builds hybrid search queries', () => {
  const query = buildHybridQuery('RAG testing methods');
  expect(query.vectorWeight).toBe(0.7);
  expect(query.keywords).toContain('testing');
});

DeepEval extends testing with RAG-specific metrics through integration with CI/CD pipelines (https://blog.griffinai.io/news/complete-guide-unit-testing-RAG):

// Note: also a Perplexity-generated sketch; DeepEval's primary distribution is a Python package
import { assertRetrievalScore } from 'deepeval';

test('contextual completeness', async () => {
  const result = await ragQuery('Node.js startup process');
  assertRetrievalScore(result.contexts, 0.85);
});

@sathish-paramasivam (Contributor, Author)

> Let's expand this a little bit before even merging this PR.
>
> Here are some ideas generated via a quick interaction with Perplexity. […]

Nice one, let me analyze and add it. Thanks 🙏

@simoneb (Member) left a comment

A couple of comments, and so far so good, but as commented previously, this PR at the moment isn't doing anything particularly useful.

What are the next steps?

@simoneb (Member) commented Mar 4, 2025

Thanks, feel free to go ahead and merge this

@sathish-paramasivam (Contributor, Author)

> A couple of comments, and so far so good, but as commented previously, this PR at the moment isn't doing anything particularly useful.
>
> What are the next steps?

I will be working on this in the next branch (Topic 3), using Jest-AI:

  1. Reasoning ability (multi-step question handling): testing LLM reasoning with chain-of-thought (CoT) prompts.
  2. Hallucination detection (consistency testing): LLMs should return consistent answers for the same question; if they return wildly different responses, hallucinations may exist (see the sketch below).
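
A minimal sketch of what the consistency check in item 2 could look like, using plain Jest rather than any Jest-AI matcher; askLLM is a hypothetical client wrapper standing in for whatever the next branch actually introduces:

// Hypothetical client wrapper; the real one will live in the next branch
import { askLLM } from './llmClient';

test('returns consistent answers for the same question', async () => {
  const question = 'In which year was Node.js first released?';

  // Ask the same question twice; with temperature 0 the answers should match
  const [first, second] = await Promise.all([askLLM(question), askLLM(question)]);

  // Wildly different responses to an identical prompt point at hallucination
  // or unstable prompting rather than a grounded answer
  expect(first).toBe(second);
});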

@sathish-paramasivam merged commit cfa8320 into master on Mar 5, 2025 (3 of 4 checks passed)