LLM test using JS #1

Merged · 13 commits merged into master on Mar 5, 2025

Conversation

@sathish-paramasivam (Contributor) commented Feb 26, 2025

First test that tests LLMs with JavaScript using the Cypress tool.
resolves nearform/hub-draft-issues#515
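
For illustration, a minimal sketch of the style of test this adds, with Cypress acting as a plain HTTP client against a hypothetical /api/chat endpoint (the real URL, request shape, and assertions live in the diff):

describe('LLM smoke test', () => {
  it('answers a simple factual question', () => {
    cy.request({
      method: 'POST',
      url: '/api/chat', // hypothetical endpoint, for illustration only
      body: { prompt: 'What is the capital of France?' }
    }).then((response) => {
      expect(response.status).to.eq(200);
      // Response shape is assumed; the generated answer should mention the expected fact
      expect(response.body.answer.toLowerCase()).to.contain('paris');
    });
  });
});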

No linked issues found. Please add the corresponding issues in the pull request description.
Use GitHub automation to close the issue when a PR is merged

@simoneb (Member) left a comment

So far so good, but surely we will need to expand more on this; it's way too simple for anything that we may want to turn into a blog post eventually.

Let's think about more advanced scenarios. At the moment this is only doing something very obvious, and in fact it's not even using Cypress in the first place, except as an HTTP client, which is not its native purpose either.

@sathish-paramasivam (Contributor, Author) commented Feb 26, 2025

> So far so good, but surely we will need to expand more on this; it's way too simple for anything that we may want to turn into a blog post eventually.
>
> Let's think about more advanced scenarios. At the moment this is only doing something very obvious, and in fact it's not even using Cypress in the first place, except as an HTTP client, which is not its native purpose either.

Absolutely. This initial commit serves as a foundation, demonstrating the connection and basic testing scenarios for the LLM. I will develop more advanced test scenarios.

@simoneb (Member) left a comment

Let's expand this a little bit before even merging this PR.

Here are some ideas generated via a quick interaction with Perplexity.

Ragas for Model-Based Evaluation

The Ragas framework enables quantitative assessment of RAG outputs through LLM-judged metrics (https://langfuse.com/guides/cookbook/evaluation_of_rag_with_ragas):

// Note: this API shape is as sketched by Perplexity; Ragas itself ships as a Python framework
import { evaluate, faithfulness, answer_relevancy } from 'ragas';

const scores = await evaluate({
  query: 'JavaScript RAG best practices',
  contexts: retrievedDocuments,
  answer: generatedResponse,
  metrics: [faithfulness, answer_relevancy]
});

console.log(`Faithfulness: ${scores.faithfulness}`);
console.log(`Answer relevancy: ${scores.answer_relevancy}`);

Key metrics include faithfulness (whether the generated answer is grounded in the retrieved contexts) and answer relevancy (how directly the answer addresses the original query).

Unit Testing with Jest and DeepEval

Jest provides foundational testing capabilities through snapshot testing and mock implementations (https://jestjs.io/docs/snapshot-testing):

// Verify document chunking logic
test('generates proper text chunks', () => {
  const chunks = splitDocument(testText, 512);
  expect(chunks).toMatchSnapshot();
});

// Validate retriever query formatting
test('builds hybrid search queries', () => {
  const query = buildHybridQuery('RAG testing methods');
  expect(query.vectorWeight).toBe(0.7);
  expect(query.keywords).toContain('testing');
});

DeepEval extends testing with RAG-specific metrics through integration with CI/CD pipelines (https://blog.griffinai.io/news/complete-guide-unit-testing-RAG):

// Note: also a Perplexity-generated sketch; DeepEval's primary distribution is a Python package
import { assertRetrievalScore } from 'deepeval';

test('contextual completeness', async () => {
  const result = await ragQuery('Node.js startup process');
  assertRetrievalScore(result.contexts, 0.85);
});

@sathish-paramasivam (Contributor, Author)

> Let's expand this a little bit before even merging this PR.
>
> Here are some ideas generated via a quick interaction with Perplexity. […]

Nice one, let me analyze and add it. Thanks 🙏

@simoneb (Member) left a comment

A couple of comments, and so far so good, but as commented previously, this PR at the moment isn't doing anything particularly useful.

What are the next steps?

@simoneb (Member) commented Mar 4, 2025

Thanks, feel free to go ahead and merge this

@sathish-paramasivam (Contributor, Author)

> A couple of comments, and so far so good, but as commented previously, this PR at the moment isn't doing anything particularly useful.
>
> What are the next steps?

I will be working on this in the next branch (Topic 3), using Jest-AI:

  1. Reasoning ability (multi-step question handling): testing LLM reasoning with chain-of-thought (CoT) prompts.
  2. Hallucination detection (consistency testing): LLMs should return consistent answers for the same question; if they return wildly different responses, hallucinations may exist (see the sketch below).
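
A minimal sketch of what the consistency check in item 2 could look like, using plain Jest rather than any Jest-AI matcher; askLLM is a hypothetical client wrapper standing in for whatever the next branch actually introduces:

// Hypothetical client wrapper; the real one will live in the next branch
import { askLLM } from './llmClient';

test('returns consistent answers for the same question', async () => {
  const question = 'In which year was Node.js first released?';

  // Ask the same question twice; with temperature 0 the answers should match
  const [first, second] = await Promise.all([askLLM(question), askLLM(question)]);

  // Wildly different responses to an identical prompt point at hallucination
  // or unstable prompting rather than a grounded answer
  expect(first).toBe(second);
});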

@sathish-paramasivam merged commit cfa8320 into master on Mar 5, 2025 (3 of 4 checks passed)