LLM test using JS #1
Conversation
No linked issues found. Please add the corresponding issues in the pull request description.
So far so good, but we will surely need to expand on this; it's far too simple for anything we may eventually want to turn into a blog post.
Let's think about more advanced scenarios. At the moment this only does something very obvious, and in fact it isn't even really using Cypress, except as an HTTP client, which is not its native purpose either.
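To make the gap concrete, this is roughly the shape of what we have today: Cypress acting as a plain HTTP client against an LLM endpoint. The endpoint, model name, and response shape below are illustrative (Ollama-style), not taken from this PR:

```js
// Cypress used only as an HTTP client: no browser, no UI under test.
// Endpoint, model, and payload shape are illustrative, not from this PR.
describe('LLM smoke test', () => {
  it('returns a completion', () => {
    cy.request('POST', 'http://localhost:11434/api/generate', {
      model: 'llama3',
      prompt: 'Say hello',
      stream: false
    }).then((res) => {
      expect(res.status).to.eq(200);
      expect(res.body.response).to.be.a('string');
    });
  });
});
```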
Absolutely. This initial commit serves as a foundation, demonstrating the connection and basic test scenarios for the LLM. I will develop more advanced test scenarios.
Let's expand this a little bit before even merging this PR.
Here are some ideas generated via a quick interaction with Perplexity.
Ragas for Model-Based Evaluation
The Ragas framework enables quantitative assessment of RAG outputs through LLM-judged metrics https://langfuse.com/guides/cookbook/evaluation_of_rag_with_ragas:
```js
import { evaluate, faithfulness, answer_relevancy } from 'ragas';

// Score a generated answer against the retrieved contexts
const scores = await evaluate({
  query: 'JavaScript RAG best practices',
  contexts: retrievedDocuments,
  answer: generatedResponse,
  metrics: [faithfulness, answer_relevancy]
});

console.log(`Faithfulness: ${scores.faithfulness}`);
console.log(`Answer Relevance: ${scores.answer_relevancy}`);
```
Key metrics include:
- Faithfulness (0-1): Factual consistency with provided context https://langfuse.com/guides/cookbook/evaluation_of_rag_with_ragas
- Answer Relevance (0-1): Response alignment with original query https://qdrant.tech/blog/rag-evaluation-guide/
- Context Precision: Ranking effectiveness of retrieved documents https://blog.griffinai.io/news/complete-guide-unit-testing-RAG
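In practice these metrics can be wired into Jest as quality gates, along these lines (a sketch reusing the hypothetical evaluate call above; the thresholds are arbitrary examples):

```js
test('RAG answer stays faithful and relevant', async () => {
  const scores = await evaluate({
    query: 'JavaScript RAG best practices',
    contexts: retrievedDocuments,
    answer: generatedResponse,
    metrics: [faithfulness, answer_relevancy]
  });
  // Fail the run if the model drifts from the retrieved context
  expect(scores.faithfulness).toBeGreaterThanOrEqual(0.8);
  expect(scores.answer_relevancy).toBeGreaterThanOrEqual(0.7);
});
```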
Unit Testing with Jest and DeepEval
Jest provides foundational testing capabilities through snapshot testing and mock implementations https://jestjs.io/docs/snapshot-testing:
```js
// Verify document chunking logic
test('generates proper text chunks', () => {
  const chunks = splitDocument(testText, 512);
  expect(chunks).toMatchSnapshot();
});

// Validate retriever query formatting
test('builds hybrid search queries', () => {
  const query = buildHybridQuery('RAG testing methods');
  expect(query.vectorWeight).toBe(0.7);
  expect(query.keywords).toContain('testing');
});
```
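For context, splitDocument does not need to be sophisticated for the snapshot test to be useful; a naive fixed-size chunker like this would do (a sketch, not the implementation in this PR):

```js
// Naive fixed-size chunker: splits text into windows of chunkSize characters.
// A real implementation would respect sentence or token boundaries.
function splitDocument(text, chunkSize) {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}
```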
DeepEval extends testing with RAG-specific metrics through integration with CI/CD pipelines https://blog.griffinai.io/news/complete-guide-unit-testing-RAG:
```js
import { assertRetrievalScore } from 'deepeval';

test('contextual completeness', async () => {
  const result = await ragQuery('Node.js startup process');
  assertRetrievalScore(result.contexts, 0.85);
});
```
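Worth noting: DeepEval is primarily a Python framework, so the import above comes straight from the Perplexity output and may not exist as a JS package. If it doesn't, a plain Jest assertion over embedding similarity is a workable stand-in (a sketch; embed() is a hypothetical embeddings helper, and ragQuery is the same function as above):

```js
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

test('top retrieved context is close to the query', async () => {
  const result = await ragQuery('Node.js startup process');
  // embed() is hypothetical: any embeddings API returning number[] per input
  const [queryVec, contextVec] = await embed([
    'Node.js startup process',
    result.contexts[0]
  ]);
  expect(cosineSimilarity(queryVec, contextVec)).toBeGreaterThan(0.85);
});
```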
Nice one, let me analyse it and add it. Thanks 🙏
A couple of comments, and so far so good; but as noted previously, this PR isn't doing anything particularly useful at the moment.
What are the next steps?
Thanks, feel free to go ahead and merge this
I will be working on this in the next branch
First test that exercises LLMs from JavaScript using the Cypress tool.
resolves nearform/hub-draft-issues#515