chore(tests): accuracy tests for MongoDB tools exposed by MCP server MCP-39 #341
Pull Request Test Coverage Report for Build 16372374490
💛 - Coveralls
58bc8a5 to b557e02
7791e20 to 79cd26e
79cd26e to 6ccaa11
f666014 to 1cc93f2
Still going through it - leaving some comments related to storage as I move on to the actual testing framework.
tests/accuracy/sdk/accuracy-result-storage/get-accuracy-result-storage.ts
In some cases (find, aggregate, count, explain, deleteMany, etc.) we need to grade extra provided arguments depending on the prompt itself. Sometimes additional parameters are fine and sometimes they are not. For example, adding keys to the filter might lead to a different result, so if that happens we should grade the accuracy as 0, not 0.75. To support this use case, this commit introduces the idea of a custom scorer that can be plugged into the accuracy scorer to provide more controlled accuracy grading. Additionally, this commit reverts the default behaviour for handling added parameters. Earlier we marked newly added parameters as hallucinations and graded them 0.75. Now, having found that most of our tools don't expect extra parameters at all, we are flipping the switch: we will grade 0 when additional parameters are specified, unless a custom scorer is provided to handle the scoring logic.
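The default grading described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `ToolArgs`, `CustomScorer`, `deepEqual` and `scoreParameters` are hypothetical names, and it assumes that every expected parameter must match exactly, that any extra top-level parameter grades 0 by default, and that a plugged-in custom scorer is consulted only when extra parameters are present.

```typescript
// Hypothetical types for illustration only; the PR's actual API may differ.
type ToolArgs = Record<string, unknown>;

// Hypothetical custom scorer: called only when the model supplied
// parameters that the expected tool call did not contain.
type CustomScorer = (expected: ToolArgs, actual: ToolArgs, additions: string[]) => number;

// Structural equality for JSON-like values (primitives, arrays, plain objects).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const ra = a as Record<string, unknown>;
  const rb = b as Record<string, unknown>;
  const keysA = Object.keys(ra);
  const keysB = Object.keys(rb);
  // Same number of keys and every key maps to an equal value,
  // so an extra key inside a nested filter makes the values unequal.
  return keysA.length === keysB.length && keysA.every((k) => k in rb && deepEqual(ra[k], rb[k]));
}

function scoreParameters(expected: ToolArgs, actual: ToolArgs, customScorer?: CustomScorer): number {
  // Every expected parameter must be present with an equal value.
  for (const key of Object.keys(expected)) {
    if (!(key in actual) || !deepEqual(expected[key], actual[key])) return 0;
  }
  // Top-level parameters the model added on top of the expected ones.
  const additions = Object.keys(actual).filter((key) => !(key in expected));
  if (additions.length === 0) return 1;
  // Assumed new default: additional parameters grade 0 unless a custom scorer decides otherwise.
  return customScorer ? customScorer(expected, actual, additions) : 0;
}
```

Keeping the strict 0 as the default and deferring to a per-test scorer only when extra parameters appear means individual prompts can opt into leniency without loosening grading for every tool.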
8488144 to ec52ee5
);
});

return hasNonEmptyAdditions ? 0 : 0.75;
This could be more lenient and return 1 instead of 0.75 for empty additions, but I kept it this way to stay aligned with how we treat hallucinations.
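A custom scorer in the spirit of the `return hasNonEmptyAdditions ? 0 : 0.75;` line could look roughly like this. It is a hypothetical sketch: `isEmptyValue` and `scoreAdditions` are illustrative names, and treating `undefined`, `null`, `""`, `[]` and `{}` as "empty" is an assumption, not necessarily the PR's definition.

```typescript
// Hypothetical helper: treats undefined, null, "", empty arrays and
// objects with no own keys as "empty" additions.
function isEmptyValue(value: unknown): boolean {
  if (value === undefined || value === null || value === "") return true;
  if (Array.isArray(value)) return value.length === 0;
  if (typeof value === "object") return Object.keys(value).length === 0;
  return false;
}

// Hypothetical custom scorer for extra arguments, assumed to be called only
// when at least one addition exists. Any non-empty addition (for example an
// extra filter that changes the result set) grades 0; additions that are all
// empty grade 0.75, the score previously given to hallucinated parameters.
function scoreAdditions(actual: Record<string, unknown>, additions: string[]): number {
  const hasNonEmptyAdditions = additions.some((key) => !isEmptyValue(actual[key]));
  return hasNonEmptyAdditions ? 0 : 0.75;
}
```

Returning 1 instead of 0.75 in the all-empty branch would be the more lenient variant mentioned in the comment.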
📊 Accuracy Test Results
📈 Summary
📎 Download Full HTML Report - Look for the
Report generated on: 7/18/2025, 2:00:20 PM
Motivation and Goal
Design Brief
Detailed Design
Refer to the doc titled "MCP Tools Accuracy Testing".
Current State
For reviewers
tests/accuracy/sdk. Start with describe-accuracy-test.ts, as this is where all the different parts come together, and then dive into the specific implementation of each part. Apologies for the big chunk to review here, but I did not see a way around it.