-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Add Mixedbread AI Rerank support #140477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Mixedbread AI Rerank support #140477
Conversation
2683926 to
6133d64
Compare
|
Pinging @elastic/search-inference-team (Team:Search - Inference) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for Mixedbread AI's rerank API to Elasticsearch's inference plugin. The implementation follows the established pattern for inference service providers and includes comprehensive test coverage.
Changes:
- Implements Mixedbread rerank service with model, request/response handling, and action creators
- Adds service settings and task settings with configurable parameters (top_n, return_documents)
- Registers the new service in InferencePlugin and InferenceNamedWriteablesProvider
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| MixedbreadService.java | Main service implementation for Mixedbread rerank with configuration and inference methods |
| MixedbreadRerankModel.java | Model class defining rerank-specific configuration and URI building |
| MixedbreadRerankRequest.java | Request builder for Mixedbread rerank API calls |
| MixedbreadRerankResponseEntity.java | Response parser for Mixedbread rerank API responses |
| MixedbreadRerankTaskSettings.java | Task-level settings (top_n, return_documents) |
| MixedbreadRerankServiceSettings.java | Service-level settings (model_id, rate limits) |
| MixedbreadActionCreator.java | Creates executable actions for rerank operations |
| MixedbreadConstants.java | Shared constants for field names and API paths |
| MixedbreadAccount.java | Account credentials and URI management |
| InferencePlugin.java | Registers the Mixedbread service factory |
| InferenceNamedWriteablesProvider.java | Registers named writeables for serialization |
| Test files (8 files) | Comprehensive test coverage for all components |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assertThat(thrownException.getMessage(), containsString("field [top_n] is not of the expected type")); | ||
| } | ||
|
|
||
| public void UpdatedTaskSettings_WithEmptyMap_ReturnsSameSettings() { |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test name has inconsistent capitalization. It should start with 'test' (lowercase) to match Java naming conventions and be consistent with other test methods in the same file.
| public void UpdatedTaskSettings_WithEmptyMap_ReturnsSameSettings() { | |
| public void testUpdatedTaskSettings_WithEmptyMap_ReturnsSameSettings() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| import java.util.Map; | ||
|
|
||
| import static org.elasticsearch.xpack.inference.services.jinaai.rerank.JinaAIRerankTaskSettingsTests.getTaskSettingsMap; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is incorrectly using a JinaAI import for the helper method. This should use the Mixedbread equivalent method 'MixedbreadRerankTaskSettingsTests.getTaskSettingsMap' instead of importing from JinaAI rerank task settings tests.
| import static org.elasticsearch.xpack.inference.services.jinaai.rerank.JinaAIRerankTaskSettingsTests.getTaskSettingsMap; | |
| import static org.elasticsearch.xpack.inference.services.mixedbread.rerank.MixedbreadRerankTaskSettingsTests.getTaskSettingsMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| XContentParser.Token token = parser.currentToken(); | ||
| ensureExpectedToken(XContentParser.Token.START_OBJECT, token, parser); | ||
|
|
||
| positionParserAtTokenAfterField(parser, "data", "FAILED_TO_FIND_FIELD_TEMPLATE"); // TODO error message |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a TODO comment with an incomplete error message placeholder. The error message template should be properly defined, as this is used when parsing fails if the 'data' field is not found in the response.
| positionParserAtTokenAfterField(parser, "data", "FAILED_TO_FIND_FIELD_TEMPLATE"); // TODO error message | |
| positionParserAtTokenAfterField(parser, "data", "Failed to find [data] field in Mixedbread rerank response"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| public Boolean getDoesReturnDocuments() { | ||
| return returnDocuments; | ||
| } | ||
|
|
||
| public Integer getTopNDocumentsOnly() { | ||
| return topNDocumentsOnly; | ||
| } | ||
|
|
||
| public Boolean getReturnDocuments() { | ||
| return returnDocuments; | ||
| } |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class has two redundant getter methods for the same field. Both 'getDoesReturnDocuments()' and 'getReturnDocuments()' return the same 'returnDocuments' field. One of these methods should be removed to avoid confusion and maintain clean API design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, let's remove one of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| @Override | ||
| public int rerankerWindowSize(String modelId) { | ||
| // Cohere rerank model truncates at 4096 tokens https://docs.cohere.com/reference/rerank |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment incorrectly mentions "Cohere rerank model" when this is a Mixedbread service implementation. The comment should be updated to reference Mixedbread's actual model token limits or window size documentation.
| // Cohere rerank model truncates at 4096 tokens https://docs.cohere.com/reference/rerank | |
| // Mixedbread rerank models currently support context windows of up to 4096 tokens (see Mixedbread documentation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pointing to cohere, I think we want the numbers posted here: https://www.mixedbread.com/docs/models/reranking/mxbai-rerank-large-v2
Looks like the older models have a window size of 512. We should make this configurable though. Let's add an optional field to the service settings that can control this value and default it to 8k.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The newly added settings field will need to be used in
@Override
public int rerankerWindowSize(String modelId) {
so we probably need to pass a model as a parameter instead of modelId but then I need to make refactoring impacting other services, namely to make the change in TransportGetRerankerWindowSizeAction and services overriding rerankerWindowSize
Should we do it or it's better to revert some changes and make it configurable via model_id like that?
MixedbreadService
```
private static final Map<String, Integer> RERANKERS_INPUT_SIZE = Map.of(
"mixedbread-ai/mxbai-rerank-xsmall-v1",
512,
"mixedbread-ai/mxbai-rerank-base-v1",
512,
"mixedbread-ai/mxbai-rerank-large-v1",
512
// Windows size.
// The v1 models: 512
// The v2 models: at least 8k
// https://www.mixedbread.com/docs/models/reranking/mxbai-rerank-large-v1
);
@Override
public int rerankerWindowSize(String modelId) {
Integer inputSize = RERANKERS_INPUT_SIZE.get(modelId);
return inputSize != null ? inputSize : DEFAULT_RERANKER_INPUT_SIZE_WORDS;
}
@jonathan-buttner @DonalEvans
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The window size is not something a user can configure, it's an unchanging property of the model, so we don't need to store it in service settings. The current approach of having a map with the model IDs that don't use the 8k default is fine, but the rerankerWindowSize() method returns the size in words, not in tokens, so we'll need to translate from the 512/8000 values in tokens to smaller values in words by multiplying by 0.75 and rounding down a bit, which is the approach we use for other providers.
For consistency, the PR that originally introduced this feature can be used as a guide, with 512 tokens translating to a window size of 300 words, and 8000 tokens translating to 5500 words.
| TimeValue timeout, | ||
| ActionListener<List<ChunkedInference>> listener | ||
| ) { | ||
|
|
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method 'doChunkedInfer' has an empty implementation. This should either throw an UnsupportedOperationException with a descriptive message, or contain a proper implementation if chunked inference is supported.
| throw new UnsupportedOperationException("Chunked inference is not supported for service [" + NAME + "]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, let's throw something like:
throw new UnsupportedOperationException(Strings.format("%s service does not support chunked inference", NAME));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| public class MixedbreadConstants { | ||
| public static final String VERSION_1 = "v1"; | ||
| public static final String RERANK_PATH = "rerank"; |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant RERANK_PATH is defined as "rerank" here but in MixedbreadRerankModel.java it's defined as "reranking". This inconsistency could lead to incorrect API paths being constructed. These should be unified to use the same value.
| public static final String RERANK_PATH = "rerank"; | |
| public static final String RERANK_PATH = "reranking"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixedbread supports both. I left "rerank". Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While both may work, the documentation for Mixedbread reranking uses reranking as the endpoint, so that's what we should be using.
jonathan-buttner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together, I left some feedback.
| private static final String INVALID_REQUEST_TYPE_MESSAGE = "Invalid request type: expected Mixedbread %s request but got %s"; | ||
|
|
||
| private static final ResponseHandler RERANK_HANDLER = new MixedbreadRerankResponseHandler("mixedbread rerank", (request, response) -> { | ||
| if ((request instanceof MixedbreadRerankRequest) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe we typically check the the request type unless we need to use it. If the response format is invalid, the parsing logic will throw an error which should be good enough. Let's remove the if-block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I deleted the if-block
| ), | ||
| QueryAndDocsInputs.class | ||
| ); | ||
| var errorMessage = buildErrorMessage(TaskType.RERANK, model.getInferenceEntityId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the helper method constructFailedToSendRequestMessage. Take a look at OpenAiActionCreator for example usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up and used the helper method instead. Thx
| public static void decorateWithAuthHeader(HttpPost request, MixedbreadAccount account) { | ||
| request.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaType()); | ||
| request.setHeader(createAuthBearerHeader(account.apiKey())); | ||
| request.setHeader(new BasicHeader(REQUEST_SOURCE_HEADER, ELASTIC_REQUEST_SOURCE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to documentation as to why we need this header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used other implementations as references mainly Cohere, Jina AI and the ones you suggested in the comments.
It happened I ended up with an unnecessary header. Since I didn't find the one to be required.
This class is now deleted due to the changes related to other comments and this header is not used in my implementation.
| * Write 360 120 1-minute | ||
| * Update 480 160 1-minute | ||
| * Delete 240 80 1-minute | ||
| * <a href="https://www.mixedbread.com/api-reference/rate-limits">Rate Limiting</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These rate limits are for their storage operations. I'm not really sure what that is. If you go to the pricing page we can see that the free tier is limited to 100 requests per minute: https://www.mixedbread.com/pricing
Can you update the value to 100 and add the url I linked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the url and added the link. Thank you
| public static MixedbreadRerankServiceSettings fromMap(Map<String, Object> map, ConfigurationParseContext context) { | ||
| ValidationException validationException = new ValidationException(); | ||
|
|
||
| String url = extractOptionalString(map, URL, ModelConfigurations.SERVICE_SETTINGS, validationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does mixedbread allow users to spin up deployments? From poking around it seems like requests are only made to https://api.mixedbread.com. Can we remove this? For testing we'll need a way to pass a local URL. For examples of how to do that take a look at https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/services/mistral/MistralModel.java
We basically just need a setter on the base model class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, seems it can be removed. I did it. Also added the the suggested method
| ) { | ||
|
|
||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's override supportsChunkedInfer() and return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thrown an exception in doChunkedInfer() and returned false in supportsChunkedInfer()
| } | ||
|
|
||
| @Override | ||
| protected void validateInputType(InputType inputType, Model model, ValidationException validationException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the implementation from the method
|
|
||
| @Override | ||
| public TransportVersion getMinimalSupportedVersion() { | ||
| return TransportVersion.minimumCompatible(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch this to use the new style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| @Override | ||
| public Set<TaskType> supportedStreamingTasks() { | ||
| return COMPLETION_ONLY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support streaming tasks for Mixedbread yet so let's remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| @Override | ||
| public int rerankerWindowSize(String modelId) { | ||
| // Cohere rerank model truncates at 4096 tokens https://docs.cohere.com/reference/rerank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pointing to cohere, I think we want the numbers posted here: https://www.mixedbread.com/docs/models/reranking/mxbai-rerank-large-v2
Looks like the older models have a window size of 512. We should make this configurable though. Let's add an optional field to the service settings that can control this value and default it to 8k.
|
Also please add a change log entry. |
…' into Add-Mixedbread-AI-Rerank-support
# Conflicts: # server/src/main/resources/transport/upper_bounds/9.4.csv
|
Currently I got one check failed: Elasticsearch Serverless Checks |
|
Regarding the comment. The change log entry is added |
# Conflicts: # server/src/main/resources/transport/upper_bounds/9.4.csv
| @@ -0,0 +1,5 @@ | |||
| pr: 140477 | |||
| summary: "[ML] Add Mixedbread Rerank support to the Inference Plugin" | |||
| area: Machine Learning | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The area should be "Inference" rather than "Machine Learning"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced: [ML] -> [Inference API]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also a good change, but I was referring to the area field, which should be Inference, not Machine Learning.
| this.query = Objects.requireNonNull(query); | ||
| this.returnDocuments = returnDocuments; | ||
| this.topN = topN; | ||
| taskSettings = model.getTaskSettings(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but rather than having a taskSettings field, it would be a little neater and more consistent to just get the task settings from model when we need them, like we do with the model ID and URI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good. Thank you. Done
| import static org.elasticsearch.xpack.inference.external.request.RequestUtils.buildUri; | ||
|
|
||
| public class MixedbreadRerankModel extends MixedbreadModel { | ||
| private URI uri = buildUri( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having this be a mutable field, it could be final and set in the constructor using a constant value if the provided argument is null, similar to what's done in the JinaAI*Model classes, for example. This makes the class immutable, but still allows a non-default URL to be passed in for testing purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did it, also deleted the setter
| } | ||
|
|
||
| // should only be used for testing | ||
| public MixedbreadRerankModel( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be default visibility rather than public, since it's not intended to be used outside this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| public MixedbreadRerankModel(MixedbreadRerankModel model, MixedbreadRerankServiceSettings serviceSettings) { | ||
| super(model, serviceSettings); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor is unused, do we need it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Deleted. Thx
| } | ||
| } | ||
|
|
||
| public void testInfer_Rerank_Get_Response_NoReturnDocuments_NoTopN() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these tests need the "Get_Response" in their names? I'm not sure what it actually means in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming came from Jina AI service I used as a reference. We're checking if we get the results from the response we mocked so both having and not having Get_Response in the name is probably okay. I made it simpler
| null, | ||
| null, | ||
| INPUT, | ||
| false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This false should be extracted to a variable along with the corresponding value in the assertion at the end of this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| try (var service = new MixedbreadService(senderFactory, createWithEmptySettings(threadPool), mockClusterServiceEmpty())) { | ||
| webServer.enqueue(new MockResponse().setResponseCode(200).setBody(responseJson)); | ||
| var model = MixedbreadRerankModelTests.createModel(MODEL_NAME_VALUE, "secret", 3, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 3 and true values should be extracted to variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| private Map<String, Object> getRequestConfigMap( | ||
| Map<String, Object> serviceSettings, | ||
| Map<String, Object> taskSettings, | ||
| Map<String, Object> secretSettings | ||
| ) { | ||
| var builtServiceSettings = new HashMap<>(); | ||
| builtServiceSettings.putAll(serviceSettings); | ||
| builtServiceSettings.putAll(secretSettings); | ||
|
|
||
| return new HashMap<>( | ||
| Map.of(ModelConfigurations.SERVICE_SETTINGS, builtServiceSettings, ModelConfigurations.TASK_SETTINGS, taskSettings) | ||
| ); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An identical method already exists in the Utils test class, so we should use that one instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| private Map<String, Object> getRequestConfigMap(Map<String, Object> serviceSettings, Map<String, Object> secretSettings) { | ||
| var builtServiceSettings = new HashMap<>(); | ||
| builtServiceSettings.putAll(serviceSettings); | ||
| builtServiceSettings.putAll(secretSettings); | ||
|
|
||
| return new HashMap<>(Map.of(ModelConfigurations.SERVICE_SETTINGS, builtServiceSettings)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method necessary? Would it be possible to just use the three-argument version of Utils.getRequestConfigMap() and pass Map.of() for the second argument (the task settings map)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's better. I re-used the method we got in Utils. Also cleaned up the test class. Thanks
# Conflicts: # server/src/main/resources/transport/upper_bounds/9.4.csv
| this.returnDocuments = returnDocuments; | ||
| this.topN = topN; | ||
| taskSettings = model.getTaskSettings(); | ||
| model.getTaskSettings(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| @Override | ||
| public int rerankerWindowSize(String modelId) { | ||
| // Cohere rerank model truncates at 4096 tokens https://docs.cohere.com/reference/rerank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The window size is not something a user can configure, it's an unchanging property of the model, so we don't need to store it in service settings. The current approach of having a map with the model IDs that don't use the 8k default is fine, but the rerankerWindowSize() method returns the size in words, not in tokens, so we'll need to translate from the 512/8000 values in tokens to smaller values in words by multiplying by 0.75 and rounding down a bit, which is the approach we use for other providers.
For consistency, the PR that originally introduced this feature can be used as a guide, with 512 tokens translating to a window size of 300 words, and 8000 tokens translating to 5500 words.
| ModelConfigurations.TASK_SETTINGS, | ||
| validationException | ||
| ); | ||
| Integer topNDocumentsOnly = extractOptionalPositiveInteger(map, TOP_N, ModelConfigurations.TASK_SETTINGS, validationException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable name can also be updated to remove the "DocumentsOnly" part, since it doesn't seem to have any relevance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| * } | ||
| * <p> | ||
| * The response will look like (without whitespace): | ||
| * <pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The request JSON example should also have <pre> tags to make it readable when rendered by the IDE or as HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| @Nullable ApiKeySecrets apiKeySecrets, | ||
| RateLimitSettings rateLimitServiceSettings | ||
| RateLimitSettings rateLimitServiceSettings, | ||
| URI uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The uri field should be made private final instead of protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| * TransportVersion indicating when Mixedbread features were added. | ||
| */ | ||
| public static final TransportVersion ML_INFERENCE_MIXEDBREAD_ADDED = TransportVersion.fromName("ml_inference_mixedbread_added"); | ||
| public static final TransportVersion INFERENCE_MIXEDBREAD_ADDED = TransportVersion.fromName("ml_inference_mixedbread_added"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but can we remove the "ml" from the start of the TransportVersion name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Done
| @Nullable Integer topN, | ||
| @Nullable Boolean returnDocuments | ||
| @Nullable Boolean returnDocuments, | ||
| String uri |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be annotated with @Nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Sorry, I forgot to include this in the previous review, but one other thing that should be added to this PR are some tests of the behaviour when there are unexpected fields in the maps passed to |
RERANK
request put {{base-url}}/_inference/rerank/mixedbread
{ "service": "mixedbread", "service_settings": { "api_key": "{{mb-api-key}}", "model_id": "mixedbread-ai/mxbai-rerank-xsmall-v1" }, "task_settings": { "return_documents": true, "top_k": 1 } }response
{ "inference_id": "mixedbread", "task_type": "rerank", "service": "mixedbread", "service_settings": { "model_id": "mixedbread-ai/mxbai-rerank-xsmall-v1", "rate_limit": { "requests_per_minute": 240 } }, "task_settings": { "return_documents": true } }request post {{base-url}}/_inference/rerank/mixedbread
{ "input": ["Luke", "like", "leia", "chewy","r2d2", "star", "wars"], "query": "star wars main character", "top_n": 2, "return_documents": true }response
{ "rerank": [ { "index": 0, "relevance_score": 0.083740234, "text": "Luke" }, { "index": 2, "relevance_score": 0.06994629, "text": "leia" } ] }direct request post https://api.mixedbread.com/v1/reranking
{ "model": "mixedbread-ai/mxbai-rerank-xsmall-v1", "query": "Who is the author of To Kill a Mockingbird?", "input": [ "To Kill a Mockingbird is a novel by Harper Lee", "The novel Moby-Dick was written by Herman Melville", "Harper Lee, an American novelist", "Jane Austen was an English novelist", "The Harry Potter series written by British author J.K. Rowling", "The Great Gatsby, a novel written by American author F. Scott Fitzgerald" ], "top_k": 3, "return_input": true }response
}