Proposal:
We should add the ability to set the maximum number of threads allowed when sending a request to generate embeddings.
As discussed, the best way to do this is to add an extra parameter to the convert methods, so that each time we send a request to the library to generate embeddings for text, we pass an adjusted thread count.
The default should be 0, which keeps the original behavior.
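A minimal sketch of the proposed parameter, assuming a convert method that fans texts out to a thread pool. The names `convert`, `max_threads`, `resolve_threads` and `_embed_one` are hypothetical and are not the columnar library's API. The "original behavior" for 0 is also an assumption here (use all CPUs), and the embedding itself is a toy stand-in for the real model call.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def _embed_one(text: str, dim: int = 4) -> List[float]:
    # Hypothetical stand-in for the real model call (NOT the library API):
    # folds character codes into `dim` buckets to get a deterministic vector.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    return vec


def resolve_threads(max_threads: int) -> int:
    # 0 keeps the original behavior; here that is ASSUMED to mean "all CPUs".
    if max_threads < 0:
        raise ValueError("max_threads must be >= 0")
    if max_threads == 0:
        return os.cpu_count() or 1
    return max_threads


def convert(texts: Sequence[str], max_threads: int = 0) -> List[List[float]]:
    # Hypothetical convert() with the proposed extra parameter (default 0).
    workers = resolve_threads(max_threads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so result i belongs to texts[i].
        return list(pool.map(_embed_one, texts))
```

Usage: `convert(["hello", "world"], max_threads=2)` caps the work at two threads, while `convert(["hello", "world"])` keeps the default behavior. Because the parameter defaults to 0, existing callers that never pass it are unaffected.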
Once this is implemented on the columnar embeddings library side, we should integrate it into Manticore.
The easiest way is to add an embeddings_batch config setting. A harder but smarter way is to determine the thread count dynamically, based on the current load and on how many workers are running.
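A rough sketch of how the two approaches could be combined on the Manticore side, assuming the result is passed as the library's max-threads value. The function `pick_threads` and its inputs are hypothetical. `configured` stands for a value read from a setting such as embeddings_batch, and the "idle cores" rule is only one possible heuristic, not a decided design.

```python
def pick_threads(configured: int, total_cores: int, busy_workers: int) -> int:
    # Easy path: an explicit config value (hypothetically from a setting
    # such as embeddings_batch) takes precedence when it is positive.
    if configured > 0:
        return configured
    # Smarter path (assumed heuristic): give embeddings the cores that are
    # not currently occupied by running workers, but never fewer than one.
    idle_cores = total_cores - busy_workers
    return max(1, idle_cores)
```

For example, with 16 cores and 10 busy workers and no configured value, this returns 6. Under heavy load (more busy workers than cores) it falls back to a single thread instead of competing with query workers.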
Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.