Control the threads during embeddings generation #4573

@donhardman

Proposal:

We should make it possible to set the maximum number of threads allowed when sending a request to generate embeddings.

As discussed, the best way to do it is to add an extra parameter to the convert methods, so that each time we send a request to the lib to generate embeddings for text, we pass the adapted thread count.

The default should be 0 (to maintain original behavior).
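To illustrate the intended semantics, here is a minimal sketch in Python, not the lib's actual API. The names `effective_threads`, `convert`, `embed_one`, and `max_threads` are hypothetical, and "original behavior" is assumed here to mean "use all available CPUs", which may not match what the lib does today.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def effective_threads(max_threads: int, default_threads: int) -> int:
    """Resolve the thread count for one request.

    0 means "no cap requested": fall back to the default (assumed original behavior).
    A positive value caps the count, but never above the default.
    """
    if max_threads < 0:
        raise ValueError("max_threads must be >= 0")
    if max_threads == 0:
        return default_threads
    return min(max_threads, default_threads)


def convert(
    texts: Sequence[str],
    embed_one: Callable[[str], List[float]],
    max_threads: int = 0,
) -> List[List[float]]:
    """Hypothetical convert method: embeds each text using at most max_threads workers."""
    default_threads = os.cpu_count() or 1  # assumption: the default is "all CPUs"
    workers = effective_threads(max_threads, default_threads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with texts
        return list(pool.map(embed_one, texts))
```

With this shape, existing callers that do not pass `max_threads` keep getting the default, while new callers can cap each request individually.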

After implementation on the columnar embeddings lib side, we should integrate it into Manticore.

The easiest way is to just add an embeddings_batch config. A harder but smarter way is to determine the thread count dynamically, based on the current load and the available information about how many workers are running.
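One possible shape for the smarter option, sketched in Python under stated assumptions: the function `adaptive_threads` and its parameters are hypothetical, and the heuristic (give embeddings the cores not occupied by running workers, with at least one thread) is only an illustration, not an agreed design.

```python
def adaptive_threads(cpu_count: int, active_workers: int, configured_cap: int = 0) -> int:
    """Pick a thread count for embeddings from the current load.

    cpu_count:       number of cores available to the process
    active_workers:  number of workers currently running (assumed to use one core each)
    configured_cap:  optional upper bound from config; 0 means no cap
    """
    if cpu_count < 1 or active_workers < 0 or configured_cap < 0:
        raise ValueError("invalid arguments")
    # Leave busy cores to the running workers, but always keep one thread.
    idle = max(1, cpu_count - active_workers)
    if configured_cap > 0:
        return min(idle, configured_cap)
    return idle
```

The result could then be passed as the per-request thread limit, so a static config value and the load-based value compose rather than compete.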

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • OpenAPI YAML updated and issue created to rebuild clients
