Ollama provider for local inference #322
Conversation
I skimmed through and this looks in good shape. I'll be a bit slow to review and test over the winter break.
Thanks a lot for making this change, this is exciting |
gsabran left a comment
This looks really good. I tested it and after figuring out the Ollama configuration it seems to work.
I think adding a page to cmd's doc portal on how to do the configuration (i.e. you need to load the models in Ollama first, plus a pointer to how to do this) would help. We should probably suggest restarting cmd when new models are added to Ollama, or add a sync button on the AI provider settings to reload them (the latter would be better obviously, but fine to leave out of scope as well).
```diff
 extension AIModel {
-  public var supportsCompletion: Bool {
+  static func modelSupportsCompletion(id: String) -> Bool {
```
nit: the getter feels more ergonomic
the computed property supportsCompletion got refactored into a value property, see L139. the extension functions in L164 are leftovers from the previous logic to derive model features from their name. in my understanding, everything about model features should be handled in local server instead, but i decided to keep this out of scope for this PR.
```swift
name: "Ollama",
executableName: "ollama",
defaultBaseUrl: URL(string: "http://localhost:11434")!,
installationInstructions: URL(string: "https://docs.ollama.com/quickstart")!,
```
We should add an entry in cmd's docs that describes how to configure Ollama with cmd, and from where we can link to https://docs.ollama.com/quickstart
docs are in /docs
docs added in 5fbe3ba
but i left this link here as is. it's the same pattern as with the external agents, linking to the external docs directly.
```typescript
 */
private async fetchModelDetails(baseUrl: string, modelName: string): Promise<OllamaModelDetails | null> {
  try {
    const response = await fetch(`${baseUrl}/show`, {
```
nit: could we add a pointer to the API ref https://github.com/ollama/ollama/blob/main/docs/api.md#show-model-information in a comment?
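For context, a minimal sketch of what such a commented helper could look like. This is not the PR's code: the `buildShowRequest` helper, the `OllamaModelDetails` shape, and the assumption that `baseUrl` already includes the `/api` prefix are all illustrative.

```typescript
// Hypothetical sketch of the model-details lookup.
// API reference: https://github.com/ollama/ollama/blob/main/docs/api.md#show-model-information
interface OllamaModelDetails {
  model_info?: Record<string, string | number | null | undefined>
}

// Builds the POST request for Ollama's show-model-information endpoint.
// Assumes `baseUrl` already ends with the `/api` prefix.
function buildShowRequest(
  baseUrl: string,
  modelName: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/show`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: modelName }),
    },
  }
}

async function fetchModelDetails(
  baseUrl: string,
  modelName: string,
): Promise<OllamaModelDetails | null> {
  try {
    const { url, init } = buildShowRequest(baseUrl, modelName)
    const response = await fetch(url, init)
    if (!response.ok) return null
    return (await response.json()) as OllamaModelDetails
  } catch {
    // Treat network errors the same as an unknown model.
    return null
  }
}
```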
| "general.parameter_count"?: number | ||
| "general.size_label"?: string | ||
| "general.license"?: string | ||
| [key: string]: string | number | null | undefined |
nit:

```suggestion
// Ollama's API is not fully typed and some parameters get scoped keys such as `qwen3.context_length` / `llama.context_length`.
// For this reason this property is a catch-all from where the relevant values will be extracted.
[key: string]: string | number | null | undefined
```
```typescript
// Extract context length from model_info
function extractContextLength(modelInfo: Record<string, string | number | null | undefined>): number {
  // Search for any key ending with ".context_length"
```
```suggestion
// Search for any key ending with ".context_length". This is because Ollama uses scoped keys (e.g. `qwen3.context_length` / `llama.context_length`).
```
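The search described in the suggestion can be sketched as follows. The fallback value of 4096 is an assumption for illustration, not taken from the PR:

```typescript
// Assumed fallback when no scoped context-length key is present.
const DEFAULT_CONTEXT_LENGTH = 4096

// Ollama scopes model_info keys by architecture (e.g. `qwen3.context_length`,
// `llama.context_length`), so we look for any key with that suffix.
function extractContextLength(
  modelInfo: Record<string, string | number | null | undefined>,
): number {
  for (const [key, value] of Object.entries(modelInfo)) {
    if (key.endsWith(".context_length") && typeof value === "number") {
      return value
    }
  }
  return DEFAULT_CONTEXT_LENGTH
}
```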
contributing.md (Outdated)
```shell
export GROQ_LOCAL_SERVER_PROXY="http://localhost:10004/openai/v1"
export GEMINI_LOCAL_SERVER_PROXY="http://localhost:10005/v1beta"
export GITHUB_COPILOT_PROXY="http://localhost:9090"
export OLLAMA_LOCAL_SERVER_PROXY="http://localhost:1006"
```
nit: 10006 to remain consistent.
Thanks for the review. I will pick up on your feedback in the coming days.
i agree that we should add docs on how to configure the new provider and i will prepare something.
this seems to be a bit of a misunderstanding. ollama will load the respective model automatically when it receives a completion request. the prerequisites to use this provider with cmd are quite simple: install ollama, make sure it's running (default install will setup autostart), install one or more models. i will explain this in the docs.
in fact, there already is an easy way to reload models, i believe. but it's not very obvious to the users. by disabling and enabling the provider again, the model discovery gets triggered and a current list of models gets retrieved. in my humble opinion, the whole provider and model settings could benefit from a good revisit. i just did the minimum for the new provider to fit into the existing mechanics.
gsabran left a comment
Thanks a lot, and sorry for the slow review
Thanks for the review. 👍 How do I fulfill the pending required "Mintlify Deployment" check?
🤷‍♂️ not sure, merged!
This should be included in the new release |
This introduces an Ollama provider for local inference.
Ollama can be run locally or on a remote endpoint, e.g. a host providing AI in a private network.
This implementation uses the Ollama Provider V2 for the Vercel AI SDK for the AI interactions, and plain HTTP to retrieve available models and their details from a given Ollama endpoint.
It was tested successfully with the following models:
What's missing here is making capable models available for code completion, and perhaps filtering out models that are not capable of chatting so they are not shown to the user.
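One possible direction for the missing filtering, sketched here as an assumption rather than part of this PR: newer Ollama versions report a `capabilities` array in the show-model response (e.g. `["completion", "tools"]`), which could be used to hide models that cannot chat. The `ShowResponse` shape and `supportsChat` helper below are hypothetical names.

```typescript
// Hypothetical slice of Ollama's /api/show response.
interface ShowResponse {
  capabilities?: string[]
}

// Treats "completion" as the marker for chat-capable text generation;
// models without a reported capabilities list are kept visible rather
// than hidden, to stay safe with older Ollama versions.
function supportsChat(details: ShowResponse): boolean {
  if (!details.capabilities) return true
  return details.capabilities.includes("completion")
}
```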