
Commit da95bcf

jacoobes, apage43, manyoso, qnixsynapse, and cebtenzzre authored
vulkan support for typescript bindings, gguf support (#1390)
* adding some native methods to cpp wrapper
* gpu seems to work
* typings and add availibleGpus method
* fix spelling
* fix syntax
* more
* normalize methods to conform to py
* remove extra dynamic linker deps when building with vulkan
* bump python version (library linking fix)
* Don't link against libvulkan.
* vulkan python bindings on windows fixes
* Bring the vulkan backend to the GUI.
* When device is Auto (the default), only consider discrete GPUs; otherwise fall back to CPU.
* Show the device we're currently using.
* Fix up the name and formatting.
* init at most one vulkan device; submodule update fixes issues w/ multiple of the same gpu
* Update the submodule.
* Add version 2.4.15 and bump the version number.
* Fix a bug where we're not properly falling back to CPU.
* Sync to a newer version of llama.cpp with bugfix for vulkan.
* Report the actual device we're using.
* Only show GPU when we're actually using it.
* Bump to new llama with new bugfix.
* Release notes for v2.4.16 and bump the version.
* Fall back to CPU more robustly.
* Release notes for v2.4.17 and bump the version.
* Bump the Python version to python-v1.0.12 to restrict the quants that vulkan recognizes.
* Link against ggml in bin so we can get the available devices without loading a model.
* Send actual and requested device info for those who have opted in.
* Actually bump the version.
* Release notes for v2.4.18 and bump the version.
* Fix for crashes on systems where vulkan is not installed properly.
* Release notes for v2.4.19 and bump the version.
* fix typings and vulkan build works on win
* Add flatpak manifest
* Remove unnecessary stuff from the manifest
* Update to 2.4.19
* appdata: update software description
* Latest rebase on llama.cpp with gguf support.
* macos build fixes
* llamamodel: metal supports all quantization types now
* gpt4all.py: GGUF
* pyllmodel: print specific error message
* backend: port BERT to GGUF
* backend: port MPT to GGUF
* backend: port Replit to GGUF
* backend: use gguf branch of llama.cpp-mainline
* backend: use llamamodel.cpp for StarCoder
* conversion scripts: cleanup
* convert scripts: load model as late as possible
* convert_mpt_hf_to_gguf.py: better tokenizer decoding
* backend: use llamamodel.cpp for Falcon
* convert scripts: make them directly executable
* fix references to removed model types
* modellist: fix the system prompt
* backend: port GPT-J to GGUF
* gpt-j: update inference to match latest llama.cpp insights
  - Use F16 KV cache
  - Store transposed V in the cache
  - Avoid unnecessary Q copy
  Co-authored-by: Georgi Gerganov <[email protected]>
  ggml upstream commit 0265f0813492602fec0e1159fe61de1bf0ccaf78
* chatllm: grammar fix
* convert scripts: use bytes_to_unicode from transformers
* convert scripts: make gptj script executable
* convert scripts: add feed-forward length for better compatibility
  This GGUF key is used by all llama.cpp models with upstream support.
* gptj: remove unused variables
* Refactor for subgroups on mat * vec kernel.
* Add q6_k kernels for vulkan.
* python binding: print debug message to stderr
* Fix regenerate button to be deterministic and bump the llama version to latest we have for gguf.
* Bump to the latest fixes for vulkan in llama.
* llamamodel: fix static vector in LLamaModel::endTokens
* Switch to new models2.json for new gguf release and bump our version to 2.5.0.
* Bump to latest llama/gguf branch.
* chat: report reason for fallback to CPU
* chat: make sure to clear fallback reason on success
* more accurate fallback descriptions
* differentiate between init failure and unsupported models
* backend: do not use Vulkan with non-LLaMA models
* Add q8_0 kernels to kompute shaders and bump to latest llama/gguf.
* backend: fix build with Visual Studio generator
  Use the $<CONFIG> generator expression instead of CMAKE_BUILD_TYPE. This is needed because Visual Studio is a multi-configuration generator, so we do not know what the build type will be until `cmake --build` is called. Fixes #1470
* remove old llama.cpp submodules
* Reorder and refresh our models2.json.
* rebase on newer llama.cpp
* python/embed4all: use gguf model, allow passing kwargs/overriding model
* Add starcoder, rift and sbert to our models2.json.
* Push a new version number for llmodel backend now that it is based on gguf.
* fix stray comma in models2.json
  Signed-off-by: Aaron Miller <[email protected]>
* Speculative fix for build on mac.
* chat: clearer CPU fallback messages
* Fix crasher with an empty string for prompt template.
* Update the language here to avoid misunderstanding.
* added EM German Mistral Model
* make codespell happy
* issue template: remove "Related Components" section
* cmake: install the GPT-J plugin (#1487)
* Do not delete saved chats if we fail to serialize properly.
* Restore state from text if necessary.
* Another codespell attempted fix.
* llmodel: do not call magic_match unless build variant is correct (#1488)
* chatllm: do not write uninitialized data to stream (#1486)
* mat*mat for q4_0, q8_0
* do not process prompts on gpu yet
* python: support Path in GPT4All.__init__ (#1462)
* llmodel: print an error if the CPU does not support AVX (#1499)
* python bindings should be quiet by default
* disable llama.cpp logging unless GPT4ALL_VERBOSE_LLAMACPP envvar is nonempty
* make verbose flag for retrieve_model default false (but also be overridable via gpt4all constructor)
  Should be able to run a basic test:
  ```python
  import gpt4all
  model = gpt4all.GPT4All('/Users/aaron/Downloads/rift-coder-v0-7b-q4_0.gguf')
  print(model.generate('def fib(n):'))
  ```
  and see no non-model output when successful.
* python: always check status code of HTTP responses (#1502)
* Always save chats to disk, but save them as text by default. This also changes the UI behavior to always open a 'New Chat' and set it as current, instead of setting a restored chat as current. This improves usability by not requiring the user to wait if they want to immediately start chatting.
* Update README.md
  Signed-off-by: umarmnaq <[email protected]>
* fix embed4all filename
  https://discordapp.com/channels/1076964370942267462/1093558720690143283/1161778216462192692
  Signed-off-by: Aaron Miller <[email protected]>
* Improves Java API signatures while maintaining backward compatibility
* python: replace deprecated pkg_resources with importlib (#1505)
* Updated chat wishlist (#1351)
* q6k, q4_1 mat*mat
* update mini-orca 3b to gguf2, license
  Signed-off-by: Aaron Miller <[email protected]>
* convert scripts: fix AutoConfig typo (#1512)
* publish config https://docs.npmjs.com/cli/v9/configuring-npm/package-json#publishconfig (#1375)
  merge into my branch
* fix appendBin
* fix gpu not initializing first
* sync up
* progress, still wip on destructor
* some detection work
* untested dispose method
* add js side of dispose
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* Update gpt4all-bindings/typescript/index.cc
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* Update gpt4all-bindings/typescript/src/gpt4all.d.ts
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* Update gpt4all-bindings/typescript/src/gpt4all.js
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* Update gpt4all-bindings/typescript/src/util.js
  Co-authored-by: cebtenzzre <[email protected]>
  Signed-off-by: Jacob Nguyen <[email protected]>
* fix tests
* fix circleci for nodejs
* bump version

---------

Signed-off-by: Aaron Miller <[email protected]>
Signed-off-by: umarmnaq <[email protected]>
Signed-off-by: Jacob Nguyen <[email protected]>
Co-authored-by: Aaron Miller <[email protected]>
Co-authored-by: Adam Treat <[email protected]>
Co-authored-by: Akarshan Biswas <[email protected]>
Co-authored-by: Cebtenzzre <[email protected]>
Co-authored-by: Jan Philipp Harries <[email protected]>
Co-authored-by: umarmnaq <[email protected]>
Co-authored-by: Alex Soto <[email protected]>
Co-authored-by: niansa/tuxifan <[email protected]>
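In practical terms, the TypeScript-binding side of this change boils down to the usage below. This is a minimal sketch mirroring `spec/chat.mjs` in this diff; the model file name and the `device`/`verbose` option names come from that spec, and the import assumes the published `gpt4all` npm package rather than a relative path.

```js
import { loadModel, createCompletion } from 'gpt4all'

// Ask for a Vulkan GPU device. Per index.cc in this diff, a failed GPU init only
// prints a warning and the model still loads on the CPU.
const model = await loadModel('mistral-7b-openorca.Q4_0.gguf', {
    verbose: true,
    device: 'gpu',
});

const completion = await createCompletion(model, [
    { role: 'system', content: 'You are an advanced mathematician.' },
    { role: 'user', content: 'What is 1 + 1?' },
]);
console.log(completion.choices[0].message);

// dispose() invalidates the native handle; call it once the model is no longer needed.
model.dispose();
```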
1 parent 64101d3 commit da95bcf

17 files changed: +5,882 -4,347 lines

.circleci/continue_config.yml

+3 -1

@@ -856,6 +856,7 @@ jobs:
       - node/install-packages:
           app-dir: gpt4all-bindings/typescript
           pkg-manager: yarn
+          override-ci-command: yarn install
       - run:
           command: |
             cd gpt4all-bindings/typescript
@@ -885,6 +886,7 @@ jobs:
       - node/install-packages:
           app-dir: gpt4all-bindings/typescript
           pkg-manager: yarn
+          override-ci-command: yarn install
       - run:
           command: |
             cd gpt4all-bindings/typescript
@@ -994,7 +996,7 @@ jobs:
           command: |
             cd gpt4all-bindings/typescript
             npm set //registry.npmjs.org/:_authToken=$NPM_TOKEN
-            npm publish --access public --tag alpha
+            npm publish
 
 workflows:
   version: 2
+1

@@ -0,0 +1 @@
+nodeLinker: node-modules

gpt4all-bindings/typescript/README.md

-3

@@ -75,15 +75,12 @@ cd gpt4all-bindings/typescript
 ```sh
 yarn
 ```
-
 * llama.cpp git submodule for gpt4all can be possibly absent. If this is the case, make sure to run in llama.cpp parent directory
 
 ```sh
 git submodule update --init --depth 1 --recursive
 ```
 
-**AS OF NEW BACKEND** to build the backend,
-
 ```sh
 yarn build:backend
 ```

gpt4all-bindings/typescript/index.cc

+121 -15

@@ -1,6 +1,5 @@
 #include "index.h"
 
-Napi::FunctionReference NodeModelWrapper::constructor;
 
 Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
   Napi::Function self = DefineClass(env, "LLModel", {
@@ -13,14 +12,64 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
     InstanceMethod("embed", &NodeModelWrapper::GenerateEmbedding),
     InstanceMethod("threadCount", &NodeModelWrapper::ThreadCount),
     InstanceMethod("getLibraryPath", &NodeModelWrapper::GetLibraryPath),
+    InstanceMethod("initGpuByString", &NodeModelWrapper::InitGpuByString),
+    InstanceMethod("hasGpuDevice", &NodeModelWrapper::HasGpuDevice),
+    InstanceMethod("listGpu", &NodeModelWrapper::GetGpuDevices),
+    InstanceMethod("memoryNeeded", &NodeModelWrapper::GetRequiredMemory),
+    InstanceMethod("dispose", &NodeModelWrapper::Dispose)
   });
   // Keep a static reference to the constructor
   //
-  constructor = Napi::Persistent(self);
-  constructor.SuppressDestruct();
+  Napi::FunctionReference* constructor = new Napi::FunctionReference();
+  *constructor = Napi::Persistent(self);
+  env.SetInstanceData(constructor);
   return self;
+}
+Napi::Value NodeModelWrapper::GetRequiredMemory(const Napi::CallbackInfo& info)
+{
+  auto env = info.Env();
+  return Napi::Number::New(env, static_cast<uint32_t>( llmodel_required_mem(GetInference(), full_model_path.c_str()) ));
+
+}
+Napi::Value NodeModelWrapper::GetGpuDevices(const Napi::CallbackInfo& info)
+{
+  auto env = info.Env();
+  int num_devices = 0;
+  auto mem_size = llmodel_required_mem(GetInference(), full_model_path.c_str());
+  llmodel_gpu_device* all_devices = llmodel_available_gpu_devices(GetInference(), mem_size, &num_devices);
+  if(all_devices == nullptr) {
+    Napi::Error::New(
+      env,
+      "Unable to retrieve list of all GPU devices"
+    ).ThrowAsJavaScriptException();
+    return env.Undefined();
+  }
+  auto js_array = Napi::Array::New(env, num_devices);
+  for(int i = 0; i < num_devices; ++i) {
+    auto gpu_device = all_devices[i];
+    /*
+     *
+     * struct llmodel_gpu_device {
+           int index = 0;
+           int type = 0; // same as VkPhysicalDeviceType
+           size_t heapSize = 0;
+           const char * name;
+           const char * vendor;
+       };
+     *
+     */
+    Napi::Object js_gpu_device = Napi::Object::New(env);
+    js_gpu_device["index"] = uint32_t(gpu_device.index);
+    js_gpu_device["type"] = uint32_t(gpu_device.type);
+    js_gpu_device["heapSize"] = static_cast<uint32_t>( gpu_device.heapSize );
+    js_gpu_device["name"]= gpu_device.name;
+    js_gpu_device["vendor"] = gpu_device.vendor;
+
+    js_array[i] = js_gpu_device;
+  }
+  return js_array;
 }
-
+
 Napi::Value NodeModelWrapper::getType(const Napi::CallbackInfo& info)
 {
   if(type.empty()) {
@@ -29,15 +78,41 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
   return Napi::String::New(info.Env(), type);
 }
 
+Napi::Value NodeModelWrapper::InitGpuByString(const Napi::CallbackInfo& info)
+{
+  auto env = info.Env();
+  uint32_t memory_required = info[0].As<Napi::Number>();
+
+  std::string gpu_device_identifier = info[1].As<Napi::String>();
+
+  size_t converted_value;
+  if(memory_required <= std::numeric_limits<size_t>::max()) {
+    converted_value = static_cast<size_t>(memory_required);
+  } else {
+    Napi::Error::New(
+      env,
+      "invalid number for memory size. Exceeded bounds for memory."
+    ).ThrowAsJavaScriptException();
+    return env.Undefined();
+  }
+
+  auto result = llmodel_gpu_init_gpu_device_by_string(GetInference(), converted_value, gpu_device_identifier.c_str());
+  return Napi::Boolean::New(env, result);
+}
+Napi::Value NodeModelWrapper::HasGpuDevice(const Napi::CallbackInfo& info)
+{
+  return Napi::Boolean::New(info.Env(), llmodel_has_gpu_device(GetInference()));
+}
+
 NodeModelWrapper::NodeModelWrapper(const Napi::CallbackInfo& info) : Napi::ObjectWrap<NodeModelWrapper>(info)
 {
   auto env = info.Env();
   fs::path model_path;
 
-  std::string full_weight_path;
-  //todo
-  std::string library_path = ".";
-  std::string model_name;
+  std::string full_weight_path,
+              library_path = ".",
+              model_name,
+              device;
   if(info[0].IsString()) {
     model_path = info[0].As<Napi::String>().Utf8Value();
     full_weight_path = model_path.string();
@@ -56,13 +131,14 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
     } else {
       library_path = ".";
     }
+    device = config_object.Get("device").As<Napi::String>();
   }
   llmodel_set_implementation_search_path(library_path.c_str());
   llmodel_error e = {
     .message="looks good to me",
     .code=0,
   };
-  inference_ = std::make_shared<llmodel_model>(llmodel_model_create2(full_weight_path.c_str(), "auto", &e));
+  inference_ = llmodel_model_create2(full_weight_path.c_str(), "auto", &e);
   if(e.code != 0) {
     Napi::Error::New(env, e.message).ThrowAsJavaScriptException();
     return;
@@ -74,18 +150,45 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
     Napi::Error::New(env, "Had an issue creating llmodel object, inference is null").ThrowAsJavaScriptException();
     return;
   }
+  if(device != "cpu") {
+    size_t mem = llmodel_required_mem(GetInference(), full_weight_path.c_str());
+    if(mem == 0) {
+      std::cout << "WARNING: no memory needed. does this model support gpu?\n";
+    }
+    std::cout << "Initiating GPU\n";
+    std::cout << "Memory required estimation: " << mem << "\n";
+
+    auto success = llmodel_gpu_init_gpu_device_by_string(GetInference(), mem, device.c_str());
+    if(success) {
+      std::cout << "GPU init successfully\n";
+    } else {
+      std::cout << "WARNING: Failed to init GPU\n";
+    }
+  }
 
   auto success = llmodel_loadModel(GetInference(), full_weight_path.c_str());
   if(!success) {
     Napi::Error::New(env, "Failed to load model at given path").ThrowAsJavaScriptException();
     return;
   }
+
   name = model_name.empty() ? model_path.filename().string() : model_name;
+  full_model_path = full_weight_path;
 };
-//NodeModelWrapper::~NodeModelWrapper() {
-//GetInference().reset();
-//}
 
+// NodeModelWrapper::~NodeModelWrapper() {
+//   if(GetInference() != nullptr) {
+//     std::cout << "Debug: deleting model\n";
+//     llmodel_model_destroy(inference_);
+//     std::cout << (inference_ == nullptr);
+//   }
+// }
+// void NodeModelWrapper::Finalize(Napi::Env env) {
+//   if(inference_ != nullptr) {
+//     std::cout << "Debug: deleting model\n";
+//
+//   }
+// }
 Napi::Value NodeModelWrapper::IsModelLoaded(const Napi::CallbackInfo& info) {
   return Napi::Boolean::New(info.Env(), llmodel_isModelLoaded(GetInference()));
 }
@@ -193,8 +296,9 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
   std::string copiedQuestion = question;
   PromptWorkContext pc = {
     copiedQuestion,
-    std::ref(inference_),
+    inference_,
     copiedPrompt,
+    ""
   };
   auto threadSafeContext = new TsfnContext(env, pc);
   threadSafeContext->tsfn = Napi::ThreadSafeFunction::New(
@@ -210,7 +314,9 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
   threadSafeContext->nativeThread = std::thread(threadEntry, threadSafeContext);
   return threadSafeContext->deferred_.Promise();
 }
-
+void NodeModelWrapper::Dispose(const Napi::CallbackInfo& info) {
+  llmodel_model_destroy(inference_);
+}
 void NodeModelWrapper::SetThreadCount(const Napi::CallbackInfo& info) {
   if(info[0].IsNumber()) {
     llmodel_setThreadCount(GetInference(), info[0].As<Napi::Number>().Int64Value());
@@ -233,7 +339,7 @@ Napi::Function NodeModelWrapper::GetClass(Napi::Env env) {
 }
 
 llmodel_model NodeModelWrapper::GetInference() {
-  return *inference_;
+  return inference_;
 }
 
 //Exports Bindings
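For orientation, here is roughly how the native methods registered above surface through the JS wrapper's raw `llm` handle. This is a sketch based on the method names in the `DefineClass` table and on `spec/chat.mjs` later in this diff; it is not part of the commit, and the model file name is reused from that spec.

```js
import { loadModel } from 'gpt4all'

const model = await loadModel('mistral-7b-openorca.Q4_0.gguf', { device: 'cpu' });
const ll = model.llm; // the raw native LLModel wrapper defined in index.cc

// GetRequiredMemory -> memoryNeeded(): wraps llmodel_required_mem for the loaded weights.
console.log('required memory (bytes):', ll.memoryNeeded());

// GetGpuDevices -> listGpu(): array of { index, type, heapSize, name, vendor } objects,
// where `type` matches VkPhysicalDeviceType.
console.log('available gpus:', ll.listGpu());

// HasGpuDevice -> hasGpuDevice(): whether a GPU device is currently in use.
console.log('using gpu:', ll.hasGpuDevice());

// InitGpuByString -> initGpuByString(memoryRequired, deviceString) wraps
// llmodel_gpu_init_gpu_device_by_string; the constructor already calls it before
// loading whenever the `device` option is not 'cpu'.

// dispose() destroys the underlying llmodel_model; the handle is invalid afterwards.
model.dispose();
```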

gpt4all-bindings/typescript/index.h

+12 -3

@@ -6,24 +6,33 @@
 #include <atomic>
 #include <memory>
 #include <filesystem>
+#include <set>
 namespace fs = std::filesystem;
 
+
 class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
 public:
   NodeModelWrapper(const Napi::CallbackInfo &);
-  //~NodeModelWrapper();
+  //virtual ~NodeModelWrapper();
   Napi::Value getType(const Napi::CallbackInfo& info);
   Napi::Value IsModelLoaded(const Napi::CallbackInfo& info);
   Napi::Value StateSize(const Napi::CallbackInfo& info);
+  //void Finalize(Napi::Env env) override;
   /**
    * Prompting the model. This entails spawning a new thread and adding the response tokens
     into a thread local string variable.
   */
   Napi::Value Prompt(const Napi::CallbackInfo& info);
   void SetThreadCount(const Napi::CallbackInfo& info);
+  void Dispose(const Napi::CallbackInfo& info);
   Napi::Value getName(const Napi::CallbackInfo& info);
   Napi::Value ThreadCount(const Napi::CallbackInfo& info);
   Napi::Value GenerateEmbedding(const Napi::CallbackInfo& info);
+  Napi::Value HasGpuDevice(const Napi::CallbackInfo& info);
+  Napi::Value ListGpus(const Napi::CallbackInfo& info);
+  Napi::Value InitGpuByString(const Napi::CallbackInfo& info);
+  Napi::Value GetRequiredMemory(const Napi::CallbackInfo& info);
+  Napi::Value GetGpuDevices(const Napi::CallbackInfo& info);
  /*
   * The path that is used to search for the dynamic libraries
   */
@@ -37,10 +46,10 @@ class NodeModelWrapper: public Napi::ObjectWrap<NodeModelWrapper> {
  /**
   * The underlying inference that interfaces with the C interface
   */
-  std::shared_ptr<llmodel_model> inference_;
+  llmodel_model inference_;
 
   std::string type;
   // corresponds to LLModel::name() in typescript
   std::string name;
-  static Napi::FunctionReference constructor;
+  std::string full_model_path;
 };

gpt4all-bindings/typescript/package.json

+6 -1

@@ -1,6 +1,6 @@
 {
   "name": "gpt4all",
-  "version": "2.2.0",
+  "version": "3.0.0",
   "packageManager": "[email protected]",
   "main": "src/gpt4all.js",
   "repository": "nomic-ai/gpt4all",
@@ -47,5 +47,10 @@
   },
   "jest": {
     "verbose": true
+  },
+  "publishConfig": {
+    "registry": "https://registry.npmjs.org/",
+    "access": "public",
+    "tag": "latest"
   }
 }

gpt4all-bindings/typescript/prompt.cc

+1 -1

@@ -30,7 +30,7 @@ void threadEntry(TsfnContext* context) {
   context->tsfn.BlockingCall(&context->pc,
   [](Napi::Env env, Napi::Function jsCallback, PromptWorkContext* pc) {
     llmodel_prompt(
-      *pc->inference_,
+      pc->inference_,
       pc->question.c_str(),
       &prompt_callback,
       &response_callback,

gpt4all-bindings/typescript/prompt.h

+1 -1

@@ -10,7 +10,7 @@
 #include <memory>
 struct PromptWorkContext {
   std::string question;
-  std::shared_ptr<llmodel_model>& inference_;
+  llmodel_model inference_;
   llmodel_prompt_context prompt_params;
   std::string res;
 
gpt4all-bindings/typescript/spec/chat.mjs

+10 -6

@@ -1,8 +1,8 @@
 import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY, loadModel } from '../src/gpt4all.js'
 
 const model = await loadModel(
-    'orca-mini-3b-gguf2-q4_0.gguf',
-    { verbose: true }
+    'mistral-7b-openorca.Q4_0.gguf',
+    { verbose: true, device: 'gpu' }
 );
 const ll = model.llm;
 
@@ -26,7 +26,9 @@ console.log("name " + ll.name());
 console.log("type: " + ll.type());
 console.log("Default directory for models", DEFAULT_DIRECTORY);
 console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);
-
+console.log("Has GPU", ll.hasGpuDevice());
+console.log("gpu devices", ll.listGpu())
+console.log("Required Mem in bytes", ll.memoryNeeded())
 const completion1 = await createCompletion(model, [
     { role : 'system', content: 'You are an advanced mathematician.' },
     { role : 'user', content: 'What is 1 + 1?' },
@@ -40,23 +42,25 @@ const completion2 = await createCompletion(model, [
 
 console.log(completion2.choices[0].message)
 
+//CALLING DISPOSE WILL INVALID THE NATIVE MODEL. USE THIS TO CLEANUP
+model.dispose()
 // At the moment, from testing this code, concurrent model prompting is not possible.
 // Behavior: The last prompt gets answered, but the rest are cancelled
 // my experience with threading is not the best, so if anyone who is good is willing to give this a shot,
 // maybe this is possible
 // INFO: threading with llama.cpp is not the best maybe not even possible, so this will be left here as reference
 
 //const responses = await Promise.all([
-//  createCompletion(ll, [
+//  createCompletion(model, [
 //    { role : 'system', content: 'You are an advanced mathematician.' },
 //    { role : 'user', content: 'What is 1 + 1?' },
 //  ], { verbose: true }),
-//  createCompletion(ll, [
+//  createCompletion(model, [
 //    { role : 'system', content: 'You are an advanced mathematician.' },
 //    { role : 'user', content: 'What is 1 + 1?' },
 //  ], { verbose: true }),
 //
-//createCompletion(ll, [
+//createCompletion(model, [
 //  { role : 'system', content: 'You are an advanced mathematician.' },
 //  { role : 'user', content: 'What is 1 + 1?' },
 //], { verbose: true })
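Since `dispose()` invalidates the native model, callers that might throw between load and cleanup can wrap the prompt in `try`/`finally`. A small usage sketch under the same assumptions as the spec above, not part of the diff:

```js
import { loadModel, createCompletion } from 'gpt4all'

const model = await loadModel('mistral-7b-openorca.Q4_0.gguf', { device: 'gpu' });
try {
    const res = await createCompletion(model, [
        { role: 'system', content: 'You are an advanced mathematician.' },
        { role: 'user', content: 'What is 1 + 1?' },
    ]);
    console.log(res.choices[0].message);
} finally {
    // Release the native llmodel handle even if the completion throws.
    model.dispose();
}
```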
