Generic adapter support in the grpc server #32

joerunde · 2024-05-24T22:53:39Z

Adds support for multi-lora adapters.

Passing tests added over in this PR: https://github.ibm.com/ai-foundation/tgis-deploy-tests/pull/25/files

Signed-off-by: Joe Runde <[email protected]>

joerunde · 2024-05-29T17:50:47Z

vllm/tgis_utils/args.py

@@ -82,6 +82,8 @@ def add_tgis_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    parser.add_argument('--tls-key-path', type=str)
    # map to ssl_ca_certs
    parser.add_argument('--tls-client-ca-cert-path', type=str)
+    # add a path where lora adapters will be loaded from
+    parser.add_argument('--lora-adapter-cache', type=str)


open to ideas on naming here
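For reference, `argparse` turns the dashes in a long option name into underscores for the attribute, so the flag is read back as `args.lora_adapter_cache`, and it is `None` when omitted. Below is a minimal sketch with a standalone parser. The `build_parser` name and the `/models/adapters` path are illustrative assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser; the real flag is added in add_tgis_args
    parser = argparse.ArgumentParser()
    # add a path where lora adapters will be loaded from
    parser.add_argument('--lora-adapter-cache', type=str)
    return parser


args = build_parser().parse_args(['--lora-adapter-cache', '/models/adapters'])
print(args.lora_adapter_cache)  # /models/adapters

defaults = build_parser().parse_args([])
print(defaults.lora_adapter_cache)  # None
```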

vllm/entrypoints/grpc/grpc_server.py

njhill

Thanks @joerunde this looks great! Just minor comments

proto/generation.proto

vllm/entrypoints/grpc/adapters.py

dtrifiro · 2024-05-30T10:34:19Z

vllm/entrypoints/grpc/adapters.py

+    lora_id = request.lora_id
+    if lora_id:
+        if not lora_adapter_store:
+            # using raise/format instead of .error so mypy knows this raises
+            raise ValueError(TGISValidationError.LoraDisabled.value.format())
+
+        local_lora_path = os.path.join(lora_adapter_store.cache_path, lora_id)
+
+        # Do a bit of up-front validation so that we don't ask the engine
+        # to try to load an invalid adapter
+        if not os.path.exists(local_lora_path):
+            TGISValidationError.LoraAdapterNotFound.error(
+                lora_id, "directory does not exist")
+        if not os.path.exists(
+                os.path.join(local_lora_path, "adapter_config.json")):
+            TGISValidationError.LoraAdapterNotFound.error(
+                lora_id, "invalid adapter: no adapter_config.json found")
+
+        # We need to track a unique integer for vLLM to identify the lora
+        # adapters
+        if lora_id not in lora_adapter_store.unique_id_map:
+            lora_adapter_store.unique_id_map[
+                lora_id] = lora_adapter_store.next_unique_id
+            lora_adapter_store.next_unique_id += 1
+        unique_id = lora_adapter_store.unique_id_map[lora_id]
+        lora_request = LoRARequest(lora_name=lora_id,
+                                   lora_int_id=unique_id,
+                                   lora_local_path=local_lora_path)
+    else:
+        lora_request = None
+
+    if request.prefix_id:
+        # TODO: hook up PromptAdapterRequest once implemented in the engine
+        raise ValueError("prefix_id not implemented yet")
+
+    # Second return slot left here for the incoming PromptAdapterRequest
+    # See https://github.com/vllm-project/vllm/pull/4645/files
+    return lora_request, None


How about flattening this a bit?

Suggested change


    if request.prefix_id:
        # TODO: hook up PromptAdapterRequest once implemented in the engine
        raise ValueError("prefix_id not implemented yet")

    lora_id = request.lora_id
    if not lora_id:
        return None, None

    if not lora_adapter_store:
        # using raise/format instead of .error so mypy knows this raises
        raise ValueError(TGISValidationError.LoraDisabled.value.format())

    local_lora_path = os.path.join(lora_adapter_store.cache_path, lora_id)

    # Do a bit of up-front validation so that we don't ask the engine
    # to try to load an invalid adapter
    if not os.path.exists(local_lora_path):
        TGISValidationError.LoraAdapterNotFound.error(
            lora_id, "directory does not exist")
    if not os.path.exists(
            os.path.join(local_lora_path, "adapter_config.json")):
        TGISValidationError.LoraAdapterNotFound.error(
            lora_id, "invalid adapter: no adapter_config.json found")

    # We need to track a unique integer for vLLM to identify the lora
    # adapters
    if lora_id not in lora_adapter_store.unique_id_map:
        lora_adapter_store.unique_id_map[
            lora_id] = lora_adapter_store.next_unique_id
        lora_adapter_store.next_unique_id += 1
    unique_id = lora_adapter_store.unique_id_map[lora_id]
    lora_request = LoRARequest(lora_name=lora_id,
                               lora_int_id=unique_id,
                               lora_local_path=local_lora_path)

    # Second return slot left here for the incoming PromptAdapterRequest
    # See https://github.com/vllm-project/vllm/pull/4645/files
    return lora_request, None

Hah, I un-nested it but then re-nested it so that the file checks and loading only happen if the adapter wasn't already loaded.
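Here is a minimal sketch of that cache-first ordering: the disk checks run only on a cache miss, and each new adapter name gets the next integer ID. `AdapterStore` and `get_adapter_int_id` are hypothetical stand-ins for the PR's adapter store, and plain `ValueError`s replace `TGISValidationError`:

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class AdapterStore:
    """Hypothetical stand-in for the PR's adapter store."""
    cache_path: str
    unique_id_map: dict[str, int] = field(default_factory=dict)
    next_unique_id: int = 1  # IDs start at 1 in this sketch


def get_adapter_int_id(store: AdapterStore, adapter_id: str) -> int:
    # Fast path: adapter already validated, no filesystem access needed
    if adapter_id in store.unique_id_map:
        return store.unique_id_map[adapter_id]

    # Slow path: only touch the disk the first time an adapter is requested
    local_path = os.path.join(store.cache_path, adapter_id)
    if not os.path.isdir(local_path):
        raise ValueError(f"adapter {adapter_id}: directory does not exist")
    if not os.path.exists(os.path.join(local_path, "adapter_config.json")):
        raise ValueError(f"adapter {adapter_id}: no adapter_config.json found")

    # Assign the next integer ID and remember it for later requests
    unique_id = store.next_unique_id
    store.unique_id_map[adapter_id] = unique_id
    store.next_unique_id += 1
    return unique_id


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "my-lora"))
    open(os.path.join(tmp, "my-lora", "adapter_config.json"), "w").close()
    store = AdapterStore(cache_path=tmp)
    first = get_adapter_int_id(store, "my-lora")
    again = get_adapter_int_id(store, "my-lora")  # served from the map
    print(first, again)  # 1 1
```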

Signed-off-by: Joe Runde <[email protected]>

njhill · 2024-06-03T16:10:44Z

vllm/entrypoints/grpc/grpc_server.py

@@ -224,7 +224,7 @@ async def GenerateStream(
            sampling_params, truncate_input_tokens, request.request.text,
            context)

-        lora_request, _ = await self._validate_adapters(request, context)
+        adapter_kwargs, _ = await self._validate_adapters(request, context)


Not a tuple now, right?

Oh yeah, totally not. Interestingly, Python didn't complain about the unpacking, TIL (presumably because unpacking a dict iterates over its keys, and this one happened to have exactly two).

njhill · 2024-06-03T16:14:14Z

vllm/entrypoints/grpc/adapters.py

+            TGISValidationError.AdapterNotFound.error(
+                adapter_id, "invalid adapter: no adapter_config.json found")
+
+        # NB: blocks event loop


I think this will be important to address: move all of the file access off the event loop.

Yeah, I looked into this a bit, and it sounds like the asyncio file access in third-party libs is... not very good.

I'm not 100% up to speed on event loops; would we want to make a new executor for this, sort of like

    file_load_executor = ThreadPoolExecutor(max_workers=n)
    await loop.run_in_executor(file_load_executor, _load_the_config_json_file, ...)

or would that also just block the loop?

Yeah, exactly. That function should probably be all of the code that runs when we don't find the adapter in the dict (i.e. checking on disk, loading it, etc.).

There's a default asyncio executor that can be used for this kind of thing, or we may want a static one rather than creating one on the fly (not that you were necessarily suggesting that).

Cool, I'll see if I can get that working quickly

@njhill can I get a run from your static analysis on this change?

Signed-off-by: Joe Runde <[email protected]>

maxdebayser · 2024-06-11T20:20:09Z

vllm/entrypoints/grpc/adapters.py

+    # If not already cached, we need to validate that files exist and
+    # grab the type out of the adapter_config.json file
+    if (adapter_metadata := adapter_store.adapters.get(adapter_id)) is None:
+        local_adapter_path = os.path.join(adapter_store.cache_path, adapter_id)


I think we should sanitize the adapter_id here to make sure that the user can't send funny things like ../../../etc/passwd.
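One way to do that check is sketched below with a hypothetical `resolve_adapter_path` helper. This is not necessarily how the PR implements it. The helper resolves the joined path and requires that it stay strictly inside the cache directory:

```python
import os


def resolve_adapter_path(cache_path: str, adapter_id: str) -> str:
    """Join adapter_id onto cache_path, rejecting anything that escapes it.

    Hypothetical helper: realpath collapses '..' segments and follows
    symlinks before the comparison, so '../../../etc/passwd' or an
    absolute id such as '/etc/passwd' is denied.
    """
    root = os.path.realpath(cache_path)
    candidate = os.path.realpath(os.path.join(root, adapter_id))
    # Must be inside root, and must not be root itself (e.g. '' or '.')
    if os.path.commonpath([root, candidate]) != root or candidate == root:
        raise ValueError(f"invalid adapter id: {adapter_id!r}")
    return candidate


print(resolve_adapter_path("/srv/adapters", "my-lora"))
```

Because `os.path.realpath` follows symlinks, an adapter directory that is a symlink pointing outside the cache would also be rejected. `os.path.normpath` would allow such links while still catching `..` segments.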

maxdebayser

I've left a comment suggesting a security improvement, but otherwise it looks good to me.

Signed-off-by: Joe Runde <[email protected]>


joerunde added 3 commits May 28, 2024 16:04

⚗️ draft lora changes

0e0f149

Signed-off-by: Joe Runde <[email protected]>

✨ add validation for LoRA requests

8ae710f

Signed-off-by: Joe Runde <[email protected]>

🔊 add lora_id to request logs

a4116d9

Signed-off-by: Joe Runde <[email protected]>

joerunde force-pushed the lora-stuff branch from 5fdb31a to a4116d9 May 28, 2024 22:38

joerunde commented May 29, 2024


joerunde marked this pull request as ready for review May 29, 2024 17:52

prashantgupta24 reviewed May 29, 2024

vllm/entrypoints/grpc/grpc_server.py

prashantgupta24 reviewed May 29, 2024

vllm/entrypoints/grpc/grpc_server.py (outdated)

njhill reviewed May 29, 2024

proto/generation.proto (outdated)

vllm/entrypoints/grpc/adapters.py (outdated)

dtrifiro reviewed May 30, 2024


joerunde changed the title ~~Lora stuff~~ Generic adapter support in the grpc server Jun 3, 2024

joerunde added 2 commits June 3, 2024 09:14

♻️ lora_id -> adapter_id

b9c7a45

Signed-off-by: Joe Runde <[email protected]>

♻️ refactoring suggestions from Nick

5c1b09a

Signed-off-by: Joe Runde <[email protected]>

njhill reviewed Jun 3, 2024


joerunde added 2 commits June 3, 2024 11:24

♻️ move file loads into separate threadpool

7ebed78

Signed-off-by: Joe Runde <[email protected]>

Merge branch 'main' into lora-main

e2be418

maxdebayser reviewed Jun 11, 2024

maxdebayser approved these changes Jun 11, 2024

🔒 deny path traversal

c2bb957

Signed-off-by: Joe Runde <[email protected]>

joerunde merged commit 79b7364 into main Jun 11, 2024
15 checks passed

njhill deleted the lora-stuff branch June 13, 2024 16:42
