Prime

About | Architecture | Usage | Customization

About

Prime is a one-click toolkit for provisioning servers to deploy and serve Large Language Models (LLMs).

  • Cloud provider agnostic for flexible deployments
  • Fine-grained access controls with ACL permission system
  • Out-of-the-box inference playground support with client-side PII redaction
  • Inference engine powered by text-generation-inference

Architecture

Generally, the idea behind Prime is simple: it acts as a managed interface that connects to other inference-providing services (e.g. Paperspace, CoreWeave, Fluidstack).

System

  1. Users authenticate via passwordless email magic links
  2. Users are approved by admin users
  3. There is a hierarchy of permission ACLs
  4. Users can create inference servers across any ML inference provider that is implemented (by extending BaseProvider); see the sketch after this list.
     a. Currently, only Paperspace is supported.
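
Roughly, implementing a new provider might look like the following TypeScript sketch. The interface shape, method names, and endpoints below are assumptions for illustration, not Prime's actual BaseProvider definition:

// Hypothetical sketch: the shapes and method names below are assumptions,
// not Prime's actual BaseProvider interface.
interface MachineSpec {
  machineType: string; // e.g. a GPU machine type such as "A100"
  templateId: string;  // OS template id
}

abstract class BaseProvider {
  abstract createMachine(spec: MachineSpec): Promise<string>; // returns a machine id
  abstract terminateMachine(machineId: string): Promise<void>;
}

// A new provider wraps one vendor's provisioning API behind the shared interface.
class ExampleProvider extends BaseProvider {
  constructor(private readonly apiKey: string) {
    super();
  }

  async createMachine(spec: MachineSpec): Promise<string> {
    // Call the vendor's provisioning API and return the new machine's id.
    const res = await fetch("https://api.example-cloud.test/machines", {
      method: "POST",
      headers: { Authorization: `Bearer ${this.apiKey}` },
      body: JSON.stringify(spec),
    });
    const { id } = (await res.json()) as { id: string };
    return id;
  }

  async terminateMachine(machineId: string): Promise<void> {
    await fetch(`https://api.example-cloud.test/machines/${machineId}`, {
      method: "DELETE",
      headers: { Authorization: `Bearer ${this.apiKey}` },
    });
  }
}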

Supported cloud providers

Paperspace

Paperspace is the only cloud GPU provider currently supported out-of-the-box.

Account

To use Paperspace, you will need an account (email, password) and an API key. Note that:

  1. Credentials for each ML provider are shared across all users of a Prime server. In other words, users cannot deploy machines with their individual accounts; a single shared account is used. Therefore, we highly recommend only adding users you trust.
  2. Paperspace restricts the number and types of machines a new account is allowed to deploy. Therefore, depending on your use case, you will have to request access and limit increases for the machine types you want to deploy.

Usage

Environment setup

See .env.sample for setting up your environment variables correctly. Put your variables in a file named .env:

cp .env.sample .env
vim .env

To run Prime (skip if using Docker), you will need:

  • A Postgres database for Prime (or use the test one; see Run without Docker)
  • SendGrid credentials (or use the test ones; see Run without Docker)

For deploying TGI to Paperspace, you will need:

  • A Postgres database for TGI logs
  • Optionally, Hugging Face credentials for running gated models (e.g. Llama 2) or private ones (e.g. your own org's fine-tuned models)
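
As a rough illustration, a filled-in .env might contain entries along these lines. The variable names here are placeholders; consult .env.sample for the actual names:

# Placeholder names; see .env.sample for the real variable names.
DATABASE_URL=postgresql://user:password@localhost:5432/prime
SENDGRID_API_KEY=SG.xxxxxxxx
PAPERSPACE_API_KEY=xxxxxxxx
HUGGING_FACE_TOKEN=hf_xxxxxxxx  # only needed for gated or private models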

Run with Docker

  1. Install Docker
  2. Install Tilt
  3. Run tilt up
  4. (Optionally) Press space to open the Tilt manager
  5. To shut down all containers, run tilt down

Services will be started at the local addresses listed in the Tilt manager.

If this is your first time logging in, a new user will be created with email [email protected] with ADMIN privileges.

Run without Docker

The Docker container makes a new Next.js production build on each save; although this is optimized via dependency diffing, it is not as instantaneous as true HMR.

To make use of Next.js hot reloading and bring your iteration cycles down from ~5s/change to ~50ms/change, it is better to develop locally:

  1. Either install Postgres (a great utility for macOS users is postgres.app) or selectively run the Postgres container.
     a. Run npx prisma generate to generate the Prisma client from the db schema.
  2. Either slot in SendGrid credentials or selectively run the Mailhog SMTP container.
  3. If you don't have pnpm, install it via npm i -g pnpm.
  4. Run pnpm install and pnpm run dev.
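
Taken together:

npm i -g pnpm   # skip if pnpm is already installed
pnpm install
pnpm run dev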

Setup Postgres

Use the following commands to set up your Postgres database, whether running with or without Docker.

Using Prisma:

  a. Run npx prisma generate to generate the Prisma client.
  b. Run npx prisma migrate dev --name init to generate the initial migration file.
  c. Run npx prisma migrate deploy to deploy the migration, creating the necessary tables etc.
  d. Run npx prisma db seed to create the initial admin user for testing.
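
As a single sequence:

npx prisma generate
npx prisma migrate dev --name init
npx prisma migrate deploy
npx prisma db seed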

[Optional] Setup log forwarding database

You can optionally instrument your deployment of text-generation-inference to forward logs via Fluent Bit. You can host the log database anywhere you like and provide its connection parameters in your environment variables. After creating the database, execute the following query to create a "fluentbit" table:

-- Table: public.fluentbit
-- DROP TABLE IF EXISTS public.fluentbit;
CREATE TABLE IF NOT EXISTS public.fluentbit
(
  tag character varying COLLATE pg_catalog."default",
  "time" timestamp without time zone,
  data jsonb
)
TABLESPACE pg_default;
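
A minimal sketch of the corresponding Fluent Bit output section, assuming Fluent Bit's pgsql output plugin is used (host and credentials below are placeholders; adjust to your setup):

[OUTPUT]
    Name      pgsql
    Match     *
    Host      your-log-db-host
    Port      5432
    User      fluentbit
    Password  your-password
    Database  fluentbit
    Table     fluentbit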

Customization

You can easily extend the Prime user interface:

  • We show a limited selection of Hugging Face models that can be deployed via the UI. Depending on your use case, you may want to add or remove some. To enable deploying a public model (or a private one, assuming your key has access to it) via the UI, add it to the whitelist.
  • We show a limited selection of inference server parameters that can be configured via the UI. To modify or extend them, see INFERENCE_OPTIONS and RUN_OPTIONS in tgi.
  • We show a limited selection of Paperspace machines that can be deployed via the UI. To enable deploying other machine types, add them to the whitelist by GPU name.
  • We show a limited selection of Paperspace OS templates that can be deployed via the UI. To enable deploying other operating system templates, add them to the whitelist by id.
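
For instance, whitelisting an additional model might look something like the following hypothetical TypeScript sketch; the constant name and file layout are assumptions, and the actual whitelist lives in the Prime source:

// Hypothetical sketch: the constant name and file layout are assumptions.
export const MODEL_WHITELIST: string[] = [
  "meta-llama/Llama-2-7b-chat-hf", // gated model (requires Hugging Face credentials)
  "your-org/your-finetuned-model", // private model your key has access to
];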

License

BSD 3-Clause Clear License