Prime is a one-click toolkit for provisioning servers to deploy and serve Large Language Models (LLMs).
- Cloud provider agnostic for flexible deployments
- Fine-grained access controls with ACL permission system
- Out-of-the-box inference playground support with client-side PII redaction
- Inference engine powered by text-generation-inference
Generally, the idea behind Prime is simple: to act as a managed interface that connects to other inference-providing solutions (such as Paperspace, CoreWeave, and Fluidstack).
- TypeScript
- Frontend:
  - NextJS as React framework
  - TailwindCSS as CSS framework
  - shadcn/ui as component library
- Backend:
  - NextJS serverless functions
  - Prisma as database ORM connecting to Postgres
- Users authenticate via passwordless email magic links
- Users are approved by admin users
- There is a hierarchy of permission ACLs
- Users can create inference servers on any ML inference provider that is implemented (by extending BaseProvider). Currently, only Paperspace is supported.
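As a rough illustration of the provider extension point, the sketch below shows how a new provider might subclass a shared base class. Everything here is an assumption made for illustration: `ExampleBaseProvider`, `ServerSpec`, `ServerHandle`, and `InMemoryProvider` are hypothetical names, and the real BaseProvider in the Prime codebase may expose different methods and types.

```typescript
// Hypothetical stand-in for Prime's BaseProvider; the real class may differ.
interface ServerSpec {
  model: string; // e.g. a Hugging Face model id
  machineType: string; // provider-specific machine name
}

interface ServerHandle {
  id: string;
  provider: string;
  status: "provisioning" | "running" | "stopped";
}

abstract class ExampleBaseProvider {
  abstract readonly name: string;
  abstract createServer(spec: ServerSpec): ServerHandle;
  abstract stopServer(handle: ServerHandle): ServerHandle;
}

// An in-memory provider used only to illustrate the extension point.
class InMemoryProvider extends ExampleBaseProvider {
  readonly name = "in-memory";
  private nextId = 1;

  createServer(spec: ServerSpec): ServerHandle {
    if (!spec.model || !spec.machineType) {
      throw new Error("model and machineType are required");
    }
    const id = `${this.name}-${this.nextId++}`;
    return { id, provider: this.name, status: "provisioning" };
  }

  stopServer(handle: ServerHandle): ServerHandle {
    return { ...handle, status: "stopped" };
  }
}

// Providers are looked up by name, so adding one does not touch calling code.
const providers: Record<string, ExampleBaseProvider> = {};
const inMemory = new InMemoryProvider();
providers[inMemory.name] = inMemory;
```

The point of the design is that callers depend only on the base class, so supporting another cloud provider means adding one subclass and registering it.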
Paperspace is the only cloud GPU provider currently supported out-of-the-box.
To use Paperspace, you will need an account (email, password) and API key. Note that:
- Credentials for each ML provider are shared across all users of a Prime server. In other words, users cannot deploy machines with their individual accounts; only one account is used. Therefore, we highly recommend that you only add users you trust.
- Paperspace restricts the number and type of machines a new account is allowed to deploy. Therefore, you will have to request access and limit increases for the types of machines you want to deploy, depending on your use case.
See .env.example for setting up your environment variables correctly. Put your variables in a file named .env.
cp .env.sample .env
vim .env
To run Prime, you will need (skip if using Docker):
- A Postgres database for Prime (or use the test one, see Locally)
- SendGrid credentials (or use the test ones, see Locally)
For deploying TGI to Paperspace, you will need:
- A Postgres database for TGI logs
- See Log Database setup below
- Optionally, Hugging Face credentials for running gated models (e.g. Llama2) or private ones (i.e. your own org's finetuned models)
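Because a missing credential tends to surface only at deploy time, it can help to check the environment up front. The following is a minimal sketch, not part of Prime itself: `missingEnvVars` and `assertEnv` are hypothetical helpers, and the variable names `DATABASE_URL` and `SENDGRID_API_KEY` are placeholders. Use the names listed in the repository's example env file.

```typescript
// Variable names used below are placeholders, not necessarily the ones Prime reads.
type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws a single error listing every missing variable.
function assertEnv(env: Env, required: readonly string[]): void {
  const missing = missingEnvVars(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}

// Example with hypothetical values: SENDGRID_API_KEY is present but blank.
const exampleEnv: Env = {
  DATABASE_URL: "postgres://postgres:postgres@localhost:5432/prime",
  SENDGRID_API_KEY: "",
};
const missingInExample = missingEnvVars(exampleEnv, [
  "DATABASE_URL",
  "SENDGRID_API_KEY",
]);
```

In a Node process, `process.env` can be passed as the first argument. Reporting all missing names at once avoids fixing them one failed start at a time.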
- Install Docker
- Install Tilt
- Run tilt up
- (Optional) Press space to open the Tilt manager
- To shut down all containers, run tilt down
Services will be started at:
- Tilt manager: localhost:10350
- Frontend: localhost:3000
- Backend: localhost:3000/api
- MailHog UI: localhost:8025
- Postgres: localhost:5432 (user: postgres, pw: postgres)
If this is your first time logging in, a new user with ADMIN privileges will be created with email [email protected].
The Docker container makes a new NextJS production build on each save (this is optimized for dependency diffing, but it is not as instantaneous as true HMR).
To make use of NextJS hot reload and bring your iteration cycle down from ~5s/change to ~50ms/change, it is better to develop locally.
- Either install Postgres (a great utility for Mac users is Postgres.app) or selectively run the Postgres container.
  a. Run npx prisma generate to generate the Prisma client from the database schema.
- Either slot in SendGrid credentials or selectively run the MailHog SMTP container.
- If you don't have pnpm, install it via npm i -g pnpm.
- Run pnpm install and pnpm run dev.
Use the following commands to set up your Postgres database, whether running with or without Docker.
Using Prisma:
b. Run npx prisma migrate dev --name init to generate the initial migration file.
c. Run npx prisma migrate deploy to deploy the migration, creating the necessary tables.
d. Run npx prisma db seed to create an initial admin user for testing.
You can optionally instrument your deployment of text-generation-inference to add logs via fluentbit. You can host a log database anywhere you like, and provide the connection parameters here. After creating the database, execute the following query to create a "fluentbit" table:
-- Table: public.fluentbit
-- DROP TABLE IF EXISTS public.fluentbit;
CREATE TABLE IF NOT EXISTS public.fluentbit
(
    tag character varying COLLATE pg_catalog."default",
    "time" timestamp without time zone,
    data jsonb
)
TABLESPACE pg_default;
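Once logs are flowing into this table, reading them back is a plain SQL query. The sketch below builds a parameterized query in the `{ text, values }` form accepted by node-postgres; it only constructs the query and does not open a connection. The helper `recentLogsQuery`, the `FluentbitRow` type, and the tag value `"tgi"` are assumptions for illustration; the tag your Fluent Bit configuration writes may differ.

```typescript
// Row shape matching the columns of the fluentbit table defined above.
interface FluentbitRow {
  tag: string | null;
  time: Date | null;
  data: unknown; // jsonb payload; its structure depends on the log source
}

// Parameterized query object in the { text, values } shape used by node-postgres.
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Builds a query for the newest log rows with a given tag since a point in time.
function recentLogsQuery(tag: string, since: Date, limit = 100): QueryConfig {
  if (limit <= 0 || Math.floor(limit) !== limit) {
    throw new Error("limit must be a positive integer");
  }
  return {
    text:
      'SELECT tag, "time", data FROM public.fluentbit ' +
      'WHERE tag = $1 AND "time" >= $2 ORDER BY "time" DESC LIMIT $3',
    values: [tag, since, limit],
  };
}

// Hypothetical tag name; check your Fluent Bit output configuration.
const exampleQuery = recentLogsQuery("tgi", new Date("2024-01-01T00:00:00Z"));
```

Passing values as parameters rather than interpolating them into the SQL string keeps the query safe if the tag ever comes from user input.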
You can easily extend the Prime user interface:
- We show a limited selection of Hugging Face models that can be deployed via the UI. Depending on the use case, you may want to add or remove some. To enable deploying a public model (or a private one, assuming your key has access to it) via the UI, add it to the whitelist.
- We show a limited selection of inference server parameters that can be configured via the UI. Depending on the use case, you may want to add or remove some. To modify or extend the inference server parameters available in the UI, see INFERENCE_OPTIONS and RUN_OPTIONS in tgi.
- We show a limited selection of Paperspace machines that can be deployed via the UI. Depending on the use case, you may want to add or remove some. To enable deploying other machine types, add them to the whitelist by GPU name.
- We show a limited selection of Paperspace OS templates that can be deployed via the UI. Depending on the use case, you may want to add or remove some. To enable deploying other OS templates, add them to the whitelist by id.
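To make the whitelist idea concrete, here is a minimal sketch of filtering a provider's machine list by GPU name. The `MachineType` shape, the `GPU_WHITELIST` constant, and the sample machines are all hypothetical; the actual whitelists in the Prime codebase may be structured differently.

```typescript
// Hypothetical machine description; real provider responses carry more fields.
interface MachineType {
  label: string;
  gpu: string;
}

// Example GPU names only; edit this list to expose other machine types.
const GPU_WHITELIST: readonly string[] = ["A100", "A6000"];

// Keeps only machines whose GPU appears in the whitelist (case-insensitive).
function filterByGpu(
  machines: readonly MachineType[],
  allowed: readonly string[],
): MachineType[] {
  const normalized = allowed.map((gpu) => gpu.toLowerCase());
  return machines.filter(
    (machine) => normalized.indexOf(machine.gpu.toLowerCase()) !== -1,
  );
}

// Placeholder data standing in for whatever the provider API returns.
const availableMachines: MachineType[] = [
  { label: "machine-a", gpu: "A100" },
  { label: "machine-b", gpu: "a6000" },
  { label: "machine-c", gpu: "V100" },
];
const deployable = filterByGpu(availableMachines, GPU_WHITELIST);
```

The same pattern would apply to whitelisting models or OS templates, keyed by model name or template id instead of GPU name.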