Google's release of Gemma has made a big wave in the AI community, opening the opportunity for the open-source community to serve and fine-tune its own private "Gemini".
Serving Gemma on any cloud is easy with SkyPilot: with the serve.yaml in this directory, you can host the model with a single command.
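As a rough sketch of what such a task file looks like (illustrative only, not the actual serve.yaml; this assumes vLLM's OpenAI-compatible server, which matches the /v1/... routes used below):

```yaml
# Illustrative sketch only; see the actual serve.yaml in this directory.
envs:
  HF_TOKEN: ""  # injected via --env HF_TOKEN; lets huggingface_hub pull the gated weights

resources:
  accelerators: A100:1  # example GPU; the real file may request different accelerators
  ports: 8000

setup: |
  pip install vllm

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model google/gemma-7b-it --host 0.0.0.0 --port 8000
```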
- Apply for access to the Gemma model
Go to the application page and click Acknowledge license to apply for access to the model weights.
- Get the access token from Hugging Face
Generate a read-only access token on Hugging Face here, and make sure your Hugging Face account can access the Gemma models here.
- Install SkyPilot
pip install "skypilot-nightly[all]"
For detailed installation instructions, please refer to the installation guide.
We can host the model with a single instance:
HF_TOKEN="xxx" sky launch -c gemma serve.yaml --env HF_TOKEN
After the cluster is launched, we can access the model with the following command:
IP=$(sky status --ip gemma)
curl http://$IP:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-7b-it",
"prompt": "My favourite condiment is",
"max_tokens": 25
}' | jq .
Chat API is also supported:
IP=$(sky status --ip gemma)
curl http://$IP:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-7b-it",
"messages": [
{
"role": "user",
"content": "Hello! What is your name?"
}
],
"max_tokens": 25
}'
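The two curl calls above can also be issued from Python. Below is a minimal stdlib sketch; the helper names (`completion_payload`, `chat_payload`, `_post`) are hypothetical, but the request bodies mirror the curl examples exactly:

```python
import json
import urllib.request

def completion_payload(prompt: str, max_tokens: int = 25) -> dict:
    # Body for POST /v1/completions, same fields as the curl example.
    return {"model": "google/gemma-7b-it", "prompt": prompt, "max_tokens": max_tokens}

def chat_payload(content: str, max_tokens: int = 25) -> dict:
    # Body for POST /v1/chat/completions, same fields as the curl example.
    return {
        "model": "google/gemma-7b-it",
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }

def _post(url: str, payload: dict) -> dict:
    # Send the JSON body and decode the server's JSON reply.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a running cluster):
# ip = ...  # from: sky status --ip gemma
# reply = _post(f"http://{ip}:8000/v1/chat/completions", chat_payload("Hello!"))
```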
Using the same YAML, we can easily scale the model serving across multiple instances, regions and clouds with SkyServe:
HF_TOKEN="xxx" sky serve up -n gemma serve.yaml --env HF_TOKEN
Notice that the only change is from `sky launch` to `sky serve up`. The same YAML can be used without changes.
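The same YAML works for both commands because a SkyServe task file can include a `service` section that `sky serve up` reads and `sky launch` ignores. A sketch of what that section might contain (field names per recent SkyPilot versions; check the SkyServe docs for yours):

```yaml
# Sketch of a service section SkyServe could read from serve.yaml.
service:
  readiness_probe: /v1/models  # vLLM's OpenAI-compatible server exposes this route
  replicas: 2                  # SkyServe load-balances requests across replicas
```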
After the service is up, we can access the model with the following command:
ENDPOINT=$(sky serve status --endpoint gemma)
curl http://$ENDPOINT/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-7b-it",
"prompt": "My favourite condiment is",
"max_tokens": 25
}' | jq .
Chat API is also supported:
curl http://$ENDPOINT/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-7b-it",
"messages": [
{
"role": "user",
"content": "Hello! What is your name?"
}
],
"max_tokens": 25
}'