Can we use Litellm to make a call to API BASE which calls llama3 for text generation? #4359
Flask App:

```python
from flask import Flask, request, jsonify
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

app = Flask(__name__)

# Load the tokenizer and model using the pipeline object
model_name = 'meta-llama/Meta-Llama-3-8B'  # Adjust this path to the correct model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer)


@app.route('/')
def index():
    return 'Hello from Flask!'


@app.route('/v1/completions', methods=['POST'])
def completion():
    data = request.get_json()
    prompt = data['messages'][0].get('content', '')
    if not prompt:
        return jsonify({"error": "Prompt is required"}), 400

    # Generate text using the pipeline object
    generated_texts = text_generator(prompt, max_length=50, num_return_sequences=1)
    generated_text = generated_texts[0]['generated_text']

    # Format the response in OpenAI's completions API format
    response = {
        "model": model_name,
        "choices": [{
            "text": generated_text,
            "index": 0,
            "logprobs": None,
            "finish_reason": "length"
        }],
        "usage": {
            "prompt_tokens": len(tokenizer(prompt)['input_ids']),
            "completion_tokens": len(tokenizer(generated_text)['input_ids']),
            "total_tokens": len(tokenizer(prompt)['input_ids']) + len(tokenizer(generated_text)['input_ids'])
        }
    }
    return jsonify(response)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
API call using LiteLLM:

```python
import os

from litellm import completion

# Set environment variables if needed (not used for proxy)
os.environ["OPENAI_API_KEY"] = "anything"

# Define the messages
messages = [{"content": "Once upon a time", "role": "user"}]

# Make the request using LiteLLM
response = completion(
    model="meta-llama/Meta-Llama-3-8B",
    messages=messages,
    api_base="http://0.0.0.0:5000/v1",
    custom_llm_provider="openai"
)

# Print the response
print(response)
```

The above call to LiteLLM gives an error:
Whereas if I make a POST request using curl, it works fine.
Output:
I am referring to the documentation provided here for making a call over a custom OpenAI proxy using LiteLLM. I am not sure whether the same thing works for Hugging Face models as well. If yes, any help or assistance is appreciated.
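For reference, a direct POST matching the shape the Flask handler above expects looks roughly like this (a sketch using Python requests, since the exact curl command and its output are not shown above):

```python
import requests

# Sketch of a direct request against the Flask endpoint defined above.
# The handler reads data['messages'][0]['content'], so the payload uses
# the chat-style "messages" shape.
resp = requests.post(
    "http://0.0.0.0:5000/v1/completions",
    json={"messages": [{"role": "user", "content": "Once upon a time"}]},
)
print(resp.status_code)
print(resp.json())
```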
Answered by ishaan-jaff, Jun 24, 2024
Hi @Neel-Shah-29 your endpoint looks like a text-completion endpoint - do this instead:

```python
import os

from litellm import completion

# Set environment variables if needed (not used for proxy)
os.environ["OPENAI_API_KEY"] = "anything"

# Define the messages
messages = [{"content": "Once upon a time", "role": "user"}]

# Make the request using LiteLLM
response = completion(
    model="meta-llama/Meta-Llama-3-8B",
    messages=messages,
    api_base="http://0.0.0.0:5000/v1",
    custom_llm_provider="text-completion-openai"
)

# Print the response
print(response)
```
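One likely follow-up on the Flask side: with `custom_llm_provider="text-completion-openai"`, LiteLLM should send an OpenAI `/completions`-style payload, which carries the text in a `prompt` field rather than a `messages` list, while the handler above only reads `data['messages']`. A minimal sketch of how the handler could accept either shape, using a hypothetical `extract_prompt` helper (not part of the original code):

```python
# Hypothetical helper: pull the prompt out of either payload shape so the
# /v1/completions route can serve both the curl-style "messages" request
# and an OpenAI /completions-style "prompt" request.
def extract_prompt(data: dict) -> str:
    prompt = data.get('prompt', '')
    if isinstance(prompt, list):                 # /completions allows a list of prompts
        prompt = prompt[0] if prompt else ''
    if not prompt and data.get('messages'):      # chat-style payload, as in the original handler
        prompt = data['messages'][0].get('content', '')
    return prompt

# Inside the route, `prompt = extract_prompt(data)` would replace the
# data['messages'][0].get('content', '') line.
print(extract_prompt({"prompt": "Once upon a time"}))                     # -> "Once upon a time"
print(extract_prompt({"messages": [{"role": "user", "content": "Hi"}]}))  # -> "Hi"
```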