Can we use Litellm to make a call to API BASE which calls llama3 for text generation? #4359
Flask App:

```python
from flask import Flask, request, jsonify
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

app = Flask(__name__)

# Load the tokenizer and model using the pipeline object
model_name = 'meta-llama/Meta-Llama-3-8B'  # Adjust this path to the correct model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
text_generator = pipeline('text-generation', model=model, tokenizer=tokenizer)


@app.route('/')
def index():
    return 'Hello from Flask!'


@app.route('/v1/completions', methods=['POST'])
def completion():
    data = request.get_json()
    prompt = data['messages'][0].get('content', '')
    if not prompt:
        return jsonify({"error": "Prompt is required"}), 400

    # Generate text using the pipeline object
    generated_texts = text_generator(prompt, max_length=50, num_return_sequences=1)
    generated_text = generated_texts[0]['generated_text']

    # Format the response in OpenAI's completions API format
    response = {
        "model": model_name,
        "choices": [{
            "text": generated_text,
            "index": 0,
            "logprobs": None,
            "finish_reason": "length"
        }],
        "usage": {
            "prompt_tokens": len(tokenizer(prompt)['input_ids']),
            "completion_tokens": len(tokenizer(generated_text)['input_ids']),
            "total_tokens": len(tokenizer(prompt)['input_ids']) + len(tokenizer(generated_text)['input_ids'])
        }
    }
    return jsonify(response)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
API call using LiteLLM:

```python
import os

from litellm import completion

# Set environment variables if needed (not used for proxy)
os.environ["OPENAI_API_KEY"] = "anything"

# Define the messages
messages = [{"content": "Once upon a time", "role": "user"}]

# Make the request using LiteLLM
response = completion(
    model="meta-llama/Meta-Llama-3-8B",
    messages=messages,
    api_base="http://0.0.0.0:5000/v1",
    custom_llm_provider="openai"
)

# Print the response
print(response)
```

The above call to LiteLLM gives an error:
Whereas if I make a POST request using curl, it works fine.
Output:
I am referring to the documentation provided here for making a call over a custom OpenAI proxy using LiteLLM. I am not sure whether the same thing works for Hugging Face models as well. If yes, any help or assistance is appreciated.
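For reference, a direct POST matching the shape the Flask handler above expects looks roughly like this (a sketch using Python requests, since the exact curl command and its output are not shown above):

```python
import requests

# Sketch of a direct request against the Flask endpoint defined above.
# The handler reads data['messages'][0]['content'], so the payload uses
# the chat-style "messages" shape.
resp = requests.post(
    "http://0.0.0.0:5000/v1/completions",
    json={"messages": [{"role": "user", "content": "Once upon a time"}]},
)
print(resp.status_code)
print(resp.json())
```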
Answered by ishaan-jaff, Jun 24, 2024
Hi @Neel-Shah-29 your endpoint looks like a text-completion endpoint - do this instead:

```python
import os

from litellm import completion

# Set environment variables if needed (not used for proxy)
os.environ["OPENAI_API_KEY"] = "anything"

# Define the messages
messages = [{"content": "Once upon a time", "role": "user"}]

# Make the request using LiteLLM
response = completion(
    model="meta-llama/Meta-Llama-3-8B",
    messages=messages,
    api_base="http://0.0.0.0:5000/v1",
    custom_llm_provider="text-completion-openai"
)

# Print the response
print(response)
```
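One likely follow-up on the Flask side: with `custom_llm_provider="text-completion-openai"`, LiteLLM should send an OpenAI `/completions`-style payload, which carries the text in a `prompt` field rather than a `messages` list, while the handler above only reads `data['messages']`. A minimal sketch of how the handler could accept either shape, using a hypothetical `extract_prompt` helper (not part of the original code):

```python
# Hypothetical helper: pull the prompt out of either payload shape so the
# /v1/completions route can serve both the curl-style "messages" request
# and an OpenAI /completions-style "prompt" request.
def extract_prompt(data: dict) -> str:
    prompt = data.get('prompt', '')
    if isinstance(prompt, list):                 # /completions allows a list of prompts
        prompt = prompt[0] if prompt else ''
    if not prompt and data.get('messages'):      # chat-style payload, as in the original handler
        prompt = data['messages'][0].get('content', '')
    return prompt

# Inside the route, `prompt = extract_prompt(data)` would replace the
# data['messages'][0].get('content', '') line.
print(extract_prompt({"prompt": "Once upon a time"}))                     # -> "Once upon a time"
print(extract_prompt({"messages": [{"role": "user", "content": "Hi"}]}))  # -> "Hi"
```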