@NISH1001 NISH1001 commented Dec 12, 2024

Major Changes

  • Add evalem.nlp.metrics.llm.LLMAsJudgeMetric, which uses an LLM to evaluate predictions against references. It can be installed via the [llm] extra.

Minor Changes

  • Add evalem._base.structures.SequenceType to encapsulate Union[list, tuple, set]
  • Add imports to the package __init__.py files so NLP metrics are easier to import.
  • Upgrade major dependencies
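
As an illustration of the SequenceType change, here is a minimal sketch of how such an alias might be defined and used. The alias shape is taken from the description above; the `is_sequence` helper is hypothetical and not part of evalem, and the actual definition in evalem._base.structures may differ.

```python
from typing import Union, get_args

# Assumed shape of the alias, based on the PR description;
# the real definition lives in evalem._base.structures.
SequenceType = Union[list, tuple, set]


def is_sequence(value: object) -> bool:
    """Hypothetical helper: True when value is a list, tuple, or set."""
    # get_args unpacks the Union into a tuple of types usable by isinstance.
    return isinstance(value, get_args(SequenceType))
```

A single alias like this lets type hints and runtime checks agree on what counts as a sequence of inputs, while deliberately excluding strings and dicts.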

Usage

import os

from evalem.nlp import LLMAsJudgeMetric

MODEL = "ollama/llama3.2:3b"
API_BASE = "http://localhost:11434/v1"
PROMPT = ...  # placeholder: the judge prompt to use

references = [...]
predictions = [...]

LLMAsJudgeMetric(
    model=MODEL,
    api_base=API_BASE,
    api_key=os.environ.get("OPENAI_API_KEY"),
    # api_key=None,
    n_tries=3,
    prompt=PROMPT,
    debug=True,
).compute(
    references=references,
    predictions=predictions,
)
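
For readers unfamiliar with the LLM-as-judge pattern, the sketch below shows the general idea under stated assumptions. It is not the implementation in this PR. The `judge_with_retries` function and the `fake_judge` stand-in are hypothetical, and treating `n_tries` as "retry until the judge returns a usable score" is an assumption about what that parameter means.

```python
from typing import Callable, List, Optional, Sequence


def judge_with_retries(
    judge: Callable[[str, str], Optional[float]],
    references: Sequence[str],
    predictions: Sequence[str],
    n_tries: int = 3,
) -> List[Optional[float]]:
    """Score each (reference, prediction) pair, retrying while the judge returns None."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    scores: List[Optional[float]] = []
    for reference, prediction in zip(references, predictions):
        score: Optional[float] = None
        for _ in range(n_tries):
            score = judge(reference, prediction)
            if score is not None:
                break  # usable score obtained, stop retrying
        scores.append(score)
    return scores


def fake_judge(reference: str, prediction: str) -> Optional[float]:
    """Hypothetical stand-in for an LLM call: case-insensitive exact match."""
    return 1.0 if reference.strip().lower() == prediction.strip().lower() else 0.0
```

In a real setup the `judge` callable would send the prompt plus the pair to the model and parse a score from the reply; a stub keeps the sketch runnable without a model server.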

@NISH1001 NISH1001 merged commit e3a9c46 into develop Dec 12, 2024
2 of 5 checks passed
@NISH1001 NISH1001 deleted the feature/llm-as-judge branch December 12, 2024 21:35