Releases: NASA-IMPACT/evalem
nlp and cv namespace segregation
Disclaimer: This creates breaking changes, but only at the namespace level. All the previous `evalem.models`, `evalem.metrics`, etc. now reside at `evalem.nlp.models`, `evalem.nlp.metrics`.
With this release, evalem has both nlp and cv namespace segregation:
- `evalem.nlp`
- `evalem.cv`
Both of these have:
- models
- metrics
- evaluation pipeline
All of these are derived from the bases at `evalem._base`.
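For example, a pre-existing NLP import would migrate as follows (a minimal sketch, using class names from the earlier releases below):

```python
# Before this release:
# from evalem.models import TextClassificationHFPipelineWrapper
# from evalem.metrics import F1Metric

# After the nlp/cv namespace segregation:
from evalem.nlp.models import TextClassificationHFPipelineWrapper
from evalem.nlp.metrics import F1Metric
```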
v0.0.3-alpha.1
This release fixes a few setup-related misconfigurations. See #16.
v0.0.3-alpha
This release adds a simple pipeline abstraction over the existing `ModelWrapper`, `Metric`, and `Evaluator` components.
Changelog
Major
- `evalem.pipelines.SimpleEvaluationPipeline` is added, wrapping existing model wrappers, metrics, and evaluators into a single coherent abstraction. See PR.
- More semantic metrics like BLEU, ROUGE, and METEOR are added. See PR.
Minor
- Test suites are refactored. For example, the model and pipeline test suites are parameterized through the `conftest.py` paradigm (see the sketch below).
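As a rough illustration of that paradigm (generic pytest usage, not the project's actual fixtures; all names here are hypothetical):

```python
# conftest.py -- hypothetical fixtures shared across test modules
import pytest

@pytest.fixture(params=["text-classification", "question-answering"])
def task_name(request):
    # Any test that depends on this fixture runs once per task.
    return request.param
```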
Usage
```python
from evalem.pipelines import SimpleEvaluationPipeline
from evalem.models import TextClassificationHFPipelineWrapper
from evalem.evaluators import TextClassificationEvaluator

# can switch to any implemented wrapper
model = TextClassificationHFPipelineWrapper()

# can switch to any other evaluator implementation
evaluator = TextClassificationEvaluator()

# initialize the pipeline
eval_pipe = SimpleEvaluationPipeline(model=model, evaluators=evaluator)

# inputs are the task inputs; references are the ground truths
results = eval_pipe(inputs, references)
# or
results = eval_pipe.run(inputs, references)
```
[alpha] Initial release
This release adds the initial metrics and model components:
1) Metrics
We can import various metrics from `evalem.metrics`:
- `BasicMetrics` and `SemanticMetrics` can be used.
- Basic metrics are:
  - `F1Metric`
  - `RecallMetric`
  - `PrecisionMetric`
  - `ConfusionMatrix`
  - `AccuracyMetric`
  - `ExactMatchMetric`
- Semantic metrics include `BertScore` and `BartScore`.
These metrics can be used independently to evaluate the predictions from upstream models using references/ground-truths.
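For instance, a single metric might be used on its own like this (a minimal sketch; the exact call signature is an assumption and may differ from the actual API):

```python
from evalem.metrics import F1Metric

predictions = ["positive", "negative", "positive"]  # upstream model outputs
references = ["positive", "positive", "positive"]   # ground truths

# Assumes metric instances are directly invocable on predictions/references.
metric = F1Metric()
result = metric(predictions=predictions, references=references)
print(result)
```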
2) ModelWrapper
`evalem.models` includes various model wrapper implementations. See PRs this and this.
- `evalem.models.QuestionAnsweringHFPipelineWrapper` and `evalem.models.TextClassificationHFPipelineWrapper` are now the main wrappers for QA and Text Classification tasks respectively.
- These also have better parameter initialization, allowing any suitable models and tokenizers to be used along with device types. An `hf_params` dict is also provided as a parameter and is used to initialize the HF pipeline.
- The model wrappers utilize 2 distinct processing parameters (one for pre-processing and one for post-processing), each of which should be a `Callable` (a lambda function, an external callable module, etc.) and can be swapped to customize pre/post-processing, as shown in the sketch below.
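Putting those pieces together, a wrapper might be configured like this (a sketch; the keyword names and checkpoint are assumptions based on the description above):

```python
from evalem.models import QuestionAnsweringHFPipelineWrapper

wrapper = QuestionAnsweringHFPipelineWrapper(
    model="distilbert-base-cased-distilled-squad",  # hypothetical HF checkpoint
    device="cpu",                                   # device type
    hf_params={},                                   # dict forwarded to the HF pipeline
)

# Pre/post-processing hooks are plain Callables (parameter names illustrative):
# wrapper = QuestionAnsweringHFPipelineWrapper(
#     preprocessor=lambda inputs: inputs,
#     postprocessor=lambda predictions: predictions,
# )
```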
3) Evaluator
Evaluators provide an abstraction/container over metrics so they can be evaluated as a group.
See PRs this, this and this
We have 2 different evaluator implementations:
- `evalem.evaluators.QAEvaluator` for evaluating QA metrics
- `evalem.evaluators.TextClassificationEvaluator` for text classification
We can also directly use `evalem.evaluators._base.Evaluator` to create our own custom evaluator object.
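A custom evaluator might then be composed like this (a sketch; the constructor signature is an assumption):

```python
from evalem.evaluators._base import Evaluator
from evalem.metrics import AccuracyMetric, F1Metric

predictions = ["positive", "negative"]  # upstream model outputs
references = ["positive", "positive"]   # ground truths

# Assumes the base Evaluator groups a list of metric instances
# and is directly invocable on predictions/references.
custom_evaluator = Evaluator(metrics=[AccuracyMetric(), F1Metric()])
results = custom_evaluator(predictions=predictions, references=references)
```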