diff --git a/cohort_2/week2/1. Synthetic-Transactions-Logfire.ipynb b/cohort_2/week2/1. Synthetic-Transactions-Logfire.ipynb
new file mode 100644
index 0000000..dd1c1d9
--- /dev/null
+++ b/cohort_2/week2/1. Synthetic-Transactions-Logfire.ipynb
@@ -0,0 +1,1015 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Week 2: Fine-tuning Embeddings for RAG Applications\n",
+ "\n",
+ "> **Prerequisites**: Please complete Week 1's notebooks before starting this one. The concepts build directly on that foundation.\n",
+ "\n",
+ "Most teams avoid fine-tuning their embedding models, thinking they need tons of data and complex setups. But our experience shows that with just 100-200 thoughtfully created examples, you can significantly improve your model's performance over general-purpose embeddings.\n",
+ "\n",
+ "## Why This Matters\n",
+ "\n",
+ "Fine-tuning embedding models gives several key advantages over using general-purpose models. While it takes more upfront work to create training data and run the fine-tuning process, this investment typically leads to:\n",
+ "\n",
+ "1. Much better accuracy on your specific tasks\n",
+ "2. Lower running costs through using smaller, more efficient models\n",
+ "3. Better handling of domain-specific language and context\n",
+ "\n",
+ "## What You'll Learn\n",
+ "\n",
+ "Through this tutorial, you'll discover how to:\n",
+ "\n",
+ "1. Build Quality Training Data\n",
+ "\n",
+ "- Generate synthetic transactions systematically\n",
+ "- Review and validate examples manually\n",
+ "- Create diverse, representative samples\n",
+ "\n",
+ "2. Structure Your Dataset\n",
+ "\n",
+ "- Define clear schema for transactions\n",
+ "- Add meaningful metadata\n",
+ "- Ensure data consistency\n",
+ "\n",
+ "3. Establish Performance Baselines\n",
+ "\n",
+ "- Measure initial embedding performance\n",
+ "- Calculate retrieval metrics\n",
+ "- Set up evaluation pipelines\n",
+ "\n",
+ "By the end of this notebook, you'll know how to create high-quality synthetic data for fine-tuning embedding models. This prepares you for the hands-on fine-tuning work in notebooks 2 and 3, where we'll use both Cohere's managed service and open-source tools to improve retrieval performance.\n",
+ "\n",
+ "## Case Study: Ramp's Transaction Categorization\n",
+ "\n",
+ "> Read about Ramp's successful case study using fine-tuned embeddings [here](https://engineering.ramp.com/transaction-embeddings)\n",
+ "\n",
+ "We'll follow Ramp's successful approach to fine-tuning embeddings for transaction categorization. Their team demonstrated that even with unique customer categories, a fine-tuned model could effectively generalize to new customers and scenarios.\n",
+ "\n",
+ "Using synthetic financial data, we'll walk through their process step by step:\n",
+ "\n",
+ "1. **Data Understanding**: Learn what transaction data looks like and how to structure it for embedding\n",
+ "2. **Synthetic Data Generation**: Create realistic, challenging test cases using large language models\n",
+ "3. **Model Fine-tuning**: Compare performance between base and fine-tuned models\n",
+ "\n",
+ "Throughout this process, we'll use Logfire to track experiments and measure improvements systematically. This workflow will give you a practical foundation for fine-tuning embeddings in your own applications.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Understanding Transactions\n",
+ "\n",
+ "To fine-tune our model effectively, we need to understand the transaction data we're working with.\n",
+ "\n",
+ "Typical Transaction Fields:\n",
+ "\n",
+ "- Merchant Name: The vendor or service provider's name.\n",
+ "- Merchant Category Code (MCC): General category of the transaction (e.g., Restaurants).\n",
+ "- Department Name: The company department responsible for the transaction.\n",
+ "- Location: Where the transaction took place.\n",
+ "- Amount: The transaction's monetary value.\n",
+ "- Spend Program Name: Specific budget or spend limit allocated.\n",
+ "- Trip Name: If the transaction occurred during travel.\n",
+ "\n",
+ "We can see an example below:\n",
+ "\n",
+ "```\n",
+ "Name : Beirut Bakery\n",
+ "Category: Restaurants, Cafeteria\n",
+ "Department: Engineering\n",
+ "Location: Calgary, AB, Canada\n",
+ "Amount: 56.67 CAD\n",
+ "Card: Ramp's Physical Card\n",
+ "Trip Name: unknown\n",
+ "```\n",
+ "\n",
+ "This is a difficult task because each transaction carries very little information. Additionally, since each company defines its own categories with implicit rules, it's difficult for a general-purpose embedding model to classify these transactions without fine-tuning.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Generating Synthetic Transactions\n",
+ "\n",
+ "We'll generate a synthetic transaction dataset in 3 steps\n",
+ "\n",
+ "1. **Initial Examples** : We'll create a batch of basic transactions using `gpt-4o-mini` and manually review them to create a small dataset of good and bad examples.\n",
+ "\n",
+ "2. **Refine Data** : We'll then randomly select a subset of these initial examples and use them to generate new examples that are more challenging by adding them to the prompt as few shot examples.\n",
+ "\n",
+ "3. **Logfire Evaluation** : We'll then use `Logfire` to evaluate the recall@1,3,5 and mrr@1,3,5 of our initial and refined examples.\n",
+ "\n",
+ "We recommend using `ChatGPT` to evaluate these examples and get a sense for what makes a good or bad example during this process. We'll iterate on this until we've generated at least 300 examples. This ensures that we have enough examples to fine-tune a Cohere re-ranker (which requires a minimum of 256 examples) or a sentence transformer model, while still leaving enough examples for a held-out evaluation set.\n",
+ "\n",
+ "There are two key advantages to iteratively generating our dataset in small batches\n",
+ "\n",
+ "1. **Practicality** : A small amount of data is easy to label\n",
+ "2. **Randomness** : By constantly sampling from our growing number of examples, we're able to have enough randomness to create a diverse dataset. This helps us to avoid potential issues with diversity and quality that doing a single pass of data generation can introduce.\n",
+ "\n",
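+ "As a minimal sketch of the sampling described in point 2, the snippet below draws few-shot examples from a growing pool of approved examples. The helper name `sample_few_shot` and the example pool are illustrative assumptions, not part of this notebook's code.\n",
+ "\n",
+ "```python\n",
+ "import random\n",
+ "\n",
+ "\n",
+ "def sample_few_shot(pool, k, seed=None):\n",
+ "    # Draw up to k distinct examples, capped at the current pool size\n",
+ "    rng = random.Random(seed)\n",
+ "    return rng.sample(pool, min(k, len(pool)))\n",
+ "\n",
+ "\n",
+ "# Hypothetical pool of approved examples that grows after each labeling round\n",
+ "approved = ['example-1', 'example-2', 'example-3']\n",
+ "few_shot = sample_few_shot(approved, k=10)\n",
+ "print(len(few_shot))  # 3, because the pool is smaller than k\n",
+ "```\n",
+ "\n",
+ "Capping `k` at the pool size matters early on, since `random.sample` raises a `ValueError` when asked for more items than the pool contains.\n",
+ "\n",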
+ "### Step 1 : Generating our initial transactions.\n",
+ "\n",
+ "We'll start by generating our initial transactions using a simple prompt. These examples will be simple and not especially challenging, but they're a useful starting point.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/ivanleo/Documents/coding/systematically-improving-rag/cohort_2/.venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "from pydantic import BaseModel, field_validator, ValidationInfo\n",
+ "from openai import AsyncOpenAI\n",
+ "import instructor\n",
+ "from typing import Optional\n",
+ "from textwrap import dedent\n",
+ "import random\n",
+ "import json\n",
+ "import asyncio\n",
+ "\n",
+ "# Load in pre-defined categories\n",
+ "categories = json.load(open(\"data/categories.json\"))\n",
+ "\n",
+ "\n",
+ "# Define a Pydantic model that can represent the same transaction data that Ramp was using\n",
+ "class Transaction(BaseModel):\n",
+ " merchant_name: str\n",
+ " merchant_category: list[str]\n",
+ " department: str\n",
+ " location: str\n",
+ " amount: float\n",
+ " spend_program_name: str\n",
+ " trip_name: Optional[str] = None\n",
+ " expense_category: str\n",
+ "\n",
+ " def format_transaction(self):\n",
+ " return dedent(\n",
+ " f\"\"\"\n",
+ " Name : {self.merchant_name}\n",
+ " Category: {\", \".join(self.merchant_category)}\n",
+ " Department: {self.department}\n",
+ " Location: {self.location}\n",
+ " Amount: {self.amount}\n",
+ " Card: {self.spend_program_name}\n",
+ " Trip Name: {self.trip_name if self.trip_name else \"unknown\"}\n",
+ " \"\"\"\n",
+ " )\n",
+ "\n",
+ " @field_validator(\"expense_category\")\n",
+ " @classmethod\n",
+ " def validate_expense_category(cls, v, info: ValidationInfo):\n",
+ " # We use this later to read in the generated transactions\n",
+ " if not info.context:\n",
+ " return v\n",
+ "\n",
+ " # Validate that we've generated the right expense category\n",
+ " assert v == info.context[\"category\"][\"category\"], (\n",
+ " f\"The transaction must have an expense category of {info.context['category']['category']} instead of {v}\"\n",
+ " )\n",
+ " return v\n",
+ "\n",
+ "\n",
+ "client = instructor.from_openai(AsyncOpenAI())\n",
+ "\n",
+ "\n",
+ "async def generate_transaction(category):\n",
+ " return await client.chat.completions.create(\n",
+ " model=\"gpt-4o-mini\",\n",
+ " messages=[\n",
+ " {\n",
+ " \"role\": \"system\",\n",
+ " \"content\": \"\"\"Generate a transaction for a tech company that could be filed under the category of {{ category }}. This should be distinct from the sample_transactions provided in the categories.json file\n",
+ "\n",
+ " - The spend program is a specific spending authority or allocation that has defined limits, rules, and permissions. It's like a virtual card or spending account set up for a specific purpose.\n",
+ " - Merchant Category Name is a label that best describes the merchant of the transaction.\n",
+ " - Merchant name should be realistic and not obviously made up.\n",
+ " \"\"\",\n",
+ " }\n",
+ " ],\n",
+ " context={\"category\": category},\n",
+ " response_model=Transaction,\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Generate 5 initial transactions and choose the category randomly\n",
+ "coros = []\n",
+ "for _ in range(5):\n",
+ " coros.append(generate_transaction(random.choice(categories)))\n",
+ "\n",
+ "transactions = await asyncio.gather(*coros)\n",
+ "with open(\"./data/generated_transactions.jsonl\", \"a\") as f:\n",
+ " for transaction in transactions:\n",
+ " f.write(transaction.model_dump_json() + \"\\n\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
Transaction ( \n",
+ " merchant_name ='TechBank USA' ,\n",
+ " merchant_category =[ 'Bank & Transaction Fees' ] ,\n",
+ " department ='Finance' ,\n",
+ " location ='San Francisco, CA' ,\n",
+ " amount =150.0 ,\n",
+ " spend_program_name ='Operational Expenses' ,\n",
+ " trip_name =None ,\n",
+ " expense_category ='Bank & Transaction Fees' \n",
+ ") \n",
+ " \n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mTransaction\u001b[0m\u001b[1m(\u001b[0m\n",
+ " \u001b[33mmerchant_name\u001b[0m=\u001b[32m'TechBank USA'\u001b[0m,\n",
+ " \u001b[33mmerchant_category\u001b[0m=\u001b[1m[\u001b[0m\u001b[32m'Bank & Transaction Fees'\u001b[0m\u001b[1m]\u001b[0m,\n",
+ " \u001b[33mdepartment\u001b[0m=\u001b[32m'Finance'\u001b[0m,\n",
+ " \u001b[33mlocation\u001b[0m=\u001b[32m'San Francisco, CA'\u001b[0m,\n",
+ " \u001b[33mamount\u001b[0m=\u001b[1;36m150\u001b[0m\u001b[1;36m.0\u001b[0m,\n",
+ " \u001b[33mspend_program_name\u001b[0m=\u001b[32m'Operational Expenses'\u001b[0m,\n",
+ " \u001b[33mtrip_name\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ " \u001b[33mexpense_category\u001b[0m=\u001b[32m'Bank & Transaction Fees'\u001b[0m\n",
+ "\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rich import print\n",
+ "\n",
+ "print(transactions[0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 2 : Labeling Transactions\n",
+ "\n",
+ "Now that we've generated a small set of initial transactions, please run `streamlit run label.py` to manually select transactions that you think might be difficult to classify.\n",
+ "\n",
+ "> You can modify and edit the transaction details before approving them. Hot keys of ctrl + e ( approve ) and ctrl + r ( reject ) make this process much faster. Only approved transactions will be saved to `generated_transactions.jsonl` below. We'll then use these examples to generate a new set of transactions that are more challenging.\n",
+ "\n",
+ "You can also manually override transaction details in the streamlit application. We recommend using `ChatGPT` or `Claude` to discuss and generate good defaults and examples. One prompt that I used in the chat UI was:\n",
+ "\n",
+ "```\n",
+ "I'd like to generate a transaction for a tech company that is challenging to classify into a specific category. Here are the details\n",
+ "\n",
+ "\n",
+ "\n",
+ "I'd like you to help rewrite some of the details to make it more realistic. Please stick to the following rules\n",
+ "\n",
+ "- MCCs should be realistic. If possible, let's try to use a MCC that will cover a superset of the given category\n",
+ "- Let's try to suggest a non-uniform number (Eg. not 1500 ) so that it seems more realistic\n",
+ "- The Spend Program name should be a specific spending authority or allocation that has defined limits, rules, and permissions. It's like a virtual card or spending account set up for a specific purpose. In our case, this spend program name should not be a name that directly mentions the category or merchant\n",
+ "```\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "async def generate_transaction_with_examples(category, examples: list[Transaction]):\n",
+ " return await client.chat.completions.create(\n",
+ " model=\"gpt-4o\",\n",
+ " messages=[\n",
+ " {\n",
+ " \"role\": \"system\",\n",
+ " \"content\": \"\"\"\n",
+ " Generate a potentially ambiguous business transaction that could reasonably be categorized as {{ category }} or another similar category. The goal is to create transactions that challenge automatic categorization systems by having characteristics that could fit multiple categories.\n",
+ "\n",
+ "\n",
+ " Available categories in the system:\n",
+ " \n",
+ " {% for category_option in categories %}\n",
+ " {{ category_option[\"category\"] }}\n",
+ " {% endfor %}\n",
+ " \n",
+ "\n",
+ " \n",
+ " The transaction should:\n",
+ " 1. Use a realistic but non-obvious merchant name (international names welcome); don't use names that are obviously made up\n",
+ " 2. Include a plausible but non-rounded amount with decimals (e.g., $1247.83)\n",
+ " 3. Be difficult to categorize definitively (could fit in multiple categories)\n",
+ " 4. Merchant Category Name(s) should not reference the category at all and should be able to be used for other similar categories if possible.\n",
+ "\n",
+ " Here are some good examples of transactions that were previously generated for other categories.\n",
+ "\n",
+ " {% for example in examples %}\n",
+ " {{ example.model_dump_json() }}\n",
+ " {% endfor %}\n",
+ " \"\"\",\n",
+ " }\n",
+ " ],\n",
+ " context={\"category\": category, \"examples\": examples, \"categories\": categories},\n",
+ " response_model=Transaction,\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 240,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with open(\"./data/cleaned.jsonl\", \"r\") as f:\n",
+ " sample_transactions = []\n",
+ " for line in f:\n",
+ " sample_transactions.append(Transaction(**json.loads(line)))\n",
+ "\n",
+ "\n",
+ "coros = []\n",
+ "for _ in range(20):\n",
+ " coros.append(\n",
+ " generate_transaction_with_examples(\n",
+ " random.choice(categories), random.sample(sample_transactions, 10)\n",
+ " )\n",
+ " )\n",
+ "\n",
+ "transactions = await asyncio.gather(*coros)\n",
+ "\n",
+ "with open(\"./data/generated_transactions.jsonl\", \"w\") as f:\n",
+ " for transaction in transactions:\n",
+ " f.write(transaction.model_dump_json() + \"\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Step 3 : Evaluating Recall and MRR Performance.\n",
+ "\n",
+ "> Make sure that you've set up Logfire before continuing with the following section. For instructions on how to do so, please refer to `week-0/3. Using Pydantic Evals`\n",
+ "\n",
+ "Remember that we're building a model that can suggest transaction categories to a user. To do so, we'll only be able to show the top 3-5 results and we want to make sure that the correct result is ranked as highly as possible.\n",
+ "\n",
+ "Therefore, we'll be using recall and mrr to evaluate our model's performance here.\n",
+ "\n",
+ "- `recall` : This measures whether the correct category is in the top k retrieved results.\n",
+ "- `mrr` : This measures how highly the correct category is ranked in the retrieved results.\n",
+ "\n",
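+ "To make the two metrics concrete, here is a minimal sketch of how recall@k and mrr@k can be computed for a single query. The function names `recall_at_k` and `mrr_at_k` are illustrative assumptions and are not necessarily how `get_metrics_at_k` in `helpers` implements them.\n",
+ "\n",
+ "```python\n",
+ "def recall_at_k(predictions, labels, k):\n",
+ "    # Fraction of the expected labels that appear in the top-k predictions\n",
+ "    top_k = predictions[:k]\n",
+ "    return sum(label in top_k for label in labels) / len(labels)\n",
+ "\n",
+ "\n",
+ "def mrr_at_k(predictions, labels, k):\n",
+ "    # Reciprocal rank of the first correct prediction within the top k, else 0\n",
+ "    for rank, prediction in enumerate(predictions[:k], start=1):\n",
+ "        if prediction in labels:\n",
+ "            return 1 / rank\n",
+ "    return 0.0\n",
+ "\n",
+ "\n",
+ "# Hypothetical ranking for one transaction whose correct category is 'Software'\n",
+ "ranked = ['Travel', 'Software', 'Meals']\n",
+ "print(recall_at_k(ranked, ['Software'], k=1))  # 0.0\n",
+ "print(recall_at_k(ranked, ['Software'], k=3))  # 1.0\n",
+ "print(mrr_at_k(ranked, ['Software'], k=3))  # 0.5\n",
+ "```\n",
+ "\n",
+ "With a single expected label per transaction, mrr@1 and recall@1 are the same number, which is why those two scores are identical in the results reported later in this notebook.\n",
+ "\n",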
+ "Ideally we want a model with a high recall and mrr. This means that when we display the results, they're likely to be relevant to the user. By measuring the recall and mrr, we're able to ensure that we're consistently generating transactions that the model finds challenging.\n",
+ "\n",
+ "We're using `lancedb` here since it provides an easy way to perform these evaluations, with automatic batching of embeddings for queries and data along with a single API for vector search and reranking.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import lancedb\n",
+ "from lancedb.pydantic import LanceModel, Vector\n",
+ "from lancedb.embeddings import get_registry\n",
+ "\n",
+ "func = get_registry().get(\"openai\").create(name=\"text-embedding-3-small\")\n",
+ "categories = json.load(open(\"data/categories.json\"))\n",
+ "\n",
+ "\n",
+ "class Category(LanceModel):\n",
+ " text: str = func.SourceField()\n",
+ " embedding: Vector(func.ndims()) = func.VectorField()\n",
+ "\n",
+ "\n",
+ "db = lancedb.connect(\"./lancedb\")\n",
+ "table = db.create_table(\"categories\", schema=Category, mode=\"overwrite\")\n",
+ "\n",
+ "\n",
+ "table.add(\n",
+ " [\n",
+ " {\n",
+ " \"text\": category[\"category\"],\n",
+ " }\n",
+ " for category in categories\n",
+ " ]\n",
+ ")\n",
+ "\n",
+ "table.create_fts_index(field_names=[\"text\"], replace=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can then read in our transactions that we've manually annotated using the `label.py` streamlit application"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "Transaction ( \n",
+ " merchant_name ='Tooly' ,\n",
+ " merchant_category =[ 'Survey Software' , 'SaaS' ] ,\n",
+ " department ='Marketing' ,\n",
+ " location ='San Francisco, CA' ,\n",
+ " amount =2000.0 ,\n",
+ " spend_program_name ='Annual Marketing Technology Budget' ,\n",
+ " trip_name =None ,\n",
+ " expense_category ='Subscriptions & Memberships' \n",
+ ") \n",
+ " \n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mTransaction\u001b[0m\u001b[1m(\u001b[0m\n",
+ " \u001b[33mmerchant_name\u001b[0m=\u001b[32m'Tooly'\u001b[0m,\n",
+ " \u001b[33mmerchant_category\u001b[0m=\u001b[1m[\u001b[0m\u001b[32m'Survey Software'\u001b[0m, \u001b[32m'SaaS'\u001b[0m\u001b[1m]\u001b[0m,\n",
+ " \u001b[33mdepartment\u001b[0m=\u001b[32m'Marketing'\u001b[0m,\n",
+ " \u001b[33mlocation\u001b[0m=\u001b[32m'San Francisco, CA'\u001b[0m,\n",
+ " \u001b[33mamount\u001b[0m=\u001b[1;36m2000\u001b[0m\u001b[1;36m.0\u001b[0m,\n",
+ " \u001b[33mspend_program_name\u001b[0m=\u001b[32m'Annual Marketing Technology Budget'\u001b[0m,\n",
+ " \u001b[33mtrip_name\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ " \u001b[33mexpense_category\u001b[0m=\u001b[32m'Subscriptions & Memberships'\u001b[0m\n",
+ "\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rich import print\n",
+ "\n",
+ "print(transactions[0])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pydantic_evals import Dataset, Case\n",
+ "\n",
+ "cases = []\n",
+ "for line in open(\"./data/cleaned.jsonl\").readlines():\n",
+ " transaction = Transaction(**json.loads(line))\n",
+ " cases.append(\n",
+ " Case(\n",
+ " inputs=transaction.format_transaction(),\n",
+ " expected_output=[transaction.expense_category],\n",
+ " )\n",
+ " )\n",
+ "\n",
+ "dataset = Dataset(cases=cases)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from helpers import get_metrics_at_k, task\n",
+ "from dataclasses import dataclass\n",
+ "from pydantic_evals.evaluators import Evaluator, EvaluatorContext\n",
+ "from lancedb.table import Table\n",
+ "from concurrent.futures import ThreadPoolExecutor\n",
+ "from functools import partial\n",
+ "import logfire\n",
+ "\n",
+ "\n",
+ "@dataclass\n",
+ "class RagMetricsEvaluator(Evaluator):\n",
+ " async def evaluate(self, ctx: EvaluatorContext[str, str]) -> dict[str, float]:\n",
+ " predictions = ctx.output\n",
+ " labels = ctx.expected_output\n",
+ " metrics = get_metrics_at_k(metrics=[\"mrr\", \"recall\"], sizes=[1, 3, 5])\n",
+ " return {\n",
+ " metric: score_fn(predictions, labels)\n",
+ " for metric, score_fn in metrics.items()\n",
+ " }\n",
+ "\n",
+ "\n",
+ "async def retrieve_results(\n",
+ " question: str,\n",
+ " table: Table,\n",
+ " pool: ThreadPoolExecutor,\n",
+ " max_k=25,\n",
+ " reranker=None,\n",
+ "):\n",
+ " loop = asyncio.get_running_loop()\n",
+ " return await loop.run_in_executor(\n",
+ " pool,\n",
+ " partial(task, user_query=question, table=table, max_k=max_k, reranker=reranker),\n",
+ " )\n",
+ "\n",
+ "\n",
+ "logfire.configure(\n",
+ " send_to_logfire=True,\n",
+ " environment=\"experimentation\",\n",
+ " service_name=\"synthetic-transactions\",\n",
+ " console=False,\n",
+ ")\n",
+ "\n",
+ "dataset.add_evaluator(RagMetricsEvaluator())\n",
+ "with ThreadPoolExecutor(max_workers=10) as executor:\n",
+ " evaluation_result = await dataset.evaluate(\n",
+ " partial(retrieve_results, table=table, pool=executor)\n",
+ " )\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.3885 \n",
+ "mrr@3 0.4981 \n",
+ "mrr@5 0.5254 \n",
+ "recall@1 0.3885 \n",
+ "recall@3 0.6462 \n",
+ "recall@5 0.7692 \n",
+ " \n"
+ ],
+ "text/plain": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3885\u001b[0m\n",
+ "mrr@\u001b[1;36m3\u001b[0m \u001b[1;36m0.4981\u001b[0m\n",
+ "mrr@\u001b[1;36m5\u001b[0m \u001b[1;36m0.5254\u001b[0m\n",
+ "recall@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3885\u001b[0m\n",
+ "recall@\u001b[1;36m3\u001b[0m \u001b[1;36m0.6462\u001b[0m\n",
+ "recall@\u001b[1;36m5\u001b[0m \u001b[1;36m0.7692\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from tabulate import tabulate\n",
+ "\n",
+ "def format_results(result):\n",
+ " return tabulate(\n",
+ " [[item, round(value,4)] for item, value in result.averages().scores.items()],\n",
+ " headers=[\"Metric\", \"Score\"],\n",
+ " )\n",
+ "\n",
+ "print(format_results(evaluation_result))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "At this point, we've generated a large dataset of synthetic transactions that we can use to fine-tune a model. However, it's important to call out that synthetic data has its challenges.\n",
+ "\n",
+ "1. Quality and Diversity : It's difficult to ensure that the synthetic data is of high quality and diverse. We've done so by manually reviewing and selecting good examples but ultimately we need real production data to ensure that our model is able to generalise.\n",
+ "\n",
+ "2. Human Error : Manual review is great for ensuring the quality of transactions but is expensive and error-prone. It does not scale well, especially if you're trying to generate thousands of examples that you'd like humans to label manually.\n",
+ "\n",
+ "We want to treat this synthetic data as a starting point and iteratively make it better using the techniques we've discussed in this notebook. But you will need to eventually mix in production data and continue generating synthetic data in order to adequately evaluate and test the generalisation capabilities of your model.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Creating a Dataset\n",
+ "\n",
+ "We want to segregate our data into a train and evaluation set because it allows us to evaluate the performance of our model on data that it hasn't seen before. \n",
+ "\n",
+ "We use `pydantic_evals` here to segregate our dataset into two separate yml files and store them in our `data` folder. This allows us to easily share and run evaluations on our model in the subsequent notebooks later on.\n",
+ "\n",
+ "\n",
+ "If we fine-tuned our model on the same data that we evaluated it on, it would be difficult to tell if the improvements we made were due to the model generalizing better or due to overfitting. In this case, we're just going to split our data by selecting the first 80% as our training set and the remaining 20% as our evaluation set.\n",
+ "\n",
+ "In practice, you'd want to think carefully about these splits, for example by using the category to ensure that both sets contain a diverse set of examples, or by generating new labels for the evaluation set based on the training labels (e.g. Restaurants -> Dining Establishments, or randomly grouping categories together).\n",
+ "\n",
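+ "As a minimal sketch of a category-aware split, the function below shuffles and splits each category separately so that both sets cover the same categories. The name `stratified_split` and its parameters are illustrative assumptions. This notebook itself uses the simpler first-80% split.\n",
+ "\n",
+ "```python\n",
+ "import random\n",
+ "from collections import defaultdict\n",
+ "\n",
+ "\n",
+ "def stratified_split(items, get_category, train_fraction=0.8, seed=0):\n",
+ "    # Group items by category so that each category is split separately\n",
+ "    by_category = defaultdict(list)\n",
+ "    for item in items:\n",
+ "        by_category[get_category(item)].append(item)\n",
+ "\n",
+ "    rng = random.Random(seed)\n",
+ "    train, held_out = [], []\n",
+ "    for group in by_category.values():\n",
+ "        rng.shuffle(group)\n",
+ "        # Categories with a single item end up entirely in the held-out set\n",
+ "        cutoff = int(train_fraction * len(group))\n",
+ "        train.extend(group[:cutoff])\n",
+ "        held_out.extend(group[cutoff:])\n",
+ "    return train, held_out\n",
+ "\n",
+ "\n",
+ "# Hypothetical usage with the cases built earlier in this notebook:\n",
+ "# train_cases, eval_cases = stratified_split(cases, lambda case: case.expected_output[0])\n",
+ "```\n",
+ "\n",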
+ "Before we start fine-tuning our models here, we also need to make sure that the evaluation set and training set are similar. We do so by measuring the recall and mrr and verifying that they have similar values.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Use the first 80% of cases for training and the remaining 20% for evaluation\n",
+ "train_size = int(0.8 * len(cases))\n",
+ "\n",
+ "train_cases = cases[:train_size]\n",
+ "eval_cases = cases[train_size:]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_dataset_path = \"./data/train_transactions.yml\"\n",
+ "eval_dataset_path = \"./data/eval_transactions.yml\"\n",
+ "\n",
+ "train_dataset = Dataset(cases=train_cases)\n",
+ "train_dataset.to_file(train_dataset_path)\n",
+ "\n",
+ "eval_dataset = Dataset(cases=eval_cases)\n",
+ "eval_dataset.to_file(eval_dataset_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can validate that the dataset is formatted correctly by printing out the first item.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "Case ( \n",
+ " name =None ,\n",
+ " inputs ='\\nName : Tooly\\nCategory: Survey Software, SaaS\\nDepartment: Marketing\\nLocation: San Francisco, \n",
+ "CA\\nAmount: 2000.0\\nCard: Annual Marketing Technology Budget\\nTrip Name: unknown\\n' ,\n",
+ " metadata =None ,\n",
+ " expected_output =[ 'Subscriptions & Memberships' ] ,\n",
+ " evaluators =[] \n",
+ ") \n",
+ " \n"
+ ],
+ "text/plain": [
+ "\u001b[1;35mCase\u001b[0m\u001b[1m(\u001b[0m\n",
+ " \u001b[33mname\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ " \u001b[33minputs\u001b[0m=\u001b[32m'\\nName : Tooly\\nCategory: Survey Software, SaaS\\nDepartment: Marketing\\nLocation: San Francisco, \u001b[0m\n",
+ "\u001b[32mCA\\nAmount: 2000.0\\nCard: Annual Marketing Technology Budget\\nTrip Name: unknown\\n'\u001b[0m,\n",
+ " \u001b[33mmetadata\u001b[0m=\u001b[3;35mNone\u001b[0m,\n",
+ " \u001b[33mexpected_output\u001b[0m=\u001b[1m[\u001b[0m\u001b[32m'Subscriptions & Memberships'\u001b[0m\u001b[1m]\u001b[0m,\n",
+ " \u001b[33mevaluators\u001b[0m=\u001b[1m[\u001b[0m\u001b[1m]\u001b[0m\n",
+ "\u001b[1m)\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from rich import print\n",
+ "\n",
+ "dataset = Dataset.from_file(train_dataset_path)\n",
+ "print(dataset.cases[0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can see that the dataset has the input formatted correctly as well as the expected output. Now let's see if our evaluation set and training set are similar.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "260 66 \n",
+ " \n"
+ ],
+ "text/plain": [
+ "\u001b[1;36m260\u001b[0m \u001b[1;36m66\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "train_dataset = Dataset.from_file(train_dataset_path)\n",
+ "eval_dataset = Dataset.from_file(eval_dataset_path)\n",
+ "\n",
+ "print(len(train_dataset.cases), len(eval_dataset.cases))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from lancedb import connect\n",
+ "import concurrent.futures\n",
+ "\n",
+ "datasets = [[\"train\", train_dataset], [\"eval\", eval_dataset]]\n",
+ "\n",
+ "db = connect(\"./lancedb\")\n",
+ "table = db.open_table(\"categories\")\n",
+ "\n",
+ "results = []\n",
+ "for dataset_name, dataset_partition in datasets:\n",
+ " with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:\n",
+ " dataset_partition.add_evaluator(RagMetricsEvaluator())\n",
+ " result = await dataset_partition.evaluate(\n",
+ " partial(retrieve_results, table=table, pool=executor)\n",
+ " )\n",
+ " results.append(result)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.3885 \n",
+ "mrr@3 0.4981 \n",
+ "mrr@5 0.5254 \n",
+ "recall@1 0.3885 \n",
+ "recall@3 0.6462 \n",
+ "recall@5 0.7692 \n",
+ " \n"
+ ],
+ "text/plain": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3885\u001b[0m\n",
+ "mrr@\u001b[1;36m3\u001b[0m \u001b[1;36m0.4981\u001b[0m\n",
+ "mrr@\u001b[1;36m5\u001b[0m \u001b[1;36m0.5254\u001b[0m\n",
+ "recall@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3885\u001b[0m\n",
+ "recall@\u001b[1;36m3\u001b[0m \u001b[1;36m0.6462\u001b[0m\n",
+ "recall@\u001b[1;36m5\u001b[0m \u001b[1;36m0.7692\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(format_results(results[0]))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.3788 \n",
+ "mrr@3 0.4798 \n",
+ "mrr@5 0.5131 \n",
+ "recall@1 0.3788 \n",
+ "recall@3 0.6061 \n",
+ "recall@5 0.7424 \n",
+ " \n"
+ ],
+ "text/plain": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3788\u001b[0m\n",
+ "mrr@\u001b[1;36m3\u001b[0m \u001b[1;36m0.4798\u001b[0m\n",
+ "mrr@\u001b[1;36m5\u001b[0m \u001b[1;36m0.5131\u001b[0m\n",
+ "recall@\u001b[1;36m1\u001b[0m \u001b[1;36m0.3788\u001b[0m\n",
+ "recall@\u001b[1;36m3\u001b[0m \u001b[1;36m0.6061\u001b[0m\n",
+ "recall@\u001b[1;36m5\u001b[0m \u001b[1;36m0.7424\u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "print(format_results(results[1]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Results and Analysis\n",
+ "\n",
+ "Now let's visualise the scores for our training and evaluation datasets below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 69,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
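+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>mrr@1</th>\n",
+ "      <th>mrr@3</th>\n",
+ "      <th>mrr@5</th>\n",
+ "      <th>recall@1</th>\n",
+ "      <th>recall@3</th>\n",
+ "      <th>recall@5</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>Train</th>\n",
+ "      <td>0.39</td>\n",
+ "      <td>0.50</td>\n",
+ "      <td>0.53</td>\n",
+ "      <td>0.39</td>\n",
+ "      <td>0.65</td>\n",
+ "      <td>0.77</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Eval</th>\n",
+ "      <td>0.38</td>\n",
+ "      <td>0.48</td>\n",
+ "      <td>0.51</td>\n",
+ "      <td>0.38</td>\n",
+ "      <td>0.61</td>\n",
+ "      <td>0.74</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"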
+ ],
+ "text/plain": [
+ " mrr@1 mrr@3 mrr@5 recall@1 recall@3 recall@5\n",
+ "Train 0.39 0.50 0.53 0.39 0.65 0.77\n",
+ "Eval 0.38 0.48 0.51 0.38 0.61 0.74"
+ ]
+ },
+ "execution_count": 69,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "scores = []\n",
+ "\n",
+ "for result in results:\n",
+ "    result_scores = {}\n",
+ "    for score_name, score in result.averages().scores.items():\n",
+ "        result_scores[score_name] = score\n",
+ "    scores.append(result_scores)\n",
+ "\n",
+ "df = pd.DataFrame(scores, index=[\"Train\", \"Eval\"])\n",
+ "df.round(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "\n",
+ "# Create figure with two subplots side by side\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))\n",
+ "\n",
+ "# Get MRR and Recall columns\n",
+ "mrr_cols = [\"mrr@1\", \"mrr@3\", \"mrr@5\"]\n",
+ "recall_cols = [\"recall@1\", \"recall@3\", \"recall@5\"]\n",
+ "x = np.arange(len(mrr_cols))\n",
+ "width = 0.35\n",
+ "\n",
+ "# Plot MRR bars\n",
+ "ax1.bar(x - width / 2, df.loc[\"Train\", mrr_cols], width, label=\"Train\")\n",
+ "ax1.bar(x + width / 2, df.loc[\"Eval\", mrr_cols], width, label=\"Eval\")\n",
+ "ax1.set_title(\"Mean Reciprocal Rank (MRR)\")\n",
+ "ax1.set_xticks(x)\n",
+ "ax1.set_xticklabels(mrr_cols)\n",
+ "ax1.set_ylabel(\"Score\")\n",
+ "ax1.legend()\n",
+ "ax1.grid(True, alpha=0.3)\n",
+ "\n",
+ "# Plot Recall bars\n",
+ "ax2.bar(x - width / 2, df.loc[\"Train\", recall_cols], width, label=\"Train\")\n",
+ "ax2.bar(x + width / 2, df.loc[\"Eval\", recall_cols], width, label=\"Eval\")\n",
+ "ax2.set_title(\"Recall\")\n",
+ "ax2.set_xticks(x)\n",
+ "ax2.set_xticklabels(recall_cols)\n",
+ "ax2.set_ylabel(\"Score\")\n",
+ "ax2.legend()\n",
+ "ax2.grid(True, alpha=0.3)\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we've laid the foundation for fine-tuning by creating high-quality synthetic training data. Our analysis shows that our train and eval sets have similar performance metrics, suggesting they represent the same underlying patterns and will give us reliable estimates of real-world performance.\n",
+ "\n",
+ "While our approach works well for getting started, production systems need more robust testing - like generating similar but different labels, using multiple test sets, and carefully preventing data leakage. We'll explore some of these ideas in Week 4 when we look at handling different types of queries.\n",
+ "\n",
+ "In the next two notebooks, we'll put this data to work: first using Cohere's managed re-ranker service as an easy starting point, then exploring open-source fine-tuning for more control. This builds on Week 1's evaluation framework while preparing us for more advanced query handling in future weeks.\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/cohort_2/week2/2. Finetune Cohere Logfire.ipynb b/cohort_2/week2/2. Finetune Cohere Logfire.ipynb
new file mode 100644
index 0000000..d96d224
--- /dev/null
+++ b/cohort_2/week2/2. Finetune Cohere Logfire.ipynb
@@ -0,0 +1,751 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Week 2: Getting Started with Re-ranker Fine-tuning\n",
+ "\n",
+ "> **Prerequisites**: Complete the `1. Synthetic Transactions.ipynb` notebook first to generate the evaluation dataset we'll use for fine-tuning. You'll also need a Cohere API Key which you can get by signing up for a free account on [Cohere](https://cohere.com/).\n",
+ "\n",
+ "When improving RAG systems, starting with a managed re-ranker service can provide quick wins with minimal engineering overhead. This notebook demonstrates how to fine-tune a Cohere re-ranker for better retrieval performance using just a few hundred examples.\n",
+ "\n",
+ "## Why This Matters\n",
+ "\n",
+ "Re-rankers offer several advantages when you're just getting started:\n",
+ "\n",
+ "1. **Data Efficiency**\n",
+ " - Work effectively with limited data (as few as 256 examples)\n",
+ " - Only need to learn ranking within small candidate sets \n",
+ " - Can achieve significant gains without massive training sets\n",
+ "\n",
+ "2. **Easy Integration** \n",
+ " - Drop-in addition to existing retrieval pipelines\n",
+ " - Progressive improvement without system overhauls\n",
+ " - No need to re-embed your entire document collection\n",
+ "\n",
+ "3. **Quick Implementation**\n",
+ " - Hosted services handle infrastructure complexity\n",
+ " - No hyperparameter tuning required \n",
+ " - Focus on experimentation rather than deployment\n",
+ "\n",
+ "## What You'll Learn\n",
+ "\n",
+ "Through hands-on examples, you'll discover how to:\n",
+ "\n",
+ "1. **Prepare Training Data**\n",
+ " - Format data for re-ranker fine-tuning\n",
+ " - Create effective training examples\n",
+ " - Generate hard negatives\n",
+ "\n",
+ "2. **Fine-tune Models**\n",
+ " - Configure training parameters\n",
+ " - Monitor training progress\n",
+ " - Validate model improvements\n",
+ "\n",
+ "3. **Evaluate Performance**\n",
+ " - Compare against baseline retrieval\n",
+ " - Measure recall and MRR improvements\n",
+ " - Analyze result quality\n",
+ "\n",
+ "By the end of this notebook, you'll have a fine-tuned re-ranker improving your retrieval results. This builds on Week 1's evaluation framework while preparing you for more advanced fine-tuning using open-source models in notebook 3.\n",
+ "\n",
+ "## Preparing Our Training Data\n",
+ "\n",
+ "In this notebook, we'll be using the synthetic transactions dataset we created in the previous notebook to fine-tune a Cohere re-ranker. [Cohere will use the most recent base model for fine-tuning](https://docs.cohere.com/v2/docs/rerank-starting-the-training#parameters) so do check their website to see which model version is the latest one that they've released.\n",
+ "\n",
+ "We'll then create hard negatives so that our re-ranker learns the difference between similar categories.\n",
+ "\n",
+ "### Hard Negatives\n",
+ "\n",
+ "A key aspect of effective re-ranker training is the selection of hard negatives - examples that are similar to the correct answer but shouldn't be ranked highly. For instance:\n",
+ "\n",
+ "Query:\n",
+ "```\n",
+ "Name: Ayden\n",
+ "Category: Financial Software\n",
+ "Department: Finance\n",
+ "Location: Berlin, DE\n",
+ "Amount: 1273.45\n",
+ "```\n",
+ "\n",
+ "Positive Example:\n",
+ "```\n",
+ "Subscription & Revenue Infrastructure\n",
+ "```\n",
+ "\n",
+ "Hard Negatives:\n",
+ "```\n",
+ "Office Equipment Maintenance\n",
+ "Office Supplies & Stationery\n",
+ "Human Resources\n",
+ "```\n",
+ "\n",
+ "Hard negatives help your model by:\n",
+ "1. Improving discriminative ability between similar categories\n",
+ "2. Building robustness against noisy results\n",
+ "3. Making efficient use of limited training data\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Creating Our Dataset\n",
+ "\n",
+ "In this section, we'll use the examples that we saved to our local data directory to create a dataset to train our Cohere re-ranker.\n",
+ "\n",
+ "We'll use the `train` examples to train our re-ranker and the `eval` examples to benchmark our model. Once we've formatted our data in the correct format, we'll save it to a file and upload it to Cohere."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/ivanleo/Documents/coding/systematically-improving-rag/cohort_2/.venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020\n",
+ " warnings.warn(\n",
+ "/Users/ivanleo/Documents/coding/systematically-improving-rag/cohort_2/.venv/lib/python3.9/site-packages/pydantic_evals/dataset.py:390: UserWarning: Could not determine the generic parameters for ; using `Any` for each. You should explicitly set the generic parameters via `Dataset[MyInputs, MyOutput, MyMetadata]` when serializing or deserializing.\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "from pydantic_evals import Dataset\n",
+ "\n",
+ "train_dataset: Dataset = Dataset.from_file(\"./data/train_transactions.yml\")\n",
+ "eval_dataset: Dataset = Dataset.from_file(\"./data/eval_transactions.yml\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel\n",
+ "\n",
+ "\n",
+ "# Define Pydantic model to store our finetuning data\n",
+ "class FinetuneItem(BaseModel):\n",
+ " query: str\n",
+ " relevant_passages: list[str]\n",
+ " hard_negatives: list[str]\n",
+ "\n",
+ "\n",
+ "# Get all the labels in our dataset\n",
+ "labels = set([item.expected_output[0] for item in train_dataset.cases])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "\n",
+ "finetuning_data = [\n",
+ " FinetuneItem(\n",
+ " query=transaction.inputs,\n",
+ " relevant_passages=transaction.expected_output,\n",
+ " hard_negatives=random.sample(\n",
+ " [label for label in labels if label != transaction.expected_output[0]], k=4\n",
+ " ),\n",
+ " )\n",
+ " for transaction in train_dataset.cases\n",
+ " # Generate 2 samples with 4 hard negatives each for each transaction\n",
+ " for _ in range(2)\n",
+ "]\n",
+ "\n",
+ "with open(\"./data/cohere_finetune.jsonl\", \"w\") as f:\n",
+ "    for item in finetuning_data:\n",
+ "        f.write(item.model_dump_json() + \"\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Finetuning Our Model\n",
+ "\n",
+ "Now that we've created our dataset, we can upload it to Cohere and kick off our fine-tuning job. \n",
+ "\n",
+ "### Uploading Our Dataset\n",
+ "\n",
+ "A reminder that the fine-tuning itself can take anywhere from around an hour to a day, so you'll need to come back to this notebook once the job is done.\n",
+ "\n",
+ "Once the dataset has a status `validated`, we can kick off our fine-tuning job."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "...\n",
+ "...\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'validated'"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import cohere\n",
+ "\n",
+ "co = cohere.ClientV2()\n",
+ "\n",
+ "reranked_dataset = co.datasets.create(\n",
+ " name=\"Synthetic Transactions Finetune\",\n",
+ " data=open(\"./data/cohere_finetune.jsonl\", \"rb\"),\n",
+ " type=\"reranker-finetune-input\",\n",
+ ")\n",
+ "\n",
+ "co.wait(reranked_dataset).dataset.validation_status"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Submitting Our Finetuning Job\n",
+ "\n",
+ "Now that we've uploaded our dataset, we can kick off our fine-tuning job. Make sure to indicate that you're doing a re-ranker finetune when creating the job. You can check the status of your job at any time by calling `co.finetuning.get_finetuned_model` or visiting the Cohere dashboard."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from cohere.finetuning import BaseModel, FinetunedModel, Settings\n",
+ "\n",
+ "finetune_request = co.finetuning.create_finetuned_model(\n",
+ " request=FinetunedModel(\n",
+ " name=\"finetuned-cohere-reranker\",\n",
+ " settings=Settings(\n",
+ " base_model=BaseModel(base_type=\"BASE_TYPE_RERANK\"),\n",
+ " dataset_id=reranked_dataset.id,\n",
+ " ),\n",
+ " )\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'STATUS_READY'"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "response = co.finetuning.get_finetuned_model(finetune_request.finetuned_model.id)\n",
+ "response.finetuned_model.status"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Benchmarking our Model\n",
+ "\n",
+ "We want to quantify the improvement that fine-tuning a model gets us. In order to do so, we'll be using the same retrieval evals as before to benchmark our fine-tuned model. Since we're building a model here that will suggest relevant categories for a given transaction, we'll be measuring the following two metrics. \n",
+ "\n",
+ "- Recall : Whether the correct category is in the top N results\n",
+ "- MRR : The mean reciprocal rank of the correct category in the top N results\n",
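+ "\n",
+ "As a sketch of how these are typically computed, assume each transaction $q$ has exactly one correct category and let $r_q$ be the rank at which that category is retrieved. The single-label assumption and the symbols $Q$ and $r_q$ are introduced here for illustration; the helper functions may be implemented differently.\n",
+ "\n",
+ "$$\n",
+ "\\mathrm{Recall@}k = \\frac{1}{\\lvert Q \\rvert} \\sum_{q \\in Q} \\mathbb{1}[r_q \\le k], \\qquad \\mathrm{MRR@}k = \\frac{1}{\\lvert Q \\rvert} \\sum_{q \\in Q} \\frac{\\mathbb{1}[r_q \\le k]}{r_q}\n",
+ "$$\n",
+ "\n",
+ "Under that assumption the two metrics coincide at $k = 1$, and MRR@$k$ can never exceed Recall@$k$.\n",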
+ "\n",
+ "We mainly want to measure two things:\n",
+ "\n",
+ "- How much of an improvement does a fine-tuned model give us over a pure embedding-based approach?\n",
+ "- How does the fine-tuned model perform against the default Cohere re-ranker?\n",
+ "\n",
+ "In order to do so, we'll be benchmarking our fine-tuned model against the default text-embedding-3-small model as well as the default Cohere re-ranker. \n",
+ "\n",
+ "We'll use `Logfire` here to run our evaluations and compare the results between our different configurations since it's where we've stored our evaluation data and provides an easy way to share our results with others.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import lancedb\n",
+ "\n",
+ "db = lancedb.connect(\"./lancedb\")\n",
+ "table = db.open_table(\"categories\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from helpers import get_metrics_at_k, task\n",
+ "from dataclasses import dataclass\n",
+ "from pydantic_evals.evaluators import Evaluator, EvaluatorContext\n",
+ "from lancedb.table import Table\n",
+ "from concurrent.futures import ThreadPoolExecutor\n",
+ "from functools import partial\n",
+ "import logfire\n",
+ "import asyncio\n",
+ "\n",
+ "\n",
+ "@dataclass\n",
+ "class RagMetricsEvaluator(Evaluator):\n",
+ "    # Score each case with MRR and recall at k = 1, 3 and 5\n",
+ "    async def evaluate(self, ctx: EvaluatorContext[str, str]) -> dict[str, float]:\n",
+ "        predictions = ctx.output\n",
+ "        labels = ctx.expected_output\n",
+ "        metrics = get_metrics_at_k(metrics=[\"mrr\", \"recall\"], sizes=[1, 3, 5])\n",
+ "        return {\n",
+ "            metric: score_fn(predictions, labels)\n",
+ "            for metric, score_fn in metrics.items()\n",
+ "        }\n",
+ "\n",
+ "\n",
+ "async def retrieve_results(\n",
+ "    question: str,\n",
+ "    table: Table,\n",
+ "    pool: ThreadPoolExecutor,\n",
+ "    max_k=25,\n",
+ "    reranker=None,\n",
+ "):\n",
+ "    # Run the blocking retrieval task in a worker thread so evaluations can overlap\n",
+ "    loop = asyncio.get_running_loop()\n",
+ "    return await loop.run_in_executor(\n",
+ "        pool,\n",
+ "        partial(task, user_query=question, table=table, max_k=max_k, reranker=reranker),\n",
+ "    )\n",
+ "\n",
+ "\n",
+ "logfire.configure(\n",
+ " send_to_logfire=True,\n",
+ " environment=\"experimentation\",\n",
+ " service_name=\"synthetic-transactions\",\n",
+ " console=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from lancedb.rerankers import CohereReranker\n",
+ "\n",
+ "\n",
+ "rerankers = [\n",
+ " # Remember to replace this with your fine-tuned model id\n",
+ " CohereReranker(model_name=\"0486d248-8476-40f6-a5bf-1ef6fd8a65dd-ft\"),\n",
+ " CohereReranker(model_name=\"rerank-english-v3.0\"),\n",
+ " None,\n",
+ "]\n",
+ "\n",
+ "results = []\n",
+ "\n",
+ "\n",
+ "# eval_dataset.add_evaluator(RagMetricsEvaluator())\n",
+ "\n",
+ "for reranker in rerankers:\n",
+ "    with ThreadPoolExecutor(max_workers=10) as executor:\n",
+ "        evaluation_result = await eval_dataset.evaluate(\n",
+ "            partial(\n",
+ "                retrieve_results,\n",
+ "                table=table,\n",
+ "                pool=executor,\n",
+ "                reranker=reranker,\n",
+ "                max_k=25,\n",
+ "            )\n",
+ "        )\n",
+ "        results.append(evaluation_result)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.803\n",
+ "mrr@3 0.8788\n",
+ "mrr@5 0.8818\n",
+ "recall@1 0.803\n",
+ "recall@3 0.9697\n",
+ "recall@5 0.9848\n",
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.1515\n",
+ "mrr@3 0.2146\n",
+ "mrr@5 0.2177\n",
+ "recall@1 0.1515\n",
+ "recall@3 0.303\n",
+ "recall@5 0.3182\n",
+ "Metric Score\n",
+ "-------- -------\n",
+ "mrr@1 0.3788\n",
+ "mrr@3 0.4798\n",
+ "mrr@5 0.5131\n",
+ "recall@1 0.3788\n",
+ "recall@3 0.6061\n",
+ "recall@5 0.7424\n"
+ ]
+ }
+ ],
+ "source": [
+ "from tabulate import tabulate\n",
+ "\n",
+ "\n",
+ "def format_results(result):\n",
+ " return tabulate(\n",
+ " [[item, round(value, 4)] for item, value in result.averages().scores.items()],\n",
+ " headers=[\"Metric\", \"Score\"],\n",
+ " )\n",
+ "\n",
+ "\n",
+ "for result in results:\n",
+ " print(format_results(result))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Analysing Our Results\n",
+ "\n",
+ "Now that we've run our evaluations, let's take a closer look at the results. We want to compare the performance of our fine-tuned model against the default Cohere re-ranker as well as the default vector search baseline without a re-ranker."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
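+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>mrr@1</th>\n",
+ "      <th>mrr@3</th>\n",
+ "      <th>mrr@5</th>\n",
+ "      <th>recall@1</th>\n",
+ "      <th>recall@3</th>\n",
+ "      <th>recall@5</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>Fine-Tuned Reranker</th>\n",
+ "      <td>0.83</td>\n",
+ "      <td>0.88</td>\n",
+ "      <td>0.89</td>\n",
+ "      <td>0.83</td>\n",
+ "      <td>0.94</td>\n",
+ "      <td>0.98</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Default Reranker</th>\n",
+ "      <td>0.24</td>\n",
+ "      <td>0.29</td>\n",
+ "      <td>0.30</td>\n",
+ "      <td>0.24</td>\n",
+ "      <td>0.35</td>\n",
+ "      <td>0.41</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>No Reranker</th>\n",
+ "      <td>0.39</td>\n",
+ "      <td>0.52</td>\n",
+ "      <td>0.54</td>\n",
+ "      <td>0.39</td>\n",
+ "      <td>0.67</td>\n",
+ "      <td>0.77</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"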
+ ],
+ "text/plain": [
+ " mrr@1 mrr@3 mrr@5 recall@1 recall@3 recall@5\n",
+ "Fine-Tuned Reranker 0.83 0.88 0.89 0.83 0.94 0.98\n",
+ "Default Reranker 0.24 0.29 0.30 0.24 0.35 0.41\n",
+ "No Reranker 0.39 0.52 0.54 0.39 0.67 0.77"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "scores = []\n",
+ "\n",
+ "for result in results:\n",
+ "    result_scores = {}\n",
+ "    for score_name, score in result.averages().scores.items():\n",
+ "        result_scores[score_name] = score\n",
+ "    scores.append(result_scores)\n",
+ "\n",
+ "df = pd.DataFrame(\n",
+ " scores, index=[\"Fine-Tuned Reranker\", \"Default Reranker\", \"No Reranker\"]\n",
+ ")\n",
+ "df.round(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "\n",
+ "# Create figure with two subplots side by side\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))\n",
+ "\n",
+ "# Plot MRR scores\n",
+ "mrr_cols = [\"mrr@1\", \"mrr@3\", \"mrr@5\"]\n",
+ "x = np.arange(len(mrr_cols))\n",
+ "width = 0.25\n",
+ "\n",
+ "for i, model in enumerate(df.index):\n",
+ " offset = (i - 1) * width\n",
+ " ax1.bar(x + offset, df.loc[model, mrr_cols], width, label=model)\n",
+ "\n",
+ "ax1.set_title(\"Mean Reciprocal Rank (MRR)\")\n",
+ "ax1.set_xticks(x)\n",
+ "ax1.set_xticklabels(mrr_cols)\n",
+ "ax1.set_ylabel(\"Score\")\n",
+ "ax1.legend()\n",
+ "ax1.grid(True, alpha=0.3)\n",
+ "\n",
+ "# Plot Recall scores\n",
+ "recall_cols = [\"recall@1\", \"recall@3\", \"recall@5\"]\n",
+ "x = np.arange(len(recall_cols))\n",
+ "\n",
+ "for i, model in enumerate(df.index):\n",
+ " offset = (i - 1) * width\n",
+ " ax2.bar(x + offset, df.loc[model, recall_cols], width, label=model)\n",
+ "\n",
+ "ax2.set_title(\"Recall\")\n",
+ "ax2.set_xticks(x)\n",
+ "ax2.set_xticklabels(recall_cols)\n",
+ "ax2.set_ylabel(\"Score\")\n",
+ "ax2.legend()\n",
+ "ax2.grid(True, alpha=0.3)\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we demonstrated how fine-tuning a Cohere re-ranker can significantly improve retrieval performance. With just 256 synthetic examples, our fine-tuned model showed a 60% increase in recall and a 49% increase in mean reciprocal rank (MRR) compared to the text-embedding-3-small model.\n",
+ "\n",
+ "This substantial improvement highlights two key considerations for future fine-tuning efforts:\n",
+ "\n",
+ "1. **Model Selection**: Explore various re-ranker options, including English and multi-lingual models from providers like Cohere and Jina. Experiment to find the best fit for your specific use case.\n",
+ "\n",
+ "2. **Dataset Quality**: Move beyond simple random negative selection. Consider more sophisticated approaches like using cosine similarity or leveraging language models to identify hard negatives.\n",
+ "\n",
+ "The fast, objective metrics we established in Week 1 were crucial in our analysis. They allowed us to quickly benchmark our fine-tuned model against both the default Cohere re-ranker and the text-embedding-3-small model. Surprisingly, we discovered that the default Cohere re-ranker actually degraded performance while increasing latency.\n",
+ "\n",
+ "In Week 4, we'll use similar approaches to discover query patterns with BERTopic, and in Week 5, we'll apply these techniques to structured data and metadata. These are applications where a fine-tuned re-ranker can provide a significant improvement, especially at the start when we have a limited amount of user data.\n",
+ "\n",
+ "As you accumulate more user data, you might want to start looking at fine-tuning an open-source model using the Sentence Transformers library. This introduces new challenges such as managing hyperparameters, loss functions and training loops, which we'll explore in the next notebook."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/cohort_2/week2/3. Open Source Models_logfire.ipynb b/cohort_2/week2/3. Open Source Models_logfire.ipynb
new file mode 100644
index 0000000..7da14f8
--- /dev/null
+++ b/cohort_2/week2/3. Open Source Models_logfire.ipynb
@@ -0,0 +1,873 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The autoreload extension is already loaded. To reload it, use:\n",
+ " %reload_ext autoreload\n"
+ ]
+ }
+ ],
+ "source": [
+ "%load_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you're running the notebook in colab, make sure that you run the following cell. \n",
+ "\n",
+ "**There is no need to do so if you're cloning the repository locally**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "100 12543 100 12543 0 0 406k 0 --:--:-- --:--:-- --:--:-- 408k\n",
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "100 97327 100 97327 0 0 1214k 0 --:--:-- --:--:-- --:--:-- 1218k\n",
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "100 18441 100 18441 0 0 590k 0 --:--:-- --:--:-- --:--:-- 600k\n",
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "100 68646 100 68646 0 0 107k 0 --:--:-- --:--:-- --:--:-- 107k\n",
+ " % Total % Received % Xferd Average Speed Time Time Time Current\n",
+ " Dload Upload Total Spent Left Speed\n",
+ "100 1450 100 1450 0 0 43896 0 --:--:-- --:--:-- --:--:-- 45312\n"
+ ]
+ }
+ ],
+ "source": [
+ "!mkdir -p data\n",
+ "!curl -L -o ./data/categories.json https://raw.githubusercontent.com/567-labs/systematically-improving-rag/main/cohort_2/week2/data/categories.json\n",
+ "!curl -L -o ./data/cleaned.jsonl https://raw.githubusercontent.com/567-labs/systematically-improving-rag/main/cohort_2/week2/data/cleaned.jsonl\n",
+ "!curl -L -o ./data/eval_transactions.jsonl https://raw.githubusercontent.com/567-labs/systematically-improving-rag/main/cohort_2/week2/data/eval_transactions.jsonl\n",
+ "!curl -L -o ./data/train_transactions.jsonl https://raw.githubusercontent.com/567-labs/systematically-improving-rag/main/cohort_2/week2/data/train_transactions.jsonl\n",
+ "!curl -L -o ./helpers.py https://raw.githubusercontent.com/567-labs/systematically-improving-rag/main/cohort_2/week2/helpers.py"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "ModuleNotFoundError",
+ "evalue": "No module named 'google.colab'",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn[6], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mgoogle\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mcolab\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m userdata\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mos\u001b[39;00m\n\u001b[1;32m 4\u001b[0m os\u001b[38;5;241m.\u001b[39menviron[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHF_TOKEN\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m userdata\u001b[38;5;241m.\u001b[39mget(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mHF_TOKEN\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
+ "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'google.colab'"
+ ]
+ }
+ ],
+ "source": [
+ "from google.colab import userdata\n",
+ "import os\n",
+ "\n",
+ "os.environ[\"HF_TOKEN\"] = userdata.get(\"HF_TOKEN\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install sentence-transformers==3.1.1 transformers==4.45.2 pydantic-evals lancedb datasets\n",
+ "!pip uninstall wandb -y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Week 2 : Fine-tuning Open Source Embedding Models\n",
+ "\n",
+ "> **Prerequisites** : Before running this notebook, make sure that you've completed the previous two notebooks in this week - [1. Synthetic Transactions.ipynb](1. Synthetic Transactions.ipynb) and [2. Finetune Cohere.ipynb](2. Finetune Cohere.ipynb). This notebook will build on top of the previous two notebooks.\n",
+ "> \n",
+ "> You must have a Hugging Face token with write access set as an environment variable. If you don't have one, you can create one by following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).\n",
+ "\n",
+ "[Open in Colab](https://colab.research.google.com/github/567-labs/systematically-improving-rag/blob/main/cohort_2/week2/3.%20Open%20Source%20Models.ipynb)\n",
+ "\n",
+ "\n",
+ "After exploring managed services, let's dive into fine-tuning open source embedding models. While this approach requires more setup, it offers greater control and potential cost savings.\n",
+ "\n",
+ "## Why This Matters\n",
+ "\n",
+ "Fine-tuning open source models gives you full control over how your model learns and behaves. While we'll use a small synthetic dataset of 256 examples for this tutorial, real-world applications benefit greatly from larger private datasets - the larger the better. This is because larger datasets allow you to capture specific domain knowledge and relationships that general models might miss.\n",
+ "\n",
+ "The main advantage of open source fine-tuning is the cost savings at inference time. Once trained, you can run these models on your own infrastructure without paying per-query fees. This makes them especially attractive for high-volume applications. We'll use sentence-transformers since it offers robust training options, works seamlessly with popular model hubs, and has strong community support.\n",
+ "\n",
+ "## What You'll Learn\n",
+ "\n",
+ "This hands-on notebook walks through:\n",
+ "\n",
+ "1. **Dataset Preparation**\n",
+ " - Creating train/test/eval splits\n",
+ " - Formatting data for triplet loss\n",
+ " - Setting up evaluation metrics\n",
+ "\n",
+ "2. **Model Fine-tuning**\n",
+ " - Configuring training arguments\n",
+ " - Setting up loss functions\n",
+ " - Training and monitoring progress\n",
+ " \n",
+ "3. **Performance Evaluation**\n",
+ " - Comparing against base model\n",
+ " - Measuring recall and MRR improvements\n",
+ " - Analyzing trade-offs\n",
+ "\n",
+ "\n",
+ "By the end of this notebook, you'll have a better understanding of when you might want to consider fine-tuning an open source embedding model, how you can do so using the `sentence-transformers` library and how to evaluate the performance of your fine-tuned model.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Setup\n",
+ "\n",
+ "In this section, we'll be fine-tuning the `BAAI/bge-base-en` model using the `BatchSemiHardTripletLoss` loss function. Let's first understand how this loss function works before we dive into the code.\n",
+ "\n",
+ "### Understanding Semi-Hard Triplet Loss\n",
+ "\n",
+ "\n",
+ "`BatchSemiHardTripletLoss` takes a batch of (sentence, label) pairs, forms every valid triplet within the batch, and then computes the loss on the semi-hard ones. A semi-hard negative is farther from the anchor than the positive example, but still close enough to fall within the margin, so it keeps contributing to the loss.\n",
+ "It works with three pieces:\n",
+ "\n",
+ "1. An anchor (your main example) - in our case, a transaction description\n",
+ "2. A positive match (something similar) - the correct category\n",
+ "3. A negative match (something different) - incorrect categories that are close, but not quite right\n",
+ "\n",
+ "We can see an example of this in the image below, where we have an anchor, a positive match and a negative match. The negative match is farther from the anchor than the positive match, but still close enough that the model has to work to tell the two apart.\n",
+ "\n",
+ " \n",
+ "\n",
+ "For our transaction data, this translates to:\n",
+ "- Anchor: Transaction description\n",
+ "- Positive: Correct category\n",
+ "- Negative: Similar but incorrect categories\n",
+ "\n",
+ "We want **semi-hard** negatives because they are tricky: they differ from the anchor, but not by much. Training on them helps the model learn to separate categories that look alike instead of only the ones that are already easy to distinguish.\n",
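+ "\n",
+ "As a rough sketch of the idea, the loss for a single triplet can be written as follows. The notation is our own rather than the library's: $d$ is a distance between embeddings (Euclidean distance is a common choice), and $m > 0$ is a margin hyperparameter.\n",
+ "\n",
+ "$$\n",
+ "\\mathcal{L}(a, p, n) = \\max\\big(d(a, p) - d(a, n) + m,\\ 0\\big)\n",
+ "$$\n",
+ "\n",
+ "Under the same assumptions, a negative $n$ counts as semi-hard when it is farther away than the positive but still inside the margin:\n",
+ "\n",
+ "$$\n",
+ "d(a, p) < d(a, n) < d(a, p) + m\n",
+ "$$\n",
+ "\n",
+ "In that range the loss is positive but smaller than $m$, so the model still receives a training signal without being dominated by the hardest (and often noisiest) negatives.\n",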
+ "\n",
+ "\n",
+ " \n",
+ "\n",
+ "Using our Cohere dataset format:\n",
+ "\n",
+ "1. The transaction description becomes our anchor (query)\n",
+ "2. The correct category is our positive match (relevant_passages)\n",
+ "3. Similar but wrong categories are our negative matches (hard_negatives)\n",
+ "\n",
+ "A single training run gives you a baseline. Hyperparameter optimization (for example, grid or random search) can significantly improve model performance, especially when run at a larger scale with enough compute. Here's a quick example of [how we can do this using `modal` to run a grid search over all possible parameters](https://modal.com/blog/fine-tuning-embeddings).\n",
+ "\n",
+ "\n",
+ "### Declaring Constants\n",
+ "\n",
+ "When writing fine-tuning code, we want to declare our constants up front. This ensures that we have a consistent set of parameters to use when training our model and makes it easy for us to change them later."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# To resolve the warning from huggingface/tokenizers about parallelism:\n",
+ "# 1. Avoid using `tokenizers` before the fork if possible.\n",
+ "# 2. Explicitly set the environment variable TOKENIZERS_PARALLELISM to either 'true' or 'false'.\n",
+ "import os\n",
+ "\n",
+ "# Set the environment variable to disable parallelism warnings\n",
+ "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
+ "\n",
+ "TEST_SIZE = 0.2\n",
+ "BASE_MODEL_NAME = \"BAAI/bge-base-en\"\n",
+ "FINETUNED_MODEL_NAME = \"ivanleomk/finetuned-bge-base-en\"\n",
+ "\n",
+ "MODEL_OUTPUT_DIR = \"./models/bge-base-en\"\n",
+ "CATEGORIES_PATH = \"data/categories.json\"\n",
+ "TRAIN_EVALUATOR_NAME = \"bge-base-en-train\"\n",
+ "EVAL_EVALUATOR_NAME = \"bge-base-en-eval\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Preparing the Dataset\n",
+ "\n",
+ "While we use the same dataset as before, we need to handle a lot more configuration when fine-tuning open source models in order for us to evaluate our model's performance while training as well as to use the `BatchSemiHardTripletLoss` loss function.\n",
+ "\n",
+ "To do so, we'll format our original train and eval split into a train, test and eval split. \n",
+ "\n",
+ "- Train : This is only used to train the model\n",
+ "- Test : This is used to evaluate the model during training\n",
+ "- Eval : This is used to evaluate the model after training\n",
+ "\n",
+ "We want to have a separate set of data points set aside for testing our model during training or to use when evaluating different versions of our model. This ensures that our model does not overfit to the evaluation dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(52, 208, 66)"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import json\n",
+ "from pydantic_evals import Dataset as EvalsDataset\n",
+ "\n",
+ "categories = json.load(open(CATEGORIES_PATH))\n",
+ "train_data_path = \"./data/train_transactions.yml\"\n",
+ "eval_data_path = \"./data/eval_transactions.yml\"\n",
+ "\n",
+ "train_data = EvalsDataset.from_file(train_data_path)\n",
+ "eval_data = EvalsDataset.from_file(eval_data_path)\n",
+ "\n",
+ "test_data = EvalsDataset(\n",
+ "    cases=train_data.cases[: int(len(train_data.cases) * TEST_SIZE)]\n",
+ ")\n",
+ "train_data = EvalsDataset(\n",
+ "    cases=train_data.cases[int(len(train_data.cases) * TEST_SIZE) :]\n",
+ ")\n",
+ "len(test_data.cases), len(train_data.cases), len(eval_data.cases)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from collections import defaultdict\n",
+ "import random\n",
+ "from datasets import Dataset\n",
+ "\n",
+ "\n",
+ "def create_labels(data: EvalsDataset):\n",
+ "    # Map each category label to an integer index, in order of first appearance\n",
+ "    label_to_example = defaultdict(list)\n",
+ "\n",
+ "    for item in data.cases:\n",
+ "        label_to_example[item.expected_output[0]].append(item)\n",
+ "\n",
+ "    return {label: idx for idx, label in enumerate(label_to_example.keys())}\n",
+ "\n",
+ "\n",
+ "def create_sentence_to_label_dataset(data: EvalsDataset, label_to_idx):\n",
+ "    # (sentence, label) pairs are the input format BatchSemiHardTripletLoss expects\n",
+ "    return Dataset.from_dict(\n",
+ "        {\n",
+ "            \"sentence\": [item.inputs for item in data.cases],\n",
+ "            \"label\": [label_to_idx[item.expected_output[0]] for item in data.cases],\n",
+ "        }\n",
+ "    )\n",
+ "\n",
+ "\n",
+ "def create_triplet_dataset(data: EvalsDataset):\n",
+ "    # Build (anchor, positive, negative) triplets for the TripletEvaluator.\n",
+ "    # The negative is a randomly chosen category other than the correct one.\n",
+ "    label_to_example = defaultdict(list)\n",
+ "\n",
+ "    for item in data.cases:\n",
+ "        label_to_example[item.expected_output[0]].append(item)\n",
+ "\n",
+ "    labels = set(label_to_example.keys())\n",
+ "\n",
+ "    anchors = []\n",
+ "    positives = []\n",
+ "    negatives = []\n",
+ "\n",
+ "    for item in data.cases:\n",
+ "        label = item.expected_output[0]\n",
+ "        anchor = item.inputs\n",
+ "        positive = label\n",
+ "        negative = random.choice([other for other in labels if other != label])\n",
+ "        anchors.append(anchor)\n",
+ "        positives.append(positive)\n",
+ "        negatives.append(negative)\n",
+ "\n",
+ "    return {\"anchor\": anchors, \"positive\": positives, \"negative\": negatives}\n",
+ "\n",
+ "\n",
+ "labels_to_idx = create_labels(train_data)\n",
+ "\n",
+ "train_triplets = create_triplet_dataset(train_data)\n",
+ "test_triplets = create_triplet_dataset(test_data)\n",
+ "eval_triplets = create_triplet_dataset(eval_data)\n",
+ "\n",
+ "sentence_to_label_train_dataset = create_sentence_to_label_dataset(\n",
+ "    train_data, labels_to_idx\n",
+ ")\n",
+ "sentence_to_label_test_dataset = create_sentence_to_label_dataset(\n",
+ "    test_data, labels_to_idx\n",
+ ")\n",
+ "sentence_to_label_eval_dataset = create_sentence_to_label_dataset(\n",
+ "    eval_data, labels_to_idx\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Fine-Tuning\n",
+ "\n",
+ "Now that our training data is in the right format, we can start training our model. We'll do so in three steps:\n",
+ "\n",
+ "1. First, we'll declare our training arguments. We're using the default arguments from the `sentence-transformers` documentation, but ideally you'd want to experiment and tinker with different configurations.\n",
+ "\n",
+ "2. Next, we'll start a training run with `trainer.train()`.\n",
+ "\n",
+ "3. Finally, we'll evaluate our trained model on the eval split before uploading it to the Hugging Face model hub."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'bge-base-en-train_cosine_accuracy': 0.8269230769230769,\n",
+ " 'bge-base-en-train_dot_accuracy': 0.17307692307692307,\n",
+ " 'bge-base-en-train_manhattan_accuracy': 0.8269230769230769,\n",
+ " 'bge-base-en-train_euclidean_accuracy': 0.8269230769230769,\n",
+ " 'bge-base-en-train_max_accuracy': 0.8269230769230769}"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sentence_transformers import (\n",
+ " SentenceTransformer,\n",
+ " SentenceTransformerTrainer,\n",
+ " SentenceTransformerTrainingArguments,\n",
+ ")\n",
+ "from sentence_transformers.losses import BatchSemiHardTripletLoss\n",
+ "from sentence_transformers.training_args import BatchSamplers\n",
+ "from sentence_transformers.evaluation import TripletEvaluator\n",
+ "\n",
+ "model = SentenceTransformer(BASE_MODEL_NAME)\n",
+ "loss = BatchSemiHardTripletLoss(model)\n",
+ "args = SentenceTransformerTrainingArguments(\n",
+ "    # Required parameter:\n",
+ "    output_dir=MODEL_OUTPUT_DIR,\n",
+ "    num_train_epochs=5,\n",
+ "    per_device_train_batch_size=16,\n",
+ "    per_device_eval_batch_size=16,\n",
+ "    learning_rate=2e-5,\n",
+ "    warmup_ratio=0.1,\n",
+ "    fp16=False,  # Keep False if your GPU can't run FP16\n",
+ "    bf16=False,  # Set to True if you have a GPU that supports BF16\n",
+ "    # NO_DUPLICATES is carried over from the library's example arguments.\n",
+ "    # For batch triplet losses, BatchSamplers.GROUP_BY_LABEL may be worth trying,\n",
+ "    # since it groups several samples with the same label into each batch.\n",
+ "    batch_sampler=BatchSamplers.NO_DUPLICATES,\n",
+ "    # With the 208-example train split, 5 epochs is about 65 steps, so the\n",
+ "    # step-based evaluation and checkpoints below never trigger in this run.\n",
+ "    # Lower these values if you want to monitor progress during training.\n",
+ "    eval_strategy=\"steps\",\n",
+ "    eval_steps=100,\n",
+ "    save_strategy=\"steps\",\n",
+ "    save_steps=100,\n",
+ "    save_total_limit=2,\n",
+ "    logging_steps=100,\n",
+ ")\n",
+ "\n",
+ "train_evaluator = TripletEvaluator(\n",
+ "    anchors=train_triplets[\"anchor\"],\n",
+ "    positives=train_triplets[\"positive\"],\n",
+ "    negatives=train_triplets[\"negative\"],\n",
+ "    name=TRAIN_EVALUATOR_NAME,\n",
+ ")\n",
+ "\n",
+ "train_evaluator(model)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "fe44fd0d72f74d9f99ac65c31e9cc1dd",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ " 0%| | 0/65 [00:00, ?it/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "b3fb40ef22404e11b2d56f9ed600030c",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Computing widget examples: 0%| | 0/1 [00:00, ?example/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'train_runtime': 10.2381, 'train_samples_per_second': 101.581, 'train_steps_per_second': 6.349, 'train_loss': 4.90222402719351, 'epoch': 5.0}\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "TrainOutput(global_step=65, training_loss=4.90222402719351, metrics={'train_runtime': 10.2381, 'train_samples_per_second': 101.581, 'train_steps_per_second': 6.349, 'total_flos': 0.0, 'train_loss': 4.90222402719351, 'epoch': 5.0})"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trainer = SentenceTransformerTrainer(\n",
+ "    model=model,\n",
+ "    args=args,\n",
+ "    train_dataset=sentence_to_label_train_dataset,\n",
+ "    eval_dataset=sentence_to_label_test_dataset,\n",
+ "    loss=loss,\n",
+ "    evaluator=train_evaluator,\n",
+ ")\n",
+ "\n",
+ "trainer.train()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'bge-base-en-eval_cosine_accuracy': 0.9696969696969697,\n",
+ " 'bge-base-en-eval_dot_accuracy': 0.030303030303030304,\n",
+ " 'bge-base-en-eval_manhattan_accuracy': 0.9696969696969697,\n",
+ " 'bge-base-en-eval_euclidean_accuracy': 0.9696969696969697,\n",
+ " 'bge-base-en-eval_max_accuracy': 0.9696969696969697}"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "test_evaluator = TripletEvaluator(\n",
+ "    anchors=eval_triplets[\"anchor\"],\n",
+ "    positives=eval_triplets[\"positive\"],\n",
+ "    negatives=eval_triplets[\"negative\"],\n",
+ "    name=EVAL_EVALUATOR_NAME,\n",
+ ")\n",
+ "test_evaluator(model)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "a47bc8065f934923bd56f93717d3e7f1",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/438M [00:00, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'https://huggingface.co/ivanleomk/finetuned-bge-base-en/commit/e377421e7ddc5908c3a18758c5b9d7681dce5850'"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.save_pretrained(f\"models/finetuned-{BASE_MODEL_NAME}\")\n",
+ "model.push_to_hub(FINETUNED_MODEL_NAME, exist_ok=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Evaluation\n",
+ "\n",
+ "Here we use `lancedb` again to evaluate our model. It comes with out-of-the-box support for Hugging Face models, which makes it easy to compare our fine-tuned model against the base model.\n",
+ "\n",
+ "As before, we'll use `pydantic-evals` to run the evaluation, looking at recall and MRR at k = 1, 3 and 5."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import lancedb\n",
+ "from lancedb.pydantic import LanceModel, Vector\n",
+ "from lancedb.embeddings import get_registry\n",
+ "\n",
+ "\n",
+ "def create_lancedb_table(model_name: str, categories: list[dict]):\n",
+ "    # Each entry in `categories` is a dict with a \"category\" key\n",
+ "    model = get_registry().get(\"huggingface\").create(name=model_name)\n",
+ "\n",
+ "    class Category(LanceModel):\n",
+ "        text: str = model.SourceField()\n",
+ "        embedding: Vector(model.ndims()) = model.VectorField()\n",
+ "\n",
+ "    db = lancedb.connect(\"./lancedb\")\n",
+ "    table_name = f\"categories-{model_name.replace('/', '-')}\"\n",
+ "    # Reuse the table if it already exists so we don't re-embed the categories\n",
+ "    if table_name in db.table_names():\n",
+ "        table = db.open_table(table_name)\n",
+ "    else:\n",
+ "        table = db.create_table(table_name, schema=Category, mode=\"overwrite\")\n",
+ "        table.add(\n",
+ "            [\n",
+ "                {\n",
+ "                    \"text\": category[\"category\"],\n",
+ "                }\n",
+ "                for category in categories\n",
+ "            ]\n",
+ "        )\n",
+ "\n",
+ "    return table\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from helpers import get_metrics_at_k, task\n",
+ "from dataclasses import dataclass\n",
+ "from pydantic_evals.evaluators import Evaluator, EvaluatorContext\n",
+ "from lancedb.table import Table\n",
+ "from concurrent.futures import ThreadPoolExecutor\n",
+ "from functools import partial\n",
+ "import logfire\n",
+ "import asyncio\n",
+ "\n",
+ "\n",
+ "@dataclass\n",
+ "class RagMetricsEvaluator(Evaluator):\n",
+ "    async def evaluate(self, ctx: EvaluatorContext[str, str]) -> dict[str, float]:\n",
+ "        predictions = ctx.output\n",
+ "        labels = ctx.expected_output\n",
+ "        metrics = get_metrics_at_k(metrics=[\"mrr\", \"recall\"], sizes=[1, 3, 5])\n",
+ "        return {\n",
+ "            metric: score_fn(predictions, labels)\n",
+ "            for metric, score_fn in metrics.items()\n",
+ "        }\n",
+ "\n",
+ "\n",
+ "async def retrieve_results(\n",
+ "    question: str,\n",
+ "    table: Table,\n",
+ "    pool: ThreadPoolExecutor,\n",
+ "    max_k=25,\n",
+ "    reranker=None,\n",
+ "):\n",
+ "    loop = asyncio.get_running_loop()\n",
+ "    return await loop.run_in_executor(\n",
+ "        pool,\n",
+ "        partial(task, user_query=question, table=table, max_k=max_k, reranker=reranker),\n",
+ "    )\n",
+ "\n",
+ "\n",
+ "logfire.configure(\n",
+ "    send_to_logfire=True,\n",
+ "    environment=\"experimentation\",\n",
+ "    service_name=\"synthetic-transactions\",\n",
+ "    console=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import concurrent.futures\n",
+ "from helpers import get_metrics_at_k, task\n",
+ "import json\n",
+ "\n",
+ "categories = json.load(open(CATEGORIES_PATH))\n",
+ "base_table = create_lancedb_table(BASE_MODEL_NAME, categories)\n",
+ "finetuned_table = create_lancedb_table(FINETUNED_MODEL_NAME, categories)\n",
+ "\n",
+ "db = lancedb.connect(\"./lancedb\")\n",
+ "\n",
+ "eval_data.evaluators = []\n",
+ "eval_data.add_evaluator(RagMetricsEvaluator())\n",
+ "\n",
+ "results = []\n",
+ "for query_table in [base_table, finetuned_table]:\n",
+ "    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:\n",
+ "        result = await eval_data.evaluate(\n",
+ "            partial(retrieve_results, table=query_table, pool=executor)\n",
+ "        )\n",
+ "        results.append(result)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
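+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>mrr@1</th>\n",
+ "      <th>mrr@3</th>\n",
+ "      <th>mrr@5</th>\n",
+ "      <th>recall@1</th>\n",
+ "      <th>recall@3</th>\n",
+ "      <th>recall@5</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>Base Model</th>\n",
+ "      <td>0.30</td>\n",
+ "      <td>0.48</td>\n",
+ "      <td>0.50</td>\n",
+ "      <td>0.30</td>\n",
+ "      <td>0.70</td>\n",
+ "      <td>0.80</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>Fine-Tuned Model</th>\n",
+ "      <td>0.55</td>\n",
+ "      <td>0.69</td>\n",
+ "      <td>0.71</td>\n",
+ "      <td>0.55</td>\n",
+ "      <td>0.86</td>\n",
+ "      <td>0.94</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"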
+ ],
+ "text/plain": [
+ " mrr@1 mrr@3 mrr@5 recall@1 recall@3 recall@5\n",
+ "Base Model 0.30 0.48 0.50 0.30 0.70 0.80\n",
+ "Fine-Tuned Model 0.55 0.69 0.71 0.55 0.86 0.94"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "scores = []\n",
+ "\n",
+ "for result in results:\n",
+ "    result_scores = {}\n",
+ "    for score_name, score in result.averages().scores.items():\n",
+ "        result_scores[score_name] = score\n",
+ "    scores.append(result_scores)\n",
+ "\n",
+ "df = pd.DataFrame(scores, index=[\"Base Model\", \"Fine-Tuned Model\"])\n",
+ "df.round(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAABjUAAAHqCAYAAABMTMx9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8ekN5oAAAACXBIWXMAAA9hAAAPYQGoP6dpAABhr0lEQVR4nO3dCZiVVd0A8DPsouLGpqTiviWiqIhLtqC4pNJKZoKktChpkqmUsmiJW4R9mX4upJUmaWZ9aViR5oZimEsl7ogbAqmAGIsw3/M/ONMMM4MDAve+M7/f87zPcN/73vueux3e//mfpaKysrIyAQAAAAAAlLkWpS4AAAAAAABAY0hqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAkE444YTUvXv31BQU5bVUVFSkoUOHrvbj33777dS5c+d0ww03pHK2ZMmStOWWW6af/OQnpS4KAACskev4UaNGVd++7rrr8r7p06eXtFwAzYmkBlB4VReRsd1333117q+srMyNqnH/Jz/5yVTOojG+6rXEtv7666d99903/exnPyt10Zqcj370o7Xe6/XWWy/16NEjjRs3Li1btiyVu8suuyxtuOGG6Qtf+EL1vgiu4rW0aNEivfTSS3UeM2/evPw6V0yoRABW872Ix2+66abp8MMPT5MnT67zPFXnqdpat26dv7unnnpqeuutt2odG/cNGzYsff/7308LFy5c4+8DAABNN76LrVWrVqlbt26589Irr7xS6uIBUAZalboAAGtKu3bt0o033pgOPPDAWvv/+te/ppdffjm1bds2FUHPnj3Tt771rfzv1157LV1zzTVp0KBBadGiRWnIkCFr5ZxXX311IRry17QPfehDacyYMfnfc+bMyd+f008/Pc2ePTs3wpfz6IdIakRZW7ZsWef++K7/8pe/TGeeeWat/bfeeutKn/fYY49NRxxxRFq6dGl6+umn8+iKj33sY+nhhx9Ou+++e53jr7jiirTBBhukBQsWpEmTJqX/+Z//SY888kid5OLgwYPT2Wefnd/fL3/5y6v9ugEAaD7OO++8tM022+SOMQ8++GBOdsR15j/+8Y8c+wHQfBmpATQZ0Rh78803p3fffbfW/mhI7dWrV+ratWsqguiF9KUvfSlv3/72t/OFezQc//CHP1xr54ze9Gsy6RON3EWw0UYbVb/X3/zmN9M999yTtt5669w4Hw375er3v/99Trx8/vOfb/C3EEmNFcVv4cgjj2zweffaa6/8XkQSLZI68RyRTIvkRX0++9nP5uO/+tWvpl/96ldpwIAB6f77709TpkypddzGG2+cDj300ByIAgBAY8So4bjWPOmkk3JHrzPOOCM999xz6Xe/+12piwZAiUlqAE1G9DL/97//nf70pz9V71u8eHG65ZZb0he/+MV6HxOjE2K6od122y339unSpUtuoH3zzTdrHffb3/42NwZvscUWufF/u+22S+eff36dhu+Y0ujDH/5w+te//pV7uLdv3z4nKS6++OLVfl2dOnVKO++8c76AX52yhz/84Q/p4IMPztMVdejQIe2zzz65gbuhdSiqpiO69NJLczIlGvpj2qJ4jugZVVM8NpIuUb5oTI9zHHfccdXJjRh1EtN/xfu200475eeMKcFW9Itf/CJPtRXv2SabbJI+8pGPpD/+8Y+r/Bl8EPE+xnszf/78NGvWrOr9jz/+eH6d2267bT4mEmQx4iC+b/VNy/Tss8/m46MxPxInMVLhnXfeed/zf+9738tTP0VSZWVuu+22/HnFe1Cf+L4/+uijadq0adX7Zs6cmf7yl780+Fuoz0EHHZT/rvjdW53jDznkkJyge+ONNxp9fgAAWNm1ZlzvRkebmDo1rtP33nvvepMeMUVqjHKOa+iIJWLE9sCBA/No7aq4ccSIEbkzXFy/
xzTAcb677rprHb5CABpLUgNoMuICtU+fPrV6qEdj/ty5c2utO1BTJAFiNMQBBxyQp/OJxudYeLlfv355ip8q0cM8Gu5jbYA4Li5246I3ptRZUSQVDjvssLTHHnukH/zgBzkhcdZZZ+WyrI4YeRLTZ0VD/+qWPZIB0Zg8fPjwdOGFF+YpriZOnPi+5461PH70ox+lU045JT82Ehof//jH0+uvv16njHHeWLg6khaf+cxncuLi6KOPzkmReD/Gjh2bkxpR5ngfaxo9enQ6/vjj84iRGGYetyMREo3wq/MZfBBVCZ1ISFSJRNnzzz+f3+NIOMT36aabbspJnPoSNDGCIhIjMbVV/DvKHq9pZc4555z8ev73f/83feMb31jpsQ888EAeVdGQSAhFoFYzcTVhwoT8/q1spMaKqhY7XPG7tzrHx+cV71WUHQAAVtWK15r//Oc/03777ZeefPLJHBNE7BXJiP79+6ff/OY31Y97++23c4IiruNj9HDEEl/72tdyQiTirKq152I0SHRSu+iii3JnpRgZHTFOdBYCoMxUAhTcT3/602hVrnz44Ycrf/zjH1duuOGGle+8806+73Of+1zlxz72sfzvrbfeuvLII4+sfty9996bH3fDDTfUer6JEyfW2V/1fDV99atfrWzfvn3lwoULq/cdfPDB+bE/+9nPqvctWrSosmvXrpWf+cxn3ve1RBkPPfTQytmzZ+ftiSeeqDz++OPzc55yyimrXPa33norvx+9e/eu/M9//lPr2GXLllX/e9CgQfncVV544YX8POutt17lyy+/XL3/oYceyvtPP/30Wo+NfWeffXat57/tttvy/u9973u19n/2s5+trKioqHz22Wfz7WeeeaayRYsWlZ/61Kcqly5d2mAZG/sZrPhaGhKf1c4771z9Xk+bNq3y29/+di5zze9JQ+f+5S9/mY+95557qveNHDky7/vyl79c69h4bZtttlmtfTU/029961v5Pbjuuuvet9xLlizJ7188ZkVV54/Xc8YZZ1Ruv/321ffts88+lYMHD65z7pqf9+jRo/NjZ86cmb9j8ZjYf/PNN9d7nqeeeiofP3369Mrx48fn70unTp0qFyxYUKdsr776an7MRRdd9L6vEQCA5qsqvvvzn/+crzVfeumlyltuuSVfZ7Zt2zbfDp/4xCcqd99991qxQMQP+++/f+UOO+xQvW/EiBH5+W699dY656qKN959990ct9X05ptvVnbp0qXOtX08V1wPr1jeuKYGYN0wUgNoUqJX/H/+85+85kD0lI+/DU23E+tvxNDimBYnhh1XbdGjPHq01xxqHFMvVYnnjeOit09MKVRzip8Qj425X6u0adMmT6sUPf0bI6ZciimnYovFmX/+85/nEQKXXHLJKpc9RhhEeaPn0oqL6cVohPcTvZxi+qwq8Tp69+6d7rjjjjrHfv3rX691O46JRaxPPfXUWvtjOqqIBapGrsRUSjGVVoxSiKmXGirjqnwGjRWPq3qvY0RNvMcxumTFtR9qnjsWKoxzR6+wEAtjryh6ftUU5YypqqIHWE3xPgwdOjT3Fovpt2Iti/cTI27ice83eiK+9zENVizyXfX3/aaeGjlyZH4vYnqtKHP0eosebzGkvz4x8iaOj1FSMR3X9ttvnz/XmEJsRVXlrRriDwAAK9O3b998rRkjuON6NEZhxNRSMSI5roljVHfVCOmqeCiuuWN0xTPPPJNeeeWV/Dy//vWv8yj6T33qU3XOURVvRNwScVuI2CSeP0ajx3RW9V3vA1BarUp8foA1Ki564+I3pt2Jxu5Yb6GhBtm40I2pqWLKpPrUXFMhhjbH9EBx4bxiw3Q8R01xkb1iwiAadGNdhsaIpEGsrRBlj+me4t8xpVXVRfaqlL1qvtlY52N17LDDDnX27bjjjnlR6JpatWqVX3dNL774Yl7/ItbYqGmXXXapvr+qjJHM2HXXXVdallX5DBorGuOvvvrqHLhEOWJx7BhmvmIC
KIKamD4qppyq+b1o6NxbbbVVvQ368TnGmiY1p/eK4fCxEHesCbMq6pv2qqY999wzJ2ritxBTaUWiIqYOW5mvfOUr6XOf+1xO3MT7HFOPrWzNkggQ4/XEexbHvvDCC7USQPWVtzHJNAAAuPzyy3PsEdfb48ePT/fcc09eDyNEp524vjz33HPzVp+4bo8OWnGdH9Pjvp/rr78+d+iJjk81p/PdZptt1uCrAmBNkNQAmpzojT5kyJC8MPLhhx9ea22EmqIhO5ICsQ5FQwmSqkXlYoHsaLyN9R5iceZo9I4eO7FWRjxPTdHLZ3Uaoat07NgxJ2ZC9DKKhulPfvKTuTd/1VoUjS37uhLBxYqjLNakVf0MGit6e1W91yHWJ4m1Kr7zne/kRvoq0QMs1oKI9UBiPZIYDRPnjLVC6jt3Y78Dcb6Yo/fHP/5xPkcscPh+4phIDNS3IHx9v4VImERiacCAAe/7GUUSq+r9iO9cvI4Y5ROL3kcvtfrW7ojvazjqqKPyyKJYJH7q1Kl1zlVV3qrjAQBgZWKUeNU1aIwgP/DAA/P17VNPPVV9DX7GGWfkmKk+MYq4sWLU9AknnJDPE9f8EWvFtXCskVdzYXIAyoOkBtDkxLDiWET7wQcfzIsjNyQaxv/85z/nhuWGepeHu+++Ow9jvvXWW3MjbpXolb4uxMLO0aB/wQUX5NcVDfGNLXscF2LEx6pc1NccEbKip59+Oo9weD9bb711LmMMB685WqNqqqi4v6qMEZT861//ygmDUn4GPXr0yFOHxWLdESDFiItojJ80aVIeqRFTZK3svVlV8ZlcfPHFeUHCSJDEeVYc2bKiGBUT71ljXnsEfVHm1157LU9jtqq++93v5pEsMULm/RaWj0RPTF8VU6XFSJ5YTL2mqvJWjdQBAIDGqkowRGeb6BAUU5+G1q1b1+qkVJ+4do54aGVuueWWtO222+Z4o+bI4ri+BaD8WFMDaHKicTV6p48aNSr3Hm9I9IyPqXXOP//8OvfF/KkxOqBmr/uavewXL16cfvKTn6R1JUYjRKN+NDCvStkPPfTQ3EgeAUBMKbSqI0divYuquWjDlClT0kMPPZRHwLyfI444Ipcxgo6afvjDH+ZAoeo5ojdU9OqPERgrjnqoKuO6/AzOPPPMPNx87NixDZ47jBs3bo0lUmL9kVi/Ir6vsSbM++nTp0/629/+9r7HRQAX5YzPP3q6raoY5RSJtDvvvDOPKHk/MUojpiG76KKL6twXozfic4+yAwDAqoqOQHFNG9e3MYI7bkdnpOjAs6KYHrVKTD312GOPpd/85jd1jltZvBFxz+TJk9fSqwHggzBSA2iSGrPgcox+iAbbaPCNBttIAERPn+iBHwtxx3RPsR7H/vvvn9dEiOeMRa+jYTZ6vTd2Oqk1IRIAsS5GNLSfcsopjS57XOxHEuGkk05K++yzT+65H68lLupjzZGYN/b9RhLEMO9YBHzRokU5gNhss81yw//7iQb66EkVvf2nT5+eF+eLRdB/+9vfpm9+85vVo0jiHHFMJGhicepPf/rTeTqrWNg61uSI17guP4NY2yMSMtdcc02enzdeb4wOiREVkeyIeXnjdazJUSKx6Hi8L3He+NwimRSfZ0OOOeaY/Ppj1EzMM7wyp5122gcqWzw+PvcLL7wwrymyMlHmOD6G7MfIjhh9UiUWrY+RRfF+AgDA6ojrzFgD7rrrrstrbkSsElOgxvTDMdLi9ddfz4mIl19+Occ8VY+JkRjxuBjh0atXr7xmXiw6fuWVV+Y4JaZejVEaMeo/RsrHtX7cF7FBrIEHQHkxUgNo1uJC9aqrrsqLyMU6CsOHD88LJMcURNEAG6IR9ve//33afPPN8zQ8l156aTrkkENyI/e6FNMhvfTSS9XraDSm7OHEE0/MF+yR4IjEQYz6iLUoGjPaYuDAgekb3/hGHm0Ri2jvtttu+RzxXryfGH0R540ERrx/8TemmLrkkkuqR0FUiVEasfhfjFKI
BEdMmRQLiX/iE58oyWcQgc+CBQvS//zP/+Tbsdh2zNUbgVO8z9F4/4c//GGNnjMW8Y5pmyJhcvzxx690nZBIGMXaFCsu2L42RGIpkmERCDZmPuFYbHyjjTbKSZAqsbhjvK6YpxgAAFZXdICKzlERD+y000559HIkISLJEZ2/IkaKOKTmtLExkv/ee+/NHbVihHR0kooR3/H4GGUc4jo1pvuNREjcHyOVY52N+taVA6D0KirXZVdjAAohRlZss802OQERyRTKTySofvrTn+bROQ0tTF4uYqRHJKAiKbKyNWAAAAAA3o+RGgBQQKeffnoeCv9+U0KVWtX6JDHCRkIDAAAA+KCsqQEABRTD6GPqsXIXU3XNmDGj1MUAAAAAmggjNQAAAAAAgEKwpgYAAAAAAFAIRmoAAAAAAACFIKkBAAAAAAAUQrNbKHzZsmXp1VdfTRtuuGGqqKgodXEAAKCQYhbb+fPnpy222CK1aNG8+0qJMQAAYN3FGM0uqRHBxpZbblnqYgAAQJPw0ksvpQ996EOpORNjAADAuosxml1SI3pPVb0xHTp0KHVxKINedbNnz06dOnVq9j0MgdrUD0BD1A/LzZs3LzfkV11fN2diDGpSRwANUT8ADVE/rFqM0eySGlXDwSPYEHAQFcbChQvzd6E5VxhAXeoHoCHqh9pMtyTGoDZ1BNAQ9QPQEPXDqsUY3iEAAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKodmtqdFYS5cuTUuWLCl1MVgH89XF5xxz1hVhvro2bdoUopwAANQlxmgeihRjtG7dOrVs2bLUxQAAWCWSGiuorKxMM2fOTG+99Vapi8I6+rwj6Jg/f34hFrmMoGibbbbJyQ0AAIpBjNG8FC3G2HjjjVPXrl0LUVYAgCCpsYKqYKNz586pffv2LuyaQcDx7rvvplatWpX9Zx2B0auvvppee+21tNVWW5V9eQEAWE6M0bwUJcaIcr7zzjtp1qxZ+fbmm29e6iIBADSKpMYKw8Grgo3NNtus1MVhHShKwFGlU6dOObERZY6h4gAAlDcxRvNTpBhjvfXWy38jsRHfUVNRAQBFUN4TfK5jVfPbRu8pKEdV005FcAwAQPkTY1Duqr6b1nsBAIpCUqMe5d6bhubLdxMAoJhcx1GufDcBgKKR1AAAAAAAAApBUoMm07votttua/TxJ5xwQurfv/9aLRMAAFBM4gsAgPJlofBG6n727ev0fNMvPHKVjo+L6Ouvv7769qabbpr22WefdPHFF6cePXqkUrnuuuvS4MGD084775yefPLJWvfdfPPN6fOf/3zaeuut0/Tp00tWRgAAWNfEF6tHfAEAgJEaTchhhx2WXnvttbxNmjQptWrVKn3yk58sdbHS+uuvn2bNmpUmT55ca/+1116bttpqq5KVCwAAaJj4AgCAciSp0YS0bds2de3aNW89e/ZMZ599dnrppZfS7Nmzq48566yz0o477pjat2+ftt1223TuueemJUuWVN//2GOPpY997GNpww03TB06dEi9evVKf/vb36rvv++++9JBBx2U1ltvvbTlllumU089NS1YsGCl5Yrg54tf/GIaP3589b6XX3453X333Xn/iq644oq03XbbpTZt2qSddtop/fznP691/zPPPJM+8pGPpHbt2qVdd901/elPf6rzHPG6o5fWxhtvnHuVHXPMMXprAQDAKhBf/Jf4AgCgfEhqNFFvv/12+sUvfpG23377tNlmm1Xvj2Aihmz/61//Spdddlm6+uqr0w9/+MPq+4877rj0oQ99KD388MNp6tSpOXBp3bp1vu+5557LvbU+85nPpMcffzxNmDAhByFDhw593/J8+ctfTr/61a/SO++8k29HGeK5unTpUuu43/zmN+m0005L3/rWt9I//vGP9NWvfjUPL7/rrrvy/cuWLUuf/vSnc0Dy0EMPpSuvvDIHUjVFENWvX7/8Wu+99950//33pw022CCfb/HixR/wnQUAgOZH
fCG+AAAoF9bUaEJ+//vf54vrEL2bNt9887yvRYv/5q7OOeec6n937949nXHGGemmm25KZ555Zt43Y8aM9O1vfzvPURt22GGH6uPHjBmTg5JvfvOb1ff96Ec/SgcffHDu/RQ9mxqy55575p5bt9xySzr++ONz0DF27Nj0/PPP1zru0ksvzfP3nnzyyfn2sGHD0oMPPpj3Rw+vP//5z2natGnpzjvvTFtssUU+5oILLkiHH3549XNEMBTByTXXXJMX+As//elPc6+q6L116KGHfqD3GQAAmgPxxXLiCwCA8mKkRhMSF+WPPvpo3qZMmZJ7E8XF+IsvvljrgvyAAw7IQ8gjQIkgJAKNKnGRf9JJJ6W+ffumCy+8MPeeqjl0PIKFeFzVFueIC/wXXnihUb2p4uL/r3/9aw6KjjjiiDrHxGJ/Ub6a4nbVIoDxN4alVwUcoU+fPrWOj3I+++yzuSdVVTljiPjChQtrvR4AAKBh4ov/llN8AQBQPsoiqXH55ZfnXj3RE6d37975grkhH/3oR3PvmBW3I488MjV3sWBeDAePbZ999sk9ieLiPoaAh1hIL3pCxcV+9LD6+9//nr773e/WGjI9atSo9M9//jO/n3/5y1/ynLIxZLtqyHkM164KbGKLC/yYgzbmqH0/ce7oFRXniN5UMRfu2hDljLl6a5YztqeffrreOXYBAIC6xBfLiS8AAMpLyaefip490Xsn5i6NhMa4ceNy75ynnnoqde7cuc7xt956a62L5H//+99pjz32SJ/73OfWccnLXyR7Ymj4f/7zn3z7gQceSFtvvXUONKrU7GVVJRb6i+30009Pxx57bO799KlPfSrttddeea7cCGpWR/RmOvroo/Pct/F512eXXXbJc9QOGjSoel/cjuCn6v5YpO+1117Lw99DBDI1RTnjexXfn1iMEABoYkZtVOICtEipQ4+U5j0eM/KXrhij5pbu3DRL4gvxBQA0WWKMQsUYJR+pEfOeDhkyJC/WFheWcTHavn37NH78+AYvXGNoc9X2pz/9KR8vqZHSokWL0syZM/MWw6i/8Y1v5F5FRx11VPUctTEUPOa4jWHSMV9tVS+pEMFJLMoX88JGMBIX+7GgX1zoh1gwLwKXOCZ6JkUPqt/+9reNWsivSgwvnzNnTvWcuiuK+XbjmJhDN54/vh+RyIq5eUMMW4+AKIKS6MUVC/XVDKKqemx17NgxHXPMMfn+GLoer+nUU09NL7/88mq9twAA0NyIL5YTXwAAlJeSJjVixMXUqVPzhWR1gVq0yLdjKHNjXHvttekLX/hCHhrd3E2cODH3LootRr1EwHDzzTfnKbtC9GKK3lERJPTs2TMHEOeee27141u2bJlHvgwcODBf2H/+85/Pc+aOHj0639+jR488X20Msz7ooIPy4nwjRoyoNf/s+1lvvfXSZptt1uD9/fv3T5dddlleuG+33XZL//u//5t7clW9hvh+RKAUAdK+++6b5+f9/ve/X+s5Isl1zz33pK222ip9+tOfzkHTiSeemOe81bMKAAAaR3yxnPgCAKC8VFRWVlaW6uSvvvpq6tatW774rbkY25lnnpkvbh966KGVPj7W3oiL6zguLkAb6l0UW5V58+blheDefPPNOhegcVE6ffr0tM022+T1PWgelixZklq3bp2KIL6j0TOsag0aYO2JRUpnz56dOnXqlBs8gDJyXsMNmOvCstQize6we+o074nUopRDw0f8u3Tnfu+6epNNNklz585t9g278V5stNFG9b4XVddvYozmI0Lsd999N6/xEVN2lTvfUVi3McasWbPydHZiDCgzJZ5+KmKMWR16pM7zHi9tjDFqbtleV5fVmhofRIzS2H333RtMaIQxY8ZU9wSqKRqq4uJtxcbt+A8mLkBjo3kEHEuXLs3/LkLAEd/L+I5Gj7eiJGKgqOK3Fv+JRj0h4IAyE3PNljjgmNu+e6pMFaUNOGbNKt25U0rz588v6fkBAIDmqaRJjZiX
NIYkv/7667X2x+1YL2NlFixYkOduPe+881Z63PDhw/NC5CuO1Iiet/X1oorgLHrUxEbzUZQEQXwvo3E1htjrRQVrP6kRyU4jNaAM5cXzSpvUqEiVpR+p0blz6c6dkmsRAACgJEract+mTZvUq1evNGnSpDzXaVUjUtx+v8XhYi7XmFbqS1/60kqPa9u2bd5WFA1UKzZSxe1owKraaPqiB3bVZ12Ez7zqu1nf9xdY8/zeoFyVMJHwnkhqREKjpEmNEtdN6kYAAKAUSj4cIUZRDBo0KO299955Gqlx48blURiDBw/O98eicrHuRkwjteLUU5EIWdmicAAAAAAAQNNR8qTGgAED8voWI0aMSDNnzkw9e/ZMEydOTF26dMn3z5gxo04vsKeeeirdd9996Y9//GOJSg0AAAAAADS7pEaIqaYamm7q7rvvrrNvp512ytMGAQAAAAAAzUdZJDUAoI5RG5W4AC1S6tDjvQWJSzhn/qi5pTs3AAAAQJmxuh8AAAAAAFAIkhoAAAAAAEAhSGo0Ax/96EfTN7/5zdTcnHDCCal///6lLkbq3r17GjduXKOPHzVqVOrZs+daLRMAAKwu8UVpiS8AgObOmhrlOrf7Ks6hHhfY119/fZ39zzzzTLr11ltT69at09oQC7l/7GMfW+kxd911Vw58yk1V2TfeeOP06quvpvXWW6/6vocffjjtu++++d8WpQcAYI0TXzTp+GLGjBlpgw02qL5PfAEAsOZIajQhhx12WPrpT39aa1+nTp1Sy5Yt19o5999///Taa69V3z7ttNPSvHnzapVj0003TeVsww03TL/5zW/SF7/4xep91157bdpqq61yMAIAAM2R+GL144vbbrstfelLX6reJ74AAFhzTD/VhLRt2zZ17dq11hYBx4rDw2O48gUXXJC+/OUv5wvuuLi+6qqraj3XSy+9lD7/+c/nXkYRNBxzzDFp+vTpdc7Zpk2bWueL0Q41y/GFL3whnXnmmbUeE0O2o+fXmizP0qVL07Bhw/L9m222WT5nY3tARbBRM0j6z3/+k2666aY0aNCgOsf++te/Trvttlt+jVHuH/zgB7XunzVrVjrqqKPy+7DNNtukG264oc5zvPXWW+mkk07KAWGHDh3Sxz/+8fTYY481qqwAALCuiC9WL74YOHBgrVEu4gsAgDVLUqOZiovlvffeO/39739PJ598cvr617+ennrqqXzfkiVLUr9+/XIAcO+996b7778/D52OnlqLFy8uy/LE46+77ro0fvz4dN9996U33ngjj75ojOOOOy4/b1WvqQgsIqDYa6+9ah03derUHPhEIPXEE0/kuWnPPffcfN4qEUxFgBRD4m+55Zb0k5/8JAciNX3uc5/L+/7whz/k54zzfOITn8hlBgCAIhJf/Nfxxx+fHyO+AABYOyQ1mpDf//73+WK8aouL24YcccQR+eJ+++23T2eddVbq2LFjvlAOEyZMSMuWLUvXXHNN2n333dMuu+ySRzLERXnME7s2fNDyxEJ5w4cPT5/+9Kfz/VdeeWXaaKPGzVPcuXPndPjhh1cHDxG4RK+uFY0dOzYHBxFo7LjjjjnAGDp0aLrkkkvy/U8//XQOJK6++uq03377pV69euVh5tEzq0oEN1OmTEk333xzDrJ22GGHdOmll+YeYBGkAABAuRBfrH58EUkT8QUAwNohqdGExKJ0jz76aPX2ox/9qMFje/ToUf3vioqKPJS7qsdPDFV+9tlnc8+lqgAmhmQvXLgwPffcc7k3U83gpr4h0Kvqg5Rn7ty5ed7d3r17Vz9Hq1at8kV9Yw0ePDgHHc8//3yaPHlyHr2xoieffDIdcMABtfbF7VgsMYanx/1x3gg2quy88845oKgSr+Xtt9/OQ9hrvocvvPBCfi0AAFAuxBerH19ULbQuvgAAWPMsFN6ErL/++rknUmO0bt261u240I/eSiEuiuPCub5gIuZpjXluI6ip0qVLlwbP06JFizpzz8Zw7zVZ
njUhRmp89atfTSeeeGKeszaCgrUhXsvmm29eb4+0msEJAACUmvhi9cVUVjFSRHwBALDmSWpQR8zBGkOyY9h0LDRXn8YGNxEURC+nKtHj6B//+Efu9bUmyxMX8g899FD6yEc+km+/++671fPJNkb0gIoF/S6++OI8xLs+Mew85tutKW7HUPFYMDF6TVWdd5999sn3x7y9sXBfzdcyc+bMfL6YVxcAAJq65hpfxNoaMZWU+AIAYM0y/RR1xNDomHP2mGOOyUPBY+hy9Pw59dRT08svv7xKz/Xxj3883X777XmbNm1aXqCv5kX4mirPaaedli688MJ022235fNEr6hVPc/555+fZs+enee/rc+3vvWtNGnSpHxczG8bw8l//OMfpzPOOCPfv9NOO+UeWTHiIwKgCD5OOumktN5661U/R9++fVOfPn1S//790x//+Mc0ffr09MADD6Tvfve76W9/+9sqlRcAAIpAfCG+AABYkyQ1qKN9+/bpnnvuSVtttVX1wngxbDrmmG2oJ1NDYkG8QYMG5VEQBx98cNp2221XqRdVY8sTAUH0hIpzxUV9zI/7qU99apXOE8PeI7iJoen1iV5Qv/rVr9JNN92UPvzhD6cRI0ak8847L8+XWyUWGNxiiy3ya42yfuUrX8k9wKrEc99xxx25x1es4xG9sL7whS+kF198caXD7AEAoKjEF+ILAIA1qaJyxQlJm7h58+aljTbaKC/+tuIFdFzERi+dbbbZJrVr165kZWTdia9/DOmO4doNBRvlxHeUZmXURiU9/bLUIs3q0CN1nvd4apGWz8FdEqPmlu7cUK7UD2VRP6zsurq5EWNQkxgDaEis7TNr1qycnIw1goAyIsYoVIyhBgUAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVKjHsuWlXCFeViJysrKUhcBAIDVIMagXPluAgBF06rUBSgnbdq0SS1atEivvvpq6tSpU75dUVFR6mKxlpME7777bmrVqlXZf9ZR1tmzZ+dytm7dutTFAQCgEcQYzU9RYowo5+LFi3OMEd/R+G4CABSBpEYNcSG3zTbbpNdeey0HHTR9cSEfPZPisy/ngKNKlPFDH/pQatmyZamLAgBAI4gxmp+ixRjt27dPW221VS4vNHmjNipxAVqk1KFHSvMej3FSpSvGqLmlOzfAGiCpsYLonRIXdNGzZunSpaUuDmtZBBv//ve/02abbVaIi/gYoSGhAQBQLGKM5qVIMUbEFuU+ogQAYEWSGvWomt7HFD/NI+CIz7ldu3ZlH3AAAFBcYozmQ4wBALB2ucICAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAJqUyy+/PHXv3j21a9cu9e7dO02ZMmWlx48bNy7ttNNOab311ktbbrllOv3009PChQvXWXkBAIACJTVWNeB466230imnnJI233zz1LZt27TjjjumO+64Y52VFwAAKF8TJkxIw4YNSyNHjkyPPPJI2mOPPVK/fv3SrFmz6j3+xhtvTGeffXY+/sknn0zXXnttfo7vfOc767zsAABAmSc1VjXgWLx4cTrkkEPS9OnT0y233JKeeuqpdPXVV6du3bqt87IDAADlZ+zYsWnIkCFp8ODBadddd01XXnllat++fRo/fny9xz/wwAPpgAMOSF/84hdzZ6tDDz00HXvsse/b2QoAACiNVqlMAo4QAcftt9+eA47oLbWi2P/GG2/kwKN169Z5XwQeAAAA0Qlq6tSpafjw4dX7WrRokfr2
7ZsmT55c72P233//9Itf/CInMfbdd9/0/PPP55Hgxx9/fIPnWbRoUd6qzJs3L/9dtmxZ3mje4jtQWVnpuwBlqbQTlixLLVJlqsh/S1sQ9RP/tf13zIATnm2nfiiH+qGx10+tihRw/O53v0t9+vTJ00/99re/TZ06dco9qs4666zUsmXLdVh6AACg3MyZMyctXbo0denSpdb+uD1t2rR6HxPxRDzuwAMPzA3R7777bvra17620umnxowZk0aPHl1n/+zZs63FQQ7G586dm79PEeMCZaRDj5KePhor57bvnhsuW6QSNhw2MEMKzdMum1SWughlYVZr9UM51A/z588v76TG6gQc0WvqL3/5SzruuONy76lnn302nXzyyWnJkiV5Cqv66EXFyuhFBeVML4nlBVE/QV3qhyL1oip3d999d7rgggvST37yk7zGX8QYp512Wjr//PPTueeeW+9jomNWTKNbM8aIBcaj01WHDh3WYekpR/HbqKioyN8HSQ0oM/MeL+np49qhIlWmTvOeKG2jZefOpTs3ZefJNytKXYSy0Lmd+qEc6odYd7vsp59anYvDzp07p6uuuiqPzOjVq1d65ZVX0iWXXNJgUkMvKlZGLyooY3pRLacXFdSlfihUL6p1qWPHjjlOeP3112vtj9tdu3at9zGRuIippk466aR8e/fdd08LFixIX/nKV9J3v/vdeq8R27Ztm7cVxbGuKQmR1PB9gHJU+oR8NFrG9UNJryHUTdSwLElqhJL+Jt+jfkiNvnZqVaSAY/PNN89radScamqXXXZJM2fOzNNZtWnTps5j9KJiZfSigjKmF9VyelFBXeqHQvWiWpciHoiOT5MmTUr9+/evvt6L20OHDq33Me+8806d68CqeCM6vgAAAOWlVZECjgMOOCDdeOON+biqwOPpp5/OyY76EhpBLyrej15UUK70ksjUTVAP9UORelGta9GhadCgQWnvvffOC3+PGzcuj7wYPHhwvn/gwIGpW7dueUR3OOqoo9LYsWPTnnvuWT39VIzeiP3W7QMAgPLTqkgBx9e//vX04x//OM9x+41vfCM988wzef7bU089tZQvAwAAKBMDBgzIU82OGDEij+ju2bNnmjhxYvVafjNmzKiVkDnnnHNyJ5f4G1PbxgjeSGh8//vfL+GrAAAAyjKpsaoBR0wbdeedd6bTTz899ejRIyc8IsFx1llnlfBVAAAA5SRGfjc0+jsWBq+pVatWeX2+htboAwAAykurIgUcoU+fPunBBx9cByUDAAAAAADKSXlOhAsAAAAAALACSQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACiEVqUuAM3cqI1KXIAWKXXokdK8x1NKy0pXjFFzS3duAAAAAICCMFIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCaFXqAgAAAACw9nQ/+/ZSF6EsTG9X6hIAsCYYqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhVAWSY3LL788de/ePbVr1y717t07TZkypcFjr7vuulRRUVFri8cBAAAAAABNW8mTGhMmTEjDhg1LI0eOTI888kjaY489
Ur9+/dKsWbMafEyHDh3Sa6+9Vr29+OKL67TMAAAAAABAM0xqjB07Ng0ZMiQNHjw47brrrunKK69M7du3T+PHj2/wMTE6o2vXrtVbly5d1mmZAQAAAACAZpbUWLx4cZo6dWrq27fvfwvUokW+PXny5AYf9/bbb6ett946bbnllumYY45J//znP9dRiQEAAAAAgFJpVbIzp5TmzJmTli5dWmekRdyeNm1avY/Zaaed8iiOHj16pLlz56ZLL7007b///jmx8aEPfajO8YsWLcpblXnz5uW/y5YtyxvNe7DQstQiVaaK/Le0BfFdhLrUD8sLon6AutQP5VA/uJYGAACaXVJjdfTp0ydvVSKhscsuu6T//d//Teeff36d48eMGZNGjx5dZ//s2bPTwoUL13p5eR8depT09NEYMbd999ww0SKVMDBfyRoy0GypH5ZTP1DDidc/XOoilIVr1Q9lUT/Mnz+/pOcHAACap5ImNTp27JhatmyZXn/99Vr743asldEYrVu3TnvuuWd69tln671/+PDheSHymiM1YtqqTp065QXHKbF5j5e8UaIiVaZO854obaNE586lOzeUK/XDcuoHanjyzYpSF6EsdG6nfiiH+qFdu3YlPT8AANA8lTSp0aZNm9SrV680adKk1L9//+ph7HF76NChjXqOmL7qiSeeSEcccUS997dt2zZvK4q1O2Kj1Eo/bUE0SkSDREkbJXwXoR7qh0z9QA3LkqRGKOlv8j3qh+XX0wAAAM1u+qkYRTFo0KC09957p3333TeNGzcuLViwIA0ePDjfP3DgwNStW7c8jVQ477zz0n777Ze233779NZbb6VLLrkkvfjii+mkk04q8SsBAAAAAACadFJjwIABeX2LESNGpJkzZ6aePXumiRMnVi8ePmPGjFq9wN588800ZMiQfOwmm2ySR3o88MADaddddy3hqwAAAAAAAJp8UiPEVFMNTTd1991317r9wx/+MG8AAAAAAEDzYiJcAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAAqhVakLAEBt3c++vdRFKAvT25W6BAAAAACUGyM1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAACalMsvvzx17949tWvXLvXu3TtNmTJlpce/9dZb6ZRTTkmbb755atu2bdpxxx3THXfcsc7KCwAANF6rVTgWAACgrE2YMCENGzYsXXnllTmhMW7cuNSvX7/01FNPpc6dO9c5fvHixemQQw7J991yyy2pW7du6cUXX0wbb7xxScoPAACsnKQGAADQZIwdOzYNGTIkDR48ON+O5Mbtt9+exo8fn84+++w6x8f+N954Iz3wwAOpdevWeV+M8gAAAMqTpAYAANAkxKiLqVOnpuHDh1fva9GiRerbt2+aPHlyvY/53e9+l/r06ZOnn/rtb3+bOnXqlL74xS+ms846K7Vs2bLexyxatChvVebNm5f/Llu2LG80b/EdqKys9F2grLRIlaUuQllYVuJZ2OP8lami5OVI6idqUD8sV+rfpfphucZeP0lqAAAATcKcOXPS0qVLU5cuXWrtj9vTpk2r9zHPP/98+stf/pKOO+64vI7Gs88+m04++eS0ZMmSNHLkyHofM2bMmDR69Og6+2fPnp0WLly4hl4NRRXB+Ny5c3NiI5JqUA522USjZZjVukdJzx+NlXPbd88Nly1SCRsOZ80q3bkpO+qH5dQP5VE/zJ8/v1HHSWoAAADNugE61tO46qqr8siMXr16pVdeeSVdcsklDSY1YiRIrNtRc6TGlltumUd5dOjQYR2WnnL9TlVUVOTvg6QG5eLJNytKXYSy0Lnd4yVvtKxIlanTvCdK22hZzxpTNF/qh+XUD+VRP7Rr165R
x0lqAAAATULHjh1zYuL111+vtT9ud+3atd7HbL755nktjZpTTe2yyy5p5syZeTqrNm3a1HlM27Zt87aiaMDWiE2IpIbvA+VkWdJoGUraUPieaLSMcpS0LOomalA/LKd+KI/6obHXTmoxAACgSYgERIy0mDRpUq1e83E71s2ozwEHHJCnnKo5f+/TTz+dkx31JTQAAIDSktQAAACajJgW6uqrr07XX399evLJJ9PXv/71tGDBgjR48OB8/8CBA2stJB73v/HGG+m0007LyYzbb789XXDBBXnhcAAAoPyYfgoAAGgyBgwYkBfsHjFiRJ5CqmfPnmnixInVi4fPmDGj1rD2WAvjzjvvTKeffnrq0aNH6tatW05wnHXWWSV8FQAAQEMkNQAAgCZl6NCheavP3XffXWdfTE314IMProOSAQAAH5TppwAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCaFXqAgAAAMAaM2qjEhegRUodeqQ07/GU0rLSFWPU3NKdGwBgLTJSAwAAAAAAKISySGpcfvnlqXv37qldu3apd+/eacqUKY163E033ZQqKipS//7913oZAQAAAACAZp7UmDBhQho2bFgaOXJkeuSRR9Iee+yR+vXrl2bNmrXSx02fPj2dccYZ6aCDDlpnZQUAAAAAAJpxUmPs2LFpyJAhafDgwWnXXXdNV155ZWrfvn0aP358g49ZunRpOu6449Lo0aPTtttuu07LCwAAAAAAFHCh8MWLF6cXXnghbbfddqlVq1ar9fipU6em4cOHV+9r0aJF6tu3b5o8eXKDjzvvvPNS586d04knnpjuvffelZ5j0aJFeasyb968/HfZsmV5o3nn1ZalFqkyVeS/pS2I7yL/Fd9Klv8+S31+9QPlRv2wXKl/l+qHqtOv+fN/0PgCAABo+lYrUnjnnXfSN77xjXT99dfn208//XQeMRH7unXrls4+++xGPc+cOXPyqIsuXbrU2h+3p02bVu9j7rvvvnTttdemRx99tFHnGDNmTB7RsaLZs2enhQsXNuo5WIs69Cjp6aMxYm777rlhokUqYcPA+0y3RvOyyyYaLcOs1uqHTP1ADeqH5dQP5VE/zJ8/f40915qKLwAAgKZvtZIaMbLiscceS3fffXc67LDDqvfHCItRo0attaAjAqfjjz8+XX311aljx46NLmus2VFzpMaWW26ZOnXqlDp06LBWyskqmPd4yRslKlJl6jTvidI2SnTuXLpzU3aefLOi1EUoC53bqR8y9QM1qB+WUz+UR/3Qrl27NfZcpYovAACAZpLUuO222/IC3/vtt1+qqPhvcL3bbrul5557rtHPE4mJli1bptdff73W/rjdtWvXOsfHc8cC4UcddVSdYe8xPP2pp57KQ9Vratu2bd5WFNNcxUaplX5alWiUiAaJkjZK+C5Sw7Kk0TKU9Df5HvUD5Ub9sJz6oTzqhzV5Lb2m4gsAAKDpW61IJKZuijUtVrRgwYJaQcj7adOmTerVq1eaNGlSrSRF3O7Tp0+d43feeef0xBNP5Kmnqrajjz46fexjH8v/jhEYAABAsayp+AIAAGj6Viupsffee6fbb7+9+nZVoHHNNdfUm4xYmZgaKqaTivlzn3zyyfT1r389By+DBw/O9w8cOLB6IfEY4v7hD3+41rbxxhunDTfcMP87kiQAAECxrMn4AgAAaNpWa/qpCy64IB1++OHpX//6V3r33XfTZZddlv/9wAMPpL/+9a+r9FwDBgzIPbNGjBiRZs6cmXr27JkmTpxYvXj4jBkzTBMFAABN2JqMLwAAgKZttbIFBx54YF7ILwKO3XffPf3x
j3/Mw8UnT56cp5NaVUOHDk0vvvhiWrRoUXrooYdS7969q++LxQKvu+66Bh8b98UcvAAAQDGt6fgCAABoulZ5pMaSJUvSV7/61XTuuefmaaMAAABWl/gCAABYqyM1WrdunX7961+v6sMAAADqEF8AAABrffqp/v37m/IJAABYI8QXAADAWl0ofIcddkjnnXdeuv/++/Mct+uvv36t+0899dTVeVoAAKAZEl8AAABrNalx7bXXpo033jhNnTo1bzVVVFQIOgAAgEYTXwAAAGs1qfHCCy+szsMAAADqEF8AAABrdU2NmiorK/MGAADwQYkvAACAtZLU+NnPfpZ23333tN566+WtR48e6ec///nqPh0AANCMiS8AAIC1Nv3U2LFj07nnnpuGDh2aDjjggLzvvvvuS1/72tfSnDlz0umnn746TwsAADRD4gsAAGCtJjX+53/+J11xxRVp4MCB1fuOPvrotNtuu6VRo0YJOgAAgEYTXwAAAGt1+qnXXnst7b///nX2x764DwAAoLHEFwAAwFpNamy//fbpV7/6VZ39EyZMSDvssMPqPCUAANBMiS8AAIC1Ov3U6NGj04ABA9I999xTPeft/fffnyZNmlRvMAIAANAQ8QUAALBWR2p85jOfSQ899FDq2LFjuu222/IW/54yZUr61Kc+tTpPCQAANFPiCwAAYK2O1Ai9evVKv/jFL1b34QAAANXEFwAAwFobqXHHHXekO++8s87+2PeHP/xhdZ4SAABopsQXAADAWk1qnH322Wnp0qV19ldWVub7AAAAGkt8AQAArNWkxjPPPJN23XXXOvt33nnn9Oyzz67OUwIAAM2U+AIAAFirSY2NNtooPf/883X2R8Cx/vrrr85TAgAAzZT4AgAAWKtJjWOOOSZ985vfTM8991ytgONb3/pWOvroo1fnKQEAgGZKfAEAAKzVpMbFF1+ce0zFcPBtttkmb/HvzTbbLF166aWr85QAAEAzJb4AAAAaq1VazeHhDzzwQPrTn/6UHnvssbTeeuulPfbYIx100EGr83QAAEAzJr4AAADWykiNyZMnp9///vf53xUVFenQQw9NnTt3zr2nPvOZz6SvfOUradGiRavylAAAQDMlvgAAANZqUuO8885L//znP6tvP/HEE2nIkCHpkEMOSWeffXb6v//7vzRmzJhVLgQAAND8iC8AAIC1mtR49NFH0yc+8Ynq2zfddFPad99909VXX52GDRuWfvSjH6Vf/epXq1wIAACg+RFfAAAAazWp8eabb6YuXbpU3/7rX/+aDj/88Orb++yzT3rppZdWuRAAAEDzI74AAADWalIjAo4XXngh/3vx4sXpkUceSfvtt1/1/fPnz0+tW7de5UIAAADNj/gCAABYq0mNI444Is9te++996bhw4en9u3bp4MOOqj6/scffzxtt912q1wIAACg+RFfAAAAq6rVqhx8/vnnp09/+tPp4IMPThtssEG6/vrrU5s2barvHz9+fDr00ENXuRAAAEDzI74AAADWalKjY8eO6Z577klz587NQUfLli1r3X/zzTfn/QAAAO9HfAEAAKzVpEaVjTbaqN79m2666eo8HQAA0IyJLwAAgLWypgYAAAAAAECpSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAA0KZdffnnq3r17ateuXerdu3eaMmVKox530003pYqKitS/f/+1XkYAAKDASY1VCTpuvfXWtPfee6eNN944rb/++qlnz57p5z//+TotLwAAUJ4mTJiQhg0blkaOHJkeeeSRtMcee6R+/fqlWbNmrfRx06dPT2eccUY66KCD1llZAQCAAiY1VjXo2HTTTdN3v/vdNHny5PT444+nwYMH5+3O
O+9c52UHAADKy9ixY9OQIUNyjLDrrrumK6+8MrVv3z6NHz++wccsXbo0HXfccWn06NFp2223XaflBQAAVk2rVEZBR4ig4/bbb89Bx9lnn13n+I9+9KO1bp922mnp+uuvT/fdd19OhhRF97NvL3URysL0dqUuAQAATcXixYvT1KlT0/Dhw6v3tWjRIvXt2zd3imrIeeedlzp37pxOPPHEdO+9977veRYtWpS3KvPmzct/ly1bljead9+9ZalFqkwV+W9pC+K7yH/Ft5Llv89Sn1/9QLlRPyxX6t+l+mG5xl5Ltypi0FGlsrIy/eUvf0lPPfVUuuiiiwoVcKgwliv1D1WFQTlSPyxX6t+l+oFypH5YrtS/S/VD1enLr36aM2dOHnXRpUuXWvvj9rRp0+p9THSOuvbaa9Ojjz7a6POMGTMmj+pY0ezZs9PChQtXo+SsUR16lPT0UTfMbd891xMtUgl/J+8z5RrNyy6buIYIs1qrHzL1AzWoH5ZTP5RH/TB//vzyT2qsTtAR5s6dm7p165aTFS1btkw/+clP0iGHHFKogEOFsZwK4z0uKKhB/bCc+uE96gdqUD8sp34oVsBRzuI1HH/88enqq69OHTt2bPTjolNWTKFbs+PUlltumTp16pQ6dOiwlkpLo817vOR1REWqTJ3mPVHaOqJz59Kdm7Lz5JsVpS5CWejcTv2QqR+oQf2wnPqhPOqHWHO7ENNPrY4NN9ww96R6++2306RJk3JAEXPfrjg1VTkHHCqM5VQY73FBQQ3qh+XUD+9RP1CD+mE59UOxAo51KRIT0enp9ddfr7U/bnft2rXO8c8991xeIPyoo46qMwKlVatWeUT4dtttV+dxbdu2zduKYtR5bJRa6UcRRR0R9UNJ6wjfRWpYllxDhJL+Jt+jfqDcqB+WUz+UR/3Q2GvpVkUKOmq+uO233z7/u2fPnunJJ5/MIzLqS2qUa8ChwlhOhfEeFxTUoH5YTv3wHvUDNagfllM/FCvgWJfatGmTevXqlTs+9e/fvzpJEbeHDh1a5/idd945PfHEE7X2nXPOOXkEx2WXXZY7QwEAAOWlVZGCjobEY2qumwEAADRPMUp70KBBae+990777rtvGjduXFqwYEEaPHhwvn/gwIF5KtvoFBWjTT784Q/XevzGG2+c/664HwAAKA+tihR0hPgbx8Yw8Ehk3HHHHennP/95uuKKK0r8SgAAgFIbMGBAXj9vxIgRaebMmXlk98SJE6vX8ZsxY0ZZjjIBAAAKktRY1aAjEh4nn3xyevnll9N6662Xh4z/4he/yM8DAAAQo74bGvl99913r/Sx11133VoqFQAA0CSSGqsadHzve9/LGwAAAAAA0LwYdw0AAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAitSl0AAAAAPrjuZ99e6iKUhentSl0CAADWJiM1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAohLJIalx++eWpe/fuqV27dql3795pypQpDR579dVXp4MOOiht
sskmeevbt+9KjwcAAAAAAJqGkic1JkyYkIYNG5ZGjhyZHnnkkbTHHnukfv36pVmzZtV7/N13352OPfbYdNddd6XJkyenLbfcMh166KHplVdeWedlBwAAAAAAmlFSY+zYsWnIkCFp8ODBadddd01XXnllat++fRo/fny9x99www3p5JNPTj179kw777xzuuaaa9KyZcvSpEmT1nnZAQAAAACAdadVKqHFixenqVOnpuHDh1fva9GiRZ5SKkZhNMY777yTlixZkjbddNN671+0aFHeqsybNy//jURIbKXSIlWW7NzlZFmJ82px/spUUfJypBJ+Fyk/6oflSv27VD9QjtQPy5X6d6l+qDq9+gkAAGhmSY05c+akpUuXpi5dutTaH7enTZvWqOc466yz0hZbbJETIfUZM2ZMGj16dJ39s2fPTgsXLkylsssmGiXCrNY9Snr+aIyY2757bphokUoYmDcw3RrNk/phOfXDe9QP1KB+WE79UB71w/z580t6fgAAoHkqaVLjg7rwwgvTTTfdlNfZiEXG6xOjQGLNjpojNWIdjk6dOqUOHTqkUnnyzYqSnbucdG73eMkbJSpSZeo074nSNkp07ly6c1N21A/LqR/eo36gBvXDcuqH8qgfGrr+BgAAaLJJjY4dO6aWLVum119/vdb+uN21a9eVPvbSSy/NSY0///nPqUePhnvrtW3bNm8rimmuYiuVZUmjRChpQ8B7olEiylHSspTwu0j5UT8sp354j/qBGtQPy6kfyqN+KOW1NAAA0HyVNBJp06ZN6tWrV61FvqsW/e7Tp0+Dj7v44ovT+eefnyZOnJj23nvvdVRaAAAAAACgWU8/FVNDDRo0KCcn9t133zRu3Li0YMGCNHjw4Hz/wIEDU7du3fLaGOGiiy5KI0aMSDfeeGPq3r17mjlzZt6/wQYb5A0AAAAAAGiaSp7UGDBgQF60OxIVkaDo2bNnHoFRtXj4jBkzag1tv+KKK9LixYvTZz/72VrPM3LkyDRq1Kh1Xn4AAAAAAKCZJDXC0KFD81afWAS8punTp6+jUgEAAAAAAOXE6n4AAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAhSGoAAAAAAACFIKkBAAAAAAAUgqQGAAAAAABQCJIaAAAAAABAIUhqAAAATcrll1+eunfvntq1a5d69+6dpkyZ0uCxV199dTrooIPSJptskre+ffuu9HgAAKC0JDUAAIAmY8KECWnYsGFp5MiR6ZFHHkl77LFH6tevX5o1a1a9x999993p2GOPTXfddVeaPHly2nLLLdOhhx6aXnnllXVedgAA4P1JagAAAE3G2LFj05AhQ9LgwYPTrrvumq688srUvn37NH78+HqPv+GGG9LJJ5+cevbsmXbeeed0zTXXpGXLlqVJkyat87IDAADvr1UjjgEAACh7ixcvTlOnTk3Dhw+v3teiRYs8pVSMwmiMd955Jy1ZsiRtuummDR6zaNGivFWZN29e/hvJkNhKpUWqLNm5y8myEvfdi/NXpoqSlyOV8LtI+VE/LFfq36X6gXKkfliu1L9L9cNyjb2WltQAAACahDlz5qSlS5emLl261Noft6dNm9ao5zjrrLPSFltskRMhDRkzZkwaPXp0nf2zZ89OCxcuTKWyyyYaJcKs1j1Kev5ojJjbvntumGiRStgw0MCUazRP6ofl1A/vUT9Qg/phOfVDedQP8+fPb9RxkhoAAAAppQsvvDDddNNNeZ2NWGS8ITESJNbtqDlSI9bi6NSpU+rQoUMqlSffrCjZuctJ53aPl7xRoiJVpk7znihto0TnzqU7N2VH/bCc+uE96gdqUD8sp34oj/phZdfgNUlqAAAATULHjh1Ty5Yt0+uvv15rf9zu
2rXrSh976aWX5qTGn//859Sjx8p76rVt2zZvK4qprmIrlWVJo0QoaUPAe6JRIspR0rKU8LtI+VE/LKd+eI/6gRrUD8upH8qjfmjstbRaDAAAaBLatGmTevXqVWuR76pFv/v06dPg4y6++OJ0/vnnp4kTJ6a99957HZUWAABYHUZqAAAATUZMCzVo0KCcnNh3333TuHHj0oIFC9LgwYPz/QMHDkzdunXL62KEiy66KI0YMSLdeOONqXv37mnmzJl5/wYbbJA3AACgvEhqAAAATcaAAQPygt2RqIgERc+ePfMIjKrFw2fMmFFrWPsVV1yRFi9enD772c/Wep6RI0emUaNGrfPyAwAAKyepAQAANClDhw7NW31iEfCapk+fvo5KBQAArAnW1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAohJInNS6//PLUvXv31K5du9S7d+80ZcqUBo/95z//mT7zmc/k4ysqKtK4cePWaVkBAAAAAIBmmtSYMGFCGjZsWBo5cmR65JFH0h577JH69euXZs2aVe/x77zzTtp2223ThRdemLp27brOywsAAAAAADTTpMbYsWPTkCFD0uDBg9Ouu+6arrzyytS+ffs0fvz4eo/fZ5990iWXXJK+8IUvpLZt267z8gIAAAAAAKXTqlQnXrx4cZo6dWoaPnx49b4WLVqkvn37psmTJ6+x8yxatChvVebNm5f/Llu2LG+l0iJVluzc5WRZiWdAi/NXpoqSlyOV8LtI+VE/LFfq36X6gXKkfliu1L9L9UPV6dVPAABAM0pqzJkzJy1dujR16dKl1v64PW3atDV2njFjxqTRo0fX2T979uy0cOHCVCq7bKJRIsxq3aOk54/GiLntu+eGiRaphIF5A1Ou0TypH5ZTP7xH/UAN6ofl1A/lUT/Mnz+/pOcHAACap5IlNdaVGAkS63bUHKmx5ZZbpk6dOqUOHTqUrFxPvllRsnOXk87tHi95o0RFqkyd5j1R2kaJzp1Ld27KjvphOfXDe9QP1KB+WE79UB71Q7t27Up6fgAAoHkqWVKjY8eOqWXLlun111+vtT9ur8lFwGPtjfrW34iprmIrlWVJo0QoaUPAe6JRIspR0rKU8LtI+VE/LKd+eI/6gRrUD8upH8qjfijltTQAANB8lSwSadOmTerVq1eaNGlSrXl543afPn1KVSwAAAAAAKBMlXT6qZgWatCgQWnvvfdO++67bxo3blxasGBBGjx4cL5/4MCBqVu3bnldjKrFxf/1r39V//uVV15Jjz76aNpggw3S9ttvX8qXAgAAAAAANOWkxoABA/KC3SNGjEgzZ85MPXv2TBMnTqxePHzGjBm1hrW/+uqrac8996y+femll+bt4IMPTnfffXdJXgMAAAAAANBMFgofOnRo3uqzYqKie/fuqbKych2VDAAAAAAAKCdW9wMAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACg
ECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKARJDQAAAAAAoBAkNQAAAAAAgEKQ1AAAAAAAAApBUgMAAAAAACgESQ0AAAAAAKAQJDUAAAAAAIBCkNQAAAAAAAAKQVIDAAAAAAAoBEkNAAAAAACgECQ1AAAAAACAQpDUAAAAAAAACkFSAwAAAAAAKISySGpcfvnlqXv37qldu3apd+/eacqUKSs9/uabb04777xzPn733XdPd9xxxzorKwAAUN7EFwAA0HSVPKkxYcKENGzYsDRy5Mj0yCOPpD322CP169cvzZo1q97jH3jggXTsscemE088Mf39739P/fv3z9s//vGPdV52AACgvIgvAACgaSt5UmPs2LFpyJAhafDgwWnXXXdNV155ZWrfvn0aP358vcdfdtll6bDDDkvf/va30y677JLOP//8tNdee6Uf//jH67zsAABAeRFfAABA09aqlCdfvHhxmjp1aho+fHj1vhYtWqS+ffumyZMn1/uY2B89r2qKnle33XZbvccvWrQob1Xmzp2b/7711ltp2bJlqWQWLSjducvIWxUVJT3/slSR5i1cmtosqkgtUgnL8tZbpTs35Uf9kKkf3qN+oCb1Q6Z+KI/6Yd68eflvZWVlKhfrIr4IYozypo54j2sIalI/ZOqH96gfqEn9kKkfihVjlDSpMWfOnLR06dLUpUuXWvvj9rRp0+p9zMyZM+s9PvbXZ8yYMWn06NF19m+99dYfqOysGZukcnB/qQuQ0oXl8U5AOSmPX4X6AcpRefwq1A9V5s+fnzbaaKNUDtZFfBHEGOWtPH4Z6ggoR+Xxq1A/QDkqj1+F+qGxMUZJkxrrQvTSqtnzKnpOvfHGG2mzzTZLFSXOwFF6kf3bcsst00svvZQ6dOhQ6uIAZUT9ADRE/ZCqe09FsLHFFluk5kaMwcqoI4CGqB+AhqgfVi3GKGlSo2PHjqlly5bp9ddfr7U/bnft2rXex8T+VTm+bdu2eatp4403/sBlp2mJyqI5VxhAw9QPQEPUD6lsRmisy/giiDFoDHUE0BD1A9AQ9UNqVIxR0oXC27Rpk3r16pUmTZpUq5dT3O7Tp0+9j4n9NY8Pf/rTnxo8HgAAaB7EFwAA0PSVfPqpGLY9aNCgtPfee6d99903jRs3Li1YsCANHjw43z9w4MDUrVu3PG9tOO2009LBBx+cfvCDH6Qjjzwy3XTTTelvf/tbuuqqq0r8SgAAgFITXwAAQNNW8qTGgAED0uzZs9OIESPyYnw9e/ZMEydOrF6sb8aMGalFi/8OKNl///3TjTfemM4555z0ne98J+2www7ptttuSx/+8IdL+Cooqpg2YOTIkXWmDwBQPwANUT+UN/EFpaaOABqifgAaon5YNRWVsfoGAAAAAABAmSvpmhoAAAAAAACNJakBAAAAAAAUgqQGAAAAAABQCJIaAAAAsAaNGjUqL1Jf5YQTTkj9+/cvaZmA8qB+ABqifmg8SQ1Ywb///e9cieyzzz6pU6dOaauttkpHHnlkuummm1JlZWWd47///e+n/fffP7Vv3z5tvPHGJSkzUJ71w9FHH52PadeuXdp8883T8ccfn1599dWSlB0or/qhe/fuqaKiotZ24YUXlqTsQOksWbIkXXXVValv376pW7duqWvXrjm2uPTSS9M777xT5/hbb701HXrooWmzzTbL9cajjz5aknID5Vc/xHXIzjvvnNZff/20ySab5Mc99NBDJSk7UF71wwknnFAn9jjssMNSkUlq0GxEg8K7775bZ//ixYur//3HP/4x7bjjjunhhx9OZ5xxRr4dgcMnP/nJdP7556d+/fqlBQsW1Hn85z73ufT1r399nbwOoDj1w8c+9rH0q1/9Kj311FPp17/+dXruuefSZz/72XXymoDyrh/Ceeedl1577bXq7Rvf+MZafz1A7d9vKT3//PNpr732Spdffnm+Prj55ptz/fHNb34zTZo0Ke22227p6aefrvWYqEsOPPDA
dNFFF5Ws3NCUFbl+iGuRH//4x+mJJ55I9913X+5AEUnQ2bNnl+x1QFNS5PohRBKjZuzxy1/+MhVaJRTEwQcfXDl06NDK0047rXLjjTeu7Ny5c+VVV11V+fbbb1eecMIJlRtssEHldtttV3nHHXfk4++6667oFplv77XXXpWtW7fO++J5TjnllPw8m222WeVHP/rRfPzDDz9cuemmm1b+7ne/q/f8S5YsqRw8eHDlUUcdVe/9P/3pTys32mijtfgOAEWtH6r89re/rayoqKhcvHjxWngXgCLVD1tvvXXlD3/4w3XwDgD1/X6feOKJysMOO6xy/fXXz/XCl770pcrZs2dXP2bp0qWVF110Ua4f2rRpU7nllltWfu9736u+/8wzz6zcYYcdKtdbb73KbbbZpvKcc86p9f/7yJEjK/fYY4/q24MGDao85phjqm+/9dZbldtvv33lueeeW7ls2bJ6yx11VdQVb7zxRp37XnjhhVxX/f3vf18j7xE0V02xfqgyd+7cXE/8+c9//kDvETRXTal+GLTC8zQFRmpQKNdff33q2LFjmjJlSu7NGKMjYpREDLF65JFHci+EmN6l5lCrs88+O0/n8OSTT6YePXpUP0+bNm3S/fffn6688sq8L54vppI66qij0r/+9a908MEH5+kjPv/5z6dhw4aliy++OB8b9911110lew+AYtYPb7zxRrrhhhtyeVq3br2O3hWgnOuHeP6YQmbPPfdMl1xySb0jQoA1o+bvN357H//4x/Nv729/+1uaOHFiev311/Pvtsrw4cPzceeee27+/d54442pS5cu1fdvuOGG6brrrsv3XXbZZenqq69OP/zhDxtdnnjuXr165RFbc+fOTccdd1z11BE/+tGP0uGHH56GDBmSDjrooDRu3Lg1/n4ATbt+iB7lMTXNRhttlPbYY48P+A5B89WU6oe77747de7cOe200045Horpcwut1FkVWJUM6YEHHlh9+913382Z0eOPP75632uvvZZ7IkyePLm6p+Vtt91W53n23HPPWvuefvrpyq5du+belPG8O+64Y+VXvvKV3PPpRz/6UWWrVq1ytjREFvWss86qUz4jNaB0yrl+iJ4Y7du3z+fbb7/9KufMmbOW3gWgSPXDD37wg3yuxx57rPKKK67Io0hOP/30tfhOQPO14u/3/PPPrzz00ENrHfPSSy/l3/5TTz1VOW/evMq2bdtWXn311Y0+xyWXXFLZq1evRve07NatW+7tGb785S9X9unTp/LBBx/Mo76iXokyh+hh3bt37zrnM1ID1oymVj/83//9X77OidHhW2yxReWUKVNW6f0Ammb98Mtf/jLPHPH4449X/uY3v6ncZZddKvfZZ58cwxRVq1InVWBVVPWUDC1btsy9G3fffffqfVXZz1mzZqUOHTrkf++99951nieymjXFnJOxsGerVq1ytvSVV17Jc1FGb+qePXum3/3ud9XHxmK/jz322Fp5fUDTqx++/e1vpxNPPDG9+OKLafTo0WngwIHp97//fV6YC2i+9UOM4qhZvugB9tWvfjWNGTMmtW3bdo29dqDu7zd+izFyaoMNNqhzXKx/9dZbb6VFixalT3ziEw0+34QJE3KPyDj+7bffziOtquqP9xOjN+fPn58+/OEP59v/93//l2677bbUu3fvfHvo0KHpT3/6U3Xd8eabb67y6wWaZ/0Qa/o9+uijac6cObkHePQgj8XCo3c20Hzrhy984QvV/444KOKP7bbbLo/eWFl5y5nppyiUFadsiUbBmvuqGgmXLVtWvW/99dev8zwr7otKZL311qsephnPWfN5a1ZYMU3F9ttvv0ZeD9D064eY8iYW7TvkkEPSTTfdlO6444704IMPfoBXCjSV+qGmCEbi+aZPn76Krw5ojJq/32hEiCnjouGv5vbMM8+kj3zkI9W/64ZMnjw5T/dwxBFH5I4Kf//739N3v/vdRi8gGr/1du3aVd+Ox9Usn9gD1q2mVD/E
sbFvv/32S9dee23ueBF/gdXTlOqHmrbddtvcVvHss8+mopLUgJTyDz16W4aYWy4aJKKn5dKlS3Pj45133pmWLFmS59L7wx/+kE444YRSFxkoYP1Q1WAavTeA4luT9UMERC1atNCTEtaBvfbaK/3zn/9M3bt3z7/jmls0Duywww65YWLSpEn1Pv6BBx5IW2+9dW6IiFFdcXyMyGysaESIhoiYhzsceOCBef2d//znP3nEV/SurjpPnKPmyC5g7Wpq9UPEH2IPWDOaUv3w8ssv5zU1YkRHUUlqQEp5kZ+oBGIYWVRAsWjPiBEj8vQPgwcPTv37908XXXRR+ulPf5r++Mc/5gVAq8yYMSM3RMTfaMSoytRGBhdovvVDDPOOxs2oD+JC5S9/+Us69thj8xDPPn36lPplASWsH6KXVizcF0PYn3/++XTDDTek008/PX3pS19Km2yySalfFjR5p5xySp7CIf5ffvjhh/MUEJGEjN9tXM9HL8izzjornXnmmelnP/tZvj8SlVW9naMRIq79YwRm3BfTSPzmN79p9PkjgXn00Uenn/zkJ/l2LBQavTWjh2VMCRGjO//617+mL3/5y/m+mtNCRLnj2iKmvAtPPfVUvj1z5sw1/j5Bc1TU+mHBggXpO9/5Ti5LxB5Tp07Nx0RD5+c+97m19G5B81LU+uHtt9/O02JHWWJUeCRdjjnmmJyM6devXyoqa2rAe9NORKPDoEGD0j333JOHgs2ePTsHB1tssUWaO3duuuqqq+qdNy8aL6IHZs0GjhANHB/96EfX6esAyqd+aN++fbr11lvTyJEjc5ARPSAOO+ywdM4555gvH5p5/RB1QAQzo0aNyr0nt9lmm5zU0Bsb1o34fd5///254eHQQw/Nv8PoORn/T0eDQTj33HPztC1xrf/qq6/m/8e/9rWv5fuiQSF+szF3dTz2yCOPzMfHb7qx4nn33XffPEXM4YcfnpMUUXdEYjN6VkcPy+iRuaJYqycaT1acIzuuN1bl/EDTqh9izbBp06bltolYTyPWD4t1v+6999602267reF3CZqnItcPjz/+eK4fYt2PeB1R/vPPP7/QbRMVsVp4qQsB5eKCCy5IY8eOTcOHD08DBgxIH/rQh3JFE5nO+LFHY8OnPvWpUhcTKAH1A9AQ9QOwOmIEVyQlYpTWkCFDqhseY1q7Sy+9NI/uiroFaH7UD0BD1A/LSWrACqInw/e+97109913p/h5xEI8u+66azr11FPTSSedVJ19BZof9QPQEPUDsDpeeOGFdN555+XpJ6qmr421dWIEWCRKO3ToUOoiAiWifgAa8oL6QVIDGhI9LGfNmpU23HDDtPHGG5e6OEAZUT8ADVE/AKsjpoyIhT8jAdqlS5dSFwcoI+oHoCHLmnH9IKkBAAAAAAAUgnHwAAAAAABAIUhqAAAAAAAAhSCpAQAAAAAAFIKkBgAAAAAAUAiSGgAAAAAAQCFIagAAAAAAAIUgqQEAAAAAABSCpAYAAAAAAFAIkhoAAAAAAEAqgv8HWEuewWUgDTwAAAAASUVORK5CYII=",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "\n",
+ "# Create figure with two subplots side by side\n",
+ "fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 5))\n",
+ "\n",
+ "# Plot MRR scores\n",
+ "mrr_cols = [\"mrr@1\", \"mrr@3\", \"mrr@5\"]\n",
+ "x = np.arange(len(mrr_cols))\n",
+ "width = 0.25\n",
+ "\n",
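+ "for i, model in enumerate(df.index):\n",
+ " # Center each group of bars on its tick, whatever the number of models\n",
+ " offset = (i - (len(df.index) - 1) / 2) * width\n",
+ " ax1.bar(x + offset, df.loc[model, mrr_cols], width, label=model)\n",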
+ "\n",
+ "ax1.set_title(\"Mean Reciprocal Rank (MRR)\")\n",
+ "ax1.set_xticks(x)\n",
+ "ax1.set_xticklabels(mrr_cols)\n",
+ "ax1.set_ylabel(\"Score\")\n",
+ "ax1.legend()\n",
+ "ax1.grid(True, alpha=0.3)\n",
+ "\n",
+ "# Plot Recall scores\n",
+ "recall_cols = [\"recall@1\", \"recall@3\", \"recall@5\"]\n",
+ "x = np.arange(len(recall_cols))\n",
+ "\n",
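+ "for i, model in enumerate(df.index):\n",
+ " # Center each group of bars on its tick, whatever the number of models\n",
+ " offset = (i - (len(df.index) - 1) / 2) * width\n",
+ " ax2.bar(x + offset, df.loc[model, recall_cols], width, label=model)\n",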
+ "\n",
+ "ax2.set_title(\"Recall\")\n",
+ "ax2.set_xticks(x)\n",
+ "ax2.set_xticklabels(recall_cols)\n",
+ "ax2.set_ylabel(\"Score\")\n",
+ "ax2.legend()\n",
+ "ax2.grid(True, alpha=0.3)\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this notebook, we successfully fine-tuned the BAAI/bge-base-en model using sentence-transformers, achieving a 20% improvement in MRR@1 and a 12% increase in recall@5 compared to the base model. More importantly, we learned how to balance performance gains against implementation complexity.\n",
+ "\n",
+ "This concludes our exploration of fine-tuning approaches, where we:\n",
+ "\n",
+ "1. Created synthetic training data through iterative generation and manual review to ensure quality\n",
+ "2. Explored managed re-rankers through Cohere, showing how they offer quick wins with minimal setup\n",
+ "3. Implemented open-source fine-tuning using sentence-transformers, trading simplicity for greater control and lower inference costs\n",
+ "\n",
+ "While managed solutions like Cohere offer faster time-to-production and simplified deployment, open-source models provide greater control and customization potential. Choose between them based on your requirements, your resources, and how much performance improvement your use case needs.\n",
+ "\n",
+ "These retrieval improvements complement the techniques we'll explore in later weeks, such as discovering query patterns (Week 4) and handling structured data (Week 5). By combining effective retrieval with these methods, we can build more robust and capable RAG systems."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}