Create llm prompt #2366
Here's haiku on
The key difference between blocking rules for prediction and blocking rules for EM training is:
Blocking rules for prediction:
Blocking rules for EM training:
In summary, the blocking rules for prediction are focused on efficiency and coverage of true matches, while the blocking rules for EM training are focused on providing the EM algorithm with the data it needs to accurately estimate the model parameters.
Here's Sonnet 3.5. Haiku is decent; Sonnet is noticeably better:
The key differences between blocking rules for prediction and blocking rules for EM training in Splink are:
In summary, blocking rules for prediction are more comprehensive and determine which record pairs are scored in the final model, while blocking rules for EM training are more focused and are used to efficiently estimate model parameters during the training phase.
Probably want to fit the prompt into 128k tokens so that it can be run on GPT-4o mini for very little money.
24b3e5d to 6835075
This is a script that creates an LLM prompt that includes all the key parts of the docs.
i.e. all the most important parts of the docs get put in context, and then the user asks a question.
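As a rough illustration of the idea (not the script in this PR), the sketch below concatenates Markdown files from a docs folder into one context block and appends the user's question. The function name `build_prompt`, the `patterns` argument, and the instruction wording are all hypothetical.

```python
from pathlib import Path


def build_prompt(docs_dir, question, patterns=("*.md",)):
    """Concatenate documentation files into one prompt, then append the question.

    Hypothetical helper: the file selection and prompt wording are assumptions,
    not the behaviour of the script in this PR.
    """
    root = Path(docs_dir)
    sections = []
    for pattern in patterns:
        # Sort so the prompt is deterministic across runs.
        for path in sorted(root.rglob(pattern)):
            text = path.read_text(encoding="utf-8").strip()
            if text:  # skip empty files
                header = path.relative_to(root).as_posix()
                sections.append(f"## {header}\n\n{text}")
    context = "\n\n".join(sections)
    return (
        "Use the documentation below to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
    )
```

Labelling each section with its relative path gives the model a hint about where each piece of documentation came from.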
At the moment, the prompt is about 60,000 tokens, so a single prompt with Anthropic Sonnet 3.5 costs about $0.20 (20 cents).
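The arithmetic behind that estimate can be sketched as below. The per-million-token input prices ($3.00 for the larger model, $0.25 for a cheaper one) are assumptions for illustration and should be checked against current provider pricing; output tokens are ignored, which is why the input-only figure comes out slightly under the rounded estimate.

```python
def input_cost_usd(prompt_tokens, usd_per_million_tokens):
    """Input-side cost of one request; output tokens are not included."""
    return prompt_tokens * usd_per_million_tokens / 1_000_000


PROMPT_TOKENS = 60_000  # approximate prompt size quoted above

# Assumed prices in USD per million input tokens (illustrative only).
print(round(input_cost_usd(PROMPT_TOKENS, 3.00), 3))  # 0.18
print(round(input_cost_usd(PROMPT_TOKENS, 0.25), 3))  # 0.015
```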
It seems to work pretty well. Consider the following prompt:
And the output:
The same prompt with the Haiku model, costing $0.02, gives similarly good results.
Things to do:
.md format to the prompt, e.g. the part that discusses blocking rules and efficiency