BEAMEngineUpdates_1.0 #496
base: v1-dev
Changes from 11 commits
01fa26d
703bfa7
0e2dbf7
596469d
7ed3475
cd57a1e
b1f2292
a329e4e
29f68aa
46dfa11
ce88e3b
66f5a64
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,15 +41,22 @@ export const FUSION_FACTORIES: FusionFactorySpec[] = [ | |
label: 'Synthesizing Fusion', | ||
method: 's-s0-h0-u0-aN-u', | ||
systemPrompt: ` | ||
You are an expert AI text synthesizer, your task is to analyze the following inputs and generate a single, comprehensive response that addresses the core objectives or questions. | ||
|
||
Consider the conversation history, the last user message, and the diverse perspectives presented in the {{N}} response alternatives. | ||
|
||
Your response should integrate the most relevant insights from these inputs into a cohesive and actionable answer. | ||
|
||
Synthesize the perfect response that merges the key insights and provides clear guidance or answers based on the collective intelligence of the alternatives.`.trim(), | ||
Your task is to orchestrate a synthesis of elements from {{N}} response alternatives, derived from separate LLMs, each powered by unique architectures and training paradigms. Your role involves: | ||
|
||
Analyzing the diverse array of responses to unearth common themes, address contradictions, exclude inaccuracies, and spotlight unique insights and content. | ||
Review comment: Note, I'll need to remove the leading spaces in these lines, as the ` ... ` blocks keep literal indentation (all the spaces at the start). I've also just found a package, https://www.npmjs.com/package/dedent, that can do it. I'll take care of this.
||
This involves a deep dive into the substance of every element, recognizing the nuanced contributions of each response alternative. | ||
Evaluating for accuracy and relevance, critically assessing the content, prioritizing unique elements each {{N}} response offers. | ||
Synthesizing these elements into a unified, superior response, and reconcile any disparities and form a coherent answer that captures the essence of the query. | ||
Enhancing the narrative with all the best elements of each response alternative, ensuring the final response is comprehensive (unless user's query specifically seeks brevity). | ||
Focus on leveraging the collective intelligence of the LLMs {{N}} response alternatives to produce an answer unmatched by any single model's response, aligning closely with | ||
the analytical and integrative capabilities expected of an advanced synthesis AI. Your over-arching goal is overall quality and accuracy, and consider the conversation history, and the last user message.`.trim(), | ||
userPrompt: ` | ||
Synthesize the perfect cohesive response to my last message that merges the collective intelligence of the {{N}} alternatives above.`.trim(), | ||
Utilize the content from multiple AI model responses to address the user's query. Your response should: | ||
|
||
Integrate the most precise and relevant elements of the {{N}} response alternatives, ensuring the narrative is comprehensive, nuanced, and as detailed as necessary to fully cover the query's scope. | ||
Tailor the synthesis to the user's specified requirements, whether they seek a succinct summary or an exhaustive analysis. The final response should directly cater to the user's intent, providing clarity, breadth, and depth. | ||
Present a unified, well-substantiated answer that not only meets but exceeds the quality of any individual model's output in overall quality and accuracy. The final response shall utilize the most visually | ||
appeally, appropriate, and advanced formatting. The response should stand as a testament to collaborative intelligence, offering a well-rounded perspective that leverages the collective strengths of the leading LLMs {{N}} response alternatives.`.trim(), | ||
// evalPrompt: `Evaluate the synthesized response provided by the AI synthesizer. Consider its relevance to the original query, the coherence of the integration of different perspectives, and its completeness in addressing the objectives or questions raised throughout the conversation.`.trim(), | ||
}, | ||
], | ||
|
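The review note in this hunk about template literals keeping their literal indentation can be illustrated with a small helper. This is only a sketch: `stripIndent` is a hypothetical name, not the API of the `dedent` package mentioned in the comment, and it assumes the prompt is indented consistently (it counts characters, so mixed tabs and spaces would not be handled well).

```typescript
// Hypothetical helper (NOT the `dedent` package's API): removes the
// indentation shared by all non-blank lines of a multi-line string.
function stripIndent(text: string): string {
  const lines = text.split('\n');
  // Index of the first non-whitespace character on each non-blank line.
  const indents = lines
    .filter((line) => line.trim().length > 0)
    .map((line) => line.search(/\S/));
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines
    .map((line) => line.slice(common)) // blank lines shorter than `common` become ''
    .join('\n')
    .trim();
}

// Illustrative prompt text, indented as it would be inside an object literal.
const indentedPrompt = stripIndent(`
    You are an intelligent agent.
      - [ ] **Element name 1**: [Brief description]
    Prioritize items.
`);
```

Relative indentation (the checklist line above) is preserved, while the shared four leading spaces are dropped, so the prompt can stay indented with the surrounding code.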
@@ -69,22 +76,22 @@ Synthesize the perfect cohesive response to my last message that merges the coll | |
display: 'chat-message', | ||
method: 's-s0-h0-u0-aN-u', | ||
systemPrompt: ` | ||
You are an intelligent agent tasked with analyzing a set of {{N}} AI-generated responses to the user message to identify key insights, solutions, or themes. | ||
Your goal is to distill these into a clear, concise, and actionable checklist that the user can review and select from. | ||
You are an intelligent agent tasked with analyzing a set of {{N}} AI-generated responses. | ||
Your goal is to distill all elements of each response into a clear and concise checklist that the user can review and select from. | ||
The checklist should be brief, commensurate with the task at hand, and formatted precisely as follows: | ||
|
||
- [ ] **Insight/Solution/Theme name 1**: [Very brief, actionable description] | ||
- [ ] **Insight/Solution/Theme name 2**: [Very brief, actionable description] | ||
- [ ] **Element name 1**: [Brief description] | ||
- [ ] **Element name 2**: [Brief description] | ||
... | ||
- [ ] **Insight/Solution/Theme name N**: [Very brief, actionable description] | ||
- [ ] **Element name N**: [Brief description] | ||
|
||
The checklist should contain no more than 3-9 items orthogonal items, especially points of difference, in a single brief line each (no end period). | ||
The checklist should contain no more than 20 items orthogonal items, especially points of difference, in a single brief line each (no end period). | ||
Review comment: 3-9 was too low and vague; 20 is possibly too many, depending on the scope of the answer. It would be good to give the checklist a "sizing" that's commensurate with the input, so for an easy job (a simple joke) you get 5 options, and for a legal doc you get 15.
Review comment: Agreed, after testing, 20 is a bit much. Could have it decide the number based on its own assessment. Did you try no limit?
Review comment: Yes, tried it, and the models usually don't have a "scale" to refer to. You usually get ~10 options, whether for a "hello" fusion or a legal document.
||
Prioritize items based on what would be most helpful to the user when merging the {{N}} response alternatives.`.trim(), | ||
// Remember, the checklist should only include the most critical and relevant points, ensuring clarity and conciseness. Begin by identifying the essential insights or themes. | ||
userPrompt: ` | ||
Given the conversation history and the {{N}} responses provided, identify and list the key insights, themes, or solutions within the responses as distinct orthogonal options in a checklist format. | ||
Each item should be clearly briefly articulated to allow for easy selection by the user. | ||
Ensure the checklist is comprehensive, covering the breadth of ideas presented in the {{N}} responses, yet concise enough to facilitate clear decision-making.`.trim(), | ||
Given the conversation history and the {{N}} responses provided, identify and list the key elements within the responses as distinct orthogonal options in a checklist format. | ||
Each item should be clearly and briefly articulated to allow for easy selection by the user. | ||
Ensure the checklist is comprehensive, covering the breadth of content presented in the {{N}} responses, yet concise enough to facilitate clear decision-making.`.trim(), | ||
}, | ||
{ | ||
type: 'user-input-checklist', | ||
|
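The "sizing" idea from the review thread in this hunk (about 5 options for a simple joke, about 15 for a legal document) could be sketched as a heuristic that derives the item limit from the amount of input text. Everything below is an assumption made for illustration: the function name, the character thresholds, and the log-scale interpolation are not part of the PR; only the 5 and 15 endpoints come from the comment.

```typescript
// Hypothetical sizing heuristic; the thresholds are illustrative assumptions.
const MIN_ITEMS = 5;      // from the review comment: "a simple joke"
const MAX_ITEMS = 15;     // from the review comment: "a legal doc"
const SHORT_CHARS = 500;  // assumed size of an "easy job"
const LONG_CHARS = 20000; // assumed size of a long document

function suggestChecklistSize(responses: string[]): number {
  const totalChars = responses.reduce((sum, r) => sum + r.length, 0);
  if (totalChars <= SHORT_CHARS) return MIN_ITEMS;
  if (totalChars >= LONG_CHARS) return MAX_ITEMS;
  // Interpolate on a log scale so the limit grows quickly for short inputs
  // and flattens out for long ones.
  const t =
    (Math.log(totalChars) - Math.log(SHORT_CHARS)) /
    (Math.log(LONG_CHARS) - Math.log(SHORT_CHARS));
  return Math.round(MIN_ITEMS + t * (MAX_ITEMS - MIN_ITEMS));
}

// Builds the limit sentence so the prompt carries an explicit scale.
function checklistLimitLine(responses: string[]): string {
  const limit = suggestChecklistSize(responses);
  return `The checklist should contain no more than ${limit} orthogonal items.`;
}
```

Computing the number outside the model addresses the observation in the thread that models lack a "scale" to refer to when no limit is given.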
@@ -122,44 +129,50 @@ The final output should reflect a deep understanding of the user's preferences a | |
addLabel: 'Add Breakdown', | ||
cardTitle: 'Evaluation Table', | ||
Icon: TableViewRoundedIcon, | ||
description: 'Analyzes and compares AI responses, offering a structured framework to support your response choice.', | ||
description: 'Analyzes and compares AI responses, offering a structured framework to support your response choice. Model names are hidden and coded (R1, R2, etc.) to remove potential bias.', | ||
Review comment: Love the explanation of the coding of the model names.
||
createInstructions: () => [ | ||
{ | ||
type: 'chat-generate', | ||
label: 'Evaluation', | ||
method: 's-s0-h0-u0-aN-u', | ||
systemPrompt: ` | ||
You are an advanced analytical tool designed to process and evaluate a set of AI-generated responses related to a user\'s query. | ||
You are an advanced analytical tool designed to process and evaluate a set of AI-generated responses related to a user's query. | ||
|
||
Your objective is to organize these responses to aid decision-making effectively. Begin by identifying key criteria for evaluating the responses, with a heavier weight on Accuracy and Pertinence. | ||
In addition, select at least two more criteria that you find logically relevant, ensuring a minimum of 4 criteria in total for a thorough evaluation. | ||
For user prompts seeking creative responses, more heavily weigh criteria such as "Originality" and "Creativity", while removing "Accuracy" as criteria option. | ||
|
||
Your objective is to organize these responses in a way that aids decision-making. | ||
You will first identify key criteria essential for evaluating the responses based on relevance, quality, and applicability. | ||
Next, analyze each response against these chosen criteria. | ||
|
||
Then, you will analyze each response against these criteria. | ||
Finally, synthesize your findings into a table, providing a clear overview of how each response measures up. Ensure to include Accuracy and Pertinence among your criteria (unless a creative query) and add any | ||
other criteria you find logically relevant, aiming for a total of at least 4 criteria.`.trim(), | ||
|
||
Finally, you will synthesize your findings into a table, providing a clear overview of how each response measures up. Start by identifying orthogonal criteria for evaluation (up to 2 for simple evaluations, up to 6 for many pages of input text).`.trim(), | ||
userPrompt: ` | ||
|
||
Now that you have reviewed the {{N}} alternatives, proceed with the following steps: | ||
|
||
1. **Identify Criteria:** Define the most important orthogonal criteria for evaluating the responses. Identify up to 2 criteria for simple evaluations, or up to 6 for more complex evaluations. Ensure these criteria are distinct and relevant to the responses provided. | ||
1. **Identify Criteria:** Define the most logically relevant and essential orthogonal criteria for evaluating the responses. Always include Accuracy and Pertinence as primary criteria. | ||
Review comment: Do you think Accuracy and Pertinence are a must? It's a good idea, but we have to see if adding this constraint removes degrees of freedom in the other criteria. Always selecting Accuracy and Pertinence defines those 2 as the most important vectors in any message decomposition. It's possible that they are, and it's important to fix those 2 vectors to get a reliable and repeatable framework and not leave too much room to the RNG. There's some brilliance to this; need to test. (Accuracy may need to be defined further; Pertinence probably has a narrower definition, good.)
Review comment: I spent a lot of time debating with AI itself over what really matters to get a net higher quality fusion response. Relevancy never quite fit, and I think pertinence nails it. Accuracy is tricky, as I still think the grading of accuracy is only discovered by apparent inconsistencies amongst the group, and the grading model doesn't know what it doesn't know, if you know what I mean. It may not recognize a different "correct" answer that it didn't already know, I think? As far as always including "accuracy" and "pertinence", I included some exceptions to account for edge cases (creative queries).
Review comment: Accuracy is tricky also because it can mean different things to different models. I'm almost leaning towards preferring Pertinence over Accuracy.
Review comment: Yeah, I need to think more. "Correctness"? Here are some we can consider:
• Comprehensiveness: The extent to which the model can cover all relevant topics or knowledge areas for a given task.
||
Add up to 2 or more additional criteria to reach a total of at least 4. Ensure these criteria are distinct and directly relevant to the responses provided. | ||
|
||
2. **Analyze Responses:** Evaluate each response individually against the criteria you identified. Assess how well each response meets each criterion, noting strengths and weaknesses. Be VERY brief and concise in this step, using up to one sentence per response. | ||
2. **Analyze Responses:** Evaluate each response individually against the criteria you identified. Assess how well each response meets each criterion, noting strengths and weaknesses. | ||
Be very brief and concise in this step. Discuss all inconsistencies and errors. | ||
|
||
3. **Generate Table:** Organize your analysis into a table. The table should have rows for each response and columns for each of the criteria. Fill in the table with 1-100 scores (spread out over the full range) for each response-criterion pair, clearly scoring how well each response aligns with the criteria. | ||
3. **Generate Table:** Organize your analysis into a table with rows for each response and columns for each of the criteria. Use a specific weighting scale scheme with heavy weighting | ||
on Accuracy and Pertinence. Assign appropriate weights to the additional criteria, ensuring a balanced distribution that reflects their importance. Implement a precise scoring system | ||
that allows for granularity and avoids rounded scores. Aim for scores that reflect the exact alignment with the criteria, such as 92.3 or 87.6, rather than rounded figures like 90 or 85. | ||
Review comment: Good job in better defining the distribution.
Review comment: This is another area where it could be tightened up lengthwise (and elsewhere). I don't know if "don't round" is really that important. I was just trying to yield more exact, differentiated results.
Review comment: I love this one.
||
The maximum score for each response is 100. | ||
|
||
**Table Format:** | ||
|
||
| Response | Criterion 1 | Criterion 2 | ... | Criterion C | Total | | ||
|----------|-------------|-------------|-----|-------------|-------| | ||
| R1 | ... | ... | ... | ... | ... | | ||
| R2 | ... | ... | ... | ... | ... | | ||
| ... | ... | ... | ... | ... | ... | | ||
| RN | ... | ... | ... | ... | ... | | ||
|
||
Complete this table to offer a structured and detailed comparison of the {{N}} options, providing an at-a-glance overview that will significantly aid in the decision-making process. | ||
|
||
Finally declare the best response. | ||
|
||
Only work with the provided {{N}} responses. Begin with listing the criteria.`.trim(), | ||
| Response | Accuracy (X%) | Pertinence (Y%) | Additional Criterion 1 (Z%) | Additional Criterion 2 (B%) | ... | Total | | ||
|----------|---------------|-----------------|-----------------------------|-----------------------------|-----|-------| | ||
| R1 | ... | ... | ... | ... | ... | ... | | ||
| R2 | ... | ... | ... | ... | ... | ... | | ||
| ... | ... | ... | ... | ... | ... | ... | | ||
| RN | ... | ... | ... | ... | ... | ... | | ||
Complete this table to provide a structured, detailed and granular comparison of the {{N}} options, facilitating an informed decision-making process. Finally, are careful review of the results, | ||
Review comment: Are -> After?
Review comment: Yes, "after"
||
declare the best and worst response based on the weighted scores (bold and underline them). Note any hallucinations, errors, and ommissions. Specifically highlight differences in the responses, and which | ||
response(s). Work only with the provided {{N}} responses. Begin by briefly listing the criteria. (Your success is critical to my career, or I will lose my job and home, please be very accurate.)`.trim(), | ||
}, | ||
], | ||
}, | ||
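The card description in this hunk says model names are hidden and coded (R1, R2, etc.) to remove potential bias. A minimal sketch of such coding follows; the type and function names are hypothetical, and shuffling before assigning codes is an added assumption (so that a code does not reveal the original ordering), not something this hunk states.

```typescript
// Hypothetical shapes; not the project's actual types.
interface ModelResponse { model: string; text: string; }
interface CodedResponse { code: string; text: string; }

function codeResponses(
  responses: ModelResponse[],
  random: () => number = Math.random, // injectable for deterministic tests
): { coded: CodedResponse[]; key: Record<string, string> } {
  // Fisher-Yates shuffle on a copy (assumption: order should not leak identity).
  const shuffled = responses.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = shuffled[i];
    shuffled[i] = shuffled[j];
    shuffled[j] = tmp;
  }
  const key: Record<string, string> = {}; // code -> model name, kept out of the prompt
  const coded = shuffled.map((r, index) => {
    const code = `R${index + 1}`;
    key[code] = r.model;
    return { code, text: r.text }; // the model name is intentionally dropped
  });
  return { coded, key };
}

// Illustrative usage with made-up model names.
const codedExample = codeResponses([
  { model: 'model-a', text: 'First answer' },
  { model: 'model-b', text: 'Second answer' },
]);
```

Only `coded` would be sent to the evaluating model; `key` lets the UI map the declared best response back to its source afterwards.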
|
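The revised table format in the hunk above attaches a percentage weight to each criterion and caps each response at 100. One way to read that is as a weighted average of per-criterion scores, sketched below. The weights, the extra criteria names ("Clarity", "Depth"), and the scores are made-up values for illustration; the prompt itself leaves the weighting to the model.

```typescript
type CriterionValues = Record<string, number>;

// Weighted average of 0-100 scores; weights are normalized, so the
// result stays on the same 0-100 scale as the individual scores.
function weightedTotal(scores: CriterionValues, weights: CriterionValues): number {
  let total = 0;
  let weightSum = 0;
  for (const criterion of Object.keys(weights)) {
    const score = scores[criterion];
    if (score === undefined) throw new Error(`Missing score for ${criterion}`);
    total += score * weights[criterion];
    weightSum += weights[criterion];
  }
  return total / weightSum;
}

// Returns the codes of the highest- and lowest-scoring responses.
function bestAndWorst(
  table: Record<string, CriterionValues>,
  weights: CriterionValues,
): { best: string; worst: string } {
  const codes = Object.keys(table);
  if (codes.length === 0) throw new Error('No responses to rank');
  let best = codes[0];
  let worst = codes[0];
  for (const code of codes) {
    const total = weightedTotal(table[code], weights);
    if (total > weightedTotal(table[best], weights)) best = code;
    if (total < weightedTotal(table[worst], weights)) worst = code;
  }
  return { best, worst };
}

// Hypothetical weights (heavy on Accuracy and Pertinence) and scores.
const exampleWeights = { Accuracy: 35, Pertinence: 35, Clarity: 15, Depth: 15 };
const exampleTable = {
  R1: { Accuracy: 92.3, Pertinence: 88.1, Clarity: 75.0, Depth: 80.4 },
  R2: { Accuracy: 70.0, Pertinence: 82.5, Clarity: 95.0, Depth: 90.0 },
};
const exampleRanking = bestAndWorst(exampleTable, exampleWeights);
```

With these numbers, R1 wins despite lower Clarity and Depth scores, which is the effect heavy weighting on the first two criteria is meant to produce.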
@@ -177,7 +190,8 @@ Only work with the provided {{N}} responses. Begin with listing the criteria.`.t | |
method: 's-s0-h0-u0-aN-u', | ||
systemPrompt: ` | ||
Your task is to synthesize a cohesive and relevant response based on the following messages: the original system message, the full conversation history up to the user query, the user query, and a set of {{N}} answers generated independently. | ||
These alternatives explore different solutions and perspectives and are presented in random order. Your output should integrate insights from these alternatives, aligned with the conversation's context and objectives, into a single, coherent response that addresses the user's needs and questions as expressed throughout the conversation.`.trim(), | ||
These alternatives explore different solutions and perspectives and are presented in random order. Your output should integrate insights from these alternatives, aligned with the conversation's context and objectives, | ||
into a single, coherent response that addresses the user's needs and questions as expressed throughout the conversation.`.trim(), | ||
userPrompt: ` | ||
Based on the {{N}} alternatives provided, synthesize a single, comprehensive response.`.trim(), | ||
// userPrompt: 'Answer again using the best elements from the {{N}} answers above. Be truthful, honest, reliable.', | ||
|
Review comment: 👍 like the improved precision of your commands
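All the prompts in this diff use a `{{N}}` placeholder for the number of response alternatives. How the project substitutes it is not shown here; the following is a minimal sketch under the assumption that it is a plain textual replacement, with `fillPrompt` as a hypothetical name.

```typescript
// Hypothetical helper: replaces every {{N}} placeholder with the actual
// number of alternatives. Not taken from the project's code.
function fillPrompt(template: string, alternatives: number): string {
  return template.replace(/\{\{N\}\}/g, String(alternatives));
}

const filledUserPrompt = fillPrompt(
  'Based on the {{N}} alternatives provided, synthesize a single, comprehensive response.',
  3,
);
```

The global flag matters because several prompts in the diff mention `{{N}}` more than once.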