
Financial Decision-Making Chain Dataset Construction Framework


Project Overview

This project designs and implements an innovative Financial Decision-Making Chain Dataset Construction Framework that integrates multi-agent systems, reinforcement learning (RL), and large language models (LLMs) to achieve efficient, unsupervised generation of decision-making chain datasets. The framework aims to enhance the diversity, depth, and structure of financial decision-making processes, providing a high-quality data foundation for financial analysis and decision support.

Key Features

A. Diverse Agent Ecosystem

  • Design and Train Diverse Agents: Developed and trained over 400 intelligent Agents with varying styles and objectives, each with its own decision-making logic and behavior patterns.

  • Generate Multi-Angle Decision Logs: Through multi-Agent collaboration, generated multi-level decision logs covering a range of market scenarios and decision paths, ensuring comprehensive and diverse data.

  • Customized Agent Behavior: Tailored Agent behavior-strategy hyperparameters to specific financial decision-making requirements, adapting to different financial products and market conditions (see the configuration sketch after this list).
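
A minimal sketch of how such per-Agent behavior hyperparameters might be represented in Python. The AgentConfig fields and the sampling helper below are illustrative assumptions, not the repository's actual schema:

    from dataclasses import dataclass, field
    import random

    @dataclass
    class AgentConfig:
        """Hypothetical per-Agent behavior-strategy hyperparameters."""
        name: str
        risk_aversion: float   # 0.0 (aggressive) .. 1.0 (conservative)
        horizon_days: int      # decision horizon the Agent optimizes over
        style: str             # e.g. "momentum", "mean_reversion", "value"
        instruments: list = field(default_factory=lambda: ["equity"])

    def sample_population(n: int, seed: int = 0) -> list:
        """Randomly sample a diverse Agent population, one config per Agent."""
        rng = random.Random(seed)
        styles = ["momentum", "mean_reversion", "value", "macro"]
        return [
            AgentConfig(
                name=f"agent_{i:03d}",
                risk_aversion=rng.random(),
                horizon_days=rng.choice([1, 5, 20, 60]),
                style=rng.choice(styles),
            )
            for i in range(n)
        ]

    population = sample_population(400)  # the README cites an Agent pool of 400+

Sampling configurations rather than hand-writing each one is one plausible way to reach a population of this size while keeping styles and horizons diverse.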

B. RL-Driven Agent Selection and Combination Mechanism

  • Multi-Objective RL System: Developed an Agent selection and combination system based on multi-objective reinforcement learning, jointly optimizing goals such as decision effectiveness, diversity, and long-term return.

  • Reward Function Design: Designed reward functions that balance short-term and long-term returns, so the selection process achieves efficient decisions while preserving diversity.

  • Optimization Algorithm: Applied the Proximal Policy Optimization (PPO) algorithm to improve the accuracy and efficiency of Agent selection (a sketch of the selection loop follows this list).
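
A hedged sketch of how the PPO-driven selection loop could be wired together with Stable-Baselines3 (listed in the tech stack below). The toy environment, its observation design, and the 0.7/0.3 reward weights are illustrative assumptions, not the repository's actual setup:

    import numpy as np
    import gymnasium as gym
    from stable_baselines3 import PPO

    class AgentSelectionEnv(gym.Env):
        """Toy environment: at each step, select one of n_agents candidates.
        The reward blends a short-term return signal with a diversity bonus,
        mirroring the multi-objective design described above."""

        def __init__(self, n_agents: int = 400,
                     w_return: float = 0.7, w_diversity: float = 0.3):
            super().__init__()
            self.n_agents = n_agents
            self.w_return, self.w_diversity = w_return, w_diversity
            self.action_space = gym.spaces.Discrete(n_agents)
            # Observation: how often each Agent has been selected so far.
            self.observation_space = gym.spaces.Box(
                0.0, 1.0, shape=(n_agents,), dtype=np.float32)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.counts = np.zeros(self.n_agents, dtype=np.float32)
            self.t = 0
            return self.counts.copy(), {}

        def step(self, action):
            self.t += 1
            self.counts[action] += 1
            # Stand-in for the selected Agent's realized short-term quality.
            short_term = float(self.np_random.normal(loc=0.1, scale=1.0))
            # Diversity bonus: penalize picking the same Agent over and over.
            diversity = 1.0 - self.counts[action] / self.t
            reward = self.w_return * short_term + self.w_diversity * diversity
            obs = (self.counts / self.t).astype(np.float32)
            return obs, reward, self.t >= 64, False, {}

    model = PPO("MlpPolicy", AgentSelectionEnv(), verbose=0)
    model.learn(total_timesteps=10_000)

In a real pipeline the short-term term would come from backtested Agent performance rather than random noise; the weighted sum is one common way to scalarize a multi-objective reward.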

C. LLM and RL Integrated Decision-Making Chain Generator

  • Integration of Large Language Models: Integrated advanced LLMs such as GPT-4 and o1 into the decision-making chain generator to strengthen natural language understanding and generation.

  • Data Template Design: Created structured data templates so that generated decision-making chains share a consistent format and high-quality content (an illustrative template follows this list).

  • RL Path Optimization: Used reinforcement learning to optimize Agent recommendation paths, ensuring the generated chains are both logically sound and novel.

  • Collaborative Generation Mechanism: Coordinated the LLM and RL modules through an intermediate-layer interface so the two workflows run seamlessly, improving the efficiency and quality of chain generation.
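
An illustrative version of such a structured template, using plain dataclasses; every field name here is a hypothetical stand-in for the project's real schema:

    import json
    from dataclasses import dataclass, asdict, field

    @dataclass
    class ChainStep:
        agent: str         # which Agent produced this step
        observation: str   # the market evidence the step relies on
        reasoning: str     # the step's natural-language rationale
        action: str        # e.g. "hedge", "hold", "increase_position"

    @dataclass
    class DecisionChain:
        scenario: str                   # e.g. "risk_assessment" or "market_analysis"
        steps: list = field(default_factory=list)
        final_decision: str = ""
        consistency_score: float = 0.0  # filled in by the quality-control stage

    chain = DecisionChain(
        scenario="risk_assessment",
        steps=[ChainStep("agent_017", "Implied volatility rising for three sessions",
                         "Elevated volatility argues for reducing leverage.", "hedge")],
        final_decision="reduce_exposure",
    )
    print(json.dumps(asdict(chain), indent=2))  # the serialized, consistent format

Serializing every chain through one schema is what keeps the generated dataset uniform enough for downstream fine-tuning.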

D. Dual-Track Model Optimization Strategy

  • Supervised Fine-Tuning (SFT): Fine-tuned the LLM on labeled data, improving its accuracy and reliability on financial decision-making reasoning.

  • Proximal Policy Optimization (PPO): Applied PPO to optimize Agent decision strategies, improving performance in complex decision-making scenarios.

  • Collaborative Training Process: Designed a phased training process in which SFT and PPO complement each other, jointly strengthening the LLM's decision-chain reasoning (outlined in the skeleton after this list).

  • Performance Evaluation: Assessed the optimization strategies with quantitative metrics (e.g., decision accuracy, chain consistency), verifying a significant improvement in model performance.
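
A minimal skeleton of that phased schedule. The sft_phase, ppo_phase, evaluate, and save_checkpoint helpers are hypothetical stand-ins for the repository's real training entry points:

    # Hypothetical stand-ins for the actual training routines.
    def sft_phase(model, labeled_chains): ...
    def ppo_phase(model, env): ...
    def evaluate(model) -> float: return 0.0
    def save_checkpoint(model): ...

    def train_dual_track(model, labeled_chains, env, n_rounds: int = 3):
        """Alternate SFT and PPO phases; keep the best-scoring checkpoint."""
        best = evaluate(model)                # e.g. decision accuracy + chain consistency
        for _ in range(n_rounds):
            sft_phase(model, labeled_chains)  # first anchor the policy on labeled chains
            ppo_phase(model, env)             # then optimize the strategy against the reward
            score = evaluate(model)
            if score > best:                  # simple model-selection gate between phases
                best = score
                save_checkpoint(model)
        return model

Running SFT before PPO in each round is one way to keep the RL phase from drifting away from well-formed reasoning, which matches the complementary framing above.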

E. Unsupervised Rapid Expansion of a High-Quality Decision-Making Chain Dataset

  • Automated Data Generation: Built a fully automated pipeline that expands the decision-making chain dataset efficiently, with no manual intervention.

  • Quality Control: Introduced multi-level filtering and validation so that generated chains are high-quality and relevant, discarding low-quality or unrelated data (see the filter sketch after this list).

  • Dataset Scale and Coverage: Over 10,000 decision-making chain entries have been constructed so far, covering scenarios such as risk assessment and market analysis.
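
A compact sketch of such a multi-level gate, reusing the hypothetical DecisionChain schema from section C; both predicates, and the 0.8 threshold, are illustrative assumptions:

    def has_required_fields(chain) -> bool:
        """Level 1: structural filter - reject chains missing steps or a decision."""
        return bool(chain.steps) and bool(chain.final_decision)

    def is_consistent(chain, threshold: float = 0.8) -> bool:
        """Level 2: semantic filter - reject chains scored as internally inconsistent."""
        return chain.consistency_score >= threshold

    FILTERS = [has_required_fields, is_consistent]

    def quality_gate(chains):
        """Keep only chains that pass every filter level, in order."""
        return [c for c in chains if all(f(c) for f in FILTERS)]

A real pipeline would add further levels (e.g., relevance checks against the target scenario), but the pass-every-level structure stays the same.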

F. Future Plans

  • Reinforcement Learning as Memory and Planning for LLM Agents: Plan to apply RL techniques to the memory and planning functions of LLM Agents, enabling intelligent generation of marketing strategies and copy.

  • Automated Marketing Strategy Generation: Automatically generate customized marketing strategies and copy for different market demands by combining multi-source materials, improving marketing efficiency and effectiveness.

Tech Stack and Tools

  • Programming Languages: Python, Java
  • Frameworks and Libraries: PyTorch, OpenAI GPT-4, Stable-Baselines3
  • Tools and Platforms: GitHub, Docker, Azure (cloud computing)
  • Methods and Technologies: Multi-Agent system design, Reinforcement Learning (PPO), Supervised Fine-Tuning (SFT), data template design, unsupervised learning

Installation and Usage

Prerequisites

  • Python 3.8+
  • Docker (optional, for containerized deployment)
  • Cloud computing resources (e.g., AWS or GCP)

Installation Steps

  1. Clone the repository:

    git clone https://github.com/wlvh/RBT.git
    cd RBT/RL
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # Linux/macOS
    venv\Scripts\activate  # Windows
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the project:

    python main.py

Usage Examples

For detailed usage instructions and example code, please refer to the Wiki or the Example Tutorials.

Achievements and Outcomes

  • Successfully generated and expanded a high-quality financial decision-making chain dataset, significantly improving the performance of financial decision-support systems.

  • Improved the accuracy and consistency of LLMs on complex financial decision-making reasoning tasks through the dual-track model optimization strategy.

  • Implemented an unsupervised data-expansion method, greatly reducing the time and labor cost of data preparation.

Contributions

Contributions are welcome! Please read the Contributing Guide to learn how to participate.

License

This project is licensed under the MIT License.

Contact

For any questions or suggestions, please contact us through Issues.

