This prototype explores how Large Language Models (LLMs) can enhance research workflows through intelligent analysis of large volumes of qualitative and quantitative data. By identifying patterns, scoring responses against user-defined metrics, and surfacing nuanced insights, it supports more efficient research decision-making, makes complex datasets more accessible, and fosters deeper understanding of research content through personalized, adaptive AI assistance.
| Index | Description |
|---|---|
| High Level Architecture | High level overview illustrating component interactions |
| Deployment | How to deploy the project |
| User Guide | How to use the working solution |
| Directories | General project directory structure |
| API Documentation | Documentation on the API the project uses |
| Credits | Meet the team behind the solution |
| License | License details |
The following architecture diagram illustrates the various AWS components utilized to deliver the solution. For an in-depth explanation of the frontend and backend stacks, please look at the Architecture Guide.
To deploy this solution, please follow the steps laid out in the Deployment Guide.
Please refer to the Web App User Guide for instructions on navigating the web app interface.
├── cdk/
│ ├── bin/
│ ├── lambda/
│ ├── layers/
│ ├── stacks/
│ └── OpenAPI_Swagger_Definition.yaml
├── docs/
│ ├── userGuide.md
│ ├── deploymentGuide.md
│ ├── architectureDeepDive.md
│ ├── securityGuide.md
│ ├── Experimentation_Guide.md
│ ├── data_ingestion.md
│ ├── bedrock_guardrails.md
│ ├── dependencyManagement.md
│ ├── modificationGuide.md
│ ├── scoring.md
│ ├── troubleshootingGuide.md
│ ├── api-documentation.pdf
│ ├── data_ingestion/
│ │ ├── helpers/
│ │ │ ├── helper.md
│ │ │ └── vectorstore.md
│ │ └── processing/
│ │ └── documents.md
│ └── media/
├── frontend/
│ ├── public/
│ └── src/
│ ├── app/
│ └── components/
├── Notebooks/
│ ├── LLM_scoring.ipynb
│ └── RAG_model.ipynb
- `/cdk`: Contains the deployment code for the app's AWS infrastructure
  - `/bin`: Contains the instantiation of the CDK stacks
  - `/lambda`: Contains the Lambda functions for data ingestion, scoring, and other core functionalities
  - `/layers`: Contains the required layers for the Lambda functions
  - `/stacks`: Contains the deployment code for all infrastructure stacks
  - `OpenAPI_Swagger_Definition.yaml`: API specification for the research data insights service
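The Lambda functions above sit behind the API described in `OpenAPI_Swagger_Definition.yaml`. As a rough illustration only (not the project's actual code), a scoring handler invoked through an API Gateway proxy integration might look like the following; the event shape and field names here are assumptions:

```python
import json

def lambda_handler(event, context):
    """Hypothetical sketch of a scoring Lambda handler.

    Assumes an API Gateway proxy event whose body carries a response
    to score and a list of user-defined metrics.
    """
    body = json.loads(event.get("body") or "{}")
    response_text = body.get("response", "")
    metrics = body.get("metrics", ["relevance", "clarity"])

    # Placeholder scoring: a real implementation would call an LLM
    # (e.g. via Amazon Bedrock) to score the response per metric.
    scores = {metric: 0.0 for metric in metrics}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"scores": scores}),
    }
```

See the API Documentation for the service's actual request and response shapes.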
- `/docs`: Contains comprehensive documentation for the application, including:
  - User guides, deployment instructions, and architecture details
  - Security and troubleshooting guides
  - Bedrock guardrails and dependency management documentation
  - Modification guides and scoring methodology
  - Detailed data ingestion documentation with helper utilities and processing guides
- `/frontend`: Contains the user interface of the research data insights application
- `/Notebooks`: Contains Jupyter notebooks for experimentation with LLM scoring and RAG (Retrieval-Augmented Generation) models
To learn about the API the project uses, see the API Documentation.
For information on how to experiment with the LLM scoring and RAG models using the provided Jupyter notebooks, see the Experimentation Guide.
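To give a flavor of what prompt-based LLM scoring involves, here is an illustrative sketch in the spirit of the `LLM_scoring.ipynb` notebook; the prompt wording, scoring scale, and parsing logic are assumptions, not the notebook's actual implementation:

```python
import re

def build_scoring_prompt(response_text, metric, scale=(1, 5)):
    """Build a prompt asking an LLM to rate a response on one metric."""
    low, high = scale
    return (
        f"Rate the following response on '{metric}' "
        f"from {low} (worst) to {high} (best). "
        f"Reply with a single integer.\n\nResponse:\n{response_text}"
    )

def parse_score(llm_output, scale=(1, 5)):
    """Extract the first integer in the model's reply, clamped to the scale."""
    match = re.search(r"-?\d+", llm_output)
    if not match:
        raise ValueError("no score found in LLM output")
    low, high = scale
    return max(low, min(high, int(match.group())))
```

Clamping the parsed value guards against a model replying with a number outside the requested scale. Refer to the Experimentation Guide for how scoring is actually performed in this project.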
Details about the data ingestion process and how to work with research datasets can be found in the Data Ingestion Guide.
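Ingestion pipelines of this kind typically split documents into overlapping chunks before embedding them into a vector store. As a minimal sketch under that assumption (chunk sizes and the character-based strategy are illustrative, not the project's actual parameters):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character chunks for embedding.

    Overlap preserves context that would otherwise be cut at
    chunk boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

The Data Ingestion Guide documents the actual chunking and vector store setup used by the project.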
Security considerations and best practices for deploying and using the research data insights platform are outlined in the Security Guide.
- Bedrock Guardrails: Configuration and management of AWS Bedrock guardrails for AI safety
- Dependency Management: Managing project dependencies and version control
- Modification Guide: Guidelines for modifying and extending the application
- Troubleshooting Guide: Common issues and their solutions
- Scoring Documentation: Detailed information about the LLM scoring system and methodologies
- Data Processing Helpers: Utility functions and helper documentation for data processing
- Document Processing: Document processing workflows and procedures
This application was architected and developed by Harsh Amin, Rohit Murali, and Harleen Chahal. Thanks to the UBC Cloud Innovation Centre Technical and Project Management teams for their guidance and support.
This project is distributed under the MIT License.
Licenses of libraries and tools used by the system are listed below:
PostgreSQL License
- For PostgreSQL and pgvector
- "a liberal Open Source license, similar to the BSD or MIT licenses."

LLaMa 3 Community License Agreement
- For Llama 3 70B Instruct model
