This project aims to build a comprehensive automation to identify real estate properties with high investment potential. The process involves automated data extraction, processing, analysis, and generating a final output with the most relevant properties.
To run the main dashboard application, execute the following command from the project root directory:
python run_dashboard.pyThis script handles all necessary setup, including configuring the Python path, and launches the Streamlit interactive dashboard.
The complete automation flow consists of the following steps:
Automates the login to a real estate portal and downloads the property inventory using Selenium WebDriver.
Script: src/data_collection/download_inventory.py
Downloads individual property PDFs directly via HTTP requests (no WebDriver needed for this step).
Script: src/data_collection/download_pdf.py
Cleans, transforms, and loads property data into the PostgreSQL database. This logic is now modularized for clarity and maintenance.
Modules:
src/data_processing/data_cleaner.pysrc/data_processing/excel_converter.pysrc/data_access/property_repository.py
An interactive Streamlit dashboard to filter and visualize properties based on various criteria.
Main App: src/visualization/dashboard_app.py
For a detailed discussion on the project's future roadmap, please refer to docs/next_steps_project_roadmap.md.
graph TD
subgraph User Interaction
User[Usuario] --> Runner(run_dashboard.py)
end
subgraph Application Execution
Runner -- Sets PYTHONPATH & Executes --> Streamlit(Streamlit Engine)
Streamlit -- Runs --> DashboardApp(visualization/dashboard_app.py)
end
subgraph Core Modules
DashboardApp -- Uses --> Logic(visualization/dashboard_logic.py)
DashboardApp -- Uses --> UI(visualization/ui_components.py)
DashboardApp -- Interacts with --> Repo(data_access/property_repository.py)
Repo -- Manages --> DB[(PostgreSQL Database)]
end
subgraph Data Flow
RawExcel[Archivos Excel Crudos] --> Cleaner(data_processing/data_cleaner.py)
Cleaner -- Uses --> Converter(data_processing/excel_converter.py)
Cleaner -- Uses --> Validator(data_processing/data_validator.py)
Cleaner -- Loads Data --> Repo
end
subgraph Configuration
EnvFile[.env] -- Provides Credentials --> Repo
EnvFile -- Provides Settings --> DashboardApp
end
style Runner fill:#cde4f7,stroke:#333,stroke-width:2px
style Streamlit fill:#ff4b4b,stroke:#333,stroke-width:2px
graph TD
subgraph Execution Flow
A[Usuario] --> B(run_dashboard.py)
B -- Launches --> C(streamlit run src/visualization/dashboard_app.py)
end
subgraph Data Initialization
D[Agente] --> E(src/data_collection/download_inventory.py)
E -- Downloads --> F{Archivos Excel Crudos}
F --> G(src/data_processing/data_cleaner.py)
G -- Cleans & Transforms --> H(data_access/property_repository.py)
H -- Loads Data --> I[(PostgreSQL Database)]
end
subgraph Dashboard Interaction
C -- Displays Interactive --> J[Dashboard UI]
J -- Filters Data via --> H
I -- Credentials via --> K[.env file]
end
subgraph Setup
L[src/db_setup/create_db_table.py] -- Creates Table --> I
end
graph TD
subgraph Current Foundation
A[(PostgreSQL Database)]
B[Modular Codebase]
end
subgraph Next Development Steps
A --> C{Step 1: Unit & Integration Testing}
B --> C
C -- Ensures Reliability --> D{Step 2: Advanced Visualizations}
D -- e.g., Maps, Charts --> E{Step 3: PDF & Image Analysis}
E -- Extracts Data --> F{Step 4: Reporting Engine}
F -- Generates --> G[PDF/CSV Reports]
G --> H[User/Agent]
end
subgraph Details
C -- Validates --> Modules[data_cleaner, property_repository, etc.]
D -- Enhances --> Dashboard[Dashboard UI]
E -- Processes --> Files[Property Brochures, Images]
end
Install the required Python libraries by running:
pip install -r src/data_collection/requirements.txtEnsure chromedriver.exe (for Google Chrome) matches your browser version and is placed in the src/data_collection/ directory. This is required for the initial inventory download.
Create a .env file in the project root with your PostgreSQL credentials. Refer to .env.example for the required variables.
To set up the database table, run:
python src/db_setup/create_db_table.pyTo execute all unit tests for the project, navigate to the project root directory in your terminal and run:
python -m pytest tests/To run a specific test file, you can specify the path to the test file:
python -m pytest tests/test_dashboard_logic.pyFor a detailed roadmap of existing and pending tests, refer to docs/testing_roadmap.md.
- Environment Variables: Ensure all required variables are set in your
.envfile. - WebDriver: Keep
chromedriver.exeupdated and correctly placed. - Web Portal Changes: The real estate portal's structure may change, requiring script adjustments.
Follow these steps to get started with the data processing workflow:
pip install -r requirements.txtpython src/data_processing/data_validator.py- If
reports/missing_critical.csvexists, runpython src/scripts/pdf_autofill.py - If
manual_fixes.csvis created, fill it and runpython src/scripts/apply_manual_fixes.py python -m pytest tests/(all tests must pass)- Commit & push referencing the roadmap milestone.