Automated Real Estate Property Insights

This project automates the identification of real estate properties with high investment potential. The pipeline covers data extraction, processing, and analysis, and produces a final output listing the most relevant properties.

🚀 How to Run the Application

To run the main dashboard application, execute the following command from the project root directory:

python run_dashboard.py

This script handles all necessary setup, including configuring the Python path, and launches the Streamlit interactive dashboard.
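The launcher's job can be sketched as follows. This is a hypothetical reconstruction, not the contents of run_dashboard.py: it assumes the app's modules are imported relative to src/ (so src/ is prepended to PYTHONPATH) and that Streamlit is installed in the current interpreter, which lets it be started with `python -m streamlit run`.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_launch_command(project_root: Path) -> tuple[list[str], dict[str, str]]:
    """Return the command and environment used to start the Streamlit app."""
    src_dir = project_root / "src"
    env = os.environ.copy()
    # Assumption: modules are imported relative to src/, so prepend it to PYTHONPATH.
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = str(src_dir) + (os.pathsep + existing if existing else "")
    app = src_dir / "visualization" / "dashboard_app.py"
    # Using sys.executable ensures the Streamlit of the active environment is used.
    cmd = [sys.executable, "-m", "streamlit", "run", str(app)]
    return cmd, env


def launch(project_root: Path) -> int:
    """Start Streamlit, wait for it to exit, and return its exit code."""
    cmd, env = build_launch_command(project_root)
    return subprocess.call(cmd, env=env, cwd=project_root)
```

A script would typically call `launch(Path(__file__).resolve().parent)` under an `if __name__ == "__main__":` guard. Keeping command construction separate from execution makes it testable without starting a server.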

🎯 Project Workflow

The complete automation flow consists of the following steps:

Step 1: Inventory Data Collection

Automates the login to a real estate portal and downloads the property inventory using Selenium WebDriver.

Script: src/data_collection/download_inventory.py
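Two helpers that a Selenium-based download step commonly needs are sketched below; they are illustrative and not taken from download_inventory.py. The first builds Chrome preferences so the inventory file is saved to a known folder without a dialog; the second waits until Chrome has finished writing it. The `*.xls*` default pattern assumes the inventory is an Excel file, as the diagrams suggest.

```python
import time
from pathlib import Path


def chrome_download_prefs(download_dir: Path) -> dict:
    """Chrome preferences that save downloads to download_dir without a dialog."""
    return {
        "download.default_directory": str(download_dir.resolve()),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }


def wait_for_download(download_dir: Path, pattern: str = "*.xls*",
                      timeout: float = 60.0, poll: float = 0.5) -> Path:
    """Poll download_dir until a finished file matching pattern exists.

    Chrome writes partial downloads as '<name>.crdownload', so this waits until
    none remain and returns the most recently modified completed match.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        in_progress = list(download_dir.glob("*.crdownload"))
        # '*.xls*' would also match 'x.xlsx.crdownload', so exclude partial files.
        done = [p for p in download_dir.glob(pattern)
                if p.is_file() and p.suffix != ".crdownload"]
        if done and not in_progress:
            return max(done, key=lambda p: p.stat().st_mtime)
        time.sleep(poll)
    raise TimeoutError(f"no completed download matching {pattern!r} in {download_dir}")
```

With Selenium, the preferences would be passed through `options = webdriver.ChromeOptions()` and `options.add_experimental_option("prefs", chrome_download_prefs(download_dir))`. `wait_for_download` would then be called after triggering the portal's export action.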

Step 1.1: PDF Download

Downloads individual property PDFs directly via HTTP requests (no WebDriver needed for this step).

Script: src/data_collection/download_pdf.py
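A minimal sketch of a WebDriver-free PDF download is shown below. It uses the standard library's `urllib.request` so it runs without extra dependencies; download_pdf.py may well use a different HTTP client. The User-Agent string is a placeholder. The `%PDF-` check guards against portals that answer with an HTML error or login page instead of the file.

```python
import urllib.request
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # every PDF file starts with this header


def save_pdf(content: bytes, dest: Path) -> Path:
    """Write content to dest only if it looks like a PDF."""
    if not content.startswith(PDF_MAGIC):
        # A login or error page returned with HTTP 200 would otherwise be saved as a .pdf.
        raise ValueError(f"response for {dest.name} is not a PDF")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(content)
    return dest


def download_pdf(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Fetch url over HTTP(S) and save it to dest after validating the header."""
    # Hypothetical User-Agent; some portals reject requests that omit one.
    req = urllib.request.Request(url, headers={"User-Agent": "real-estate-insights/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return save_pdf(resp.read(), dest)
```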

Step 2: Data Cleaning, Validation, and Storage

Cleans and transforms the raw property data, then loads it into the PostgreSQL database. This logic is split into separate modules for clarity and maintainability.

Modules:

src/data_processing/data_cleaner.py
src/data_processing/excel_converter.py
src/data_processing/data_validator.py
src/data_access/property_repository.py
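The kind of normalization a cleaning step performs can be sketched with the standard library alone. This is not data_cleaner.py; the column name `precio` and the price format (comma as thousands separator, optional currency text) are assumptions for illustration.

```python
import re
import unicodedata
from typing import Optional


def normalize_column(name: str) -> str:
    """'Área (m²) ' -> 'area_m2': strip accents, lowercase, snake_case."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^0-9a-z]+", "_", ascii_name.strip().lower()).strip("_")


def parse_price(raw: object) -> Optional[float]:
    """'$1,250,000.00 MXN' -> 1250000.0; returns None for blanks or unparseable text."""
    if raw is None:
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    # Assumes ',' is a thousands separator and '.' the decimal point.
    digits = re.sub(r"[^\d.]", "", str(raw))
    try:
        return float(digits) if digits else None
    except ValueError:
        return None


def clean_row(row: dict) -> dict:
    """Normalize keys, trim string values, and parse the (hypothetical) price column."""
    cleaned = {normalize_column(k): (v.strip() if isinstance(v, str) else v)
               for k, v in row.items()}
    if "precio" in cleaned:
        cleaned["precio"] = parse_price(cleaned["precio"])
    return cleaned
```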

Step 3: Interactive Property Visualization

An interactive Streamlit dashboard to filter and visualize properties based on various criteria.

Main App: src/visualization/dashboard_app.py
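The architecture diagram separates dashboard_logic.py from ui_components.py, which suggests filtering is kept out of the Streamlit UI code. A sketch of such a pure filter function is below; the `PropertyFilter` type and the `precio` and `tipo` keys are hypothetical. Properties with an unknown price are excluded whenever a price bound is active.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PropertyFilter:
    min_price: Optional[float] = None
    max_price: Optional[float] = None
    property_types: frozenset = frozenset()  # empty set means "any type"


def apply_filter(properties: Iterable[dict], f: PropertyFilter) -> list[dict]:
    """Return the properties that satisfy every active criterion in f."""
    result = []
    for p in properties:
        price = p.get("precio")
        if f.min_price is not None and (price is None or price < f.min_price):
            continue
        if f.max_price is not None and (price is None or price > f.max_price):
            continue
        if f.property_types and p.get("tipo") not in f.property_types:
            continue
        result.append(p)
    return result
```

In the dashboard, values from widgets such as `st.slider` and `st.multiselect` would be packed into a `PropertyFilter`. The logic itself stays importable and unit-testable without Streamlit.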

For a detailed discussion of the project's future roadmap, refer to docs/next_steps_project_roadmap.md.

📊 System Diagrams

🗺️ Overall System Architecture

graph TD
    subgraph User Interaction
        User[User] --> Runner(run_dashboard.py)
    end

    subgraph Application Execution
        Runner -- Sets PYTHONPATH & Executes --> Streamlit(Streamlit Engine)
        Streamlit -- Runs --> DashboardApp(visualization/dashboard_app.py)
    end

    subgraph Core Modules
        DashboardApp -- Uses --> Logic(visualization/dashboard_logic.py)
        DashboardApp -- Uses --> UI(visualization/ui_components.py)
        DashboardApp -- Interacts with --> Repo(data_access/property_repository.py)
        Repo -- Manages --> DB[(PostgreSQL Database)]
    end

    subgraph Data Flow
        RawExcel[Raw Excel Files] --> Cleaner(data_processing/data_cleaner.py)
        Cleaner -- Uses --> Converter(data_processing/excel_converter.py)
        Cleaner -- Uses --> Validator(data_processing/data_validator.py)
        Cleaner -- Loads Data --> Repo
    end

    subgraph Configuration
        EnvFile[.env] -- Provides Credentials --> Repo
        EnvFile -- Provides Settings --> DashboardApp
    end

    style Runner fill:#cde4f7,stroke:#333,stroke-width:2px
    style Streamlit fill:#ff4b4b,stroke:#333,stroke-width:2px
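The diagram routes all database access through property_repository.py. A minimal sketch of that pattern follows; it is not the project's actual class. It accepts any DB-API connection and uses a configurable placeholder, since psycopg2 expects `%s` while the in-memory `sqlite3` stand-in used for the demo expects `?`. The table and column names are hypothetical. `INSERT ... ON CONFLICT ... DO UPDATE` is supported by both PostgreSQL (9.5+) and SQLite (3.24+).

```python
import sqlite3
from typing import Iterable, Sequence


class PropertyRepository:
    """Keeps SQL in one place so the dashboard never builds queries itself."""

    COLUMNS = ("id", "colonia", "precio")  # hypothetical schema

    def __init__(self, conn, placeholder: str = "%s"):
        self.conn = conn
        self.ph = placeholder  # '%s' for psycopg2, '?' for sqlite3

    def upsert_many(self, rows: Iterable[Sequence]) -> None:
        """Insert rows, updating existing ones that share the same id."""
        cols = ", ".join(self.COLUMNS)
        params = ", ".join([self.ph] * len(self.COLUMNS))
        updates = ", ".join(f"{c} = excluded.{c}" for c in self.COLUMNS[1:])
        sql = (f"INSERT INTO properties ({cols}) VALUES ({params}) "
               f"ON CONFLICT (id) DO UPDATE SET {updates}")
        cur = self.conn.cursor()
        cur.executemany(sql, list(rows))
        self.conn.commit()

    def find_by_max_price(self, max_price: float) -> list[tuple]:
        """Return properties at or below max_price, cheapest first."""
        cur = self.conn.cursor()
        cur.execute(
            f"SELECT {', '.join(self.COLUMNS)} FROM properties "
            f"WHERE precio <= {self.ph} ORDER BY precio",
            (max_price,),
        )
        return cur.fetchall()
```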

🚀 Current ETL Process State

graph TD
    subgraph Execution Flow
        A[User] --> B(run_dashboard.py)
        B -- Launches --> C(streamlit run src/visualization/dashboard_app.py)
    end

    subgraph Data Initialization
        D[Agent] --> E(src/data_collection/download_inventory.py)
        E -- Downloads --> F{Raw Excel Files}
        F --> G(src/data_processing/data_cleaner.py)
        G -- Cleans & Transforms --> H(src/data_access/property_repository.py)
        H -- Loads Data --> I[(PostgreSQL Database)]
    end

    subgraph Dashboard Interaction
        C -- Displays Interactive --> J[Dashboard UI]
        J -- Filters Data via --> H
        I -- Credentials via --> K[.env file]
    end

    subgraph Setup
        L[src/db_setup/create_db_table.py] -- Creates Table --> I
    end
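The setup step (create_db_table.py) can be pictured as an idempotent `CREATE TABLE IF NOT EXISTS`. The schema below is a placeholder, not the project's real table definition. It sticks to types accepted by both PostgreSQL and SQLite so the sketch can be exercised with `sqlite3`; with PostgreSQL, the same function would receive a `psycopg2` connection.

```python
import sqlite3

# Hypothetical schema; the real columns live in src/db_setup/create_db_table.py.
CREATE_PROPERTIES_SQL = """
CREATE TABLE IF NOT EXISTS properties (
    id          TEXT PRIMARY KEY,
    colonia     TEXT,
    tipo        TEXT,
    precio      NUMERIC(14, 2),
    pdf_path    TEXT,
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def create_table(conn) -> None:
    """Create the properties table; running it again is a no-op."""
    cur = conn.cursor()
    cur.execute(CREATE_PROPERTIES_SQL)
    conn.commit()
```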

➡️ Future Development Steps

graph TD
    subgraph Current Foundation
        A[(PostgreSQL Database)]
        B[Modular Codebase]
    end

    subgraph Next Development Steps
        A --> C{Step 1: Unit & Integration Testing}
        B --> C
        C -- Ensures Reliability --> D{Step 2: Advanced Visualizations}
        D -- e.g., Maps, Charts --> E{Step 3: PDF & Image Analysis}
        E -- Extracts Data --> F{Step 4: Reporting Engine}
        F -- Generates --> G[PDF/CSV Reports]
        G --> H[User/Agent]
    end

    subgraph Details
        C -- Validates --> Modules[data_cleaner, property_repository, etc.]
        D -- Enhances --> Dashboard[Dashboard UI]
        E -- Processes --> Files[Property Brochures, Images]
    end

⚙️ Dependencies and Installation

Install the required Python libraries by running:

pip install -r requirements.txt

WebDriver

Ensure that chromedriver.exe (the driver for Google Chrome) matches your installed Chrome version and is placed in the src/data_collection/ directory. It is required only for the initial inventory download (Step 1); the PDF download in Step 1.1 does not use it.
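ChromeDriver generally needs to share its major version with the installed Chrome. A small sketch for checking this is below; the helper names are hypothetical. It parses the output of each program's `--version` flag. Note that on Windows, `chrome.exe --version` may not print anything to the console, so the browser version may have to be read from chrome://version instead.

```python
import re
import shutil
import subprocess
from typing import Optional

VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output: str) -> Optional[int]:
    """'ChromeDriver 120.0.6099.109 (...)' -> 120; None if no version is found."""
    m = VERSION_RE.search(version_output)
    return int(m.group(1)) if m else None


def tool_major_version(executable: str) -> Optional[int]:
    """Run '<executable> --version' and return its major version, if available."""
    path = shutil.which(executable)
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    return major_version(out.stdout)


def versions_match(driver_output: str, browser_output: str) -> bool:
    """True when both version strings parse and share the same major version."""
    d, b = major_version(driver_output), major_version(browser_output)
    return d is not None and d == b
```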

Database Configuration

Create a .env file in the project root with your PostgreSQL credentials. Refer to .env.example for the required variables.
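A stdlib-only sketch of reading the .env file and turning it into connection arguments is below. The variable names DB_HOST, DB_PORT, DB_NAME, DB_USER, and DB_PASSWORD are assumptions; .env.example is the authority on the real ones. The project may instead use python-dotenv's `load_dotenv()`. The returned keys are ones that `psycopg2.connect(**kwargs)` accepts.

```python
import os
from pathlib import Path


def load_env_file(path: Path, override: bool = False) -> dict[str, str]:
    """Parse KEY=VALUE lines and export them to os.environ (existing vars win by default)."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values


def postgres_dsn_kwargs(env=os.environ) -> dict:
    """Build connection kwargs from hypothetical DB_* variables."""
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }
```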

To set up the database table, run:

python src/db_setup/create_db_table.py

🧪 How to Run Tests

To execute all unit tests for the project, navigate to the project root directory in your terminal and run:

python -m pytest tests/

To run a specific test file, you can specify the path to the test file:

python -m pytest tests/test_dashboard_logic.py

For a detailed roadmap of existing and pending tests, refer to docs/testing_roadmap.md.
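pytest collects `test_*.py` files and runs every `test_*` function in them, using plain `assert` statements. The example below shows that shape. The file name and `price_per_m2` are hypothetical; in the real suite the function would be imported from the module under test rather than defined inline.

```python
# tests/test_price_metrics.py (hypothetical file)
# Normally: from visualization.dashboard_logic import price_per_m2  (hypothetical name)
# Defined inline here so the example is self-contained.
from typing import Optional


def price_per_m2(price: Optional[float], area_m2: Optional[float]) -> Optional[float]:
    """Price per square metre, or None when either input is missing or area is zero."""
    if price is None or not area_m2:
        return None
    return round(price / area_m2, 2)


def test_price_per_m2_basic():
    assert price_per_m2(2_000_000, 100) == 20000.0


def test_price_per_m2_handles_missing_or_zero_area():
    assert price_per_m2(2_000_000, 0) is None
    assert price_per_m2(2_000_000, None) is None
    assert price_per_m2(None, 80) is None
```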

⚠️ Considerations

Environment Variables: Ensure all required variables are set in your .env file.
WebDriver: Keep chromedriver.exe updated and correctly placed.
Web Portal Changes: The real estate portal's structure may change, requiring script adjustments.

🚀 Kick-off Guide

Follow these steps to get started with the data processing workflow:

1. pip install -r requirements.txt
2. python src/data_processing/data_validator.py
3. If reports/missing_critical.csv exists, run python src/scripts/pdf_autofill.py
4. If manual_fixes.csv is created, fill it in and run python src/scripts/apply_manual_fixes.py
5. python -m pytest tests/ (all tests must pass)
6. Commit and push your changes, referencing the roadmap milestone.
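The validation step reports incomplete records in reports/missing_critical.csv, and the presence of that file signals that follow-up is needed. A sketch of that behavior follows. It is not data_validator.py: the critical field list and the two-column report layout (`id`, `missing`) are assumptions, and the stale-report cleanup is a design choice so that the file exists only when there is work to do.

```python
import csv
from pathlib import Path
from typing import Iterable, Sequence

# Hypothetical list; the real critical fields are defined in data_validator.py.
CRITICAL_FIELDS = ("id", "precio", "colonia")


def missing_fields(row: dict, critical: Sequence[str] = CRITICAL_FIELDS) -> list[str]:
    """Names of critical fields that are absent, None, or empty in row."""
    return [f for f in critical if row.get(f) in (None, "")]


def write_missing_report(rows: Iterable[dict], out_path: Path,
                         critical: Sequence[str] = CRITICAL_FIELDS) -> int:
    """Write one CSV line per incomplete row and return how many were flagged.

    If nothing is missing, no report is left behind, so the file's existence
    alone tells the next step whether it needs to run.
    """
    flagged = []
    for row in rows:
        missing = missing_fields(row, critical)
        if missing:
            flagged.append({"id": row.get("id", ""), "missing": ";".join(missing)})
    if flagged:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with out_path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=["id", "missing"])
            writer.writeheader()
            writer.writerows(flagged)
    elif out_path.exists():
        out_path.unlink()  # remove a stale report from a previous run
    return len(flagged)
```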
