This project collects real estate data from Sarajevo online listings and performs regression analysis to predict apartment prices based on features like size, location, and other available attributes.

    sarajevo-flats/
    │
    ├── README.md                  # Project description and instructions
    ├── requirements.txt           # Python dependencies
    ├── .gitignore                 # Files to ignore in Git
    ├── data/                      # Folder for scraped or cleaned datasets
    │   └── sarajevo_flats.csv     # Scraped dataset
    ├── src/                       # Python scripts
    │   ├── scrape.py              # Web scraping script using BeautifulSoup
    │   ├── clean_data.py          # Optional: cleaning/preprocessing data
    │   └── regression.py          # Regression model (train/test)
    └── notebooks/                 # Optional Jupyter notebooks for exploration
        └── exploration.ipynb

- Clone the repository:

      git clone https://github.com/EmreArapcicUevak/EE418-Introduction-to-Machine-Learning-Project
      cd EE418-Introduction-to-Machine-Learning-Project

- Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

Run the scraping script to collect apartment listings:

    python src/scrape.py

This saves the dataset as data/sarajevo_flats.csv.
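
As a rough illustration of the scraping step, the sketch below parses listing cards with BeautifulSoup and serializes them as CSV. The HTML snippet, the `listing`/`price`/`size`/`location` class names, and the helpers `parse_listings` and `to_csv` are all hypothetical; the real site's markup and the actual logic in `src/scrape.py` may differ. It uses the built-in `html.parser` backend so that it runs without lxml.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical listing markup; the real site's tags and class names will differ.
SAMPLE_HTML = """
<div class="listing">
  <span class="price">250.000 KM</span>
  <span class="size">65 m2</span>
  <span class="location">Centar</span>
</div>
<div class="listing">
  <span class="price">180.000 KM</span>
  <span class="size">48 m2</span>
  <span class="location">Novo Sarajevo</span>
</div>
"""

FIELDS = ["price", "size", "location"]


def parse_listings(html):
    """Return one dict per listing card, keeping the raw text of each field."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing"):
        row = {}
        for field in FIELDS:
            node = card.select_one(f"span.{field}")
            # A missing element becomes an empty string so the CSV stays rectangular.
            row[field] = node.get_text(strip=True) if node else ""
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_listings(SAMPLE_HTML)
print(to_csv(rows))
```

In a real run the HTML would come from an HTTP response (for example via Requests) rather than an inline string, and the CSV text would be written to data/sarajevo_flats.csv.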

Optionally, run the cleaning script:

    python src/clean_data.py

This script formats prices and sizes and handles missing values.
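
A minimal sketch of what such cleaning could look like, using only the standard library. The helpers `parse_number` and `clean_rows` are hypothetical, and the sketch assumes European number formatting ("." as thousands separator, "," as decimal separator) and that rows without a usable price or size are dropped; `src/clean_data.py` may handle these cases differently.

```python
import re
from typing import Optional


def parse_number(text: Optional[str]) -> Optional[float]:
    """Extract the first number from text such as '250.000 KM' or '65,5 m2'.

    Assumes '.' is a thousands separator and ',' a decimal separator.
    Returns None when no number can be parsed (treated as a missing value).
    """
    if not text:
        return None
    match = re.search(r"\d[\d.,]*", text)
    if match is None:
        return None
    digits = match.group(0).rstrip(".,")
    digits = digits.replace(".", "").replace(",", ".")
    try:
        return float(digits)
    except ValueError:
        return None


def clean_rows(rows):
    """Convert price and size to floats and drop rows where either is missing."""
    cleaned = []
    for row in rows:
        price = parse_number(row.get("price"))
        size = parse_number(row.get("size"))
        if price is None or size is None:
            continue  # assumption: incomplete rows are dropped, not imputed
        cleaned.append({
            "price": price,
            "size": size,
            "location": (row.get("location") or "").strip(),
        })
    return cleaned


# Hypothetical raw rows, shaped like the scraped CSV columns.
raw = [
    {"price": "250.000 KM", "size": "65 m2", "location": " Centar "},
    {"price": "Po dogovoru", "size": "48 m2", "location": "Ilidža"},
]
print(clean_rows(raw))
```

With pandas, the same `parse_number` helper could be applied to a column through `Series.map`.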

Run the regression script:

    python src/regression.py

This trains a regression model to predict apartment prices and displays performance metrics.
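
The train/test workflow might look roughly like the sketch below. This README does not say which model or metrics `src/regression.py` uses, so ordinary least squares with MAE and R² is an assumption here. The data is synthetic (generated in the code with a made-up linear relationship between size and price), not taken from the scraped listings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: price grows linearly with size plus noise.
rng = np.random.default_rng(0)
size = rng.uniform(30, 120, size=200)                    # apartment size
price = 3000 * size + rng.normal(0, 10_000, size=200)    # made-up price relation

X = size.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# Fit on the training split only, then evaluate on the held-out split.
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MAE: {mae:.0f}")
print(f"R^2: {r2:.3f}")
```

With the real dataset, `X` would hold the cleaned feature columns; a categorical feature such as location would need encoding (for example one-hot) before fitting.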
- Python 3.8+
- BeautifulSoup4
- Requests
- Pandas
- Scikit-learn
- Lxml (parser for BeautifulSoup)
Install via:

    pip install -r requirements.txt

- Scraped data is intended for educational and research purposes only.
- Web page structure may change; scraping scripts may need updates accordingly.
- CSV files are ignored via .gitignore to keep large data files out of Git.