This project performs a statistical analysis of groundwater quality across states in India. Using Python and machine learning, it cleans water quality data, generates visualizations, and builds predictive models that classify water as "Safe" or "Unsafe" and predict individual water quality parameters (e.g., pH, DO, Conductivity, and TDS).
- Analyze key water quality parameters such as pH, Conductivity, BOD, Nitrate, Faecal Coliform, Total Coliform, Total Dissolved Solids (TDS), and Fluoride.
- Provide state-wise statistical summaries and visualizations.
- Predict water quality safety using classification models (KNN and Random Forest).
- Predict continuous water quality parameters (e.g., pH) using regression models (KNN Regressor).

The project uses the Python libraries pandas, numpy, matplotlib, seaborn, and scikit-learn to process, visualize, and model the data.
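
As a rough illustration of the state-wise summaries mentioned above, the sketch below groups a small made-up table by state and computes a few statistics per parameter. The column names (`State`, `pH`, `TDS`) and the values are assumptions for demonstration only and may not match the project's dataset.

```python
import pandas as pd

# Hypothetical sample rows; names and values are illustrative, not project data.
df = pd.DataFrame({
    "State": ["Kerala", "Kerala", "Punjab", "Punjab"],
    "pH": [7.1, 6.9, 7.8, 8.0],
    "TDS": [210.0, 190.0, 540.0, 560.0],
})

# Mean, minimum, and maximum of each parameter within each state.
state_summary = df.groupby("State")[["pH", "TDS"]].agg(["mean", "min", "max"])
print(state_summary)
```

The result has one row per state and a two-level column index (parameter, statistic), so a single value can be read with `state_summary.loc["Kerala", ("pH", "mean")]`.
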
Features:
- Data Cleaning: Handles missing values and erroneous entries (e.g., #DIV/0!), and converts columns to numeric types for analysis.
- Exploratory Data Analysis (EDA): Generates descriptive statistics, histograms, box plots, violin plots, and correlation heatmaps.
- Visualization: Creates state-wise visualizations for water quality parameters using line, bar, scatter, and histogram plots.
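
The data cleaning step described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the column names and sample values are hypothetical, and filling gaps with the column median is only one possible strategy that the project may or may not use.

```python
import pandas as pd

# Hypothetical raw data with spreadsheet error strings and a missing value.
raw = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "pH": ["7.2", "#DIV/0!", "6.8", None],
    "Conductivity": ["350", "410", "#DIV/0!", "500"],
})

numeric_cols = ["pH", "Conductivity"]  # assumed parameter columns
clean = raw.copy()

# errors="coerce" converts unparseable strings such as "#DIV/0!" to NaN.
clean[numeric_cols] = clean[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Assumed strategy: replace missing values with each column's median.
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

print(clean.dtypes)
print(clean)
```

Coercing first and imputing second keeps the two concerns separate, so the imputation rule can be changed (or rows dropped with `dropna`) without touching the parsing step.
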
Machine Learning:
- Classification: Predicts water quality as "Safe" or "Unsafe" using KNN and Random Forest classifiers.
- Regression: Predicts continuous parameters (e.g., pH) using KNN Regressor.
- Evaluation: Reports accuracy, precision, recall, and F1-score for the classifiers, and MAE, MSE, and R² for the regressor.
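
The modeling and evaluation steps above can be sketched with scikit-learn as follows. The features, the "Safe"/"Unsafe" labelling rule, the pH-like target, and all hyperparameters here are synthetic assumptions chosen so the example runs on its own; they do not reflect the project's data or its tuned settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, r2_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for three scaled water quality features (not real data).
X = rng.normal(size=(200, 3))
# Hypothetical labelling rule: 1 = "Safe", 0 = "Unsafe".
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
# Hypothetical continuous target loosely resembling pH.
y_reg = 7.0 + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)

# One split shared by both tasks so they use the same test rows.
X_train, X_test, yc_train, yc_test, yr_train, yr_test = train_test_split(
    X, y_class, y_reg, test_size=0.25, random_state=0
)

# Classification: Random Forest predicting Safe/Unsafe.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, yc_train)
yc_pred = clf.predict(X_test)
accuracy = accuracy_score(yc_test, yc_pred)
print("accuracy :", accuracy)
print("precision:", precision_score(yc_test, yc_pred))
print("recall   :", recall_score(yc_test, yc_pred))
print("F1-score :", f1_score(yc_test, yc_pred))

# Regression: KNN Regressor predicting the continuous target.
reg = KNeighborsRegressor(n_neighbors=5)
reg.fit(X_train, yr_train)
yr_pred = reg.predict(X_test)
mae = mean_absolute_error(yr_test, yr_pred)
print("MAE:", mae)
print("MSE:", mean_squared_error(yr_test, yr_pred))
print("R² :", r2_score(yr_test, yr_pred))
```

A KNN classifier can be swapped in for the Random Forest by replacing `RandomForestClassifier` with `sklearn.neighbors.KNeighborsClassifier`; because KNN is distance-based, scaling the features first (for example with `StandardScaler`) is usually advisable.
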