COSMIC is an open-source Python package for analyzing star clusters using machine learning techniques and Bayesian inference. Built specifically for processing Gaia satellite data, COSMIC employs unsupervised clustering algorithms and statistical analysis to identify and characterize open star clusters.
This project is currently in alpha development (v0.0.1) and is not yet recommended for production scientific work. The API may change significantly between versions as we work toward a stable release.
- Advanced Clustering: HDBSCAN-based clustering with hyperparameter optimization via Optuna
- Gaia Integration: Native support for Gaia data formats and photometric systems
- Flexible Preprocessing: Comprehensive data cleaning and preparation tools
- Analysis Tools: Statistical analysis and visualization of cluster properties
- Modular Design: Clean, extensible architecture with backward-compatible shims
# Clone the repository
git clone https://github.com/notluquis/COSMIC.git
cd COSMIC
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install in development mode
pip install --upgrade pip
pip install -e ".[dev]"
- Python 3.11 or higher
- See pyproject.toml for complete dependency list
import cosmic
# Load Gaia data
loader = cosmic.DataLoader("your_gaia_catalog.ecsv")
data = loader.load_data(systems=["Gaia", "TMASS"])
# Preprocess the data
preprocessor = cosmic.DataPreprocessor(data)
good_data, bad_data = preprocessor.process()
# Perform clustering
clusterer = cosmic.Clustering(good_data, bad_data)
clusterer.search(['pmra', 'pmdec', 'parallax'])
# Analyze results
analyzer = cosmic.ClusterAnalyzer(clusterer.combined_data)
analyzer.run_analysis()
cosmic/
├── core/ # Clustering algorithms and core functionality
├── io/ # Data loading and I/O utilities
├── preprocess/ # Data preprocessing and cleaning
├── analysis/ # Statistical analysis and characterization
└── utils/ # General utility functions
- HDBSCAN-based clustering with persistence thresholding
- Grid search and Optuna optimization for hyperparameters
- Multiple validation metrics (relative validity, DBCV, cluster persistence)
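As a rough illustration of persistence thresholding, the sketch below dissolves clusters whose persistence score falls under a cutoff and relabels their members as noise. It is not COSMIC's implementation: the function name `apply_persistence_threshold`, the cutoff value, and the renumbering of surviving clusters are assumptions made for this example. The scores are assumed to be indexed by cluster label, as in the `cluster_persistence_` array exposed by the hdbscan library.

```python
from typing import Sequence

NOISE = -1  # label that HDBSCAN-style clusterers conventionally give to noise points


def apply_persistence_threshold(
    labels: Sequence[int],
    persistence: Sequence[float],
    threshold: float,
) -> list[int]:
    """Relabel members of low-persistence clusters as noise.

    Hypothetical helper, not part of the COSMIC API. `persistence[k]` is the
    score of cluster `k`; clusters scoring below `threshold` are dissolved and
    the surviving clusters are renumbered consecutively from 0.
    """
    kept = [k for k, score in enumerate(persistence) if score >= threshold]
    remap = {old: new for new, old in enumerate(kept)}
    # Labels absent from `remap` (dissolved clusters and existing noise) become NOISE.
    return [remap.get(label, NOISE) for label in labels]


labels = [0, 0, 1, 1, 1, -1, 2, 2]
persistence = [0.42, 0.03, 0.18]  # one score per cluster label 0, 1, 2
filtered = apply_persistence_threshold(labels, persistence, threshold=0.1)
print(filtered)  # [0, 0, -1, -1, -1, -1, 1, 1]
```

Renumbering keeps the labels contiguous, which simplifies downstream per-cluster loops. A real pipeline might prefer to keep the original labels for traceability.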
- Support for multiple photometric systems (Gaia, 2MASS, WISE)
- Flexible column mapping and data validation
- Built-in quality filtering
- Zero-point corrections for photometry
- Proper motion corrections
- Missing value handling and outlier detection
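To make the missing-value and outlier handling concrete, here is a minimal standard-library sketch that splits a catalog into "good" and "bad" rows, mirroring the `good_data, bad_data` pair returned in the usage example. The function `split_good_bad`, the median-absolute-deviation (MAD) criterion, and the default cutoff of 5 are assumptions chosen for illustration. COSMIC's `DataPreprocessor` may use different rules.

```python
import math
import statistics


def split_good_bad(rows, columns, n_mad=5.0):
    """Split rows (dicts) into (good, bad). Hypothetical helper, not COSMIC API.

    A row is bad if any of `columns` is missing (absent, None, or NaN) or lies
    more than `n_mad` scaled MADs from that column's median.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    complete, incomplete = [], []
    for row in rows:
        has_gap = any(is_missing(row.get(c)) for c in columns)
        (incomplete if has_gap else complete).append(row)
    if not complete:
        return [], incomplete

    # Robust centre and allowed deviation per column, from complete rows only.
    limits = {}
    for c in columns:
        values = [row[c] for row in complete]
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        # The factor 1.4826 makes the MAD comparable to a Gaussian sigma.
        limits[c] = (median, n_mad * 1.4826 * mad)

    good, bad = [], list(incomplete)
    for row in complete:
        inlier = all(abs(row[c] - limits[c][0]) <= limits[c][1] for c in columns)
        (good if inlier else bad).append(row)
    return good, bad


catalog = [
    {"pmra": -1.0, "parallax": 2.0},
    {"pmra": -1.1, "parallax": 2.1},
    {"pmra": -0.9, "parallax": 1.9},
    {"pmra": -1.2, "parallax": 2.2},
    {"pmra": 25.0, "parallax": 2.0},   # proper-motion outlier
    {"pmra": -1.0, "parallax": None},  # missing parallax
]
good, bad = split_good_bad(catalog, ["pmra", "parallax"])
print(len(good), len(bad))  # 4 2
```

The MAD is used here because, unlike the standard deviation, it is barely affected by the outliers it is meant to detect. Note that a column with many identical values can have a MAD of zero, in which case any differing value is flagged.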
- Statistical characterization of clusters
- Integration with external tools (Sagitta for stellar parameters)
- Comprehensive plotting and visualization
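As an example of what statistical characterization of a cluster can involve, the sketch below computes a few summary statistics from member astrometry. The function `summarize_cluster` and the dictionary keys it returns are hypothetical and are not taken from the COSMIC API. The input columns `pmra`, `pmdec`, and `parallax` follow the Gaia naming used in the usage example, in mas/yr and mas respectively.

```python
import statistics


def summarize_cluster(members):
    """Return basic astrometric summary statistics for one cluster.

    Hypothetical helper, not part of the COSMIC API. `members` is a list of
    dicts with keys `pmra`, `pmdec` (mas/yr) and `parallax` (mas), and must
    contain at least two stars so a sample standard deviation exists.
    """
    pmra = [m["pmra"] for m in members]
    pmdec = [m["pmdec"] for m in members]
    parallax = statistics.median([m["parallax"] for m in members])
    return {
        "n_members": len(members),
        "pmra_mean": statistics.fmean(pmra),
        "pmdec_mean": statistics.fmean(pmdec),
        "pmra_std": statistics.stdev(pmra),
        "pmdec_std": statistics.stdev(pmdec),
        "parallax_median": parallax,
        # Naive inverse-parallax distance in parsecs; it is biased when the
        # parallax uncertainty is large, so treat it as a first estimate only.
        "distance_pc": 1000.0 / parallax if parallax > 0 else float("nan"),
    }


members = [
    {"pmra": -1.0, "pmdec": 3.0, "parallax": 2.0},
    {"pmra": -1.2, "pmdec": 3.1, "parallax": 2.1},
    {"pmra": -0.8, "pmdec": 2.9, "parallax": 1.9},
]
summary = summarize_cluster(members)
print(summary["n_members"], summary["distance_pc"])  # 3 500.0
```

The median parallax is used instead of the mean so that a few contaminating field stars have limited influence on the distance estimate.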
We welcome contributions! Please see our contribution guidelines for details.
git clone https://github.com/notluquis/COSMIC.git
cd COSMIC
pip install -e ".[dev]"
pytest # Run tests
- API Reference (Coming Soon)
- Examples - Jupyter notebooks with usage examples
- Changelog - Version history and changes
COSMIC is licensed under the GNU Affero General Public License v3.0. This ensures that any modifications or derivative works remain open source.
- Lucas Pulgar-Escobar - Universidad de Concepción, Chile (lescobar2019@udec.cl)
- Nicolás Henríquez Salgado - Universidad de Concepción, Chile
COSMIC builds upon excellent open-source libraries:
- HDBSCAN for clustering algorithms
- Optuna for hyperparameter optimization
- Astropy for astronomical data handling
- scikit-learn for machine learning utilities
- Issues: GitHub Issues
- Email: lescobar2019@udec.cl
- Discussions: GitHub Discussions
- v0.1.0: Stable API and comprehensive documentation
- v0.2.0: PyPI distribution and additional clustering algorithms
- v1.0.0: Production-ready release with full validation suite
Citation: If you use COSMIC in your research, please cite our paper (in preparation) and acknowledge the underlying libraries.