Software Engineering for Machine Learning comprises techniques and guidelines for building ML applications that do not concern the core ML problem (e.g. the development of new algorithms) but rather the surrounding activities such as data ingestion, coding, testing, versioning, deployment, quality control, and team collaboration. Good software engineering practices improve the development, deployment, and maintenance of production-level applications that use machine learning components.
⭐ Must-read
🎓 Scientific publication
Based on this literature, we compiled a survey on the adoption of software engineering practices for applications with machine learning components.
Feel free to take and share the survey and to read more!
- Broad Overviews
- Data Management
- Model Training
- Deployment and Operation
- Social Aspects
- Governance
- Tooling
These resources cover all aspects of software engineering for machine learning.
- AI Engineering: 11 Foundational Practices ⭐
- Best Practices for Machine Learning Applications
- Engineering Best Practices for Machine Learning ⭐
- Hidden Technical Debt in Machine Learning Systems 🎓⭐
- Rules of Machine Learning: Best Practices for ML Engineering ⭐
- Software Engineering for Machine Learning: A Case Study 🎓⭐
How to manage the data sets you use in machine learning.
- A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective 🎓
- Automating Large-Scale Data Quality Verification 🎓
- Data management challenges in production machine learning
- Data Validation for Machine Learning 🎓
- How to organize data labelling for ML
- The curse of big data labeling and three ways to solve it
- The Data Linter: Lightweight, Automated Sanity Checking for ML Data Sets 🎓
- The ultimate guide to data labeling for ML
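
Several of the resources above deal with validating data before it reaches training. As a rough illustration of the idea only, and not the method of any listed paper or tool, the following sketch checks rows against a small hand-written schema. The schema, the column names, and the `validate_rows` helper are all hypothetical.

```python
# Hypothetical schema: column name -> (expected type, missing values allowed?)
SCHEMA = {
    "age": (int, False),
    "income": (float, True),
    "label": (str, False),
}


def validate_rows(rows, schema=SCHEMA):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, nullable) in schema.items():
            value = row.get(column)
            if value is None:
                # Absent keys and explicit None are both treated as missing.
                if not nullable:
                    problems.append(f"row {i}: missing value for '{column}'")
                continue
            if not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: '{column}' has type {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
        # Columns that the schema does not know about are reported as well.
        for column in sorted(row.keys() - schema.keys()):
            problems.append(f"row {i}: unexpected column '{column}'")
    return problems


rows = [
    {"age": 31, "income": 52000.0, "label": "yes"},
    {"age": "42", "income": None, "label": "no"},
    {"income": 1.0, "label": "no", "zip": "1"},
]
problems = validate_rows(rows)
for problem in problems:
    print(problem)
```

Dedicated tools typically go further, for example by inferring schemas from data and tracking statistics over time; this sketch only shows the basic shape of a schema check.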
How to organize your model training experiments.
- 10 Best Practices for Deep Learning
- Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement 🎓
- Fairness On The Ground: Applying Algorithmic Fairness Approaches To Production Systems 🎓
- How do you manage your Machine Learning Experiments?
- Machine Learning Testing: Survey, Landscapes and Horizons 🎓
- Nitpicking Machine Learning Technical Debt
- On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach 🎓⭐
- On human intellect and machine failures: Troubleshooting integrative machine learning systems 🎓
- Pitfalls and Best Practices in Algorithm Configuration 🎓
- Pitfalls of supervised feature selection 🎓
- Preparing and Architecting for Machine Learning
- Preliminary Systematic Literature Review of Machine Learning System Development Process 🎓
- Software development best practices in a deep learning environment
- Testing and Debugging in Machine Learning
- What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild 🎓
How to deploy and operate your models in a production environment.
- Best Practices in Machine Learning Infrastructure
- Building Continuous Integration Services for Machine Learning 🎓
- Continuous Delivery for Machine Learning ⭐
- Continuous Training for Production ML in the TensorFlow Extended (TFX) Platform 🎓
- Fairness Indicators: Scalable Infrastructure for Fair ML Systems 🎓
- Machine Learning Logistics
- Machine learning: Moving from experiments to production
- ML Ops: Machine Learning as an Engineering Discipline
- Model Governance: Reducing the Anarchy of Production ML 🎓
- ModelOps: Cloud-based lifecycle management for reliable and trusted AI
- Operational Machine Learning
- Scaling Machine Learning as a Service 🎓
- TFX: A TensorFlow-Based Production-Scale ML Platform 🎓
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction 🎓
- Underspecification Presents Challenges for Credibility in Modern Machine Learning 🎓
- Versioning for end-to-end machine learning pipelines 🎓
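
Versioning an end-to-end pipeline usually means tying a trained model to the exact data, configuration, and code that produced it. The sketch below is one minimal way to do that with a content hash, under the assumption that the data fits in memory as bytes. The `fingerprint` helper and its parameters are hypothetical and are not taken from any of the resources listed above.

```python
import hashlib
import json


def fingerprint(data_bytes: bytes, config: dict, code_version: str) -> str:
    """Combine data, configuration, and code version into one short identifier."""
    digest = hashlib.sha256()
    # Hash the data first so that large inputs contribute a fixed-size part.
    digest.update(hashlib.sha256(data_bytes).digest())
    digest.update(b"\0")
    # sort_keys makes the JSON text independent of dict insertion order.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(code_version.encode("utf-8"))
    return digest.hexdigest()[:12]


run_a = fingerprint(b"x,y\n1,2\n", {"lr": 0.01, "epochs": 5}, "abc123")
run_b = fingerprint(b"x,y\n1,2\n", {"epochs": 5, "lr": 0.01}, "abc123")
run_c = fingerprint(b"x,y\n1,3\n", {"lr": 0.01, "epochs": 5}, "abc123")
print(run_a == run_b)  # same inputs, different key order
print(run_a == run_c)  # data changed
```

Storing such an identifier next to each model artifact makes it possible to tell whether two models came from the same inputs. Tools such as DVC or MLflow, listed under Tooling, handle this kind of bookkeeping in a more complete way.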
How to organize teams and projects to ensure effective collaboration and accountability.
- Data Scientists in Software Teams: State of the Art and Challenges 🎓
- Machine Learning Interviews
- Managing Machine Learning Projects
- Principled Machine Learning: Practices and Tools for Efficient Collaboration
- A Human-Centered Interpretability Framework Based on Weight of Evidence 🎓
- An Architectural Risk Analysis Of Machine Learning Systems
- Beyond Debiasing
- Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing 🎓
- Inherent trade-offs in the fair determination of risk scores 🎓
- Responsible AI practices ⭐
- Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
- Understanding Software-2.0 🎓
Tooling can make your life easier.
We list only open-source tools and commercial platforms that offer substantial free packages for research.
- Aim - Aim is an open source experiment tracking tool.
- Airflow - Programmatically author, schedule and monitor workflows.
- Alibi Detect - Python library focused on outlier, adversarial and drift detection.
- Archai - Neural architecture search.
- Data Version Control (DVC) - DVC is a data and ML experiments management tool.
- Facets Overview / Facets Dive - Robust visualizations to aid in understanding machine learning datasets.
- FairLearn - A toolkit to assess and improve the fairness of machine learning models.
- Git Large File System (LFS) - Replaces large files such as datasets with text pointers inside Git.
- Great Expectations - Data validation and testing with integration in pipelines.
- HParams - A thoughtful approach to configuration management for machine learning projects.
- Kubeflow - A platform for data scientists who want to build and experiment with ML pipelines.
- Label Studio - A multi-type data labeling and annotation tool with standardized output format.
- LiFT - LinkedIn Fairness Toolkit.
- MLflow - Manage the ML lifecycle, including experimentation, deployment, and a central model registry.
- Model Card Toolkit - Streamlines and automates the generation of model cards; for model documentation.
- Neptune.ai - Experiment tracking tool bringing organization and collaboration to data science projects.
- Neuraxle - Sklearn-like framework for hyperparameter tuning and AutoML in deep learning projects.
- OpenML - An inclusive movement to build an open, organized, online ecosystem for machine learning.
- PyTorch Lightning - The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
- REVISE: REvealing VIsual biaSEs - Automatically detect bias in visual data sets.
- Robustness Metrics - Lightweight modules to evaluate the robustness of classification models.
- Seldon Core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models on Kubernetes.
- Spark Machine Learning - Spark’s ML library consisting of common learning algorithms and utilities.
- TensorBoard - TensorFlow's Visualization Toolkit.
- TensorFlow Extended (TFX) - An end-to-end platform for deploying production ML pipelines.
- TensorFlow Data Validation (TFDV) - Library for exploring and validating machine learning data. Similar to Great Expectations, but for TensorFlow data.
- Weights & Biases - Experiment tracking, model optimization, and dataset versioning.
Contributions welcome! Please read the contribution guidelines first.