---
description: Learn how to enable MLflow tracking in Capsules and Pipelines.
---
Ensure MLflow is enabled in your Code Ocean deployment. If it is, the MLflow icon appears in the navigation sidebar and opens the MLflow tracking server dashboard. Reach out to your Code Ocean admin if you need MLflow enabled.
Enabling MLflow tracking in a Capsule ensures that models created in that Capsule can be tracked, managed, and deployed using MLflow.
To enable MLflow tracking within your Capsule:
- Open the Capsule in which you want to enable MLflow tracking.
- Open the Capsule Settings panel from the top right corner.
- Navigate to the MLflow tab.
- Enable tracking by toggling ON “Track this Capsule”.
- Add MLflow code: include the necessary MLflow tracking code in your Capsule’s training script. See the examples below.
- Run your Capsule. MLflow will automatically create a new experiment in your tracking server, and all runs will be tracked accordingly.
{% hint style="info" %} For Capsules that used MLflow prior to Code Ocean 4.2, you may need to update the MLflow package to maintain compatibility. Code Ocean 4.2 and newer versions run MLflow v3.6. {% endhint %}
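As a sketch of that upgrade, assuming the Capsule's environment installs Python packages with pip (in Code Ocean you can also manage packages through the environment editor):

```shell
# Upgrade the mlflow package in the Capsule environment (assumes pip is available)
pip install --upgrade mlflow
```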
```python
# For libraries that support autologging (Fastai, Gluon, Keras, LightGBM,
# PyTorch, Scikit-learn, Spark, Statsmodels, XGBoost), just use autolog:
import mlflow

mlflow.autolog()
```

If possible, it is recommended to use the library-specific autolog, for example:

```python
import mlflow

mlflow.fastai.autolog()
```

Libraries that do not support autologging can still be logged manually (for example, `prophet`):

```python
with mlflow.start_run():
    mlflow.prophet.log_model(model, "model")
    mlflow.log_param("seasonality_mode", params["seasonality_mode"])
    mlflow.log_metric("mae", mae)
    df.to_csv("data.csv", index=False)
    mlflow.log_artifact("data.csv", artifact_path="data")
    forecast.to_csv("forecast.csv", index=False)
    mlflow.log_artifact("forecast.csv", artifact_path="forecast")
```
Usage with R:

```r
# Install and load the MLflow package
install.packages("mlflow")
library(mlflow)

# Start an MLflow run and log parameters and metrics
mlflow_start_run()
mlflow_log_param("learning_rate", 0.01)
mlflow_log_metric("rmse", 0.02)

# Train a model, save it, and log it as an artifact
model <- lm(mpg ~ ., data = mtcars)
model_path <- "lm_model"
saveRDS(model, model_path)
mlflow_log_artifact(model_path)

# End the run
mlflow_end_run()
```
Usage with Java:

```xml
<dependency>
    <groupId>org.mlflow</groupId>
    <artifactId>mlflow-client</artifactId>
    <version>1.29.0</version>
</dependency>
```

```java
import java.io.File;

import org.mlflow.api.proto.Service.*;
import org.mlflow.tracking.MlflowClient;

public class MLflowExample {
    public static void main(String[] args) {
        MlflowClient client = new MlflowClient("http://localhost:5000");
        String experimentId = client.createExperiment("MyExperiment");
        RunInfo runInfo = client.createRun(experimentId);
        String runId = runInfo.getRunId();
        client.logParam(runId, "learning_rate", "0.01");
        client.logMetric(runId, "rmse", 0.02);
        client.logArtifact(runId, new File("path/to/your/model"));
        client.setTerminated(runId);
    }
}
```
It is recommended to give each run a name by adding `mlflow.start_run(run_name="run name")`; otherwise, MLflow gives each run a random name.
{% hint style="info" %} MLflow’s autolog feature automatically tracks key information from machine learning models during training, including parameters, metrics, and model artifacts, without requiring much manual coding. When using autolog, MLflow automatically captures these details for supported libraries like TensorFlow, PyTorch, and Scikit-learn. For libraries with specific autolog implementations (e.g., mlflow.sklearn.autolog()), this can provide deeper integration by logging library-specific details and configurations. However, it’s important to ensure that the library’s version is compatible with MLflow’s autologging, and to monitor for potential performance issues or unintended behavior, such as logging excessive data or missing custom metrics. {% endhint %}
MLflow model tracking integrates seamlessly with Code Ocean Pipelines, taking advantage of Nextflow’s powerful parallel processing capabilities.
To enable MLflow tracking within your Pipeline, you can add a tracked Capsule to your Pipeline, or start tracking a Capsule that is already part of your Pipeline. Run your Pipeline and MLflow will automatically create a new experiment in your tracking server. All runs will be tracked accordingly.
See Enable MLflow Tracking in a Capsule above for more information on how to track a Capsule using MLflow.
{% hint style="info" %} MLflow tracking is only supported in Pipelines built with the Code Ocean Pipelines Builder UI and is not supported in custom Pipelines. {% endhint %}