Overview

This repository contains the Python code associated with the following paper:

Barry DJ, Marcotti S, Gerontogianni L and Kelly G (2025). A Statistical Framework for Robust and Reproducible BioImage Analysis. https://doi.org/10.1101/2025.02.10.637409

Get Started

The quickest and easiest way to try this code is on Binder, which will allow you to reproduce the plots in the associated publication. On some occasions, Binder may fail to launch and display an error instead.

Should this occur, simply close the tab and relaunch Binder.

Run On Your Own Data

You can modify the Jupyter notebooks to run on your own data. To do this, you first need to generate some data to analyse, and there are two options for doing so.

Option 1

Download this repo, run the Nuclear_Localisation CellProfiler pipeline (Nuclear_Localisation.cppipe) on your own images, and replace the files in the cell_profiler_outputs folder with the resulting outputs. You can then use the Jupyter notebooks companion_notebook_idr0028.ipynb or companion_notebook_idr0139.ipynb to generate plots for your own images. Both notebooks produce similar outputs, but each has been configured to handle a slightly different input data format, specific to the requirements of the IDR0028 and IDR0139 datasets respectively.
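Because the notebooks read specific measurement columns, it can be worth checking that your own CellProfiler exports have the same columns as the files they replace before swapping them in. The sketch below is a minimal illustration, assuming both files are CSV exports with a header row; the helper names read_header and compare_csv_headers are hypothetical and not part of this repo.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file ([] if empty)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def compare_csv_headers(reference_csv, new_csv):
    """Hypothetical helper: list columns missing from, or extra in, new_csv
    relative to reference_csv (e.g. an original file from cell_profiler_outputs)."""
    reference = read_header(reference_csv)
    new = read_header(new_csv)
    # Columns the notebooks may expect but your export does not contain.
    missing = [col for col in reference if col not in new]
    # Columns your export contains that the original file did not.
    extra = [col for col in new if col not in reference]
    return missing, extra
```

For example, compare_csv_headers("<original file>.csv", "<your file>.csv") returning an empty missing list suggests the new file has at least the columns of the original. Extra columns are usually harmless, whereas missing ones are likely to break the notebooks.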

Option 2

You can analyse your images using any software that outputs the results of the analysis in a CSV file. Then, download this repo and use the CSV file as input for the companion_framework_notebook.ipynb notebook.

Step 1: Download the Contents of this GitHub Repo

A step-by-step guide to downloading the repo and running the notebooks is presented below. You only need to perform steps 1 and 2 once. Every subsequent time you want to run the code, skip straight to step 3.

Easy Way - Follow these steps if you are not familiar with Git

Click on the small arrow on the green Code button above and then click Download ZIP.

When the download completes, unzip the contents. You should now have a folder containing, among other files, the assets, inputs and notebooks folders and the requirements.txt file.

Below, we will use the requirements file to set up a Python environment to run the Jupyter notebooks contained in the notebooks folder.

Harder Way

If you are already familiar with Git, you can clone this repo like any other. However, some of the data in the inputs folder is quite large, so you will need to install Git LFS to download the full dataset.
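If the repo is cloned without Git LFS, the large files are replaced by small text "pointer" files, which the notebooks cannot read. Per the Git LFS specification, every pointer file begins with the line version https://git-lfs.github.com/spec/v1, so a minimal check can scan the inputs folder for files that start that way. This is a sketch; the function names are hypothetical.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts like a Git LFS pointer (i.e. data not downloaded)."""
    with open(path, "rb") as handle:
        # Only the first len(prefix) bytes are needed to recognise a pointer.
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(folder):
    """Return, sorted, the paths of all files under `folder` that are still LFS pointers."""
    return sorted(
        str(p) for p in Path(folder).rglob("*") if p.is_file() and is_lfs_pointer(p)
    )
```

If find_lfs_pointers("<path to this repo>/inputs") returns a non-empty list, running git lfs install followed by git lfs pull inside the cloned repo should replace the pointers with the actual data.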

Step 2: Install a Python Distribution

We recommend using conda, as it is relatively straightforward and makes managing different Python environments simple. You can install conda from here (Miniconda will suffice).

Step 3: Organise Your Data

Option 1

The notebooks in this repository will only work if your own data is structured appropriately. If you wish to run the Nuclear_Localisation CellProfiler pipeline on your own images, the outputs must be structured in the same way as the inputs in this repository. This assumes that the raw data originated from the Image Data Resource and has a suitable annotations file associated with it (like this one, for example). Your data can then be analysed using either companion_notebook_idr0028.ipynb or companion_notebook_idr0139.ipynb. It is certainly possible to adapt the notebooks to analyse data from other sources, but this would require a reasonable knowledge of Python.

Option 2

Alternatively, you can analyse your images using any software that outputs the results of the analysis in a CSV file. Then, download this repo and use the CSV file as input for the companion_framework_notebook.ipynb notebook. Ensure that your CSV file contains data in the tidy format described by Pylvänäinen et al. (2025), that is, with one row per observation and one column per variable.
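Many tools export results in "wide" form, for example one column per replicate. The sketch below shows one way to reshape such a table into tidy form using only the standard library; the column names (Treatment, Replicate1, Replicate2, MeanRatio) and the helper wide_to_tidy are hypothetical, so check the notebook for the columns it actually expects.

```python
import csv
import io


def wide_to_tidy(rows, id_columns, variable_name, value_name):
    """Reshape wide records (one column per condition) into tidy records,
    with one row per observation and one column per variable.

    rows: iterable of dicts, e.g. from csv.DictReader.
    id_columns: columns identifying each row, copied unchanged.
    variable_name / value_name: names for the new 'which column' and 'value' columns.
    """
    tidy = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col in id_columns or value in ("", None):
                continue  # skip identifier columns and empty cells
            tidy.append({**ids, variable_name: col, value_name: value})
    return tidy


# Hypothetical wide table: one row per treatment, one column per replicate.
wide_csv = """Treatment,Replicate1,Replicate2
Control,0.82,0.79
DrugA,1.10,
"""

records = wide_to_tidy(
    csv.DictReader(io.StringIO(wide_csv)),
    id_columns=["Treatment"],
    variable_name="Replicate",
    value_name="MeanRatio",
)

# Write the tidy table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Treatment", "Replicate", "MeanRatio"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

If you already work with pandas, DataFrame.melt performs the same kind of reshaping in a single call.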

Step 4: Set Up Environment

Once conda is installed, open Anaconda Prompt (on Windows) or a terminal (on macOS or Linux) and run the following commands:

conda create --name enhancing-reproducibility pip
conda activate enhancing-reproducibility
python -m pip install -r <path to this repo>/requirements.txt

where you need to replace <path to this repo> with the location on your file system where you downloaded this repo. When conda creates the environment, it will list the packages to be downloaded and installed, and the following prompt will appear:

Proceed ([y]/n)?

Hit Enter to accept, and all necessary packages will be downloaded and installed; this may take some time. When complete, you can deactivate the environment you have created with the following command:

conda deactivate

You have successfully set up the necessary conda environment!
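To double-check the installation, you could run a short script from within the activated environment that compares requirements.txt against the installed packages. This is a minimal sketch using the standard library's importlib.metadata; the helper names are hypothetical, and the simple parser does not handle URL or VCS requirements.

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Extract bare package names from requirements.txt lines, ignoring comments,
    blank lines, options such as -r, and version specifiers or extras."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name runs up to the first character that is not a letter,
        # digit, '.', '_' or '-' (e.g. '=', '>', '[' or ';').
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(requirements_path):
    """Return requirement names with no installed distribution in this environment."""
    with open(requirements_path, encoding="utf-8") as handle:
        names = parse_requirement_names(handle)
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty result from missing_packages("<path to this repo>/requirements.txt") indicates that every listed package is installed; it does not check that the installed versions match any pinned versions.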

Step 5: Run The Code!

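The following commands will launch Jupyter and open a notebook, allowing you to run the code on your own data. Replace <notebook> with the notebook for the option you chose in Step 3: companion_notebook_idr0028.ipynb or companion_notebook_idr0139.ipynb for Option 1, or companion_framework_notebook.ipynb for Option 2.

conda activate enhancing-reproducibility
jupyter notebook <path to this repo>/notebooks/<notebook>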

The Jupyter Notebook should open in your browser - follow the step-by-step instructions in the notebook to run the code. If you are not familiar with Jupyter Notebooks, you can find a detailed introduction here.
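If the notebook reports missing packages, it may be running outside the environment created in Step 4. A quick check is to run a cell like the sketch below. It relies on conda setting the CONDA_DEFAULT_ENV variable on activation, which Jupyter inherits from the shell it was launched from; sys.executable shows which Python interpreter the kernel is actually using, which is the more reliable indicator if several kernels are installed. The function check_environment is hypothetical.

```python
import os
import sys


def check_environment(expected="enhancing-reproducibility", environ=None):
    """Return (ok, active_env): ok is True if the active conda environment,
    as recorded in CONDA_DEFAULT_ENV, matches `expected`."""
    environ = os.environ if environ is None else environ
    active = environ.get("CONDA_DEFAULT_ENV")
    return active == expected, active


ok, active = check_environment()
print(f"Python executable: {sys.executable}")
print(f"Active conda environment: {active}" + ("" if ok else " (expected enhancing-reproducibility)"))
```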
