The epiGenomic Efficient Correlator (epiGeEC) tool is designed to efficiently perform pairwise correlations of thousands of epigenomic datasets. It supports a growing number of file formats and can compute correlations at any resolution, over custom or predefined filtered regions. For a description of epiGeEC, please refer to the following paper: https://academic.oup.com/bioinformatics/article/35/4/674/5058096
A Galaxy implementation including thousands of pre-computed public datasets is available at http://epigeec.genap.ca/galaxy/ and also supports the WIG format and Spearman correlation. It also offers tools for further annotation and analysis of the matrix files created by epiGeEC.
Precompiled wheels are available for Linux (x86_64, aarch64) and macOS (x86_64, arm64) on PyPI.
pip install epigeec
This will automatically install all required dependencies. If you encounter issues, please see the troubleshooting steps section.
If pip fails to build dependencies such as h5py, install the required system headers or use a modern Python environment (e.g. via uv or conda) so that compatible wheels are found.
The process has two or three steps: conversion to hdf5, filtering (optional) and correlation.
The signal files (bedGraph or bigWig; WIG is supported by the Galaxy implementation) first need to be converted to the hdf5 format. This requires a chromSizes file (available here or from UCSC) for the assembly used by your signal files. The chromSizes file can be truncated: for example, keeping only the canonical chromosomes works even if the bigWig contains non-canonical chromosomes. You also need to choose a resolution; we suggest 1000 or 10000 base pairs to obtain biologically interesting results.
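A truncated chromSizes file can be produced with a few lines of Python. This is a minimal sketch, not part of epiGeEC: it assumes the UCSC chrom.sizes layout (one chromosome name and length per line), and the CANONICAL_HUMAN set and both function names are hypothetical. The n_bins helper only illustrates the resolution arithmetic, assuming the signal is summarized into fixed-size bins of `resolution` bp.

```python
from typing import Iterable, Iterator

# Hypothetical "canonical" set for human assemblies such as hg19/hg38:
# chr1..chr22, chrX, chrY and chrM. Adjust it for other organisms.
CANONICAL_HUMAN = frozenset({f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"})


def truncate_chrom_sizes(lines: Iterable[str], keep: frozenset[str] = CANONICAL_HUMAN) -> Iterator[str]:
    """Yield "chrom<TAB>size" lines only for chromosomes listed in `keep`.

    Assumes the UCSC chrom.sizes layout: one chromosome per line,
    name and length separated by whitespace.
    """
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in keep:
            yield f"{fields[0]}\t{fields[1]}\n"


def n_bins(chrom_size: int, resolution: int) -> int:
    """Bins of `resolution` bp needed to cover a chromosome (the last bin may be partial)."""
    return -(-chrom_size // resolution)  # ceiling division without floats
```

For example, opening the full UCSC file as `fin` and `resource/chrom_sizes` as `fout`, `fout.writelines(truncate_chrom_sizes(fin))` writes the truncated copy. For hg19 chr1 (249,250,621 bp), `n_bins` gives 249,251 bins at 1000 bp and 24,926 bins at 10000 bp, which shows how much the resolution choice changes the amount of data per file.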
The hdf5 files can be filtered over certain regions (such as regions corresponding to genes) using your own bed files or those available here.
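Before passing your own BED file to the filter step, it can help to sanity-check it. The sketch below is a hypothetical helper, not part of epiGeEC; it relies only on the standard BED convention (0-based, half-open intervals in the first three columns) and flags chromosomes that are missing from your chromSizes file.

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def read_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Parse the first three BED columns, skipping blank, comment, track and browser lines.

    Raises ValueError on malformed lines so a bad filter file is caught early.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if not 0 <= start <= end:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        yield Region(chrom, start, end)


def unknown_chroms(regions: Iterable[Region], chrom_sizes: dict[str, int]) -> set[str]:
    """Chromosomes used in the BED regions but absent from the chromSizes mapping."""
    return {r.chrom for r in regions} - set(chrom_sizes)
```

Extra BED columns (names, scores, strands) are ignored here, since only the coordinates matter for selecting or excluding regions.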
The final step, the correlation itself, requires a list of all hdf5 files to be correlated (one file path per line) as well as the chromSizes file used to generate the hdf5 files. Files from different assemblies cannot be correlated.
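When many signal files are involved, the steps can be scripted. The sketch below is a hypothetical Python wrapper (build_commands and run are not part of epiGeEC) that only uses the to_hdf5 and correlate subcommands with the positional arguments shown in their usage lines. It assumes bigWig inputs ending in .bw and the epigeec executable on PATH, and it skips the optional filter step.

```python
import subprocess
from pathlib import Path


def build_commands(signal_dir, hdf5_dir, chrom_sizes, resolution, list_path, matrix_path):
    """Build epigeec command lines for every *.bw file in signal_dir.

    Creates hdf5_dir if needed, writes list_path (one hdf5 path per line, as the
    correlate step expects) and returns the to_hdf5 commands followed by one
    correlate command.
    """
    signal_dir, hdf5_dir = Path(signal_dir), Path(hdf5_dir)
    hdf5_dir.mkdir(parents=True, exist_ok=True)
    commands, hdf5_paths = [], []
    for bw in sorted(signal_dir.glob("*.bw")):
        out = hdf5_dir / f"{bw.stem}.hdf5"
        commands.append(["epigeec", "to_hdf5", "-bw", str(bw), str(chrom_sizes), str(resolution), str(out)])
        hdf5_paths.append(str(out))
    Path(list_path).write_text("".join(f"{p}\n" for p in hdf5_paths))
    commands.append(["epigeec", "correlate", str(list_path), str(chrom_sizes), str(matrix_path)])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (stops at the first failure)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

A call such as `run(build_commands("signal", "hdf5", "resource/chrom_sizes", 1000, "hdf5_list", "mymatrix.mat"))` only prints the commands; pass `dry_run=False` to execute them. Keeping the default as a dry run makes it easy to check paths before a long conversion.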
For more information on each parameter, use the help flag:
epigeec [tool] --help
Conversion of a signal file to the hdf5 format:
usage: epigeec to_hdf5 [-h] (-bw | -bg) signalFile chromSizes resolution outHdf5
Filter an hdf5 file (optional):
usage: epigeec filter [-h] [--select SELECT] [--exclude EXCLUDE] hdf5 chromSizes outHdf5
Generate an NxN Pearson correlation matrix:
usage: epigeec correlate [-h] [--concat] [--name] hdf5List chromSizes outMatrix
List of assemblies and filters offered in the resource folder:
- hg19
  - blklst: blacklisted regions from here
  - gene: regions corresponding to genes (from RefSeq annotation)
  - tss: transcription start sites (from RefSeq annotation)
- hg38
  - gene: regions corresponding to genes (from RefSeq annotation)
  - tss: transcription start sites (from RefSeq annotation)
- mm10
  - blklst: blacklisted regions from here
- saccer3
For example, create a directory structure to hold the data:
myfolder
├── signal
├── hdf5
├── filtered
└── resource
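The same layout can be created from Python, which is convenient when the rest of a pipeline is scripted. This is a small sketch; make_layout and SUBDIRS are hypothetical names, and re-running it is harmless because existing directories are left untouched.

```python
from pathlib import Path

# Sub-directories from the example layout above.
SUBDIRS = ("signal", "hdf5", "filtered", "resource")


def make_layout(root) -> list[str]:
    """Create root and its sub-directories (safe to re-run); return the sub-directory names."""
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling `make_layout("myfolder")` creates the four folders shown in the tree.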
Then run the tools:
epigeec to_hdf5 -bw signal/myfile.bw resource/chrom_sizes 1000 hdf5/myfile.hdf5
epigeec filter hdf5/myfile.hdf5 resource/chrom_sizes filtered/myfile.hdf5 --select resource/sel.bed --exclude resource/excl.bed
# filtered_list is a text file with one filtered hdf5 path per line
epigeec correlate filtered_list resource/chrom_sizes mymatrix.mat --name 5dts_1kb
The output is a tab-separated matrix file with your correlations:
5dts_1kb file1 file2 file3 file4 file5
file1 1.0000 0.0225 0.0579 0.0583 0.0603
file2 0.0225 1.0000 0.0625 0.0523 0.0642
file3 0.0579 0.0625 1.0000 0.7535 0.7917
file4 0.0583 0.0523 0.7535 1.0000 0.7754
file5 0.0603 0.0642 0.7917 0.7754 1.0000
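The matrix can be read back for downstream analysis. The sketch below assumes the layout of the sample above (the first header cell holds the name given with --name, here 5dts_1kb; the other header cells are column labels; each following row starts with its label). read_matrix and top_pairs are hypothetical helpers that use only the standard library.

```python
import csv
from itertools import combinations
from typing import Iterable


def read_matrix(lines: Iterable[str]) -> dict[str, dict[str, float]]:
    """Parse a tab-separated correlation matrix into {row_label: {col_label: value}}.

    The first header cell (the matrix name) is skipped; blank lines are ignored.
    """
    reader = csv.reader(lines, delimiter="\t")
    columns = next(reader)[1:]
    return {row[0]: {col: float(v) for col, v in zip(columns, row[1:])}
            for row in reader if row}


def top_pairs(matrix: dict[str, dict[str, float]], n: int = 3) -> list[tuple[str, str, float]]:
    """Return the n most correlated distinct pairs (diagonal excluded), highest first."""
    pairs = [(a, b, matrix[a][b]) for a, b in combinations(matrix, 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:n]
```

On the sample matrix, `top_pairs` returns file3/file5 (0.7917), file4/file5 (0.7754) and file3/file4 (0.7535), the cluster that stands out from file1 and file2. Since pandas is already a dependency, `pandas.read_csv("mymatrix.mat", sep="\t", index_col=0)` is an equivalent way to load the file as a DataFrame.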
Using an isolated Python environment helps avoid version conflicts:
Using venv (built-in):
python3 -m venv env
source env/bin/activate
pip install epigeec
Using uv (usually faster):
uv venv
source .venv/bin/activate
uv pip install epigeec
Using conda:
conda create -n epigeec python=3.11
conda activate epigeec
pip install epigeec
pip will usually install all dependencies automatically.
However, installation can fail on some systems if binary wheels are unavailable or incompatible (e.g., older distributions, minimal Docker images, or systems missing compilers or libraries).
Common solutions:
- Update pip, setuptools and wheel:
  pip install -U pip setuptools wheel
- Install missing system libraries (Debian/Ubuntu):
  sudo apt install python3-dev libhdf5-dev
- Manually install dependencies:
  pip install -U numpy pandas h5py
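To see which of these dependencies are actually importable in the active environment, and whether epiGeEC itself is installed, a quick check can be run with the standard library. This is a minimal sketch with hypothetical helper names; it only tests that the modules can be found and does not check versions.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def missing_modules(names=("numpy", "pandas", "h5py")) -> list[str]:
    """Return the top-level modules from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def installed_version(dist: str = "epigeec") -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None
```

Run it with the same interpreter you install into (for example inside the activated venv); an empty list from `missing_modules()` means the manual install step is not needed.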
If pip is not available on your system:
Debian/Ubuntu:
sudo apt install python3-pip
macOS (via Homebrew):
brew install python
If installation via pip fails (e.g., missing wheels or system compilers), you can build epiGeEC from source.
Prerequisites (Linux/macOS):
You may need to install development tools first:
# Debian/Ubuntu
sudo apt install build-essential python3-dev cmake libhdf5-dev
# macOS (Homebrew)
brew install cmake hdf5
git clone https://github.com/rabyj/epigeec.git
cd epigeec
python3 -m venv env
source env/bin/activate
pip install setuptools wheel
cmake .
make -j $(nproc --all) # build with all available cores (on macOS, use $(sysctl -n hw.ncpu) instead)
pip install .