This benchmarking study evaluates the performance of HiCARN and Capricorn using the same input dataset: primary interchromosomal GM12878 Hi-C data.
We followed the official HiCARN tutorial with minor modifications to allow chromosome-by-chromosome processing, specifically during the read extraction and downsampling steps.
- From our downsampled matrices, we generated input datasets for the GM12878 cell line using default parameters.
- These datasets consist of low-resolution matrices and are stored at: ./HiCARN/data/
- We used the pre-trained HiCARN1 and HiCARN2 models (16× downsampling) provided by the authors: ./HiCARN/Pretrained_Weights/
- No training or validation was performed.
- We ran inference using 40×40 patch-based prediction mode, applying each model to its corresponding dataset.
- Example: For the 16× downsampled matrices, we used hicarn_1_16ds.pytorch or hicarn_2_16ds.pytorch.
- Final predicted matrices are stored in: ./HiCARN/predict/
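To make the 40×40 patch-based mode concrete, the following is a minimal sketch of how a contact matrix can be tiled into 40×40 blocks and reassembled after prediction. It is an illustration only and not HiCARN's own code: the function names `split_into_patches` and `merge_patches` are hypothetical, and non-overlapping tiles with zero padding at the edges are assumptions (the actual pipeline may use a different stride or distance bound).

```python
import numpy as np


def split_into_patches(matrix, patch=40):
    """Tile a square matrix into non-overlapping patch x patch blocks.

    Assumption: edges are zero-padded so the size becomes a multiple of `patch`.
    """
    n = matrix.shape[0]
    n_pad = -n % patch  # number of rows/columns needed to reach a multiple of `patch`
    padded = np.pad(matrix, ((0, n_pad), (0, n_pad)))
    patches, coords = [], []
    for i in range(0, padded.shape[0], patch):
        for j in range(0, padded.shape[1], patch):
            patches.append(padded[i:i + patch, j:j + patch])
            coords.append((i, j))  # top-left corner of each tile
    return np.stack(patches), coords


def merge_patches(patches, coords, n, patch=40):
    """Place (predicted) tiles back at their coordinates and crop to n x n."""
    size = n + (-n % patch)
    out = np.zeros((size, size), dtype=patches.dtype)
    for tile, (i, j) in zip(patches, coords):
        out[i:i + patch, j:j + patch] = tile
    return out[:n, :n]
```

In such a scheme, a model would be applied to each tile between the split and merge steps; splitting and merging without any model in between returns the original matrix.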
For Capricorn, we also followed a chromosome-by-chromosome approach, as outlined in its GitHub repository.
- Input data was preprocessed using default parameters (10 kb resolution and 16× downsampling ratio).
- Downsampled data was transformed using the recommended multichannel strategy: HiC OE, TAD, Lp, and Lr.
- Test cross-chromosome datasets were then generated as input for prediction: ./Capricorn/data/
- As with HiCARN, no training or validation was done.
- Finally, for the prediction step, we used the available pre-trained weights (best_loss.pt): ./Capricorn/checkpoints/
- Final predicted matrices from the previous step are stored in: ./Capricorn/predict/
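As a rough illustration of two preprocessing ideas named in the steps above, the sketch below shows one common way to simulate a 16× downsampling ratio (binomial thinning of contact counts) and one common definition of an observed/expected (O/E) transform (dividing each diagonal by its mean). Both functions are hypothetical helpers written for this note, not Capricorn's implementation; the thinning approach, the per-diagonal expected value, and the assumption of a symmetric integer count matrix are ours.

```python
import numpy as np


def downsample_counts(matrix, ratio=16, seed=0):
    """Keep each contact with probability 1/ratio (binomial thinning).

    Assumption: `matrix` is a symmetric matrix of non-negative integer counts.
    """
    rng = np.random.default_rng(seed)
    upper = np.triu(matrix).astype(np.int64)  # thin the upper triangle only
    thinned = rng.binomial(upper, 1.0 / ratio)
    return thinned + np.triu(thinned, k=1).T  # mirror to restore symmetry


def observed_over_expected(matrix):
    """Divide each diagonal by its mean, a simple distance-based expected value."""
    n = matrix.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        diag = np.diagonal(matrix, offset=d)
        expected = diag.mean()
        if expected > 0:  # leave empty diagonals at zero
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected
    return oe
```

Under these assumptions, a matrix whose values depend only on genomic distance maps to an O/E matrix of ones, which is the sense in which the transform removes the distance-decay trend.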
For both methods, we recovered the reconstructed matrices for each of our tested regions from the corresponding prediction outputs using helper functions defined in ./src/utils.py
Please refer to the utils.py file for details on region extraction and matrix formatting:
- capricorn_get_region
- hicarn_get_region
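For readers unfamiliar with this kind of region extraction, here is a minimal sketch of the general idea: convert genomic coordinates to bin indices and slice the predicted matrix. The function `get_region` below is hypothetical and is not the code in ./src/utils.py; the 10 kb default bin size and the assumption that the matrix covers a whole chromosome starting at position 0 are ours.

```python
import numpy as np


def get_region(matrix, start_bp, end_bp, resolution=10_000):
    """Return the square sub-matrix covering [start_bp, end_bp).

    Assumption: bin k of `matrix` spans [k * resolution, (k + 1) * resolution).
    """
    first = start_bp // resolution      # bin containing the region start
    last = -(-end_bp // resolution)     # ceiling division: first bin past the region end
    if not 0 <= first < last <= matrix.shape[0]:
        raise ValueError("region falls outside the matrix")
    return matrix[first:last, first:last]
```

With a 10 kb bin size, a region from 250,000 bp to 300,000 bp corresponds to bins 25 through 29, so the result is a 5×5 block.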