Easy install #1

Open
wants to merge 233 commits into
base: main
233 commits
42b428e
Create .gitignore
Congm12 May 6, 2022
86a8365
Merge branch 'master' into main
Congm12 May 6, 2022
e4799e6
add HMM with NB and BetaBinom emission probabilities and site-specifi…
Congm12 May 18, 2022
ce49d30
Changed log v to per chromosome
MayaGupta27 Jun 7, 2022
a8c2c63
Minor change to backward algorithm
MayaGupta27 Jun 7, 2022
3f86daa
fix params initialization in Weighted_BetaBinomMixture to avoid out o…
Congm12 Jun 8, 2022
59617a1
fix bug in hmm_NB_sharedstates (based on hmmlearn) for multiple seque…
Congm12 Jun 8, 2022
40060c0
simplify the forward/backward lattice code to use kronecker product; …
Congm12 Jun 8, 2022
b4689e8
Merge branch 'main' of https://github.com/raphael-group/locality-clus…
Congm12 Jun 8, 2022
bdb8d84
add user defined initialization for NB and BetaBinom mean; to reduce …
Congm12 Jun 10, 2022
583c647
adjust the M step convergence criteria; add a heuristic to speed up M…
Congm12 Jun 17, 2022
5ea619b
multi-sample + allow fixing parameters
Congm12 Jul 15, 2022
32fbfb8
fix a variable naming bug
Congm12 Jul 15, 2022
0304e09
add phase switch utility functions
Congm12 Jul 21, 2022
a0621ed
update default parameters in pipeline_baum_welch; shrink input size b…
Congm12 Jul 21, 2022
ddfc6cf
address the 0 issue in RDR signal; update the initialization in pipel…
Congm12 Sep 28, 2022
2167cda
add initialize HMM RDR in log/original space option; add gurobi for f…
Congm12 Oct 6, 2022
141c246
a trick to speed up M step
Congm12 Oct 13, 2022
db6ef79
adding a scaling factor of genetic distance in computing phase switch…
Congm12 Oct 18, 2022
1087032
adding rdr-related scale factor for each clone to account for genome …
Congm12 Oct 18, 2022
caeee7f
trying making hmrf an easy-to-run script
Congm12 Oct 18, 2022
6b8aaa5
revert hmm_NB_BB_phaseswitch temporarily for git pull
Congm12 Oct 18, 2022
f8c748a
add back hmm code with scalefactor; fix the directory naming issue in…
Congm12 Oct 19, 2022
6e3de6f
add an example data for running HMM
Congm12 Oct 23, 2022
873e2f1
gaussian HMM + phase switch for hatchet2
Congm12 Oct 27, 2022
e35c902
update utility functions
Congm12 Nov 29, 2022
b067685
a full pipeline for BAF-only clones followed by using both RDR and BA…
Congm12 Jan 22, 2023
af6053f
filter DE genes, better integer copy number inference; leads to norma…
Congm12 Feb 9, 2023
5ef8887
Fixed minor error in initialization_by_gmm method
MayaGupta27 Apr 10, 2023
2148326
first phase and bin, and then run HMM without phase switch (variable …
Congm12 Apr 11, 2023
8d5c176
update to the latest hmm phase switch code
Congm12 Apr 11, 2023
8253d9f
Merge branch 'main' of https://github.com/raphael-group/locality-clus…
Congm12 Apr 11, 2023
1f8f962
parse input and output tables according to google-doc specification
Congm12 Jun 23, 2023
0953204
add a check for nan tumor purity
Congm12 Jun 23, 2023
b55c360
add an updated version of tumor UMI proportion in the mixture model; …
Congm12 Jun 23, 2023
9bf0c3b
clean up hmrf utility
Congm12 Jun 23, 2023
d2c1af9
add reordering between tissue position list, adata, and cellsnp-lite;…
Congm12 Jun 23, 2023
2fdaba4
change when to draw arrows for mirrored events
Congm12 Jun 23, 2023
db4616d
make the number of clones inferred using both BAF+RDR as a parameter
Congm12 Jun 23, 2023
5f17d24
change integer copy number inference to squared distances; add pipeli…
Congm12 Jun 23, 2023
b6aedc3
add supervised clone version; fix related bugs
Congm12 Jun 26, 2023
21fea42
add adjacency matrix construction using KNN and data parsing for slid…
Congm12 Jun 26, 2023
4b1f55c
remove an unused import dependency
Congm12 Jun 28, 2023
c50c006
update the supervised run as making pseudospot + run entire inference
Congm12 Jun 28, 2023
b025aa3
update supervised calicost and add min UMI threshold filtering in clo…
Congm12 Jul 4, 2023
4013bc8
update argparsing for joint samples
Congm12 Jul 4, 2023
92edd0e
fix the error in merging spots by min_spots or min_umis; change the f…
Congm12 Jul 4, 2023
c68937c
splitting clone spatial plot for supervised version into samples
Congm12 Jul 5, 2023
78dbdd9
addressing the NAN error when spaceranger outputs of different slices…
Congm12 Jul 6, 2023
79bd32e
add hatchet plotting and evaluation accuracy utility; fix bug related…
Congm12 Jul 11, 2023
c677109
update the variable for thresholding minimum SNP-covering umi per bin…
Congm12 Jul 16, 2023
d585720
add function to summarize to events
Congm12 Jul 16, 2023
65e1d84
update converting acn to state function in evaluation
Congm12 Jul 16, 2023
b65a85e
update post-thresholding of key events and highlighting in plots
Congm12 Jul 17, 2023
dc0a880
add plotting in joint, fix n_clone_rdr bug in processing
Congm12 Jul 26, 2023
8428c01
fix ties in merge clones by min_spots
Congm12 Jul 26, 2023
524399e
fix an edge case of merging bins across chrs
Congm12 Jul 31, 2023
569e256
add transition probability parameter for phasing
Congm12 Aug 3, 2023
d3ee6cf
add HMM transition for phasing in the input parsing step
Congm12 Aug 3, 2023
56f1c6d
update the integer copy number searching without ploidy constraint to…
Congm12 Aug 10, 2023
b8610e6
force at least 100 normal spots are used to remove bins potentially w…
Congm12 Aug 13, 2023
73a20e7
test fixing mean and betabinom dispersion in the test for bin-removal…
Congm12 Aug 13, 2023
9e7868d
add shared arg parse
Congm12 Aug 15, 2023
10b524b
fixing a bug introduced in the previous version for log phase transit…
Congm12 Aug 15, 2023
70ffdd6
removing testing for ploidy
Congm12 Aug 15, 2023
ef8f35c
import ploting utilities; multiple random initialization for initial …
Congm12 Aug 16, 2023
e064ecc
add integer copy number approach by scale factor from diploid state; …
Congm12 Aug 20, 2023
f33bdff
fix bug in arg parse and plotting functions
Congm12 Aug 27, 2023
e245a30
add accuracy evaluation for numbat, infercnv, and starch in utils
Congm12 Aug 27, 2023
f88af1b
more aggresive filter DE genes; enforce the same genomic bins to have…
Congm12 Aug 27, 2023
e7dc0b3
fix the error in merge pseudobulk about format of tumor proportion; u…
Congm12 Oct 9, 2023
f67f2a3
fix errors in edges cases; add more plotting functions
Congm12 Oct 11, 2023
46427e9
plotting 1d2d plots from dataframes
Congm12 Oct 12, 2023
9ffb83b
distribute transcript count of each gene to the overlapping bins prop…
Congm12 Oct 18, 2023
a9afa37
constrain to gene boundaries for blocking and binning
Congm12 Oct 21, 2023
dcfe7d9
fix bug in parse input; change the plotting of mirrored events to tri…
Congm12 Oct 21, 2023
cdd5c65
enable clone proportion info in plotting
Congm12 Oct 21, 2023
8802fba
fix bug in greedy binning; remove unused arguments in arg parse
Congm12 Oct 23, 2023
19cc0f6
change the schematic of mirrored event to triangles
Congm12 Oct 24, 2023
938b7ac
add purity threshold option in evaluation using hatchet functions
Congm12 Oct 24, 2023
ffd861c
wes comparison: add precision and recall of predicting aberrations bins
Congm12 Oct 25, 2023
0fde835
update expand_cnv function
Congm12 Oct 25, 2023
5331d32
remove nan of the input data frame in plotting function
Congm12 Oct 26, 2023
7db5f74
include more integer to state conversions in plot_hatchet
Congm12 Oct 28, 2023
54f46dd
update utils to convert variable length bin to even bin for plotting
Congm12 Oct 30, 2023
b3ca9f9
fix an edge case in converting variable length bin to even bin for pl…
Congm12 Oct 30, 2023
52959ce
merge from devrdr
Congm12 Nov 1, 2023
61fcc31
Create LICENSE
balabanmetin Nov 1, 2023
2fe341b
turn genetic map an input panel; update readme
Congm12 Nov 2, 2023
c2522ef
Merge branch 'main' of github.com:raphael-group/CalicoST
Congm12 Nov 2, 2023
c8ff74b
fix a bug related to genetic map file
Congm12 Nov 2, 2023
9595aaf
clean the rdr parsing functions
Congm12 Nov 7, 2023
af49274
add snakemake for snp parsing; update readme
Congm12 Nov 7, 2023
daebfd7
update readme
Congm12 Nov 7, 2023
d2cb832
organize directory structure; test running calicost in snakemake
Congm12 Nov 8, 2023
090266a
finished snakemake pipeline
Congm12 Nov 13, 2023
fff7fe4
update readme
Congm12 Nov 13, 2023
13a63e1
add example to run on a simulated data
Congm12 Nov 13, 2023
8df90fd
update google drive link and partial output spec
Congm12 Nov 13, 2023
37b2614
add more output spec
Congm12 Nov 13, 2023
6809aae
clean up a few unused files
Congm12 Nov 13, 2023
137a31b
remove prefix in conda environment
Congm12 Nov 13, 2023
7bdac2e
update readme on the test run of simulation
Congm12 Nov 13, 2023
4f0fe12
update readme and setup for dependencies
Congm12 Nov 14, 2023
64e1a55
add system requirements and runtime on example data
Congm12 Nov 14, 2023
aab0b07
remove redundant sorting in merge_bamfile
Congm12 Nov 14, 2023
0a47446
change the directory structure and allow user-input snp directory
Congm12 Nov 14, 2023
312af50
update parameter specification
Congm12 Nov 16, 2023
9c217b8
update default parameter for phasing
Congm12 Nov 29, 2023
0d2dd19
add functions for normal identification and tumor proportion estimation
Congm12 Nov 29, 2023
6b993c6
update tumor proportion estimation function
Congm12 Nov 30, 2023
31daf8e
add inferring tumor proportion module (identify LOH states, estimate …
Congm12 Dec 6, 2023
06e649c
fix bug in loh states detection
Congm12 Dec 7, 2023
4641086
update utils/plot_hatchet import step using calicost installation
Congm12 Dec 7, 2023
8291613
estimate tumor proportion using LOH
Congm12 Dec 30, 2023
7603f20
update the high-purity clone and loh state selection threshold, now u…
Congm12 Jan 20, 2024
ebebf41
add smoothing in tumor proportion estimatino
Congm12 Jan 23, 2024
d0ad5d0
add BAF-only plotting function; fix edge case of exiting
Congm12 Jan 23, 2024
b515bf5
include a user-parameter that was omitted before
Congm12 Jan 23, 2024
43b1a48
minor adjustment on plotting and input parsing
Congm12 Feb 5, 2024
376bbdb
remove redundant code in estimate tumor proportion
Congm12 Feb 15, 2024
6fa1512
make the chr id non-overlapping in copy number plot
Congm12 Feb 15, 2024
cda21a7
add a general code for diffusion model
Congm12 Feb 17, 2024
e737fe3
add a notebook for prostate data
Congm12 Feb 17, 2024
3693c8b
test iframe to show pdf
Congm12 Feb 17, 2024
6d240f3
try another visualization of pdf in notebook; update google drive lin…
Congm12 Feb 17, 2024
1a386b4
update setup to include ete3
Congm12 Feb 18, 2024
f195c3e
readthedocs files
Congm12 Feb 18, 2024
fb3da35
update conf
Congm12 Feb 18, 2024
c7ab19c
version of readthedocs.yaml
Congm12 Feb 18, 2024
b635397
merge change
Congm12 Feb 19, 2024
94db407
update docs
Congm12 Feb 19, 2024
be9be3b
add random initialization selection
Congm12 Feb 20, 2024
bc62fcd
add simulated tutorial
Congm12 Feb 20, 2024
5b2e916
add tutorial to readthedocs
Congm12 Feb 20, 2024
4d2ad0c
move default parsing params to input configurations
Congm12 Feb 26, 2024
c7f1e6c
add example data to github
Congm12 Feb 26, 2024
bd5e8c4
update readme paths for example data
Congm12 Mar 3, 2024
da36c56
fix an edge case
Congm12 Mar 15, 2024
a959923
fixx
Apr 3, 2024
ab77c40
edits
Apr 3, 2024
6f18eb4
update README
Apr 3, 2024
384b0eb
fix README
Apr 3, 2024
82986e7
fix
Apr 3, 2024
409ecc2
fix
Apr 4, 2024
47cb719
remove dependency definition from setup.py; in favour of conda.
Apr 4, 2024
0a611fd
fix
Apr 4, 2024
93a9075
slim parse input
Apr 4, 2024
02c72a0
fix
Apr 4, 2024
2f46ae1
fix
Apr 4, 2024
96d4189
facilitate script
Apr 4, 2024
4c40783
fix
Apr 4, 2024
a4424c9
mv utils
Apr 4, 2024
129b7d3
mv bash scripts
Apr 4, 2024
6a841fb
fix
Apr 4, 2024
4ae0095
fix
Apr 4, 2024
a6cc348
fix
Apr 4, 2024
bd0ed7c
fix
Apr 4, 2024
f7fd78e
update gitignore for build
Apr 4, 2024
c5c248d
fix
Apr 4, 2024
ccea02a
fix
Apr 5, 2024
6597c43
fix
Apr 5, 2024
899f152
fix
Apr 5, 2024
3ab7043
fix
Apr 5, 2024
ab35209
fix
Apr 5, 2024
ca46cb6
fix
Apr 5, 2024
79d417c
fix
Apr 5, 2024
6c26f13
fix
Apr 5, 2024
6a260a9
fix
Apr 5, 2024
3e8118b
fix
Apr 5, 2024
495b12b
fix
Apr 5, 2024
4f259d9
fix
Apr 5, 2024
62c62e1
fix
Apr 5, 2024
c034c1c
fix
Apr 5, 2024
1104cc1
fix
Apr 9, 2024
86e9b96
fix
Apr 9, 2024
7d032c2
fix
Apr 9, 2024
bd159a8
fix
Apr 9, 2024
30463c6
fix
Apr 9, 2024
a4a4fb7
fix
Apr 9, 2024
152321e
fix
Apr 9, 2024
8cc3fcd
fix
Apr 9, 2024
32ef6e0
fix
Apr 9, 2024
649230d
fix
Apr 9, 2024
188bd64
fix
Apr 9, 2024
b49185f
fix
Apr 9, 2024
6ea4780
fix
Apr 10, 2024
2bc074d
eagle and cellsnp-lite containers
Apr 10, 2024
1ff0da0
fix
Apr 10, 2024
5f180ee
fix
Apr 10, 2024
9ccf9bb
fix
Apr 10, 2024
b624cc5
fix
Apr 10, 2024
7742651
fix
Apr 10, 2024
904017d
fix
Apr 10, 2024
4e37e9d
add tests
Apr 10, 2024
dabbc55
fix
Apr 10, 2024
c65f51b
fix
Apr 10, 2024
854a4dc
fix
Apr 10, 2024
b584612
fix
Apr 10, 2024
3cb34bc
fix
Apr 10, 2024
d79a029
fix
Apr 10, 2024
4b5b418
fix
Apr 10, 2024
c0a47f6
fix
Apr 10, 2024
8b64936
fix
Apr 10, 2024
a0a94f1
fix
Apr 10, 2024
4e78c14
fix
Apr 10, 2024
a3943e4
fix
Apr 10, 2024
3404505
fix
Apr 10, 2024
15dabc6
fix
Apr 10, 2024
0022a5e
fix
Apr 10, 2024
d6cf14b
fix
Apr 10, 2024
b817abb
fix
Apr 10, 2024
81c312c
fix
Apr 10, 2024
7cc12a6
fix
Apr 10, 2024
4eefe90
fix
Apr 10, 2024
b913ef6
fix
Apr 11, 2024
0aa0787
fix
Apr 11, 2024
43947b1
fix
Apr 11, 2024
57ef533
fix
Apr 11, 2024
da950f5
fix
Apr 11, 2024
44b28fb
fix
Apr 11, 2024
180d538
fix
Apr 11, 2024
9434c7c
fix
Apr 11, 2024
afdb697
fix
Apr 11, 2024
2b2587a
fix
Apr 11, 2024
3799659
fix
Apr 11, 2024
792a935
fix
Apr 11, 2024
a8f42cb
fix
Apr 11, 2024
94fded9
fix
Apr 11, 2024
892d6ca
fix
Apr 11, 2024
85cc093
fix
Apr 12, 2024
608242b
fix
Apr 12, 2024
6454851
fix
Apr 18, 2024
138 changes: 138 additions & 0 deletions .gitignore
@@ -0,0 +1,138 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Apptainer build containers
*.sif

#
build/
containers/deprecated/

.snakemake/
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.10"

sphinx:
  builder: html
  configuration: docs/conf.py
  fail_on_warning: false

python:
  install:
    - method: pip
      path: .
      extra_requirements: [docs]

submodules:
  include: [docs/notebooks]
  recursive: true
28 changes: 28 additions & 0 deletions LICENSE
@@ -0,0 +1,28 @@
BSD 3-Clause License

Copyright (c) 2023, Princeton University

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 changes: 26 additions & 0 deletions NOTES.md
@@ -0,0 +1,26 @@
### Questions
- Why is it necessary to download the SNP panel and phasing panel for an example that provides the reduced matrices?
- wisdom of naming random realizations calicost/clone* etc.?
- in config, phasing_panel -> phasing_panel_dir
- wildcards: outputdir=calicost? NB prevents target rules.
- GRCh38_resources: what are these and where are they found for other references?
- genetic_map_GRCh38_merged.tab: tabix index?
- deprecate bin/* bash scripts. These aren't used.
- what resolves outputdir variable?
- startle snakemake rule?
- sandbox deprecation

### Warnings
- convert_params defined multiple times [hmm_NB_sharedstates.py, utils_distribution_fitting.py]
- compute_posterior_transition_sitewise defined multiple times [hmm_gaussian.py, utils_hmm.py]
- compute_posterior_obs [hmm_gaussian.py, utils_hmm.py]
- update_startprob_sitewise [hmm_gaussian.py, utils_hmm.py]
- np_sum_ax_squeeze [hmm_gaussian.py, utils_hmm.py]
- mylogsumexp [hmm_gaussian.py, utils_hmm.py]
- update_transition_sitewise [hmm_gaussian.py, utils_hmm.py]
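
A check like the one behind these warnings can be sketched with the standard-library `ast` module. This is only an illustration: the helper name `find_duplicate_functions` and the toy module contents are hypothetical, and the real files are not reproduced here.

```python
import ast
from collections import defaultdict


def find_duplicate_functions(sources):
    """Return {function name: sorted file names} for top-level functions
    that are defined in more than one file.

    `sources` maps a file name to its Python source text.
    """
    defined_in = defaultdict(set)
    for filename, text in sources.items():
        tree = ast.parse(text, filename=filename)
        for node in tree.body:  # top-level statements only
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined_in[node.name].add(filename)
    return {
        name: sorted(files)
        for name, files in defined_in.items()
        if len(files) > 1
    }


# Toy stand-ins for two modules (hypothetical contents, for illustration only).
example_sources = {
    "hmm_gaussian.py": "def mylogsumexp(a):\n    return a\n\ndef only_here():\n    pass\n",
    "utils_hmm.py": "def mylogsumexp(a):\n    return a\n",
}
duplicates = find_duplicate_functions(example_sources)
print(duplicates)
```

To scan a real checkout, the `sources` mapping could be built by reading every `*.py` file under the package directory.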


### NOTES
- How to pin conda: http://damianavila.github.io/blog/posts/how-to-pin-conda.html
- Apple silicon installs can be facilitated with Rosetta emulation of the x86 instruction set, see e.g. [here](https://taylorreiter.github.io/2022-04-05-Managing-multiple-architecture-specific-installations-of-conda-on-apple-M1/) - note, brew install iterm2 as duplication of the terminal app. is no longer supported.
- poetry config virtualenvs.prefer-active-python false
161 changes: 159 additions & 2 deletions README.md
@@ -1,2 +1,159 @@
# Locality clustering for CNV inference
A repo for the locality clustering (HMM) module in CNV calling across various data modalities.
# CalicoST

<p align="center">
<img src="https://github.com/raphael-group/CalicoST/blob/main/docs/_static/img/overview4_combine.png?raw=true" width="100%" height="auto"/>
</p>

[CalicoST](https://www.biorxiv.org/content/10.1101/2024.03.09.584244v1) is a probabilistic model that infers allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics. CalicoST has the following key features:
1. Identifies allele-specific integer copy numbers for each transcribed region, revealing events such as Copy Neutral Loss of Heterozygosity (CNLOH) and mirrored subclonal CNAs that are invisible to total copy number analysis.
2. Assigns each spot a clone label indicating whether the spot is primarily normal cells or a cancer clone with an aberrant copy number profile.
3. Infers a phylogeny relating the identified cancer clones as well as a phylogeography that combines genetic evolution and spatial dissemination of clones.
4. Handles normal cell admixture in SRT technologies that are not single-cell resolution (e.g. 10x Genomics Visium) to ensure more accurate allele-specific copy numbers and cancer clones.
5. Simultaneously analyzes multiple regions or aligned SRT slices from the same tumor.

# System requirements
The package has been tested on the following Linux operating systems: SpringdaleOpenEnterprise 9.2 (Parma) and CentOS Linux 7 (Core).

# Installation
First, set up a conda environment from the `environment.yml` file:
```
cd CalicoST

conda env create -f environment.yml --name calicost_env

conda activate calicost_env
```
Next, download [Eagle2](https://alkesgroup.broadinstitute.org/Eagle/) with
```
wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
tar -xzf Eagle_v2.4.1.tar.gz
```
Next, we need to install [Startle](https://github.com/raphael-group/startle). Its dependencies
include [LEMON](https://lemon.cs.elte.hu/trac/lemon/wiki/InstallLinux), [CPLEX](https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer),
perl and python3. We do so with conda
```
conda install -c schmidt73 startle
```
or by building from source,
```
git clone --recurse-submodules https://github.com/raphael-group/startle.git

cd startle

mkdir build; cd build

cmake -DLIBLEMON_ROOT=<lemon path> \
      -DCPLEX_INC_DIR=<cplex include path> \
      -DCPLEX_LIB_DIR=<cplex lib path> \
      -DCONCERT_INC_DIR=<concert include path> \
      -DCONCERT_LIB_DIR=<concert lib path> \
      ..
make
```
Note this will install a copy of cellsnp-lite to the environment directory, which must be updated
in the config.yaml, i.e. with the new path to the installed cellsnp-lite.

Finally, install CalicoST using pip in the root directory with
```
pip install --no-deps -e .
```
Setting up the conda environments takes around 10 minutes on an HPC head node. Make sure to use the
[mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html) backend to ensure
fast builds.

# Getting started
CalicoST requires annotations for genes and SNPs; those for the GRCh38 reference are available in the [example tarballs](https://github.com/raphael-group/CalicoST/tree/main/examples). Specify the information file paths, your input SRT data paths, and running configurations in `config.yaml`, and then run CalicoST with
```
snakemake --cores <number cores> --configfile config.yaml --snakefile calicost.smk all [--use-singularity] [--use-conda]
```

Check out our [readthedocs](https://calicost.readthedocs.io/en/latest/) for tutorials on the simulated data and prostate cancer data.

# Run on simulated example data
### Download data
The simulated count matrices are available from [`examples/CalicoST_example.tar.gz`](https://github.com/raphael-group/CalicoST/blob/main/examples/CalicoST_example.tar.gz).
CalicoST requires a reference SNP panel and a phasing panel, which can be downloaded directly with wget from the links below:
* [SNP panel](https://downloads.sourceforge.net/project/cellsnp/SNPlist/genome1K.phase3.SNP_AF5e4.chr1toX.hg38.vcf.gz?ts=gAAAAABmDbjZ1jaoDHw8fbmTQcP1y_WA9KfnTJH3aLrm0O7S4voV89YyU55O3jJdtO163_SpSBquChmB7dIl4dZ7pB-64L-W8A%3D%3D) - 0.5GB in size.
* [Phasing panel](http://pklab.med.harvard.edu/teng/data/1000G_hg38.zip) - 9.0GB in size.

Other SNP panels are available on the [cellsnp-lite webpage](https://cellsnp-lite.readthedocs.io/en/latest/main/data.html).

### Run CalicoST
Extract the downloaded example data and replace the following paths in the provided `example_config.yaml` with those on your machine:
* region_vcf: the path to the downloaded SNP panel vcf.
* phasing_panel: the path to the downloaded and unzipped phasing panel directory.
* spaceranger_dir: the path to Visium Space Ranger output directory.

To avoid falling into local maxima of CalicoST's optimization objective, we recommend running CalicoST with multiple random initializations, which are specified by the `random_state` variable in each config. The provided configs use five random initializations; fewer can be used for testing.

Finally, run CalicoST with
```
snakemake --cores <number cores> --configfile example_config.yaml --snakefile <calicost_dir>/calicost.smk all
```
CalicoST takes just over an hour to complete this example with 5 cores.

### Understanding the results
The above snakemake run will create a folder `calicost` in the directory of the downloaded example data. Within this folder, each random initialization of CalicoST generates a subdirectory matching `calicost/clone*`.

CalicoST generates the following key files for each random initialization:
* clone_labels.tsv: The inferred clone labels for each spot.
* cnv_seglevel.tsv: Allele-specific copy numbers for each clone for each genome segment.
* cnv_genelevel.tsv: The projected allele-specific copy numbers from genome segments to the covered genes.
* cnv_*_seglevel.tsv and cnv_*_genelevel.tsv: Allele-specific copy numbers when enforcing a ploidy of {diploid, triploid, tetraploid} for each genome segment & each gene.

See the following examples of these key files.
```
head -n 10 calicost/clone3_rectangle0_w1.0/clone_labels.tsv
BARCODES clone_label
spot_0 2
spot_1 2
spot_2 2
spot_3 2
spot_4 2
spot_5 2
spot_6 2
spot_7 2
spot_8 0
```
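
As a quick sanity check on `clone_labels.tsv`, the number of spots per clone can be tallied with the Python standard library. This is a minimal sketch that assumes the file is tab-separated with the `BARCODES` and `clone_label` columns shown above; the helper name `count_spots_per_clone` is hypothetical and not part of CalicoST.

```python
import csv
import io
from collections import Counter


def count_spots_per_clone(handle):
    """Count spots for each clone label in a clone_labels.tsv-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["clone_label"] for row in reader)


# The rows printed above, rebuilt in memory (assuming tab-separated columns).
sample = (
    "BARCODES\tclone_label\n"
    + "".join(f"spot_{i}\t2\n" for i in range(8))
    + "spot_8\t0\n"
)
counts = count_spots_per_clone(io.StringIO(sample))
print(counts)
```

For a real run, pass an open file handle instead of the in-memory sample, e.g. `open("calicost/clone3_rectangle0_w1.0/clone_labels.tsv", newline="")`. Labels are kept as strings here, so `counts["2"]` is the number of spots assigned to clone 2.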

```
head -n 10 calicost/clone3_rectangle0_w1.0/cnv_seglevel.tsv
CHR START END clone0 A clone0 B clone1 A clone1 B clone2 A clone2 B
1 1001138 1616548 1 1 1 1 1 1
1 1635227 2384877 1 1 1 1 1 1
1 2391775 6101016 1 1 1 1 1 1
1 6185020 6653223 1 1 1 1 1 1
1 6785454 7780639 1 1 1 1 1 1
1 7784320 8020748 1 1 1 1 1 1
1 8026738 9271273 1 1 1 1 1 1
1 9292894 10375267 1 1 1 1 1 1
1 10398592 11922488 1 1 1 1 1 1
```
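
To list only the segments where a clone departs from the normal (1, 1) state, the table can be scanned as sketched below. The sketch assumes tab-separated columns named `<clone> A` and `<clone> B` as in the header above; the helper name `aberrant_segments` and the second sample row are hypothetical and only serve as illustration.

```python
import csv
import io


def aberrant_segments(handle):
    """Yield (chr, start, end, clone, a, b) for every segment and clone
    whose allele-specific copy numbers differ from (1, 1)."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Clone names are recovered from the "<clone> A" column headers.
    clones = sorted({name[:-2] for name in reader.fieldnames if name.endswith(" A")})
    for row in reader:
        for clone in clones:
            a, b = int(row[f"{clone} A"]), int(row[f"{clone} B"])
            if (a, b) != (1, 1):
                yield row["CHR"], int(row["START"]), int(row["END"]), clone, a, b


header = ["CHR", "START", "END", "clone0 A", "clone0 B",
          "clone1 A", "clone1 B", "clone2 A", "clone2 B"]
rows = [
    # First row of the example output above.
    ["1", "1001138", "1616548", "1", "1", "1", "1", "1", "1"],
    # Hypothetical row (not from the example output) with two altered clones.
    ["2", "100", "200", "1", "1", "1", "0", "2", "1"],
]
text = "\n".join("\t".join(r) for r in [header] + rows) + "\n"
segments = list(aberrant_segments(io.StringIO(text)))
print(segments)
```

The same function can be applied to the ploidy-constrained `cnv_*_seglevel.tsv` files, provided they share this column layout.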

```
head -n 10 calicost/clone3_rectangle0_w1.0/cnv_genelevel.tsv
gene clone0 A clone0 B clone1 A clone1 B clone2 A clone2 B
A1BG 1 1 1 1 1 1
A1CF 1 1 1 1 1 1
A2M 1 1 1 1 1 1
A2ML1-AS1 1 1 1 1 1 1
AACS 1 1 1 1 1 1
AADAC 1 1 1 1 1 1
AADACL2-AS1 1 1 1 1 1 1
AAK1 1 1 1 1 1 1
AAMP 1 1 1 1 1 1
```

CalicoST creates the following plots to visualize the spatial distribution of the inferred cancer clones and allele-specific copy number profiles for each realization.
* ```plots/clone_spatial.pdf```: The spatial distribution of inferred cancer clones and normal regions - grey color, clone 0 by default.
* ```plots/rdr_baf_defaultcolor.pdf```: The Read Depth Ratio (RDR) and B-Allele Frequency (BAF) along the genome for each clone. Higher RDR indicates higher total copy numbers and BAF deviations from 0.5 indicate allelic imbalance due to CNAs.
* ```plots/acn_genome.pdf```: The default allele-specific copy numbers along the genome.
* ```plots/acn_genome_*.pdf```: Allele-specific copy numbers when enforcing a ploidy of {diploid, triploid, tetraploid}.
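
The relation between allele-specific copy numbers and the RDR and BAF signals can be made concrete with a small calculation. The sketch below is a simplification, not CalicoST's model: it assumes a pure clone (no normal cell admixture), a diploid baseline, and the convention BAF = B / (A + B). The function name `expected_signals` is hypothetical.

```python
from fractions import Fraction


def expected_signals(a, b):
    """Return (total copy number relative to diploid, B-allele fraction)
    for allele-specific copy numbers (a, b) in a pure clone."""
    total = a + b
    if total == 0:
        raise ValueError("BAF is undefined when both alleles are lost")
    return Fraction(total, 2), Fraction(b, total)


# Normal, CNLOH, and a pair of mirrored events.
for a, b in [(1, 1), (2, 0), (2, 1), (1, 2)]:
    rel_total, baf = expected_signals(a, b)
    print(f"A={a} B={b}: relative total={float(rel_total):.2f}, BAF={float(baf):.2f}")
```

Under these assumptions, CNLOH (2, 0) leaves the total copy number unchanged but shifts the BAF away from 0.5, and the mirrored events (2, 1) and (1, 2) share the same total while their BAFs fall on opposite sides of 0.5. This is why such events are invisible to total copy number analysis.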

The allele-specific copy number plots have the following color legend.
<p align="left">
<img src="https://github.com/raphael-group/CalicoST/blob/main/docs/_static/img/acn_color_palette.png?raw=true" width="20%" height="auto"/>
</p>

# Software dependencies
TODO