@ChristinaXu2017 commented Oct 28, 2025

This pull request changes how Variant Spark is installed:

  1. Install Hail with pip install hail==0.2.74
  • The Variant Spark Scala package does not require Hail as a dependency; e.g. examples/command-line/local_run-importance-ch22.sh runs without Hail installed.
  • The package au.csiro.aehrc.third.hail-is was a copy of Hail 0.2.74, according to dev/misc-deploy-hail-to-maven.sh.
  • The only Scala class that requires Hail is src/main/scala/au/csiro/variantspark/hail/methods/RFModel.scala, and it is only called through Python code: "import varspark.hail as vshl; vshl.init() ..."
  • pip install hail==0.2.74 installs Hail to /usr/local/lib/python3.7/dist-packages/hail/, which is added to the Java path during vshl.init().
  2. Install Variant Spark with pip install variant-spark
  3. Create a Dockerfile that installs JDK 8, Python 3.7, spark-3.1.2-bin-hadoop3.2, hail==0.2.74 and pyspark==3.1.2.
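
To check where pip actually placed Hail, and which jar files are available there for vshl.init() to pick up, something like the following may help. It is only a sketch: the helper name find_package_jars is hypothetical, it does not reproduce what vshl.init() does internally, and it simply lists every .jar under the installed package directory rather than assuming a particular jar name or layout.

```python
import importlib.util
from pathlib import Path
from typing import List


def find_package_jars(package_name: str) -> List[Path]:
    """Return every .jar file found under an installed Python package.

    Hypothetical helper for inspection only; it locates the package
    without importing it and returns an empty list if it is missing.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    # Plain modules (not packages) have no directory to search.
    if spec is None or not spec.submodule_search_locations:
        return []
    jars: List[Path] = []
    for location in spec.submodule_search_locations:
        jars.extend(sorted(Path(location).rglob("*.jar")))
    return jars


if __name__ == "__main__":
    # Prints nothing if Hail is not installed in this environment.
    for jar in find_package_jars("hail"):
        print(jar)
```

Running this inside the Docker image after pip install hail==0.2.74 should show whether the jars live under the dist-packages path mentioned above.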

??? Troubleshooting: the step RUN wget -q https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz is very slow. On 28 Oct I tried the same wget on my local PC, and it took nearly one hour to download the file (218M). Maybe archive.apache.org was too busy at that time.

@ChristinaXu2017 marked this pull request as draft October 28, 2025 04:46