This project extends Convolutional Neural Networks (CNNs) to Big Data, particularly focusing on genomic predictions with BGLR. Below are the steps and requirements for setting up the necessary computing environment using OpenBLAS and Python inside containers.
Clone this repository and try running

```bash
./manage_cc help
```
This call to the script will try to find a container and associated scripts at /filer-5/agruppen/QG/gogna. If these are not found, you will get an error. See the requirements section below for a solution.
Ensure that you have access to the container. I put one at /filer-5/agruppen/QG/gogna.
In addition, ensure the following paths are correctly set up in manage_cc:
```bash
cc_dir="../computing_containers/containers"
usr_scr="../computing_containers/usr_scr"
ext_lib_blas="../computing_containers/openblas_3.23/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so" # see below
```
Lastly, you need to create a directory for storing large-volume files at /qg-10/data/AGR-QG/temp. Modify this path at line 21 in manage_cc. The easiest way to do this is

```bash
usr_name=$(whoami)
mkdir -p /qg-10/data/AGR-QG/temp/${usr_name}/tut_CNN
```
In case you did not set up BLAS (next section), I suggest you put "#" in front of line 18 in manage_cc. You can use

```bash
nano manage_cc
```

to edit the script in the terminal itself.
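If you prefer a non-interactive edit, the same change can be sketched with `sed`. This is a convenience only; it assumes line 18 is still the BLAS line in your copy of `manage_cc`, so verify before running:

```bash
# Show line 18 first to confirm it is the line you want to disable.
sed -n '18p' manage_cc
# Comment out line 18 by prefixing it with "#" (edits the file in place).
sed -i '18s/^/#/' manage_cc
```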
By starting the container I mean starting the Jupyter server. This is possible with

```bash
./manage_cc start_jup
```
This command:
- Creates a directory called `cc_data` containing all session files
- Outputs a session name (e.g. `jup_tut_CNN_xx`) to the terminal
- Displays an access address (e.g. `127.0.0.1:xxxx`) for accessing the server in a web browser such as Chrome
- Stores the Jupyter password token in `./cc_data/jup/{session_name}/run.err`
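To pull the token out of that file without opening it, something like the following should work. The session name below is a placeholder, and the `token=` pattern assumes the usual format of Jupyter's startup URL; adjust both if your `run.err` looks different:

```bash
# Hypothetical session name; substitute the one printed by start_jup.
session_name="jup_tut_CNN_xx"
# Jupyter typically logs a URL of the form ...?token=<hex>; grep for that pattern.
grep -o 'token=[A-Za-z0-9]*' "./cc_data/jup/${session_name}/run.err" | head -n 1
```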
For safety reasons, you need to set up a tunnel (read more at https://github.com/IPK-QG/bench_setup/blob/master/docs/computing-clusters.md). For the tunnel setup, I prefer using MobaXterm:
- Download the portable version from MobaXterm's website (https://mobaxterm.mobatek.net/download-home-edition.html)
- Launch the .exe file
- Click the "Session" button in the top left corner to connect to qg-10 for example
- Create a new session connecting to qg-10/slurm
- Use the "Tunneling" button (near the Session button) to configure your tunnel with your username and the port address(es) printed when starting the container (the xxxx in 127.0.0.1:xxxx)
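On Linux or macOS, the same tunnel can be sketched with plain `ssh` instead of MobaXterm. The username placeholder and example port below are assumptions; use the port from the `127.0.0.1:xxxx` address printed at startup:

```bash
# Address printed by ./manage_cc start_jup (example port; replace with yours).
jup_addr="127.0.0.1:8888"
port="${jup_addr##*:}"   # extract the port part after the colon
# Print the tunnel command; replace <user> with your cluster login, then run it.
echo "ssh -N -L ${port}:127.0.0.1:${port} <user>@qg-10"
```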

OpenBLAS is required when running genomic predictions with BGLR. Follow these steps to install and configure OpenBLAS:
- Start a shell session in the container. For this you need to access the Jupyter server as before and go to File -> New -> Terminal to start the shell session.
- Clone the OpenBLAS repository:

  ```bash
  git clone -b v0.3.23 https://github.com/xianyi/OpenBLAS
  cd OpenBLAS
  ```

- Create a symbolic link for the `libmpfr.so` library:

  ```bash
  ln -s /usr/lib/x86_64-linux-gnu/libmpfr.so.6 ../computing_containers/lib_symlinks/libmpfr.so.4
  ```

- Set the `LD_LIBRARY_PATH` environment variable:

  ```bash
  export LD_LIBRARY_PATH="/qg-10/data/AGR-QG/Gogna/computing_containers/lib_symlinks:$LD_LIBRARY_PATH"
  ```

- Build and install OpenBLAS:

  ```bash
  make DYNAMIC_ARCH=1
  make install PREFIX="inst/$(hostname)"
  ```

- Use `sessionInfo()` in R to find the default BLAS library location.
- Bind the absolute path of the installed BLAS library to the path reported by `sessionInfo()`. For example:

  ```
  "${OpenBLAS_lib}:/usr/local/lib/R/lib/libRblas.so"
  ```

  where `OpenBLAS_lib="/qg-10/data/AGR-QG/Gogna/computing_containers/OpenBLAS/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so"`
- Optionally, you can benchmark OpenBLAS using this script: https://mac.r-project.org/benchmarks/R-benchmark-25.R. OpenBLAS completed this benchmark in 8 seconds on `qg-10`.
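The bind mapping above can be composed in the shell before passing it to the container runtime. The `--bind` flag shown in the comment is an assumption (Singularity/Apptainer syntax); `manage_cc` may pass the mapping differently:

```bash
# Compose the source:target bind mapping from the steps above.
OpenBLAS_lib="/qg-10/data/AGR-QG/Gogna/computing_containers/OpenBLAS/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so"
bind_spec="${OpenBLAS_lib}:/usr/local/lib/R/lib/libRblas.so"
echo "${bind_spec}"
# Example use (assumed Singularity/Apptainer syntax):
#   singularity exec --bind "${bind_spec}" <container.sif> R
```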
To set up the Python environment inside the container, follow these steps:
- Run the following command inside the container to create a requirements file:

  ```bash
  echo -e "## for DL\n\
  tensorflow==2.8\n\
  tensorboard==2.8\n\
  pyarrow==5.0.0\n\
  matplotlib==3.5.1\n\
  pandas==1.4\n\
  scikit-learn==1.0.2\n\
  patsy==0.5.2\n\
  protobuf==3.19.6\n\
  keras-tuner==1.1.3\n\
  ipykernel==6.22.0\n\
  ## for ML\n\
  xgboost==2.0.0\n\
  ## for doit\n\
  graphviz==0.20\n\
  doit==0.36.0\n\
  pygraphviz==1.9\n\
  import_deps==0.2.0" > "/proj/requirements.txt"
  ```

  You can add or remove libraries as needed in the `requirements.txt` file.
- Set up the Python environment:

  ```bash
  python3 -m venv /proj/py_env
  source /proj/py_env/bin/activate
  pip3 --no-cache-dir install -r /proj/requirements.txt
  ```

- Ensure the environment is loaded each time the container starts:

  ```bash
  echo "source /proj/py_env/bin/activate" > /proj/.bash_profile
  ```

- Add this environment to Jupyter:

  ```bash
  python3 -m ipykernel install --user --name=py_env
  ```

  Note: You may need to run this again and refresh if the notebook does not detect your kernel (see the next step).
- Refresh Jupyter and select `py_env` from the kernel dropdown in the top-right corner of the notebook.
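As a quick sanity check that the environment is active, the interpreter's prefix should point at the venv (the `/proj/py_env` path is the one created above):

```bash
# After activation, sys.prefix should be the venv directory, not the system prefix.
source /proj/py_env/bin/activate
python3 -c 'import sys; print(sys.prefix)'   # expected: /proj/py_env
```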
This setup will ensure you have a reproducible environment for your project, including all necessary dependencies for both genomic predictions with BGLR and machine learning tasks.
- Try running the following command in a terminal inside the container:

  ```bash
  cd /proj/run/Py
  doit
  ```

  This should work and output "works" on your terminal.
- Try running the notebook file in the notebooks directory.