AbhishekGogna/tut_CNN

Tutorial for running a CNN for genomic prediction

This project extends Convolutional Neural Networks (CNNs) to Big Data, particularly focusing on genomic predictions with BGLR. Below are the steps and requirements for setting up the necessary computing environment using OpenBLAS and Python inside containers.

First steps

Clone this repository and try running `bash ./manage_cc help`

This call makes the script look for a container and associated scripts at /filer-5/agruppen/QG/gogna. If they are not found, you will get an error; see the Requirements section below for a solution.
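You can also check this prerequisite yourself before calling the script; a small sketch (the path is the one quoted above):

```shell
#!/bin/sh
# Check that the shared container directory from this README exists.
cc_path="/filer-5/agruppen/QG/gogna"
if [ -d "$cc_path" ]; then
  echo "found: $cc_path"
else
  echo "missing: $cc_path (see Requirements below)"
fi
```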

Requirements

Ensure that you have access to the container. I put one at /filer-5/agruppen/QG/gogna.

In addition, ensure the following paths are correctly set up in manage_cc:

  • cc_dir="../computing_containers/containers"
  • usr_scr="../computing_containers/usr_scr"
  • ext_lib_blas="../computing_containers/openblas_3.23/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so" # see below

Lastly, you need to create a directory for storing large files at /qg-10/data/AGR-QG/temp; modify this path at line 21 in manage_cc. The easiest way to create it is:

    usr_name=$(whoami)
    mkdir -p /qg-10/data/AGR-QG/temp/${usr_name}/tut_CNN

In case you did not set up BLAS (next section), I suggest commenting out line 18 in manage_cc by putting "#" in front of it. You can use `nano manage_cc` to edit the script in the terminal itself.
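If you prefer a one-liner to nano, GNU sed can prefix a given line with "#" in place. The sketch below runs on a scratch file so it is safe to try anywhere; on the cluster you would target manage_cc and line 18 instead:

```shell
# Demonstrate commenting out a specific line number with GNU sed.
scratch=$(mktemp)
printf 'line1\nline2\nline3\n' > "$scratch"
sed -i '2s/^/#/' "$scratch"   # for manage_cc you would run: sed -i '18s/^/#/' manage_cc
cat "$scratch"                # line2 is now "#line2"
```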

Starting the container

By starting the container, I mean starting the Jupyter server. This is done with `bash ./manage_cc start_jup`. This command:

  • Creates a directory called cc_data containing all session files
  • Outputs a session name (e.g. jup_tut_CNN_xx) to the terminal
  • Displays an access address (e.g. 127.0.0.1:xxxx) for reaching the server in a web browser such as Chrome
  • Stores the Jupyter password token in ./cc_data/jup/{session_name}/run.err
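Rather than opening run.err by hand, the token can be pulled out with grep. The sketch below fabricates a sample run.err so the command can be tried anywhere; the session name, port, and token are placeholders, and the exact log format may differ between Jupyter versions:

```shell
# Create a sample run.err like the one stored per session under cc_data.
session=jup_tut_CNN_xx
mkdir -p "/tmp/cc_data/jup/$session"
echo 'http://127.0.0.1:8888/?token=abc123def456' > "/tmp/cc_data/jup/$session/run.err"
# Extract just the token part:
grep -o 'token=[A-Za-z0-9]*' "/tmp/cc_data/jup/$session/run.err"
# → token=abc123def456
```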

For security reasons, you need to set up an SSH tunnel (read more at https://github.com/IPK-QG/bench_setup/blob/master/docs/computing-clusters.md). For the tunnel setup, I prefer using MobaXterm:

  • Download the portable version from MobaXterm's website (https://mobaxterm.mobatek.net/download-home-edition.html)
  • Launch the .exe file
  • Click the "Session" button in the top-left corner and create a new session connecting to qg-10 (or qg-10/slurm)
  • Use the "Tunneling" button (near the Session button) to configure your tunnel with your username and the port address(es) output when starting the container (xxxx from 127.0.0.1:xxxx)
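If you work from Linux or macOS, the same tunnel can be opened with plain ssh instead of MobaXterm. This is a configuration sketch, not something to run verbatim: your_user, the port 8888, and the host name qg-10 are placeholders to replace with your own username, the port printed by start_jup, and your cluster host:

```shell
# Forward local port 8888 to 127.0.0.1:8888 on the remote host; -N opens
# the tunnel without running a remote command.
ssh -N -L 8888:127.0.0.1:8888 your_user@qg-10
```

With the tunnel open, the 127.0.0.1:xxxx address from start_jup works in your local browser as described above.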

OpenBLAS Setup

OpenBLAS is required when running genomic predictions with BGLR. Follow these steps to install and configure OpenBLAS:

  1. Start a shell session in the container. For this, access the Jupyter server as before and go to File -> New -> Terminal to open the shell session

  2. Clone the OpenBLAS repository:

    git clone -b v0.3.23 https://github.com/xianyi/OpenBLAS
    cd OpenBLAS
  3. Create a symbolic link for the libmpfr.so library:

    ln -s /usr/lib/x86_64-linux-gnu/libmpfr.so.6 ../computing_containers/lib_symlinks/libmpfr.so.4
  4. Set the LD_LIBRARY_PATH environment variable:

    export LD_LIBRARY_PATH="/qg-10/data/AGR-QG/Gogna/computing_containers/lib_symlinks:$LD_LIBRARY_PATH"
  5. Build and install OpenBLAS:

    make DYNAMIC_ARCH=1
    make install PREFIX="inst/$(hostname)"
  6. Use sessionInfo() in R to find the default BLAS library location.

  7. Bind the absolute path of the installed BLAS library to the path reported by sessionInfo(). For example:

    "${OpenBLAS_lib}:/usr/local/lib/R/lib/libRblas.so"

    where OpenBLAS_lib="/qg-10/data/AGR-QG/Gogna/computing_containers/OpenBLAS/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so"

  8. Optionally, you can benchmark OpenBLAS using this script:

    https://mac.r-project.org/benchmarks/R-benchmark-25.R

    OpenBLAS completed this benchmark in 8 seconds on qg-10.
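The bind string in step 7 matches Singularity/Apptainer's --bind syntax, so, assuming the container is launched with Singularity (manage_cc normally handles this for you), the full invocation might look like the following sketch; container.sif is a placeholder for the actual container image:

```shell
# Replace R's bundled BLAS with the freshly built OpenBLAS inside the container.
OpenBLAS_lib="/qg-10/data/AGR-QG/Gogna/computing_containers/OpenBLAS/inst/qg-10.ipk-gatersleben.de/lib/libopenblas.so"
singularity exec \
  --bind "${OpenBLAS_lib}:/usr/local/lib/R/lib/libRblas.so" \
  container.sif Rscript -e 'sessionInfo()'
```

If the bind worked, sessionInfo() should report the OpenBLAS path as the BLAS in use.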

Python Environment Setup

To set up the Python environment inside the container, follow these steps:

  1. Run the following command inside the container to create a requirements file:

    echo -e "## for DL\n\
    tensorflow==2.8\n\
    tensorboard==2.8\n\
    pyarrow==5.0.0\n\
    matplotlib==3.5.1\n\
    pandas==1.4\n\
    scikit-learn==1.0.2\n\
    patsy==0.5.2\n\
    protobuf==3.19.6\n\
    keras-tuner==1.1.3\n\
    ipykernel==6.22.0\n\
    ## for ML\n\
    xgboost==2.0.0\n\
    ## for doit\n\
    graphviz==0.20\n\
    doit==0.36.0\n\
    pygraphviz==1.9\n\
    import_deps==0.2.0" > "/proj/requirements.txt"

    You can add or remove libraries as needed in the requirements.txt file.

  2. Set up the Python environment:

    python3 -m venv /proj/py_env
    source /proj/py_env/bin/activate
    pip3 --no-cache-dir install -r /proj/requirements.txt
  3. Ensure the environment is loaded each time the container starts:

    echo "source /proj/py_env/bin/activate" > /proj/.bash_profile
  4. Add this environment to Jupyter:

    python3 -m ipykernel install --user --name=py_env

    Note: You may need to run this again and refresh if the notebook does not detect your kernel (see step 5).

  5. Refresh Jupyter and select py_env from the kernel dropdown in the top-right corner of the notebook.
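To confirm that a virtual environment activated correctly (step 2 above), you can check which interpreter is resolved after activation. The sketch below builds a throwaway venv in a temporary directory instead of /proj so it can be tried anywhere python3 with the venv module is available:

```shell
# Create and activate a scratch virtual environment, then verify activation.
env_dir="$(mktemp -d)/py_env"
python3 -m venv "$env_dir"
. "$env_dir/bin/activate"
command -v python3                          # should resolve inside $env_dir
python3 -c 'import sys; print(sys.prefix)'  # should print the venv path
```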


This setup will ensure you have a reproducible environment for your project, including all necessary dependencies for both genomic predictions with BGLR and machine learning tasks.

Using this directory

  1. Try running the following commands in a terminal inside the container:

    cd /proj/run/Py
    doit

    This should work and output "works" on your terminal


  2. Try running the notebook file in notebooks directory.

About

Code to extend CNNs to BigData project
