Reproducibility Service contribution #17

Open
wants to merge 3 commits into
base: stable
4 changes: 2 additions & 2 deletions compss/runtime/scripts/system/chameleon/chameleon_init
@@ -13,10 +13,10 @@ extra_configure() {
____ ___ __ __ ____ ____
/ ___/ _ \\| \\/ | _ \\/ ___| ___
| | | | | | |\\/| | |_) \\___ \\/ __|
| |__| |_| | | | | __/ ___) \\__ \\
| |__| |_| | | | | __/ ___) \\__ \\
\\ \\____\\___/|_| |_|_| |____/|___/

Welcome to COMPSs v2.1 at Chameleon!
Welcome to COMPSs v3.1 at Chameleon!
EOT

echo "127.0.1.1 COMPSsMaster" >> /etc/hosts
103 changes: 52 additions & 51 deletions compss/runtime/scripts/utils/chameleon_cluster_setup
@@ -1,4 +1,4 @@
#!/bin/bash -e
#!/bin/bash -e

#=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#
@@ -12,20 +12,14 @@
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


# Setting up COMPSs_HOME
if [ -z "${COMPSS_HOME}" ]; then
COMPSS_HOME="$( cd "$( dirname "${BASH_SOURCE[0]}" )"/../../.. && pwd )/"
fi
if [ ! "${COMPSS_HOME: -1}" = "/" ]; then
COMPSS_HOME="${COMPSS_HOME}/"
fi
export COMPSS_HOME=${COMPSS_HOME}
# Setting up COMPSS_HOME
export COMPSS_HOME="/opt/COMPSs/"

##########################################################
# Script variables
user=cc
instanceCreationTime=10 # Iterations over 30s
sshUpTime=8 # Iterations over 30s
sshUpTime=8 # Iterations over 30s
randomID=$RANDOM
tmpFile=/tmp/compss-workers-${randomID}.tmp
HALF_MIN=30s
@@ -38,30 +32,29 @@
sleep 2s

# Prompt messages to get information
echo "Provide the name of the COMPSs Master Instance (this instance):"
read -r masterName
echo "Provide the reservation ID to deploy COMPSs:"
read -r reservationId
echo "Provide the number of COMPSs Workers:"
read -r numWorkers
echo " "
read -rp "Provide the name of the COMPSs Master Instance (this instance): " masterName
read -rp "Provide the reservation ID to deploy COMPSs: " reservationId
read -rp "Provide the number of COMPSs Workers: " numWorkers
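echo "Type 1 if connected via the fabnetv4 network, or 2 if connected via sharednet1:"
read -r choice

# Map the choice to a network name; anything other than 1 falls back to sharednet1
if [ "$choice" = "1" ]; then
NETWORK="fabnetv4"
else
NETWORK="sharednet1"
fi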

##########################################################
# Retrieve other information
echo "* Retrieving configuration parameters from Chameleon..."
image=$(nova show "$masterName" | grep image | tr "|" "\\t" | awk '{ print $2 }')
netId=$(neutron net-list | grep sharednet1 | tr "|" "\\t" | awk '{ print $1 }')

image=$(openstack server show "$masterName" -f value -c image | awk '{print $2}' | sed 's/[()]//g')
netId=$(openstack network list | grep "$NETWORK" | awk '{print $2}')

##########################################################
# Launch workers
# Launch workers
echo "* Launching workers..."
# Insert COMPSs Master key to OpenStack. Create workers with COMPSsMaster key authorized
nova keypair-add --pub_key /home/cc/.ssh/id_rsa.pub COMPSsMaster${randomID}
openstack keypair create --public-key /home/cc/.ssh/id_rsa.pub COMPSsMaster${randomID}

# Create workers
for (( i=1; i<=numWorkers; i++ )); do
cmd="nova boot --flavor baremetal --image $image --key-name COMPSsMaster${randomID} --nic net-id=$netId --hint reservation=$reservationId COMPSsWorker$i"
cmd="openstack server create --flavor baremetal --image $image --key-name COMPSsMaster${randomID} --nic net-id=$netId --hint reservation=$reservationId COMPSsWorker$i"
echo "$cmd"
$cmd
sleep $SLEEP_BETWEEN_WORKER_CREATION
@@ -78,34 +71,50 @@

for (( i=1; i<=numWorkers; i++ )); do
# Wait for each worker
cmd_status=$(nova list | grep "COMPSsWorker$i" | tr "|" "\\t" | awk '{ print $3 }')
cmd_status=$(openstack server list | grep "COMPSsWorker$i" | awk '{print $6}')
while [ "$cmd_status" != "ACTIVE" ]; do
sleep ${HALF_MIN}
cmd_status=$(nova list | grep "COMPSsWorker$i" | tr "|" "\\t" | awk '{ print $3 }')
cmd_status=$(openstack server list | grep "COMPSsWorker$i" | awk '{print $6}')
done
echo " - COMPSsWorker$i is ACTIVE"
done


##########################################################
# Retrieving COMPSs Workers information
echo "* Retrieving COMPSs Workers information..."

echo "# Automatically added hostnames by chameleon_cluster_setup" > $tmpFile
workerIPs=""
for (( i=1; i<=numWorkers; i++ )); do
workerIP=$(nova show COMPSsWorker$i | grep "network" | tr "|" "\\t" | awk '{ print $3 }' | tr "," "\\t" | awk '{ print $1 }')
workerIP=$(openstack server show COMPSsWorker$i -f value -c addresses | tr ',' '\n' | grep -oP '\d+\.\d+\.\d+\.\d+' | head -n 1)
# Update worker list
workerIPs="$workerIPs $workerIP"
# Update hosts tmp file"
echo "$workerIP COMPSsWorker$i" >> $tmpFile
# Update hosts tmp file
echo "$workerIP COMPSsWorker$i" >> $tmpFile
# Log worker IP
echo " - COMPSsWorker$i has IP = $workerIP"
done

echo "Debugging Information:"
echo "user=$user"
echo "instanceCreationTime=$instanceCreationTime"
echo "sshUpTime=$sshUpTime"
echo "randomID=$randomID"
echo "tmpFile=$tmpFile"
echo "HALF_MIN=$HALF_MIN"
echo "SLEEP_BETWEEN_WORKER_CREATION=$SLEEP_BETWEEN_WORKER_CREATION"
echo "masterName=$masterName"
echo "reservationId=$reservationId"
echo "numWorkers=$numWorkers"
echo "NETWORK=$NETWORK"
echo "image=$image"
echo "netId=$netId"
echo "workerIPS=$workerIPs"
echo " "


# Adding configuration to COMPSs Master /etc/hosts file
sudo /bin/bash -c "cat $tmpFile >> /etc/hosts"
masterIP=$(nova show "$masterName" | grep "network" | tr "|" "\\t" | awk '{ print $3 }' | tr "," "\\t" | awk '{ print $1 }')
sudo bash -c "cat $tmpFile >> /etc/hosts"
masterIP=$(openstack server show "$masterName" -f value -c addresses | tr ',' '\n' | grep -oP '\d+\.\d+\.\d+\.\d+' | head -n 1)
echo "$masterIP COMPSsMaster" >> $tmpFile

# Configuring COMPSs Workers
@@ -117,68 +126,60 @@
printf "\\n"

for workerIP in $workerIPs; do
scp -o StrictHostKeyChecking=no $tmpFile $user@"$workerIP":$tmpFile
# shellcheck disable=SC2029
ssh -t -t -o StrictHostKeyChecking=no -o BatchMode=yes -o ChallengeResponseAuthentication=no $user@"$workerIP" "sudo /bin/bash -c 'cat $tmpFile >> /etc/hosts'"
# shellcheck disable=SC2029
ssh -t -t -o StrictHostKeyChecking=no -o BatchMode=yes -o ChallengeResponseAuthentication=no $user@"$workerIP" "rm -f $tmpFile"
done
scp -o StrictHostKeyChecking=no $tmpFile $user@$workerIP:$tmpFile
ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ChallengeResponseAuthentication=no $user@$workerIP "sudo bash -c 'cat $tmpFile >> /etc/hosts'"
ssh -o StrictHostKeyChecking=no -o BatchMode=yes -o ChallengeResponseAuthentication=no $user@$workerIP "rm -f $tmpFile"
done

# Clean tmpfile
rm -f $tmpFile


##########################################################
# Update COMPSs project / resources files
echo "* Updating COMPSs project and resources files..."
project="${COMPSS_HOME}Runtime/configuration/xml/projects/default_project.xml"
resources="${COMPSS_HOME}Runtime/configuration/xml/resources/default_resources.xml"

echo ""
echo "Provide the application path:"
read -r appDir
read -rp "Provide the application path: " appDir

#
# PROJECT.XML
#
# shellcheck source=../system/xmls/generate_project.sh
# shellcheck disable=SC1091
source "${COMPSS_HOME}Runtime/scripts/system/xmls/generate_project.sh"

# Init project file
init "${project}"
# Add header (from generate_project.sh)
add_header
# Add master information (from generate_project.sh)
add_master_node ""
add_master_node 4 1 0 16 ""
# Add workers (from generate_project.sh)
for (( i=1; i<=numWorkers; i++ )); do
add_compute_node "COMPSsWorker$i" "/opt/COMPSs/" "/tmp/COMPSsWorker$i" "$user" "$appDir" "" "" "" ""
echo "$i"
add_compute_node "COMPSsWorker$i" "/opt/COMPSs/" "/tmp/COMPSsWorker$i" "$user" "$appDir" "" "" "" ""
done
# Close project (from generate_project.sh)
add_footer

#
# RESOURCES.XML
#
# shellcheck source=../system/xmls/generate_resources.sh
# shellcheck disable=SC1091
source "${COMPSS_HOME}Runtime/scripts/system/xmls/generate_resources.sh"

# Init resources file
init "${resources}"
# Add header (from generate_resources.sh)
add_header
# Add workers
for (( i=1; i<=numWorkers; i++ )); do
add_compute_node "COMPSsWorker$i" "24" "0" "0" "125" "43001" "43102" "" ""
echo "$i"
add_compute_node "COMPSsWorker$i" "24" "0" "125" "43001" "43102" "" ""
done
# Close resources (from generate_resources.sh)
add_footer


##########################################################
# End
echo "SUCCESS!"
exit
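
The text-processing steps that the updated script applies to the `openstack` output can be tried in isolation, without access to a Chameleon lease. The sketch below is not part of the PR: the function names and the sample strings are assumptions, the exact format of the `addresses` value varies between client versions, and `grep -oE` is used here as a portable stand-in for the `grep -oP` pattern in the script.

```bash
#!/bin/bash
# Sketch only: the sample strings are assumed, not captured from a real Chameleon run.

# Print the first IPv4 address found in an "addresses" string.
first_ipv4() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Print the Status column of one row of the default `openstack server list` table.
# Assumes that neither the ID nor the Name contains whitespace.
server_status() {
  printf '%s\n' "$1" | awk '{ print $6 }'
}

sample_addresses='sharednet1=10.52.3.14, 129.114.109.50'
sample_row='| 1f2e3d4c | COMPSsWorker1 | ACTIVE | sharednet1=10.52.3.14 | img | baremetal |'

first_ipv4 "$sample_addresses"   # prints 10.52.3.14
server_status "$sample_row"      # prints ACTIVE
```

One thing this makes easy to check: `grep "COMPSsWorker1"` also matches `COMPSsWorker10`, so with ten or more workers the status lookup may return several lines; a whole-word match such as `grep -w` is one possible way to avoid that.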

33 changes: 33 additions & 0 deletions compss/tools/reproducibility_service/APP-REQ/ro-crate-info.yaml
@@ -0,0 +1,33 @@
# NOTE: NO NEED TO FILL IN THE INFORMATION BELOW, AS THE RS SERVICE FILLS IT IN AUTOMATICALLY
COMPSs Workflow Information:
name: Name of your COMPSs application
description: Detailed description of your COMPSs application
license: Apache-2.0
sources: [/absolute_path_to/dir_1/, relative_path_to/dir_2/, main_file.py, relative_path/aux_file_1.py, /abs_path/aux_file_2.py]
sources_main_file: my_main_file.py
# CURRENTLY ONLY DATA PERSISTENCE FALSE IS SUPPORTED BY THE RS SERVICE
data_persistence: False

# NOTE: AUTHORS INFORMATION WILL BE EXTRACTED FROM THE CRATE
Authors:
- name: Author_1 Name
e-mail: [email protected]
orcid: https://orcid.org/XXXX-XXXX-XXXX-XXXX
organisation_name: Institution_1 name
ror: https://ror.org/XXXXXXXXX
# Find them in ror.org
- name: Author_2 Name
e-mail: [email protected]
orcid: https://orcid.org/YYYY-YYYY-YYYY-YYYY
organisation_name: Institution_2 name
ror: https://ror.org/YYYYYYYYY
# Find them in ror.org

# NOTE: PLEASE FILL IN THE SUBMITTER INFORMATION
Submitter:
name: Name
e-mail: [email protected]
orcid: https://orcid.org/XXXX-XXXX-XXXX-XXXX
organisation_name: Submitter Institution name
ror: https://ror.org/XXXXXXXXX
# Find them in ror.org
69 changes: 69 additions & 0 deletions compss/tools/reproducibility_service/README.md
@@ -0,0 +1,69 @@
# COMPSs-Reproducibility-Service

<p align="center">
<img src="./APP-REQ/logo.png" alt="Logo">
</p>

This is an automatic reproducibility service designed to help reproduce COMPSs workflows on your local machine or on a SLURM cluster. Below are the prerequisites and instructions for proper functioning.

## Pre-requisites

- COMPSs must be installed on your local machine, or the COMPSs module must be loaded on the cluster. For installation guidance, refer to the [COMPSs Official Installation Guide](https://compss-doc.readthedocs.io/en/stable/Sections/01_Installation.html).
- Ensure that all dependencies for the experiment you wish to reproduce are satisfied on the machine where you want to resubmit the application.

## How to Use

- Take the remote URL of the workflow (e.g. from WorkflowHub) or the path to the RO-Crate (a folder or a zip file) and pass it as the first argument to the service:
```bash
python3 reproducibility_service.py <link_or_path>
```
- The remaining steps are interactive prompts from the program, which provide the features listed below.

## Features

1. **Provenance Generation**: The program asks whether to enable provenance generation (the `-p` flag of `runcompss`). It automatically fetches the experiment details from the metadata and only asks for the `Submitter` details.

2. **New Dataset Feature**: If you want to reproduce the same experiment with a new dataset, simply provide the path to the new dataset.
> **Note**: The new dataset should follow the exact same directory structure as the old one for the paths to be correctly mapped.

3. **Flag Addition**: You can review the `runcompss` command line generated by the service and pass additional flags according to the needs of your new run.

4. **File Verification**: The service verifies file integrity against metadata such as file size or modification date. It generates a status table displaying the results of the verification.
<p align="center">
<img src="./APP-REQ/status_table.png" alt="Status table" style="width: 75%; height: auto;">
</p>

5. **Sub-directory Feature**: The service runs in a separate subdirectory named `reproducibility_service_{timestamp}`, so it does not interfere with the current working directory (cwd).

6. **Results**: Any results generated by the experiment are stored in `reproducibility_service_{timestamp}/Results`. If provenance is requested, the generated RO-Crate is also stored in this directory.

7. **Logging**: Logs from the reproducibility service, such as `err.log`, `out.log`, and `rs_log`, are stored in `reproducibility_service_{timestamp}/log`.
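
The directory layout from features 5 to 7 and the size check from feature 4 can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not the service's actual implementation: the function names `make_run_dir` and `verify_size` are hypothetical, and the timestamp format is a guess, since this README does not specify it.

```bash
#!/bin/bash
# Sketch only: function names and the timestamp format are assumptions.

# Create reproducibility_service_<timestamp>/ with Results/ and log/ inside
# the given base directory, and print the path of the new run directory.
make_run_dir() {
  local base="$1"
  local run_dir
  run_dir="${base%/}/reproducibility_service_$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$run_dir/Results" "$run_dir/log"
  printf '%s\n' "$run_dir"
}

# Compare a file's size in bytes with the size recorded in the metadata.
# Prints OK, MISMATCH, or MISSING.
verify_size() {
  local path="$1" expected="$2" actual
  if [ ! -f "$path" ]; then
    echo "MISSING"
    return
  fi
  # The arithmetic expansion strips any padding that wc may add.
  actual=$(( $(wc -c < "$path") ))
  if [ "$actual" -eq "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH"
  fi
}
```

For example, `verify_size dataset/input.csv 2048` would print one of the three status words for a hypothetical file, which is the kind of per-file result the status table summarizes.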

## Known Issues (or Future Plans)

- Third-party software dependencies: neither automatic detection of these dependencies nor loading them on a SLURM cluster is implemented. Currently, the user must resolve them manually.
- Workflows with `data_persistence = False` whose datasets are all remote files are not supported.

### Experiment Requirements

1. If a folder path is provided in the `compss_submission_command_line`, the path should end with a `/`.
2. The service does not support experiments with file paths inside the source code, as these paths cannot be easily mapped.
3. The `data_persistence = False` examples are expected to work only on the original SLURM cluster, where the paths related to the experiment are accessible (the new Submitter may need to request access permissions).
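
Requirement 1 is easy to satisfy before submitting an experiment by normalizing folder paths first. A minimal sketch, assuming a hypothetical helper named `ensure_trailing_slash` that is not part of the service:

```bash
#!/bin/bash
# Sketch only: ensure_trailing_slash is a hypothetical helper.

# Print the given folder path with exactly one "/" appended if it lacks one.
# An empty argument is printed unchanged so it is never turned into "/".
ensure_trailing_slash() {
  case "$1" in
    "")  printf '\n' ;;
    */)  printf '%s\n' "$1" ;;
    *)   printf '%s/\n' "$1" ;;
  esac
}

ensure_trailing_slash "/home/user/dataset"    # prints /home/user/dataset/
ensure_trailing_slash "/home/user/dataset/"   # prints /home/user/dataset/
```

The helper should only be applied to arguments that are known to be folders, since appending `/` to a file path would change its meaning.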

---
### How to Use via Chameleon

If you're unsure how to create an instance on Chameleon, please refer to the official documentation: [Chameleon Documentation](https://chameleoncloud.readthedocs.io/en/latest/index.html).

To utilize this service or run any COMPSs experiments, you can create an instance of the Ubuntu 22.04 appliance with COMPSs 3.3.1 pre-installed. You can find the appliance here: [Ubuntu 22.04 with COMPSs 3.3.1](https://www.chameleoncloud.org/appliances/121/).

After successfully creating an instance of the appliance, execute the following command to set up the environment:
```bash
sudo ./working_scripts/basic_config.sh start
```

Once the setup is complete, you can proceed to run any COMPSs experiments of your choice.

> **Note:** Since Chameleon allows access to remote networks, you can directly clone the Reproducibility Service as well as the RO-Crate of the experiment you want to reproduce.

---
I hope you find this service helpful!