Add section on high availability setups to Galaxy Interactive Tools training #5179

Draft
wants to merge 6 commits into
base: main
from
189 changes: 184 additions & 5 deletions topics/admin/tutorials/interactive-tools/tutorial.md
@@ -22,6 +22,7 @@ contributors:
- slugger70
- hexylena
- abretaud
- kysrpex
tags:
- ansible
- interactive-tools
@@ -224,7 +225,7 @@ When an Interactive Tool's Docker container starts, it will be assigned a random

![Galaxy Interactive Tools Proxy Diagram](../../images/interactive-tools/gxit-proxy-diagram.png "Galaxy Interactive Tools Proxy Diagram")

As you can see, the client only ever speaks to nginx on the Galaxy server running on the standard https port (443), never directly to the interactive tool (which may be running on a node that does not even have a public IP address). The mapping of GxIT invocation and its corresponding host/port is kept in a SQLite database known as the *Interactive Tools Session Map*, and the path to this database is important, since both Galaxy and the proxy need access to it.
As you can see, the client only ever speaks to nginx on the Galaxy server running on the standard https port (443), never directly to the interactive tool (which may be running on a node that does not even have a public IP address). By default, the mapping of GxIT invocation and its corresponding host/port is kept in a SQLite database known as the *Interactive Tools Session Map*, and the path to this database is important, since both Galaxy and the proxy need access to it.

The GIE Proxy is written in [Node.js][nodejs] and requires some configuration. Thankfully there is an Ansible role, [usegalaxy_eu.gie_proxy][usegalaxy_eu-gie_proxy], that can install the proxy and its dependencies, and configure it for you. As usual, have a look through the [README][usegalaxy_eu-gie_proxy-readme] and [defaults][usegalaxy_eu-gie_proxy-defaults] to investigate which variables you might need to set before continuing.

@@ -258,7 +259,7 @@ The GIE Proxy is written in [Node.js][nodejs] and requires some configuration. T
> gie_proxy_git_version: main
> gie_proxy_setup_nodejs: nodeenv
> gie_proxy_virtualenv_command: "{{ pip_virtualenv_command }}"
> gie_proxy_nodejs_version: "10.13.0"
> gie_proxy_nodejs_version: "14.21.3"
> gie_proxy_virtualenv: /srv/galaxy/gie-proxy/venv
> gie_proxy_setup_service: systemd
> gie_proxy_sessions_path: "{{ galaxy_mutable_data_dir }}/interactivetools_map.sqlite"
@@ -295,7 +296,7 @@ The GIE Proxy is written in [Node.js][nodejs] and requires some configuration. T
> > <solution-title></solution-title>
> >
> > 1. A new Python venv was created at `/srv/galaxy/gie-proxy/venv`
> > 2. Node.js version 10.13.0 was installed in to the venv
> > 2. Node.js version 14.21.3 was installed into the venv
> > 3. The proxy was cloned to `/srv/galaxy/gie-proxy/proxy`
> > 4. The proxy's Node dependencies were installed to `/srv/galaxy/gie-proxy/proxy/node_modules` using the venv's `npm`
> > 5. A systemd service unit was installed at `/etc/systemd/system/galaxy-gie-proxy.service`
@@ -893,8 +894,186 @@ Once the playbook run is complete and your Galaxy server has restarted, run the
>
{: .question }

# Final Notes
## High availability setup with PostgreSQL (Optional)

As mentioned at the beginning of this tutorial, Galaxy Interactive Tools are a relatively new and rapidly evolving feature. At the time of writing, there is no official documentation for Interactive Tools. Please watch the [Galaxy Release Notes][galaxy-release-notes] for updates, changes, new documentation, and bug fixes.
> <comment-title></comment-title>
> This section is **only relevant if you are running a high-availability** setup, meaning that you have multiple copies of Galaxy running behind a load balancer.
>
> If you have installed Galaxy following the [Galaxy Installation with Ansible]({% link topics/admin/tutorials/ansible-galaxy/tutorial.md %}) tutorial, or are completing this tutorial as part of a [Galaxy Admin Training][gat] course, please skip this section, as you are then _not_ running a high-availability setup.
{: .comment}

In a _high availability_ setup, multiple redundant copies of Galaxy run simultaneously behind a load balancer to minimize downtime and service interruptions.

As explained in [one of the previous sections](#installing-the-interactive-tools-proxy), the Galaxy Interactive Tools Proxy redirects requests to each Interactive Tool's host and port. By default, the mapping of GxIT invocations to their corresponding host/port is kept in a SQLite database known as the _Interactive Tools Session Map_.

By design, [SQLite is the wrong choice for high availability setups][sqlite_situations_where_a_client_server_rdbms_may_work_better]. The showstopper is that the SQLite database file would have to be shared over a network filesystem, and network filesystems usually have latencies that are too high for RDBMS use. For this reason, Galaxy and the Interactive Tools Proxy can also store the **Session Map in a PostgreSQL database**.

[sqlite_situations_where_a_client_server_rdbms_may_work_better]: https://www.sqlite.org/whentouse.html#situations_where_a_client_server_rdbms_may_work_better

> <hands-on-title>Preparing the database</hands-on-title>
>
> First, you need to create a database for the Interactive Tools Proxy.
>
> > <warning-title></warning-title>
> > Do **not** use the Galaxy database for this purpose. The main Galaxy database is reserved for Galaxy's core functionality, and Interactive Tools have not yet reached this stage. Since Galaxy does not expect to find the Interactive Tools Session Map in this database, storing it there can lead to errors.
> {: .warning }
>
> <br>
>
> 1. Access PostgreSQL **on your database server**.
>
> > <code-in-title>Bash</code-in-title>
> > ```bash
> > sudo -iu postgres psql
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
> > Type "help" for help.
> >
> > postgres=#
> > ```
> {: .code-out}
>
> 2. Create a `gxitproxy` database to store the Interactive Tools Session Map.
>
> > <code-in-title>SQL</code-in-title>
> > ```sql
> > CREATE DATABASE gxitproxy;
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > CREATE DATABASE
> > ```
> {: .code-out}
>
> 3. For simplicity, the same user that operates on the Galaxy main database, typically named `galaxy`, is also going to operate on this one. Make this user the owner of the new database.
>
> > <code-in-title>SQL</code-in-title>
> > ```sql
> > ALTER DATABASE gxitproxy OWNER TO galaxy;
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > ALTER DATABASE
> > ```
> {: .code-out}
>
> 4. Sign out of the `postgres` database using `exit` (on psql 10 and older, where `exit` is not recognized, use `\q` instead). Then connect to the `gxitproxy` database as `galaxy`.
Member

You could replace steps 1+2+3 with the single CLI command `sudo -u postgres createdb -O galaxy gxitproxy`, which would avoid students potentially missing the "logout and log in again" steps.

Contributor Author

Thanks, that makes the hands-on significantly shorter too.

Addressed in 72e272f.

>
> > <code-in-title>SQL</code-in-title>
> > ```sql
> > exit
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-in-title>Bash</code-in-title>
> > ```bash
> > sudo -iu galaxy psql -d gxitproxy
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1))
> > Type "help" for help.
> >
> > gxitproxy=#
> > ```
> {: .code-out}
>
> 5. Create a `gxitproxy` table in the new database.
>
> > <code-in-title>SQL</code-in-title>
> > ```sql
> > CREATE TABLE IF NOT EXISTS gxitproxy (key TEXT, key_type TEXT, token TEXT, host TEXT, port INTEGER, info TEXT, PRIMARY KEY (key, key_type));
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > CREATE TABLE
> > ```
> {: .code-out}
>
> This is enough to let Galaxy and the Interactive Tool Proxy store the Interactive Tools Session Map in PostgreSQL. But there is a catch: when the Interactive Tool Proxy uses SQLite, it knows the database has changed because it watches the file for changes. When using Postgres, this mechanism is not available. By default, the proxy simply polls the database at regular intervals. To let the user access interactive tools as fast as possible, the proxy can also be notified of updates via [PostgreSQL asynchronous notifications](https://www.postgresql.org/docs/16/libpq-notify.html). To enable them, you have to create a PostgreSQL trigger that sends a NOTIFY message to the channel `gxitproxy` every time the table `gxitproxy` changes.
>
> <br>
>
> {:start="6"}
> 6. Run the following commands to create a function that sends a NOTIFY message to the channel `gxitproxy` and a trigger that runs the function every time the table `gxitproxy` changes.
kysrpex marked this conversation as resolved.
>
> > <code-in-title>SQL</code-in-title>
> > ```sql
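> > CREATE OR REPLACE FUNCTION notify_gxitproxy()
> > RETURNS trigger AS $$
> > BEGIN
> >     PERFORM pg_notify('gxitproxy', 'Table "gxitproxy" changed');
> >     RETURN NEW;
> > END;
> > $$ LANGUAGE plpgsql;
> >
> > -- EXECUTE PROCEDURE is accepted by every PostgreSQL version, including the 10.x shown above;
> > -- EXECUTE FUNCTION is the equivalent spelling available from PostgreSQL 11 onwards.
> > CREATE TRIGGER gxitproxy_notify
> > AFTER INSERT OR UPDATE OR DELETE ON gxitproxy
> > FOR EACH ROW EXECUTE PROCEDURE notify_gxitproxy();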
> > ```
> > {: data-cmd="true"}
> {: .code-in}
>
> > <code-out-title>SQL</code-out-title>
> > ```
> > CREATE FUNCTION
> > CREATE TRIGGER
> > ```
> {: .code-out}
>
{: .hands_on}
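
Before moving on, it can be worth checking that the trigger really emits notifications. The following is a minimal sketch, not part of the proxy or of Galaxy: the function name `gxitproxy_smoke_test_sql` and the `smoke-test` row are made up for illustration. It relies on psql reporting any notification received by a session that has issued `LISTEN`.

```shell
# Sketch only: prints SQL that subscribes to the `gxitproxy` channel and then
# inserts and deletes a throwaway row so that the trigger fires twice.
# The 'smoke-test' values are placeholders, not something Galaxy would write.
gxitproxy_smoke_test_sql() {
    cat <<'SQL'
LISTEN gxitproxy;
INSERT INTO gxitproxy (key, key_type, token, host, port, info)
    VALUES ('smoke-test', 'smoke-test', 'none', 'localhost', 0, '');
DELETE FROM gxitproxy WHERE key = 'smoke-test' AND key_type = 'smoke-test';
SQL
}

# Usage (on the database server, connecting the same way as in step 4):
#   gxitproxy_smoke_test_sql | sudo -iu galaxy psql -d gxitproxy
```

If the trigger is in place, psql should print a line beginning with `Asynchronous notification "gxitproxy"` after both the `INSERT` and the `DELETE`. Notifications are only delivered on commit, so the statements are deliberately not wrapped in a transaction that is rolled back. Running this before Galaxy and the proxy are pointed at the database keeps the throwaway row out of a live session map.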

The next step is configuring Galaxy and the Interactive Tool Proxy to use the new database.

> <hands-on-title>Configure Galaxy and the Interactive Tool Proxy</hands-on-title>
>
> 1. Adjust your `group_vars/galaxyservers.yml` file as follows.
>
> {% raw %}
> ```yaml
> # ... existing configuration options ... #
>
> galaxy_config:
>   galaxy:
>     # ... existing configuration options in the `galaxy` section ...
>     # interactivetools_map: "{{ gie_proxy_sessions_path }}" # comment, remove or leave this line in place (it will be overridden by the option below)
>     interactivetools_map_sqlalchemy: "{{ gie_proxy_sessions_path }}"
>     # ... other existing configuration options in the `galaxy` section ...
>
> # ... other existing configurations ... #
>
> gie_proxy_sessions_path: "postgresql:///gxitproxy?host=/var/run/postgresql"
> ```
> {% endraw %}
>
> 2. Run the playbook:
>
> ```
> ansible-playbook galaxy.yml
> ```
>
{: .hands_on}

That's it: once the playbook run is complete, both Galaxy and the Interactive Tools Proxy will store the Interactive Tools Session Map in PostgreSQL.
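
To confirm that sessions are landing in PostgreSQL, you can launch an Interactive Tool and then look at the table. The sketch below is only an illustration: the variable `GXIT_DB_URL` and the function `list_gxit_sessions` are hypothetical names. It assumes that psql accepts the same connection URL as the one set in `gie_proxy_sessions_path`, and that PostgreSQL is reachable through the local Unix socket that URL points to.

```shell
# Same URL as gie_proxy_sessions_path in group_vars/galaxyservers.yml.
GXIT_DB_URL="postgresql:///gxitproxy?host=/var/run/postgresql"

# List the sessions currently stored in the Interactive Tools Session Map.
# psql accepts a connection URI in place of a database name.
list_gxit_sessions() {
    sudo -iu galaxy psql "$GXIT_DB_URL" \
        -c 'SELECT key, key_type, host, port FROM gxitproxy;'
}

# Usage (on the Galaxy server, after launching an Interactive Tool):
#   list_gxit_sessions
```

An empty result while an Interactive Tool is running would suggest that Galaxy is still writing to the SQLite file, in which case the `interactivetools_map_sqlalchemy` setting is the first thing to re-check.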

# Final Notes
As mentioned at the beginning of this tutorial, Galaxy Interactive Tools are a relatively new and rapidly evolving feature. At the time of writing, there is no official documentation for Interactive Tools. Please watch the [Galaxy Release Notes][galaxy-release-notes] for updates, changes, new documentation, and bug fixes.
hexylena marked this conversation as resolved.
[galaxy-release-notes]: https://docs.galaxyproject.org/en/master/releases/index.html