
increase memory of quast #1073

Open · wants to merge 9 commits into base: master

Conversation

@paulzierep (Contributor)

Some quast jobs take forever; I assume this is due to the low memory allowance (12 GB so far: https://github.com/galaxyproject/tpv-shared-database/blob/3be0403ffc960effd180c65fa0e2242dfe5e6aa9/tools.yml#L2121C1-L2123C12). Ideally, though, I would like to work on a solution similar to #881, if an admin can run the query for me.
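For reference, the straightforward fix would just raise the flat allocation in the shared database, along these lines (a minimal sketch; the tool-id pattern follows the style of the linked tools.yml, and the new value is an assumption, since finding the right value is the point of this PR):

tools:
  toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/.*:
    mem: 24  # hypothetical bump from the current 12 GB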

@mira-miracoli (Contributor) commented Jan 16, 2024

I hope this makes sense and helps (my last statistics course was 6 years ago):

(venv) stats@sn06:~/mira$ cat quast  |tail -n +2| awk '{print$15}'| grep -o '[0-9]*'| histogram.py --percentage --max 265                                                                    
# NumSamples = 34055; Min = 0.00; Max = 265.00
# 7136 values outside of min/max
# Mean = 127811.4829246806636323594204; Variance = 102472553574407.7959718309167; SD = 10122872.79256278; Median 34                                                                          
# each ∎ represents a count of 195
    0.0000 -    26.5000 [ 14676]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 43.09                                                                          
   26.5000 -    53.0000 [  8138]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 23.90
   53.0000 -    79.5000 [  1425]: ∎∎∎∎∎∎∎ 4.18
   79.5000 -   106.0000 [   753]: ∎∎∎ 2.21
  106.0000 -   132.5000 [   834]: ∎∎∎∎ 2.45
  132.5000 -   159.0000 [   275]: ∎ 0.81
  159.0000 -   185.5000 [   238]: ∎ 0.70
  185.5000 -   212.0000 [   246]: ∎ 0.72
  212.0000 -   238.5000 [   159]:  0.47
  238.5000 -   265.0000 [   175]:  0.51

using https://github.com/mira-miracoli/data_hacks/blob/patch-1/data_hacks/histogram.py

@paulzierep (Contributor, Author)

I don't find the time to properly set up the rules at the moment; can we merge this PR for now? @bgruening

@bgruening (Member)

Can you please try that: https://github.com/usegalaxy-eu/infrastructure-playbook/pull/1073/files#diff-ff91c17e82694a84945958b09ddc38e4535d1f99ee1fb0ed594a8cd4fecceca7R733

Not very smart, but better than allocating too much memory to every run.
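A sketch of the kind of input-size rule this points at (assuming TPV's input_size variable, in GB, is available in the rule expression; the threshold and memory value are illustrative, not measured):

rules:
  - id: quast_large_input
    if: input_size >= 5  # GB, illustrative threshold
    mem: 40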

@paulzierep (Contributor, Author)

> Can you please try that: https://github.com/usegalaxy-eu/infrastructure-playbook/pull/1073/files#diff-ff91c17e82694a84945958b09ddc38e4535d1f99ee1fb0ed594a8cd4fecceca7R733
>
> Not very smart, but better than allocating too much memory to every run.

You mean to couple the memory to the input size? I think the problem is that the deciding factor is mainly the content of the bacterial community (i.e. many species lead to high RAM usage, few species to low RAM usage) ...
The best option I see at the moment is to increase the memory for quast only if the meta option is used. Is it possible to make rules based on the tool parameters?
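(It turns out this is possible via TPV's helpers.job_args_match; the diff in this PR, quoted in the review comment below, does essentially this. A sketch, where the parameter path is an assumption about the QUAST wrapper:)

rules:
  - id: metagenome
    if: helpers.job_args_match(job, app, {'assembly': {'type': 'Metagenome'}})
    cores: 20
    mem: 80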

@paulzierep (Contributor, Author)

The problem with this tool is that memory usage is only weakly related to the input size, as shown in the stats figure.

[figure: input_vs_memory_quast, scatter plot of input size vs. memory usage]

Since all jobs that reported issues were related to metagenomic analysis, this simple approach of increasing the memory only for those jobs should work. Another input option that should be considered is co-assembly vs. non-co-assembly.
Is it possible to query specific tool parameters from the DB? Maybe, if an admin could help me out with this one, I could investigate whether the rules can be made more fine-grained for different parameters instead of inputs.

Another approach in the long run could be to add inputs to the tool that help to allocate memory, i.e. run Kraken first (or count the number of contigs) and then, based on these metrics, decide how much memory the job should get.

@paulzierep (Contributor, Author)

Can an admin check what I did wrong? I used https://github.com/galaxyproject/tpv-shared-database/blob/efd5b95033bb59fa66d2d5f0d0c43edce2a1c24b/tools.yml#L438 as a template.

@sanjaysrikakulam (Member)

> The problem with this tool is that memory usage is only weakly related to the input size, as shown in the stats figure.
>
> [figure: input_vs_memory_quast, scatter plot of input size vs. memory usage]
>
> Since all jobs that reported issues were related to metagenomic analysis, this simple approach of increasing the memory only for those jobs should work. Another input option that should be considered is co-assembly vs. non-co-assembly. Is it possible to query specific tool parameters from the DB? Maybe, if an admin could help me out with this one, I could investigate whether the rules can be made more fine-grained for different parameters instead of inputs.
>
> Another approach in the long run could be to add inputs to the tool that help to allocate memory, i.e. run Kraken first (or count the number of contigs) and then, based on these metrics, decide how much memory the job should get.

At an initial glance through the DB, I could not find a table that contains the job/tool parameters individually or explicitly. However, the job table contains every job's command_line. You can extract this data for your tool of interest and then look into it (it might be a tedious process; I will try to dig through the codebase to see whether the job parameters are stored explicitly anywhere).

SQL query:

select id, command_line from job where tool_id ilike '%toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/%';

You can add additional conditions, for example, to look for jobs where type metagenome was used (based on my understanding of the tool's source: if someone uses the type metagenome, metaquast is used; otherwise, quast).

SQL query:

select id, command_line from job where tool_id ilike '%toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/%' and command_line ilike '%meta%';

@sanjaysrikakulam (Member) commented Feb 1, 2024

OK, I found the table job_parameter, through which you can get the parameters for every job:

select * from job_parameter where job_id=<enter the job id here>;

I mapped the job ID to a different column of the job_parameter table earlier and got empty results (hence I didn't share the info about the table at first), but I just realized my mistake, so here is a solution.

Also, you can join the previously posted SQL query with this one:

select jp.* from job_parameter jp inner join job j on jp.job_id = j.id where j.tool_id ilike '%toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/%';

Comment on lines 738 to 741
- id: metagenome
if: helpers.job_args_match(job, app, {'assembly': {'type': 'Metagenome'}})
cores: 20
mem: 80
@mira-miracoli (Contributor) commented Feb 2, 2024

Could it be assembly.type?
(I just briefly had a look at the wrapper; I could also be wrong.)
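For reference, the two candidate shapes (which one matches depends on how helpers.job_args_match compares against the job's parameter dict; the capitalization of the value also needs checking against the wrapper):

# nested form, as in the diff:
if: helpers.job_args_match(job, app, {'assembly': {'type': 'Metagenome'}})
# flattened form, if the parameter is keyed as assembly.type:
if: helpers.job_args_match(job, app, {'assembly.type': 'Metagenome'})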
