Improper type coercion in WDL scatter with conditional #5055

stxue1 · 2024-08-09T19:34:20Z

This WDL workflow (from the WDL unit tests) does not work with toil-wdl-runner, but does in miniwdl.

version 1.1

task tail {
  input {
    Pair[File, Int] to_tail
  }

  command <<<
  tail -n ~{to_tail.right} ~{to_tail.left}
  >>>

  output {
    Array[String] lines = read_lines(stdout())
  }
}

workflow serde_pair {
  input {
    Map[File, Int] to_tail
  }

  scatter (item in as_pairs(to_tail)) {
    call tail {
      input: to_tail = item
    }
    Pair[String, String]? two_lines = 
      if item.right >= 2 then (tail.lines[0], tail.lines[1]) else None
  }

  output {
    Map[String, String] tails_of_two = as_map(select_all(two_lines))
  }
}

Example call:

toil-wdl-runner /home/heaucques/Documents/wdl-conformance-tests/unit_tests/serde_pair.wdl -i '{"serde_pair.to_tail":{"unit_tests/data/cities.txt":2,"unit_tests/data/hello.txt":1}}'

[2024-08-09T12:12:30-0700] [MainThread] [C] [toil.wdl.wdltoil] Could not evaluate outputs because:
	
	🚨🚨🚨
	(/home/heaucques/Documents/wdl-conformance-tests/unit_tests/serde_pair.wdl Ln 31 Col 40) 
	🚨🚨🚨
	
	[2024-08-09T12:12:30-0700] [MainThread] [C] [toil.worker] Worker crashed with traceback:
	Traceback (most recent call last):
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 124, in eval
	    ans = self._eval(env, stdlib)
	          ^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 1142, in _eval
	    return f(self, env, stdlib)
	           ^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 232, in __call__
	    return self._call_eager(expr, [arg.eval(env, stdlib=stdlib) for arg in expr.arguments])
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 1147, in _call_eager
	    collected = super()._call_eager(expr, arguments)
	                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 1116, in _call_eager
	    assert isinstance(pairty, Type.Pair)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	AssertionError
	
	The above exception was the direct cause of the following exception:
	
	Traceback (most recent call last):
	  File "/home/heaucques/Documents/toil/src/toil/worker.py", line 439, in workerScript
	    job._runner(jobGraph=None, jobStore=job_store, fileStore=fileStore, defer=defer)
	  File "/home/heaucques/Documents/toil/src/toil/job.py", line 2984, in _runner
	    returnValues = self._run(jobGraph=None, fileStore=fileStore)
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/job.py", line 2895, in _run
	    return self.run(fileStore)
	           ^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 147, in decorated
	    return decoratee(*args, **kwargs)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 3269, in run
	    output_bindings = evaluate_output_decls(self._workflow.outputs, unwrap(self._bindings), standard_library)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 641, in evaluate_output_decls
	    output_value = evaluate_decl(output_decl, all_bindings, standard_library)
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 1253, in evaluate_decl
	    return evaluate_named_expression(node, node.name, node.type, node.expr, environment, stdlib)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 1234, in evaluate_named_expression
	    value = expression.eval(environment, stdlib)
	            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 130, in eval
	    raise Error.EvalError(self, str(exn)) from exn
	WDL.Error.EvalError

I think I tracked it down to an issue with type coercion in the WDLArrayBindingsJob:
We keep track of all observed types bound to a name:

toil/src/toil/wdl/wdltoil.py

Lines 3110 to 3114 in 7d4d074

    
           observed_types = [] 
        
           for env in new_bindings: 
        
               binding_type = env.resolve(name).type if env.has_binding(name) else None 
        
               if binding_type not in observed_types: 
        
                   observed_types.append(binding_type)

In this workflow, there are 2 types that are detected:
observed_types: [Pair[String,String], Any?]

As a result, the supertype becomes Any

toil/src/toil/wdl/wdltoil.py

Line 3116 in 7d4d074

supertype: WDL.Type.Base = get_supertype(observed_types)

The two detected types are the result of the two scatter jobs (technically WDLWorkflowNodeJob's) created:

toil/src/toil/wdl/wdltoil.py

Line 3059 in 7d4d074

    
           gather_job = WDLArrayBindingsJob([j.rv() for j in scatter_jobs], bindings, wdl_options=self._wdl_options)

The input ({"serde_pair.to_tail":{"unit_tests/data/cities.txt":2,"unit_tests/data/hello.txt":1}}) has two elements which creates 2 scatter jobs according to the workflow.

Each one of those scatter jobs get one of the input elements, either "unit_tests/data/cities.txt":2 or "unit_tests/data/hello.txt":1. According to the computed expression: one of the workflows will compute the value ("Chicago","Piscataway") of type Pair[String,String] while the other will compute None with type Any?. This is because item.right >= 2 is only true in one of the jobs.

The proper type should be Pair[String, String]?, and the second computed value of None should not coerce the type to Any?. Maybe instead of coercing to the supertype of the observed types, we should attempt to coerce to the written type in the WDL workflow? Alternatively, we could carry through the optional context, but that would require a lot of retooling as miniwdl seems to have different type representations for a WDL value vs a WDL type.

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1628

The text was updated successfully, but these errors were encountered:

stxue1 mentioned this issue Sep 20, 2024

Change supertype calculation when combining scatters #5101

Merged

19 tasks

adamnovak linked a pull request Sep 24, 2024 that will close this issue

Change supertype calculation when combining scatters #5101

Merged

19 tasks

adamnovak closed this as completed in #5101 Sep 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improper type coercion in WDL scatter with conditional #5055

Improper type coercion in WDL scatter with conditional #5055

stxue1 commented Aug 9, 2024 •

edited by unito-bot

Loading

Improper type coercion in WDL scatter with conditional #5055

Improper type coercion in WDL scatter with conditional #5055

Comments

stxue1 commented Aug 9, 2024 • edited by unito-bot Loading

stxue1 commented Aug 9, 2024 •

edited by unito-bot

Loading