Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improper type coercion in WDL scatter with conditional #5055

Closed
stxue1 opened this issue Aug 9, 2024 · 0 comments · Fixed by #5101
Closed

Improper type coercion in WDL scatter with conditional #5055

stxue1 opened this issue Aug 9, 2024 · 0 comments · Fixed by #5101

Comments

@stxue1
Copy link
Contributor

stxue1 commented Aug 9, 2024

This WDL workflow (from the WDL unit tests) does not work with toil-wdl-runner, but does in miniwdl.

version 1.1

task tail {
  input {
    Pair[File, Int] to_tail
  }

  command <<<
  tail -n ~{to_tail.right} ~{to_tail.left}
  >>>

  output {
    Array[String] lines = read_lines(stdout())
  }
}

workflow serde_pair {
  input {
    Map[File, Int] to_tail
  }

  scatter (item in as_pairs(to_tail)) {
    call tail {
      input: to_tail = item
    }
    Pair[String, String]? two_lines = 
      if item.right >= 2 then (tail.lines[0], tail.lines[1]) else None
  }

  output {
    Map[String, String] tails_of_two = as_map(select_all(two_lines))
  }
}

Example call:

toil-wdl-runner /home/heaucques/Documents/wdl-conformance-tests/unit_tests/serde_pair.wdl -i '{"serde_pair.to_tail":{"unit_tests/data/cities.txt":2,"unit_tests/data/hello.txt":1}}'
[2024-08-09T12:12:30-0700] [MainThread] [C] [toil.wdl.wdltoil] Could not evaluate outputs because:
	
	🚨🚨🚨
	(/home/heaucques/Documents/wdl-conformance-tests/unit_tests/serde_pair.wdl Ln 31 Col 40) 
	🚨🚨🚨
	
	[2024-08-09T12:12:30-0700] [MainThread] [C] [toil.worker] Worker crashed with traceback:
	Traceback (most recent call last):
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 124, in eval
	    ans = self._eval(env, stdlib)
	          ^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 1142, in _eval
	    return f(self, env, stdlib)
	           ^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 232, in __call__
	    return self._call_eager(expr, [arg.eval(env, stdlib=stdlib) for arg in expr.arguments])
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 1147, in _call_eager
	    collected = super()._call_eager(expr, arguments)
	                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/StdLib.py", line 1116, in _call_eager
	    assert isinstance(pairty, Type.Pair)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	AssertionError
	
	The above exception was the direct cause of the following exception:
	
	Traceback (most recent call last):
	  File "/home/heaucques/Documents/toil/src/toil/worker.py", line 439, in workerScript
	    job._runner(jobGraph=None, jobStore=job_store, fileStore=fileStore, defer=defer)
	  File "/home/heaucques/Documents/toil/src/toil/job.py", line 2984, in _runner
	    returnValues = self._run(jobGraph=None, fileStore=fileStore)
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/job.py", line 2895, in _run
	    return self.run(fileStore)
	           ^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 147, in decorated
	    return decoratee(*args, **kwargs)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 3269, in run
	    output_bindings = evaluate_output_decls(self._workflow.outputs, unwrap(self._bindings), standard_library)
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 641, in evaluate_output_decls
	    output_value = evaluate_decl(output_decl, all_bindings, standard_library)
	                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 1253, in evaluate_decl
	    return evaluate_named_expression(node, node.name, node.type, node.expr, environment, stdlib)
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/src/toil/wdl/wdltoil.py", line 1234, in evaluate_named_expression
	    value = expression.eval(environment, stdlib)
	            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	  File "/home/heaucques/Documents/toil/venv3.12/lib/python3.12/site-packages/WDL/Expr.py", line 130, in eval
	    raise Error.EvalError(self, str(exn)) from exn
	WDL.Error.EvalError

I think I tracked it down to an issue with type coercion in the WDLArrayBindingsJob:
We keep track of all observed types bound to a name:

toil/src/toil/wdl/wdltoil.py

Lines 3110 to 3114 in 7d4d074

observed_types = []
for env in new_bindings:
binding_type = env.resolve(name).type if env.has_binding(name) else None
if binding_type not in observed_types:
observed_types.append(binding_type)

In this workflow, there are 2 types that are detected:
observed_types: [Pair[String,String], Any?]

As a result, the supertype becomes Any

supertype: WDL.Type.Base = get_supertype(observed_types)

The two detected types are the result of the two scatter jobs (technically WDLWorkflowNodeJob's) created:

gather_job = WDLArrayBindingsJob([j.rv() for j in scatter_jobs], bindings, wdl_options=self._wdl_options)

The input ({"serde_pair.to_tail":{"unit_tests/data/cities.txt":2,"unit_tests/data/hello.txt":1}}) has two elements which creates 2 scatter jobs according to the workflow.

Each one of those scatter jobs get one of the input elements, either "unit_tests/data/cities.txt":2 or "unit_tests/data/hello.txt":1. According to the computed expression: one of the workflows will compute the value ("Chicago","Piscataway") of type Pair[String,String] while the other will compute None with type Any?. This is because item.right >= 2 is only true in one of the jobs.

The proper type should be Pair[String, String]?, and the second computed value of None should not coerce the type to Any?. Maybe instead of coercing to the supertype of the observed types, we should attempt to coerce to the written type in the WDL workflow? Alternatively, we could carry through the optional context, but that would require a lot of retooling as miniwdl seems to have different type representations for a WDL value vs a WDL type.

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1628

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant