Description
What version of Pydra are you using?
> pydra.__version__
'0.18'
also, interactive mode via jupyter notebook
.
What were you trying to do?
I am trying to combine outer and element-wise products in a task or workflow splitter, where the result of an outer product is combined in an element-wise product with a single variable. In pydra
syntax, this is: my_task.split((["a","b"],"c"))
. This can either happen, when each of the variable is explicitly defined by the user, or when the result of a previous split is recombined to do some group analysis and then split again the same way.
What did you expect will happen?
I expect that the splitter mytask.split((["a","b"],"c"))
works for one or both of two variants to define the variables:
Variant 1: c is a one-dimensional list of the same length as (a x b) (flattened)
a = [1,2] # len(a) == 2
b = [3,4,5] # len(b) == 3
c = [6,7,8,9,10,11] # len(c) == 6 == 2*3
Variant 2: c is a list of the same shape as (a x b)
a = [1,2] # len(a) = 2
b = [3,4,5] # len(b) = 3
c = [[6,7,8],[9,10,11]] # shape(c) == (2,3) == shape(itertools.product(a,b))
Variant 1 would actually be preferred, since it corresponds to the pure-python version of this product:
> list(zip(product(a, b), c))
[((1, 3), 6),
((1, 4), 7),
((1, 5), 8),
((2, 3), 9),
((2, 4), 10),
((2, 5), 11)]
What actually happened?
In both cases, pydra
failed to perform the split. Variant 1 raises a ValueError
(correctly) pointing out, that the shapes of the two sides of the element-wise product don't agree. (Could be improved to point out the shapes that it is comparing, but probably works as intended.) Variant 2 throws the same error, although the shapes should actually agree.
Can you replicate the behavior? If yes, how?
Results from an interactive jupyter notebook
:
# Preamble
import nest_asyncio
nest_asyncio.apply()
import pydra
@pydra.mark.task
@pydra.mark.annotate({"return": {"out_number": float,
}})
def sum_of_three(a: float = 0, b: float = 0, c: float = 0):
result = a + b + c
print(f"{a} + {b} + {c} = {result}")
return result
Example 1: Working with no split
my_sum = sum_of_three(a=1,
b=2,
c=3)
my_sum()
my_sum.result()
Result:
1 + 3 + 5 = 9
Result(output=Output(out_number=9), runtime=None, errored=False)
Example 2: Working with all inner products
my_sum = sum_of_three(a= [1,2],
b=[3,4],
c=[5,6,7,8])
my_sum.split(["a","b","c"])
my_sum()
my_sum.result()
Result:
1 + 3 + 6 = 10
2 + 4 + 5 = 11
2 + 3 + 5 = 10
... skipping 5 results
[Result(output=Output(out_number=9), runtime=None, errored=False),
Result(output=Output(out_number=10), runtime=None, errored=False),
Result(output=Output(out_number=10), runtime=None, errored=False),
... skipping 5 results]
Example 3: Failing when length matches but not shape
my_sum = sum_of_three(a=[1,2],
b=[3,4,5],
c= [6,7,8,9,10,11])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()
Result:
ValueError Traceback (most recent call last)
/tmp/ipykernel_3684992/2447886248.py in <module>
3 c= [6,7,8,9,10,11])
4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
6 my_sum.result()
# [... skipping 8 entries]
/.../site-packages/pydra/engine/state.py in splits(self, splitter_rpn)
1024 if shape_L != shape_R:
1025 raise ValueError(
-> 1026 f"Operands {term_R} and {term_L} do not have same shape"
1027 )
1028 newshape = shape_R
ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7fe29d9be960>, (2, 3)) do not have same shape
This version seems to work as intended. However, I would argue, that the actual shape of the outer product [a,b]
is not (2,3)
but rather consider it as a parameter vector of length 6 (and 2 parameters per entry, but that shouldn't matter), which is thus the same shape or rather length as the parameter vector c
. As I pointed out above, this corresponds to how the built-in function for element-wise recombinations zip
works for this example. The following produces the desired combination of parameters:
> list(zip(product([1,2], [3,4,5]), [6,7,8,9,10,11]))
[((1, 3), 6),
((1, 4), 7),
((1, 5), 8),
((2, 3), 9),
((2, 4), 10),
((2, 5), 11)]
Example 4: Failing although shapes are correct.
my_sum = sum_of_three(a=[1,2],
b=[3,4,5],
c=[[6,7,8],[9,10,11]])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()
Result (same error):
ValueError Traceback (most recent call last)
/tmp/ipykernel_3690938/1846468932.py in <module>
3 c=[[6,7,8],[9,10,11]])
4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
6 my_sum.result()
# [... skipping 8 entries]
/.../lib/python3.9/site-packages/pydra/engine/helpers_state.py in splits(splitter_rpn, inputs, inner_inputs, cont_dim)
486 if token == ".":
487 if shape["L"] != shape["R"]:
--> 488 raise ValueError(
489 "Operands {} and {} do not have same shape.".format(
490 terms["R"], terms["L"]
ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7f9fabaa3c80>, (2, 3)) do not have same shape.
Curiously, the shape test fails again, although they should be the same. The pure-python interpretation of this code fails to produce the desired output, which makes sense, because zip
projects along the first dimension only (!):
> list(zip(product([1,2], [3,4,5]), [[6,7,8],[9,10,11]]))
[((1, 3), [6, 7, 8]), ((1, 4), [9, 10, 11])]
Here the first two entries of the product [a,b]
are broadcast together with the two internal lists of c
.
Conclusion
In my opinion, it makes the most sense to implement the element-wise splitter product the same way as zip
works. This feels the most intuitive and corresponds to parameter variant 1/code example 3.
Preliminary Implementation Thoughts
The function pydra.engine.helpers_state.input_shape
is used to compute the shape that is supposed to be compared. It is an interesting starting point, because
- It explicitly states
# TODO: have to be changed for inner splitter (sometimes different length)
- It seems unnecessarily complicated. In my opinion, it makes sense to simply replace the entire function with
numpy.shape
and then think about how the outer product and element-wise (here: inner) product should be interpreted intuitively. In fact, I did exactly this as a dirty fix for the simple case, where a, b and c each contain only one element, but that alone does not solve the problem for larger lists.
The outer product already successfully use/mirror itertools.product
. I think, the inner/element-wise product should simply be replaced by zip
or a re-implementation of that behaviour and that's it. It might require some testing and careful thinking, but I recon, the entire split code in pydra.engine.helpers_state
can be simplified a lot that way and become more stable at the same time.
Final thoughts
As always, thanks a lot for the work you have put into pydra
. I think it shows a lot of promise, but still needs some kinks to be ironed out for productive use as a nipype
replacement. As I have gathered extensive experience with debugging function-based workflows in the current version of pydra
, I would like to contribute some insights and hopefully code, if time allows. I would like to hear your feedback, though, before attempting to put my thoughts into code.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status