Skip to content

Splitter unable to resolve combination of outer and element-wise products ([a x b] * c) #532

Open
@dafrose

Description

@dafrose

What version of Pydra are you using?

> pydra.__version__
'0.18'

also, interactive mode via jupyter notebook.

What were you trying to do?

I am trying to combine outer and element-wise products in a task or workflow splitter, where the result of an outer product is combined in an element-wise product with a single variable. In pydra syntax, this is: my_task.split((["a","b"],"c")). This can either happen, when each of the variable is explicitly defined by the user, or when the result of a previous split is recombined to do some group analysis and then split again the same way.

What did you expect will happen?

I expect that the splitter mytask.split((["a","b"],"c")) works for one or both of two variants to define the variables:

Variant 1: c is a one-dimensional list of the same length as (a x b) (flattened)

a = [1,2]  # len(a) == 2
b = [3,4,5]  # len(b) == 3
c = [6,7,8,9,10,11]  # len(c) == 6 == 2*3

Variant 2: c is a list of the same shape as (a x b)

a = [1,2]  # len(a) = 2
b = [3,4,5]  # len(b) = 3
c = [[6,7,8],[9,10,11]]  # shape(c) == (2,3) == shape(itertools.product(a,b))

Variant 1 would actually be preferred, since it corresponds to the pure-python version of this product:

> list(zip(product(a, b), c))
[((1, 3), 6),
 ((1, 4), 7),
 ((1, 5), 8),
 ((2, 3), 9),
 ((2, 4), 10),
 ((2, 5), 11)]

What actually happened?

In both cases, pydra failed to perform the split. Variant 1 raises a ValueError (correctly) pointing out, that the shapes of the two sides of the element-wise product don't agree. (Could be improved to point out the shapes that it is comparing, but probably works as intended.) Variant 2 throws the same error, although the shapes should actually agree.

Can you replicate the behavior? If yes, how?

Results from an interactive jupyter notebook:

# Preamble
import nest_asyncio
nest_asyncio.apply()
import pydra

@pydra.mark.task
@pydra.mark.annotate({"return": {"out_number": float,
                           }})
def sum_of_three(a: float = 0, b: float = 0, c: float = 0):
    result = a + b + c
    print(f"{a} + {b} + {c} = {result}")
    return result

Example 1: Working with no split

my_sum = sum_of_three(a=1,
                     b=2,
                     c=3)
my_sum()
my_sum.result()

Result:

1 + 3 + 5 = 9
Result(output=Output(out_number=9), runtime=None, errored=False)

Example 2: Working with all inner products

my_sum = sum_of_three(a= [1,2],
                     b=[3,4],
                     c=[5,6,7,8])
my_sum.split(["a","b","c"])
my_sum()
my_sum.result()

Result:

1 + 3 + 6 = 10
2 + 4 + 5 = 11
2 + 3 + 5 = 10
... skipping 5 results
[Result(output=Output(out_number=9), runtime=None, errored=False),
 Result(output=Output(out_number=10), runtime=None, errored=False),
 Result(output=Output(out_number=10), runtime=None, errored=False),
... skipping 5 results]

Example 3: Failing when length matches but not shape

my_sum = sum_of_three(a=[1,2],
                     b=[3,4,5],
                     c= [6,7,8,9,10,11])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()

Result:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3684992/2447886248.py in <module>
      3                      c= [6,7,8,9,10,11])
      4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
      6 my_sum.result()

# [... skipping 8 entries]

/.../site-packages/pydra/engine/state.py in splits(self, splitter_rpn)
   1024                     if shape_L != shape_R:
   1025                         raise ValueError(
-> 1026                             f"Operands {term_R} and {term_L} do not have same shape"
   1027                         )
   1028                     newshape = shape_R

ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7fe29d9be960>, (2, 3)) do not have same shape

This version seems to work as intended. However, I would argue, that the actual shape of the outer product [a,b] is not (2,3) but rather consider it as a parameter vector of length 6 (and 2 parameters per entry, but that shouldn't matter), which is thus the same shape or rather length as the parameter vector c. As I pointed out above, this corresponds to how the built-in function for element-wise recombinations zip works for this example. The following produces the desired combination of parameters:

> list(zip(product([1,2], [3,4,5]), [6,7,8,9,10,11]))
[((1, 3), 6),
 ((1, 4), 7),
 ((1, 5), 8),
 ((2, 3), 9),
 ((2, 4), 10),
 ((2, 5), 11)]

Example 4: Failing although shapes are correct.

my_sum = sum_of_three(a=[1,2],
                     b=[3,4,5],
                     c=[[6,7,8],[9,10,11]])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()

Result (same error):

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3690938/1846468932.py in <module>
      3                      c=[[6,7,8],[9,10,11]])
      4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
      6 my_sum.result()

# [... skipping 8 entries]

/.../lib/python3.9/site-packages/pydra/engine/helpers_state.py in splits(splitter_rpn, inputs, inner_inputs, cont_dim)
    486             if token == ".":
    487                 if shape["L"] != shape["R"]:
--> 488                     raise ValueError(
    489                         "Operands {} and {} do not have same shape.".format(
    490                             terms["R"], terms["L"]

ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7f9fabaa3c80>, (2, 3)) do not have same shape.

Curiously, the shape test fails again, although they should be the same. The pure-python interpretation of this code fails to produce the desired output, which makes sense, because zip projects along the first dimension only (!):

> list(zip(product([1,2], [3,4,5]), [[6,7,8],[9,10,11]]))
[((1, 3), [6, 7, 8]), ((1, 4), [9, 10, 11])]

Here the first two entries of the product [a,b] are broadcast together with the two internal lists of c.

Conclusion

In my opinion, it makes the most sense to implement the element-wise splitter product the same way as zip works. This feels the most intuitive and corresponds to parameter variant 1/code example 3.

Preliminary Implementation Thoughts

The function pydra.engine.helpers_state.input_shape is used to compute the shape that is supposed to be compared. It is an interesting starting point, because

  1. It explicitly states # TODO: have to be changed for inner splitter (sometimes different length)
  2. It seems unnecessarily complicated. In my opinion, it makes sense to simply replace the entire function with numpy.shape and then think about how the outer product and element-wise (here: inner) product should be interpreted intuitively. In fact, I did exactly this as a dirty fix for the simple case, where a, b and c each contain only one element, but that alone does not solve the problem for larger lists.

The outer product already successfully use/mirror itertools.product. I think, the inner/element-wise product should simply be replaced by zip or a re-implementation of that behaviour and that's it. It might require some testing and careful thinking, but I recon, the entire split code in pydra.engine.helpers_state can be simplified a lot that way and become more stable at the same time.

Final thoughts

As always, thanks a lot for the work you have put into pydra. I think it shows a lot of promise, but still needs some kinks to be ironed out for productive use as a nipype replacement. As I have gathered extensive experience with debugging function-based workflows in the current version of pydra, I would like to contribute some insights and hopefully code, if time allows. I would like to hear your feedback, though, before attempting to put my thoughts into code.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

Status

v1.0

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions