Splitter unable to resolve combination of outer and element-wise products ([a x b] * c)

# What version of Pydra are you using?

```python
> pydra.__version__
'0.18'
```
also, interactive mode via `jupyter notebook`.

# What were you trying to do?

I am trying to combine outer and element-wise products in a task or workflow splitter, where the result of an outer product is combined in an element-wise product with a single variable. In `pydra` syntax, this is: `my_task.split((["a","b"],"c"))`. This can either happen, when each of the variable is explicitly defined by the user, or when the result of a previous split is recombined to do some group analysis and then split again the same way. 

# What did you expect will happen?

I expect that the splitter `mytask.split((["a","b"],"c"))` works for one or both of two variants to define the variables:

### Variant 1: c is a one-dimensional list of the same length as (a x b) (flattened)

```python
a = [1,2]  # len(a) == 2
b = [3,4,5]  # len(b) == 3
c = [6,7,8,9,10,11]  # len(c) == 6 == 2*3
```

### Variant 2: c is a list of the same shape as (a x b)

```python
a = [1,2]  # len(a) = 2
b = [3,4,5]  # len(b) = 3
c = [[6,7,8],[9,10,11]]  # shape(c) == (2,3) == shape(itertools.product(a,b))
```

Variant 1 would actually be preferred, since it corresponds to the pure-python version of this product:

```python
> list(zip(product(a, b), c))
[((1, 3), 6),
 ((1, 4), 7),
 ((1, 5), 8),
 ((2, 3), 9),
 ((2, 4), 10),
 ((2, 5), 11)]
```


# What actually happened?

In both cases, `pydra` failed to perform the split. Variant 1 raises a `ValueError` (correctly) pointing out, that the shapes of the two sides of the element-wise product don't agree. (Could be improved to point out the shapes that it is comparing, but probably works as intended.) Variant 2 throws the same error, although the shapes should actually agree. 

# Can you replicate the behavior? If yes, how?

Results from an interactive `jupyter notebook`:

```python
# Preamble
import nest_asyncio
nest_asyncio.apply()
import pydra

@pydra.mark.task
@pydra.mark.annotate({"return": {"out_number": float,
                           }})
def sum_of_three(a: float = 0, b: float = 0, c: float = 0):
    result = a + b + c
    print(f"{a} + {b} + {c} = {result}")
    return result
```

### Example 1: Working with no split

```python
my_sum = sum_of_three(a=1,
                     b=2,
                     c=3)
my_sum()
my_sum.result()
```

Result:
```python
1 + 3 + 5 = 9
Result(output=Output(out_number=9), runtime=None, errored=False)
```

### Example 2: Working with all inner products

```python
my_sum = sum_of_three(a= [1,2],
                     b=[3,4],
                     c=[5,6,7,8])
my_sum.split(["a","b","c"])
my_sum()
my_sum.result()
```

Result:
```python
1 + 3 + 6 = 10
2 + 4 + 5 = 11
2 + 3 + 5 = 10
... skipping 5 results
[Result(output=Output(out_number=9), runtime=None, errored=False),
 Result(output=Output(out_number=10), runtime=None, errored=False),
 Result(output=Output(out_number=10), runtime=None, errored=False),
... skipping 5 results]
```

### Example 3: Failing when length matches but not shape

```python
my_sum = sum_of_three(a=[1,2],
                     b=[3,4,5],
                     c= [6,7,8,9,10,11])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()
```

Result:

```python
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3684992/2447886248.py in <module>
      3                      c= [6,7,8,9,10,11])
      4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
      6 my_sum.result()

# [... skipping 8 entries]

/.../site-packages/pydra/engine/state.py in splits(self, splitter_rpn)
   1024                     if shape_L != shape_R:
   1025                         raise ValueError(
-> 1026                             f"Operands {term_R} and {term_L} do not have same shape"
   1027                         )
   1028                     newshape = shape_R

ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7fe29d9be960>, (2, 3)) do not have same shape
```

This version seems to work as intended. However, I would argue, that the actual shape of the outer product `[a,b]` is not `(2,3)` but rather consider it as a parameter vector of length 6 (and 2 parameters per entry, but that shouldn't matter), which is thus the same shape or rather length as the parameter vector `c`. As I pointed out above, this corresponds to how the built-in function for element-wise recombinations `zip` works for this example. The following produces the desired combination of parameters:

```python
> list(zip(product([1,2], [3,4,5]), [6,7,8,9,10,11]))
[((1, 3), 6),
 ((1, 4), 7),
 ((1, 5), 8),
 ((2, 3), 9),
 ((2, 4), 10),
 ((2, 5), 11)]
```

### Example 4: Failing although shapes are correct.

```python
my_sum = sum_of_three(a=[1,2],
                     b=[3,4,5],
                     c=[[6,7,8],[9,10,11]])
my_sum.split((["a","b"],"c"))
my_sum()
my_sum.result()
```
Result (same error):
```python
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_3690938/1846468932.py in <module>
      3                      c=[[6,7,8],[9,10,11]])
      4 my_sum.split((["a","b"],"c"))
----> 5 my_sum()
      6 my_sum.result()

# [... skipping 8 entries]

/.../lib/python3.9/site-packages/pydra/engine/helpers_state.py in splits(splitter_rpn, inputs, inner_inputs, cont_dim)
    486             if token == ".":
    487                 if shape["L"] != shape["R"]:
--> 488                     raise ValueError(
    489                         "Operands {} and {} do not have same shape.".format(
    490                             terms["R"], terms["L"]

ValueError: Operands sum_of_three.c and (<itertools.product object at 0x7f9fabaa3c80>, (2, 3)) do not have same shape.
```

Curiously, the shape test fails again, although they should be the same. The pure-python interpretation of this code fails to produce the desired output, which makes sense, because `zip` projects along the first dimension only (!):

```python
> list(zip(product([1,2], [3,4,5]), [[6,7,8],[9,10,11]]))
[((1, 3), [6, 7, 8]), ((1, 4), [9, 10, 11])]
```

Here the first two entries of the product `[a,b]` are broadcast together with the two internal lists of `c`. 

# Conclusion

In my opinion, it makes the most sense to implement the element-wise splitter product the same way as `zip` works. This feels the most intuitive and corresponds to parameter variant 1/code example 3.

# Preliminary Implementation Thoughts

The function `pydra.engine.helpers_state.input_shape` is used to compute the shape that is supposed to be compared. It is an interesting starting point, because 

1. It explicitly states `# TODO: have to be changed for inner splitter (sometimes different length)`
2. It seems unnecessarily complicated. In my opinion, it makes sense to simply replace the entire function with `numpy.shape` and then think about how the outer product and element-wise (here: inner) product should be interpreted intuitively. In fact, I did exactly this as a dirty fix for the simple case, where a, b and c each contain only one element, but that alone does not solve the problem for larger lists.

The outer product already successfully use/mirror `itertools.product`. I think, the inner/element-wise product should simply be replaced by `zip` or a re-implementation of that behaviour and that's it. It might require some testing and careful thinking, but I recon, the entire split code in `pydra.engine.helpers_state` can be simplified a lot that way and become more stable at the same time. 

# Final thoughts 

As always, thanks a lot for the work you have put into `pydra`. I think it shows a lot of promise, but still needs some kinks to be ironed out for productive use as a `nipype` replacement. As I have gathered extensive experience with debugging function-based workflows in the current version of `pydra`, I would like to contribute some insights and hopefully code, if time allows. I would like to hear your feedback, though, before attempting to put my thoughts into code.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Splitter unable to resolve combination of outer and element-wise products ([a x b] * c) #532

What version of Pydra are you using?

What were you trying to do?

What did you expect will happen?

Variant 1: c is a one-dimensional list of the same length as (a x b) (flattened)

Variant 2: c is a list of the same shape as (a x b)

What actually happened?

Can you replicate the behavior? If yes, how?

Example 1: Working with no split

Example 2: Working with all inner products

Example 3: Failing when length matches but not shape

Example 4: Failing although shapes are correct.

Conclusion

Preliminary Implementation Thoughts

Final thoughts

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Splitter unable to resolve combination of outer and element-wise products ([a x b] * c) #532

Description

What version of Pydra are you using?

What were you trying to do?

What did you expect will happen?

Variant 1: c is a one-dimensional list of the same length as (a x b) (flattened)

Variant 2: c is a list of the same shape as (a x b)

What actually happened?

Can you replicate the behavior? If yes, how?

Example 1: Working with no split

Example 2: Working with all inner products

Example 3: Failing when length matches but not shape

Example 4: Failing although shapes are correct.

Conclusion

Preliminary Implementation Thoughts

Final thoughts

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions