[treeplayer] Fix TTreeFormula vector index into vector-of-vectors#22601
[treeplayer] Fix TTreeFormula vector index into vector-of-vectors#22601guitargeek wants to merge 1 commit into
Conversation
Test Results 21 files 21 suites 3d 8h 7m 44s ⏱️ For more details on these failures, see this check. Results for commit 37e4805. ♻️ This comment has been updated with latest results. |
| TEST_F(GH19290, NoRegressions) | ||
| { | ||
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[0]"), (std::vector<double>{2, 1})); | ||
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[1]"), (std::vector<double>{3, 4})); |
There was a problem hiding this comment.
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[1]"), (std::vector<double>{3, 4})); | |
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[1]"), (std::vector<double>{...})); | |
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[0][1]"), (std::vector<double>{...})); | |
| EXPECT_EQ(GH19290EvalAll(fTree, "vv1[1][1]"), (std::vector<double>{...})); |
for good measures
When a vector-type branch was used as the index of a vector-of-vectors
branch, TTreeFormula computed the wrong number of instances and emitted
out-of-bounds errors. For example, with v2 a vector<int> used as an index:
vv1[v2][0] looped over the summed size of all sub-vectors of vv1
instead of the length of v2, printing a spurious extra
instance and "index out of bound" errors.
vv1[0][v2] silently dropped its last instance.
Two issues in the multi-variable-dimension machinery:
1. DefineDimensions registered the inner jagged dimension as a variable
dimension even when the outer dimension is selected through a variable
"gather" index (-2) and the inner dimension is pinned to a constant.
In that configuration the formula does not iterate over the jagged
sub-dimension, so the manager must not sum the physical sub-sizes; the
number of instances is simply the length of the index variable.
2. GetRealInstance performed a premature out-of-bounds check against the
physical sub-size for a variable inner index (-2), using the raw
instance number (the index of the index variable) rather than the
evaluated index. The real bounds are already checked after evaluating
the index variable.
Adds regression tests to treeplayer_regressions.
Closes root-project#19290 (originally JIRA ROOT-8726).
🤖 Done with the help of [Claude Code](https://claude.com/claude-code) (Claude Opus 4.8)
| // sub-sizes (e.g. vv1[v2][0] would loop over the total size of vv1 instead of | ||
| // the length of v2). So only register it when it actually varies during the | ||
| // iteration. See https://github.com/root-project/root/issues/19290 | ||
| bool pinnedInnerDuringGather = (fIndexes[code][0] == -2) && (fIndexes[code][fNdimensions[code]] >= 0); |
There was a problem hiding this comment.
This smell a little too specific. Can we verify whether we get the correct behavior for:
vv1[v2][v3]
There was a problem hiding this comment.
Well it doesn't work, but what do you expect to come out of it? This would first need a decision about the semantics.
Would you expect v2 and v3 to get zipped together to form index tuples (vectorized indexing)?
Like this, in pseudo code:
vv1 = [[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
v2 = [0, 1]
v3 = [0, 1]
vv1[v2][v3] resolves to vv1[[0, 1], [0, 1]], becoming [1 5]Or would you expect it to select slices (orthogonal indexing)?
vv1[v2][v3] resolves to [[1, 2],
[4, 5]]
Is the expected behavior documented anywhere? If not, and it given that this doesn't work before and after this PR, I think a) the fact that nobody requested it and b) the semantic ambiguity leads me to the conclusion that this is a candidate for a known limitation, and we should just make sure there is a proper error.
But if you pick a semantic and it turns out to be easy to implement, we can also try to go for that.
By the way, NumPy uses the index tuple convention for its bracket operator:
import numpy as np
vv1 = np.array([[1, 2, 3], [4, 5, 6]])
v2 = np.array([0, 1])
v3 = np.array([0, 1])
print(vv1[v2, 0])
print(vv1[0, v3])
print(vv1[v2, v3])[1 4]
[1 2]
[1 5]So if I'd get to pick, I would go for this meaning.
When a vector-type branch was used as the index of a vector-of-vectors branch, TTreeFormula computed the wrong number of instances and emitted out-of-bounds errors. For example, with v2 a vector used as an index:
vv1[v2][0] looped over the summed size of all sub-vectors of vv1
instead of the length of v2, printing a spurious extra
instance and "index out of bound" errors.
vv1[0][v2] silently dropped its last instance.
Two issues in the multi-variable-dimension machinery:
DefineDimensions registered the inner jagged dimension as a variable dimension even when the outer dimension is selected through a variable "gather" index (-2) and the inner dimension is pinned to a constant. In that configuration the formula does not iterate over the jagged sub-dimension, so the manager must not sum the physical sub-sizes; the number of instances is simply the length of the index variable.
GetRealInstance performed a premature out-of-bounds check against the physical sub-size for a variable inner index (-2), using the raw instance number (the index of the index variable) rather than the evaluated index. The real bounds are already checked after evaluating the index variable.
Adds regression tests to treeplayer_regressions.
Closes #19290 (originally JIRA ROOT-8726).
🤖 Done with the help of Claude Code (Claude Opus 4.8)