@kevinrr888 (Member) commented Jan 7, 2026

This adds checks, whenever an iterator is added, that the given iterator does not conflict with any existing iterator. A conflict means the same name or the same priority. Iterators can be added in several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator checked for conflicts. This adds iterator conflict checks to:

  • Scanners at the time they are used
  • TableOperations.setProperty
  • TableOperations.modifyProperties
  • NewTableConfiguration.attachIterator
  • NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
  • NamespaceOperations.setProperty
  • NamespaceOperations.modifyProperties
  • CloneConfiguration.Builder.setPropertiesToSet

This also accounts for the several ways in which conflicts can arise:

  • Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
  • Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
  • Conflicts with default table iterators (if the table has them)
  • Adding the exact iterator already present should not fail
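
As a rough illustration of the rules listed above, the sketch below treats two iterators as conflicting when they share a name or a priority, unless they are identical in every respect. The `Iter` class and the `conflicts` helper are hypothetical stand-ins written for this example; they are not the `IteratorSetting` or `IteratorConfigUtil` code changed in this PR, and treating differing options as a conflict is an assumption.

```java
import java.util.List;
import java.util.Map;

final class IterConflictSketch {

  /** Hypothetical, simplified stand-in for an iterator definition. */
  static final class Iter {
    final String name;
    final int priority;
    final String className;
    final Map<String,String> options;

    Iter(String name, int priority, String className, Map<String,String> options) {
      this.name = name;
      this.priority = priority;
      this.className = className;
      this.options = options;
    }

    /** True only when every field matches, i.e. this is the exact same iterator. */
    boolean sameAs(Iter other) {
      return name.equals(other.name) && priority == other.priority
          && className.equals(other.className) && options.equals(other.options);
    }
  }

  /** Returns true if adding the candidate would clash with any existing iterator. */
  static boolean conflicts(List<Iter> existing, Iter candidate) {
    for (Iter e : existing) {
      if (e.sameAs(candidate)) {
        continue; // re-adding the exact iterator already present is not a conflict
      }
      if (e.name.equals(candidate.name) || e.priority == candidate.priority) {
        return true; // same name or same priority
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Iter vers = new Iter("vers", 20, "org.apache.accumulo.core.iterators.user.VersioningIterator",
        Map.of("maxVersions", "1"));
    List<Iter> existing = List.of(vers);

    // Different name, same priority as the existing iterator.
    System.out.println(conflicts(existing, new Iter("myFilter", 20, "example.MyFilter", Map.of())));
    // Exact same iterator as the existing one.
    System.out.println(conflicts(existing, vers));
  }
}
```

The same comparison would apply whether the existing iterators come from the table itself, from its namespace, or from the default table iterators.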

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of #6030

This commit adds checks, whenever an iterator is added, that the given iterator does not conflict with any existing iterator. A conflict means the same name or the same priority. Iterators can be added in several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator checked for conflicts. This commit adds iterator conflict checks to:
- Scanner.addScanIterator
- TableOperations.setProperty
- TableOperations.modifyProperties
- NewTableConfiguration.attachIterator

Note that this does not add conflict checks to NamespaceOperations.setProperty or NamespaceOperations.modifyProperties; these will be done in another commit.

This commit also accounts for the several ways in which conflicts can arise:
- Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
- Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
- Conflicts with default table iterators (if the table has them)
- Adding the exact iterator already present should not fail

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of apache#6030
Adds conflict checks to:
- NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
- NamespaceOperations.setProperty (check conflicts with namespace iterators and all tables in the namespace)
- NamespaceOperations.modifyProperties (check conflicts with namespace iterators and all tables in the namespace)

Adds new tests to IteratorConflictsIT to test the above
@kevinrr888 kevinrr888 added this to the 2.1.5 milestone Jan 7, 2026
@kevinrr888 kevinrr888 self-assigned this Jan 7, 2026

@kevinrr888 (Member Author) left a comment

From running sunny day tests and all the tests I have changed in this PR, I noticed that I unknowingly added new permission requirements to at least TableOperations.create() (new required permission ALTER_NAMESPACE) and Scanner.addScanIterator() (new required permission ALTER_TABLE). I imagine this is a blocker for these changes at this point, but let me know if it's not. I'll look into an alternative to avoid these permissions. See changes to ConditionalWriterIT, ScanIteratorIT, and ShellServerIT for examples of the failures I encountered.

Checks are now done server side as of cb2eccb, avoiding these permission requirements.

@kevinrr888 (Member Author) commented Jan 8, 2026

Transferring to WIP until I resolve #6040 (review)
Edit: Addressed

@kevinrr888 kevinrr888 marked this pull request as draft January 8, 2026 15:53

@ctubbsii (Member) commented Jan 8, 2026

Discussed iterator conflicts today, and here's a summary of some key points:

  1. Conflict within the config: In configuration, no two iterators at the same scope (scan, minc, majc) may have the same priority.
    • This applies only to the complete view of the TableConfiguration, with all inherited properties from parent configs (namespace, system, ...), so it is okay, for example, if a table config set at the namespace level is overridden in part at the table level, so that the one single iterator at that scope and priority has configuration that spans across two levels of the configs. What is important is that the resulting view of the TableConfiguration when trying to construct an iterator stack, will not show any two different iterators at the same scope with the same priority.
    • Checks could be in place when editing table/namespace configuration to ensure a priority isn't "doubled up". A user who wishes to replace iteratorA with iteratorB at the same priority would have to remove iteratorA before adding iteratorB, or would have to use modifyProperties to atomically mutate the properties to remove and add at the same time, in order to avoid an error. This alone, however, does not guarantee that there isn't a conflict. If iteratorA had been set at the namespaceN.tableT, but iteratorB was being added to namespaceN, we would have to check that there isn't a conflict with any of the tables in namespaceN. That's not exactly practical, so we may just want to check that there isn't a conflict at the level being modified, and rely on later checks when setting up the iterator stack to verify that there isn't a conflict overall.
    • Note: this probably would be easier to deconflict if our iterator configs used a different property key scheme that was more overrideable atomically, like table.<scope>.iterator.<priority>=class,opt1key=opt1val,opt2key=opt2val,.... so that it wouldn't be possible to have conflict between namespace and table configs, because one would fully override the other. But, that's not what we have today.
  2. Conflict in user-supplied iterators for a specific scan/compaction: No two iterators in a single client-initiated operation's settings may have the same priority.
    • This is for things like scan-time iterators set in the API for a scan, and passed over the RPC, rather than iterators set in configuration on the server-side.
    • This also applies to any other place where we might be able to specify iterators that aren't in the configuration (compactions, conditional mutations, etc.)
    • We could check for conflicts set on the scanner easily, but would have to rely on the server-side setting up the iterator stack to ensure no conflicts between the user iterators and those set on the table.
  3. Conflict between configured iterators and user-supplied iterators: The complete iterator stack for an operation may not have any iterators running at the same priority, regardless of whether it came from the configuration or from the client API/RPC request.
    • To address this, we can simply check the full iterator stack when it is being constructed on the server-side, and fail the operation if any priorities are reused, regardless of where they came from.
    • Alternatively, we could treat one as overriding another, but I don't think that's a very good idea.
    • As a follow-on improvement here, we could treat all configured iterators as higher priority than all client operation-specific (scan-time/compaction-time) iterators:
      1. Instead of ordering three configured iterators and two user-supplied iterators by priority alone, as in C1, U2, C3, U4, C5, we would instead order them as C1, C3, C5, U2, U4.
      2. This enables stronger security guarantees by preventing a user-supplied iterator from seeing data that is filtered out by an administrator-configured iterator.
      3. This prevents bugs that could be caused by a user-supplied iterator that transforms data in a way that a subsequent administrator-configured iterator won't be able to handle.
      4. This is a behavior change, and may break some people's (ill-advised) uses, but I think it is better overall.
      5. This would also open the possibility of having a cleaner client-side API, because you don't actually have to specify priority numbers on the client. Instead, clients only need to order user-supplied iterators with respect to other user-supplied iterators, and won't need a priority number to indicate a global ordering that includes the configured iterators for a table. So, we could have an API something like: scanner.map(iterator1).map(iterator2).map(iterator3).scan().
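
Two of the ideas in the summary above can be sketched in a few lines: rejecting a complete iterator stack that reuses a priority (point 3), and the proposed follow-on ordering that places every configured iterator ahead of every user-supplied one. The `Entry` class and both helper methods are hypothetical names used only for illustration; nothing here is taken from the Accumulo code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IterStackSketch {

  /** Hypothetical stack entry: a label, a priority, and whether it came from configuration. */
  static final class Entry {
    final String label;
    final int priority;
    final boolean configured;

    Entry(String label, int priority, boolean configured) {
      this.label = label;
      this.priority = priority;
      this.configured = configured;
    }
  }

  /** Fails if any priority appears more than once, regardless of where the iterator came from. */
  static void checkNoSharedPriorities(List<Entry> stack) {
    Set<Integer> seen = new HashSet<>();
    for (Entry e : stack) {
      if (!seen.add(e.priority)) {
        throw new IllegalArgumentException("priority " + e.priority + " is used more than once");
      }
    }
  }

  /** Orders configured iterators first, then user-supplied ones, each group by priority. */
  static List<String> configuredFirst(List<Entry> stack) {
    List<Entry> sorted = new ArrayList<>(stack);
    // false sorts before true, so negating "configured" puts configured entries first
    sorted.sort(Comparator.comparing((Entry e) -> !e.configured)
        .thenComparingInt(e -> e.priority));
    List<String> labels = new ArrayList<>();
    for (Entry e : sorted) {
      labels.add(e.label);
    }
    return labels;
  }

  public static void main(String[] args) {
    List<Entry> stack = List.of(new Entry("C1", 1, true), new Entry("U2", 2, false),
        new Entry("C3", 3, true), new Entry("U4", 4, false), new Entry("C5", 5, true));
    checkNoSharedPriorities(stack);
    System.out.println(configuredFirst(stack)); // [C1, C3, C5, U2, U4]
  }
}
```

With this ordering, priority numbers only need to be unique within each group, which is what would allow a client API without explicit priorities.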

- Moves the iterator conflict check for create table from client side to server side.
- Checking if iterators added to scanner conflict with those already set on the table moved from client side to server side.
- Adds iterator conflict checks to CloneConfiguration.Builder.setPropertiesToSet. This check is done server side.
- Adds testing to IteratorConflictsIT for CloneConfiguration.Builder.setPropertiesToSet
@kevinrr888 kevinrr888 marked this pull request as ready for review January 13, 2026 18:49

@keith-turner (Contributor)

@kevinrr888 why did you choose to do this work in 2.1 instead of in main? Seems there is a chance of introducing new bugs in scans or compactions. Also may make config that used to work stop working (that is probably a good thing overall as it can help detect existing problems, but could introduce temporary pain). I am not opposed to making this change in 2.1, but was just curious.

@kevinrr888 (Member Author)

@kevinrr888 why did you choose to do this work in 2.1 instead of in main? Seems there is a chance of introducing new bugs in scans or compactions. Also may make config that used to work stop working (that is probably a good thing overall as it can help detect existing problems, but could introduce temporary pain). I am not opposed to making this change in 2.1, but was just curious.

@keith-turner I had already started this work in 2.1 with #5990 thinking this was a one-off issue with NewTableConfiguration. I did not anticipate follow on work requiring changes in as many areas, so continued with 2.1 learning the scope of the issue as I went. I also thought this validation would be good to have in the earliest version possible since it is essentially a bug. I would be fine refactoring this for main if we think this is too risky or undesired for 2.1.

@keith-turner (Contributor)

I would be fine refactoring this for main if we think this is too risky or undesired for 2.1.

There are benefits and risks with this change. Maybe the best way to get the benefits and lower the risk is to make these changes only warn in 2.1 and fail in 4.0? That way things that were working in 2.1.4 and earlier do not blow up in 2.1.5, but still work and get a warning that the iterator config is not correct and could lead to non-deterministic behavior.

@kevinrr888 (Member Author)

make these changes only warn in 2.1 and fail in 4.0

This is good with me, I'll change this

Also fixed a bug where I was calling regex.matches(str) instead of
str.matches(regex)
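
The argument-order bug mentioned in this commit message is easy to reproduce with `java.lang.String.matches`, which is called on the input and takes the pattern as its argument. The regex below is a simplified, hypothetical pattern for this example and is not the `ITERATOR_PROP_REGEX` defined in the PR.

```java
final class MatchesOrderSketch {

  // Hypothetical pattern for keys shaped like table.iterator.<scope>.<name>
  static final String REGEX = "^table\\.iterator\\.(scan|minc|majc)\\.[^.]+$";

  /** Correct order: the input string is the receiver, the pattern is the argument. */
  static boolean correctOrder(String key) {
    return key.matches(REGEX);
  }

  /** Swapped order: the key is treated as the pattern and the regex text as the input. */
  static boolean swappedOrder(String key) {
    return REGEX.matches(key);
  }

  public static void main(String[] args) {
    String key = "table.iterator.scan.myFilter";
    System.out.println(correctOrder(key)); // the key satisfies the pattern
    System.out.println(swappedOrder(key)); // the regex text does not match the key used as a pattern
  }
}
```

Because both calls compile and return a boolean, the swapped form fails silently by simply never matching.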
Comment on lines -177 to -180
} catch (NumberFormatException e) {
throw new AccumuloException("Bad value for existing iterator setting: "
+ property.getKey() + "=" + property.getValue());
}

@keith-turner (Contributor) Jan 16, 2026

Seems like the new code is not translating this NumberFormatException. Is this change in exceptions going to ripple through to the client API? When we parse the props to IteratorSetting we could maintain these exceptions in the new code.

Member Author

Yeah, this does change the exceptions the client might receive. The client can receive an AccumuloException both before and after these changes, but two cases no longer produce one: parts.length != 2, and a priority that isn't a number. With my changes, I check properties against a regex (see new static vars in IteratorConfigUtil), ignoring those that don't match. So if parts.length != 2 or the priority isn't a number, it is quietly ignored as not being an iterator-related property.

I can add back AccumuloExceptions for these two cases if you think that is best.
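
For reference, the two checks discussed in this thread could be restored with something like the sketch below. It assumes the iterator property value has the form `priority,className`, and it throws `IllegalArgumentException` as a self-contained stand-in for `AccumuloException`; the class and method names are hypothetical.

```java
final class IterValueParseSketch {

  /**
   * Parses the priority out of an iterator property value of the assumed form
   * "priority,className", failing loudly instead of ignoring malformed values.
   */
  static int parsePriority(String key, String value) {
    String[] parts = value.split(",");
    if (parts.length != 2) {
      throw new IllegalArgumentException(
          "Bad value for existing iterator setting: " + key + "=" + value);
    }
    try {
      return Integer.parseInt(parts[0]);
    } catch (NumberFormatException e) {
      // translate the low-level exception so callers see one consistent failure type
      throw new IllegalArgumentException(
          "Bad value for existing iterator setting: " + key + "=" + value, e);
    }
  }

  public static void main(String[] args) {
    String key = "table.iterator.scan.vers";
    System.out.println(
        parsePriority(key, "20,org.apache.accumulo.core.iterators.user.VersioningIterator"));
    try {
      parsePriority(key, "twenty,example.MyFilter");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```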

Contributor

it is quietly ignored as not being an iterator-related property.

We should not quietly ignore invalid properties under the iterator prefix; this could be a sign of a typo. If it's a typo then the user could expect something to be working and it silently does not. Would be good to fail or warn for that. Made a comment elsewhere about the scope; it seems this new code and the old code both ignore invalid scopes. Not completely sure about this though.

kevinrr888 and others added 4 commits January 16, 2026 16:50
Iterator option conflicts were not considered in new impl of checkIteratorConflicts
Co-authored-by: Keith Turner <kturner@apache.org>
@kevinrr888 (Member Author)

I pushed 469a83e to have the newly added iterator conflict checks only log instead of throw in 2.1. In merging to main, all checks will throw.
Still need to update the new IteratorConflictsIT, which currently expects these checks to throw

@keith-turner (Contributor) left a comment

Still looking at this. Feel free to ignore some of the comments, can be follow on issues. I keep noticing existing problems when looking at this.

EnumSet.of(IteratorScope.scan), Map.of(IteratorScope.scan, picIteratorSettings),
false);
} catch (AccumuloException e) {
throw new IllegalArgumentException(e);

Contributor

Will this fail the scan? If so should this warn? If we do warn does that mean this code is untestable in 2.1?

@kevinrr888 (Member Author) Jan 21, 2026

Will this fail the scan? If so should this warn?

This method call will only log and not throw an AccumuloException (note false param). It is impossible for this to throw an AccumuloException, so I could change IllegalArgumentException to an assertion error or something to make this more clear that it can't happen.

This is not the case for all conflict check methods in IteratorConfigUtil, as some perform table ops, namespace ops, etc. so they can still throw an AccumuloException, but this specific method cannot throw an AccumuloException when false is provided. I considered writing different methods with different signatures (e.g., one that "throws AccumuloException" and one that does not), but that made things even more bloated in IteratorConfigUtil, and this approach couldn't be applied to all conflict check methods (since as I mentioned some still needed to throw an AccumuloException for other reasons).
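
The log-or-throw switch described above could look roughly like the following. This is only a sketch of the pattern: `ConflictException` is a hypothetical checked exception standing in for `AccumuloException`, `java.util.logging` is used purely to keep the example self-contained, and the method names are invented for illustration.

```java
import java.util.logging.Logger;

final class WarnOrThrowSketch {

  private static final Logger LOG = Logger.getLogger(WarnOrThrowSketch.class.getName());

  /** Hypothetical checked exception standing in for AccumuloException. */
  static final class ConflictException extends Exception {
    ConflictException(String msg) {
      super(msg);
    }
  }

  /** Reports a detected conflict: throws when shouldThrow is true, otherwise only logs. */
  static void reportConflict(String message, boolean shouldThrow) throws ConflictException {
    if (shouldThrow) {
      throw new ConflictException(message);
    }
    LOG.warning(message);
  }

  /** A caller that always passes false must still handle the checked exception. */
  static void warnOnly(String message) {
    try {
      reportConflict(message, false);
    } catch (ConflictException e) {
      // cannot happen when shouldThrow is false, so surface it as a programming error
      throw new AssertionError("unreachable when shouldThrow is false", e);
    }
  }

  public static void main(String[] args) {
    warnOnly("iterator priority 20 is already in use");
    try {
      reportConflict("iterator priority 20 is already in use", true);
    } catch (ConflictException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

A single method with a flag keeps the conflict logic in one place, at the cost of callers catching an exception that the warn-only path never raises.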

If we do warn does that mean this code is untestable in 2.1?

I'm working on testing the warnings via checking the logs (working on changing IteratorConflictsIT), which is proving to be difficult, but I think it's possible

}
}

public static void checkIteratorConflicts(Map<String,String> props, IteratorSetting iterToCheck,

Contributor

Reviewing this code and the existing code in this area, I noticed there are multiple places that parse and serialize iter config. This code is adding to that existing pattern. Was wondering if we could avoid this for the new code and experimented to see what was possible. Came up w/ the following. It's very clunky, but avoids duplicating code to parse iterator config. Feel free to ignore this, was just trying to understand what we could do. Can do things in follow-on issues; would like to narrow the set of changes for this rather than widen them.

  public static List<IteratorSetting> propertiesToIteratorSettings(IteratorScope scope,
      AccumuloConfiguration config,
      Consumer<Map<String,Map<String,String>>> extraOptsConsumer) {
    Map<String,Map<String,String>> allOptions = new HashMap<>();
    List<IterInfo> info = parseIterConf(scope, List.of(), allOptions, config);
    List<IteratorSetting> iterSettings = new ArrayList<>(info.size());
    info.forEach(iterInfo -> {
      var options = allOptions.remove(iterInfo.getIterName());
      var iterSetting = new IteratorSetting(iterInfo.getPriority(), iterInfo.getIterName(),
          iterInfo.getClassName());
      if (options != null) {
        iterSetting.addOptions(options);
      }

      iterSettings.add(iterSetting);
    });

    if (!allOptions.isEmpty()) {
      // these are iterator options w/ no iterator definition
      extraOptsConsumer.accept(allOptions);
    }

    return iterSettings;
  }

  public static void checkIteratorConflicts(Map<String,String> props, IteratorSetting iterToCheck,
      EnumSet<IteratorScope> iterScopesToCheck) throws AccumuloException {
    // parse the props map
    Map<IteratorScope,List<IteratorSetting>> existingIters =
        new HashMap<>(IteratorScope.values().length);

    for (var scope : iterScopesToCheck) {
      var iterSettings =
          propertiesToIteratorSettings(scope, new ConfigurationCopy(props), extraOpts -> {
            // TODO this used to throw an exception w/ the first extra option it saw; now it
            // reports all extra options, so the message could be much larger, which may cause
            // problems for logging
            String msg = String.format("iterator options conflict for %s : %s",
                iterToCheck.getName(), extraOpts);
            throw new AccumuloException(new IllegalArgumentException(msg));
          });
      existingIters.put(scope, iterSettings);
    }

    // check if the given iterator conflicts with any existing iterators
    checkIteratorConflicts(iterToCheck, iterScopesToCheck, existingIters);
  }

This would be a follow-on issue, but it would be nice to consolidate parsing/serializing in the existing code. Maybe it's better to address this comprehensively in a PR focused on this single problem rather than partially here for only new code, not sure.

Contributor

Created #6074

Comparator.comparingInt(IterInfo::getPriority);

private static final String ITERATOR_PROP_REGEX =
("^" + Property.TABLE_ITERATOR_PREFIX.getKey() + "(" + Arrays.stream(IteratorScope.values())

Contributor

The way these regexes are used for validation, they ignore properties with the iterator prefix that do not have a valid scope. For example, something like table.iterator.sacn.fooFilter would be completely ignored and would not be flagged as a problem. Seems like the old validation code had this same problem, so this is not a new behavior AFAICT. This should probably be a follow-on issue to look for invalid scopes following the iterator prefix.
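
One way to surface such typos, sketched here under stated assumptions, is to flag every key that starts with the iterator prefix but does not match the expected shape. The key shapes assumed are `table.iterator.<scope>.<name>` and `table.iterator.<scope>.<name>.opt.<key>`; the pattern and helper below are hypothetical and are not the regexes added in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class IterPrefixCheckSketch {

  static final String PREFIX = "table.iterator.";

  // Assumed shapes: table.iterator.<scope>.<name> and table.iterator.<scope>.<name>.opt.<key>
  static final Pattern VALID =
      Pattern.compile("^table\\.iterator\\.(scan|minc|majc)\\.[^.]+(\\.opt\\..+)?$");

  /** Returns, in sorted order, keys under the iterator prefix that do not look valid. */
  static List<String> suspiciousKeys(Map<String,String> props) {
    List<String> flagged = new ArrayList<>();
    for (String key : new TreeMap<>(props).keySet()) {
      if (key.startsWith(PREFIX) && !VALID.matcher(key).matches()) {
        flagged.add(key);
      }
    }
    return flagged;
  }

  public static void main(String[] args) {
    Map<String,String> props = Map.of(
        "table.iterator.scan.vers", "20,org.apache.accumulo.core.iterators.user.VersioningIterator",
        "table.iterator.scan.vers.opt.maxVersions", "1",
        "table.iterator.sacn.fooFilter", "30,example.FooFilter");
    System.out.println(suspiciousKeys(props)); // only the key with the misspelled scope
  }
}
```

Whether a flagged key should produce a warning or an error is the same policy question discussed elsewhere in this review.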

Member Author

What I could do is:

  1. If a property doesn't match the regex but does begin with the table iterator prefix (table.iterator.), I can throw an exception.

This will add new validation/new cases where an exception will be thrown (for example, previously only a couple things were validated about the iterator property like the string in the expected spot for priority had to be an int, and the property value had to be length 2 when split by ",").

If we go with (1), user code could now throw exceptions where they weren't before, but this is probably fine as their code is broken anyways

Member Author

This could be done in follow on, not necessarily needed for this PR

Contributor

This is covered by #6074. It suggests that the centralized parsing code validate this.

kevinrr888 and others added 3 commits January 22, 2026 14:43
- Avoid two lookups for some comparisons
- Fix some bugs with the iterator conflict checks done on clone table

Co-authored-by: Keith Turner <kturner@apache.org>
@kevinrr888 (Member Author)

@keith-turner - I believe I addressed and/or responded to all non-follow on issues for this
