Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making --error-on-invalid-index the default behavior #422

Closed
zubeyr94 opened this issue Sep 25, 2024 · 15 comments
Closed

Making --error-on-invalid-index the default behavior #422

zubeyr94 opened this issue Sep 25, 2024 · 15 comments

Comments

@zubeyr94
Copy link
Contributor

zubeyr94 commented Sep 25, 2024

Hello, currently the default behavior in pg_repack is to repack tables even if tables have invalid indexes on them. Users can force pg_repack to error when there are invalid indexes on tables before they are repacked by providing -error-on-invalid-index parameter on pg_repack 1.5.0 and later versions.

However, repacking tables with invalid indexes would cause the index to be corrupted, because repack will skip repacking the invalid index and old heap tuple id in the index now be pointing to incorrect tuples on the relation. As a safety feature it would make sense to make --error-on-invalid-index the default behavior and there could be an option like --warn-on-invalid-index or --no-error-on-invalid-index for users who want to keep the old behavior.

@MasahikoSawada
Copy link
Contributor

However, repacking tables with invalid indexes would cause the index to be corrupted. As a safety feature it would make sense to make --error-on-invalid-index the default behavior

What do exactly mean by safety feature? It's right that invalid indexes could corrupted after repacking a table, but invalid indexes are not used unless they are reindexed.

@zubeyr94
Copy link
Contributor Author

Postgres continues to update indexes if indisready set to true, even if index is invalid, ie indisvalid is false. On a corrupted index, checks made during index insertion could cause transactions to rollback, eg this check could fail. Until users realize this and drop/reindex the indexes they won't be able to unblock the transaction doing the insertion. Instead of taking the hit while running regular workload, it should be preferable to take the hit during maintenance work, ie while running pg_repack and make users aware that they need to reindex or drop before completing the repack.

@MasahikoSawada
Copy link
Contributor

I think that transaction rollbacks caused by inserting indexes tuples to invalid indexes can happen immediately since indexes becoming invalid, regardless of pg_repack. That is, if we make the --error-on-invalid-index behavior default, we will enforce users to rebuild or drop invalid indexes before pg_repack by default. I guess the current behavior is more flexible
in terms of the timing of rebuilding or dropping invalid indexes.

Also, I'm concerned that this change would break the compatibility of tools that use --error-on-invalind-index option.

Another idea would be to have pg_repack use command options via an environment variable, say PG_REPACK_OPTIONS, and merge the options to the specified command options.

@zubeyr94
Copy link
Contributor Author

zubeyr94 commented Oct 2, 2024

Invalid indexes does not mean a corrupted index most of the time. However, repacking a table without reindexing the invalid indexes definitely creates corrupted(index with stale entries) invalid indexes. This increases the likelihood of causing an unavailability. Failing the maintenance work should be more preferable over potentially causing unavailability.

For backward compatibility, we can keep --error-on-invalind-index but this will be no-op if we make this default behavior.

We could potentially let users to determine the behavior when an invalid index is detected, eg reindex, drop or error.

@MasahikoSawada
Copy link
Contributor

Invalid indexes does not mean a corrupted index most of the time. However, repacking a table without reindexing the invalid indexes definitely creates corrupted(index with stale entries) invalid indexes. This increases the likelihood of causing an unavailability.

Understood. It would be better to have pg_repack stop if it detects invalid indexes to prevent such a situation. Could you provide test scenario for the corruption on invalid indexes with pg_repack?

Also, it might be worth considering to have pg_repack stop only if there is an invalid but ready indexes.

For backward compatibility, we can keep --error-on-invalind-index but this will be no-op if we make this default behavior.

I agree not to remove --error-no-indvalid-index. Perhaps we can add --no-error-on-invalid-index option to keep the current behavior (raising a warning). Specifying both options raises an error for conflict options.

Feedback is very welcome

@zubeyr94
Copy link
Contributor Author

zubeyr94 commented Oct 7, 2024

Understood. It would be better to have pg_repack stop if it detects invalid indexes to prevent such a situation. Could you provide test scenario for the corruption on invalid indexes with pg_repack?

yeah, I will try to come up with a minimalist test scenario that will show the issue.

Also, it might be worth considering to have pg_repack stop only if there is an invalid but ready indexes.

I think this makes sense. I did not take a close look at the code, but I am guessing having an index not indisready and not indisvalid is a possibility. And those indexes will not cause any threads to availability.

I agree not to remove --error-no-indvalid-index. Perhaps we can add --no-error-on-invalid-index option to keep the current behavior (raising a warning). Specifying both options raises an error for conflict options.

Feedback is very welcome

Can we drop the invalid indexes to unblock pg_repack, print a warning and continue? Anyways, users will not be able to use those indexes without reindexing.

@MasahikoSawada
Copy link
Contributor

yeah, I will try to come up with a minimalist test scenario that will show the issue.

Thanks!

Can we drop the invalid indexes to unblock pg_repack, print a warning and continue? Anyways, users will not be able to use those indexes without reindexing.

I'm concerned it might be too aggressive for the default behavior. I'm fine if it's an optional behavior.

@MasahikoSawada
Copy link
Contributor

Once zubeyr94 prepared a test case, I think we can proceed with this change if others also agree with it; making --error-on-invalid-index the default behavior.

As we discussed, this change would break the backward compatibility, but it would prevent the issue of failing INSERT/UPDATE to indexes by disallowing to repack tables having invalid indexes.

There is room for discussion on how we change options. For example, we would keep --error-on-invalid-index option but would make it no-op. To keep the current behavior, we can add new option like --no-error-on-invalid-index or --warn-error-on-invalid-index.

Before proceeding with the discussion for details, I'd like to hear how others think about the change of making --error-on-invalid-index the default behavior. Feedback is very welcome.

@andreasscherbaum
Copy link
Collaborator

While it is breaking backwards compatibility, I believe that's an edge case anyway, and it improves the overall situation.

Definitely in favor of keeping the current option for a while, but retiring it. Otherwise this will break scripts, and that is not necessary.

@zubeyr94
Copy link
Contributor Author

Even though, it sounds aggressive to drop invalid indexes during pg_repack would not that provide better experience for users? When users run pg_repack they will be sure that it will complete and they will not be surprised with an error the other day when they look at the results of pg_repack.

zubeyr94 added a commit to zubeyr94/pg_repack that referenced this issue Nov 11, 2024
zubeyr94 added a commit to zubeyr94/pg_repack that referenced this issue Nov 11, 2024
zubeyr94 added a commit to zubeyr94/pg_repack that referenced this issue Nov 12, 2024
Skipping repacking the invalid indexes could lead to corrupting the
invalid indexes. In some rare cases, this might cause inserts to
tables to fail as changes to a table continuously be applied to
invalid indexes on it and updates to corrupted invalid indexes
potentially might fail by the checks done during insertion.
Therefore, by default do not repack a table with invalid indexes.
Repack only user explicitly specifies --no-error-on-invalid-index.
@zubeyr94
Copy link
Contributor Author

zubeyr94 commented Nov 12, 2024

I was able to cause invalid index corruption using pg_repack, however I was not able to produce a scenario which causes a transaction rollback. Btree insert does checks to ensure the consistency of the indexes however it does not catch all the corruptions. I still think that is better to skip repacking tables with invalid indexes to prevent possible unavailability in the future because of corrupted invalid indexes. I have published a pull request, please let me know about your thoughts.: #433

@andreasscherbaum
Copy link
Collaborator

I was able to cause invalid index corruption using pg_repack,

That is something which you can report upstream.

zubeyr94 added a commit to zubeyr94/pg_repack that referenced this issue Nov 13, 2024
Skipping repacking the invalid indexes could lead to corrupting the
invalid indexes. In some rare cases, this might cause inserts to
tables to fail as changes to a table continuously be applied to
invalid indexes on it and updates to corrupted invalid indexes
potentially might fail by the checks done during insertion.
Therefore, by default do not repack a table with invalid indexes.
Repack only user explicitly specifies --no-error-on-invalid-index.
@zubeyr94
Copy link
Contributor Author

I have informally communicated that checks in btree is not able to catch all the inconsistencies. They think that is expected and the best way is to use amcheck for any corruption check.

Invalid index corruption issue is not an issue in upstream, right. If we repack a table without reindexing any index on the table, the index will still be pointing to old htids, which will lead to corruptions.

za-arthur pushed a commit that referenced this issue Nov 28, 2024
Skipping repacking the invalid indexes could lead to corrupting the
invalid indexes. In some rare cases, this might cause inserts to
tables to fail as changes to a table continuously be applied to
invalid indexes on it and updates to corrupted invalid indexes
potentially might fail by the checks done during insertion.
Therefore, by default do not repack a table with invalid indexes.
Repack only user explicitly specifies --no-error-on-invalid-index.
@za-arthur
Copy link
Collaborator

The PR #433 was merged.

@zubeyr94
Copy link
Contributor Author

Thank you.

Just as a side note, making --error-on-invalid-index the default behavior definitely improved the general behavior, there is an edge case that if user tries to create an index after pg_repack is started on a table(passed invalid index check) and the index creation fails and lefts an invalid index on the table. The repack will still succeed, this might still cause corrupted invalid indexes, though this should be pretty rare case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants