Conversation

@Praveenraj-K
Contributor

@Praveenraj-K Praveenraj-K commented Sep 24, 2025

IDP-1660 Fix dynamodb-backup-restore problem of double-base64-encoded Binary (B) values


Added

  • Added a function that identifies "B" (Binary) attributes and decodes their values to bytes

@Praveenraj-K Praveenraj-K requested a review from a team as a code owner September 24, 2025 12:55
@Praveenraj-K Praveenraj-K requested review from briskt and hobbitronics and removed request for a team September 24, 2025 12:55
Contributor

@briskt briskt left a comment

This is an excellent beginning. Good work. I would recommend adding a unit test for your new function. That way you don't need to test with a real DynamoDB database. I believe there are some problems with the code thus far, but it's a great start. Thanks for working on it.
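
A minimal sketch of what such a unit test could look like, using only the standard library. It assumes the new function is decode_binary (the name that appears in the diff) and that an item looks like {"name": {"B": "<base64 text>"}}. The stand-in implementation below is hypothetical and is only there so the tests run on their own; the real function in this PR may differ.

```python
import base64
import unittest


def decode_binary(item):
    """Hypothetical stand-in for the function under review.

    Returns a new item dict in which every {"B": <base64 text>} attribute
    has been replaced by {"B": <decoded bytes>}.
    """
    decoded = {}
    for name, attr in item.items():
        if "B" in attr:
            decoded[name] = {"B": base64.b64decode(attr["B"])}
        else:
            decoded[name] = attr
    return decoded


class DecodeBinaryTest(unittest.TestCase):
    def test_binary_attribute_is_decoded_to_bytes(self):
        encoded = base64.b64encode(b"\x00\xff").decode("ascii")
        result = decode_binary({"payload": {"B": encoded}})
        self.assertEqual(result["payload"], {"B": b"\x00\xff"})

    def test_non_binary_attributes_are_unchanged(self):
        item = {"id": {"S": "abc"}, "count": {"N": "3"}}
        self.assertEqual(decode_binary(item), item)

    def test_result_is_always_a_dict(self):
        encoded = base64.b64encode(b"data").decode("ascii")
        self.assertIsInstance(decode_binary({"payload": {"B": encoded}}), dict)
        self.assertIsInstance(decode_binary({"id": {"S": "abc"}}), dict)
```

Saved in a test module, this can be run with python -m unittest, with no DynamoDB table involved.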

Comment on lines +292 to +293
return value
return item
Contributor

These two lines are inconsistent. return value will return a string which is the base64-decoded equivalent of the B attribute value. return item will return the original dict. The calling code is not equipped to handle this difference.

Contributor Author

yeah, thanks @briskt

Contributor

Thank you for your work on this, @Praveenraj-K.

Along the lines of what @briskt said, I think we will want to ensure the base64-decoded data is assigned back to the value of that B key in the structure being decoded, since DynamoDB will be expecting it there. That makes returning item the desired behavior (once we've decoded the value in the item).

Some automated tests will probably help confirm that little details like that are all correct, as @briskt said. I don't know Python well enough to ensure we're handling things like data types properly in this code (e.g. base64.b64decode() apparently returns an array of bytes... does that need to be converted to a string using something like decode('utf-8')? I don't know.)
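
On the data-type question, here is a small sketch of the "assign it back and return item" idea. The helper name assign_decoded_binary and the sample attribute names are made up for illustration. base64.b64decode() returns a bytes object, and arbitrary binary data is not necessarily valid UTF-8, so calling decode('utf-8') on it can fail. If the restore writes through boto3, my understanding is that the client expects bytes for B values and does the base64 encoding for the wire itself, which would explain the double encoding; treat that as an assumption to verify.

```python
import base64


def assign_decoded_binary(item, name):
    """Decode the base64 text stored under item[name]["B"] in place.

    The decoded bytes are assigned back to the same "B" key and the
    (mutated) item dict is returned, so callers always get an item back.
    """
    item[name]["B"] = base64.b64decode(item[name]["B"])
    return item


# Sample item as it might appear in a JSON backup (values are illustrative).
raw = b"\xff\xfe\x00"
item = {
    "id": {"S": "abc"},
    "payload": {"B": base64.b64encode(raw).decode("ascii")},
}
assign_decoded_binary(item, "payload")

# The decoded value is bytes, and these particular bytes are not valid UTF-8,
# so converting them to str is not a safe general step.
try:
    item["payload"]["B"].decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
```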

return value
return item
else:
return {k: decode_binary(v) for k, v in item.items()}
Contributor

Making this recursive adds unnecessary complexity since we know what the incoming format is. Speaking of which, should we make this code fully general-purpose, not assuming any particular data schema? If so, this code will need to handle M and L types appropriately. If not, the code can probably be simplified a bit. @forevermatt what would you recommend?

Contributor Author

At first, I also thought of handling multiple attribute types, like BS (Binary Set), M, and L, but I decided to keep it simple for now since we only have an issue with the Binary type.

Contributor

@briskt If we can do so easily enough, I think it's worth making this general purpose, so we can use it for any DynamoDB databases we have (now or in the future), like our mysql-backup-restore and postgresql-backup-restore tools.

However, since we don't currently use Map, List, or BinarySet data types, we could potentially just return an error / throw an exception if those types are encountered. That way we don't need to add support for them now (when we don't need them), and it stays clear that the restore does not work for those kinds of values.
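
A rough sketch of that approach, assuming each attribute is a single-key dict such as {"S": "abc"} or {"B": "<base64 text>"}. The names decode_item, PASSTHROUGH_TYPES, and UnsupportedAttributeTypeError are invented for this example and are not from the PR.

```python
import base64


class UnsupportedAttributeTypeError(ValueError):
    """Hypothetical error for attribute types the restore does not handle."""


# Types that carry no nested binary data and can be passed through untouched.
PASSTHROUGH_TYPES = {"S", "N", "BOOL", "NULL", "SS", "NS"}


def decode_item(item):
    """Return a copy of item with "B" values base64-decoded to bytes.

    Anything other than "B" or a pass-through type (for example M, L, or BS,
    which could contain binary data) raises instead of being restored wrongly.
    """
    decoded = {}
    for name, attr in item.items():
        if len(attr) != 1:
            raise UnsupportedAttributeTypeError(
                f"attribute {name!r} is not a single-type value"
            )
        type_code, value = next(iter(attr.items()))
        if type_code == "B":
            decoded[name] = {"B": base64.b64decode(value)}
        elif type_code in PASSTHROUGH_TYPES:
            decoded[name] = attr
        else:
            raise UnsupportedAttributeTypeError(
                f"attribute {name!r} has unsupported type {type_code!r}"
            )
    return decoded
```

Failing loudly here means a backup containing a Map, List, or BinarySet stops the restore with a clear message rather than silently writing double-encoded data.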

Thoughts?

Contributor

That makes sense. I certainly wouldn't want any one of us to write code that will likely never be utilized. Throwing an exception for an unsupported data type seems like a good way to handle it.
I'll throw this nag in here once more. This feels like reinventing the wheel since AWS does have restore capability ready for use. I know it would require jumping through different hoops with the Terraform state, but it feels more appropriate for our use case.

Contributor

@briskt I'm beginning to think you're right about that: it may be worth revisiting the decision to write the restore code ourselves instead of using AWS's ability to do that for us.

@devon-sil or @Praveenraj-K, thoughts? I think the downside was that it requires manually deleting the original DynamoDB table (in both regions) before we could do a "Restore from S3" to (re-)create the restored table with the correct name, then set up the replica (i.e. to make it a global table). I might be forgetting something, though.

Many of my notes from experimenting with various options are here: https://itse.youtrack.cloud/issue/IDP-1262

Contributor

Should this PR be closed or revised? @devon-sil what are your thoughts?

@Praveenraj-K
Contributor Author

This is an excellent beginning. Good work. I would recommend adding a unit test for your new function. That way you don't need to test with a real DynamoDB database. I believe there are some problems with the code thus far, but it's a great start. Thanks for working on it.

Thanks @briskt :)

3 participants