-
Notifications
You must be signed in to change notification settings - Fork 0
Feature binaryfix #9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…efore parsing the data
…need to add it in the decoder function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent beginning. Good work. I would recommend adding a unit test for your new function. That way you don't need to test with a real DynamoDB database. I believe there are some problems with the code thus far, but it's a great start. Thanks for working on it.
| return value | ||
| return item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines are inconsistent. return value will return a string which is the base64-decoded equivalent of the B attribute value. return item will return the original dict. The calling code is not equipped to handle this difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, thanks @briskt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work on this, @Praveenraj-K.
Along the lines of what @briskt said, I think we will want to ensure the base64-decoded data is assigned back to the value of that B key in the structure being decoded, since DynamoDB will be expecting it there, so I think returning item is the desired behavior (once we've decoded the value in the item).
Some automated tests will probably help clarify that little details like that are all correct, as @briskt said. I don't know Python well enough to ensure we're handling things like data types properly in this code (e.g. base64.b64decode() apparently returns an array of bytes... does that need to be converted to a string using something like decode('utf-8')? I don't know.)
| return value | ||
| return item | ||
| else: | ||
| return {k: decode_binary(v) for k, v in item.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making this recursive adds unnecessary complexity since we know what the incoming format is. Speaking of which, should we make this code fully general-purpose, not assuming any particular data schema? If so, this code will need to handle M and L types appropriately. If not, the code can probably be simplified a bit. @forevermatt what would you recommend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, i also thought of handling multiple attributes type like BS set, M, and L. But decided to go simple for now as we have issue only with the Binary types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@briskt If we can do so easily enough, I think it's worth making this general purpose, so we can use it for any DynamoDB databases we have (now or in the future), like our mysql-backup-restore and postresql-backup-restore tools).
However, since we don't currently use Map, List, or BinarySet data types, would could potentially just return an error / throw an exception if those types are encountered, so that we don't need to add support for them now (when we don't need them) but to make it clear that the restore does not work for those kinds of values.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I certainly wouldn't want any one of us to write code that will likely never be utilized. Throwing an exception for an unsupported data type seems like a good way to handle it.
I'll throw this nag in here once more. This feels like reinventing the wheel since AWS does have restore capability ready for use. I know it would require jumping through different hoops with the Terraform state, but it feels more appropriate for our use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@briskt I'm beginning to think you're right about that: it may be worth revisiting the decision to write the restore code ourselves instead of using AWS's ability to do that for us.
@devon-sil or @Praveenraj-K, thoughts? I think the downside was that it requires manually deleting the original DynamoDB table (in both regions) before we could do a "Restore from S3" to (re-)create the restored table with the correct name, then set up the replica (i.e. to make it a global table). I might be forgetting something, though.
Many of my notes from experimenting with various options are here: https://itse.youtrack.cloud/issue/IDP-1262
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this PR be closed or revised? @devon-sil what are your thoughts?
Thanks @briskt :) |
IDP-1660 Fix dynamodb-backup-restore problem of double-base64-encoded Binary (B) values
Added