Undoing bank's concatenated tables #1855
hseg started this conversation in Show and tell
My bank hasn't yet figured out multi-sheet spreadsheets, so my monthly statements come as a single Excel sheet with blank rows separating the individual tables. That is, after the `in2csv` step below, I get a file like this (note, for some of the complexity: the report is in Hebrew, which abbreviates using ' and "):

To split them up into usable chunks, I use the following script:
What this does is:
- Split the concatenated tables
- Strip the leading/trailing whitespace from each cell (probably overkill, since only the string cells likely need it, but I'm not confident enough of that to restrict it, and I'd like the script not to break so easily on schema changes. It would be nice to detect those, but I'm having trouble figuring out how to get the headers of a CSV file, which I could then diff against the old schema.)
- Automatically reshape the split CSVs to drop empty rows/columns
- Comment out the metadata lines at the top
- Fix the report's occasional use of '' (two apostrophes) in place of "
- Extract the table name (the first line of metadata) and rename the file to include it. I'm on the fence about whether to strip the "; I'll probably end up leaving it in.
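For comparison, the steps listed above can be sketched in Python using only the standard library. This is not the script referred to earlier, just a minimal sketch under several assumptions: separator rows come out of `in2csv` with every cell empty or whitespace-only; a table's metadata is the run of leading rows with at most one non-empty cell; `#` is the comment marker; and `''` never legitimately appears inside a cell. All function names (`split_blocks`, `clean_cells`, `split_metadata`, `drop_empty_columns`, `process`, `write_tables`) are hypothetical.

```python
import csv
import io
import pathlib


def split_blocks(rows):
    """Group rows into tables, using runs of all-blank rows as separators."""
    # Assumption: separator rows have only empty/whitespace cells.
    blocks, current = [], []
    for row in rows:
        if all(cell.strip() == "" for cell in row):
            if current:  # close the table we were building
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)
    return blocks


def clean_cells(block):
    """Strip surrounding whitespace and replace '' (two apostrophes) with a double quote."""
    # Assumption: '' never legitimately appears inside a cell.
    return [[cell.strip().replace("''", '"') for cell in row] for row in block]


def split_metadata(rows):
    """Treat leading rows with at most one non-empty cell as metadata."""
    # Heuristic (assumption): a header row always has two or more non-empty cells.
    i = 0
    while i < len(rows) and sum(1 for cell in rows[i] if cell) <= 1:
        i += 1
    meta = [" ".join(cell for cell in row if cell) for row in rows[:i]]
    return meta, rows[i:]


def drop_empty_columns(rows):
    """Pad ragged rows, then drop columns that are empty in every row."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    padded = [row + [""] * (width - len(row)) for row in rows]
    keep = [j for j in range(width) if any(row[j] for row in padded)]
    return [[row[j] for j in keep] for row in padded]


def process(text):
    """Return (table_name, csv_text) pairs, one per table in the input."""
    tables = []
    for block in split_blocks(csv.reader(io.StringIO(text))):
        meta, data = split_metadata(clean_cells(block))
        out = io.StringIO()
        for line in meta:
            out.write("# " + line + "\n")  # assumption: '#' marks comment lines
        csv.writer(out, lineterminator="\n").writerows(drop_empty_columns(data))
        tables.append((meta[0] if meta else "untitled", out.getvalue()))
    return tables


def write_tables(path):
    """Write each table next to the input as <stem>-NN-<table name>.csv."""
    src = pathlib.Path(path)
    for n, (name, body) in enumerate(process(src.read_text(encoding="utf-8")), 1):
        # Only path separators are replaced; the " is kept (not valid on Windows).
        safe = name.replace("/", "_").replace("\\", "_")
        (src.parent / f"{src.stem}-{n:02d}-{safe}.csv").write_text(body, encoding="utf-8")
```

Calling `write_tables("statement.csv")` on the `in2csv` output would write `statement-01-<first metadata line>.csv`, `statement-02-...`, and so on. Metadata is split off before empty columns are dropped, so a title sitting in an otherwise empty column does not keep that column alive in the data.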
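On the side question of getting the headers of a CSV file: the header is just the first non-comment row, so Python's `csv` module can read it directly and the result can be compared against a stored list of column names. This is a minimal sketch, assuming the metadata lines are commented out with a leading `#` and the old schema is kept as a plain list; `read_header` and `diff_schema` are hypothetical helper names.

```python
import csv
import io


def read_header(text, comment="#"):
    """Return the first non-comment row of a CSV (assumed to be the header)."""
    # Assumption: comment lines start with '#' and sit above the header.
    lines = (line for line in io.StringIO(text) if not line.startswith(comment))
    return next(csv.reader(lines), [])


def diff_schema(old, new):
    """Report columns that were added, removed, or merely reordered."""
    added = [col for col in new if col not in old]
    removed = [col for col in old if col not in new]
    reordered = not added and not removed and old != new
    return {"added": added, "removed": removed, "reordered": reordered}
```

From the shell, csvkit's `csvcut -n` prints the column names (with indices) of a CSV, though commented metadata lines would need to be filtered out first, for example with `grep -v '^#' file.csv | csvcut -n`.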
As you can see from the many TODO lines, I'm uncertain of the best approach here, and would appreciate feedback.