This repository contains tools for extracting and analyzing transaction enrichment data from DynamoDB tables.
Extracts enrichment data from the transaction-enrichment-store table for exploratory data analysis (EDA).
Attempts to join enrichment data with transaction data (note: join relationship needs verification).
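One way to check the join relationship before relying on it is to measure how many keys the two datasets actually share. The sketch below is illustrative only: `join_coverage` is a hypothetical helper, and using `transaction_id` as the join key is an assumption that this check is meant to test, not a confirmed property of the tables.

```python
def join_coverage(enrichment_rows, transaction_rows, key="transaction_id"):
    """Count how many key values appear in both datasets, or in only one.

    Both arguments are iterables of dicts. The key name is an assumption.
    """
    enrichment_keys = {r[key] for r in enrichment_rows if r.get(key) is not None}
    transaction_keys = {r[key] for r in transaction_rows if r.get(key) is not None}
    return {
        "matched": len(enrichment_keys & transaction_keys),
        "enrichment_only": len(enrichment_keys - transaction_keys),
        "transactions_only": len(transaction_keys - enrichment_keys),
    }


# Made-up rows, purely for illustration
enrichment = [{"transaction_id": "a"}, {"transaction_id": "b"}]
transactions = [{"transaction_id": "b"}, {"transaction_id": "c"}]
coverage = join_coverage(enrichment, transactions)
```

A low `matched` count relative to the other two would suggest the assumed key is not the right join column.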
pip install -r requirements.txt
All scripts support multiple ways to specify AWS credentials:
python extract_enrichment_data.py --profile your-profile-name --sample 1000
export AWS_PROFILE=your-profile-name
python extract_enrichment_data.py --sample 1000
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_SESSION_TOKEN=your-session-token # if using temporary credentials
python extract_enrichment_data.py --sample 1000
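The three options above correspond to how a `boto3.Session` resolves credentials. The following is a minimal sketch, assuming the scripts build a boto3 session (the README does not confirm this); `build_session_kwargs` and `make_session` are hypothetical names, not functions from this repository.

```python
from typing import Optional


def build_session_kwargs(profile: Optional[str] = None,
                         region: str = "eu-west-1") -> dict:
    """Build keyword arguments for boto3.Session (hypothetical helper)."""
    kwargs = {"region_name": region}
    if profile:
        # Pass the profile explicitly when --profile is given.
        kwargs["profile_name"] = profile
    # With no profile_name, boto3 uses its default credential chain, which
    # reads AWS_PROFILE and the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY /
    # AWS_SESSION_TOKEN environment variables.
    return kwargs


def make_session(profile: Optional[str] = None, region: str = "eu-west-1"):
    import boto3  # imported lazily so the helper above has no dependencies
    return boto3.Session(**build_session_kwargs(profile, region))
```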
python extract_enrichment_data.py --profile your-profile --sample 1000
python extract_enrichment_data.py --profile your-profile
- --sample N: Extract only N enrichment records for testing
- --profile: AWS profile name from ~/.aws/credentials
- --region: AWS region (default: eu-west-1)
- --output: Custom output filename
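For reference, these options could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the flag names and the `eu-west-1` default come from the list above, while the parser structure, help texts, and the `build_parser` name are illustrative and may differ from the real script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented CLI options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Extract enrichment data")
    parser.add_argument("--sample", type=int, default=None, metavar="N",
                        help="extract only N enrichment records for testing")
    parser.add_argument("--profile", default=None,
                        help="AWS profile name from ~/.aws/credentials")
    parser.add_argument("--region", default="eu-west-1",
                        help="AWS region")
    parser.add_argument("--output", default=None,
                        help="custom output filename")
    return parser


args = build_parser().parse_args(["--profile", "your-profile", "--sample", "1000"])
print(args.sample, args.region)  # prints: 1000 eu-west-1
```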
python extract_and_join_transactions.py --profile your-profile --sample 10
The enrichment extraction creates:
- enrichment_data_YYYYMMDD_HHMMSS_sample_N.parquet (sample mode)
- enrichment_data_YYYYMMDD_HHMMSS.parquet (full mode)
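The naming pattern can be reproduced with a small helper. This is a sketch of the pattern as documented here, not the script's actual code; `output_filename` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def output_filename(now: datetime, sample: Optional[int] = None) -> str:
    """Build a filename following the documented pattern (hypothetical helper)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # Sample runs carry a _sample_N suffix; full runs have none.
    suffix = f"_sample_{sample}" if sample is not None else ""
    return f"enrichment_data_{stamp}{suffix}.parquet"


example = output_filename(datetime(2024, 1, 2, 3, 4, 5), sample=1000)
# example == "enrichment_data_20240102_030405_sample_1000.parquet"
```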
The enrichment data contains:
- Basic Info: transaction_id, merchant, website, location
- Categorization: labels, recurrence, label_group
- Geographic: location_city, location_country, coordinates
- Additional: person, intermediaries, logos
- Quality Score: calculated enrichment completeness score
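The README does not state how the quality score is calculated. As a rough illustration of what a completeness score could look like, the sketch below scores a record by the share of selected fields that are populated. The field selection, the equal weighting, and the `completeness_score` name are all assumptions, not the repository's formula.

```python
# Hypothetical field selection, using field names from the list above
FIELDS = ("merchant", "website", "location_city", "location_country", "labels")


def completeness_score(record: dict, fields=FIELDS) -> float:
    """Return the percentage (0-100) of the given fields that are populated."""
    empty_values = (None, "", [], {})
    filled = sum(1 for name in fields if record.get(name) not in empty_values)
    return 100.0 * filled / len(fields)


# Made-up record: 3 of the 5 fields are populated
record = {"merchant": "Tesco", "location_country": "GB", "labels": ["groceries"]}
score = completeness_score(record)  # 60.0
```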
Use the included Jupyter notebook for comprehensive EDA:
jupyter notebook enrichment_eda.ipynb
The notebook provides:
- Data quality analysis
- Merchant and geographic insights
- Transaction categorization analysis
- Temporal patterns
- Enrichment quality scoring
- Coverage: 65% merchant data, 99% geographic data, 100% categorization
- Categories: Peer-to-peer transfers (13%), groceries (9%), e-commerce (7%)
- Geography: Primarily GB (47%), with US (3%) and other countries
- Quality: Average enrichment score of ~60/100
- Recurrence: 97% one-off transactions, 2% recurring, 1% subscription
- extract_enrichment_data.py: Main extraction script
- enrichment_eda.ipynb: Comprehensive EDA notebook
- inspect_transactions_table.py: Table schema analyzer
- debug_transactions.py: Join relationship debugger
- requirements.txt: Python dependencies