codingfrog/proto-enrichment

Transaction Enrichment Data Analysis

This repository contains tools for extracting and analyzing transaction enrichment data from DynamoDB tables.

Available Tools

1. Enrichment Data Extraction (extract_enrichment_data.py)

Extracts enrichment data from the transaction-enrichment-store table for exploratory data analysis (EDA).

2. Transaction Join Extraction (extract_and_join_transactions.py)

Experimental: attempts to join enrichment data with transaction data. The join relationship between the two datasets has not been verified yet, so treat the results with caution.

Setup

pip install -r requirements.txt

AWS Authentication

All scripts support multiple ways to specify AWS credentials:

1. Using AWS Profile (Recommended)

python extract_enrichment_data.py --profile your-profile-name --sample 1000

2. Using Environment Variable

export AWS_PROFILE=your-profile-name
python extract_enrichment_data.py --sample 1000

3. Using AWS Environment Credentials

export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_SESSION_TOKEN=your-session-token  # if using temporary credentials
python extract_enrichment_data.py --sample 1000

Usage

Enrichment Data Analysis (Recommended)

Sample Mode (Test with 1000 records)

python extract_enrichment_data.py --profile your-profile --sample 1000

Full Extraction (89k records)

python extract_enrichment_data.py --profile your-profile

Options

  • --sample N: Extract only N enrichment records for testing
  • --profile: AWS profile name from ~/.aws/credentials
  • --region: AWS region (default: eu-west-1)
  • --output: Custom output filename
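
The options above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser`, the description, and the help strings are assumptions. Only the flag names and the `eu-west-1` default come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Extract enrichment data")
    parser.add_argument("--sample", type=int, default=None, metavar="N",
                        help="extract only N enrichment records for testing")
    parser.add_argument("--profile", default=None,
                        help="AWS profile name from ~/.aws/credentials")
    parser.add_argument("--region", default="eu-west-1",
                        help="AWS region")
    parser.add_argument("--output", default=None,
                        help="custom output filename")
    return parser


# Parse the same arguments as the sample-mode command shown earlier.
args = build_parser().parse_args(["--profile", "your-profile", "--sample", "1000"])
print(args.sample, args.profile, args.region, args.output)
# 1000 your-profile eu-west-1 None
```

Leaving `--sample` unset (`None`) is one plausible way to signal a full extraction.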

Transaction Join (Experimental)

python extract_and_join_transactions.py --profile your-profile --sample 10

Output

The enrichment extraction creates:

  • enrichment_data_YYYYMMDD_HHMMSS_sample_N.parquet (sample mode)
  • enrichment_data_YYYYMMDD_HHMMSS.parquet (full mode)
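
The naming pattern can be reproduced with `datetime.strftime`. The helper below is a hypothetical sketch (the name `output_filename` and its signature are assumptions, not taken from the script) that shows how the timestamp and the optional sample suffix fit together.

```python
from datetime import datetime
from typing import Optional


def output_filename(now: datetime, sample: Optional[int] = None) -> str:
    """Build a name following the documented pattern (hypothetical helper)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    # The suffix is only added in sample mode.
    suffix = f"_sample_{sample}" if sample is not None else ""
    return f"enrichment_data_{stamp}{suffix}.parquet"


when = datetime(2024, 1, 31, 9, 5, 7)  # arbitrary example timestamp
print(output_filename(when, sample=1000))
# enrichment_data_20240131_090507_sample_1000.parquet
print(output_filename(when))
# enrichment_data_20240131_090507.parquet
```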

Data Structure

The enrichment data contains:

  • Basic Info: transaction_id, merchant, website, location
  • Categorization: labels, recurrence, label_group
  • Geographic: location_city, location_country, coordinates
  • Additional: person, intermediaries, logos
  • Quality Score: calculated enrichment completeness score
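
The exact formula behind the quality score is not documented here. One simple way to express "enrichment completeness" is the share of populated fields, scaled to 0-100. The sketch below assumes that definition. The function name, the set of scored fields, and the equal weighting are all assumptions, and the real script may weight fields differently.

```python
from typing import Any, Mapping, Sequence

# Field names come from the list above; which ones the real score
# actually uses is an assumption.
SCORED_FIELDS = ("merchant", "website", "location", "labels", "recurrence",
                 "label_group", "location_city", "location_country",
                 "coordinates", "person", "intermediaries", "logos")


def completeness_score(record: Mapping[str, Any],
                       fields: Sequence[str] = SCORED_FIELDS) -> float:
    """Return the share of populated fields, scaled to 0-100."""
    def populated(value: Any) -> bool:
        # None, empty strings, and empty containers count as missing.
        return value is not None and value != "" and value != [] and value != {}

    filled = sum(1 for name in fields if populated(record.get(name)))
    return 100.0 * filled / len(fields)


# Made-up record for illustration only: 3 of the 12 scored fields are set.
example = {"transaction_id": "t-1", "merchant": "Example Shop",
           "labels": ["groceries"], "location_country": "GB"}
print(completeness_score(example))  # 25.0
```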

Analysis

Use the included Jupyter notebook for comprehensive EDA:

jupyter notebook enrichment_eda.ipynb

The notebook provides:

  • Data quality analysis
  • Merchant and geographic insights
  • Transaction categorization analysis
  • Temporal patterns
  • Enrichment quality scoring

Key Insights from Sample Data (5k records)

  • Coverage: 65% merchant data, 99% geographic data, 100% categorization
  • Categories: Peer-to-peer transfers (13%), groceries (9%), e-commerce (7%)
  • Geography: Primarily GB (47%), with US (3%) and other countries
  • Quality: Average enrichment score of ~60/100
  • Recurrence: 97% one-off transactions, 2% recurring, 1% subscription
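
Percentage breakdowns like the ones above can be computed by counting values per field. The following is a minimal sketch using only the standard library. The helper name `value_shares`, the toy records, and the literal recurrence values are illustrative assumptions, not the notebook's code or real data.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def value_shares(records: Iterable[Mapping[str, Any]], field: str) -> dict:
    """Return the percentage of records per value of `field`.

    Records that lack the field are counted under the key None.
    """
    counts = Counter(record.get(field) for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.items()}


# Made-up records, not real data.
toy = [{"recurrence": "one-off"}, {"recurrence": "one-off"},
       {"recurrence": "one-off"}, {"recurrence": "recurring"}]
print(value_shares(toy, "recurrence"))
# {'one-off': 75.0, 'recurring': 25.0}
```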

Files

  • extract_enrichment_data.py - Main extraction script
  • extract_and_join_transactions.py - Experimental transaction join script
  • enrichment_eda.ipynb - Comprehensive EDA notebook
  • inspect_transactions_table.py - Table schema analyzer
  • debug_transactions.py - Join relationship debugger
  • requirements.txt - Python dependencies

About

Transaction enrichment prototype
