Commit e7c0d57

Author: Diego Colombatto (committed)

Add QuickSight dashboard, improve propensity signal, update README use case section

- Add scripts/create_dashboard.py: fully automated QuickSight dashboard (4 sheets) with Athena/Glue data layer, DIRECT_QUERY datasets, chart subtitles
- Update inference_handler.py: pass through contextual columns (campaign, device, category, purchase_amount, impressions, clicks) for dashboard segmentation
- Update generate_synthetic_data.py: embed campaign and category propensity multipliers so dashboard charts show meaningful variation across segments
- Update undeploy/undeploy.py: add QuickSight resource cleanup (dashboard, analysis, dataset, datasource, IAM inline policy)
- Update config.py: add QS_NOTIFICATION_EMAIL, make hardcoded values authoritative over env vars, improve validate() with require_qs_email flag
- Update README: rewrite opening and use case section to lead with business value; add Step 6 (QuickSight dashboard) to timing table and project structure; update output format table to reflect passthrough columns; add QuickSight region prerequisite note

1 parent 370b973 commit e7c0d57

File tree

7 files changed: +1190 −19 lines changed

.gitignore

Lines changed: 6 additions & 0 deletions

@@ -50,3 +50,9 @@ uv-lockfile-tasks.md
 
 # Kiro
 .kiro/
+
+# Local test root directory
+local_test/
+
+# Working docs
+DASHBOARD_PROPOSAL.md

README.md

Lines changed: 30 additions & 8 deletions

@@ -2,7 +2,11 @@
 
 [![License: MIT-0](https://img.shields.io/badge/License-MIT--0-yellow.svg)](https://opensource.org/licenses/MIT-0)
 
-Self-contained, reusable demo for **Customer Propensity Scoring** using AWS Clean Rooms ML with custom training and inference containers.
+Self-contained, reusable, and customizable demo showing how an **advertiser** and a **retailer** can jointly predict which customers are most likely to make a purchase — without either party ever sharing their raw data with the other.
+
+The advertiser contributes **ad engagement data** (impressions, clicks, time spent, device type, campaign) and the retailer contributes **purchase behavior data** (product categories, purchase amounts, site visits, conversion history). AWS Clean Rooms ML joins these datasets inside a secure collaboration, trains a propensity model on the combined signal, and scores every customer — all without exposing either party's underlying records.
+
+The output is a ranked list of customers by purchase propensity, visualized in an Amazon QuickSight dashboard that shows which campaigns, categories, and segments drive the highest conversion intent.
 
 This repo is a sample, to quickly get started with AWS Clean Rooms Custom ML models analysis; it's not meant for production usage AS-IS.
 

@@ -32,11 +36,15 @@ This repo is a sample, to quickly get started with AWS Clean Rooms Custom ML mod
 
 ## Use Case: Customer Propensity Scoring
 
-**Scenario:** An advertiser and a retailer want to collaborate on predicting which customers are most likely to convert (make a purchase) based on combined ad engagement and purchase behavior data. Neither party wants to share their raw data with the other.
+An **advertiser** knows which users engaged with their ads — but not whether those users actually bought anything. A **retailer** knows which users purchased — but not which ads influenced them. Neither party is willing to share their raw customer data with the other.
+
+By combining both datasets inside an AWS Clean Rooms collaboration, the model learns from the full picture: ad engagement signals from the advertiser and purchase behavior signals from the retailer. The result is a propensity score for every customer that neither party could have produced alone.
+
+**What the advertiser gains:** a ranked list of users to prioritise for ad targeting, based on actual purchase signals — not just clicks.
 
-**Solution:** AWS Clean Rooms ML enables both parties to contribute their data to a secure collaboration. AWS Clean Rooms joins the datasets on a shared key (`user_id`), trains a propensity model on the combined features, and runs inference — all without either party seeing the other's raw data.
+**What the retailer gains:** insight into which ad-exposed customers are most likely to buy, enabling smarter inventory planning and personalised offers.
 
-**Business Value:** The advertiser can identify high-propensity users to target with ad campaigns, while the retailer gains insight into which ad-exposed customers are most likely to purchase — enabling better ad spend allocation and personalized marketing.
+**What neither party gives up:** their raw customer data. AWS Clean Rooms enforces that the join happens inside the secure collaboration — no raw records cross the boundary.
 
 ---
 

@@ -51,6 +59,7 @@ This repo is a sample, to quickly get started with AWS Clean Rooms Custom ML mod
 | 3 | Build & Push Containers (CodeBuild) | ~7 min |
 | 4 | Setup Clean Rooms Infrastructure | ~31s |
 | 5 | Train Model & Run Inference | ~34 min |
+| 6 | Create QuickSight Dashboard (optional) | ~3 min |
 | **Total** | **End-to-end** | **~42 min** |
 
 ### Prerequisites

@@ -59,13 +68,16 @@ This repo is a sample, to quickly get started with AWS Clean Rooms Custom ML mod
 - AWS CLI configured with valid credentials
 - AWS account with AWS Clean Rooms ML access enabled
 
+> **Optional — QuickSight Dashboard (Step 6):** If you plan to run `scripts/create_dashboard.py`, your `AWS_REGION` must be a region where Amazon QuickSight is available. QuickSight, Athena, Glue, and S3 must all be in the same region — cross-region Athena connections are not supported by QuickSight. Supported regions include `us-east-1`, `us-east-2`, `us-west-2`, `eu-west-1`, `eu-west-2`, `eu-west-3`, `eu-central-1`, `eu-north-1`, `ap-northeast-1`, `ap-northeast-2`, `ap-southeast-1`, `ap-southeast-2`, `ap-south-1`, `ca-central-1`, and others. See the [full list](https://docs.aws.amazon.com/quicksight/latest/user/regions-qs.html). Also set `QS_NOTIFICATION_EMAIL` in `config.py` to a valid email address — this is used for QuickSight account registration and is validated at script startup.
+
 ### Step 0: Configure Your Account
 
 Edit `config.py` and set your values:
 
 ```python
-AWS_ACCOUNT_ID = "123456789012" # Your 12-digit AWS account ID
-AWS_REGION = "eu-north-1" # Your preferred region
+AWS_ACCOUNT_ID = "123456789012"          # Your 12-digit AWS account ID
+AWS_REGION = "eu-north-1"                # Your preferred region
+QS_NOTIFICATION_EMAIL = "your@email.com" # Optional: only needed for Step 6 (QuickSight)
 ```
 
 All scripts read from this single file — no other hardcoded values to change.

@@ -416,6 +428,14 @@ After successful inference, AWS Clean Rooms ML writes the output to the configur
 |--------|------|-------------|
 | propensity_score | float (0–1) | Predicted probability of conversion |
 | predicted_converter | int (0/1) | Binary prediction: 1 = likely converter |
+| ad_campaign_id | string | Ad campaign the record belongs to |
+| device_type | string | Device type (mobile, desktop, tablet, smart_tv) |
+| product_category | string | Product category browsed/purchased |
+| purchase_amount | float | Total purchase amount |
+| impressions | int | Number of ad impressions |
+| clicks | int | Number of ad clicks |
+
+> **Note:** `user_id` is never present in the output — it is the Clean Rooms join key and is excluded from the ML input channel by design. The passthrough contextual columns (`ad_campaign_id`, `device_type`, etc.) come from the pre-joined data already approved for the inference channel and are used to power the QuickSight dashboard segmentation.
 
 Example output rows:

@@ -478,6 +498,7 @@ scripts/
   build_and_push.py          ← Build containers via local Docker
   setup_cleanrooms.py        ← Create Glue, IAM, collaboration, ML config
   run_cleanrooms_ml.py       ← Create channels, train model, run inference
+  create_dashboard.py        ← Optional: create QuickSight dashboard (Step 6)
   test_training_local.py     ← Test training locally (no AWS needed)
   sagemaker_training_job.py  ← Optional: run training via SageMaker directly
   update_requirements.sh     ← Regenerate container requirements.txt from lockfile

@@ -612,12 +633,13 @@ The undeploy script removes all resources in reverse dependency order:
 
 1. **Clean Rooms ML** — inference jobs, trained models, ML input channels, algorithm associations, configured model algorithms
 2. **Clean Rooms** — ML configuration, table association analysis rules, table associations, configured tables, analysis rules, collaboration
-3. **AWS Glue** — tables and database (`cleanrooms_ml_demo`)
+3. **AWS Glue** — tables and database (`cleanrooms_ml_demo`), including dashboard tables (`inference_output`, `model_metrics`, `feature_importance`) if `create_dashboard.py` was run
 4. **Lake Formation** — permission grants for the data provider role
-5. **Amazon S3** — source and output buckets (empties all objects and versions first)
+5. **Amazon S3** — source and output buckets (empties all objects and versions first, including `dashboard-data/` CSVs)
 6. **Amazon ECR** — training and inference container repositories (including all images)
 7. **IAM** — all demo roles (`data-provider`, `model-provider`, `ml-config`, `query-runner`, `codebuild`, `sagemaker`)
 8. **CodeBuild** — build project and associated CloudWatch log groups
+9. **Amazon QuickSight** — dashboard, analysis, SPICE datasets, and Athena data source (if `create_dashboard.py` was run). The QuickSight account subscription itself is **not** deleted as it is account-wide.
 
 > **Note:** IAM roles are global (not region-scoped), so they only need to be deleted once regardless of how many regions were used. The script handles this gracefully — if a role was already deleted by a previous region's undeploy run, it skips it.
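The passthrough columns in the README's output table exist to make per-segment aggregation trivial once the inference CSV lands in S3. A minimal sketch of that kind of segmentation (standalone; the inline CSV is invented sample data that merely mimics the documented schema, not real inference output):

```python
import io

import pandas as pd

# Invented sample rows following the documented output schema.
csv_text = io.StringIO(
    "propensity_score,predicted_converter,ad_campaign_id,device_type\n"
    "0.91,1,camp_holiday,mobile\n"
    "0.12,0,camp_clearance,desktop\n"
    "0.67,1,camp_holiday,tablet\n"
)
df = pd.read_csv(csv_text)

# Average propensity per campaign: the kind of segmentation the
# QuickSight dashboard charts are built on.
by_campaign = df.groupby("ad_campaign_id")["propensity_score"].mean().round(3)
print(by_campaign.to_dict())  # → {'camp_clearance': 0.12, 'camp_holiday': 0.79}
```

The same groupby over `device_type` or `product_category` reproduces the other dashboard cuts.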

config.py

Lines changed: 19 additions & 4 deletions

@@ -15,10 +15,22 @@
 """
 
 # ─── REQUIRED: Set these to your values ───────────────────
-# Prefer environment variables; fall back to placeholder for local dev.
+# Edit the values below directly. These are the authoritative settings.
+# Environment variables AWS_ACCOUNT_ID / AWS_REGION are only used as
+# fallback when the placeholder values below have not been changed.
 import os as _os_cfg
-AWS_ACCOUNT_ID = _os_cfg.environ.get("AWS_ACCOUNT_ID", "123456789012")
-AWS_REGION = _os_cfg.environ.get("AWS_REGION", "us-east-1")
+
+_ACCOUNT_DEFAULT = "123456789012"
+_REGION_DEFAULT = "eu-west-2"
+_EMAIL_DEFAULT = "your-email@example.com"
+
+AWS_ACCOUNT_ID = _ACCOUNT_DEFAULT if _ACCOUNT_DEFAULT != "123456789012" else _os_cfg.environ.get("AWS_ACCOUNT_ID", "123456789012")
+AWS_REGION = _REGION_DEFAULT if _REGION_DEFAULT != "us-east-1" else _os_cfg.environ.get("AWS_REGION", "us-east-1")
+
+# ─── OPTIONAL: Required only for scripts/create_dashboard.py ──
+# Email address for QuickSight account registration and admin user.
+# Must be a valid address — QuickSight sends subscription notifications to it.
+QS_NOTIFICATION_EMAIL = _EMAIL_DEFAULT if _EMAIL_DEFAULT != "your-email@example.com" else _os_cfg.environ.get("QS_NOTIFICATION_EMAIL", "your-email@example.com")
 
 # ─── RUN ID (auto-generated, ensures unique bucket names) ─
 import os as _os

@@ -67,13 +79,16 @@ def _get_or_create_run_id():
 ROLE_QUERY_RUNNER = f"{PREFIX}-query-runner-role"
 
 
-def validate():
+def validate(require_qs_email=False):
     """Call this at the start of any script to catch misconfiguration early."""
     errors = []
     if AWS_ACCOUNT_ID == "CHANGE_ME" or not AWS_ACCOUNT_ID.isdigit() or len(AWS_ACCOUNT_ID) != 12:
         errors.append(f"AWS_ACCOUNT_ID must be a 12-digit number, got: '{AWS_ACCOUNT_ID}'")
     if AWS_REGION == "CHANGE_ME" or not AWS_REGION:
         errors.append(f"AWS_REGION must be set, got: '{AWS_REGION}'")
+    if require_qs_email and QS_NOTIFICATION_EMAIL == "your-email@example.com":
+        errors.append("QS_NOTIFICATION_EMAIL must be set to a real email address in config.py "
+                      "(required for QuickSight account registration)")
     if errors:
         print("=" * 60)
         print("CONFIGURATION ERROR — edit config.py")
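The "hardcoded value wins over env var" pattern in the config.py diff boils down to a sentinel check: if the default was edited away from the placeholder, use it; otherwise consult the environment. A sketch of the same idea (standalone; `resolve` is a hypothetical helper name, the committed file inlines the ternary expressions instead):

```python
import os


def resolve(hardcoded: str, placeholder: str, env_var: str) -> str:
    """Return the hardcoded value if it was edited away from the
    placeholder; otherwise fall back to the environment variable,
    and finally to the placeholder itself."""
    if hardcoded != placeholder:
        return hardcoded
    return os.environ.get(env_var, placeholder)


# Mirrors the diff: an edited default wins over the environment.
os.environ["AWS_REGION"] = "us-west-2"
print(resolve("eu-west-2", "us-east-1", "AWS_REGION"))  # → eu-west-2 (edited default wins)
print(resolve("us-east-1", "us-east-1", "AWS_REGION"))  # → us-west-2 (placeholder, env wins)
```

One subtlety worth noticing in the committed hunk: `_REGION_DEFAULT` is `"eu-west-2"` while the sentinel it is compared against is `"us-east-1"`, so as committed the region always resolves to the hardcoded value and the `AWS_REGION` environment fallback never triggers.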

containers/inference/inference_handler.py

Lines changed: 25 additions & 1 deletion

@@ -4,6 +4,17 @@
 """
 Inference handler for Customer Propensity Scoring model.
 Compatible with: local, SageMaker Batch Transform, Clean Rooms ML.
+
+Output columns:
+  - propensity_score    float (0-1)
+  - predicted_converter int (0/1)
+  - ad_campaign_id      str   ─┐
+  - device_type         str    │ passthrough contextual columns
+  - product_category    str    │ (present in Clean Rooms pre-joined input,
+  - purchase_amount     float  │  no raw user identifiers re-introduced)
+  - impressions         int    │
+  - clicks              int   ─┘
+  - user_id             str   (only when present in input, e.g. local/SageMaker mode)
 """
 
 import os, json, logging, io

@@ -122,8 +133,21 @@ def predict(input_data, content_type="text/csv"):
         "propensity_score": np.round(probabilities, 4),
         "predicted_converter": predictions.astype(int),
     })
+
+    # Pass through contextual columns for dashboard segmentation.
+    # These come from the Clean Rooms pre-joined input — no raw user
+    # identifiers are re-introduced. user_id is only present in
+    # local/SageMaker mode (never in the Clean Rooms execution path).
+    PASSTHROUGH_COLS = [
+        "ad_campaign_id", "device_type", "product_category",
+        "purchase_amount", "impressions", "clicks",
+    ]
+    for col in PASSTHROUGH_COLS:
+        if col in df.columns:
+            result[col] = df[col].values
+
     if user_ids is not None:
         result.insert(0, "user_id", user_ids.values)
 
-    logger.info(f"Output shape: {result.shape}")
+    logger.info(f"Output shape: {result.shape}, columns: {list(result.columns)}")
     return result.to_csv(index=False)
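The passthrough loop in the hunk above is easy to verify in isolation: it copies each contextual column only when the input actually carries it, so local runs with a reduced column set still work. A minimal sketch (standalone, not the repo's actual handler; `attach_passthrough` is a hypothetical helper name, the column list is taken from the diff):

```python
import pandas as pd

PASSTHROUGH_COLS = [
    "ad_campaign_id", "device_type", "product_category",
    "purchase_amount", "impressions", "clicks",
]


def attach_passthrough(result: pd.DataFrame, df: pd.DataFrame) -> pd.DataFrame:
    """Copy contextual columns from the model input into the output,
    silently skipping any that are absent from this input."""
    for col in PASSTHROUGH_COLS:
        if col in df.columns:
            result[col] = df[col].values
    return result


inp = pd.DataFrame({
    "ad_campaign_id": ["camp_holiday", "camp_spring"],
    "clicks": [4, 0],
    "some_feature": [0.3, 0.7],  # not in PASSTHROUGH_COLS, left out of the output
})
out = attach_passthrough(pd.DataFrame({"propensity_score": [0.8, 0.2]}), inp)
print(list(out.columns))  # → ['propensity_score', 'ad_campaign_id', 'clicks']
```

Because absence is tolerated per column, the same code path serves Clean Rooms (full pre-joined input) and local smoke tests (partial input) without branching.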

data/generate_synthetic_data.py

Lines changed: 28 additions & 5 deletions

@@ -34,6 +34,26 @@
 DEVICES = ["mobile", "desktop", "tablet", "smart_tv"]
 CATEGORIES = ["electronics", "clothing", "home_garden", "sports", "beauty", "grocery", "toys"]
 
+# Campaign effectiveness multipliers — drives visible variation in avg propensity by campaign
+CAMPAIGN_PROPENSITY_BOOST = {
+    "camp_holiday": 0.18,
+    "camp_summer_sale": 0.10,
+    "camp_back_to_school": 0.04,
+    "camp_spring": -0.05,
+    "camp_clearance": -0.14,
+}
+
+# Category affinity multipliers — drives visible variation in avg propensity by category
+CATEGORY_PROPENSITY_BOOST = {
+    "electronics": 0.16,
+    "sports": 0.08,
+    "home_garden": 0.03,
+    "clothing": -0.02,
+    "toys": -0.07,
+    "grocery": -0.12,
+    "beauty": -0.18,
+}
+
 BASE_DATE = datetime(2025, 1, 1)

@@ -53,13 +73,14 @@ def generate_advertiser_data():
     propensity = max(0.05, min(0.95, propensity))
 
     for campaign in random.sample(CAMPAIGNS, num_campaigns):
-        # Weaker propensity signal: more baseline randomness, less propensity-driven
-        impressions = max(1, int(random.randint(5, 40) + 10 * propensity))
+        # Apply campaign-level propensity boost so charts show meaningful variation
+        campaign_propensity = max(0.05, min(0.95, propensity + CAMPAIGN_PROPENSITY_BOOST[campaign]))
+        impressions = max(1, int(random.randint(5, 40) + 10 * campaign_propensity))
         device = random.choice(DEVICES)
         base_ctr = {"mobile": 0.08, "desktop": 0.05, "tablet": 0.06, "smart_tv": 0.03}[device]
-        ctr = base_ctr * (0.5 + 1.0 * propensity) * random.uniform(0.6, 1.5)
+        ctr = base_ctr * (0.5 + 1.0 * campaign_propensity) * random.uniform(0.6, 1.5)
         clicks = max(0, int(impressions * ctr))
-        time_per_click = random.uniform(5, 30) * (0.5 + 1.0 * propensity)
+        time_per_click = random.uniform(5, 30) * (0.5 + 1.0 * campaign_propensity)
         time_spent = round(clicks * time_per_click, 1) if clicks > 0 else 0
         event_date = random_date(BASE_DATE, BASE_DATE + timedelta(days=180))

@@ -95,7 +116,9 @@ def generate_retailer_data():
 
     num_categories = random.randint(1, 4)
     for category in random.sample(CATEGORIES, num_categories):
-        site_visits = max(1, int(random.randint(3, 12) + 8 * base_propensity))
+        # Apply category-level propensity boost so charts show meaningful variation
+        cat_propensity = max(0.05, min(0.95, base_propensity + CATEGORY_PROPENSITY_BOOST[category]))
+        site_visits = max(1, int(random.randint(3, 12) + 8 * cat_propensity))
 
         avg_price = {"electronics": 150, "clothing": 45, "home_garden": 65,
                      "sports": 55, "beauty": 30, "grocery": 25, "toys": 35}[category]
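The boost-and-clamp step that both hunks rely on is small enough to check on its own: add the segment multiplier, then clamp the result into the [0.05, 0.95] band the generator uses for base propensities. A sketch using the campaign multipliers from the diff (`boosted` is a hypothetical helper name; the generator inlines this expression):

```python
CAMPAIGN_PROPENSITY_BOOST = {
    "camp_holiday": 0.18,
    "camp_summer_sale": 0.10,
    "camp_back_to_school": 0.04,
    "camp_spring": -0.05,
    "camp_clearance": -0.14,
}


def boosted(propensity: float, boost: float) -> float:
    """Add the segment boost, then clamp to [0.05, 0.95],
    the same max/min guard used in the generator."""
    return max(0.05, min(0.95, propensity + boost))


# A strong base propensity is capped for the best campaign, a weak
# campaign drags the same user down, and the floor keeps scores from
# collapsing for low-propensity users.
print(boosted(0.90, CAMPAIGN_PROPENSITY_BOOST["camp_holiday"]))              # → 0.95
print(round(boosted(0.90, CAMPAIGN_PROPENSITY_BOOST["camp_clearance"]), 2))  # → 0.76
print(boosted(0.10, CAMPAIGN_PROPENSITY_BOOST["camp_clearance"]))            # → 0.05
```

The clamp is what keeps the added signal honest: segments shift the average, but no single multiplier can push a score out of the range the model sees during training.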
