PulseRank Platform

Production-simulated marketplace ranking system with IPS position-bias correction, hybrid candidate generation, exposure governance, delayed attribution, and offline A/B decisioning — every result backed by a versioned JSON artifact.

Architecture

Sample Output

The Problem

Offline recommendation metrics lie. A model trained on position-biased click data looks 4× worse than it actually is — items shown at rank 1 receive 8× the clicks of rank-10 items regardless of quality, and naive NDCG bakes that position advantage directly into the score. PulseRank's baseline ranker scores NDCG@10 = 0.134 on biased labels; after IPS correction, the same model scores 0.522. The display layer's decisions were being charged to the model.

IPS Debiasing

The core technical contribution of PulseRank is treating position bias as a missing-data problem. Every impression is logged with its display rank at serve time. A position-click curve is estimated from the logged data — modeling the probability that a user clicks an item at position k given only that it was shown there. These propensity scores are used to reweight the evaluation:

IPS-NDCG = sum_i (rel_i / propensity_i) / ideal_DCG

IPS weights are clipped to control variance from low-propensity (low-rank) impressions. The result: NDCG@10 jumps from 0.134 (biased) to 0.522 (corrected) — not a model improvement, but an evaluation correction that removes the penalty for decisions the display layer made.

Metric	Value
Naive NDCG@10 (biased labels)	0.134
IPS-corrected NDCG@10	0.522
Relative improvement from debiasing	+289%

Position-Click Curve

The propensity estimator fits a position-based click model over the 90-day corpus of 41,320 impressions. Estimated click probability by rank follows an approximately log-linear decay — rank 1 receives ~8× the click rate of rank 10, consistent with known position bias in marketplace interfaces. These propensity estimates are stored in outputs/evidence/propensity_by_rank_report.json.

Hybrid Candidate Generation

Two complementary scorers produce candidates for each session:

Popularity Scorer: Global purchase-rate signal. Stable, high-coverage, biased toward bestsellers.
Content-Based Scorer: Item-feature similarity to session context. Better for tail items and fresh inventory.

Candidates are merged into a unified top-100 set. Hybrid Recall@100 = 0.690 — 69% of purchased items appear in the candidate set shown to each session. This recall figure is the ceiling for any ranker built on top.

Exposure Governance

A pure relevance ranker concentrates impressions on high-popularity sellers, starving long-tail inventory of visibility. PulseRank applies three post-ranker constraints:

MMR Diversity: Marginally-relevant re-ranking to increase within-list diversity.
Seller Gini Governance: Gini coefficient constraint on seller share of impressions. Seller Gini after rerank = 0.582.
Catalog Coverage: Coverage@10 = 0.226 — 22.6% of catalog items appear in at least one session's top-10.

These are the real-world constraints absent from pure relevance rankers in portfolio projects.

Delayed Attribution

Marketplace purchase intent does not resolve at click time. PulseRank attributes conversions within a 30-day window to the last relevant impression, correctly modeling the delayed decision cycle. 539 purchases are attributed across 4,132 sessions. Attribution reports are in outputs/evidence/conversion_attribution_report.json.

A/B Decision Framework

The offline A/B simulation replays control and treatment rankers on a temporally-held-out segment. The split is strictly temporal — no purchase labels from the test period leak into the training data.

Current decision: HOLD_SIMULATED. IPS-corrected metrics show improvement, but the relative lift in the holdout does not clear the significance threshold at the current traffic volume. The structured decision artifact in outputs/evidence/ab_simulation_results.json logs the decision rationale, effect size, and recommended action.

Evidence Artifacts

outputs/evidence/
  corpus_summary.json                 corpus statistics and schema validation
  candidate_generation_report.json    recall@k across popularity + content
  candidate_recall_report.json        per-session recall breakdown
  ranking_baseline_report.json        NDCG@10, MRR, hit rate, temporal eval
  offline_eval_log.json               per-session evaluation log
  model_registry.json                 versioned model + config snapshot
  bias_correction_report.json         IPS before/after comparison
  propensity_by_rank_report.json      propensity estimates per display rank
  diversity_report.json               MMR, Gini, novelty metrics
  diversity_guardrail_log.json        per-rerank constraint decisions
  catalog_coverage_report.json        coverage@k across catalog
  conversion_attribution_report.json  attributed labels, window analysis
  ab_simulation_results.json          control vs treatment decision
  metasignal_integration_events.json  15 structured observability events
  failure_recovery_report.json        15 failure + recovery scenarios

Key Results

Evidence	Result
Sessions	4,132
Items	650
Sellers	80
Impressions	41,320
Purchases	539
Display-rank coverage	1.0
Hybrid Recall@100	0.690
Holdout NDCG@10 (biased)	0.134
IPS-weighted NDCG@10	0.522
Seller Gini after rerank	0.582
Catalog coverage@10	0.226
Offline A/B decision	HOLD_SIMULATED
Failure scenarios	15
JSON evidence artifacts	33

Quick Start

git clone https://github.com/sidharthkriplani/pulserank_platform
cd pulserank_platform
pip install -r requirements.txt
python scripts/seed_demo.py
python scripts/show_demo_report.py
open outputs/dashboard/index.html

Interview Defense

Full design rationale, architecture decisions, and expected interview questions with answers:

docs/defense/PulseRank_Interview_Defense_v3.pdf

Covers: IPS debiasing mathematics, position-click curve estimation, hybrid candidate generation, exposure governance constraints, delayed attribution window design, offline A/B methodology, and production failure modes.

Part of Applied LLM Systems Portfolio

This project is part of a 13-repo portfolio targeting Applied LLM Systems Engineer, MLOps, and Technical AI PM roles.

Applied Systems (LangGraph pipelines):

Project	Domain	Primary Failure Mode
LendFlow	Financial underwriting	When to stop or escalate
AgentReliabilityLab	Cyber threat triage	When to stop or escalate
NexusSupply	Supplier risk intelligence	Conflicting signal fusion

Platforms & Auditors (domain-agnostic tooling):

Project	What It Audits / Builds
InferenceLens	Inference cost/quality tradeoffs — Pareto frontier, routing rules
RiskFrame	ML model lifecycle — champion/challenger, drift, fairness
MetaSignal	A/B experiment validity — CUPED, guardrail-first, SRM
DevPulse	Version-safe RAG — conflict detection, LLM-Last architecture
PulseRank	Marketplace ranking — IPS debiasing, MMR diversity
TrialCheck	A/B readout audit — SRM, peeking, underpowered tests
FeatureLeakageLens	Pre-training leakage — target, temporal, overlap
GoldenSetAuditor	LLM/RAG eval dataset quality
DocIngestQA	RAG document ingestion quality — 11 deterministic checks
MetricLens	Metric movement decomposition — mix shift vs rate shift

Scope & Production Gaps

PulseRank is a production-simulated ranking system. The following table makes explicit what is and isn't implemented — so every claim is defensible in an interview.

Capability	Status	Notes
IPS position-bias debiasing	✅ Implemented	Click × (1/propensity(rank)), clip max_weight=5.0
MMR diversity reranking	✅ Implemented	λ=0.70, max_per_seller=2, category_cap=50%
Delayed attribution window	✅ Implemented	Configurable attribution window with holdout
Offline A/B simulation	✅ Implemented	4,132 sessions, 15 failure scenarios, HOLD_SIMULATED verdict
Exposure governance	✅ Implemented	Seller Gini + category concentration constraints
Online A/B testing	❌ Not implemented	No real traffic; requires production infrastructure
Contextual bandits / RL	❌ Not implemented	Explicit scope boundary — would require live feedback loop
Real marketplace data	❌ Not implemented	All data is synthetic (seeded, reproducible)
Live streaming infrastructure	❌ Not implemented	Event emission is simulated; no Kafka/Kinesis
Real seller/buyer users	❌ Not implemented	650 synthetic items, 80 synthetic sellers

The offline A/B decision is explicitly labeled HOLD_SIMULATED to distinguish it from a live experiment result. Every major metric is backed by an executable script and a JSON artifact — 33 artifacts total across the evidence dashboard.

Truth Boundary

PulseRank is solo-built, non-production, and production-simulated. It does not claim real deployment, real users, real online A/B testing, real RL or contextual bandits, or real streaming infrastructure. Every major claim is backed by an executable script and a JSON artifact. The offline A/B decision is explicitly labeled HOLD_SIMULATED to distinguish it from a live experiment result.

Resume-Safe Claim

Built PulseRank, a production-simulated marketplace ranking system with display-rank impression logging, hybrid candidate generation, temporal holdout ranking evaluation, IPS position-bias correction (NDCG@10 0.134 → 0.522), delayed conversion attribution, seller/category exposure governance, offline A/B simulation, 15 failure recovery scenarios, MetaSignal-compatible event emission, and a GitHub Pages evidence dashboard covering 33 JSON artifacts.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
configs		configs
data		data
docs		docs
outputs		outputs
scripts		scripts
src/pulserank		src/pulserank
tests		tests
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PulseRank Platform

Architecture

Sample Output

The Problem

IPS Debiasing

Position-Click Curve

Hybrid Candidate Generation

Exposure Governance

Delayed Attribution

A/B Decision Framework

Evidence Artifacts

Key Results

Quick Start

Interview Defense

Part of Applied LLM Systems Portfolio

Scope & Production Gaps

Truth Boundary

Resume-Safe Claim

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PulseRank Platform

Architecture

Sample Output

The Problem

IPS Debiasing

Position-Click Curve

Hybrid Candidate Generation

Exposure Governance

Delayed Attribution

A/B Decision Framework

Evidence Artifacts

Key Results

Quick Start

Interview Defense

Part of Applied LLM Systems Portfolio

Scope & Production Gaps

Truth Boundary

Resume-Safe Claim

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages