🔍 Titleist++

An evolution of Titleist, focused on detecting typosquatted or deceptive domains in near real time using Certificate Transparency (CT) logs, Levenshtein similarity, and optional DNS resolution.


🧩 Overview

Titleist++ monitors new domains appearing in public Certificate Transparency logs, stores them locally, and compares each to a reference corpus (e.g., top Alexa or Tranco domains) to flag suspiciously similar names.

Unlike the original Titleist, this version:

  • Runs fully independent of CertStream, harvesting directly from major CT log maintainers.
  • Logs raw domain events to SQLite, enabling scalable, asynchronous analysis.
  • Performs batched similarity scans and can stream results as batches complete.
  • Optionally resolves domains to IPs, ASN, or geolocation for enrichment.
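
As a rough illustration of harvesting directly from a log, the sketch below builds a request against the standard RFC 6962 `get-entries` endpoint and decodes the fixed-size header of a returned `leaf_input`. Whether `harvest_ct.py` uses this endpoint is an assumption; the function names and the example log URL are hypothetical, and extracting domain names from the certificate itself (X.509 parsing) is not shown.

```python
import base64
import json
import struct
import urllib.request


def get_entries_url(log_url: str, start: int, end: int) -> str:
    """Build an RFC 6962 get-entries URL for the inclusive range [start, end]."""
    return f"{log_url.rstrip('/')}/ct/v1/get-entries?start={start}&end={end}"


def fetch_entries(log_url: str, start: int, end: int, timeout: float = 10.0) -> list:
    """Download one page of entries (a log may return fewer than requested)."""
    url = get_entries_url(log_url, start, end)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["entries"]


def parse_leaf_header(leaf_input_b64: str) -> dict:
    """Decode the first 12 bytes of a MerkleTreeLeaf.

    Layout (big-endian): version (1 byte), leaf_type (1 byte),
    timestamp in milliseconds (8 bytes), entry_type (2 bytes).
    """
    raw = base64.b64decode(leaf_input_b64)
    version, leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    return {
        "version": version,
        "leaf_type": leaf_type,
        "timestamp_ms": timestamp_ms,
        # RFC 6962: 0 = x509_entry, 1 = precert_entry
        "entry_type": "x509_entry" if entry_type == 0 else "precert_entry",
    }
```

A harvester along these lines would call `fetch_entries` in a loop, advancing `start` by the number of entries actually returned.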

⚙️ Setup

Requirements

  • Python 3.10+
  • sqlite3 (built-in)
  • Recommended: requests, tldextract, Levenshtein, dnspython, rich

Installation

git clone https://github.com/cas1m1r/Titleist-Plus-Plus.git
cd Titleist-Plus-Plus
pip install -r requirements.txt

🧠 Usage

1. Harvest CT logs

Start the CT log harvester to populate the local database:

python harvest_ct.py

This script continuously fetches certificate entries and extracts domain names into ct_data.db.
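
The storage step can be pictured with a minimal sketch like the one below. The `domains` table and its `timestamp` column appear in the example query later in this README; the `id` and `domain` columns, the normalization rules, and the function names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3
import time


def open_db(path: str = "ct_data.db") -> sqlite3.Connection:
    """Open the database and create the (assumed) domains table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS domains (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               domain TEXT NOT NULL UNIQUE,
               timestamp REAL NOT NULL
           )"""
    )
    return conn


def store_domains(conn: sqlite3.Connection, names) -> int:
    """Insert normalized names, skipping duplicates; return rows added."""
    now = time.time()
    rows = []
    for name in names:
        # Lowercase, drop a wildcard prefix and any trailing dot.
        cleaned = name.strip().lower().removeprefix("*.").rstrip(".")
        if cleaned:
            rows.append((cleaned, now))
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO domains (domain, timestamp) VALUES (?, ?)", rows
        )
    return conn.total_changes - before
```

Using `INSERT OR IGNORE` with a `UNIQUE` column keeps repeated sightings of the same name from inflating the table; a real harvester might instead want to record every sighting.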


2. Run typosquatting analysis

Once the database has data, analyze new entries in batches:

python analyze_domains.py

You can specify options:

python analyze_domains.py --batch-size 1000 --resolver --threshold 2

Options:

Flag          Description
--batch-size  Number of domains to compare per batch
--resolver    Resolve suspicious domains to IP and check A/AAAA records
--threshold   Maximum Levenshtein distance to consider a “hit”
--log         Write results to hits.log
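
For readers who want to see how these flags might be wired up, here is a hedged `argparse` sketch. The flag names come from the table above; the default values, the choice to treat `--log` as a boolean switch, and the `build_parser` name are assumptions and may not match `analyze_domains.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        prog="analyze_domains.py",
        description="Compare harvested domains against a reference corpus.",
    )
    parser.add_argument("--batch-size", type=int, default=1000,
                        help="number of domains to compare per batch")
    parser.add_argument("--resolver", action="store_true",
                        help="resolve suspicious domains to IP addresses")
    parser.add_argument("--threshold", type=int, default=2,
                        help="maximum Levenshtein distance to count as a hit")
    parser.add_argument("--log", action="store_true",
                        help="write results to hits.log")
    return parser
```

Calling `build_parser().parse_args()` yields a namespace with `batch_size`, `resolver`, `threshold`, and `log` attributes.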

3. Example Output

[Batch 7/50]
HIT  gm.com  → go.com
HIT  us.com  → ups.com
HIT  cn.com  → cnn.com
HIT  discovery.com  → discover.com
HIT  aon.com  → aol.com
...

If --resolver is enabled:

HIT  gm.com  → go.com   (A: 198.51.100.45, ASN: AS16509 Amazon)
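
A minimal sketch of the address lookup, using only the standard library's `socket.getaddrinfo` so that it runs without extra packages. The project lists dnspython among its recommended dependencies and may use it instead; the `resolve_addresses` name is hypothetical, and the ASN lookup shown in the output above is not covered here.

```python
import socket


def resolve_addresses(domain: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4/IPv6 addresses for a name.

    Returns an empty list when the name does not resolve.
    """
    try:
        infos = socket.getaddrinfo(domain, None, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: name is not valid IDNA.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Because the system resolver is used, results depend on the local DNS configuration and may be cached.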

Each batch prints incrementally as results are found.
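
The batch-and-stream behaviour can be sketched with a generator, shown below with a pure-Python Levenshtein function so the example is self-contained. The project recommends the `Levenshtein` package, which would be much faster; the function names, the brute-force comparison against every reference domain, and the exclusion of exact matches are assumptions for illustration.

```python
from itertools import islice


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def scan(candidates, references, threshold=2, batch_size=1000):
    """Yield (batch_number, candidate, reference, distance) as hits are found."""
    for number, batch in enumerate(batched(candidates, batch_size), 1):
        for candidate in batch:
            for reference in references:
                distance = levenshtein(candidate, reference)
                # Distance 0 means the candidate *is* the reference domain.
                if 0 < distance <= threshold:
                    yield number, candidate, reference, distance
```

Because `scan` is a generator, a caller can print each hit as soon as it is produced instead of waiting for the whole run to finish.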


🧱 Architecture

        ┌─────────────────────┐
        │   CT Harvester      │
        │ (multiple feeds)    │
        └─────────┬───────────┘
                  │
                  ▼
           SQLite Database
                  │
                  ▼
         Typosquat Analyzer
        (Levenshtein distance)
                  │
                  ▼
          Enriched Output
        (Resolver, ASN, etc.)

This separation lets the harvester and analyzer run independently, so each can be scaled across threads or machines on its own.
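
One way an analyzer could consume rows independently of the harvester is to remember the last row it processed and poll for anything newer. The sketch below assumes a `domains` table with an auto-incrementing `id` column and a `domain` column; both column names and the `fetch_new` helper are hypothetical.

```python
import sqlite3


def fetch_new(conn: sqlite3.Connection, last_id: int, batch_size: int = 1000):
    """Return (domains, new_last_id) for rows with id greater than last_id."""
    rows = conn.execute(
        "SELECT id, domain FROM domains WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    # Keep the old cursor position when nothing new has arrived.
    new_last_id = rows[-1][0] if rows else last_id
    return [domain for _, domain in rows], new_last_id
```

When the harvester and analyzer share one database file, enabling SQLite's write-ahead log (`PRAGMA journal_mode=WAL`) allows readers to proceed while a writer is active.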


🧮 Example Query

You can inspect the stored domains manually:

sqlite3 ct_data.db
SELECT * FROM domains ORDER BY timestamp DESC LIMIT 10;

🧭 Roadmap

  • CT harvester with SQLite storage
  • Batched Levenshtein comparison
  • Optional DNS resolver
  • Web dashboard for real-time results, control, further analysis, etc.
  • Visualization of domain similarity clusters

🧑‍💻 Author

cas1m1r — research, design, and architecture
Inspired by early Titleist work on CertStream and the vision of open, transparent infrastructure monitoring.


📜 License

MIT License. See LICENSE for details.