A robust, UTF-8 compliant PHP-based crawler designed to extract structured product data from Okala. This tool efficiently scrapes and saves store information, category slugs, and detailed product listings into organized JSON files. Ideal for data analysis, backup, or integration into other systems.

🛒 Okala Database Crawler

A robust PHP-based crawler that extracts product data from Okala, including stores, categories, and product details, and saves it as structured JSON.


📦 What It Does

  • Crawls multiple store pages from Okala
  • Iterates through multiple category slugs
  • Downloads and stores:
    • Product search result pages
    • Product detail.json
    • Product features.json
  • Saves all data in structured folders under data/
  • Fully supports UTF-8/Persian characters
  • Respects existing files to avoid redundant requests
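The "respects existing files" behaviour can be pictured with a small sketch. It is not taken from crawler.php: the helper name fetchIfMissing and the stand-in fetch callback are hypothetical, and the real script may decide what to skip differently.

```php
<?php
/**
 * Save the result of $fetch to $path unless the file already exists.
 * Returns true when a download happened, false when it was skipped.
 * Hypothetical helper for illustration; not part of crawler.php.
 */
function fetchIfMissing(string $path, callable $fetch): bool
{
    if (is_file($path)) {
        return false; // already on disk, so no request is needed
    }

    // Create the nested target directory on first use.
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    file_put_contents($path, $fetch());
    return true;
}

// Demo with a stand-in for the HTTP request (no network access).
$path = sys_get_temp_dir() . '/okala-demo-' . uniqid() . '/search/1/dairy/1.json';
$calls = 0;
$fakeFetch = function () use (&$calls): string {
    $calls++;
    return '{"products":[]}';
};

$first  = fetchIfMissing($path, $fakeFetch); // performs the "download"
$second = fetchIfMissing($path, $fakeFetch); // skipped, file exists
```

Passing the request as a callback means it is only executed when the file is actually missing.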

🗂 Folder Structure

data/
├── search/
│   └── {store_id}/{category_slug}/{page}.json
├── product/
│   └── {product_id}/
│       ├── features.json
│       └── {store_id}/detail.json
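As a rough illustration of this layout, the paths could be built with helpers like the ones below. The function names are hypothetical and the IDs are treated as strings purely for the example; crawler.php may assemble its paths in another way.

```php
<?php
// Hypothetical path builders that mirror the folder tree above.

function searchPath(string $base, string $storeId, string $categorySlug, int $page): string
{
    // data/search/{store_id}/{category_slug}/{page}.json
    return sprintf('%s/search/%s/%s/%d.json', $base, $storeId, $categorySlug, $page);
}

function featuresPath(string $base, string $productId): string
{
    // data/product/{product_id}/features.json
    return sprintf('%s/product/%s/features.json', $base, $productId);
}

function detailPath(string $base, string $productId, string $storeId): string
{
    // data/product/{product_id}/{store_id}/detail.json
    return sprintf('%s/product/%s/%s/detail.json', $base, $productId, $storeId);
}

// Placeholder IDs and slug, not real Okala values.
echo searchPath('data', '1001', 'dairy', 2), PHP_EOL;
echo featuresPath('data', '555'), PHP_EOL;
echo detailPath('data', '555', '1001'), PHP_EOL;
```

Note that features.json is stored once per product, while detail.json sits one level deeper because it is stored per store.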


🚀 Usage

✅ Requirements

  • PHP 7.4+ with curl and json extensions
  • Git (only needed for the optional auto commit-and-push loop)
  • Internet access
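If you want to confirm the PHP side of these requirements before a long run, a quick check along these lines may help. The function name missingRequirements is made up for this sketch and is not part of the repository.

```php
<?php
/**
 * Return a list of unmet requirements (empty when everything is present).
 * Hypothetical helper; the extension list defaults to the ones named above.
 */
function missingRequirements(array $extensions = ['curl', 'json']): array
{
    $missing = [];

    if (version_compare(PHP_VERSION, '7.4.0', '<')) {
        $missing[] = 'PHP 7.4+ (found ' . PHP_VERSION . ')';
    }

    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = 'ext-' . $extension;
        }
    }

    return $missing;
}

$missing = missingRequirements();
echo $missing === []
    ? "All requirements met\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```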

📥 Clone the Repo

git clone https://github.com/BaseMax/okala-database-crawler.git
cd okala-database-crawler

🧪 Run the Crawler

php crawler.php

🔁 Auto Git Push (Optional)

To automatically commit and push updated JSON data every 5 minutes:

crawler.bat

💡 Useful when you're running long crawling jobs and want a backup of progress on GitHub.


🛠 Customization

You can edit the following in crawler.php:

  • Stores list ($stores)
  • Categories list ($categories)
  • Fetch delay (usleep(250_000) for 250ms between requests)
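The sketch below shows how these three settings might fit together. The variable names $stores and $categories come from the list above, but their contents here are placeholders, and the crawlJobs generator and $delayMicroseconds are hypothetical names used only for illustration.

```php
<?php
// Placeholder values; replace them with real store IDs and category slugs.
$stores = [1001, 1002];
$categories = ['dairy', 'beverages'];
$delayMicroseconds = 250_000; // 250 ms; the underscore separator needs PHP 7.4+

/**
 * Yield every [store, category] combination, one store at a time.
 * Hypothetical helper, not taken from crawler.php.
 */
function crawlJobs(array $stores, array $categories): Generator
{
    foreach ($stores as $store) {
        foreach ($categories as $category) {
            yield [$store, $category];
        }
    }
}

$visited = [];
foreach (crawlJobs($stores, $categories) as [$storeId, $slug]) {
    // A real run would request the search pages for this pair here.
    $visited[] = [$storeId, $slug];
    echo "store={$storeId} category={$slug}\n";

    usleep($delayMicroseconds); // pause between requests
}
```

With two stores and two categories this visits four combinations, so the demo takes about one second because of the delay.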

🧼 Features

  • ✅ Automatic file structure and directory creation
  • ✅ Skips already downloaded data (but still verifies products)
  • ✅ Handles self-signed SSL issues via cURL
  • ✅ UTF-8 safe JSON storage (e.g., Persian: فارسی)
  • ✅ Color-coded CLI output for easier tracking
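For the UTF-8 safe storage point, PHP's json_encode escapes non-ASCII characters as \uXXXX sequences unless JSON_UNESCAPED_UNICODE is passed. The sketch below shows the difference; the helper name encodeJsonUtf8 and the sample record are made up, and the exact flags used by crawler.php are an assumption.

```php
<?php
/**
 * Encode data as human-readable JSON that keeps Persian text intact.
 * Hypothetical helper; the flag combination is an assumption.
 */
function encodeJsonUtf8(array $data): string
{
    $json = json_encode(
        $data,
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT
    );

    if ($json === false) {
        throw new RuntimeException('JSON encoding failed: ' . json_last_error_msg());
    }

    return $json;
}

$product = ['id' => 42, 'name' => 'فارسی']; // made-up record

$escaped  = json_encode($product);   // default: non-ASCII becomes \uXXXX escapes
$readable = encodeJsonUtf8($product); // Persian text stays as written

echo $escaped, PHP_EOL;
echo $readable, PHP_EOL;
```

Both forms decode to the same data; the unescaped form is simply easier to read and diff.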

🤝 Contributions

PRs welcome! Please fork the repo and submit your improvements.


📬 Contact

Have questions or ideas? Reach out via GitHub Issues.


📄 License

MIT License

© 2025 Max Base
