A robust PHP-based crawler that extracts and saves product data from Okala, including stores, categories, and product details, in structured JSON format.
- Crawls multiple store pages from Okala
- Iterates through multiple category slugs
- Downloads and stores:
  - Product search result pages
  - Product `detail.json`
  - Product `features.json`
- Saves all data in structured folders under `/data/`
- Fully supports UTF-8/Persian characters
- Respects existing files to avoid redundant requests
```
data/
├── search/
│   └── {store_id}/{category_slug}/{page}.json
├── product/
│   └── {product_id}/
│       ├── features.json
│       └── {store_id}/detail.json
```
- PHP 7.4+ with `curl` and `json` extensions
- Git (for the automated commit + push loop)
- Internet access
```bash
git clone https://github.com/BaseMax/okala-database-crawler.git
cd okala-database-crawler
php crawler.php
```
To automatically commit and push updated JSON data every 5 minutes, run:

```bat
crawler.bat
```
💡 Useful when you're running long crawling jobs and want a backup of progress on GitHub.
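The commit-and-push cycle can also be expressed in PHP. This is a hypothetical sketch of what the batch loop does, not the contents of `crawler.bat`; the git runner is injectable so the loop can be tested without a repository.

```php
<?php
// Hypothetical re-implementation of the periodic backup loop.
// $runGit receives each git command, so a real caller would pass
// a wrapper around exec() or shell_exec().
function commitLoop(int $iterations, int $intervalSeconds, callable $runGit): void {
    for ($i = 0; $i < $iterations; $i++) {
        $runGit('git add data/');
        $runGit('git commit -m "Update crawled data"');
        $runGit('git push');
        if ($i < $iterations - 1) {
            sleep($intervalSeconds); // 300 seconds for the 5-minute cycle
        }
    }
}
```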
You can edit the following in `crawler.php`:

- Stores list (`$stores`)
- Categories list (`$categories`)
- Fetch delay (`usleep(250_000)` for 250 ms between requests)
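A minimal sketch of how these settings drive the crawl loop. The store IDs and category slugs below are placeholder values, not the real lists shipped in `crawler.php`:

```php
<?php
// Placeholder configuration; edit these to match your crawl.
$stores = [4, 11];                    // Okala store IDs
$categories = ['dairy', 'fruit-veg']; // category slugs
$delayMicroseconds = 250_000;         // 250 ms between requests (PHP 7.4+ separator syntax)

$requests = 0;
foreach ($stores as $storeId) {
    foreach ($categories as $slug) {
        // ...fetch and save search pages for this store/category here...
        $requests++;
        usleep($delayMicroseconds); // throttle to stay polite to the server
    }
}
```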
- ✅ Automatic file structure and directory creation
- ✅ Skips already downloaded data (but still verifies products)
- ✅ Handles self-signed SSL issues via cURL
- ✅ UTF-8 safe JSON storage (e.g., Persian: فارسی)
- ✅ Color-coded CLI output for easier tracking
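The skip-existing, UTF-8, and SSL behaviors listed above can be sketched as follows. Function names and option choices are assumptions about a typical implementation, not the crawler's exact code:

```php
<?php
// Skip files that already exist, so re-runs avoid redundant requests.
function saveJson(string $path, array $data): bool {
    if (file_exists($path)) {
        return false; // already downloaded
    }
    // JSON_UNESCAPED_UNICODE keeps Persian text readable in the files
    // instead of \uXXXX escape sequences.
    $json = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
    file_put_contents($path, $json);
    return true;
}

// cURL handle that tolerates self-signed certificates (use with care).
function newCurl(string $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    return $ch;
}
```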
PRs welcome! Please fork the repo and submit your improvements.
Have questions or ideas? Reach out via GitHub Issues.
MIT License
© 2025 Max Base