A modern, async-first Python SDK for the Bright Data APIs, with dataclass payloads, Jupyter notebooks, comprehensive platform support, and a CLI tool - built for data scientists and developers.
- Features
- Jupyter Notebooks
- Installation
- Quick Start
- What's New in v2.0.0
- Architecture
- API Reference
- CLI Usage
- Pandas Integration
- Dataclass Payloads
- Advanced Usage
- Testing
- Design Philosophy
- Documentation
- Troubleshooting
- Contributing
- Project Stats
- License
- Links
- Examples
- Roadmap
- Acknowledgments
- Why Choose This SDK?
- 5 Jupyter Notebooks - Complete tutorials from quickstart to batch processing
- Pandas Integration - Native DataFrame support with examples
- Data Analysis Ready - Built-in visualization, export to CSV/Excel
- Cost Tracking - Budget management and cost analytics
- Progress Bars - tqdm integration for batch operations
- Caching Support - joblib integration for development (see the sketch after this list)
- Async-first architecture with sync wrappers for compatibility
- Dataclass Payloads - Runtime validation, IDE autocomplete, helper methods
- Web scraping via Web Unlocker proxy service
- SERP API - Google, Bing, Yandex search results
- Platform scrapers - LinkedIn, Amazon, ChatGPT, Facebook, Instagram
- Dual namespace - `scrape` (URL-based) + `search` (discovery)
- CLI Tool - `brightdata` command for terminal usage
- 100% type safety - Dataclasses + TypedDict definitions
- 502+ comprehensive tests - Unit, integration, and E2E
- Resource efficient - Single shared AsyncEngine
- Rich result objects - Timing, cost tracking, method tracking
- .env file support - Automatic loading via python-dotenv
- SSL error handling - Helpful guidance for certificate issues
- Function-level monitoring - Track which SDK methods are used
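The caching feature pairs the SDK with joblib's disk-backed Memory cache during development; the following is a minimal sketch that wraps the documented sync call client.scrape.generic.url(...) - the cache directory and the cached_scrape helper are illustrative, not part of the SDK:

import joblib
from brightdata import BrightDataClient

# Disk cache for development; repeated calls with the same URL skip the API
memory = joblib.Memory(".brightdata_cache", verbose=0)
client = BrightDataClient()

@memory.cache
def cached_scrape(url: str):
    # Illustrative helper: returns the scraped data, or None on failure
    result = client.scrape.generic.url(url)
    return result.data if result.success else None

html = cached_scrape("https://example.com")        # first call hits the API
html_again = cached_scrape("https://example.com")  # served from the local cache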
Perfect for data scientists! Interactive tutorials with examples:
- 01_quickstart.ipynb - Get started in 5 minutes
- 02_pandas_integration.ipynb - Work with DataFrames
- 03_amazon_scraping.ipynb - Amazon deep dive
- 04_linkedin_jobs.ipynb - Job market analysis
- 05_batch_processing.ipynb - Scale to 1000s of URLs
pip install brightdata-sdk

Or install from source:
git clone https://github.com/brightdata/sdk-python.git
cd sdk-python
pip install -e .

Set your API token as an environment variable:
export BRIGHTDATA_API_TOKEN="your_api_token_here"
export BRIGHTDATA_CUSTOMER_ID="your_customer_id"  # Optional

Or use a .env file (automatically loaded):
# .env
BRIGHTDATA_API_TOKEN=your_api_token_here
BRIGHTDATA_CUSTOMER_ID=your_customer_id  # Optional

Or pass credentials directly:
from brightdata import BrightDataClient
client = BrightDataClient(
token="your_api_token",
customer_id="your_customer_id" # Optional
)

from brightdata import BrightDataClient
# Initialize client (auto-loads token from environment)
client = BrightDataClient()
# Scrape any website (sync wrapper)
result = client.scrape.generic.url("https://example.com")
if result.success:
print(f"Success: {result.success}")
print(f"Data: {result.data[:200]}...")
print(f"Time: {result.elapsed_ms():.2f}ms")
else:
print(f"Error: {result.error}")from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload, LinkedInJobSearchPayload
client = BrightDataClient()
# Amazon with validated payload
payload = AmazonProductPayload(
url="https://amazon.com/dp/B123456789",
reviews_count=50 # Runtime validated!
)
print(f"ASIN: {payload.asin}") # Helper property
result = client.scrape.amazon.products(**payload.to_dict())
# LinkedIn job search with validation
job_payload = LinkedInJobSearchPayload(
keyword="python developer",
location="New York",
remote=True
)
print(f"Remote search: {job_payload.is_remote_search}")
jobs = client.search.linkedin.jobs(**job_payload.to_dict())

import pandas as pd
from brightdata import BrightDataClient
client = BrightDataClient()
# Scrape multiple products
urls = ["https://amazon.com/dp/B001", "https://amazon.com/dp/B002"]
results = []
for url in urls:
result = client.scrape.amazon.products(url=url)
if result.success:
results.append({
'title': result.data.get('title'),
'price': result.data.get('final_price'),
'rating': result.data.get('rating'),
'cost': result.cost
})
# Convert to DataFrame
df = pd.DataFrame(results)
print(df.describe())
# Export to CSV
df.to_csv('products.csv', index=False)

# Scrape specific product URLs
result = client.scrape.amazon.products(
url="https://amazon.com/dp/B0CRMZHDG8",
timeout=65
)
# Extract reviews with filters
result = client.scrape.amazon.reviews(
url="https://amazon.com/dp/B0CRMZHDG8",
pastDays=30,
keyWord="quality",
numOfReviews=100
)
# Scrape seller information
result = client.scrape.amazon.sellers(
url="https://amazon.com/sp?seller=AXXXXXXXXX"
)
# NEW: Search Amazon by keyword and filters
result = client.search.amazon.products(
keyword="laptop",
min_price=50000, # $500 in cents
max_price=200000, # $2000 in cents
prime_eligible=True,
condition="new"
)
# Search by category
result = client.search.amazon.products(
keyword="wireless headphones",
category="electronics"
)

# URL-based extraction
result = client.scrape.linkedin.profiles(
url="https://linkedin.com/in/johndoe"
)
result = client.scrape.linkedin.jobs(
url="https://linkedin.com/jobs/view/123456"
)
result = client.scrape.linkedin.companies(
url="https://linkedin.com/company/microsoft"
)
result = client.scrape.linkedin.posts(
url="https://linkedin.com/feed/update/..."
)
# Discovery/search operations
result = client.search.linkedin.jobs(
keyword="python developer",
location="New York",
remote=True,
experienceLevel="mid"
)
result = client.search.linkedin.profiles(
firstName="John",
lastName="Doe"
)
result = client.search.linkedin.posts(
profile_url="https://linkedin.com/in/johndoe",
start_date="2025-01-01",
end_date="2025-12-31"
)

# Send single prompt to ChatGPT
result = client.scrape.chatgpt.prompt(
prompt="Explain Python async programming",
country="us",
web_search=True
)
# Batch prompts
result = client.scrape.chatgpt.prompts(
prompts=["What is Python?", "What is JavaScript?", "Compare them"],
web_searches=[False, False, True]
)

# Scrape posts from profile
result = client.scrape.facebook.posts_by_profile(
url="https://facebook.com/profile",
num_of_posts=10,
start_date="01-01-2025",
end_date="12-31-2025",
timeout=240
)
# Scrape posts from group
result = client.scrape.facebook.posts_by_group(
url="https://facebook.com/groups/example",
num_of_posts=20,
timeout=240
)
# Scrape specific post
result = client.scrape.facebook.posts_by_url(
url="https://facebook.com/post/123456",
timeout=240
)
# Scrape comments from post
result = client.scrape.facebook.comments(
url="https://facebook.com/post/123456",
num_of_comments=100,
start_date="01-01-2025",
end_date="12-31-2025",
timeout=240
)
# Scrape reels from profile
result = client.scrape.facebook.reels(
url="https://facebook.com/profile",
num_of_posts=50,
timeout=240
)

# Scrape Instagram profile
result = client.scrape.instagram.profiles(
url="https://instagram.com/username",
timeout=240
)
# Scrape specific post
result = client.scrape.instagram.posts(
url="https://instagram.com/p/ABC123",
timeout=240
)
# Scrape comments from post
result = client.scrape.instagram.comments(
url="https://instagram.com/p/ABC123",
timeout=240
)
# Scrape specific reel
result = client.scrape.instagram.reels(
url="https://instagram.com/reel/ABC123",
timeout=240
)
# Discover posts from profile (with filters)
result = client.search.instagram.posts(
url="https://instagram.com/username",
num_of_posts=10,
start_date="01-01-2025",
end_date="12-31-2025",
post_type="reel",
timeout=240
)
# Discover reels from profile
result = client.search.instagram.reels(
url="https://instagram.com/username",
num_of_posts=50,
start_date="01-01-2025",
end_date="12-31-2025",
timeout=240
)

# Google search
result = client.search.google(
query="python tutorial",
location="United States",
language="en",
num_results=20
)
# Access results
for item in result.data:
print(f"{item['position']}. {item['title']}")
print(f" {item['url']}")
# Bing search
result = client.search.bing(
query="python tutorial",
location="United States"
)
# Yandex search
result = client.search.yandex(
query="python tutorial",
location="Russia"
)

For better performance with multiple operations, use async:
import asyncio
from brightdata import BrightDataClient
async def scrape_multiple():
# Use async context manager for engine lifecycle
async with BrightDataClient() as client:
# Scrape multiple URLs concurrently
results = await client.scrape.generic.url_async([
"https://example1.com",
"https://example2.com",
"https://example3.com"
])
for result in results:
print(f"Success: {result.success}")
asyncio.run(scrape_multiple())

Important: When using `*_async` methods, always use the async context manager (`async with BrightDataClient() as client`). Sync wrappers (methods without `_async`) handle this automatically.
- Amazon Search API - NEW parameter-based product discovery with correct dataset
- LinkedIn Job Search Fixed - Now builds URLs from keywords internally
- Trigger Interface - Manual trigger/poll/fetch control for all platforms
- 29 Sync Wrapper Fixes - All sync methods work (scrapers + SERP API)
- Batch Operations Fixed - Returns `List[ScrapeResult]` correctly
- Auto-Create Zones - Now enabled by default (was opt-in)
- Improved Zone Names - `sdk_unlocker`, `sdk_serp`, `sdk_browser`
- Full Sync/Async Examples - README now shows both patterns for all features
- 5 Jupyter Notebooks - Complete interactive tutorials
- Pandas Integration - Native DataFrame support with examples
- Batch Processing Guide - Scale to 1000s of URLs with progress bars
- Cost Management - Budget tracking and optimization
- Visualization Examples - matplotlib/seaborn integration
- Runtime Validation - Catch errors at instantiation time
- Helper Properties - `.asin`, `.is_remote_search`, `.domain`, etc.
- IDE Autocomplete - Full IntelliSense support
- Default Values - Smart defaults (e.g., `country="US"`)
- to_dict() Method - Easy API conversion
- Consistent Model - Same pattern as result models
- `brightdata` command - Use SDK from terminal
- Scrape operations - `brightdata scrape amazon products ...`
- Search operations - `brightdata search amazon products --keyword ...`
- Output formats - JSON, pretty-print, minimal
- Single AsyncEngine - Shared across all scrapers (8x efficiency)
- Resource Optimization - Reduced memory footprint
- Enhanced Error Messages - Clear, actionable error messages
- 500+ Tests Passing - Comprehensive test coverage (99.4%)
- Amazon Search - Keyword-based product discovery
- Facebook Scraper - Posts (profile/group/URL), Comments, Reels
- Instagram Scraper - Profiles, Posts, Comments, Reels
- Instagram Search - Posts and Reels discovery with filters
The SDK provides a clean, intuitive interface organized by operation type:
client = BrightDataClient()
# URL-based extraction (scrape namespace)
client.scrape.amazon.products(url="...")
client.scrape.linkedin.profiles(url="...")
client.scrape.facebook.posts_by_profile(url="...")
client.scrape.instagram.profiles(url="...")
client.scrape.generic.url(url="...")
# Parameter-based discovery (search namespace)
client.search.amazon.products(keyword="...", min_price=..., max_price=...)
client.search.linkedin.jobs(keyword="...", location="...")
client.search.instagram.posts(url="...", num_of_posts=10)
client.search.google(query="...")
client.scrape.chatgpt.prompt(prompt="...")
# Direct service access (advanced)
client.web_unlocker.fetch(url="...")
client.crawler.discover(url="...")  # Coming soon

- BrightDataClient - Main entry point with authentication and .env support
- ScrapeService - URL-based data extraction
- SearchService - Parameter-based discovery
- Result Models - `ScrapeResult`, `SearchResult`, `CrawlResult` with method tracking
- Platform Scrapers - Amazon, LinkedIn, ChatGPT, Facebook, Instagram with registry pattern
- SERP Services - Google, Bing, Yandex search
- Type System - 100% type safety with TypedDict
- Constants Module - Centralized configuration (no magic numbers)
- SSL Helpers - Platform-specific error guidance
- Function Detection - Automatic SDK function tracking for monitoring
client = BrightDataClient(
token="your_token", # Auto-loads from BRIGHTDATA_API_TOKEN if not provided
customer_id="your_customer_id", # Auto-loads from BRIGHTDATA_CUSTOMER_ID (optional)
timeout=30, # Default timeout in seconds
web_unlocker_zone="sdk_unlocker", # Web Unlocker zone name (default)
serp_zone="sdk_serp", # SERP API zone name (default)
browser_zone="sdk_browser", # Browser API zone name (default)
auto_create_zones=True, # Auto-create missing zones (default: True)
validate_token=False # Validate token on init (default: False)
)

Environment Variables:

- `BRIGHTDATA_API_TOKEN` - Your API token (required)
- `BRIGHTDATA_CUSTOMER_ID` - Your customer ID (optional)
Both are automatically loaded from environment or .env file.
# Test API connection
is_valid = await client.test_connection()
is_valid = client.test_connection_sync() # Synchronous version
# Get account information
info = await client.get_account_info()
info = client.get_account_info_sync()
print(f"Zones: {info['zone_count']}")
print(f"Active zones: {[z['name'] for z in info['zones']]}")The SDK can automatically create required zones if they don't exist, or you can manage zones manually.
Enable automatic zone creation when initializing the client:
client = BrightDataClient(
token="your_token",
auto_create_zones=True # Automatically create zones if missing
)
# Zones are created on first API call
async with client:
# sdk_unlocker, sdk_serp, and sdk_browser zones created automatically if needed
result = await client.scrape.amazon.products(url="...")

List and manage zones programmatically:
# List all zones
zones = await client.list_zones()
zones = client.list_zones_sync() # Synchronous version
for zone in zones:
print(f"Zone: {zone['name']} (Type: {zone.get('type', 'unknown')})")
# Advanced: Use ZoneManager directly
from brightdata import ZoneManager
async with client.engine:
zone_manager = ZoneManager(client.engine)
# Ensure specific zones exist
await zone_manager.ensure_required_zones(
web_unlocker_zone="my_custom_zone",
serp_zone="my_serp_zone"
)

Zone Creation API:

- Endpoint: `POST https://api.brightdata.com/zone` (see the sketch after this list)
- Zones are created via the Bright Data API
- Supported zone types: `unblocker`, `serp`, `browser`
- Automatically handles duplicate zones gracefully
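If you need to call the zone endpoint outside the SDK, a raw HTTP request is possible; a minimal sketch with requests (the JSON body shape below is an assumption for illustration only - in practice `auto_create_zones` or ZoneManager handles this for you, and the Bright Data zone API docs define the exact schema):

import os
import requests

token = os.environ["BRIGHTDATA_API_TOKEN"]
resp = requests.post(
    "https://api.brightdata.com/zone",                       # endpoint from the list above
    headers={"Authorization": f"Bearer {token}"},
    json={"zone": {"name": "sdk_unlocker", "type": "unblocker"}},  # assumed body shape
)
print(resp.status_code, resp.text)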
All operations return rich result objects with timing and metadata:
result = client.scrape.amazon.products(url="...")
# Access data
result.success # bool - Operation succeeded
result.data # Any - Scraped data
result.error # str | None - Error message if failed
result.cost # float | None - Cost in USD
result.platform # str | None - Platform name (e.g., "linkedin", "amazon")
result.method # str | None - Method used: "web_scraper", "web_unlocker", "browser_api"
# Timing information
result.elapsed_ms() # Total time in milliseconds
result.get_timing_breakdown() # Detailed timing dict
# Serialization
result.to_dict() # Convert to dictionary
result.to_json(indent=2) # JSON string
result.save_to_file("result.json") # Save to fileThe SDK includes a powerful CLI tool:
# Help
brightdata --help
# Scrape Amazon product (URL is positional argument)
brightdata scrape amazon products \
"https://amazon.com/dp/B0CRMZHDG8"
# Search LinkedIn jobs
brightdata search linkedin jobs \
--keyword "python developer" \
--location "New York" \
--remote \
--output-file jobs.json
# Search Google (query is positional argument)
brightdata search google \
"python tutorial" \
--location "United States"
# Generic web scraping (URL is positional argument)
brightdata scrape generic \
"https://example.com" \
--response-format raw \
--output-format pretty

Scrape Operations:

- `brightdata scrape amazon products/reviews/sellers`
- `brightdata scrape linkedin profiles/jobs/companies/posts`
- `brightdata scrape facebook posts-profile/posts-group/comments/reels`
- `brightdata scrape instagram profiles/posts/comments/reels`
- `brightdata scrape chatgpt prompt`
- `brightdata scrape generic url`

Search Operations:

- `brightdata search amazon products`
- `brightdata search linkedin jobs/profiles/posts`
- `brightdata search instagram posts/reels`
- `brightdata search google/bing/yandex`
- `brightdata search chatgpt`
The CLI supports two different format parameters for different purposes:
`--output-format` - Controls how results are displayed (available for ALL commands):
# JSON format (default) - Full structured output
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format json
# Pretty format - Human-readable with formatted output
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format pretty
# Minimal format - Just the data, no metadata
brightdata scrape amazon products "https://amazon.com/dp/B123" --output-format minimal

`--response-format` - Controls what the API returns (generic scraper only):
# Raw format (default) - Returns HTML/text as-is
brightdata scrape generic "https://example.com" --response-format raw
# JSON format - API attempts to parse as JSON
brightdata scrape generic "https://api.example.com/data" --response-format json

Note: You can combine both:
brightdata scrape generic "https://example.com" \
--response-format raw \
--output-format pretty

Perfect for data analysis workflows:
import pandas as pd
from tqdm import tqdm
from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload
client = BrightDataClient()
# Batch scrape with progress bar
urls = ["https://amazon.com/dp/B001", "https://amazon.com/dp/B002"]
results = []
for url in tqdm(urls, desc="Scraping"):
payload = AmazonProductPayload(url=url)
result = client.scrape.amazon.products(**payload.to_dict())
if result.success:
results.append({
'asin': payload.asin,
'title': result.data.get('title'),
'price': result.data.get('final_price'),
'rating': result.data.get('rating'),
'cost': result.cost,
'elapsed_ms': result.elapsed_ms()
})
# Create DataFrame
df = pd.DataFrame(results)
# Analysis
print(df.describe())
print(f"Total cost: ${df['cost'].sum():.4f}")
print(f"Avg rating: {df['rating'].mean():.2f}")
# Export
df.to_csv('amazon_products.csv', index=False)
df.to_excel('amazon_products.xlsx', index=False)
# Visualization
import matplotlib.pyplot as plt
df.plot(x='asin', y='rating', kind='bar', title='Product Ratings')
plt.show()

See notebooks/02_pandas_integration.ipynb for complete examples.
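Because every result object exposes result.cost, you can layer a simple budget guard on top of a batch loop; a minimal sketch (the budget_usd cap and early-exit logic are illustrative, not an SDK feature):

from brightdata import BrightDataClient

client = BrightDataClient()
budget_usd = 1.00  # illustrative spending cap for this batch
spent = 0.0

urls = ["https://amazon.com/dp/B001", "https://amazon.com/dp/B002"]
for url in urls:
    if spent >= budget_usd:
        print(f"Budget of ${budget_usd:.2f} reached, stopping early")
        break
    result = client.scrape.amazon.products(url=url)
    spent += result.cost or 0.0  # cost can be None, treat it as zero
    print(f"{url}: success={result.success}, spent so far=${spent:.4f}")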
All payloads are now dataclasses with runtime validation:
from brightdata.payloads import AmazonProductPayload, AmazonReviewPayload
# Product with validation
payload = AmazonProductPayload(
url="https://amazon.com/dp/B123456789",
reviews_count=50,
images_count=10
)
# Helper properties
print(payload.asin) # "B123456789"
print(payload.domain) # "amazon.com"
print(payload.is_secure) # True
# Convert to API dict
api_dict = payload.to_dict()  # Excludes None values

from brightdata.payloads import LinkedInJobSearchPayload
payload = LinkedInJobSearchPayload(
keyword="python developer",
location="San Francisco",
remote=True,
experienceLevel="mid"
)
# Helper properties
print(payload.is_remote_search) # True
# Use with client
result = client.search.linkedin.jobs(**payload.to_dict())

from brightdata.payloads import ChatGPTPromptPayload
payload = ChatGPTPromptPayload(
prompt="Explain async programming",
web_search=True
)
# Default values
print(payload.country) # "US" (default)
print(payload.uses_web_search)  # True

# Runtime validation catches errors early
try:
AmazonProductPayload(url="invalid-url")
except ValueError as e:
print(e) # "url must be valid HTTP/HTTPS URL"
try:
AmazonProductPayload(
url="https://amazon.com/dp/B123",
reviews_count=-1
)
except ValueError as e:
print(e) # "reviews_count must be non-negative"# Scrape multiple URLs concurrently
urls = [
"https://amazon.com/dp/B001",
"https://amazon.com/dp/B002",
"https://amazon.com/dp/B003"
]
results = client.scrape.amazon.products(url=urls)
for result in results:
if result.success:
print(f"{result.data['title']}: ${result.data['price']}")# Amazon reviews with filters
result = client.scrape.amazon.reviews(
url="https://amazon.com/dp/B123",
pastDays=7, # Last 7 days only
keyWord="quality", # Filter by keyword
numOfReviews=50 # Limit to 50 reviews
)
# LinkedIn jobs with extensive filters
result = client.search.linkedin.jobs(
keyword="python developer",
location="New York",
country="us",
jobType="full-time",
experienceLevel="mid",
remote=True,
company="Microsoft",
timeRange="past-week"
)

All SDK methods support both sync and async patterns. Choose based on your needs:
# SYNC - Simple scripts
result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")
# ASYNC - Concurrent operations
import asyncio
async def scrape_amazon():
async with BrightDataClient() as client:
result = await client.scrape.amazon.products_async(url="https://amazon.com/dp/B123")
return result
result = asyncio.run(scrape_amazon())

# SYNC - Simple keyword search
result = client.search.amazon.products(keyword="laptop", prime_eligible=True)
# ASYNC - Batch keyword searches
async def search_amazon():
async with BrightDataClient() as client:
result = await client.search.amazon.products_async(
keyword="laptop",
min_price=50000,
max_price=200000,
prime_eligible=True
)
return result
result = asyncio.run(search_amazon())

# SYNC - Single profile
result = client.scrape.linkedin.profiles(url="https://linkedin.com/in/johndoe")
# ASYNC - Multiple profiles concurrently
async def scrape_linkedin():
async with BrightDataClient() as client:
urls = ["https://linkedin.com/in/person1", "https://linkedin.com/in/person2"]
results = await client.scrape.linkedin.profiles_async(url=urls)
return results
results = asyncio.run(scrape_linkedin())

# SYNC - Simple job search
result = client.search.linkedin.jobs(keyword="python", location="NYC", remote=True)
# ASYNC - Advanced search with filters
async def search_jobs():
async with BrightDataClient() as client:
result = await client.search.linkedin.jobs_async(
keyword="python developer",
location="New York",
experienceLevel="mid",
jobType="full-time",
remote=True
)
return result
result = asyncio.run(search_jobs())

# SYNC - Quick Google search
result = client.search.google(query="python tutorial", location="United States")
# ASYNC - Multiple search engines concurrently
async def search_all_engines():
async with BrightDataClient() as client:
google = await client.search.google_async(query="python", num_results=10)
bing = await client.search.bing_async(query="python", num_results=10)
yandex = await client.search.yandex_async(query="python", num_results=10)
return google, bing, yandex
results = asyncio.run(search_all_engines())

# SYNC - Single profile posts
result = client.scrape.facebook.posts_by_profile(
url="https://facebook.com/profile",
num_of_posts=10
)
# ASYNC - Multiple sources
async def scrape_facebook():
async with BrightDataClient() as client:
profile_posts = await client.scrape.facebook.posts_by_profile_async(
url="https://facebook.com/zuck",
num_of_posts=10
)
group_posts = await client.scrape.facebook.posts_by_group_async(
url="https://facebook.com/groups/programming",
num_of_posts=10
)
return profile_posts, group_posts
results = asyncio.run(scrape_facebook())

# SYNC - Single profile
result = client.scrape.instagram.profiles(url="https://instagram.com/instagram")
# ASYNC - Profile + posts
async def scrape_instagram():
async with BrightDataClient() as client:
profile = await client.scrape.instagram.profiles_async(
url="https://instagram.com/instagram"
)
posts = await client.scrape.instagram.posts_async(
url="https://instagram.com/p/ABC123"
)
return profile, posts
results = asyncio.run(scrape_instagram())

# SYNC - Single prompt
result = client.scrape.chatgpt.prompt(prompt="Explain Python", web_search=True)
# ASYNC - Batch prompts
async def ask_chatgpt():
async with BrightDataClient() as client:
result = await client.scrape.chatgpt.prompts_async(
prompts=["What is Python?", "What is JavaScript?"],
web_searches=[False, True]
)
return result
result = asyncio.run(ask_chatgpt())

# SYNC - Single URL
result = client.scrape.generic.url(url="https://example.com")
# ASYNC - Concurrent scraping
async def scrape_multiple():
async with BrightDataClient() as client:
results = await client.scrape.generic.url_async([
"https://example1.com",
"https://example2.com",
"https://example3.com"
])
return results
results = asyncio.run(scrape_multiple())

Use Sync When:
- Simple scripts or notebooks
- Single operations at a time
- Learning or prototyping
- Sequential workflows

Use Async When:

- Scraping multiple URLs concurrently
- Combining multiple API calls
- Production applications
- Performance-critical operations
Note: Sync wrappers (e.g., `profiles()`) internally use `asyncio.run()` and cannot be called from within an existing async context. Use `*_async` methods when you're already in an async function.
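For instance, inside an already-running event loop (another coroutine, an async web handler, or a notebook cell with a live loop) a sync wrapper would raise because asyncio.run() cannot be nested, so await the _async variant instead; a minimal sketch using the documented url_async batch form:

import asyncio
from brightdata import BrightDataClient

async def fetch_inside_event_loop():
    # We are already inside a running loop here, so sync wrappers such as
    # client.scrape.generic.url(...) would fail; use the async variant.
    async with BrightDataClient() as client:
        results = await client.scrape.generic.url_async(["https://example.com"])
        return results[0].data

data = asyncio.run(fetch_inside_event_loop())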
The SDK includes comprehensive SSL error handling with platform-specific guidance:
from brightdata import BrightDataClient
from brightdata.exceptions import SSLError
try:
client = BrightDataClient()
result = client.scrape.generic.url("https://example.com")
except SSLError as e:
# Helpful error message with platform-specific fix instructions
print(e)
# On macOS, suggests:
# - pip install --upgrade certifi
# - Running Install Certificates.command
# - Setting SSL_CERT_FILE environment variable

Common SSL fixes:
# Option 1: Upgrade certifi
pip install --upgrade certifi
# Option 2: Set SSL_CERT_FILE (macOS/Linux)
export SSL_CERT_FILE=$(python -m certifi)
# Option 3: Run Install Certificates (macOS python.org installers)
/Applications/Python\ 3.x/Install\ Certificates.command

Recent architectural refactoring includes:
All magic numbers moved to constants.py:
from brightdata.constants import (
DEFAULT_POLL_INTERVAL, # 10 seconds
DEFAULT_POLL_TIMEOUT, # 600 seconds
DEFAULT_TIMEOUT_SHORT, # 180 seconds
DEFAULT_TIMEOUT_MEDIUM, # 240 seconds
DEFAULT_COST_PER_RECORD, # 0.001 USD
)

Results now track which method was used:
result = client.scrape.amazon.products(url="...")
print(result.method)  # "web_scraper", "web_unlocker", or "browser_api"

Automatic tracking of which SDK functions are called:
# Automatically detected and sent in API requests
result = client.scrape.linkedin.profiles(url="...")
# Internal: sdk_function="profiles" sent to Bright Data

Clean separation of concerns:
- `ScrapeService` - URL-based extraction
- `SearchService` - Parameter-based discovery
- `CrawlerService` - Web crawling (coming soon)
- `WebUnlockerService` - Direct proxy access
Platform-specific guidance for certificate issues:
from brightdata.utils.ssl_helpers import (
is_ssl_certificate_error,
get_ssl_error_message
)

The SDK includes 500+ comprehensive tests:
# Run all tests
pytest tests/
# Run specific test suites
pytest tests/unit/ # Unit tests
pytest tests/integration/ # Integration tests
pytest tests/e2e/ # End-to-end tests
# Run with coverage
pytest tests/ --cov=brightdata --cov-report=html

- Client is single source of truth for configuration
- Authentication "just works" with minimal setup
- Fail fast and clearly when credentials are missing/invalid
- Each platform is an expert in its domain
- Scrape vs Search distinction is clear and consistent
- Build for future - registry pattern enables intelligent routing
- 01_quickstart.ipynb - 5-minute getting started
- 02_pandas_integration.ipynb - DataFrame workflows
- 03_amazon_scraping.ipynb - Amazon deep dive
- 04_linkedin_jobs.ipynb - Job market analysis
- 05_batch_processing.ipynb - Scale to production
- examples/10_pandas_integration.py - Pandas integration
- examples/01_simple_scrape.py - Basic usage
- examples/03_batch_scraping.py - Batch operations
- examples/04_specialized_scrapers.py - Platform-specific
- All examples
- API Reference
- Contributing Guidelines (See upstream repo)
If you encounter SSL certificate verification errors, especially on macOS:
SSL: CERTIFICATE_VERIFY_FAILED
The SDK will provide helpful, platform-specific guidance. Quick fixes:
# Option 1: Upgrade certifi
pip install --upgrade certifi
# Option 2: Set SSL_CERT_FILE environment variable
export SSL_CERT_FILE=$(python -m certifi)
# Option 3: Run Install Certificates (macOS with python.org installer)
/Applications/Python\ 3.x/Install\ Certificates.command
# Option 4: Install via Homebrew (if using Homebrew Python)
brew install ca-certificates

# Error: BRIGHTDATA_API_TOKEN not found in environment
# Solution 1: Create .env file
echo "BRIGHTDATA_API_TOKEN=your_token" > .env
# Solution 2: Export environment variable
export BRIGHTDATA_API_TOKEN="your_token"
# Solution 3: Pass directly to client
client = BrightDataClient(token="your_token")

# If you get import errors, ensure package is installed
pip install --upgrade brightdata-sdk
# For development installation
pip install -e .

Contributions are welcome! Check the GitHub repository for contribution guidelines.
git clone https://github.com/brightdata/sdk-python.git
cd sdk-python
# Install with dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest tests/

- Production Code: ~9,000 lines
- Test Code: ~4,000 lines
- Documentation: 5 Jupyter notebooks + 10 examples
- Test Coverage: 502+ tests passing (Unit, Integration, E2E)
- Supported Platforms: Amazon, LinkedIn, ChatGPT, Facebook, Instagram, Generic Web
- Supported Search Engines: Google, Bing, Yandex
- Type Safety: 100% (Dataclasses + TypedDict)
- Resource Efficiency: Single shared AsyncEngine
- Data Science Ready: Pandas, tqdm, joblib integration
- CLI Tool: Full-featured command-line interface
- Code Quality: Enterprise-grade, FAANG standards
MIT License - see LICENSE file for details.
- Bright Data - Get your API token
- API Documentation
- GitHub Repository
- Issue Tracker
from brightdata import BrightDataClient
# Initialize (auto-loads from .env or environment)
client = BrightDataClient()
# Test connection
if client.test_connection_sync():
print("β
Connected to Bright Data API")
# Get account info
info = client.get_account_info_sync()
print(f"Active zones: {info['zone_count']}")
# Scrape Amazon product
product = client.scrape.amazon.products(
url="https://amazon.com/dp/B0CRMZHDG8"
)
if product.success:
print(f"Product: {product.data[0]['title']}")
print(f"Price: {product.data[0]['final_price']}")
print(f"Rating: {product.data[0]['rating']}")
print(f"Cost: ${product.cost:.4f}")
# Search LinkedIn jobs
jobs = client.search.linkedin.jobs(
keyword="python developer",
location="San Francisco",
remote=True
)
if jobs.success:
print(f"Found {len(jobs.data)} jobs")
# Scrape Facebook posts
fb_posts = client.scrape.facebook.posts_by_profile(
url="https://facebook.com/zuck",
num_of_posts=10,
timeout=240
)
if fb_posts.success:
print(f"Scraped {len(fb_posts.data)} Facebook posts")
# Scrape Instagram profile
ig_profile = client.scrape.instagram.profiles(
url="https://instagram.com/instagram",
timeout=240
)
if ig_profile.success:
print(f"Profile: {ig_profile.data[0]['username']}")
print(f"Followers: {ig_profile.data[0]['followers_count']}")
# Search Google
search_results = client.search.google(
query="python async tutorial",
location="United States",
num_results=10
)
if search_results.success:
for i, item in enumerate(search_results.data[:5], 1):
print(f"{i}. {item.get('title', 'N/A')}")Run the included demo to explore the SDK interactively:
python demo_sdk.py

Built with best practices from:
- Modern Python packaging (PEP 518, 621)
- Async/await patterns
- Type safety (PEP 484, 544, dataclasses)
- Enterprise-grade engineering standards
- Data science workflows (pandas, jupyter)
- Data Scientists - Jupyter notebooks, pandas integration, visualization examples
- Developers - Type-safe API, comprehensive docs, CLI tool
- Enterprises - Production-ready, well-tested, resource-efficient
- Data Scientist Friendly - 5 Jupyter notebooks, pandas examples, visualization guides
- Type Safe - Dataclass payloads with runtime validation
- Enterprise Ready - 502+ tests, resource efficient, production-proven
- Well Documented - Interactive notebooks + code examples + API docs
- Easy to Use - CLI tool, intuitive API, helpful error messages
- Actively Maintained - Regular updates, bug fixes, new features
Ready to start scraping? Get your API token at brightdata.com and try our quickstart notebook!