Scrapes LinkedIn and Indeed four times a day via GitHub Actions for new postings matching target titles, and emails a digest via Gmail SMTP.
- Titles (each queried separately against each board): Data Scientist, Machine Learning Engineer, Solutions Architect, Sales Engineer, Forward Deployed Engineer
- Locations: San Francisco Bay Area, San Diego Metropolitan Area, and US-Remote (three passes)
- Recency: postings from the last 24 hours
- Seniority filter: excludes titles containing
senior,sr,staff,principal,lead,manager,director,head of,vp,chief,distinguished,fellow, or roman-numeral levelsIII/IV— roles that typically require years of experience.
Edit TITLES, LOCATIONS, and EXCLUDED_SENIORITY_TERMS at the top of
scraper/main.py to change what's searched and what's
filtered.
GitHub Actions cron (4x/day)
-> scraper.main
-> scraper.linkedin.search(...) # per title, per location
-> scraper.indeed.search(...)
-> dedupe against seen_jobs.json
-> if any new: notify.send_digest(...) (Gmail SMTP)
-> commit updated seen_jobs.json back to the repo
State is persisted by committing seen_jobs.json back to the default branch,
so the next run's fresh VM knows what it has already reported.
git init -b main
git add .
git commit -m "Initial commit: job scraper"
# Create an empty private repo on github.com first, then:
git remote add origin https://github.com/<you>/job-search.git
git push -u origin mainOr with gh:
gh repo create job-search --private --source=. --pushThe GitHub Secret cannot be your regular Gmail password -- Google blocks SMTP logins with account passwords. You need an App Password, which requires 2-Step Verification on the account.
- Enable 2-Step Verification at https://myaccount.google.com/security.
- Go to https://myaccount.google.com/apppasswords.
- Create a new password named e.g.
job-search-actions. - Copy the 16-character password shown (you can paste it with or without spaces; Gmail accepts both).
In your repo on GitHub: Settings -> Secrets and variables -> Actions -> New repository secret. Add:
| Secret | Value |
|---|---|
GMAIL_USER |
your Gmail address |
GMAIL_APP_PASSWORD |
the 16-character App Password from step 2 |
NOTIFY_TO |
(optional) where to send alerts; defaults to GMAIL_USER |
The workflow commits seen_jobs.json back to the repo, so it needs write
access to contents.
Settings -> Actions -> General -> Workflow permissions -> Read and write permissions -> Save.
The permissions: contents: write at the top of the workflow is necessary
but not sufficient; the repo-level setting is a ceiling.
Actions tab -> job-scraper -> Run workflow -> Run workflow.
Expect the first run's email to contain many jobs (everything visible is "new"). Subsequent runs should only email deltas.
Cron 0 6,12,18,0 * * * (UTC) -> roughly 01:00, 07:00, 13:00, 19:00 ET.
GitHub cron is best-effort and can be delayed 5-15 minutes under load.
To change the cadence, edit the cron line in
.github/workflows/job-scraper.yml.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export GMAIL_USER=you@gmail.com
export GMAIL_APP_PASSWORD='xxxx xxxx xxxx xxxx'
export NOTIFY_TO=you@gmail.com
python -m scraper.mainThe scraper will read and write seen_jobs.json in the repo root; delete
it to force a "first run" where everything is treated as new.
- LinkedIn rate-limits by IP. Expect occasional empty-result runs or HTTP 429/999 from GitHub Actions runner IPs. The scraper logs and tolerates these.
- Indeed aggressively Cloudflare-challenges datacenter IPs. Many runs from GitHub Actions will return zero results. Treat it as a bonus; the primary signal is LinkedIn.
- CSS selectors on both sites change occasionally. If runs start
consistently returning zero jobs, check the action logs for
"0 cards parsed"messages and update the selectors in the respective module. - First run emails a large digest. Pre-seed
seen_jobs.jsonwith a previous snapshot if you want to avoid this.
- Secrets are encrypted at rest by GitHub and scrubbed from log output; workflows triggered by forks of public repos do not receive them.
- The workflow's
GITHUB_TOKENis ephemeral per-run and scoped by thepermissions:block. We request onlycontents: write. - Commits authored by the bot use
job-scraper-bot@users.noreply.github.comso they do not count as contributions for any real user and do not link to a profile.
- Runner:
ubuntu-24.04 - Actions:
actions/checkout@v4.2.2,actions/setup-python@v5.3.0 - Python: 3.11
- pip: 24.3.1
- Packages: see
requirements.txt