A simple all-in-one CLI tool to auto-detect and download everything available from a URL.
```bash
pip install abx-dl
abx-dl 'https://example.com/page/to/download'
```
> [!IMPORTANT]
> ❈ NOT IMPLEMENTED YET. Coming someday... read the Plugin Ecosystem Announcement (2024-10).
> Release ETA: after ArchiveBox v0.9.0.
> You should make this! Use https://deepwiki.com/archivebox/abx-pkg to set up the dependencies like yt-dlp, ffmpeg, chrome, etc., plus a single global event queue and a single worker process/actor for each.
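For a rough sense of what that architecture might look like, here is a minimal Python sketch: a single global event queue feeds a dispatcher, which routes work to one long-lived worker per tool. It is illustrative only; the routing code and the event-type-to-tool mapping are assumptions, not the real abx-pkg API.

```python
import queue
import threading
import time

# Illustrative only -- NOT the real abx-pkg API. One global event queue
# feeds a dispatcher, which routes work to one long-lived worker per tool.
EVENTS: queue.Queue = queue.Queue()
TOOL_QUEUES = {tool: queue.Queue() for tool in ("yt-dlp", "wget", "chrome")}
ROUTES = {"media": "yt-dlp", "html": "wget", "screenshot": "chrome"}  # assumed mapping

def dispatcher() -> None:
    """Pull events off the global queue and route them by type."""
    while True:
        event = EVENTS.get()
        TOOL_QUEUES[ROUTES[event["type"]]].put(event)

def worker(tool: str) -> None:
    """One worker/actor per tool; real code would exec the tool's binary here."""
    while True:
        event = TOOL_QUEUES[tool].get()
        print(f"[{tool}] handling {event['url']}")

threading.Thread(target=dispatcher, daemon=True).start()
for tool in TOOL_QUEUES:
    threading.Thread(target=worker, args=(tool,), daemon=True).start()

EVENTS.put({"type": "media", "url": "https://example.com/video"})
time.sleep(0.2)  # give the daemon threads a moment to drain the queue
```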
✨ Ever wish you could run yt-dlp, gallery-dl, wget, curl, puppeteer, etc. all in one command?
abx-dl is an all-in-one CLI tool for downloading URLs "by any means necessary".
It's useful for scraping, downloading, OSINT, digital preservation, and more.
abx-dl is built to provide a simpler one-shot CLI interface to the ArchiveBox archiving engine (it replaces the old archivebox oneshot command).
```bash
abx-dl --extract=title,favicon,headers,wget,media,singlefile,screenshot,pdf,dom,readability,git,... 'https://example.com'
```

abx-dl gets everything by default, or you can tell it to `--extract=...` only specific methods:
- HTML, JS, CSS, images, etc. rendered with a headless browser
- title, favicon, headers, outlinks, and other metadata
- audio, video, subtitles, playlists, comments
- snapshot of the page as a PDF, screenshot, and Singlefile HTML
- article text
- git source code
- and much more...
Forget about writing janky manual crawling scripts with JS/Python/playwright/puppeteer/bash.
abx-dl renders every URL you pass to it in a fully-featured modern browser using puppeteer.
It auto-detects a wide variety of embedded resources using plugins, and extracts discovered content out to raw files (mp4, png, txt, pdf, html, etc.) in the current working directory.
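The plugin API is not finalized, but conceptually each extractor plugin decides whether a page is relevant to it and writes its outputs into the working directory. A hypothetical sketch follows; the `Extractor` protocol, its method names, and `ScreenshotExtractor` are all assumed for illustration, not the real plugin interface.

```python
from pathlib import Path
from typing import Protocol

class Extractor(Protocol):
    """Hypothetical plugin interface -- the real one may differ."""
    name: str
    def should_extract(self, url: str, dom: str) -> bool: ...
    def extract(self, url: str, out_dir: Path) -> list[Path]: ...

class ScreenshotExtractor:
    name = "screenshot"

    def should_extract(self, url: str, dom: str) -> bool:
        return True  # a screenshot applies to any rendered page

    def extract(self, url: str, out_dir: Path) -> list[Path]:
        out = out_dir / "screenshot.png"
        out.write_bytes(b"")  # real code would drive headless chrome here
        return [out]

def run_plugins(url: str, dom: str, plugins: list[Extractor]) -> None:
    out_dir = Path.cwd()  # outputs land in the current working directory
    for plugin in plugins:
        if plugin.should_extract(url, dom):
            for path in plugin.extract(url, out_dir):
                print(f"[{plugin.name}] wrote {path}")

run_plugins("https://example.com", "<html>...</html>", [ScreenshotExtractor()])
```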
`abx-dl` collects all of your favorite powerful scraping and downloading tools, including `wget`, `wget-lua`, `curl`, `puppeteer`, `playwright`, `singlefile`, `readability`, `yt-dlp`, `forum-dl`, and many more through the ABX Plugin Library (shared with ArchiveBox)...
You no longer have to deal with installing and configuring a bunch of tools individually.
Pass --extract=<methods> to get only what you need, and set other config via env vars / args:
- `USER_AGENT`, `CHECK_SSL_VALIDITY`
- `CHROME_USER_DATA_DIR` / `COOKIES_TXT`
- `TIMEOUT=60`, `MAX_MEDIA_SIZE=750m`, `RESOLUTION=1440,2000`, `ONLY_NEW=True`
- and more here...
Configuration options apply seamlessly across all methods.
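A layered config system like this typically resolves options in precedence order: built-in defaults, then the per-user config file, then environment variables, then per-run CLI args. Here is a sketch of how that merging might work; the `[abx-dl]` section name and the merge logic are assumptions, and only the option names come from the list above.

```python
import configparser
import os
from pathlib import Path

DEFAULTS = {"TIMEOUT": "60", "MAX_MEDIA_SIZE": "750m", "CHECK_SSL_VALIDITY": "True"}

def load_config(cli_args: dict) -> dict:
    """Merge sources: defaults < config file < env vars < CLI args."""
    config = dict(DEFAULTS)

    # per-user config file, e.g. ~/.config/abx-dl/abx-dl.conf
    conf_file = Path.home() / ".config" / "abx-dl" / "abx-dl.conf"
    if conf_file.exists():
        parser = configparser.ConfigParser()
        parser.optionxform = str  # keep UPPER_CASE option names intact
        parser.read(conf_file)
        if parser.has_section("abx-dl"):  # assumed section name
            config.update(parser["abx-dl"])

    # environment variables override the file
    for key in DEFAULTS:
        if key in os.environ:
            config[key] = os.environ[key]

    # per-run CLI args (-c KEY=VALUE) override everything else
    config.update(cli_args)
    return config

print(load_config({"MAX_MEDIA_SIZE": "250m"}))
```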
```bash
pip install abx-dl
abx-dl install    # optional: install any system packages needed

# Basic usage:
abx-dl [--help|--version] [--config|-c] [--extract=methods] [url]

abx-dl 'https://example.com'
ls ./
# <see All Outputs below>

abx-dl --extract=title,screenshot 'https://example.com'
ls ./
# index.json title.txt screenshot.png

abx-dl --extract=title,favicon,screenshot,singlefile,media 'https://example.com'
ls ./
# index.json index.html title.txt favicon.ico screenshot.png singlefile.html media/Some_video.mp4
```

Config can be persisted via file, set via env vars, or passed via CLI args.
```bash
# set per-user config in ~/.config/abx-dl/abx-dl.conf
abx-dl config --set CHECK_SSL_VALIDITY=True

# environment variables work too and are equivalent
env CHROME_USER_DATA_DIR=~/.config/abx-dl/personas/Default/chrome_profile abx-dl 'https://example.com'

# pass per-run config as CLI args
abx-dl -c MAX_MEDIA_SIZE=250m --extract=title,singlefile,screenshot,media 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
```

All Outputs:

- `index.json`, `index.html`
- `title.txt`, `title.json`, `headers.json`, `favicon.ico`
- `example.com/*.{html,css,js,png...}`, `warc/` (saved with `wget-lua`)
- `screenshot.png`, `dom.html`, `output.pdf` (rendered with `chrome`)
- `media/someVideo.mp4`, `media/subtitles`, ... (downloaded with `yt-dlp`)
- `readability/`, `mercury/`, `htmltotext.txt` (article text/markdown)
- `git/` (source code)
- ... and more via the plugin library ...
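Because everything is written as plain files to the working directory, abx-dl should be easy to drive from scripts. For example, a wrapper might shell out to the CLI and collect whatever files appear; this is a sketch assuming `abx-dl` is installed on `$PATH` and behaves as documented above.

```python
import subprocess
import tempfile
from pathlib import Path

def snapshot(url: str, methods: str = "title,screenshot") -> list[Path]:
    """Run abx-dl in a scratch dir and return the files it produced."""
    out_dir = Path(tempfile.mkdtemp(prefix="abx-dl-"))
    subprocess.run(
        ["abx-dl", f"--extract={methods}", url],
        cwd=out_dir, check=True,
    )
    return sorted(p for p in out_dir.rglob("*") if p.is_file())

for path in snapshot("https://example.com"):
    print(path)
```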
For more advanced use with collections, parallel downloading, a Web UI + REST API, and more, see ArchiveBox/ArchiveBox.