# Wayback Go Downloader

A command-line tool for downloading websites from the Wayback Machine, rewritten in Go.

## Overview

This program is a Go port of the popular Ruby-based `wayback-machine-downloader` by hartator (available at [https://github.com/hartator/wayback-machine-downloader](https://github.com/hartator/wayback-machine-downloader)). It downloads the archived files of a given URL from the Internet Archive's Wayback Machine and saves them locally. By default it keeps the latest snapshot of each file; `--all-timestamps` saves every available snapshot instead.

## Features

* **Download Entire Websites:** Downloads all archived files associated with a given URL from the Wayback Machine.
* **Exact URL Download:** Option to download only the exact URL provided rather than the whole site.
* **Timestamp Filtering:** Specify `from` and `to` timestamps to download only snapshots within a particular date range.
* **Regex Filtering:** Include or exclude URLs based on regular expressions.
* **All Timestamps:** Download every available timestamp of each file, not just the latest.
* **Concurrency:** Downloads several files in parallel; the number of workers is set with `--threads`.
* **List Only Mode:** Print the list of files that would be downloaded, in JSON format, without downloading them.
* **Error Handling:** Option to download all files, including those that return errors.

## Installation

To install `wayback-go`, you need Go installed on your system (Go 1.16 or later is recommended).

1. **Clone the repository:**
   ```bash
   git clone https://github.com/your-username/wayback-go.git # Replace with actual repo URL
   cd wayback-go
   ```
2. **Build the executable:**
   ```bash
   go build -o wayback-go
   ```
3. **Move to your PATH (optional):**
   ```bash
   sudo mv wayback-go /usr/local/bin/
   ```

## Usage

```bash
./wayback-go --url <URL> [options]
```

If you moved the binary into your `PATH`, drop the `./` prefix.

### Options:

* `--url <URL>`: The base URL to download from the Wayback Machine (required).
* `--exact-url`: Download only the exact URL, not the whole site.
* `--dir <directory>`: Directory to save the downloaded files in (default: `websites/<domain>`).
* `--all-timestamps`: Download all available timestamps for each file.
* `--from <timestamp>`: Only download snapshots from this timestamp onward, in `YYYYMMDDhhmmss` format (e.g., `20060102150405`).
* `--to <timestamp>`: Only download snapshots up to this timestamp, in `YYYYMMDDhhmmss` format (e.g., `20060102150405`).
* `--only <regex>`: Only download URLs matching this regular expression.
* `--exclude <regex>`: Skip URLs matching this regular expression.
* `--all`: Download all files, even if they return an error.
* `--max-pages <number>`: Maximum number of snapshot pages to retrieve from the Wayback Machine API (default: 100).
* `--threads <number>`: Number of concurrent download threads (default: 1).
* `--list`: List file URLs in JSON format without downloading anything.

### Examples:

1. **Download a website:**
   ```bash
   ./wayback-go --url https://example.com
   ```
2. **Download only a specific URL:**
   ```bash
   ./wayback-go --url https://example.com/page.html --exact-url
   ```
3. **Download with a specific output directory:**
   ```bash
   ./wayback-go --url https://example.com --dir my_archive
   ```
4. **Download snapshots within a date range (here, all of 2020):**
   ```bash
   ./wayback-go --url https://example.com --from 20200101000000 --to 20201231235959
   ```
5. **List files in JSON format:**
   ```bash
   ./wayback-go --url https://example.com --list
   ```
6. **Download with 5 concurrent threads:**
   ```bash
   ./wayback-go --url https://example.com --threads 5
   ```
7. **Only download CSS files:**
   ```bash
   ./wayback-go --url https://example.com --only "\.css$"
   ```

## Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.