
🚀 Project Name : del-packURLs


del-packURLs : Automates the process of Information Disclosure Vulnerability Discovery

📌 Overview

    del-packURLs is a web security automation tool designed to enhance information disclosure vulnerability discovery, particularly in bug hunting scenarios. It streamlines the process of extracting potentially sensitive files and information. The tool leverages the Wayback Machine CDX API to retrieve archived URLs for a given target domain, filtering for specific file extensions (e.g., apk, dll, exe, json, txt, pdf, zip). A key feature is its '200 History Mode', which identifies when these files were accessible with a 200 OK status code, addressing the challenge of locating resources that are currently unavailable (404 Page Not Found). This automation aims to improve efficiency for bug hunters by giving them direct terminal access to this historical data, the workflow Linux users already live in. Furthermore, the tool can integrate with AI models (Gemini, Claude, GPT) to provide intelligent suggestions on potentially sensitive PDF files, and it supports concurrent fetching.
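For context, here is a minimal Go sketch of the kind of Wayback CDX query this approach relies on. It is illustrative only and not the tool's actual code (the target domain, extension list, and field selection are assumptions), but the endpoint and query parameters are the public CDX API ones:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Build a Wayback CDX API query: archived captures of example.com
	// that returned 200 OK and end in one of a few "interesting" extensions.
	params := url.Values{}
	params.Set("url", "example.com/*")
	params.Set("fl", "timestamp,original,statuscode")
	params.Set("collapse", "urlkey")
	params.Add("filter", "statuscode:200")
	params.Add("filter", `original:.*\.(pdf|json|txt|zip)$`)

	resp, err := http.Get("https://web.archive.org/cdx/search/cdx?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// One "timestamp original statuscode" line per archived capture.
	fmt.Print(string(body))
}
```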

🙃 Why I created this

   The main reason for developing this was so that pentesters could perform as much of the information disclosure discovery workflow as possible from the terminal. When I watched Lossec's video (or rather, I saw this reel first, then watched Lossec's video a few months later, and even observed some bug hunters), I noticed that most people perform these tasks manually instead of using the terminal. What kind of Linux user abandons the terminal to work manually? I also saw that when they encounter a '404 Page not found' error, they manually paste each link into the Web Archive to see when it was live (200 OK). With so many links, the user can't tell which ones are good and sensitive, so I developed a solution that fetches everything from the terminal, shows the live archived links right there, and provides AI recommendations of sensitive PDFs. I know people can fetch using the curl command, but I used Golang to make it a bit faster. One thing to note: if your internet connection is fast while fetching with curl and slow while using Go, curl's result will come sooner, even though Go has good performance. But I thought, why use curl when I have the standard library? Performance + fast internet speed 🗿

📚 Requirements & Dependencies

  • Golang

  • Python 3

📥 Installation Guide & Usage

  1. Clone the repo:
git clone https://github.com/gigachad80/del-packURLs
  2. Go to the del-packURLs directory and give execute permission to main.go, or build directly from source ( go build del-packURLs.go ).
  3. Run ./del-packURLs . You can either pass the whole syntax at once:
  • ./del-packURLs -domain example.com followed by the rest of the flags, or just type the command
  • ./del-packURLs and it will prompt for the domain & extension. Enter your target domain/URL and flags and run it.
  4. For the help / menu guide, run ./del-packURLs -h

🤨 How is it different from grep-backURLs?

| grep-backURLs | del-packURLs |
| --- | --- |
| Uses keywords from keyword.txt to find sensitive data | Uses the Wayback CDX API and pre-defined keywords to find sensitive files |
| Finds all URLs | Finds only files |
| Does not use AI | Uses AI models like Gemini, Claude, and GPT to suggest sensitive PDFs for analysis |

Fun Fact: I developed both of them. 🤓 My repo for grep-backURLs: Repo link

📝 Roadmap / To-do

  • Add a -load flag to the syntax.
  • Update README.md with demo syntax for usage.
  • Add "Back to Main Menu" functionality.
  • Add more keywords in sort-keywords.py for sensitive docs.
  • Support VirusTotal & AlienVault for fetching URLs, just like the CDX API.
  • Not sure, but if possible I'll integrate AI file analysis (for images, text, PDFs, etc.).

Notes / Important Points

Note

  • PDF Suggestion & Analysis: The AI only recommends sensitive PDFs, while PyMuPDF analyzes them.
  • Decode URLs: Check line 219 of the Go file if you need decoded URLs for fetching.
  • Requirements txt: pip installs all AI model libraries by default, so if you want to use a single AI model, install only that one.
  • Modify prompt: Check line 106 of ai-suggestor.py to modify the prompt used for suggestions.
  • Python: It uses python on Windows & python3 on Linux.
  • AI testing: I have only tested Gemini so far, because ChatGPT's and Claude's API keys are not free.
  • Starting Download: The script shows "Downloading: [URL]" first.
  • File Not Found: "Not Found (404): [URL]" means the URL is broken/removed.
  • Download Error: "Error downloading [URL]: [error details]" indicates a network issue.
  • PDF Processing Error: "Error processing PDF [URL]: [error]" means the file isn't a valid PDF, or that PyMuPDF cannot analyze it even though it is a PDF.
  • No Keywords: "No sensitive keywords found in: [URL]" means the PDF text contains none of the defined terms.
  • Keywords Found: "Found keywords: [keywords] in [URL]" means terms were detected in the PDF.
  • Keyword Found Color: Green output indicates keywords were successfully found.
  • Error Colors: Red output signals download or processing errors.
  • Ctrl+C with Concurrency: Ctrl+C will not stop the run immediately when concurrency is on (especially with sort-keywords.py); it will still process & analyze all PDFs first.
  • Output File: Sensitive URLs with keywords are saved to 'sorted-keywords.txt'.

Tip

  • Concurrency Impact: Concurrency (the "yes" flag) can speed up checks; see the sketch after this list.
  • Use grep to pull the "Found keywords" lines out of the sorted-keywords.txt file.
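As context for the concurrency tip, here is a minimal, hypothetical Go sketch of concurrent URL status checks using a small worker pool. It is not del-packURLs' actual implementation; the URLs, worker count, and function names are illustrative:

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// checkURL reports the current HTTP status of a single URL.
func checkURL(u string) {
	resp, err := http.Head(u)
	if err != nil {
		fmt.Println("Error:", u, err)
		return
	}
	resp.Body.Close()
	fmt.Println(resp.StatusCode, u)
}

func main() {
	urls := []string{
		"https://example.com/backup.zip",
		"https://example.com/config.json",
	}

	jobs := make(chan string)
	var wg sync.WaitGroup

	// A fixed pool of workers pulls URLs off the channel,
	// so many checks run in parallel instead of one by one.
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				checkURL(u)
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}
```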

🤔 Why This Name?

First, I decided to use both the Web Archive CDX API and Waybackpack (one for fetching and one for showing the 200 status of archived URLs). However, after a lot of trying, Waybackpack didn't work. Then, one day, an idea suddenly came to me: why not just do it directly with the CDX API, which can show timestamps, status codes, and URLs? After modifying it a bit, it easily showed all the archived URLs that once had a 200 OK status code but are currently 404. So, even though I didn't end up using Waybackpack, it was my initial approach; "pack" refers to it, and "del" refers to deleted (404 Page Not Found). That's why I named it del-packURLs.

⌚ Total time taken in development, testing, trying different approaches & variations, debugging, and even writing the README

Approx 18 hr 10 min

💓 Credits:

I extend my sincere gratitude to both IHA org. & CoffinXP for creating the video. This project simply wouldn't exist if they hadn't created it.

📞 Contact

📧 Email: [email protected]

📄 License

Licensed under GNU General Public License v3.0

🕒 Last Updated: April 4, 2025
