[Srinivas Gutta](https://github.com/97gutta)
[Sai Gowtham Ande](https://github.com/SaiGowtham-11)
[Manish Mapakshi](https://github.com/manishm96)
[Pratibha Awasthi](https://github.com/PratibhaAwasthi)
This dataset contains a collection of legitimate and phishing website examples. Each website is described by a set of characteristics that indicate whether it is legitimate or phishing. The data can be used to train and evaluate machine learning models.
Dataset Link: https://data.mendeley.com/datasets/72ptz43s9v/1
Web attackers typically target people in order to steal their personal information. They duplicate a legitimate website and use the copy to exploit the user, to whom the fraudulent site appears genuine. When a user enters their credentials on the fraudulent website, the information is sent to the attackers' servers, where they can obtain credit card details and other personal information; the site may also install malware on the user's device. As the number of online transactions grows, users become increasingly exposed to these attacks.
We propose a strategy that uses data mining algorithms to identify dangerous websites by analyzing their URLs, in order to limit the number of attacks that lead users to phony websites. Attackers change the subdomain and file path (if one occurs in the URL) or introduce a typographical error so that the URL resembles that of a legitimate website. To detect phishing websites, we must therefore study the URL and examine what each element contains. Candidate methods include logistic regression, the Naive Bayes classifier, and random forest. As the project progresses, we plan to include additional methods.
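As an illustration of studying a URL element by element, the following minimal sketch splits a URL with Python's standard `urllib.parse` module and derives a few simple lexical features. The function name `url_features` and the chosen feature names are hypothetical and are not taken from the dataset; the subdomain split assumes the last two host labels form the registered domain, which does not hold for suffixes such as `co.uk`.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Split a URL into parts and derive a few simple lexical features.

    Hypothetical helper for illustration; the feature names are assumptions
    and do not correspond to the columns of the dataset.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    # Naive assumption: the last two labels are the registered domain,
    # so everything before them is treated as the subdomain.
    subdomain = ".".join(labels[:-2]) if len(labels) > 2 else ""
    # Non-empty path segments, e.g. "/account/verify.php" has two.
    path_segments = [segment for segment in parsed.path.split("/") if segment]
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "subdomain": subdomain,
        "num_subdomain_labels": len(subdomain.split(".")) if subdomain else 0,
        "path_depth": len(path_segments),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }
```

Features of this kind could then be fed to any of the candidate classifiers; which features are actually informative would have to be established on the data.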
The goal of our project is to find the data mining algorithm that most accurately identifies fraudulent websites that steal information from users.
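Selecting an algorithm by accuracy can be sketched as follows, assuming each candidate model has already produced predictions for the same held-out set of labeled websites. The helper names `accuracy` and `best_by_accuracy` are hypothetical, and the sketch uses plain Python rather than any particular machine learning library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_by_accuracy(y_true, predictions):
    """Return (best_model_name, scores) for a dict of name -> predicted labels.

    Hypothetical helper: `predictions` is assumed to hold one list of
    predicted labels per candidate model, all for the same test examples.
    """
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    # On ties, max() keeps the first model in the dict's insertion order.
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `best_by_accuracy(y_test, {"logistic_regression": preds_lr, "naive_bayes": preds_nb, "random_forest": preds_rf})` would return the name of the highest-scoring model together with all scores; the variable names in this call are placeholders.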
Install the Missing No library: `pip install missingpy`