manishm96/Discovering_malicious_websites_using_data_mining_algorithms

CMPE255-Team-Project-Fall-2021

Discover Malicious Websites Using Data Mining Algorithms

Team Members

[Srinivas Gutta](https://github.com/97gutta)

[Sai Gowtham Ande](https://github.com/SaiGowtham-11)

[Manish Mapakshi](https://github.com/manishm96)

[Pratibha Awasthi](https://github.com/PratibhaAwasthi)

What data will we use and where will we get it?

This dataset contains a collection of legitimate and phishing website examples. Each website is described by a set of features that indicate whether it is legitimate or phishing. The data serves as input to the machine learning process.

Dataset Link: https://data.mendeley.com/datasets/72ptz43s9v/1

DESCRIPTION OF THE PROBLEM:

Web attackers primarily target individuals in order to steal their personal information. They duplicate a legitimate website and use the copy to exploit the user: the site appears legitimate to the user, but it is not. When a user enters their credentials on a fraudulent website, the information is sent to the attackers' servers, where the attackers can obtain credit card details and personal information, or install malware on the user's device. As the volume of online transactions grows, users become increasingly vulnerable to these attacks.

POTENTIAL METHODS:

To limit the number of attacks that lead users to fake websites, we propose a strategy that uses data mining algorithms to identify malicious websites by analyzing their URLs. Attackers change the subdomain and file path (if one is present in the URL) or introduce a typographical error so that the URL resembles that of a legitimate website. To detect phishing websites, we must therefore examine the URL and what each of its components contains. Candidate methods include logistic regression, the naive Bayes classifier, and random forest. As the project progresses, we plan to include additional methods.
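As a rough illustration of what examining each component of a URL could look like, the sketch below splits a URL with Python's standard urllib.parse module and derives a few simple features. The function name url_features, the feature names, and the example URL are hypothetical and are not taken from the dataset or the project notebook.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Split a URL into its parts and derive a few simple features.

    The feature names are illustrative assumptions, not dataset columns.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    # Naive assumption: the last two labels form the registered domain.
    # This is wrong for suffixes such as "co.uk" and for raw IP addresses;
    # a public-suffix list would be needed for a real system.
    subdomain_labels = labels[:-2] if len(labels) > 2 else []
    return {
        "url_length": len(url),
        "num_subdomains": len(subdomain_labels),
        # Count the non-empty segments of the file path.
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        # Digits in the host can hint at typos such as "paypa1".
        "num_digits_in_host": sum(ch.isdigit() for ch in host),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    # Made-up URL, used only to show the shape of the output.
    print(url_features("http://login.paypa1.example.com/secure/update.php"))
```

A dictionary like this could be computed per URL and collected into a feature table that a classifier is then trained on.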

Measurement of Success

The goal of our project is to find the data mining algorithm that most accurately identifies fraudulent websites that steal information from users. Accuracy is the metric used to compare the algorithms.
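A minimal sketch of how accuracy-based selection might work is shown below. It assumes each model's predictions on a held-out test set are already available as lists of labels. The helper names accuracy and best_by_accuracy, the model names, and the toy labels are all hypothetical placeholders, not project results.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def best_by_accuracy(y_true, predictions):
    """Score every model and return (name of best model, all scores)."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy placeholder labels: 1 = phishing, 0 = legitimate.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
        "model_b": [1, 1, 1, 0, 0, 1, 1, 0],
    }
    best, scores = best_by_accuracy(y_true, predictions)
    print(best, scores)
```

If two models tie, max returns whichever was inserted into the dictionary first, so a tie-breaking rule would have to be chosen explicitly.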

Packages Required to Run the Notebook

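Install the missing-data library before running the notebook: pip install missingpy. If the notebook imports missingno (Missing No) instead, install it with pip install missingno.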
