Commit a693fba

initial commit

0 parents

File tree

4 files changed: +169 -0 lines


.gitignore

+113
@@ -0,0 +1,113 @@
# Created by https://www.gitignore.io

### IPythonNotebook ###
# Temporary data
.ipynb_checkpoints/


### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/


### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm

*.iml

## Directory-based project format:
.idea/
# if you remove the above rule, at least ignore the following:

# User-specific stuff:
# .idea/workspace.xml
# .idea/tasks.xml
# .idea/dictionaries

# Sensitive or high-churn files:
# .idea/dataSources.ids
# .idea/dataSources.xml
# .idea/sqlDataSources.xml
# .idea/dynamic.xml
# .idea/uiDesigner.xml

# Gradle:
# .idea/gradle.xml
# .idea/libraries

# Mongo Explorer plugin:
# .idea/mongoSettings.xml

## File-based project format:
*.ipr
*.iws

## Plugin-specific files:

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties

atusdata/

.homework.json

+10
@@ -0,0 +1,10 @@
{
    "name": "spambase",
    "version": "0.0.1",
    "title": "Classifying spam",
    "description": "Use the Spambase dataset to classify spam. This data is already parsed down from email to features.",
    "keywords": [
        "machine-learning",
        "bayes"
    ]
}

README.md

+42
@@ -0,0 +1,42 @@
# Classifying spam

## Description

Use the Spambase dataset to classify spam. This data is already parsed down from email to features.

## Objectives

### Learning Objectives

After completing this assignment, you should understand:

* Simple Bayesian analysis
* The importance of separating training and test data

### Performance Objectives

After completing this assignment, you should be able to:

* Create a Bayesian classifier
* Train your classifier
* Test your classifier

## Details

### Deliverables

* A Git repo called spambase containing at least:
    * a `README.md` file explaining how to run your project
    * a `requirements.txt` file

### Requirements

* No PEP8 or Pyflakes warnings or errors

## Normal Mode

Go to the UCI Machine Learning repository and [download the Spambase dataset](https://archive.ics.uci.edu/ml/datasets/Spambase). Make sure you [read the documentation for the data](https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.DOCUMENTATION). This explains what the attributes are in the data file.

Subsample the data set so 60% is training data and 40% is test data. You can subsample however you like, including splitting the original file. Just make sure that you have a representative data set. (The original is about 60% not-spam and 40% spam.)

Then write code to classify the data into spam and not-spam, training with your training data and testing on your test data. Try multiple classifiers to see which gives you the highest success.
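
Since this commit contains only the project skeleton, here is a rough, non-authoritative sketch of the workflow the Normal Mode describes, assuming the unpinned packages from `requirements.txt`, a local copy of `spambase.data` in the repository root, and the documented layout of 57 feature columns followed by the class label:

```python
# Illustrative sketch only, not part of this commit.
# Assumes spambase.data sits in the repo root and that the last column is the
# class label (1 = spam, 0 = not-spam), as the UCI documentation describes.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.metrics import accuracy_score

# spambase.data is a headerless CSV: 57 feature columns, then the label.
data = pd.read_csv("spambase.data", header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# 60% training / 40% test; stratifying keeps the roughly 60/40 not-spam/spam
# balance in both subsets, i.e. a "representative data set" as the README asks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Try several Naive Bayes variants and compare their test accuracy.
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    clf.fit(X_train, y_train)
    score = accuracy_score(y_test, clf.predict(X_test))
    print(f"{clf.__class__.__name__}: {score:.3f}")
```

Splitting with `stratify=y` is only one way to meet the 60/40 requirement; splitting the original file by hand, as the README allows, works just as well provided both halves keep the class balance.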

requirements.txt

+4
@@ -0,0 +1,4 @@
scikit-learn
pandas
numpy
matplotlib
