Commit a693fba

initial commit

0 parents

File tree

4 files changed: +169 -0 lines


.gitignore

+113
@@ -0,0 +1,113 @@
# Created by https://www.gitignore.io

### IPythonNotebook ###
# Temporary data
.ipynb_checkpoints/


### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/


### PyCharm ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm

*.iml

## Directory-based project format:
.idea/
# if you remove the above rule, at least ignore the following:

# User-specific stuff:
# .idea/workspace.xml
# .idea/tasks.xml
# .idea/dictionaries

# Sensitive or high-churn files:
# .idea/dataSources.ids
# .idea/dataSources.xml
# .idea/sqlDataSources.xml
# .idea/dynamic.xml
# .idea/uiDesigner.xml

# Gradle:
# .idea/gradle.xml
# .idea/libraries

# Mongo Explorer plugin:
# .idea/mongoSettings.xml

## File-based project format:
*.ipr
*.iws

## Plugin-specific files:

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties

atusdata/

.homework.json

+10
@@ -0,0 +1,10 @@
{
    "name": "spambase",
    "version": "0.0.1",
    "title": "Classifying spam",
    "description": "Use the Spambase dataset to classify spam. This data is already parsed down from email to features.",
    "keywords": [
        "machine-learning",
        "bayes"
    ]
}

README.md

+42
@@ -0,0 +1,42 @@
# Classifying spam

## Description

Use the Spambase dataset to classify spam. This data is already parsed down from email to features.

## Objectives

### Learning Objectives

After completing this assignment, you should understand:

* Simple Bayesian analysis
* The importance of separating training and test data

### Performance Objectives

After completing this assignment, you should be able to:

* Create a Bayesian classifier
* Train your classifier
* Test your classifier

## Details

### Deliverables

* A Git repo called spambase containing at least:
    * a `README.md` file explaining how to run your project
    * a `requirements.txt` file

### Requirements

* No PEP8 or Pyflakes warnings or errors

## Normal Mode

Go to the UCI Machine Learning repository and [download the Spambase dataset](https://archive.ics.uci.edu/ml/datasets/Spambase). Make sure you [read the documentation for the data](https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.DOCUMENTATION). This explains what the attributes are in the data file.

Subsample the data set so 60% is training data and 40% is test data. You can subsample however you like, including splitting the original file. Just make sure that you have a representative data set. (The original is about 60% not-spam and 40% spam.)

Then write code to classify the data into spam and not-spam, training with your training data and testing on your test data. Try multiple classifiers to see which gives you the highest success.
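
Since this commit contains only the project skeleton, here is a rough, non-authoritative sketch of the workflow the Normal Mode describes, assuming the unpinned packages from `requirements.txt`, a local copy of `spambase.data` in the repository root, and the documented layout of 57 feature columns followed by the class label:

```python
# Illustrative sketch only, not part of this commit.
# Assumes spambase.data sits in the repo root and that the last column is the
# class label (1 = spam, 0 = not-spam), as the UCI documentation describes.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.metrics import accuracy_score

# spambase.data is a headerless CSV: 57 feature columns, then the label.
data = pd.read_csv("spambase.data", header=None)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# 60% training / 40% test; stratifying keeps the roughly 60/40 not-spam/spam
# balance in both subsets, i.e. a "representative data set" as the README asks.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Try several Naive Bayes variants and compare their test accuracy.
for clf in (GaussianNB(), MultinomialNB(), BernoulliNB()):
    clf.fit(X_train, y_train)
    score = accuracy_score(y_test, clf.predict(X_test))
    print(f"{clf.__class__.__name__}: {score:.3f}")
```

Splitting with `stratify=y` is only one way to meet the 60/40 requirement; splitting the original file by hand, as the README allows, works just as well provided both halves keep the class balance.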

requirements.txt

+4
@@ -0,0 +1,4 @@
scikit-learn
pandas
numpy
matplotlib
