| 1 | +Links |
| 2 | +===== |
| 3 | + |
| 4 | +## Preface |
| 5 | + |
| 6 | +[Data Science Venn Diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) |
| 7 | + |
| 8 | +## 1. Introduction |
| 9 | + |
| 10 | +[OkCupid Questions](http://blog.okcupid.com/index.php/the-best-questions-for-first-dates/) |
| 11 | + |
| 12 | +[Facebook on coordinated migration](https://www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859) |
| 13 | + |
| 14 | +[Facebook on NFL fandom](https://www.facebook.com/notes/facebook-data-science/nfl-fans-on-facebook/10151298370823859) |
| 15 | + |
| 16 | +[Target's predictive modeling](http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html) |
| 17 | + |
| 18 | +[Making government more effective](http://www.marketplace.org/topics/tech/beyond-ad-clicks-using-big-data-social-good) |
| 19 | + |
| 20 | +[Helping homelessness](http://dssg.io/2014/08/20/paths-homelessness.html) |
| 21 | + |
| 22 | +[Improving public health](https://plus.google.com/communities/109572103057302114737) |
| 23 | + |
| 24 | +## 2. A Crash Course in Python |
| 25 | + |
| 26 | +http://python.org |
| 27 | + |
| 28 | +[Anaconda](https://store.continuum.io/cshop/anaconda/) |
| 29 | + |
| 30 | +[pip](https://pypi.python.org/pypi/pip) |
| 31 | + |
| 32 | +[IPython](http://ipython.org/) |
| 33 | + |
| 34 | +[the Zen of Python](http://legacy.python.org/dev/peps/pep-0020/) |
| 35 | + |
| 36 | +[official Python tutorial](https://docs.python.org/2/tutorial/) |
| 37 | + |
| 38 | +[official IPython tutorial](http://ipython.org/ipython-doc/2/interactive/tutorial.html) |
| 39 | + |
| 40 | +[IPython videos and presentations](http://ipython.org/videos.html) |
| 41 | + |
| 42 | +[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) |
| 43 | + |
| 44 | +## 3. Visualizing Data |
| 45 | + |
| 46 | +[matplotlib](http://matplotlib.org/) |
| 47 | + |
| 48 | +[seaborn](http://www.stanford.edu/~mwaskom/software/seaborn/) |
| 49 | + |
| 50 | +[D3.js](http://d3js.org/) |
| 51 | + |
| 52 | +[Bokeh](http://bokeh.pydata.org/) |
| 53 | + |
| 54 | +[ggplot](https://pypi.python.org/pypi/ggplot) |
| 55 | + |
| 56 | +## 4. Linear Algebra |
| 57 | + |
| 58 | +[Linear Algebra, from UC Davis](https://www.math.ucdavis.edu/~linear/) |
| 59 | + |
| 60 | +[Linear Algebra, from Saint Michael's College](http://joshua.smcvt.edu/linearalgebra/) |
| 61 | + |
| 62 | +[Linear Algebra Done Wrong](http://www.math.brown.edu/~treil/papers/LADW/LADW.html) |
| 63 | + |
| 64 | +[SciPy linear algebra module](http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html) |
| 65 | + |
| 66 | +## 5. Statistics |
| 67 | + |
| 68 | +[Non-obvious tricks for computing medians](http://en.wikipedia.org/wiki/Quickselect) |
| 69 | + |
| 70 | +[Almost "average squared deviation from the mean"](http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation) |
| 71 | + |
| 72 | +["angrily accused of experimenting on your users"](http://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html) |
| 73 | + |
| 74 | +[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html) |
| 75 | + |
| 76 | +[pandas](http://pandas.pydata.org/) |
| 77 | + |
| 78 | +[StatsModels](http://statsmodels.sourceforge.net/) |
| 79 | + |
| 80 | +[OpenIntro Statistics](https://www.openintro.org/stat/textbook.php) |
| 81 | + |
| 82 | +[OpenStax Introductory Statistics](http://openstaxcollege.org/textbooks/introductory-statistics) |
| 83 | + |
| 84 | +## 6. Probability |
| 85 | + |
| 86 | +[the Monty Hall Problem](http://en.wikipedia.org/wiki/Monty_Hall_problem) |
| 87 | + |
| 88 | +[error function](http://en.wikipedia.org/wiki/Error_function) |
| 89 | + |
| 90 | +[binary search](http://en.wikipedia.org/wiki/Binary_search_algorithm) |
| 91 | + |
| 92 | +[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html) |
| 93 | + |
| 94 | +[Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf) |
| 95 | + |
| 96 | +## 7. Hypothesis and Inference |
| 97 | + |
| 98 | +[continuity correction](http://en.wikipedia.org/wiki/Continuity_correction) |
| 99 | + |
| 100 | +[P-hacking](http://www.nature.com/news/scientific-method-statistical-errors-1.14700) |
| 101 | + |
| 102 | +["The Earth Is Round (p < .05)"](http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf) |
| 103 | + |
| 104 | +[conjugate priors](http://www.johndcook.com/blog/conjugate_prior_diagram/) |
| 105 | + |
| 106 | +[Coursera -- Data Analysis and Statistical Inference](https://www.coursera.org/course/statistics) |
| 107 | + |
| 108 | +## 8. Gradient Descent |
| 109 | + |
| 110 | +[Active Calculus](http://gvsu.edu/s/xr/) |
| 111 | + |
| 112 | +[scikit-learn stochastic gradient descent](http://scikit-learn.org/stable/modules/sgd.html) |
| 113 | + |
| 114 | +## 9. Getting Data |
| 115 | + |
| 116 | +[running Python scripts without the Python command](http://stackoverflow.com/questions/15587877/run-a-python-script-in-terminal-without-the-python-command) |
| 117 | + |
| 118 | +[opening csv files in binary mode](http://stackoverflow.com/questions/4249185/using-python-to-append-csv-files) |
| 119 | + |
| 120 | +[BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) |
| 121 | + |
| 122 | +[requests](http://docs.python-requests.org/en/latest/) |
| 123 | + |
| 124 | +[GitHub API](http://developer.github.com/v3/) |
| 125 | + |
| 126 | +http://www.pythonapi.com/ |
| 127 | + |
| 128 | +http://www.pythonforbeginners.com/development/list-of-python-apis/ |
| 129 | + |
| 130 | +http://www.programmableweb.com/ |
| 131 | + |
| 132 | +[Twython](https://github.com/ryanmcgrath/twython) |
| 133 | + |
| 134 | +https://apps.twitter.com/ |
| 135 | + |
| 136 | +[Twitter Search API](https://dev.twitter.com/docs/api/1.1/get/search/tweets) |
| 137 | + |
| 138 | +[unicode](https://docs.python.org/2/howto/unicode.html) |
| 139 | + |
| 140 | +[Twitter Streaming API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample) |
| 141 | + |
| 142 | +[scrapy](http://scrapy.org/) |
| 143 | + |
| 144 | +[pandas](http://pandas.pydata.org/) |
| 145 | + |
| 146 | +## 10. Working With Data |
| 147 | + |
| 148 | +[pandas](http://pandas.pydata.org/) |
| 149 | + |
| 150 | +[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) |
| 151 | + |
| 152 | +[scikit-learn matrix decomposition](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition) |
| 153 | + |
| 154 | +## 11. Machine Learning |
| 155 | + |
| 156 | +[prevalence of "Luke"](http://www.babycenter.com/babyNameAllPops.htm?babyNameId=2918) |
| 157 | + |
| 158 | +[prevalence of leukemia](http://seer.cancer.gov/statfacts/html/leuks.html) |
| 159 | + |
| 160 | +[harmonic mean](http://en.wikipedia.org/wiki/Harmonic_mean) |
| 161 | + |
| 162 | +[Coursera -- Machine Learning](https://www.coursera.org/course/ml) |
| 163 | + |
| 164 | +[Caltech -- Machine Learning](https://work.caltech.edu/telecourse.html) |
| 165 | + |
| 166 | +[The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/) |
| 167 | + |
| 168 | +## 12. Nearest Neighbors |
| 169 | + |
| 170 | +[the length represented by a degree of longitude](http://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude) |
| 171 | + |
| 172 | +[scikit-learn nearest neighbor models](http://scikit-learn.org/stable/modules/neighbors.html) |
| 173 | + |
| 174 | +## 13. Naive Bayes |
| 175 | + |
| 176 | +[SpamAssassin public corpus](https://spamassassin.apache.org/publiccorpus/) |
| 177 | + |
| 178 | +[7-Zip](http://www.7-zip.org/) |
| 179 | + |
| 180 | +[the Porter stemmer](http://tartarus.org/martin/PorterStemmer/) |
| 181 | + |
| 182 | +["A Plan for Spam"](http://www.paulgraham.com/spam.html) |
| 183 | + |
| 184 | +["Better Bayesian Filtering"](http://www.paulgraham.com/better.html) |
| 185 | + |
| 186 | +[scikit-learn Naive Bayes](http://scikit-learn.org/stable/modules/naive_bayes.html) |
| 187 | + |
| 188 | +## 14. Simple Linear Regression |
| 189 | + |
| 190 | +## 15. Multiple Regression |
| 191 | + |
| 192 | +[scikit-learn linear model](http://scikit-learn.org/stable/modules/linear_model.html) |
| 193 | + |
| 194 | +[StatsModels](http://statsmodels.sourceforge.net/) |
| 195 | + |
| 196 | +## 16. Logistic Regression |
| 197 | + |
| 198 | +[scikit-learn logistic regression](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) |
| 199 | + |
| 200 | +[scikit-learn support vector machines](http://scikit-learn.org/stable/modules/svm.html) |
| 201 | + |
| 202 | +[libsvm](http://www.csie.ntu.edu.tw/~cjlin/libsvm/) |
| 203 | + |
| 204 | +## 17. Decision Trees |
| 205 | + |
| 206 | +[Twenty Questions](http://en.wikipedia.org/wiki/Twenty_Questions) |
| 207 | + |
| 208 | +[scikit-learn decision trees](http://scikit-learn.org/stable/modules/tree.html) |
| 209 | + |
| 210 | +[scikit-learn ensembles](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble) |
| 211 | + |
| 212 | +http://en.wikipedia.org/wiki/Decision_tree_learning |
| 213 | + |
| 214 | +## 18. Neural Networks |
| 215 | + |
| 216 | +[Coursera -- Neural Networks for Machine Learning](https://www.coursera.org/course/neuralnets) |
| 217 | + |
| 218 | +[Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) |
| 219 | + |
| 220 | +[pybrain](http://pybrain.org/) |
| 221 | + |
| 222 | +## 19. Clustering |
| 223 | + |
| 224 | +[RGB color model](http://en.wikipedia.org/wiki/RGB_color_model) |
| 225 | + |
| 226 | +[SciPy](http://www.scipy.org/) |
| 227 | + |
| 228 | +## 20. Natural Language Processing |
| 229 | + |
| 230 | +["What is Data Science"](http://radar.oreilly.com/2010/06/what-is-data-science.html) |
| 231 | + |
| 232 | +[Natural Language Toolkit](http://www.nltk.org/) |
| 233 | + |
| 234 | +[NLTK book](http://www.nltk.org/book/) |
| 235 | + |
| 236 | +[gensim](http://radimrehurek.com/gensim/) |
| 237 | + |
| 238 | +## 21. Network Analysis |
| 239 | + |
| 240 | +[Centrality](http://en.wikipedia.org/wiki/Centrality) |
| 241 | + |
| 242 | +[NetworkX](http://networkx.github.io/) |
| 243 | + |
| 244 | +[Gephi](http://gephi.github.io/) |
| 245 | + |
| 246 | +## 22. Recommender Systems |
| 247 | + |
| 248 | +[Crab](http://muricoca.github.io/crab/) |
| 249 | + |
| 250 | +[GraphLab recommender toolkit](http://graphlab.com/products/create/docs/graphlab.toolkits.recommender.html) |
| 251 | + |
| 252 | +[Netflix prize](http://www.netflixprize.com/) |
| 253 | + |
| 254 | +## 23. Databases |
| 255 | + |
| 256 | +[SQLite](http://www.sqlite.org/) |
| 257 | + |
| 258 | +[MySQL](http://www.mysql.com/) |
| 259 | + |
| 260 | +[PostgreSQL](http://www.postgresql.org/) |
| 261 | + |
| 262 | +[MongoDB](http://www.mongodb.org/) |
| 263 | + |
| 264 | +[NoSQL](http://en.wikipedia.org/wiki/NoSQL) |
| 265 | + |
| 266 | +## 24. Map-Reduce |
| 267 | + |
| 268 | +[Hadoop](http://hadoop.apache.org/) |
| 269 | + |
| 270 | +[Elastic MapReduce](http://aws.amazon.com/elasticmapreduce/) |
| 271 | + |
| 272 | +[mrjob](https://github.com/Yelp/mrjob) |
| 273 | + |
| 274 | +[Spark](http://spark.apache.org/) |
| 275 | + |
| 276 | +[Storm](http://storm.incubator.apache.org/) |
| 277 | + |
| 278 | +## 25. Go Forth And Do Data Science |
| 279 | + |
| 280 | +[IPython](http://ipython.org/) |
| 281 | + |
| 282 | +[NumPy](http://www.numpy.org/) |
| 283 | + |
| 284 | +[pandas](http://pandas.pydata.org/) |
| 285 | + |
| 286 | +[scikit-learn](http://scikit-learn.org/) |
| 287 | + |
| 288 | +[many, many scikit-learn examples](http://scikit-learn.org/stable/auto_examples/) |
| 289 | + |
| 290 | +[matplotlib examples](http://matplotlib.org/examples/) |
| 291 | + |
| 292 | +[matplotlib gallery](http://matplotlib.org/gallery.html) |
| 293 | + |
| 294 | +[seaborn](http://web.stanford.edu/~mwaskom/software/seaborn/) |
| 295 | + |
| 296 | +[D3.js](http://d3js.org/) |
| 297 | + |
| 298 | +[D3 gallery](https://github.com/mbostock/d3/wiki/Gallery) |
| 299 | + |
| 300 | +[Bokeh](http://bokeh.pydata.org/) |
| 301 | + |
| 302 | +[Data.gov](http://www.data.gov/) |
| 303 | + |
| 304 | +[r/datasets](http://www.reddit.com/r/datasets) and [r/data](http://www.reddit.com/r/data) |
| 305 | + |
| 306 | +[Amazon public data sets](http://aws.amazon.com/public-data-sets/) |
| 307 | + |
| 308 | +[100 Interesting Data Sets](http://rs.io/100-interesting-data-sets-for-statistics/) |
| 309 | + |
| 310 | +[Kaggle](https://www.kaggle.com/) |
| 311 | + |
| 312 | +[Hacker News](https://news.ycombinator.com/news) |
| 313 | + |
| 314 | +[Hacker News Story Classifier](https://github.com/joelgrus/hackernews) |
| 315 | + |
| 316 | +[Seattle Real-Time 911](http://www2.seattle.gov/fire/realtime911/getRecsForDatePub.asp?action=Today) |
| 317 | + |
| 318 | +[social network analysis of fire trucks](https://github.com/joelgrus/fire) |
| 319 | + |
| 320 | +[machine learning on t-shirts](https://github.com/joelgrus/shirts) |