Commit 937867b

committed
add links
1 parent 0298c47 commit 937867b

1 file changed

+320
-0
lines changed

links.md

@@ -0,0 +1,320 @@
1+
Links
2+
=====
3+
4+
## Preface
5+
6+
[Data Science Venn Diagram](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)
7+
8+
## 1. Introduction
9+
10+
[OkCupid Questions](http://blog.okcupid.com/index.php/the-best-questions-for-first-dates/)
11+
12+
[Facebook on coordinated migration](https://www.facebook.com/notes/facebook-data-science/coordinated-migration/10151930946453859)
13+
14+
[Facebook on NFL fandom](https://www.facebook.com/notes/facebook-data-science/nfl-fans-on-facebook/10151298370823859)
15+
16+
[Target's predictive modeling](http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html)
17+
18+
[Making government more effective](http://www.marketplace.org/topics/tech/beyond-ad-clicks-using-big-data-social-good)
19+
20+
[Helping homelessness](http://dssg.io/2014/08/20/paths-homelessness.html)
21+
22+
[Improving public health](https://plus.google.com/communities/109572103057302114737)
23+
24+
## 2. A Crash Course in Python
25+
26+
http://python.org
27+
28+
[Anaconda](https://store.continuum.io/cshop/anaconda/)
29+
30+
[pip](https://pypi.python.org/pypi/pip)
31+
32+
[IPython](http://ipython.org/)
33+
34+
[the Zen of Python](http://legacy.python.org/dev/peps/pep-0020/)
35+
36+
[official Python tutorial](https://docs.python.org/2/tutorial/)
37+
38+
[official IPython tutorial](http://ipython.org/ipython-doc/2/interactive/tutorial.html)
39+
40+
[IPython videos and presentations](http://ipython.org/videos.html)
41+
42+
[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do)
43+
44+
## 3. Visualizing Data
45+
46+
[matplotlib](http://matplotlib.org/)
47+
48+
[seaborn](http://www.stanford.edu/~mwaskom/software/seaborn/)
49+
50+
[D3.js](http://d3js.org/)
51+
52+
[Bokeh](http://bokeh.pydata.org/)
53+
54+
[ggplot](https://pypi.python.org/pypi/ggplot)
55+
56+
## 4. Linear Algebra
57+
58+
[Linear Algebra, from UC Davis](https://www.math.ucdavis.edu/~linear/)
59+
60+
[Linear Algebra, from Saint Michael's College](http://joshua.smcvt.edu/linearalgebra/)
61+
62+
[Linear Algebra Done Wrong](http://www.math.brown.edu/~treil/papers/LADW/LADW.html)
63+
64+
[SciPy linear algebra module](http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html)
65+
66+
## 5. Statistics
67+
68+
[Non-obvious tricks for computing medians](http://en.wikipedia.org/wiki/Quickselect)
69+
70+
[Almost "average squared deviation from the mean"](http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation)
71+
72+
["angrily accused of experimenting on your users"](http://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html)
73+
74+
[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html)
75+
76+
[pandas](http://pandas.pydata.org/)
77+
78+
[StatsModels](http://statsmodels.sourceforge.net/)
79+
80+
[OpenIntro Statistics](https://www.openintro.org/stat/textbook.php)
81+
82+
[OpenStax Introductory Statistics](http://openstaxcollege.org/textbooks/introductory-statistics)
83+
84+
## 6. Probability
85+
86+
[the Monty Hall Problem](http://en.wikipedia.org/wiki/Monty_Hall_problem)
87+
88+
[error function](http://en.wikipedia.org/wiki/Error_function)
89+
90+
[binary search](http://en.wikipedia.org/wiki/Binary_search_algorithm)
91+
92+
[SciPy stats](http://docs.scipy.org/doc/scipy/reference/stats.html)
93+
94+
[Introduction to Probability](http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf)
95+
96+
## 7. Hypothesis and Inference
97+
98+
[continuity correction](http://en.wikipedia.org/wiki/Continuity_correction)
99+
100+
[P-hacking](http://www.nature.com/news/scientific-method-statistical-errors-1.14700)
101+
102+
["The Earth Is Round (p < .05)"](http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf)
103+
104+
[conjugate priors](http://www.johndcook.com/blog/conjugate_prior_diagram/)
105+
106+
[Coursera -- Data Analysis and Statistical Inference](https://www.coursera.org/course/statistics)
107+
108+
## 8. Gradient Descent
109+
110+
[Active Calculus](http://gvsu.edu/s/xr/)
111+
112+
[scikit-learn stochastic gradient descent](http://scikit-learn.org/stable/modules/sgd.html)
113+
114+
## 9. Getting Data
115+
116+
[running Python scripts without the Python command](http://stackoverflow.com/questions/15587877/run-a-python-script-in-terminal-without-the-python-command)
117+
118+
[opening csv files in binary mode](http://stackoverflow.com/questions/4249185/using-python-to-append-csv-files)
119+
120+
[BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/)
121+
122+
[requests](http://docs.python-requests.org/en/latest/)
123+
124+
[GitHub API](http://developer.github.com/v3/)
125+
126+
http://www.pythonapi.com/
127+
128+
http://www.pythonforbeginners.com/development/list-of-python-apis/
129+
130+
http://www.programmableweb.com/
131+
132+
[Twython](https://github.com/ryanmcgrath/twython)
133+
134+
https://apps.twitter.com/
135+
136+
[Twitter Search API](https://dev.twitter.com/docs/api/1.1/get/search/tweets)
137+
138+
[unicode](https://docs.python.org/2/howto/unicode.html)
139+
140+
[Twitter Streaming API](https://dev.twitter.com/docs/api/1.1/get/statuses/sample)
141+
142+
[scrapy](http://scrapy.org/)
143+
144+
[pandas](http://pandas.pydata.org/)
145+
146+
## 10. Working With Data
147+
148+
[pandas](http://pandas.pydata.org/)
149+
150+
[Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do)
151+
152+
[scikit-learn matrix decomposition](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)
153+
154+
## 11. Machine Learning
155+
156+
[prevalence of "Luke"](http://www.babycenter.com/babyNameAllPops.htm?babyNameId=2918)
157+
158+
[prevalence of leukemia](http://seer.cancer.gov/statfacts/html/leuks.html)
159+
160+
[harmonic mean](http://en.wikipedia.org/wiki/Harmonic_mean)
161+
162+
[Coursera -- Machine Learning](https://www.coursera.org/course/ml)
163+
164+
[Caltech -- Machine Learning](https://work.caltech.edu/telecourse.html)
165+
166+
[The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
167+
168+
## 12. Nearest Neighbors
169+
170+
[the length represented by a degree of longitude](http://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude)
171+
172+
[scikit-learn nearest neighbor models](http://scikit-learn.org/stable/modules/neighbors.html)
173+
174+
## 13. Naive Bayes
175+
176+
[SpamAssassin public corpus](https://spamassassin.apache.org/publiccorpus/)
177+
178+
[7-Zip](http://www.7-zip.org/)
179+
180+
[the Porter stemmer](http://tartarus.org/martin/PorterStemmer/)
181+
182+
["A Plan for Spam"](http://www.paulgraham.com/spam.html)
183+
184+
["Better Bayesian Filtering"](http://www.paulgraham.com/better.html)
185+
186+
[scikit-learn Naive Bayes](http://scikit-learn.org/stable/modules/naive_bayes.html)
187+
188+
## 14. Simple Linear Regression
189+
190+
## 15. Multiple Regression
191+
192+
[scikit-learn linear model](http://scikit-learn.org/stable/modules/linear_model.html)
193+
194+
[StatsModels](http://statsmodels.sourceforge.net/)
195+
196+
## 16. Logistic Regression
197+
198+
[scikit-learn logistic regression](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)
199+
200+
[scikit-learn support vector machines](http://scikit-learn.org/stable/modules/svm.html)
201+
202+
[libsvm](http://www.csie.ntu.edu.tw/~cjlin/libsvm/)
203+
204+
## 17. Decision Trees
205+
206+
[Twenty Questions](http://en.wikipedia.org/wiki/Twenty_Questions)
207+
208+
[scikit-learn decision trees](http://scikit-learn.org/stable/modules/tree.html)
209+
210+
[scikit-learn ensembles](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble)
211+
212+
http://en.wikipedia.org/wiki/Decision_tree_learning
213+
214+
## 18. Neural Networks
215+
216+
[Coursera -- Neural Networks for Machine Learning](https://www.coursera.org/course/neuralnets)
217+
218+
[Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)
219+
220+
[pybrain](http://pybrain.org/)
221+
222+
## 19. Clustering
223+
224+
[RGB color model](http://en.wikipedia.org/wiki/RGB_color_model)
225+
226+
[SciPy](http://www.scipy.org/)
227+
228+
## 20. Natural Language Processing
229+
230+
["What is Data Science"](http://radar.oreilly.com/2010/06/what-is-data-science.html)
231+
232+
[Natural Language Toolkit](http://www.nltk.org/)
233+
234+
[NLTK book](http://www.nltk.org/book/)
235+
236+
[gensim](http://radimrehurek.com/gensim/)
237+
238+
## 21. Network Analysis
239+
240+
[Centrality](http://en.wikipedia.org/wiki/Centrality)
241+
242+
[NetworkX](http://networkx.github.io/)
243+
244+
[Gephi](http://gephi.github.io/)
245+
246+
## 22. Recommender Systems
247+
248+
[Crab](http://muricoca.github.io/crab/)
249+
250+
[Graphlab recommender toolkit](http://graphlab.com/products/create/docs/graphlab.toolkits.recommender.html)
251+
252+
[Netflix prize](http://www.netflixprize.com/)
253+
254+
## 23. Databases
255+
256+
[SQLite](http://www.sqlite.org/)
257+
258+
[MySQL](http://www.mysql.com/)
259+
260+
[PostgreSQL](http://www.postgresql.org/)
261+
262+
[MongoDB](http://www.mongodb.org/)
263+
264+
[NoSQL](http://en.wikipedia.org/wiki/NoSQL)
265+
266+
## 24. Map-Reduce
267+
268+
[Hadoop](http://hadoop.apache.org/)
269+
270+
[Elastic MapReduce](http://aws.amazon.com/elasticmapreduce/)
271+
272+
[mrjob](https://github.com/Yelp/mrjob)
273+
274+
[Spark](http://spark.apache.org/)
275+
276+
[Storm](http://storm.incubator.apache.org/)
277+
278+
## 25. Go Forth And Do Data Science
279+
280+
[IPython](http://ipython.org/)
281+
282+
[NumPy](http://www.numpy.org/)
283+
284+
[pandas](http://pandas.pydata.org/)
285+
286+
[scikit-learn](http://scikit-learn.org/)
287+
288+
[many, many scikit-learn examples](http://scikit-learn.org/stable/auto_examples/)
289+
290+
[matplotlib examples](http://matplotlib.org/examples/)
291+
292+
[matplotlib gallery](http://matplotlib.org/gallery.html)
293+
294+
[seaborn](http://web.stanford.edu/~mwaskom/software/seaborn/)
295+
296+
[D3.js](http://d3js.org/)
297+
298+
[D3 gallery](https://github.com/mbostock/d3/wiki/Gallery)
299+
300+
[Bokeh](http://bokeh.pydata.org/)
301+
302+
[Data.gov](http://www.data.gov/)
303+
304+
[r/datasets](http://www.reddit.com/r/datasets) and [r/data](http://www.reddit.com/r/data)
305+
306+
[Amazon public data sets](http://aws.amazon.com/public-data-sets/)
307+
308+
[100 Interesting Data Sets](http://rs.io/100-interesting-data-sets-for-statistics/)
309+
310+
[Kaggle](https://www.kaggle.com/)
311+
312+
[Hacker News](https://news.ycombinator.com/news)
313+
314+
[Hacker News Story Classifier](https://github.com/joelgrus/hackernews)
315+
316+
[Seattle Real-Time 911](http://www2.seattle.gov/fire/realtime911/getRecsForDatePub.asp?action=Today)
317+
318+
[social network analysis of fire trucks](https://github.com/joelgrus/fire)
319+
320+
[machine learning on t-shirts](https://github.com/joelgrus/shirts)
