clean up AI library, consider what should move to core library
Big Ideas that should weave throughout the lessons: (organize data, generalize to a model, inference)
Introduction
Introduction to AI
More exciting intro, using real examples (teachable machines, spotify, etc)
AI Effect
Use teachable machines, spotify, etc for engaging hook
Remove the more technical stuff from the workbook pages, and just have students play with spotify, google image search, self-driving car game, etc and have them think about how they might work. Only use the animals table to introduce pyret.
emphasize that even though it's "just an algorithm", it represents a HUGE and USEFUL step forward. Make it clear that it's still awesome, but that it is NOT thinking
LOs: (1) the line between "thinking" and "just following an algorithm" isn't clear; (2) AI is a misnomer - it's really ML, which is a set of algorithms; (3) there are ML algorithms for different tasks (prediction, similarity, language)
Make sure important parts of the old one are preserved
Language v. prediction wording is unclear: language is too verbal-centric. Use "Generation" instead!
Simple Datatypes
Add to pathway
Contracts for Tables
Add to pathway
Contracts for Data Visualizations
Add to pathway
Data Driven Algorithms
Rewrite library to work with insertions and deletions, and be fast enough to handle extremely large dictionaries with high performance (pre-computed bk-trees!)
use dictionary results as an excuse to practice table manipulation, and introduce row accessors
the API should randomize results, to motivate students to sort and take first-n-rows
first-spell-checker -- maybe extra lines for the non-exhaustive steps? Or at least ask them whether there are others?
Make sure important parts of the old one are preserved
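For reference while rewriting the library, here is a minimal Python sketch of the BK-tree idea mentioned above. It is not the library's implementation: the names `edit_distance`, `BKTree`, `insert`, and `search` are hypothetical, the word list is made up, and deletion is left out. Results come back as (distance, word) pairs, which is the kind of table students would then sort and trim.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


class BKTree:
    """Each node is (word, children), where children maps distance -> node."""

    def __init__(self):
        self.root = None

    def insert(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            node_word, children = stack.pop()
            d = edit_distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: only these subtrees can hold matches.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)


# Hypothetical dictionary, for illustration only.
tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.insert(w)
print(tree.search("bok", 1))
```

Pre-computing the tree once and reusing it for every query is what would make large dictionaries practical; handling deletions would need extra bookkeeping that this sketch does not attempt.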
Prediction
Dot Plots
Add to pathway
Building Decision Trees
audit to make sure we're using dot-plot vocabulary and language
LOs: (1) identifying which of a set of categorical answers is called classification; (2) classification involves if-then-else questions, which can be visualized as a decision tree; (3) efficient classification requires a decision tree with nodes that split the data into neat, balanced groups; (4) choosing these nodes requires an understanding of the shape of the data;
Use "lossy" in the context of a DT -- the algorithm throws away data that is "less important" to the overall structure
Make sure, before students do "the algorithm", that we are explicit about classifiers being wrong!
From Joe: Need to make it clear that tail-classifier is just an example, and will make plenty of mistakes!
Make sure important parts of the old DT lesson preserved
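One possible way to make "neat groups" concrete, sketched in Python. This assumes Gini impurity as the measure, which is a common choice but not necessarily what the lesson will use. The function names and the tiny animal rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """0.0 when every label in the group matches; higher when mixed."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_impurity(rows, question):
    """Size-weighted impurity of the yes/no groups a question produces."""
    yes = [label for features, label in rows if question(features)]
    no = [label for features, label in rows if not question(features)]
    total = len(rows)
    return len(yes) / total * gini(yes) + len(no) / total * gini(no)


# Hypothetical rows: (features, label).
rows = [
    ({"legs": 4}, "cat"),
    ({"legs": 4}, "dog"),
    ({"legs": 2}, "bird"),
    ({"legs": 2}, "bird"),
]
print(split_impurity(rows, lambda f: f["legs"] == 2))
```

A lower score means the question sorts the data into purer groups. Comparing scores for several candidate questions is one way students could justify which node to put first.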
Evaluating Decision Trees
emphasize value of domain knowledge
LOs: (1) The quality of a decision tree can be expressed as a confusion matrix; (2) computing decision trees is an example of a supervised ML algorithm; (3) "over-fitting" is possible when blindly using the algorithm without understanding the data, and says something about how well the training data represents the testing population
make sure we get into ethics!
discuss inference power and sample size
unsupervised analog, k-means, k-median, and k-mode clustering
Make sure important parts of the old DT lesson preserved
fix genre-img to add k-pop and reggae
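A small Python sketch of the confusion-matrix idea from the LOs above, assuming the matrix is stored as counts of (actual, predicted) pairs. The function names and labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of rows where the prediction matched the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total


# Hypothetical test data.
actual = ["cat", "cat", "dog", "dog", "dog", "cat"]
predicted = ["cat", "dog", "dog", "dog", "cat", "cat"]

matrix = confusion_matrix(actual, predicted)
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {n}")
print("accuracy:", accuracy(matrix))
```

Running the same tree on the training rows and on held-out rows, then comparing the two matrices, is one way to show over-fitting in class.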
Fitting Models
Add to pathway
Simple Regression
Check to see what's in the original linear-regression lesson that should be included
meta-goal: make this an eventual replacement for existing linear-regression lesson
LOs: (1) line-of-best-fit is the result of regression - a supervised ML algorithm; (2) regression throws away some variation in the data ("lossy") that is less important to the overall structure; (3) "slope" is the amount of variation in the output that is explained by an input;
discuss ethics
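For reference, a minimal Python sketch of least-squares simple regression. It reports the slope (change in predicted output per one-unit change in input) alongside R-squared (the share of variation in the output that the line accounts for), since the two are easy to conflate. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variation in ys that the fitted line accounts for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

The residuals (the gaps between each point and the line) are the variation the model throws away, which connects to the "lossy" framing in the LOs.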
Multiple Regression
LOs: (1) regression works in multiple dimensions
API: remove fit-row-model
multiple regression with cars
Add cars workbook page where students build multiple models, using different combinations of input columns. Start with prediction ("which column do you think is the most important?"), then use lr-plot to see it, mr-coeffs to get and interpret the coefficients. then mr-code to get the function. Then have them drive the car, see the result, and try the next model.
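A rough Python sketch of what fitting with several input columns involves, assuming ordinary least squares solved through the normal equations. This is not how `mr-coeffs` or `mr-code` work internally (that is not specified here); `solve`, `fit_multiple`, and the data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """Return [intercept, coeff_1, coeff_2, ...] minimizing squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]
coeffs = fit_multiple(rows, ys)
print(coeffs)
```

Each coefficient is read the same way as a slope in the single-input case, holding the other inputs fixed, which is the interpretation students would practice on the cars page.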
Measuring Similarity
LOs: (1) members of virtually any category can be represented by quantifiable attributes; (2) these attributes define dimensions in space, with members as points in "category space"; (3) similarity between members can be defined in many ways;
"what numbers can we compute from this data?" (Butts-out-of-seats activity, teams at posters: Write a category of thing at the top of the poster (e.g. - Mexican restaurants, cars, sci-fi); Brainstorm as many properties as you can, for which you can measure a member of that category (e.g. - % of mexicans who eat there, avg yelp rating, avg scoville scale for menu items); Choose the two most important properties, draw the 1st quadrant, and label each axis with one of them.; Plot at least 3 members on your graph (Chipotle, Qdoba, Taco Bell…))
Introduce coordinate representation
starter-file: try simple similarity, distance-similarity, then angle similarity (offer cosine-similarity for HS)
angle-difference. make a second page that has relationships other than 0, 45 and 90 and either gives ranges for them to choose from or suggest protractor use.
Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
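A short Python sketch of the three measures named for the starter file (distance, angle, cosine), treating each member as a tuple of coordinates. The function names are hypothetical and may not match what the starter file ends up exposing.

```python
import math


def distance(a, b):
    """Straight-line (Euclidean) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """1.0 for the same direction, 0.0 for perpendicular directions."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)


def angle_degrees(a, b):
    """Angle between two points, measured from the origin."""
    # Clamp to [-1, 1] so rounding error cannot break acos.
    cos = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos))


print(distance((0, 0), (3, 4)))
print(angle_degrees((1, 0), (1, 1)))
print(angle_degrees((1, 0), (0, 1)))
```

Distance cares about how far apart two members are, while angle and cosine only care about direction, so two members with the same proportions but different magnitudes come out as identical under the angle measure.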
Unsupervised and Supervised Similarity
recommendation systems
Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
LOs: (1) when points are given human-curated labels, we can measure things like affinity, classification, etc; (2) centroids can be used to group points in space, representing "liked members", "disliked members", etc; (3) similarity measures are a way for machines to take small amounts of human-tagging and generalize them
tagging, image search
Open the spotify dataset and use it!
Switch to images: start on paper, with students finding "most similar" and "most different", then explaining their thinking
Introduce luminance, entropy, symmetry, and dominant color
starter-file: try it out!
ethics!
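To illustrate LO (2) above, a minimal Python sketch of using centroids to generalize from a few human-tagged points. The "liked" and "disliked" points, the labels, and the function names are all made up; `math.dist` needs Python 3.8 or later.

```python
import math


def centroid(points):
    """Average position of a group of points, one coordinate at a time."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest_label(point, centroids):
    """Label of whichever centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Hypothetical human-tagged members, as 2-D points.
liked = [(1, 1), (2, 2), (3, 3)]
disliked = [(8, 8), (10, 10)]

centroids = {"liked": centroid(liked), "disliked": centroid(disliked)}
print(centroids)
print(nearest_label((3, 4), centroids))
```

A handful of tags is enough to place a new, untagged member, which is the generalization step described in LO (3).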
Modeling Language
LOs: (1) language is messy! and, ug, other stuff...
plagiarism detection, dealing with language
Show histograms of the ngrams.
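A small Python sketch of counting word n-grams and showing them as a text histogram, assuming simple lowercase, whitespace-separated tokens. The function name and sample sentence are made up.

```python
from collections import Counter


def ngrams(text, n):
    """All runs of n consecutive words in the text, as tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical sample text.
text = "the cat sat on the mat and the cat slept"
counts = Counter(ngrams(text, 2))

# Text histogram: one '#' per occurrence, most frequent first.
for gram, n in counts.most_common():
    print(f"{' '.join(gram):<10} {'#' * n}")
```

Comparing the histograms of two documents side by side is one way to motivate the plagiarism-detection discussion.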
Statistical Language Models
swap out documents in soekia fairytale collection for a better-written, more representative set - confirm sufficient overlap
Whole-Curriculum
Introduction
Introduction to AI
Simple Datatypes
Contracts for Tables
Contracts for Data Visualizations
Data Driven Algorithms
Prediction
Dot Plots
Building Decision Trees
Evaluating Decision Trees
Fitting Models
Simple Regression
Multiple Regression
Measuring Similarity
Unsupervised and Supervised Similarity
Modeling Language
Statistical Language Models