AI Rewrite

## Whole-Curriculum

- [x] add notes pages
- [ ] vocab to add (attribute, document)? 
- [ ] translate vocabulary
- [x] rewrite/reassign learning objectives
- [ ] clean up AI library, consider what should move to core library
- [ ] Big Ideas that should weave throughout the lessons: (organize data, generalize to a model, inference)

## Introduction

### Introduction to AI

- [x] More exciting intro, using real examples (Teachable Machine, Spotify, etc.)
- [x] AI Effect
- [x] Use Teachable Machine, Spotify, etc. for engaging hook
- [x] Remove the more technical stuff from the workbook pages, and just have students play with Spotify, Google image search, the self-driving car game, etc. and have them think about how they might work. Only use the animals table to introduce Pyret.
- [x] emphasize that even though it's "just an algorithm", it represents a HUGE and USEFUL step forward. Make it clear that it's still awesome, but that it is NOT thinking
- [x] LOs: (1) the line between "thinking" and "just following an algorithm" isn't clear; (2) AI is a misnomer - it's really ML, which is a set of algorithms; (3) there are ML algorithms for different tasks (prediction, similarity, language)
- [x] Make sure important parts of the old one are preserved
- [x] Language v. prediction wording is unclear: language is too verbal-centric. Use "Generation" instead!

### Simple Datatypes

- [x] Add to pathway

### Contracts for Tables

- [x] Add to pathway

### Contracts for Data Visualizations

- [x] Add to pathway

### Data Driven Algorithms

- [x] Rewrite library to work with insertions and deletions, and be fast enough to handle extremely large dictionaries with high performance (pre-computed bk-trees!)
- [x] use dictionary results as an excuse to practice table manipulation, and introduce row accessors
- [x] the API should randomize results, to motivate students to sort and take first-n-rows
- [x] first-spell-checker -- maybe extra lines for the non-exhaustive steps? Or at least ask them whether there are others?
- [x] Make sure important parts of the old one are preserved
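
For reference while reviewing the library item above, here is a minimal sketch of how a BK-tree spell checker can work. It is written in Python rather than Pyret, and every name in it (`edit_distance`, `BKTree`, `insert`, `search`) is hypothetical and not taken from the actual library. It assumes Levenshtein distance (insertions, deletions, and substitutions) as the metric.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


class BKTree:
    """Hypothetical BK-tree: each node is (word, {distance: child node})."""

    def __init__(self):
        self.root = None

    def insert(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        """Return (word, distance) pairs within max_dist of query."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            word, children = stack.pop()
            d = edit_distance(query, word)
            if d <= max_dist:
                results.append((word, d))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return results


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.insert(w)
matches = tree.search("bok", 1)
```

The search returns matches in no particular order, which fits the item above about motivating students to sort the results and take the first n rows.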

## Prediction 

### Dot Plots

- [x] Add to pathway

### Building Decision Trees

- [x] audit to make sure we're using dot-plot vocabulary and language
- [x] LOs: (1) identifying which of a set of categorical answers is called classification; (2) classification involves if-then-else questions, which can be visualized as a decision tree; (3) efficient classification requires a decision tree with nodes that split the data into neat, balanced groups; (4) choosing these nodes requires an understanding of the shape of the data
- [x] Use "lossy" in the context of a DT -- the algorithm throws away data that is "less important" to the overall structure
- [x] Make sure, before students do "the algorithm", that we are explicit about classifiers being wrong!
- [x] From Joe: Need to make it clear that `tail-classifier` is just an example, and will make plenty of mistakes!
- [x] Make sure important parts of the old DT lesson preserved

### Evaluating Decision Trees

- [x] emphasize value of domain knowledge
- [x] LOs: (1) The quality of a decision tree can be expressed as a confusion matrix; (2) computing decision trees is an example of a supervised ML algorithm; (3) "over-fitting" is possible when blindly using the algorithm without understanding the data, and says something about how well the training data represents the testing population
- [x] make sure we get into ethics!
- [x] discuss inference power and sample size
- [ ] unsupervised analog, k-means, k-median, and k-mode clustering
- [x] Make sure important parts of the old DT lesson preserved
- [x] fix `genre-img` to add k-pop and reggae
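
As a reference for the confusion-matrix learning objective above, the following is a small Python sketch (not Pyret, and not the curriculum's API). The function names and the example labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Fraction of rows where the predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


# Hypothetical labels, invented for this example.
actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(actual, predicted)
score = accuracy(matrix)
```

Comparing `score` on the training rows against `score` on held-out rows is one simple way to make over-fitting visible.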

### Fitting Models

- [x] Add to pathway

### Simple Regression 
- [ ] Check to see what's in the original linear-regression lesson that should be included
- [ ] meta-goal: make this an eventual replacement for existing linear-regression lesson
- [x] LOs: (1) line-of-best-fit is the result of regression - a supervised ML algorithm; (2) regression throws away some variation in the data ("lossy") that is less important to the overall structure; (3) "slope" is the amount of variation in the output that is explained by an input
- [x] discuss ethics
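
For reviewers who want the arithmetic behind the line of best fit, here is a minimal ordinary-least-squares sketch in Python. It is not the curriculum's Pyret code; `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```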

### Multiple Regression
- [x] LOs: (1) regression works in multiple dimensions
- [x] API: remove `fit-row-model`
- [x] multiple regression with cars
- [x] Add cars workbook page where students build multiple models, using different combinations of input columns. Start with prediction ("which column do you think is the most important?"), then use `lr-plot` to *see* it and `mr-coeffs` to get and interpret the coefficients, then `mr-code` to get the function. Then have them drive the car, see the result, and try the next model.
- [x] discuss inference power and sample size
- [ ] Reinforcement learning: stock trading, weather prediction
- [ ] PCA as unsupervised analog? 

## Measuring Similarity
- [x] LOs: (1) members of virtually any category can be represented by quantifiable attributes; (2) these attributes define dimensions in space, with members as points in "category space"; (3) similarity between members can be defined in many ways
- [x] "what numbers can we compute from this data?" (Butts-out-of-seats activity, teams at posters):
  - Write a category of thing at the top of the poster (e.g. Mexican restaurants, cars, sci-fi)
  - Brainstorm as many properties as you can, for which you can measure a member of that category (e.g. % of Mexicans who eat there, avg Yelp rating, avg Scoville scale for menu items)
  - Choose the two most important properties, draw the 1st quadrant, and label each axis with one of them
  - Plot at least 3 members on your graph (Chipotle, Qdoba, Taco Bell…)
- [x] Introduce coordinate representation
- [x] starter-file: try simple similarity, distance-similarity, then angle similarity (offer cosine-similarity for HS)
- [x] angle-difference: make a second page that has relationships other than 0, 45 and 90, and either gives ranges for them to choose from or suggests protractor use.
- [ ] Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
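
The three similarity measures named in the starter-file item above (distance, angle, cosine) can be sketched as follows. This is an illustrative Python version, not the Pyret starter file, and the function names are assumptions.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_similarity(p, q):
    """Cosine of the angle between p and q, treated as vectors from the origin."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_p * norm_q)


def angle_degrees(p, q):
    """Angle between p and q in degrees, clamped to avoid rounding errors."""
    c = max(-1.0, min(1.0, cosine_similarity(p, q)))
    return math.degrees(math.acos(c))
```

Points such as `(1, 0)` and `(1, 1)` give a 45 degree angle, and `(1, 0)` and `(0, 1)` give 90 degrees, matching the easy cases mentioned in the angle-difference item.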

### Unsupervised and Supervised Similarity
- [x] recommendation systems
- [ ] Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
- [x] LOs: (1) when points are given human-curated labels, we can measure things like affinity, classification, etc; (2) centroids can be used to group points in space, representing "liked members", "disliked members", etc; (3) similarity measures are a way for machines to take small amounts of human-tagging and generalize them
- [x] tagging, image search 
- [x] Open the spotify dataset and use it!
- [x] Switch to images: start on paper, with students finding "most similar" and "most different", then explaining their thinking
- [x] Introduce luminance, entropy, symmetry, and dominant color
- [x] starter-file: try it out!
- [x] ethics!

### Modeling Language
- [x] LOs: (1) language is messy! and, ug, other stuff...
- [x] plagiarism detection, dealing with language 
- [ ] Show histograms of the ngrams.

### Statistical Language Models
- [ ] swap out documents in soekia fairytale collection for a better-written, more representative set; confirm sufficient overlap
- [ ] Discuss [hallucinations](https://www.scientificamerican.com/article/chatbot-hallucinations-inevitable/)
- [ ] **Training: Lang in practice** 
- [x] n-gram generator / API?
- [ ] More activities like matching-human-judgment.html
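
To make the n-gram generator item concrete, here is a minimal sketch of one way such a generator could work. It is in Python, not Pyret, and `build_ngram_model` and `generate` are hypothetical names rather than the curriculum's API.

```python
import random
from collections import defaultdict


def build_ngram_model(text, n=2):
    """Map each (n-1)-word context to the list of words that follow it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context].append(words[i + n - 1])
    return model


def generate(model, start, length, seed=None):
    """Extend the start context by up to `length` words sampled from the model."""
    rng = random.Random(seed)
    out = list(start)
    k = len(start)
    for _ in range(length):
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break  # this context never appeared in the training text
        out.append(rng.choice(followers))
    return " ".join(out)


model = build_ngram_model("the cat sat on the mat", n=2)
sentence = generate(model, ("cat",), 3)
```

Because followers are stored with repeats, `rng.choice` picks each next word in proportion to how often it followed that context in the training text.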

AI Rewrite #2771