
AI Rewrite #2771

Description

@schanzer

Whole-Curriculum

  • add notes pages
  • vocab to add (attribute, document)?
  • translate vocabulary
  • rewrite/reassign learning objectives
  • clean up AI library, consider what should move to core library
  • Big Ideas that should be woven throughout the lessons: organizing data, generalizing to a model, and inference

Introduction

Introduction to AI

  • More exciting intro, using real examples (Teachable Machine, Spotify, etc.)
  • AI Effect
  • Use Teachable Machine, Spotify, etc. for an engaging hook
  • Remove the more technical material from the workbook pages; just have students play with Spotify, Google Image Search, a self-driving car game, etc., and have them think about how these might work. Only use the animals table to introduce Pyret.
  • emphasize that even though it's "just an algorithm", it represents a HUGE and USEFUL step forward. Make it clear that it's still awesome, but that it is NOT thinking
  • LOs: (1) the line between "thinking" and "just following an algorithm" isn't clear; (2) AI is a misnomer: it's really ML, which is a set of algorithms; (3) there are ML algorithms for different tasks (prediction, similarity, language)
  • Make sure important parts of the old one are preserved
  • Language v. prediction wording is unclear: language is too verbal-centric. Use "Generation" instead!

Simple Datatypes

  • Add to pathway

Contracts for Tables

  • Add to pathway

Contracts for Data Visualizations

  • Add to pathway

Data Driven Algorithms

  • Rewrite the library to support insertions and deletions, and to be fast enough for extremely large dictionaries (pre-computed BK-trees!)
  • use dictionary results as an excuse to practice table manipulation, and introduce row accessors
  • the API should randomize results to motivate students to sort and take first-n-rows
  • first-spell-checker -- maybe extra lines for the non-exhaustive steps? Or at least ask them whether there are others?
  • Make sure important parts of the old one are preserved
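For the library rewrite, a BK-tree is one way to make fuzzy dictionary lookup fast: every word hangs off its parent by an edge labeled with their edit distance, and the triangle inequality lets a search skip whole subtrees. The sketch below assumes Levenshtein edit distance as the metric; the `BKTree` class and its methods are hypothetical names, not the AI library's API. It covers insertion and lookup only; deletion (which the first bullet asks for) is usually handled by marking words as removed or by rebuilding, and is left out here.

```python
from typing import Dict, List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions, or substitutions
    needed to turn a into b (classic dynamic-programming version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


class _Node:
    def __init__(self, word: str) -> None:
        self.word = word
        # Each child is keyed by its edit distance from this node's word.
        self.children: Dict[int, "_Node"] = {}


class BKTree:
    """Hypothetical BK-tree over Levenshtein distance (insert + search only)."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = _Node(word)
            return
        node = self.root
        while True:
            d = levenshtein(word, node.word)
            if d == 0:
                return  # word is already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = _Node(word)
                return
            node = child

    def search(self, word: str, max_dist: int) -> List[Tuple[int, str]]:
        """All stored words within max_dist edits of word, nearest first."""
        results: List[Tuple[int, str]] = []
        stack: List[_Node] = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = levenshtein(word, node.word)
            if d <= max_dist:
                results.append((d, node.word))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for edge, child in node.children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["cat", "cart", "hat", "bat", "dog", "cast"]:
    tree.insert(w)
```

A search for "caat" within one edit returns cart, cast, and cat, nearest first. A student-facing version of the API could shuffle that list before returning it, so that sorting and taking first-n-rows becomes the students' job.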

Prediction

Dot Plots

  • Add to pathway

Building Decision Trees

  • audit to make sure we're using dot-plot vocabulary and language
  • LOs: (1) choosing one answer from a set of categorical answers is called classification; (2) classification involves if-then-else questions, which can be visualized as a decision tree; (3) efficient classification requires a decision tree with nodes that split the data into neat, balanced groups; (4) choosing these nodes requires an understanding of the shape of the data
  • Use "lossy" in the context of a DT -- the algorithm throws away data that is "less important" to the overall structure
  • Before students do "the algorithm", make sure we are explicit that classifiers can be wrong!
  • From Joe: Need to make it clear that tail-classifier is just an example, and will make plenty of mistakes!
  • Make sure important parts of the old DT lesson are preserved
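The "if-then-else questions" framing can be shown directly in code. In this hypothetical tree (the `Animal` fields, thresholds, and rows are made up for illustration, not taken from the animals dataset or from tail-classifier), each `if` is one node of the tree, and checking it against labeled rows shows it being wrong on some of them.

```python
from typing import List, NamedTuple


class Animal(NamedTuple):
    name: str
    species: str   # the true label we are trying to predict
    legs: int
    pounds: float


def classify(a: Animal) -> str:
    """A hand-built decision tree: each question is one node."""
    if a.legs == 0:
        return "snake"
    elif a.legs == 2:
        return "bird"
    elif a.pounds < 20:
        return "cat"
    else:
        return "dog"


def accuracy(rows: List[Animal]) -> float:
    """Fraction of rows where the tree's guess matches the true label."""
    correct = sum(1 for r in rows if classify(r) == r.species)
    return correct / len(rows)


rows = [
    Animal("Felix", "cat", 4, 9),
    Animal("Rex", "dog", 4, 70),
    Animal("Tiny", "dog", 4, 8),   # a small dog: the tree will get this wrong
    Animal("Polly", "bird", 2, 1),
    Animal("Slinky", "snake", 0, 3),
]
```

The small dog "Tiny" lands in the `cat` leaf, so the tree scores 4 out of 5: a concrete example for the point that classifiers make mistakes.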

Evaluating Decision Trees

  • emphasize value of domain knowledge
  • LOs: (1) The quality of a decision tree can be expressed as a confusion matrix; (2) computing decision trees is an example of a supervised ML algorithm; (3) "over-fitting" is possible when blindly using the algorithm without understanding the data, and says something about how well the training data represents the testing population
  • make sure we get into ethics!
  • discuss inference power and sample size
  • unsupervised analog: k-means, k-median, and k-mode clustering
  • Make sure important parts of the old DT lesson are preserved
  • fix genre-img to add K-pop and reggae
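A confusion matrix is just a count of (actual, predicted) pairs. A minimal sketch, with rows as true labels and columns as predictions (a common convention, though some tools transpose it); the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def confusion_matrix(actual: List[str], predicted: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Rows are true labels, columns are the classifier's guesses.
    Diagonal cells count correct answers; everything else is a mistake."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Correct answers (the diagonal) divided by all answers."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
labels, matrix = confusion_matrix(actual, predicted)
```

Here the matrix is [[1, 1], [1, 2]]: one cat called a dog, one dog called a cat, and an overall accuracy of 0.6. Looking at which off-diagonal cells are large says more about how a tree fails than accuracy alone.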

Fitting Models

  • Add to pathway

Simple Regression

  • Check to see what's in the original linear-regression lesson that should be included
  • meta-goal: make this an eventual replacement for existing linear-regression lesson
  • LOs: (1) line-of-best-fit is the result of regression - a supervised ML algorithm; (2) regression throws away some variation in the data ("lossy") that is less important to the overall structure; (3) "slope" is the predicted change in the output for each one-unit change in an input;
  • discuss ethics
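For the regression LOs, a minimal least-squares sketch: `fit_line` computes the slope and intercept of the line of best fit, and `r_squared` measures how much of the output's variation that line accounts for; the leftover residuals are the "lossy" part the line throws away. Function names are hypothetical, and this makes no claim about how the existing lesson's library computes its fit.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line of best fit: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs: List[float], ys: List[float], slope: float, intercept: float) -> float:
    """Fraction of the variation in ys that the line accounts for.
    Whatever is left in the residuals is what the model throws away."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = fit_line(xs, ys)
```

For this data the fit is y = 1.9x + 0 with r² ≈ 0.963.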

Multiple Regression

  • LOs: (1) regression works in multiple dimensions
  • API: remove fit-row-model
  • multiple regression with cars
  • Add a cars workbook page where students build multiple models using different combinations of input columns. Start with prediction ("Which column do you think is the most important?"), then use lr-plot to see it and mr-coeffs to get and interpret the coefficients, then mr-code to get the function. Then have them drive the car, see the result, and try the next model.
  • discuss inference power and sample size
  • Reinforcement learning: stock trading, weather prediction
  • PCA as unsupervised analog?
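Multiple regression is the same idea with more input columns. The sketch below solves the least-squares normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination, assuming a small, well-conditioned dataset; real libraries generally use more numerically stable methods, and nothing here describes how lr-plot, mr-coeffs, or mr-code actually work. The input rows are hypothetical lists of column values (e.g., two car attributes), and the function names are hypothetical.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitute from the last row up.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows: List[List[float]], ys: List[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    xs = [[1.0] + list(r) for r in rows]  # leading 1s give the intercept term
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


# Data generated exactly from y = 1 + 2a + 3b.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, 4, 6, 8]
coeffs = fit_multiple(rows, ys)
```

The fit recovers [1, 2, 3] up to rounding: the intercept first, then one coefficient per input column, which is the shape students would be asked to interpret.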

Measuring Similarity

  • LOs: (1) members of virtually any category can be represented by quantifiable attributes; (2) These attributes define dimensions in space, with members as points in "category space"; (3) Similarity between members can be defined in many ways;
  • "What numbers can we compute from this data?" (butts-out-of-seats activity, with teams at posters):
      • Write a category of thing at the top of the poster (e.g., Mexican restaurants, cars, sci-fi).
      • Brainstorm as many properties as you can that you could measure for a member of that category (e.g., % of Mexicans who eat there, average Yelp rating, average Scoville rating of menu items).
      • Choose the two most important properties, draw the 1st quadrant, and label each axis with one of them.
      • Plot at least 3 members on your graph (Chipotle, Qdoba, Taco Bell…).
  • Introduce coordinate representation
  • starter-file: try simple similarity, distance-similarity, then angle similarity (offer cosine-similarity for HS)
  • angle-difference: make a second page with angles other than 0°, 45°, and 90°, and either give ranges for students to choose from or suggest using a protractor.
  • Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
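The starter-file progression from distance to angle (cosine) similarity can be sketched with colors as RGB points. The function names are hypothetical, and the "simple similarity" step is not shown since its definition isn't given here.

```python
import math
from typing import Sequence


def distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Straight-line (Euclidean) distance: smaller means more similar."""
    return math.dist(p, q)


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between p and q: 1 means same direction."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.hypot(*p) * math.hypot(*q))


def angle_degrees(p: Sequence[float], q: Sequence[float]) -> float:
    """Angle between p and q in degrees; clamp guards against rounding past ±1."""
    c = max(-1.0, min(1.0, cosine_similarity(p, q)))
    return math.degrees(math.acos(c))


red = (255, 0, 0)
dark_red = (128, 0, 0)
purple = (128, 0, 128)
green = (0, 255, 0)
```

Red and dark red are 127 units apart but at a 0° angle, so distance calls them different while angle calls them the same hue; red versus purple is 45°, and red versus green is 90°. Angle is undefined for black (0, 0, 0), where the formula divides by zero.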

Unsupervised and Supervised Similarity

  • recommendation systems
  • Should we re-arrange RGB to be in alphabetical order, to match the columns produced from bag of words?
  • LOs: (1) when points are given human-curated labels, we can measure things like affinity, classification, etc; (2) centroids can be used to group points in space, representing "liked members", "disliked members", etc; (3) similarity measures are a way for machines to take small amounts of human-tagging and generalize them
  • tagging, image search
  • Open the Spotify dataset and use it!
  • Switch to images: start on paper, with students finding "most similar" and "most different", then explaining their thinking
  • Introduce luminance, entropy, symmetry, and dominant color
  • starter-file: try it out!
  • ethics!
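For the centroid LO, a minimal recommendation sketch: average the points a user liked into a centroid, then rank unseen items by their distance to it; comparing distances to a "liked" and a "disliked" centroid turns the same idea into a classifier. The two-number feature tuples and item names are hypothetical, not columns from the Spotify dataset.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def centroid(points: List[Point]) -> Point:
    """Average each coordinate: the center of a group of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def recommend(liked: List[Point], candidates: Dict[str, Point], k: int) -> List[str]:
    """Names of the k candidates closest to the centroid of the liked points."""
    center = centroid(liked)
    ranked = sorted(candidates, key=lambda name: math.dist(candidates[name], center))
    return ranked[:k]


def predicts_like(p: Point, liked: List[Point], disliked: List[Point]) -> bool:
    """Guess 'liked' if p is closer to the liked centroid than the disliked one."""
    return math.dist(p, centroid(liked)) < math.dist(p, centroid(disliked))


liked = [(0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
disliked = [(0.1, 0.2), (0.3, 0.1)]
candidates = {"A": (0.75, 0.85), "B": (0.2, 0.3), "C": (0.9, 0.9)}
```

Three liked items are enough to recommend A and C over B, which is the sense in which a small amount of human tagging gets generalized.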

Modeling Language

  • LOs: (1) language is messy! and, ug, other stuff...
  • plagiarism detection, dealing with language
  • Show histograms of the n-grams.
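For plagiarism detection and n-gram histograms, one simple approach (an assumption, not the lesson's prescribed method) is to split text into word n-grams, count them, and compare two documents by the Jaccard similarity of their n-gram sets. Tokenizing by lowercasing and splitting on whitespace is deliberately naive, since punctuation stays glued to words, which itself illustrates that language is messy. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Every run of n consecutive words (naive lowercase + whitespace split)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared n-grams divided by all distinct n-grams across both texts."""
    sa, sb = set(ngrams(a, n)), set(ngrams(b, n))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Counts like these are the raw data for an n-gram histogram.
counts = Counter(ngrams("the cat sat on the mat", 1))

original = "the quick brown fox jumps over the lazy dog"
suspect = "the quick brown fox leaps over the lazy dog"
```

The two "quick brown fox" sentences differ by one word but share only 4 of 10 distinct trigrams (Jaccard 0.4), because a single change breaks every trigram that contains it.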

Statistical Language Models

  • swap out the documents in the Soekia fairytale collection for a better-written, more representative set; confirm sufficient overlap
  • Discuss hallucinations
  • Training: Lang in practice
  • n-gram generator / API?
  • More activities like matching-human-judgment.html
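For the n-gram generator, a minimal bigram sketch: record which word follows which in the training text, then generate by repeatedly sampling a follower of the last word. Function names and the sample text are hypothetical. Because the model only knows local word pairs, it can produce fluent sequences that never appear in the training text, which could be a concrete entry point to the hallucination discussion.

```python
import random
from collections import defaultdict
from typing import Dict, List


def train_bigrams(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it in training.
    Duplicates are kept, so frequent followers get sampled more often."""
    words = text.split()
    model: Dict[str, List[str]] = defaultdict(list)
    for w1, w2 in zip(words, words[1:]):
        model[w1].append(w2)
    return dict(model)


def generate(model: Dict[str, List[str]], start: str, length: int,
             rng: random.Random) -> List[str]:
    """Produce up to `length` words, starting from `start`."""
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # dead end: this word was never followed by anything
        out.append(rng.choice(followers))
    return out


text = "once upon a time the wolf met the girl and the girl met the wolf"
model = train_bigrams(text)
sample = generate(model, "the", 8, random.Random(0))
```

Passing a seeded `random.Random` makes a run repeatable for class discussion; every adjacent word pair in the output occurred in training, even when the sentence as a whole never did.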

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
