allow user to specify which rows to subsample from #75

riastradh-probcomp · 2015-07-03T00:03:24Z

No description provided.

fsaad · 2015-08-26T13:55:59Z

I do not think the desideratum is to choose which rows to subsample from, but rather to implement subsampling in its true form (separate issue?). That is, train parallel GPMs on slightly overlapping subsets of the data. However, figuring out how GPMs (from the same family, or even from different families) trained on different portions of a dataset with the same schema interact to answer BQL queries is not a straightforward problem.
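A minimal sketch of only the data-partitioning half of this idea, assuming rows are identified by integer indices: shuffle with a seeded PRNG, cut into disjoint shards, then let each shard borrow a few rows from the next one so that neighbouring subsets overlap. The helper `overlapping_subsets` is hypothetical, not part of bayeslite, and it says nothing about the hard part raised above (combining the resulting GPMs to answer BQL queries).

```python
import random

def overlapping_subsets(n_rows, n_parts, overlap, seed=0):
    """Hypothetical helper: split row indices 0..n_rows-1 into n_parts
    subsets where each subset shares `overlap` rows with the next one."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    order = list(range(n_rows))
    rng.shuffle(order)
    # Disjoint base shards of (nearly) equal size.
    bounds = [n_rows * i // n_parts for i in range(n_parts + 1)]
    shards = [order[bounds[i]:bounds[i + 1]] for i in range(n_parts)]
    result = []
    for i, shard in enumerate(shards):
        # Borrow the first `overlap` rows of the next shard (cyclically).
        neighbor = shards[(i + 1) % n_parts]
        result.append(sorted(set(shard) | set(neighbor[:overlap])))
    return result

parts = overlapping_subsets(10, 3, overlap=1)
```

Each subset would then be used to train its own GPM; every row appears in at least one subset, and adjacent subsets share `overlap` rows.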

gregory-marton · 2015-09-02T22:07:49Z

@fsaad I'm not sure that bagging or bootstrapping GPMs will be all that helpful. If it is, I feel like that's a longer-term project.

@riastradh-probcomp, what kinds of restrictions would you want to have?

I can think of a few sampling strategies you might want to choose from: the first rows, random rows, evenly spaced rows... Or do you mean specifying a test on rows that decides whether they are eligible for sampling?
For the latter, I would lean towards asking users to do it in preprocessing, e.g. by saving a temp table, rather than trying to design a language for it.
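The three strategies mentioned above could be sketched as follows, assuming the eligible rowids have already been chosen (e.g. by preprocessing into a temp table). `subsample_rowids` and its strategy names are hypothetical, not an existing bayeslite API; the random strategy uses a seeded PRNG so that the same seed reproduces the same subsample.

```python
import random

def subsample_rowids(rowids, k, strategy="random", seed=None):
    """Hypothetical helper: pick k of the given rowids, keeping input order."""
    rowids = list(rowids)
    n = len(rowids)
    if k >= n:
        return rowids                      # nothing to subsample
    if strategy == "first":
        return rowids[:k]                  # the first k rows
    if strategy == "random":
        # Seeded pseudorandom choice of k positions, returned in input order.
        chosen = random.Random(seed).sample(range(n), k)
        return [rowids[i] for i in sorted(chosen)]
    if strategy == "even":
        # k positions spread evenly across the list.
        return [rowids[i * n // k] for i in range(k)]
    raise ValueError("unknown strategy: %r" % (strategy,))
```

For example, with rowids 1 through 10 and k = 3, "first" picks 1, 2, 3 and "even" picks 1, 4, 7.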

riastradh-probcomp · 2015-09-02T22:14:39Z

I don't remember what I was thinking when I made this, other than that I don't think I had in mind any particular mechanism.

Whatever I meant, pseudorandom subsampling is doubtless a more immediately fruitful, and perhaps largely sufficient, strategy.

gregory-marton · 2015-11-17T21:39:23Z

If we're already using a subsample, and someone wants to run a query that focuses on a particular sub-population, that might be a good time to re-subsample from the full population, selecting for that query. The new subsample might go into new generators (perhaps with a new label? #313) and get some new analysis.
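A minimal sketch of the re-subsampling step using the standard `sqlite3` module, following the temp-table preprocessing suggested earlier in the thread. The table name `population`, its columns, the `subpop` temp table, and the `resubsample` helper are all hypothetical; creating the new generators from the result in BQL is not shown.

```python
import random
import sqlite3

def resubsample(conn, table, where_sql, k, seed=0):
    """Hypothetical helper: copy the rows of `table` matching `where_sql`
    into a temp table, then return a seeded pseudorandom subsample of k
    of their original rowids. `table` and `where_sql` are interpolated
    into SQL directly, so they must come from trusted code."""
    conn.execute("DROP TABLE IF EXISTS temp.subpop")
    conn.execute(
        "CREATE TEMP TABLE subpop AS SELECT rowid AS src_rowid, * FROM %s WHERE %s"
        % (table, where_sql))
    rowids = [r[0] for r in
              conn.execute("SELECT src_rowid FROM subpop ORDER BY src_rowid")]
    if k >= len(rowids):
        return rowids
    return sorted(random.Random(seed).sample(rowids, k))

# Toy full population: 20 rows, every second one in the sub-population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (age INTEGER, state TEXT)")
conn.executemany("INSERT INTO population VALUES (?, ?)",
                 [(20 + i, "CA" if i % 2 else "NY") for i in range(20)])
sample = resubsample(conn, "population", "state = 'CA'", 5, seed=1)
```

Keeping the sub-population in a temp table leaves the full table untouched, so the query can re-subsample again later with a different predicate or seed.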

riastradh-probcomp added this to the nice to have / someday milestone Jul 3, 2015
