
scalability to larger outputs + post training #13

@zabboud

Description


Very interesting work! How would this method scale to very large datasets with 1000+ classes? Is a dense parameterization still feasible there? And what about higher-dimensional outputs?

With respect to post-training last layer posterior:

Post-training. As an alternative to jointly optimizing the variational last layer with the features, a two-step procedure can be used. In the first step, the feature weights θ are trained by any arbitrary training procedure (e.g. standard neural network training); then the last layer (and Σ) are trained with frozen features. The training objective is identical to (16), although θ∗ is trained in the initial pre-training step and η∗, Σ∗ are trained via (16).

Can you please clarify what "the last layer (and Σ) are trained with frozen features" means? Does it mean the entire network backbone is frozen, and the last layer is retrained from scratch, learning both its weights (η) and its covariance (Σ)?
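To make my reading of the two-step procedure concrete, here is a minimal numpy sketch of what I understand "frozen features" to mean. It stands in a fixed random tanh feature map for the pre-trained backbone (in the paper this would be θ∗ from standard NN training), and fits the last-layer Gaussian posterior N(η∗, Σ∗) in closed form for a Gaussian likelihood (ordinary Bayesian linear regression), rather than optimizing objective (16) as the paper does. The prior precision `alpha` and noise variance `sigma2` are my own assumed values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Step 1: "pre-train" the features theta* ---
# Stand-in for any standard NN training: a fixed random tanh feature map.
D_in, D_feat, N = 3, 16, 200
W_theta = rng.normal(size=(D_in, D_feat))  # frozen from here on
b_theta = rng.normal(size=D_feat)

def phi(X):
    """Frozen feature map phi(x; theta*): backbone weights are never updated."""
    return np.tanh(X @ W_theta + b_theta)

# Synthetic regression data
X = rng.normal(size=(N, D_in))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# --- Step 2: with features frozen, fit the last-layer posterior N(eta, Sigma) ---
# Closed form for a Gaussian likelihood; 'alpha' (prior precision) and
# 'sigma2' (observation noise) are assumed hyperparameters.
alpha, sigma2 = 1.0, 0.1
Phi = phi(X)
Sigma = np.linalg.inv(alpha * np.eye(D_feat) + Phi.T @ Phi / sigma2)
eta = Sigma @ Phi.T @ y / sigma2

# Predictions at new inputs reuse the frozen phi with (eta, Sigma)
X_new = rng.normal(size=(5, D_in))
Phi_new = phi(X_new)
pred_mean = Phi_new @ eta
pred_var = sigma2 + np.einsum('nd,de,ne->n', Phi_new, Sigma, Phi_new)
```

Is this the intended reading, i.e. only (η, Σ) are learned in step 2 while θ∗ stays fixed?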

Thank you for clarifying!
