
scalability to larger outputs + post training #13

@zabboud

Description


Very interesting work! How would this method scale to very large datasets with 1000+ classes? Is a dense parameterization still feasible there? And what about higher-dimensional outputs?

With respect to post-training last layer posterior:

Post-training. As an alternative to jointly optimizing the variational last layer with the features, a two-step procedure can be used. In the first step, the feature weights θ are trained by any arbitrary training procedure (e.g. standard neural network training); then the last layer (and Σ) are trained with frozen features. The training objective is identical to (16), although θ∗ is trained in the initial pre-training step and η∗, Σ∗ are trained via (16).

Can you please clarify what "the last layer (and Σ) are trained with frozen features" means? Does it mean the entire network backbone is frozen, and the last layer is retrained from scratch, learning both its weights (η) and its covariance (Σ)?
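To make my reading of the two-step procedure concrete, here is a minimal numpy sketch of what I understand "frozen features" to mean. It stands in a fixed random tanh feature map for the pre-trained backbone (in the paper this would be θ∗ from standard NN training), and fits the last-layer Gaussian posterior N(η∗, Σ∗) in closed form for a Gaussian likelihood (ordinary Bayesian linear regression), rather than optimizing objective (16) as the paper does. The prior precision `alpha` and noise variance `sigma2` are my own assumed values, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Step 1: "pre-train" the features theta* ---
# Stand-in for any standard NN training: a fixed random tanh feature map.
D_in, D_feat, N = 3, 16, 200
W_theta = rng.normal(size=(D_in, D_feat))  # frozen from here on
b_theta = rng.normal(size=D_feat)

def phi(X):
    """Frozen feature map phi(x; theta*): backbone weights are never updated."""
    return np.tanh(X @ W_theta + b_theta)

# Synthetic regression data
X = rng.normal(size=(N, D_in))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# --- Step 2: with features frozen, fit the last-layer posterior N(eta, Sigma) ---
# Closed form for a Gaussian likelihood; 'alpha' (prior precision) and
# 'sigma2' (observation noise) are assumed hyperparameters.
alpha, sigma2 = 1.0, 0.1
Phi = phi(X)
Sigma = np.linalg.inv(alpha * np.eye(D_feat) + Phi.T @ Phi / sigma2)
eta = Sigma @ Phi.T @ y / sigma2

# Predictions at new inputs reuse the frozen phi with (eta, Sigma)
X_new = rng.normal(size=(5, D_in))
Phi_new = phi(X_new)
pred_mean = Phi_new @ eta
pred_var = sigma2 + np.einsum('nd,de,ne->n', Phi_new, Sigma, Phi_new)
```

Is this the intended reading, i.e. only (η, Σ) are learned in step 2 while θ∗ stays fixed?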

Thank you for clarifying!
