PH#15: Big Transfer (BiT): General Visual Representation Learning
The paper that showed how to pre-train on extremely large datasets and carry those gains over to downstream tasks.
Haiku:
Pick a huge model (ResNet152x4),
Pre-train it on a huge dataset (JFT-300M),
But with new heuristics.
Take the pre-trained large model,
Fine-tune on a smaller dataset,
= SOTA!
Big datasets?
Don’t use batch norm,
use group norm and weight standardization.
Big datasets?
Don’t use MixUp,
save it for fine-tuning on the smaller downstream tasks.
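
Rough sketch of the recipe from the first stanza, in PyTorch: grab a pre-trained BiT backbone, swap the head, fine-tune briefly with plain SGD. The timm checkpoint name (resnetv2_50x1_bitm), CIFAR-10 as the downstream stand-in, and the resolution/schedule here are illustrative assumptions, not the paper's exact BiT-HyperRule.

```python
import torch
import timm
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Load a BiT-style ResNet-v2 (GroupNorm + Weight Standardization) pre-trained upstream;
# the checkpoint name is an assumption about timm's model zoo.
# num_classes=10 swaps in a fresh head for the downstream task.
model = timm.create_model("resnetv2_50x1_bitm", pretrained=True, num_classes=10)

transform = transforms.Compose([
    transforms.Resize((128, 128)),  # BiT-HyperRule picks resolution by dataset size
    transforms.ToTensor(),
])
train_ds = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
loader = DataLoader(train_ds, batch_size=128, shuffle=True)

# Plain SGD with momentum for fine-tuning, as in the paper (initial lr 0.003).
opt = torch.optim.SGD(model.parameters(), lr=3e-3, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for step, (x, y) in enumerate(loader):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step >= 500:  # short schedule for a small downstream task
        break
```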
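And the "group norm and weight standardization" swap, as a minimal BatchNorm-free block sketch (channel sizes, group count, and the eps are illustrative; BiT's real blocks follow the pre-activation ResNet-v2 layout):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StdConv2d(nn.Conv2d):
    """Conv2d with Weight Standardization: each output filter's weights are
    standardized to zero mean / unit variance before the convolution."""
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w = (w - mean) / torch.sqrt(var + 1e-10)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

# Pre-activation (ResNet-v2) ordering: GroupNorm -> ReLU -> standardized conv,
# with no BatchNorm anywhere.
block = nn.Sequential(
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(inplace=True),
    StdConv2d(64, 128, kernel_size=3, padding=1, bias=False),
)

x = torch.randn(2, 64, 56, 56)
print(block(x).shape)  # torch.Size([2, 128, 56, 56])
```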
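Finally, MixUp: skipped during the huge upstream pre-training, applied only when fine-tuning downstream (the BiT-HyperRule enables it for the mid-sized and larger downstream tasks, with alpha = 0.1). A standard MixUp step would look like this:

```python
import numpy as np
import torch

def mixup_batch(x, y, alpha=0.1):
    """Mix random pairs of examples; alpha=0.1 is the value BiT uses downstream."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[idx], y, y[idx], lam

# Inside the fine-tuning loop above, MixUp would replace the plain loss:
#   x_mix, y_a, y_b, lam = mixup_batch(x, y)
#   logits = model(x_mix)
#   loss = lam * loss_fn(logits, y_a) + (1 - lam) * loss_fn(logits, y_b)
```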