arxiv.org/abs/1909.11942 ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To arxiv.org ALBERT에 대해 갑자기 궁금하여 빠르게 훑어 보았다..