8As a binary classification task instead of the 30,000-way classification task in MLM, the discriminator’s loss was typically much lower than the generator’s.