5Note that since 47 and 11 are coprime, this gives 47 11 = 517 different possible positional encodings. Similar to the original (non-learned) positional encodings from Vaswani et al. (2017), the rationale behind this choice of positional encoding is to allow the model to generalize to unseen sequence lengths.