10 local-p is similar to the local-m model, except that we dynamically compute p_t and use a truncated Gaussian distribution to modify the original alignment weights align(h_t, h_s), as shown in Eq. (10). Because a_t is derived from p_t, we can compute backpropagation gradients for W_p and v_p. This model is differentiable almost everywhere.
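
The computation described in this footnote can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `local_p_attention`, the dot-product score, the choice sigma = D/2, the 0-indexed source positions, and the form p_t = S * sigmoid(v_p^T tanh(W_p h_t)) are assumptions made here for the example, and Eq. (10) itself is not reproduced in this passage.

```python
import numpy as np

def local_p_attention(h_t, H_s, W_p, v_p, D=2):
    """Hypothetical sketch of local-p attention weights for one target step.

    h_t: (d,) target hidden state; H_s: (S, d) source hidden states.
    W_p: (k, d) and v_p: (k,) are the position-prediction parameters.
    D:   half-width of the attention window (assumed >= 1).
    """
    S = H_s.shape[0]
    # Predicted aligned position p_t in (0, S) (assumed form of the predictor).
    p_t = S / (1.0 + np.exp(-(v_p @ np.tanh(W_p @ h_t))))
    # Truncation: only source positions within D of p_t are kept.
    s = np.arange(S)
    window = np.abs(s - p_t) <= D
    # Base alignment weights: softmax of dot-product scores inside the window
    # (the dot-product score is an assumption; other score functions fit too).
    scores = np.where(window, H_s @ h_t, -np.inf)
    align = np.exp(scores - scores.max())
    align /= align.sum()
    # Gaussian centered at p_t reweights the alignment (sigma = D / 2 assumed).
    sigma = D / 2.0
    a_t = align * np.exp(-((s - p_t) ** 2) / (2.0 * sigma ** 2))
    return p_t, a_t

# Example call with random inputs (shapes chosen only for illustration).
rng = np.random.default_rng(0)
p_t, a_t = local_p_attention(
    rng.normal(size=4), rng.normal(size=(10, 4)),
    rng.normal(size=(3, 4)), rng.normal(size=3), D=2,
)
```

In this sketch p_t enters a_t through the smooth Gaussian factor, which is what lets gradients reach W_p and v_p. The hard window mask is the only non-smooth step, which fits the "differentiable almost everywhere" remark.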