2 Statistical MT Preliminaries

Before talking about any specific models, this chapter describes the overall framework of statistical machine translation (SMT) more formally.

First, we define our task of machine translation as translating a source sentence $F = f_1, \ldots, f_{|F|} = f_1^{|F|}$ into a target sentence $E = e_1, \ldots, e_{|E|} = e_1^{|E|}$.1 Thus, any type of translation system can be defined as a function
$$\hat{E} = \text{mt}(F),$$
which returns a translation hypothesis $\hat{E}$ given a source sentence F as input.

Statistical machine translation (SMT) systems are systems that perform translation by creating a probabilistic model of the probability of E given F, P(E ∣ F; θ), and finding the target sentence that maximizes this probability:
$$\hat{E} = \text{argmax}_{E}\, P(E \mid F;\theta),$$
where θ are the parameters of the model specifying the probability distribution.
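As a minimal illustration of this argmax, consider the sketch below. The candidate translations and their probabilities are invented for illustration; a real model would assign a probability to every possible target sentence rather than a small stored set.

```python
# Toy illustration of E-hat = argmax_E P(E | F; theta).
# The model here is a hypothetical lookup table mapping a source
# sentence to a few candidate translations with made-up probabilities.
P = {
    "kore wa pen desu": {
        "this is a pen": 0.6,
        "it is a pen": 0.3,
        "this is pen": 0.1,
    }
}

def mt(F):
    """Return the hypothesis E maximizing P(E | F) over stored candidates."""
    candidates = P[F]
    return max(candidates, key=candidates.get)

print(mt("kore wa pen desu"))  # -> this is a pen
```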
The parameters θ are learned from data consisting of aligned sentences in the source and target languages, which are called parallel corpora in technical terminology.2 Within this framework, there are three major problems that we need to handle appropriately in order to create a good translation system:

Modeling:

First, we need to decide what our model P(E ∣ F; θ) will look like. What parameters will it have, and how will the parameters specify a probability distribution?
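To make the question concrete, here is one deliberately naive (and hypothetical) parameterization: let θ be a table of word-to-word translation probabilities, and define the sentence probability as their product under the assumption that word i of the target translates word i of the source. This is not a realistic SMT model, only a sketch of how parameters can specify a distribution.

```python
from math import prod

# Hypothetical word-for-word model: theta[(f, e)] approximates P(e | f).
# All probabilities below are invented for illustration.
theta = {
    ("schwarze", "black"): 0.8, ("schwarze", "dark"): 0.2,
    ("Katze", "cat"): 0.9, ("Katze", "kitten"): 0.1,
}

def model_prob(E, F, theta):
    """P(E | F; theta) under the naive position-wise word-for-word assumption."""
    if len(E) != len(F):
        return 0.0
    return prod(theta.get((f, e), 0.0) for f, e in zip(F, E))

print(model_prob(["black", "cat"], ["schwarze", "Katze"], theta))  # 0.8 * 0.9
```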

Learning:

Next, we need a method to learn appropriate values for parameters θ from training data.
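One simple learning method, sketched below on an invented toy parallel corpus, is maximum-likelihood estimation by relative frequency: estimate P(E ∣ F) as the number of times the pair (F, E) occurs divided by the number of times F occurs. Whole-sentence counting like this does not generalize to unseen sentences; it is only meant to show what "learning θ from data" can mean.

```python
from collections import Counter

# A tiny invented parallel corpus of (source, target) sentence pairs.
corpus = [
    ("guten morgen", "good morning"),
    ("guten morgen", "good morning"),
    ("guten morgen", "morning"),
    ("guten abend", "good evening"),
]

pair_counts = Counter(corpus)
source_counts = Counter(F for F, _ in corpus)

def theta_mle(E, F):
    """Relative-frequency (maximum-likelihood) estimate of P(E | F)."""
    return pair_counts[(F, E)] / source_counts[F]

print(theta_mle("good morning", "guten morgen"))  # 2/3
```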

Search:

Finally, we need to solve the problem of finding the most probable sentence (solving the "argmax" above). This process of searching for the best hypothesis is often called decoding.3
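Because the space of possible target sentences is far too large to enumerate, real decoders search it approximately. The hypothetical sketch below uses an invented word-level translation table and a greedy strategy, picking the locally best translation for each source word; greedy choices like this can miss the globally most probable sentence.

```python
# Hypothetical greedy decoder over an invented word-level translation table.
# Exact argmax over all target sentences is intractable, so we approximate.
table = {
    "ich": {"i": 0.9, "me": 0.1},
    "habe": {"have": 0.7, "has": 0.3},
    "hunger": {"hunger": 0.6, "hungry": 0.4},
}

def greedy_decode(F, table):
    """Translate source words F one by one, keeping the locally best word."""
    return " ".join(max(table[f], key=table[f].get) for f in F)

print(greedy_decode(["ich", "habe", "hunger"], table))  # -> i have hunger
```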

The remainder of the material here will focus on solving these problems.


  1. Note that, for the time being, we are assuming that we translate each sentence independently, although we will discuss document-level translation in .

  2. Details about data can be found in .

  3. This is based on the famous quote from Warren Weaver, likening the process of machine translation to decoding an encoded cipher.