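Creating Custom Estimators

This document introduces custom Estimators. In particular, it demonstrates how to create a custom Estimator that mimics the behavior of the pre-made Estimator DNNClassifier in solving the Iris problem. For details on the Iris problem, see the Premade Estimators chapter.

To download and access the example code, invoke the following two commands: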

git clone https://github.com/tensorflow/models/
cd models/samples/core/get_started

In this document we will be looking at custom_estimator.py. You can run it with the following command:

python custom_estimator.py

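If you are feeling impatient, feel free to compare and contrast custom_estimator.py with premade_estimator.py, which is in the same directory.

Pre-made vs. custom

As the following figure shows, pre-made Estimators are subclasses of the tf.estimator.Estimator base class, whereas custom Estimators are instances of tf.estimator.Estimator: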

Premade estimators are sub-classes of `Estimator`. Custom Estimators are usually (direct) instances of `Estimator`

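Pre-made and custom Estimators are all Estimators.

Pre-made Estimators are fully built and ready to use. Sometimes, though, you need more control over an Estimator's behavior. That's where custom Estimators come in. You can create a custom Estimator to do just about anything. If you want hidden layers connected in some unusual fashion, write a custom Estimator. If you want to calculate a unique metric for your model, write a custom Estimator. Basically, if you want an Estimator optimized for your specific problem, write a custom Estimator.

A model function (or model_fn) implements the ML algorithm. The only difference between working with pre-made Estimators and custom Estimators is:

With pre-made Estimators, someone already wrote the model function for you.
With custom Estimators, you must write the model function yourself.

Your model function could implement a wide range of algorithms, defining all sorts of hidden layers and metrics. Like input functions, all model functions must accept a standard group of input parameters and return a standard group of output values. Just as input functions can leverage the Dataset API, model functions can leverage the Layers API and the Metrics API.

Let's see how to solve the Iris problem with a custom Estimator. A quick reminder: here's the organization of the Iris model that we're trying to mimic: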

A diagram of the network architecture: Inputs, 2 hidden layers, and outputs

Our implementation of Iris contains four features, two hidden layers, and a logits output layer.

Write an input function

Our custom Estimator implementation uses the same input function as our pre-made Estimator implementation, from iris_data.py. Namely:

import tensorflow as tf  # This snippet uses the TensorFlow 1.x API.

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    # Return the read end of the pipeline.
    return dataset.make_one_shot_iterator().get_next()

This input function builds an input pipeline that yields batches of (features, labels) pairs, where features is a dictionary of features.
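To make the shape of each batch concrete, here is a plain-Python sketch of what the shuffle/repeat/batch pipeline yields. It is an illustration only, not how tf.data works internally: it fully shuffles each pass over the data (tf.data's shuffle uses a bounded buffer, 1000 elements here), the helpers example_stream and batched are hypothetical, and the three sample rows are made-up values.

```python
import itertools
import random


def example_stream(features, labels, seed=0):
    """Yield one (feature_dict, label) pair at a time, forever.

    Mimics from_tensor_slices((dict(features), labels)).shuffle(...).repeat();
    for simplicity each pass over the data is fully shuffled.
    """
    rng = random.Random(seed)
    n = len(labels)
    while True:  # repeat(): start another pass over the data
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            yield {name: column[i] for name, column in features.items()}, labels[i]


def batched(stream, batch_size):
    """Group consecutive examples into (features_dict, labels) batches."""
    while True:
        chunk = list(itertools.islice(stream, batch_size))
        batch_features = {name: [ex[name] for ex, _ in chunk] for name in chunk[0][0]}
        batch_labels = [label for _, label in chunk]
        yield batch_features, batch_labels


features = {'SepalLength': [5.1, 4.9, 6.3], 'PetalWidth': [0.2, 0.2, 1.8]}
labels = [0, 0, 2]
batches = batched(example_stream(features, labels), batch_size=2)
first_features, first_labels = next(batches)
print(sorted(first_features), len(first_labels))  # ['PetalWidth', 'SepalLength'] 2
```

Because repeat() comes before batch(), a batch can straddle two passes over the data, so every batch contains exactly batch_size examples.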

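Create feature columns

As detailed in the Premade Estimators and Feature Columns chapters, you must define your model's feature columns to specify how the model should use each feature. Whether working with pre-made Estimators or custom Estimators, you define feature columns in the same fashion.

The following code creates a simple numeric_column for each input feature, indicating that the value of the input feature should be used directly as an input to the model:

import tensorflow as tf

# train_x holds the training features (a pandas DataFrame), for example as loaded by iris_data.load_data().
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))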

Write a model function

The model function we'll use has the following call signature:

def my_model_fn(
        features,  # This is batch_features from input_fn
        labels,    # This is batch_labels from input_fn
        mode,      # An instance of tf.estimator.ModeKeys
        params):   # Additional configuration
    ...  # Model definition and mode-specific code go here

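The first two arguments are the batches of features and labels returned from the input function; that is, features and labels are the handles to the data your model will use. The mode argument indicates whether the caller is requesting training, prediction, or evaluation.

The caller may pass params to an Estimator's constructor. Any params passed to the constructor are in turn passed on to the model_fn. In custom_estimator.py, the following lines create the Estimator and set the params to configure the model. This configuration step is similar to how we configured the tf.estimator.DNNClassifier in the Premade Estimators chapter.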

classifier = tf.estimator.Estimator(
    model_fn=my_model_fn,
    params={
        'feature_columns': my_feature_columns,
        # Two hidden layers of 10 nodes each.
        'hidden_units': [10, 10],
        # The model must choose between 3 classes.
        'n_classes': 3,
    })

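To implement a typical model function, you must do the following:

Define the model.
Specify additional calculations for each of the three different modes:
- Predict
- Evaluate
- Train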

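Define the model

The basic deep neural network model must define the following three sections:

An input layer
One or more hidden layers
An output layer

Define the input layer

The first line of the model_fn calls tf.feature_column.input_layer to convert the feature dictionary and feature_columns into input for your model, as follows: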

    # Use `input_layer` to apply the feature columns.
    net = tf.feature_column.input_layer(features, params['feature_columns'])

The preceding line applies the transformations defined by your feature columns, creating the model's input layer.

A diagram of the input layer, in this case a 1:1 mapping from raw-inputs to features.
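For purely numeric columns like these, the effect of input_layer is easy to picture: each example's feature values are placed side by side in one dense row. The sketch below is a plain-Python stand-in, not TensorFlow's implementation. The helper numeric_input_layer is hypothetical, it assumes each numeric_column contributes a single value and that columns are concatenated in the order listed (the real input_layer fixes its own column order), and the two sample rows are made-up values.

```python
def numeric_input_layer(features, column_names):
    """Stack each example's numeric features into one dense row (illustrative)."""
    batch_size = len(features[column_names[0]])
    return [[float(features[name][i]) for name in column_names]
            for i in range(batch_size)]


batch = {
    'SepalLength': [5.1, 6.3],
    'SepalWidth': [3.5, 3.3],
    'PetalLength': [1.4, 6.0],
    'PetalWidth': [0.2, 2.5],
}
net = numeric_input_layer(batch, ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'])
print(net)  # [[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]
```

With four features and one value each, every example becomes a four-wide row, which is the 1:1 mapping shown in the figure.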

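Hidden layers

If you are creating a deep neural network, you must define one or more hidden layers. The Layers API provides a rich set of functions to define all types of hidden layers, including convolutional, pooling, and dropout layers. For Iris, we simply call tf.layers.dense to create hidden layers, with dimensions defined by params['hidden_units']. In a dense layer, each node is connected to every node in the preceding layer. Here's the relevant code: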

    # Build the hidden layers, sized according to the 'hidden_units' param.
    for units in params['hidden_units']:
        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

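The units parameter defines the number of output neurons in a given layer.
The activation parameter defines the activation function, ReLU in this case.

The variable net here signifies the current top layer of the network. During the first iteration, net signifies the input layer. On each loop iteration, tf.layers.dense creates a new layer that takes the previous layer's output (held in net) as its input, and net is then reassigned to that new layer.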
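The following plain-Python sketch shows what each tf.layers.dense call in that loop computes: for every output unit, a weighted sum over all input units plus a bias, passed through ReLU. The helpers dense and init_layer are hypothetical, and the uniform random initialization is an assumption chosen for illustration; tf.layers.dense uses its own default initializers.

```python
import random


def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: each output unit depends on every input unit."""
    outputs = []
    for row in inputs:
        z = [sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
             for j in range(len(biases))]
        outputs.append([activation(v) for v in z] if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Hypothetical init: small uniform random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(0)
net = [[5.1, 3.5, 1.4, 0.2]]  # output of the input layer: one example, four features
for units in [10, 10]:        # plays the role of params['hidden_units']
    weights, biases = init_layer(len(net[0]), units, rng)
    net = dense(net, weights, biases, activation=relu)  # net is now the new top layer
print(len(net[0]))  # 10
```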

After creating two hidden layers, our network looks as follows. For simplicity, the figure does not show all the units in each layer.

The input layer with two hidden layers added.

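Note that tf.layers.dense provides many additional capabilities, including the ability to set a multitude of regularization parameters. For the sake of simplicity, though, we're going to accept the default values of the other parameters.

Output layer

We'll define the output layer by calling tf.layers.dense yet again, this time without an activation function: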

    # Compute logits (1 per class).
    logits = tf.layers.dense(net, params['n_classes'], activation=None)

Here, net signifies the final hidden layer. Therefore, the full set of layers is now connected as follows:

A logit output layer connected to the top hidden layer

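The final hidden layer feeds into the output layer.

When defining an output layer, the units parameter specifies the number of outputs. So, by setting units to params['n_classes'] (passed here as the second positional argument), the model produces one output value per class. Each element of the output vector will contain the score, or "logit", calculated for the associated class of Iris: Setosa, Versicolor, or Virginica, respectively.

Later on, these logits will be transformed into probabilities by the tf.nn.softmax function.

Implement training, evaluation, and prediction

The final step in creating a model function is to write branching code that implements prediction, evaluation, and training.

The model function gets invoked whenever someone calls the Estimator's train, evaluate, or predict methods. Recall that the signature for the model function looks like this: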

def my_model_fn(
        features,  # This is batch_features from input_fn
        labels,    # This is batch_labels from input_fn
        mode,      # An instance of tf.estimator.ModeKeys, see below
        params):   # Additional configuration
    ...  # Model definition and mode-specific code go here

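Focus on that third argument, mode. As the following table shows, when someone calls train, evaluate, or predict, the Estimator framework invokes your model function with the mode parameter set as follows:

Estimator method	Estimator mode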
`train()`	`ModeKeys.TRAIN`
`evaluate()`	`ModeKeys.EVAL`
`predict()`	`ModeKeys.PREDICT`

For example, suppose you instantiate a custom Estimator to generate an object named classifier. Then, you make the following call:

classifier = tf.estimator.Estimator(...)
classifier.train(input_fn=lambda: my_input_fn(FILE_TRAIN, True, 500))

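The Estimator framework then calls your model function with mode set to ModeKeys.TRAIN.

Your model function must provide code to handle all three of the mode values. For each mode value, your code must return an instance of tf.estimator.EstimatorSpec, which contains the information the caller requires. Let's examine each mode.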
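This contract can be illustrated without TensorFlow. In the sketch below, ModeKeys, Spec, ToyEstimator, and toy_model_fn are hypothetical stand-ins for tf.estimator.ModeKeys, tf.estimator.EstimatorSpec, tf.estimator.Estimator, and a real model function. The "model" is a one-feature threshold rule and its "loss" is just the error rate, so only the branching on mode and the way constructor params reach the model function are meant to carry over.

```python
from collections import namedtuple


class ModeKeys:
    """Stand-in for tf.estimator.ModeKeys (the string values are illustrative)."""
    TRAIN = 'train'
    EVAL = 'eval'
    PREDICT = 'infer'


# Stand-in for tf.estimator.EstimatorSpec: only `mode` is required.
Spec = namedtuple('Spec', ['mode', 'predictions', 'loss', 'train_op', 'eval_metric_ops'],
                  defaults=[None, None, None, None])


def toy_model_fn(features, labels, mode, params):
    # "Model": predict class 1 whenever feature x exceeds a threshold taken from params.
    predicted = [int(x > params['threshold']) for x in features['x']]
    if mode == ModeKeys.PREDICT:
        # No labels at prediction time, so only predictions are returned.
        return Spec(mode, predictions={'class_ids': predicted})
    # TRAIN and EVAL both need a loss; here it is simply the error rate.
    errors = sum(int(p != y) for p, y in zip(predicted, labels))
    loss = errors / len(labels)
    if mode == ModeKeys.EVAL:
        return Spec(mode, loss=loss, eval_metric_ops={'accuracy': 1.0 - loss})
    # TRAIN: a real model_fn would return the op built by optimizer.minimize(...).
    return Spec(mode, loss=loss, train_op='placeholder train op')


class ToyEstimator:
    """Forwards constructor params to model_fn and picks the mode per method."""

    def __init__(self, model_fn, params):
        self._model_fn = model_fn
        self._params = params

    def train(self, features, labels):
        return self._model_fn(features, labels, ModeKeys.TRAIN, self._params)

    def evaluate(self, features, labels):
        return self._model_fn(features, labels, ModeKeys.EVAL, self._params)

    def predict(self, features):
        return self._model_fn(features, None, ModeKeys.PREDICT, self._params)


estimator = ToyEstimator(toy_model_fn, params={'threshold': 0.5})
features = {'x': [0.2, 0.9, 0.7]}
labels = [0, 1, 0]
print(estimator.predict(features).predictions)  # {'class_ids': [0, 1, 1]}
```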

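Predict

When the Estimator's predict method is called, the model_fn receives mode = ModeKeys.PREDICT. In this case, the model function must return a tf.estimator.EstimatorSpec containing the prediction.

The model must have been trained prior to making a prediction. The trained model is stored on disk in the model_dir directory established when you instantiated the Estimator.

The code to generate the prediction for this model looks as follows: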

# Compute predictions.
predicted_classes = tf.argmax(logits, 1)
if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {
        'class_ids': predicted_classes[:, tf.newaxis],
        'probabilities': tf.nn.softmax(logits),
        'logits': logits,
    }
    return tf.estimator.EstimatorSpec(mode, predictions=predictions)

The predictions dictionary holds everything that your model returns when run in predict mode.

Additional outputs added to the output layer.

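The predictions dictionary holds the following three key/value pairs:

class_ids holds the class id (0, 1, or 2) representing the model's prediction of the most likely species for this example.
probabilities holds the three probabilities (in this example, 0.02, 0.95, and 0.03).
logits holds the raw logit values (in this example, -1.3, 2.6, and -0.9).

We return that dictionary to the caller via the predictions parameter of tf.estimator.EstimatorSpec. The Estimator's predict method will yield these dictionaries.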
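The relationship between the three entries can be checked by hand. This plain-Python sketch (no TensorFlow involved) applies softmax to the example logits from the list above and takes the argmax. Because softmax preserves ordering, the argmax of the logits, which is what tf.argmax(logits, 1) computes in the code, picks the same class as the argmax of the probabilities.

```python
import math


def softmax(logits):
    m = max(logits)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.3, 2.6, -0.9]  # example logits for Setosa, Versicolor, Virginica
probabilities = softmax(logits)
class_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax over classes
predictions = {
    'class_ids': [class_id],  # one entry, mirroring the extra axis added by tf.newaxis
    'probabilities': probabilities,
    'logits': logits,
}
print([round(p, 2) for p in probabilities], class_id)  # [0.02, 0.95, 0.03] 1
```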

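Calculate the loss

For both training and evaluation, we need to calculate the model's loss. This is the objective that will be optimized.

We can calculate the loss by calling tf.losses.sparse_softmax_cross_entropy. The value returned by this function will be lowest, approximately 0, when the probability of the correct class (at index label) is near 1.0. The loss value returned grows progressively larger as the probability of the correct class decreases.

This function returns the average over the whole batch.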

# Compute loss.
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
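As a sanity check on the behavior described above, here is a plain-Python version of the same quantity: for each example, the negative log of the softmax probability assigned to the correct class, averaged over the batch. It is an illustrative reimplementation, not TensorFlow's code, and it reuses the example logits from the prediction section.

```python
import math


def sparse_softmax_cross_entropy(labels, logits):
    """Mean over the batch of -log(softmax(row)[label]); labels are class indices."""
    losses = []
    for label, row in zip(labels, logits):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])  # equals -log(probability of label)
    return sum(losses) / len(losses)


row = [-1.3, 2.6, -0.9]
confident_right = sparse_softmax_cross_entropy([1], [row])  # correct class has p of about 0.95
confident_wrong = sparse_softmax_cross_entropy([0], [row])  # correct class has p of about 0.02
batch_mean = sparse_softmax_cross_entropy([1, 0], [row, row])
print(round(confident_right, 3), round(confident_wrong, 3))  # 0.049 3.949
```

The batch value is simply the average of the per-example losses, matching the statement that the function returns the mean over the batch.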

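Evaluate

When the Estimator's evaluate method is called, the model_fn receives mode = ModeKeys.EVAL. In this case, the model function must return a tf.estimator.EstimatorSpec containing the model's loss and, optionally, one or more metrics.

Although returning metrics is optional, most custom Estimators do return at least one metric. TensorFlow provides a Metrics module, tf.metrics, to calculate common metrics. For brevity's sake, we'll only return accuracy. The tf.metrics.accuracy function compares our predictions against the true values, that is, against the labels provided by the input function. The tf.metrics.accuracy function requires the labels and predictions to have the same shape. Here's the call to tf.metrics.accuracy: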

# Compute evaluation metrics.
accuracy = tf.metrics.accuracy(labels=labels,
                               predictions=predicted_classes,
                               name='acc_op')
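tf.metrics.accuracy does not return a single number: it returns a pair of the current accuracy value and an update op, and it keeps running totals across batches. That is why the summary code that follows uses accuracy[1], the update op. The class below is a plain-Python analogue written for illustration (the name StreamingAccuracy and its methods are hypothetical), showing how the running value accumulates over two batches.

```python
class StreamingAccuracy:
    """Running accuracy over all batches seen so far (illustrative stand-in)."""

    def __init__(self):
        self.total = 0  # correct predictions seen so far
        self.count = 0  # predictions seen so far

    def value(self):
        # Like the first element of the pair: read the current accuracy.
        return self.total / self.count if self.count else 0.0

    def update(self, labels, predictions):
        # Like the update op: fold in one batch, then return the new accuracy.
        if len(labels) != len(predictions):
            raise ValueError('labels and predictions must have the same shape')
        self.total += sum(int(y == p) for y, p in zip(labels, predictions))
        self.count += len(labels)
        return self.value()


acc = StreamingAccuracy()
acc.update([0, 1, 2], [0, 1, 1])  # 2 of 3 correct
acc.update([2, 2], [2, 2])        # 2 of 2 correct
print(acc.value())                # 0.8, i.e. 4 correct out of 5 overall
```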

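The EstimatorSpec returned for evaluation typically contains the following information:

loss, which is the model's loss.
eval_metric_ops, which is an optional dictionary of metrics.

So, we'll create a dictionary containing our sole metric. If we had calculated other metrics, we would have added them as additional key/value pairs to that same dictionary. Then, we pass that dictionary in the eval_metric_ops argument of tf.estimator.EstimatorSpec. Here's the code: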

metrics = {'accuracy': accuracy}
tf.summary.scalar('accuracy', accuracy[1])

if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(
        mode, loss=loss, eval_metric_ops=metrics)

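The tf.summary.scalar call makes accuracy available to TensorBoard in both TRAIN and EVAL modes (more on this later). Here accuracy[1] is the metric's update op, which returns the updated running accuracy.

Train

When the Estimator's train method is called, the model_fn is called with mode = ModeKeys.TRAIN. In this case, the model function must return an EstimatorSpec that contains the loss and a training operation.

Building the training operation will require an optimizer. We will use tf.train.AdagradOptimizer because we're mimicking the DNNClassifier, which also uses Adagrad by default. The tf.train package provides many other optimizers; feel free to experiment with them.

Here is the code that builds the optimizer: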

optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)

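Next, we build the training operation by calling the optimizer's minimize method on the loss we calculated earlier.

The minimize method also takes a global_step parameter. TensorFlow uses this parameter to count the number of training steps that have been processed (to know when to end a training run). Furthermore, the global_step is essential for TensorBoard graphs to work correctly. Simply call tf.train.get_global_step and pass the result to the global_step argument of minimize.

Here's the code to train the model: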

train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
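To see what minimize does on each call, here is a plain-Python sketch of Adagrad on a one-parameter toy problem, minimizing (w - 3)^2. The function adagrad_train is hypothetical. It follows the standard Adagrad update (add the squared gradient to a per-parameter accumulator, then scale the step by the inverse square root of that accumulator), and its initial accumulator of 0.1 is meant to mirror the default initial_accumulator_value of tf.train.AdagradOptimizer. The global_step counter increases by one per update, which is the bookkeeping that passing global_step to minimize provides.

```python
import math


def adagrad_train(grad_fn, params, steps, learning_rate=0.1, initial_accumulator=0.1):
    """Run `steps` Adagrad updates in place; return (params, global_step)."""
    accumulators = [initial_accumulator] * len(params)
    global_step = 0
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            accumulators[i] += g * g  # running sum of squared gradients
            params[i] -= learning_rate * g / math.sqrt(accumulators[i])
        global_step += 1  # one increment per update, like minimize(..., global_step=...)
    return params, global_step


# Toy loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
params, global_step = adagrad_train(lambda p: [2.0 * (p[0] - 3.0)], [0.0], steps=2000)
print(global_step)  # 2000
```

The effective step size shrinks as squared gradients accumulate, which is why Adagrad can use a fixed learning_rate such as 0.1 without a separate decay schedule.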

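The EstimatorSpec returned for training must have the following fields set:

loss, which contains the value of the loss function.
train_op, which executes a training step.

Here's our code to call EstimatorSpec: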

return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

The model function is now complete.

The custom Estimator

Instantiate the custom Estimator through the Estimator base class as follows:

    # Build 2 hidden layer DNN with 10, 10 units respectively.
    classifier = tf.estimator.Estimator(
        model_fn=my_model_fn,
        params={
            'feature_columns': my_feature_columns,
            # Two hidden layers of 10 nodes each.
            'hidden_units': [10, 10],
            # The model must choose between 3 classes.
            'n_classes': 3,
        })

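The params dictionary here serves the same purpose as the keyword arguments of DNNClassifier; that is, the params dictionary lets you configure your Estimator without modifying the code in the model_fn.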
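Because the model function reads its layer sizes from params, changing the dictionary changes the network without touching the code. The plain-Python sketch below (the helper names are hypothetical) lists the kernel and bias shapes the dense layers would have for these params, assuming four numeric input features that each contribute one value, and counts the trainable weights.

```python
def dense_layer_shapes(n_inputs, hidden_units, n_classes):
    """(kernel_shape, bias_shape) for each hidden layer and the logits layer."""
    shapes = []
    fan_in = n_inputs
    for units in list(hidden_units) + [n_classes]:
        shapes.append(((fan_in, units), (units,)))
        fan_in = units  # the next layer consumes this layer's outputs
    return shapes


def count_parameters(shapes):
    return sum(rows * cols + bias[0] for (rows, cols), bias in shapes)


params = {'hidden_units': [10, 10], 'n_classes': 3}
shapes = dense_layer_shapes(4, params['hidden_units'], params['n_classes'])
print(shapes)                    # [((4, 10), (10,)), ((10, 10), (10,)), ((10, 3), (3,))]
print(count_parameters(shapes))  # 193

bigger = dense_layer_shapes(4, [20, 20, 20], 3)
print(count_parameters(bigger))  # 1003
```

Passing 'hidden_units': [20, 20, 20] instead grows the model from 193 to 1003 weights with no change to the model function itself.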

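The rest of the code to train, evaluate, and generate predictions using our Estimator is the same as in the Premade Estimators chapter. For example, the following line will train the model: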

# Train the Model.
classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_x, train_y, args.batch_size),
    steps=args.train_steps)

TensorBoard

You can view training results for your custom Estimator in TensorBoard. To see this reporting, start TensorBoard from your command line as follows:

# Replace PATH with the actual path passed as model_dir
tensorboard --logdir=PATH

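Then open TensorBoard by browsing to: http://localhost:6006

All the pre-made Estimators automatically log a lot of information to TensorBoard. With custom Estimators, however, TensorBoard only provides one default log (a graph of the loss) plus the information you explicitly tell TensorBoard to log. For the custom Estimator you just created, TensorBoard generates the following: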

Accuracy, 'scalar' graph from tensorboard

loss 'scalar' graph from tensorboard

steps/second 'scalar' graph from tensorboard

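TensorBoard displays three graphs.

In brief, here's what the three graphs tell you:

global_step/sec: A performance indicator showing how many batches (gradient updates) we processed per second as the model trains.
loss: The loss reported.
accuracy: The accuracy is recorded by the following two lines:
- eval_metric_ops={'accuracy': accuracy}, during evaluation.
- tf.summary.scalar('accuracy', accuracy[1]), during training.

These TensorBoard graphs are one of the main reasons it's important to pass a global_step to your optimizer's minimize method. The model can't record the x-coordinate for these graphs without it.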

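Note the following in the accuracy and loss graphs:

The orange line represents training.
The blue dot represents evaluation.

During training, summaries (the orange line) are recorded periodically as batches are processed, which is why they form a line spanning the range of the x-axis.

By contrast, evaluation produces only a single point on the graph for each call to evaluate. This point contains the average over the entire evaluation call. It has no width on the graph because it is evaluated entirely from the model state at a particular training step (from a single checkpoint).

As suggested in the following figure, you can use the controls on the left side to view and selectively disable or enable the reporting.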

Check-boxes allowing the user to select which runs are shown.

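Enable or disable reporting.

Summary

Although pre-made Estimators can be an effective way to quickly create new models, you will often need the additional flexibility that custom Estimators provide. Fortunately, pre-made and custom Estimators follow the same programming model. The only practical difference is that you must write a model function for custom Estimators; everything else is the same.

For more details, be sure to check out:

The official TensorFlow implementation of MNIST, which uses a custom Estimator.
The TensorFlow official models repository, which contains more curated examples using custom Estimators.
This TensorBoard video, which introduces TensorBoard.
The Low Level Introduction, which demonstrates how to experiment directly with TensorFlow's low-level APIs, making debugging easier.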