Graph Structures

Debugging or profiling code written in Theano is not that simple if you do not know what goes on under the hood. This chapter is meant to introduce you to the required minimum of Theano's inner workings.

The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables). When writing down these expressions you use operations like +, -, **, sum(), tanh(). All of these are represented internally as ops. An op represents a certain computation on some types of inputs, producing some types of outputs. You can see it as a function definition in most programming languages.

Theano represents symbolic mathematical computations as graphs. These graphs are composed of interconnected Apply, Variable and Op nodes. An Apply node represents the application of an op to some variables. It is important to distinguish between the definition of a computation, which is represented by an op, and its application to some actual data, which is represented by the apply node. Furthermore, data types are represented by Type instances. Below is a piece of code and a diagram of the structure it builds. This should help you understand how these pieces fit together:

Code

import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

Diagram

[Image: apply.png — the graph built for z = x + y, showing how Apply, Variable, Op and Type nodes interconnect]

Arrows represent references to the Python objects pointed at. The blue box is an Apply node. Red boxes are Variable nodes. Green circles are Ops. Purple boxes are Types.

When we create Variables and then apply Ops to them to produce more Variables, we build a bi-partite, directed, acyclic graph. Variables point, via their owner field, to the Apply nodes that represent the function application producing them. These Apply nodes point in turn to their input and output Variables via their inputs and outputs fields. (Apply instances also contain a list of references to their outputs, but those pointers don’t count in this graph.)

The owner field of both x and y points to None because they are not the result of another computation. If one of them were the result of another computation, its owner field would point to another blue box like z does, and so on.

Note that the Apply instance’s outputs points to z, and z.owner points back to the Apply instance.
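
These links can be checked directly on the code above (a small sketch, reusing the x, y and z defined earlier):

assert z.owner.outputs[0] is z   # the Apply node's outputs list contains z
assert z.owner.inputs[0] is x    # its inputs are the Variables x and y
assert z.owner.inputs[1] is y
assert x.owner is None           # x and y are inputs, not computed results
assert y.owner is None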

Traversing the Graph

The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field. Take for example the following code:

>>> import theano
>>> x = theano.tensor.dmatrix('x')
>>> y = x * 2.

If you type type(y.owner), you get <class 'theano.gof.graph.Apply'>, which is the apply node that connects the op and the inputs to get this output. You can now print the name of the op that is applied to get y:

>>> y.owner.op.name
'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute y. This multiplication is done between the inputs:

>>> len(y.owner.inputs)
2
>>> y.owner.inputs[0]
x
>>> y.owner.inputs[1]
InplaceDimShuffle{x,x}.0

Note that the second input is not 2 as we would have expected. This is because 2 was first broadcasted to a matrix of the same shape as x. This is done using the DimShuffle op:

>>> type(y.owner.inputs[1])
<class 'theano.tensor.var.TensorVariable'>
>>> type(y.owner.inputs[1].owner)
<class 'theano.gof.graph.Apply'>
>>> y.owner.inputs[1].owner.op 
<theano.tensor.elemwise.DimShuffle object at 0x106fcaf10>
>>> y.owner.inputs[1].owner.inputs
[TensorConstant{2.0}]
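
Putting these fields together, a complete traversal from an output back to the graph's inputs takes only a few lines. Here is a minimal sketch (the helper depth_first is ours, not part of Theano):

def depth_first(var, depth=0):
    # Print the variable, then recurse into the Apply node that produced it.
    print('  ' * depth + str(var))
    if var.owner is not None:            # None means var is a graph input
        for inp in var.owner.inputs:
            depth_first(inp, depth + 1)

depth_first(y)   # walks from y through the mul Apply node down to x and 2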

Starting from this graph structure, it becomes easier to understand how automatic differentiation proceeds and how the symbolic relations can be optimized for performance or stability.

Graph Structures

The following section outlines each type of structure that may be used in a Theano-built computation graph: Apply, Constant, Op, Variable and Type.

Apply

An Apply node is a type of internal node used to represent a computation graph in Theano. Unlike Variable nodes, Apply nodes are usually not manipulated directly by the end user. They may be accessed via a Variable’s owner field.

An Apply node is typically an instance of the Apply class. It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs. Therefore, an Apply node may be obtained from an Op and a list of inputs by calling Op.make_node(*inputs).

Comparing with the Python language, an Apply node is Theano’s version of a function call whereas an Op is Theano’s version of a function definition.

An Apply instance has three important fields:

op
An Op that determines the function/transformation being applied here.
inputs
A list of Variables that represent the arguments of the function.
outputs
A list of Variables that represent the return values of the function.

An Apply instance can be created by calling gof.Apply(op, inputs, outputs).
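
As a small sketch of these conventions, using the elementwise add Op from theano.tensor:

import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')

# Each Op knows how to build its own Apply node from a list of inputs.
node = T.add.make_node(x, y)

assert node.op is T.add       # the Op being applied
assert node.inputs[0] is x    # the argument Variables
z = node.outputs[0]           # the Variable holding the result
assert z.owner is node        # which points back to its Apply node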

Op

An Op in Theano defines a certain computation on some types of inputs, producing some types of outputs. It is equivalent to a function definition in most programming languages. From a list of input Variables and an Op, you can build an Apply node representing the application of the Op to the inputs.

It is important to understand the distinction between an Op (the definition of a function) and an Apply node (the application of a function). If you were to interpret the Python language using Theano’s structures, code like def f(x): ... would produce an Op for f, whereas code like a = f(x) or g(f(4), 5) would produce an Apply node involving the f Op.

Type

A Type in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. For instance, the irow type in the theano.tensor package gives the following constraints on the data that Variables of type irow may contain:

  1. Must be an instance of numpy.ndarray: isinstance(x, numpy.ndarray)
  2. Must be an array of 32-bit integers: str(x.dtype) == 'int32'
  3. Must have a shape of 1xN: len(x.shape) == 2 and x.shape[0] == 1

Knowing these restrictions, Theano can generate C code for addition, etc., that declares the right data types and that contains the right number of loops over the dimensions.
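
As a small illustration, a Type instance can check a concrete value against these constraints via its filter method (a sketch; the exact error messages vary across Theano versions):

import numpy
import theano.tensor as T

r = T.irow('r')                             # a Variable of Theano Type irow
good = numpy.zeros((1, 5), dtype='int32')   # satisfies all three constraints
bad = numpy.zeros((2, 5), dtype='int32')    # violates constraint 3: 1xN shape

r.type.filter(good, strict=True)            # accepted: returns the array
try:
    r.type.filter(bad, strict=True)         # rejected
except TypeError as e:
    print('rejected:', e)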

Note that a Theano Type is not equivalent to a Python type or class. Indeed, in Theano, irow and dmatrix both use numpy.ndarray as the underlying type for doing computations and storing data, yet they are different Theano Types. Indeed, the constraints set by dmatrix are:

  1. Must be an instance of numpy.ndarray: isinstance(x, numpy.ndarray)
  2. Must be an array of 64-bit floating point numbers: str(x.dtype) == 'float64'
  3. Must have a shape of MxN, with no restriction on M or N: len(x.shape) == 2

These restrictions are different from those of irow which are listed above.
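
A quick way to see that they are distinct Theano Types, even though both store their data in a numpy.ndarray (a small sketch):

import theano.tensor as T

print(T.irow().type)                        # TensorType(int32, row)
print(T.dmatrix().type)                     # TensorType(float64, matrix)
print(T.irow().type == T.dmatrix().type)    # False: different constraints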

In some cases a Type can fully correspond to a Python type, such as the double Type we will define here, which corresponds to Python’s float. But, it’s good to know that this is not necessarily the case. Unless specified otherwise, when we say “Type” we mean a Theano Type.

Variable

A Variable is the main data structure you work with when using Theano. The symbolic inputs that you operate on are Variables, and what you get from applying various Ops to these inputs are also Variables. For example, when I type

>>> import theano
>>> x = theano.tensor.ivector()
>>> y = -x

x and y are both Variables, i.e. instances of the Variable class. The Type of both x and y is theano.tensor.ivector.

Unlike x, y is a Variable produced by a computation (in this case, it is the negation of x). y is the Variable corresponding to the output of the computation, while x is the Variable corresponding to its input. The computation itself is represented by another type of node, an Apply node, and may be accessed through y.owner.

More specifically, a Variable is a basic structure in Theano that represents a datum at a certain point in computation. It is typically an instance of the class Variable or one of its subclasses.

A Variable r contains four important fields:

type
A Type defining the kind of value this Variable can hold in computation.
owner
This is either None or an Apply node of which the Variable is an output.
index
The integer such that owner.outputs[index] is r (ignored if owner is None).
name
A string to use in pretty-printing and debugging.
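
Continuing the example above, these fields can be inspected directly (a small sketch):

import theano

x = theano.tensor.ivector('x')
y = -x

print(y.type)    # the Type: TensorType(int32, vector)
print(y.owner)   # the Apply node whose computation produced y
print(y.index)   # 0, since y.owner.outputs[0] is y
print(x.owner)   # None: x is an input, not the result of a computation
print(x.name)    # 'x', the name given for pretty-printing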

Variable has one special subclass: Constant.

Constant

A Constant is a Variable with one extra field, data (only settable once). When used in a computation graph as the input of an Op application, it is assumed that said input will always take the value contained in the constant’s data field. Furthermore, it is assumed that the Op will not under any circumstances modify the input. This means that a constant is eligible to participate in numerous optimizations: constant inlining in C code, constant folding, etc.

A constant does not need to be specified in a function’s list of inputs. In fact, doing so will raise an exception.
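
For instance (a minimal sketch):

import theano
import theano.tensor as T

c = T.constant(2.0)        # a Constant; its data field is set once
x = T.dscalar('x')
print(c.data)              # the wrapped value, array(2.0)

# The constant participates in the graph but is never listed as an input.
f = theano.function([x], x + c)
print(f(3.0))              # 5.0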

Graph Structures Extension

When we start the compilation of a Theano function, we compute some extra information. This section describes a portion of the information that is made available. Not everything is described here, so email theano-dev if you need something that is missing.

The graph is cloned at the start of compilation, so that modifications done during compilation do not affect the user graph.

Each variable receives a new field called clients. It is a list with references to every place in the graph where this variable is used. If its length is 0, it means the variable is not used. Each place where it is used is described by a tuple of 2 elements. There are two types of pairs:

  • The first element is an Apply node.
  • The first element is the string “output”. It means the function outputs this variable.

In both types of pairs, the second element of the tuple is an index, such that: var.clients[*][0].inputs[index] or fgraph.outputs[index] is that variable.

>>> import theano
>>> v = theano.tensor.vector()
>>> f = theano.function([v], (v+1).sum())
>>> theano.printing.debugprint(f)
Sum{acc_dtype=float64} [id A] ''   1
 |Elemwise{add,no_inplace} [id B] ''   0
   |TensorConstant{(1,) of 1.0} [id C]
   |<TensorType(float64, vector)> [id D]
>>> # Sorted list of all nodes in the compiled graph.
>>> topo = f.maker.fgraph.toposort()
>>> topo[0].outputs[0].clients
[(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)]
>>> topo[1].outputs[0].clients
[('output', 0)]
>>> # An internal variable
>>> var = topo[0].outputs[0]
>>> client = var.clients[0]
>>> client
(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)
>>> type(client[0])
<class 'theano.gof.graph.Apply'>
>>> assert client[0].inputs[client[1]] is var
>>> # An output of the graph
>>> var = topo[1].outputs[0]
>>> client = var.clients[0]
>>> client
('output', 0)
>>> assert f.maker.fgraph.outputs[client[1]] is var

Automatic Differentiation

Having the graph structure, computing automatic differentiation is simple. The only thing tensor.grad() has to do is to traverse the graph from the outputs back towards the inputs through all the apply nodes (apply nodes are those that define which computations the graph does). For each such apply node, its op defines how to compute the gradient of the node’s outputs with respect to its inputs. Note that if an op does not provide this information, it is assumed that the gradient is not defined. Using the chain rule, these gradients can be composed in order to obtain the expression of the gradient of the graph’s output with respect to the graph’s inputs.
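
As a short sketch of this in action:

import theano
import theano.tensor as T

x = T.dscalar('x')
y = x ** 2

# tensor.grad walks the graph from y back to x, asking each Op along the
# way for the gradient of its outputs with respect to its inputs and
# composing the results with the chain rule.
gy = T.grad(y, x)
f = theano.function([x], gy)
print(f(4.0))   # 8.0, i.e. 2*x evaluated at x = 4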

A later section of this tutorial explores the topic of differentiation in more detail.

Optimizations

When compiling a Theano function, what you give to theano.function is actually a graph (starting from the output variables you can traverse the graph up to the input variables). While this graph structure shows how to compute the output from the input, it also offers the possibility to improve the way this computation is carried out. The way optimizations work in Theano is by identifying and replacing certain patterns in the graph with other specialized patterns that produce the same results but are either faster or more stable. Optimizations can also detect identical subgraphs and ensure that the same values are not computed twice, or reformulate parts of the graph to a GPU-specific version.

For example, one (simple) optimization that Theano uses is to replace the pattern $\frac{xy}{y}$ with $x$.
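
A sketch you can run to see this (the exact debugprint output depends on the Theano version and optimizer settings):

import theano
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')

# Under the default optimizer, the x*y/y pattern is simplified away, so
# the compiled graph no longer contains the multiplication and division.
f = theano.function([x, y], (x * y) / y)
theano.printing.debugprint(f)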

Further information regarding the optimization process and the specific optimizations that are applicable is respectively available in the library and on the entrance page of the documentation.

Example

Symbolic programming involves a change of paradigm: it will become clearer as we apply it. Consider the following example of optimization:

>>> import theano
>>> a = theano.tensor.vector("a")      # declare symbolic variable
>>> b = a + a ** 10                    # build symbolic expression
>>> f = theano.function([a], b)        # compile function
>>> print(f([0, 1, 2]))                # prints `array([0,2,1026])`
[    0.     2.  1026.]
>>> theano.printing.pydotprint(b, outfile="./pics/symbolic_graph_unopt.png", var_with_name_simple=True)  
The output file is available at ./pics/symbolic_graph_unopt.png
>>> theano.printing.pydotprint(f, outfile="./pics/symbolic_graph_opt.png", var_with_name_simple=True)  
The output file is available at ./pics/symbolic_graph_opt.png

We used theano.printing.pydotprint() to visualize the optimized graph (right), which is much more compact than the unoptimized graph (left).

[Images: symbolic_graph_unopt.png (unoptimized graph, left) and symbolic_graph_opt.png (optimized graph, right)]