Graph Structures¶
Debugging or profiling code written in Theano is not straightforward if you do not know what goes on under the hood. This chapter is meant to introduce you to the minimum you need to know about Theano's internals.
The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables). When writing down these expressions you use operations like +, -, **, sum(), tanh(). All these are represented internally as ops. An op represents a certain computation on some types of inputs, producing some types of outputs. You can see it as a function definition in most programming languages.
Theano represents symbolic mathematical computations as graphs. These graphs are composed of interconnected Apply, Variable and Op nodes. An Apply node represents the application of an op to some variables. It is important to draw the difference between the definition of a computation represented by an op and its application to some actual data, represented by an apply node. Furthermore, data types are represented by Type instances. A short piece of code and a diagram of the structure it builds are given below. This should help you understand how these pieces fit together:
Code
import theano.tensor as T
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
Diagram
Arrows represent references to the Python objects pointed at. The blue box is an Apply node. Red boxes are Variable nodes. Green circles are Ops. Purple boxes are Types.
When we create Variables and then Apply nodes to them to produce more Variables, we build a bi-partite, directed, acyclic graph. Variables point to the Apply nodes representing the function application producing them via their owner field. These Apply nodes point in turn to their input and output Variables via their inputs and outputs fields. (Apply instances also contain a list of references to their outputs, but those pointers don't count in this graph.)
The owner field of both x and y points to None because they are not the result of another computation. If one of them were the result of another computation, its owner field would point to another blue box like z does, and so on.
Note that the Apply instance's outputs points to z, and z.owner points back to the Apply instance.
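These owner/inputs/outputs links can be sketched with a few toy Python classes. This is purely illustrative: the class names mirror Theano's, but the bodies are minimal stand-ins, not Theano's actual implementations:

```python
# Toy sketch (NOT Theano's real classes) of how Variable, Apply and Op
# nodes reference each other: a Variable points to the Apply node that
# produced it via `owner`, and an Apply node points to its input and
# output Variables via `inputs` and `outputs`.

class Op:
    def __init__(self, name):
        self.name = name

class Variable:
    def __init__(self, name=None):
        self.name = name
        self.owner = None   # None for graph inputs
        self.index = None

class Apply:
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs
        out = Variable()
        out.owner = self    # the output points back to this Apply node
        out.index = 0
        self.outputs = [out]

add = Op('add')
x = Variable('x')
y = Variable('y')
node = Apply(add, [x, y])   # models z = x + y
z = node.outputs[0]

print(x.owner is None)       # True: inputs have no owner
print(z.owner is node)       # True: z points back to the Apply node
print(node.inputs == [x, y]) # True: the Apply node points to its inputs
```
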
Traversing the graph¶
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field. Take for example the following code:
>>> import theano
>>> x = theano.tensor.dmatrix('x')
>>> y = x * 2.
If you type type(y.owner), you get <class 'theano.gof.graph.Apply'>, which is the apply node that connects the op and the inputs to get this output. You can now print the name of the op that is applied to get y:
>>> y.owner.op.name
'Elemwise{mul,no_inplace}'
Hence, an elementwise multiplication is used to compute y. This multiplication is done between the inputs:
>>> len(y.owner.inputs)
2
>>> y.owner.inputs[0]
x
>>> y.owner.inputs[1]
InplaceDimShuffle{x,x}.0
Note that the second input is not 2 as we would have expected. This is because 2 was first broadcasted to a matrix of the same shape as x. This is done by using the op DimShuffle:
>>> type(y.owner.inputs[1])
<class 'theano.tensor.var.TensorVariable'>
>>> type(y.owner.inputs[1].owner)
<class 'theano.gof.graph.Apply'>
>>> y.owner.inputs[1].owner.op
<theano.tensor.elemwise.DimShuffle object at 0x106fcaf10>
>>> y.owner.inputs[1].owner.inputs
[TensorConstant{2.0}]
Starting from this graph structure, it is easier to understand how automatic differentiation proceeds and how the symbolic relations can be optimized for performance or stability.
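The owner-based walk shown above generalizes to a simple recursive traversal. Here is a hedged pure-Python sketch, using toy node classes as stand-ins for Theano's, that collects every ownerless input variable reachable from an output:

```python
# Toy sketch of traversing a graph from an output back to its inputs via
# the `owner` field (illustrative stand-ins, not Theano's classes).

class Variable:
    def __init__(self, name=None):
        self.name = name
        self.owner = None   # None means this is a graph input

class Apply:
    def __init__(self, op_name, inputs):
        self.op_name = op_name
        self.inputs = inputs
        self.outputs = [Variable()]
        self.outputs[0].owner = self

def graph_inputs(var):
    """Recursively collect the ownerless Variables feeding `var`."""
    if var.owner is None:
        return [var]
    found = []
    for inp in var.owner.inputs:
        found.extend(graph_inputs(inp))
    return found

x = Variable('x')
two = Variable('2')
y = Apply('mul', [x, two]).outputs[0]     # models y = x * 2
s = Apply('neg', [y]).outputs[0]          # models s = -y

print([v.name for v in graph_inputs(s)])  # ['x', '2']
```
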
Graph Structures¶
The following section outlines each type of structure that may be used in a Theano-built computation graph: Apply, Constant, Op, Variable and Type.
Apply¶
An Apply node is a type of internal node used to represent a computation graph in Theano. Unlike Variable nodes, Apply nodes are usually not manipulated directly by the end user. They may be accessed via a Variable's owner field.
An Apply node is typically an instance of the Apply class. It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs. Therefore, an Apply node may be obtained from an Op and a list of inputs by calling Op.make_node(*inputs).
Comparing with the Python language, an Apply node is Theano’s version of a function call whereas an Op is Theano’s version of a function definition.
An Apply instance has three important fields:
- op: an Op that determines the function/transformation being applied here.
- inputs: a list of Variables that represent the arguments of the function.
- outputs: a list of Variables that represent the return values of the function.
An Apply instance can be created by calling gof.Apply(op, inputs, outputs).
Op¶
An Op in Theano defines a certain computation on some types of inputs, producing some types of outputs. It is equivalent to a function definition in most programming languages. From a list of input Variables and an Op, you can build an Apply node representing the application of the Op to the inputs.
It is important to understand the distinction between an Op (the definition of a function) and an Apply node (the application of a function). If you were to interpret the Python language using Theano's structures, code like def f(x): ... would produce an Op for f, whereas code like a = f(x) or g(f(4), 5) would produce an Apply node involving the f Op.
Type¶
A Type in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. For instance, the irow type in the theano.tensor package gives the following constraints on the data that Variables of type irow may contain:
- Must be an instance of numpy.ndarray: isinstance(x, numpy.ndarray)
- Must be an array of 32-bit integers: str(x.dtype) == 'int32'
- Must have a shape of 1xN: len(x.shape) == 2 and x.shape[0] == 1
Knowing these restrictions, Theano can generate C code for addition, etc. that declares the right data types and contains the right number of loops over the dimensions.
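The irow constraints listed above can be checked directly in plain Python. This is a sketch assuming numpy is installed; the check_irow helper is hypothetical, written here for illustration, and is not part of Theano's API:

```python
import numpy

def check_irow(x):
    """Return True iff x satisfies the irow constraints listed above
    (hypothetical helper, not part of Theano)."""
    return (isinstance(x, numpy.ndarray)
            and str(x.dtype) == 'int32'         # 32-bit integers
            and len(x.shape) == 2               # two dimensions
            and x.shape[0] == 1)                # shape 1xN

print(check_irow(numpy.zeros((1, 5), dtype='int32')))    # True: 1xN int32
print(check_irow(numpy.zeros((2, 5), dtype='int32')))    # False: first dim != 1
print(check_irow(numpy.zeros((1, 5), dtype='float64')))  # False: wrong dtype
```
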
Note that a Theano Type is not equivalent to a Python type or class. Indeed, in Theano, irow and dmatrix both use numpy.ndarray as the underlying type for doing computations and storing data, yet they are different Theano Types. Indeed, the constraints set by dmatrix are:
- Must be an instance of numpy.ndarray: isinstance(x, numpy.ndarray)
- Must be an array of 64-bit floating point numbers: str(x.dtype) == 'float64'
- Must have a shape of MxN, with no restriction on M or N: len(x.shape) == 2
These restrictions are different from those of irow, which are listed above.
In some cases a Type can fully correspond to a Python type, such as the double Type we will define here, which corresponds to Python's float type. But it is good to know that this is not always the case. Unless specified otherwise, when we say "Type" we mean a Theano Type.
Variable¶
A Variable is the main data structure you work with when using Theano. The symbolic inputs that you operate on are Variables, and what you get from applying various Ops to these inputs are also Variables. For example, when I type
>>> import theano
>>> x = theano.tensor.ivector()
>>> y = -x
x and y are both Variables, i.e. instances of the Variable class. The Type of both x and y is theano.tensor.ivector.
Unlike x, y is a Variable produced by a computation (in this case, it is the negation of x). y is the Variable corresponding to the output of the computation, while x is the Variable corresponding to its input. The computation itself is represented by another type of node, an Apply node, and may be accessed through y.owner.
More specifically, a Variable is a basic structure in Theano that represents a datum at a certain point in computation. It is typically an instance of the class Variable or one of its subclasses.
A Variable r contains four important fields:
- type: a Type defining the kind of value this Variable can hold in computation.
- owner: either None, or an Apply node of which the Variable is an output.
- index: an integer such that owner.outputs[index] is r (ignored if owner is None).
- name: a string to use in pretty-printing and debugging.
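The owner and index fields obey a simple invariant: whenever a Variable r has an owner, owner.outputs[r.index] is r. A toy sketch with minimal stand-in classes (not Theano's actual implementation) makes this concrete, including the multi-output case:

```python
# Toy sketch (not Theano's classes) of the invariant linking owner and
# index: for any Variable r with an owner, r.owner.outputs[r.index] is r.

class Variable:
    def __init__(self):
        self.owner = None
        self.index = None

class Apply:
    def __init__(self, n_outputs):
        self.outputs = []
        for i in range(n_outputs):
            v = Variable()
            v.owner, v.index = self, i   # each output records its position
            self.outputs.append(v)

node = Apply(n_outputs=2)   # e.g. an op returning two results
for r in node.outputs:
    print(r.owner.outputs[r.index] is r)  # True for every output
```
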
Variable has one special subclass: Constant.
Constant¶
A Constant is a Variable with one extra field, data (it can only be set once). When used in a computation graph as the input of an Op application, it is assumed that said input will always take the value contained in the constant's data field. Furthermore, it is assumed that the Op will not under any circumstances modify the input. This means that a constant is eligible to participate in numerous optimizations: constant inlining in C code, constant folding, etc.
A constant does not need to be specified in a function's list of inputs. In fact, doing so will raise an exception.
Graph Structures Extension¶
When we start the compilation of a Theano function, we compute some extra information. This section describes a portion of the information that is made available. Not everything is described, so email theano-dev if you need something that is missing.
The graph is cloned at the start of compilation, so modifications made during compilation do not affect the user's graph.
Each variable receives a new field called clients. It is a list with references to every place in the graph where this variable is used. If its length is 0, the variable is unused. Each place where it is used is described by a tuple of 2 elements. There are two types of pairs:
- The first element is an Apply node.
- The first element is the string “output”. It means the function outputs this variable.
In both types of pairs, the second element of the tuple is an index, such that var.clients[*][0].inputs[index] or fgraph.outputs[index] is that variable.
>>> import theano
>>> v = theano.tensor.vector()
>>> f = theano.function([v], (v+1).sum())
>>> theano.printing.debugprint(f)
Sum{acc_dtype=float64} [id A] '' 1
|Elemwise{add,no_inplace} [id B] '' 0
|TensorConstant{(1,) of 1.0} [id C]
|<TensorType(float64, vector)> [id D]
>>> # Sorted list of all nodes in the compiled graph.
>>> topo = f.maker.fgraph.toposort()
>>> topo[0].outputs[0].clients
[(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)]
>>> topo[1].outputs[0].clients
[('output', 0)]
>>> # An internal variable
>>> var = topo[0].outputs[0]
>>> client = var.clients[0]
>>> client
(Sum{acc_dtype=float64}(Elemwise{add,no_inplace}.0), 0)
>>> type(client[0])
<class 'theano.gof.graph.Apply'>
>>> assert client[0].inputs[client[1]] is var
>>> # An output of the graph
>>> var = topo[1].outputs[0]
>>> client = var.clients[0]
>>> client
('output', 0)
>>> assert f.maker.fgraph.outputs[client[1]] is var
Automatic Differentiation¶
Having the graph structure, computing automatic differentiation is simple. The only thing tensor.grad() has to do is to traverse the graph from the outputs back towards the inputs through all apply nodes (apply nodes are those that define which computations the graph does). For each such apply node, its op defines how to compute the gradient of the node's outputs with respect to its inputs. Note that if an op does not provide this information, it is assumed that the gradient is not defined. Using the chain rule, these gradients can be composed in order to obtain the expression of the gradient of the graph's output with respect to the graph's inputs.
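This traversal-plus-chain-rule idea can be sketched in a few lines of pure Python. The following is a toy scalar reverse-mode example under simplifying assumptions (only add and mul ops, scalar values); it is not Theano's tensor.grad implementation:

```python
# Toy chain-rule sketch: walk back from the output through each apply
# node, let each op report the local gradient of its output w.r.t. each
# input, and accumulate the products along every path to the target.

class Var:
    def __init__(self, value, owner=None):
        self.value = value
        self.owner = owner   # (op_name, inputs) or None for graph inputs

def add(a, b):
    return Var(a.value + b.value, ('add', [a, b]))

def mul(a, b):
    return Var(a.value * b.value, ('mul', [a, b]))

def grad(output, wrt, upstream=1.0):
    """Gradient of `output` w.r.t. `wrt` via the chain rule."""
    if output is wrt:
        return upstream
    if output.owner is None:
        return 0.0
    op, inputs = output.owner
    total = 0.0
    for i, inp in enumerate(inputs):
        if op == 'add':
            local = 1.0                    # d(a+b)/da = d(a+b)/db = 1
        elif op == 'mul':
            local = inputs[1 - i].value    # d(a*b)/da = b, d(a*b)/db = a
        total += grad(inp, wrt, upstream * local)
    return total

x = Var(3.0)
y = add(mul(x, x), x)   # y = x*x + x
print(grad(y, x))       # dy/dx = 2x + 1 = 7.0
```
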
A later section of this tutorial examines the topic of differentiation in greater detail.
Optimizations¶
When compiling a Theano function, what you give to theano.function is actually a graph (starting from the output variables you can traverse the graph up to the input variables). While this graph structure shows how to compute the output from the input, it also offers the possibility to improve the way this computation is carried out. Optimizations in Theano work by identifying and replacing certain patterns in the graph with other, specialized patterns that produce the same results but are either faster or more stable. Optimizations can also detect identical subgraphs to ensure that the same values are not computed twice, or reformulate parts of the graph into a GPU-specific version.
For instance, one (simple) optimization that Theano uses is to replace the pattern xy/y by x.
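A rewrite of this kind, x*y/y into x, can be sketched as a small pattern match over toy nested-tuple expressions. This is illustrative only: Theano's optimizer works on Apply/Variable graphs with a full rewrite framework, not on tuples:

```python
# Toy sketch of the x*y/y -> x rewrite on nested-tuple expressions,
# e.g. ('div', ('mul', 'x', 'y'), 'y'). Not Theano's actual optimizer.

def simplify(expr):
    if not isinstance(expr, tuple):
        return expr                       # a leaf variable, nothing to do
    op, *args = expr
    args = [simplify(a) for a in args]    # simplify subexpressions first
    # pattern: (a * b) / b  ->  a   (and (a * b) / a -> b)
    if op == 'div' and isinstance(args[0], tuple) and args[0][0] == 'mul':
        _, a, b = args[0]
        if b == args[1]:
            return a
        if a == args[1]:
            return b
    return (op, *args)

print(simplify(('div', ('mul', 'x', 'y'), 'y')))             # 'x'
print(simplify(('add', ('div', ('mul', 'x', 'y'), 'y'), 'z')))  # ('add', 'x', 'z')
```
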
Further information regarding the optimization process and the specific optimizations that are applicable is respectively available in the library and on the entrance page of the documentation.
Example
Symbolic programming involves a change of paradigm: it will become clearer as we apply it. Consider the following example of optimization:
>>> import theano
>>> a = theano.tensor.vector("a") # declare symbolic variable
>>> b = a + a ** 10 # build symbolic expression
>>> f = theano.function([a], b) # compile function
>>> print(f([0, 1, 2]))                # evaluate the function
[ 0. 2. 1026.]
>>> theano.printing.pydotprint(b, outfile="./pics/symbolic_graph_unopt.png", var_with_name_simple=True)
The output file is available at ./pics/symbolic_graph_unopt.png
>>> theano.printing.pydotprint(f, outfile="./pics/symbolic_graph_opt.png", var_with_name_simple=True)
The output file is available at ./pics/symbolic_graph_opt.png
We used theano.printing.pydotprint() to visualize the optimized graph (right), which is much more compact than the unoptimized graph (left).
Unoptimized graph | Optimized graph
---|---