(experimental) Introduction to Named Tensors in PyTorch

Author: Richard Zou

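Named tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check at runtime that APIs are being used correctly, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".

This tutorial is intended as a guide to the functionality that will be included with the 1.3 release. By the end of it, you will be able to: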

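- Create tensors with named dimensions, as well as remove or rename those dimensions
- Understand the basics of how operations propagate dimension names
- See how naming dimensions enables clearer code in two key areas:
  - Broadcasting operations
  - Flattening and unflattening dimensions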

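Finally, we'll put this into practice by writing a multi-head attention module using named tensors.

Named tensors in PyTorch are inspired by and done in collaboration with Sasha Rush. Sasha proposed the original idea and proof of concept in his January 2019 blog post.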

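Basics: named dimensions

PyTorch now allows tensors to have named dimensions; factory functions take a new names argument that associates a name with each dimension. This works with most factory functions, such as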

- tensor
- empty
- ones
- zeros
- randn
- rand

Here we construct a tensor with names:

import torch
imgs = torch.randn(1, 2, 2, 3, names=('N', 'C', 'H', 'W'))
print(imgs.names)

Output:

('N', 'C', 'H', 'W')

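Unlike in the original named tensors blog post, named dimensions are ordered: tensor.names[i] is the name of the i-th dimension of tensor.

There are two ways to rename a tensor's dimensions: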

# Method #1: set the .names attribute (this changes name in-place)
imgs.names = ['batch', 'channel', 'width', 'height']
print(imgs.names)

# Method #2: specify new names (this changes names out-of-place)
imgs = imgs.rename(channel='C', width='W', height='H')
print(imgs.names)

Output:

('batch', 'channel', 'width', 'height')
('batch', 'C', 'W', 'H')

The preferred way to remove names is to call tensor.rename(None):

imgs = imgs.rename(None)
print(imgs.names)

Output:

(None, None, None, None)

Unnamed tensors (tensors with no named dimensions) still work as normal and do not show names in their repr.

unnamed = torch.randn(2, 1, 3)
print(unnamed)
print(unnamed.names)

Output:

tensor([[[ 1.2758, -0.2081,  0.7520]],

        [[ 0.3952,  1.4589, -0.5329]]])
(None, None, None)

Named tensors do not require that all dimensions be named.

imgs = torch.randn(3, 1, 1, 2, names=('N', None, None, None))
print(imgs.names)

Output:

('N', None, None, None)

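Because named tensors can coexist with unnamed tensors, we need a nice way to write named-tensor-aware code that works with both named and unnamed tensors. Use tensor.refine_names(*names) to refine dimensions and lift unnamed dims to named dims. Refining a dimension is defined as a "rename" with the following constraints:

- A None dim can be refined to have any name.
- A named dim can only be refined to have the same name.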

imgs = torch.randn(3, 1, 1, 2)
named_imgs = imgs.refine_names('N', 'C', 'H', 'W')
print(named_imgs.names)

# Refine the last two dims to 'H' and 'W'. In Python 2, use the string '...'
# instead of ...
named_imgs = imgs.refine_names(..., 'H', 'W')
print(named_imgs.names)


def catch_error(fn):
    try:
        fn()
        assert False
    except RuntimeError as err:
        err = str(err)
        if len(err) > 180:
            err = err[:180] + "..."
        print(err)


named_imgs = imgs.refine_names('N', 'C', 'H', 'W')

# Tried to refine an existing name to a different name
catch_error(lambda: named_imgs.refine_names('N', 'C', 'H', 'width'))

Output:

('N', 'C', 'H', 'W')
(None, None, 'H', 'W')
refine_names: cannot coerce Tensor['N', 'C', 'H', 'W'] to Tensor['N', 'C', 'H', 'width'] because 'W' is different from 'width' at index 3
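
The two refinement constraints can be modelled in a few lines of plain Python. The sketch below works on tuples of names rather than on tensors; the function is a simplified illustration written for this tutorial, not PyTorch's implementation, and it raises ValueError where PyTorch raises RuntimeError.

```python
def refine_names(current, *requested):
    """Model the refinement rule on a tuple of names (None means unnamed)."""
    requested = list(requested)
    if Ellipsis in requested:
        # '...' stands for the dims whose current names are kept unchanged.
        i = requested.index(Ellipsis)
        n_kept = len(current) - (len(requested) - 1)
        if n_kept < 0:
            raise ValueError("more names requested than dimensions")
        requested[i:i + 1] = current[i:i + n_kept]
    if len(requested) != len(current):
        raise ValueError("number of names must equal number of dimensions")
    for index, (old, new) in enumerate(zip(current, requested)):
        # A named dim may only keep its name; a None dim may take any name.
        if old is not None and old != new:
            raise ValueError(
                f"cannot refine {old!r} to {new!r} at index {index}")
    return tuple(requested)


unnamed = (None, None, None, None)
print(refine_names(unnamed, 'N', 'C', 'H', 'W'))  # ('N', 'C', 'H', 'W')
print(refine_names(unnamed, ..., 'H', 'W'))       # (None, None, 'H', 'W')
```

Calling refine_names(('N', 'C', 'H', 'W'), 'N', 'C', 'H', 'width') on this model raises, mirroring the error shown in the output above.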

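Most simple operations propagate names. The ultimate goal for named tensors is for all operations to propagate names in a reasonable, intuitive manner. Support for many common operations has been added at the time of the 1.3 release; here, for example, is .abs():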

print(named_imgs.abs().names)

Output:

('N', 'C', 'H', 'W')

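Accessors and reduction

One can use dimension names to refer to dimensions instead of their positions. These operations also propagate names. Indexing (basic and advanced) has not been implemented yet but is on the roadmap. Using the named_imgs tensor from above, we can do: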

output = named_imgs.sum('C')  # Perform a sum over the channel dimension
print(output.names)

img0 = named_imgs.select('N', 0)  # get one image
print(img0.names)

Output:

('N', 'H', 'W')
('C', 'H', 'W')
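
As a rough mental model, a dimension name is resolved to its position, and an operation that removes that dimension also drops its name from the output. The plain-Python sketch below illustrates this on tuples of names; resolve_dim and names_after_removing are hypothetical helpers for illustration, not PyTorch functions.

```python
def resolve_dim(names, dim):
    """Return the position of `dim`, which may be a name or an integer index."""
    if isinstance(dim, str):
        if dim not in names:
            raise ValueError(f"name {dim!r} not found in {names}")
        return names.index(dim)
    # Integer dims behave as usual; negative values count from the end.
    return dim % len(names)


def names_after_removing(names, dim):
    """Names left after a reduction or select removes dimension `dim`."""
    position = resolve_dim(names, dim)
    return names[:position] + names[position + 1:]


names = ('N', 'C', 'H', 'W')
print(resolve_dim(names, 'C'))           # 1
print(names_after_removing(names, 'C'))  # ('N', 'H', 'W'), as for sum('C')
print(names_after_removing(names, 'N'))  # ('C', 'H', 'W'), as for select('N', 0)
```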

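Name inference

Names are propagated on operations in a two-step process called name inference:

1. Check names: an operator may perform automatic checks at runtime that certain dimension names match.
2. Propagate names: name inference propagates output names to output tensors.

Let's go through the very small example of adding two one-dimensional tensors with no broadcasting.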

x = torch.randn(3, names=('X',))
y = torch.randn(3)
z = torch.randn(3, names=('Z',))

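Check names: first, we check whether the names of the two tensors match. Two names match if and only if they are equal (string equality) or at least one of them is None (None is essentially a special wildcard name). Of the three possible pairwise sums (x + y, y + z, and x + z), the only one that will error is therefore x + z: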

catch_error(lambda: x + z)

Output:

Error when attempting to broadcast dims ['X'] and dims ['Z']: dim 'X' and dim 'Z' are at the same position from the right but do not match.

Propagate names: unify the two names by returning the more refined of the two. With x + y, X is more refined than None.

print((x + y).names)

Output:

('X',)
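
The check and propagate steps for a single pair of names can be written down directly. This is a minimal plain-Python sketch of the rule as stated above, operating on names only; names_match and unify are illustrative helpers, not PyTorch functions.

```python
def names_match(a, b):
    """Two names match if they are equal or at least one of them is None."""
    return a is None or b is None or a == b


def unify(a, b):
    """Check that two names match, then return the more refined one."""
    if not names_match(a, b):
        raise ValueError(f"names {a!r} and {b!r} do not match")
    # A real name is more refined than the wildcard None.
    return b if a is None else a


x, y, z = 'X', None, 'Z'   # names of the three tensors above
print(unify(x, y))  # X
print(unify(y, z))  # Z
print(names_match(x, z))  # False, so x + z is rejected
```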

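Most name inference rules are straightforward, but some of them can have unexpected semantics. Let's go through a couple you're likely to encounter: broadcasting and matrix multiply.

Broadcasting

Named tensors do not change broadcasting behavior; they still broadcast by position. However, when checking whether two dimensions can be broadcast, PyTorch also checks that the names of those dimensions match.

As a result, named tensors prevent unintended alignment during operations that broadcast. In the example below, we apply a per_batch_scale to imgs.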

imgs = torch.randn(2, 2, 2, 2, names=('N', 'C', 'H', 'W'))
per_batch_scale = torch.rand(2, names=('N',))
catch_error(lambda: imgs * per_batch_scale)

Output:

Error when attempting to broadcast dims ['N', 'C', 'H', 'W'] and dims ['N']: dim 'W' and dim 'N' are at the same position from the right but do not match.

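Without names, the per_batch_scale tensor is aligned with the last dimension of imgs, which is not what we intended. We really wanted to perform the operation by aligning per_batch_scale with the batch dimension of imgs. See the new "explicit broadcasting by names" functionality, covered below, for how to align tensors by name.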
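
To see why the names catch this mistake, it helps to line the two name tuples up from the right, the same way broadcasting lines up sizes. The sketch below is a simplified plain-Python model of that check; it treats a missing dimension like an unnamed one and leaves out any further checks PyTorch may perform.

```python
from itertools import zip_longest


def broadcast_names(a, b):
    """Unify two tuples of names, aligned from the last dimension backwards."""
    unified = []
    # A dimension missing on the shorter side is filled with None (unnamed).
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=None):
        if x is not None and y is not None and x != y:
            raise ValueError(
                f"dim {x!r} and dim {y!r} are at the same position "
                "from the right but do not match")
        unified.append(y if x is None else x)
    return tuple(reversed(unified))


imgs_names = ('N', 'C', 'H', 'W')
# An unnamed scale silently lines up with 'W', the last dimension.
print(broadcast_names(imgs_names, (None,)))  # ('N', 'C', 'H', 'W')
# A scale named 'N' also lines up with 'W', and the mismatch is reported.
try:
    broadcast_names(imgs_names, ('N',))
except ValueError as err:
    print(err)
```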

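Matrix multiply

torch.mm(A, B) performs a dot product between the second dim of A and the first dim of B, returning a tensor with the first dim of A and the second dim of B. (Other matmul functions, such as torch.matmul, torch.mv, and torch.dot, behave similarly.)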

markov_states = torch.randn(128, 5, names=('batch', 'D'))
transition_matrix = torch.randn(5, 5, names=('in', 'out'))

# Apply one transition
new_state = markov_states @ transition_matrix
print(new_state.names)

Output:

('batch', 'out')

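As you can see, matrix multiply does not check whether the contracted dimensions have the same name: here 'D' and 'in' differ, yet no error is raised.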
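
The name rule for a 2-D matrix product is small enough to model on name tuples. In the plain-Python sketch below, the check_contracted flag is a hypothetical extra added for illustration (it is not a PyTorch option); it shows what a stricter rule would reject.

```python
def mm_names(a_names, b_names, check_contracted=False):
    """Output names of a 2-D matrix product, modelled on tuples of names."""
    (a_rows, a_cols), (b_rows, b_cols) = a_names, b_names
    # Hypothetical stricter mode: require the contracted names to match.
    if check_contracted and None not in (a_cols, b_rows) and a_cols != b_rows:
        raise ValueError(
            f"contracted dims {a_cols!r} and {b_rows!r} do not match")
    # The contracted names are dropped; the outer names are kept.
    return (a_rows, b_cols)


print(mm_names(('batch', 'D'), ('in', 'out')))  # ('batch', 'out')
try:
    mm_names(('batch', 'D'), ('in', 'out'), check_contracted=True)
except ValueError as err:
    print(err)
```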

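Next, we'll cover two new behaviors that named tensors enable: explicit broadcasting by names, and flattening and unflattening dimensions by names.

New behavior: Explicit broadcasting by names

One of the main complaints about working with multiple dimensions is the need to unsqueeze "dummy" dimensions so that operations can occur. For example, in our per-batch-scale example before, with unnamed tensors we'd do the following: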

imgs = torch.randn(2, 2, 2, 2)  # N, C, H, W
per_batch_scale = torch.rand(2)  # N

correct_result = imgs * per_batch_scale.view(2, 1, 1, 1)  # N, C, H, W
incorrect_result = imgs * per_batch_scale.expand_as(imgs)
assert not torch.allclose(correct_result, incorrect_result)

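We can make these operations safer (and easily agnostic to the number of dimensions) by using names. We provide a new tensor.align_as(other) operation that permutes the dimensions of tensor to match the order specified in other.names, adding one-sized dimensions where appropriate (tensor.align_to(*names) works as well):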

imgs = imgs.refine_names('N', 'C', 'H', 'W')
per_batch_scale = per_batch_scale.refine_names('N')

named_result = imgs * per_batch_scale.align_as(imgs)
# note: named tensors do not yet work with allclose
assert torch.allclose(named_result.rename(None), correct_result)
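
The effect of aligning on a tensor's shape can be traced with a small plain-Python model: dims that exist in the source keep their size and move to the position of their name, and every other target name becomes a dim of size one. The align_to function below is a sketch on tuples that assumes all names are non-None; it is not PyTorch's implementation.

```python
def align_to(names, shape, target_names):
    """Return the shape after aligning named dims to `target_names`."""
    missing = [name for name in names if name not in target_names]
    if missing:
        raise ValueError(f"dims {missing} do not appear in {target_names}")
    size_of = dict(zip(names, shape))
    # Dims present in the source keep their size; new dims get size 1.
    return tuple(size_of.get(name, 1) for name in target_names)


target = ('N', 'C', 'H', 'W')
# per_batch_scale has names ('N',) and shape (2,), as in the example above.
print(align_to(('N',), (2,), target))         # (2, 1, 1, 1)
# Dims are also permuted into the target order.
print(align_to(('C', 'N'), (3, 2), target))   # (2, 3, 1, 1)
```

The first result matches the shape passed to view(2, 1, 1, 1) in the unnamed version, without hard-coding how many dimensions imgs has.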

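New behavior: Flattening and unflattening dimensions by names

One common operation is flattening and unflattening dimensions. Right now, users perform this using view, reshape, or flatten; use cases include flattening batch dimensions to send tensors into operators that must take inputs with a certain number of dimensions (e.g., conv2d takes 4D input).

To make these operations more semantically meaningful than view or reshape, we introduce a new tensor.unflatten(dim, namedshape) method and update flatten to work with names: tensor.flatten(dims, new_dim).

flatten can only flatten adjacent dimensions but also works on non-contiguous dims. One must pass into unflatten a named shape, which is a list of (dim, size) tuples, to specify how to unflatten the dim. It is possible to save the sizes during a flatten for use in a later unflatten, but we do not yet do that.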

imgs = imgs.flatten(['C', 'H', 'W'], 'features')
print(imgs.names)

imgs = imgs.unflatten('features', (('C', 2), ('H', 2), ('W', 2)))
print(imgs.names)

Output:

('N', 'features')
('N', 'C', 'H', 'W')
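
The bookkeeping behind these two calls can be sketched on a named shape, that is, a list of (name, size) pairs. The flatten and unflatten functions below are plain-Python illustrations of the adjacency and size constraints described above, not PyTorch's implementation.

```python
from math import prod


def flatten(named_shape, dims, new_name):
    """Merge the adjacent dims `dims` into one dim called `new_name`."""
    names = [name for name, _ in named_shape]
    start = names.index(dims[0])
    stop = start + len(dims)
    if names[start:stop] != list(dims):
        raise ValueError("dims must be adjacent and given in order")
    merged = prod(size for _, size in named_shape[start:stop])
    return named_shape[:start] + [(new_name, merged)] + named_shape[stop:]


def unflatten(named_shape, dim, pieces):
    """Split `dim` into the (name, size) pairs listed in `pieces`."""
    names = [name for name, _ in named_shape]
    position = names.index(dim)
    if prod(size for _, size in pieces) != named_shape[position][1]:
        raise ValueError("sizes must multiply to the size of the flattened dim")
    return named_shape[:position] + list(pieces) + named_shape[position + 1:]


imgs = [('N', 2), ('C', 2), ('H', 2), ('W', 2)]
flat = flatten(imgs, ['C', 'H', 'W'], 'features')
print(flat)  # [('N', 2), ('features', 8)]
print(unflatten(flat, 'features', [('C', 2), ('H', 2), ('W', 2)]) == imgs)  # True
```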

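Autograd support

Autograd currently ignores names on all tensors and just treats them like regular tensors. Gradient computation is correct, but we lose the safety that names give us. Introducing handling of names to autograd is on the roadmap.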

x = torch.randn(3, names=('D',))
weight = torch.randn(3, names=('D',), requires_grad=True)
loss = (x - weight).abs()
grad_loss = torch.randn(3)
loss.backward(grad_loss)

correct_grad = weight.grad.clone()
print(correct_grad)  # Unnamed for now. Will be named in the future

weight.grad.zero_()
grad_loss = grad_loss.refine_names('C')
loss = (x - weight).abs()
# Ideally we'd check that the names of loss and grad_loss match, but we don't
# yet
loss.backward(grad_loss)

print(weight.grad)  # still unnamed
assert torch.allclose(weight.grad, correct_grad)

Output:

tensor([0.6840, 1.0557, 0.7177])
tensor([0.6840, 1.0557, 0.7177])

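Other supported (and unsupported) features

See here for a detailed breakdown of what is supported with the 1.3 release.

In particular, we want to call out three important features that are not currently supported:

- Saving or loading named tensors via torch.save or torch.load
- Multi-processing via torch.multiprocessing
- JIT support; for example, the following will error: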

imgs_named = torch.randn(1, 2, 2, 3, names=('N', 'C', 'H', 'W'))

@torch.jit.script
def fn(x):
    return x

catch_error(lambda: fn(imgs_named))

Output:

NYI: Named tensors are currently unsupported in TorchScript. As a  workaround please drop names via `tensor = tensor.rename(None)`. (guardAgainstNamedTensor at /pytorch/torch/csrc/...

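As a workaround, please drop names via tensor = tensor.rename(None) before using anything that does not yet support named tensors.

Longer example: Multi-head attention

Now we'll go through a complete example of implementing a common PyTorch nn.Module: multi-head attention. We assume the reader is already familiar with multi-head attention; for a refresher, check out this explanation or this explanation.

We adapt the implementation of multi-head attention from ParlAI; specifically here. Read through the code at that example; then compare it with the code below, noting that there are four places labeled (I), (II), (III), and (IV) where using named tensors enables more readable code. We'll dive into each of these after the code block.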

import torch.nn as nn
import torch.nn.functional as F
import math


class MultiHeadAttention(nn.Module):
    def __init__(self, n_heads, dim, dropout=0):
        super(MultiHeadAttention, self).__init__()
        self.n_heads = n_heads
        self.dim = dim

        self.attn_dropout = nn.Dropout(p=dropout)
        self.q_lin = nn.Linear(dim, dim)
        self.k_lin = nn.Linear(dim, dim)
        self.v_lin = nn.Linear(dim, dim)
        nn.init.xavier_normal_(self.q_lin.weight)
        nn.init.xavier_normal_(self.k_lin.weight)
        nn.init.xavier_normal_(self.v_lin.weight)
        self.out_lin = nn.Linear(dim, dim)
        nn.init.xavier_normal_(self.out_lin.weight)

    def forward(self, query, key=None, value=None, mask=None):
        # (I)
        query = query.refine_names(..., 'T', 'D')
        self_attn = key is None and value is None
        if self_attn:
            mask = mask.refine_names(..., 'T')
        else:
            mask = mask.refine_names(..., 'T', 'T_key')  # enc attn

        dim = query.size('D')
        assert dim == self.dim, \
            f'Dimensions do not match: {dim} query vs {self.dim} configured'
        assert mask is not None, 'Mask is None, please specify a mask'
        n_heads = self.n_heads
        dim_per_head = dim // n_heads
        scale = math.sqrt(dim_per_head)

        # (II)
        def prepare_head(tensor):
            tensor = tensor.refine_names(..., 'T', 'D')
            return (tensor.unflatten('D', [('H', n_heads), ('D_head', dim_per_head)])
                          .align_to(..., 'H', 'T', 'D_head'))

        assert value is None
        if self_attn:
            key = value = query
        elif value is None:
            # key and value are the same, but query differs
            key = key.refine_names(..., 'T', 'D')
            value = key
        dim = key.size('D')

        # Distinguish between query_len (T) and key_len (T_key) dims.
        k = prepare_head(self.k_lin(key)).rename(T='T_key')
        v = prepare_head(self.v_lin(value)).rename(T='T_key')
        q = prepare_head(self.q_lin(query))

        dot_prod = q.div_(scale).matmul(k.align_to(..., 'D_head', 'T_key'))
        dot_prod.refine_names(..., 'H', 'T', 'T_key')  # just a check

        # (III)
        attn_mask = (mask == 0).align_as(dot_prod)
        dot_prod.masked_fill_(attn_mask, -float(1e20))

        attn_weights = self.attn_dropout(F.softmax(dot_prod / scale,
                                                   dim='T_key'))

        # (IV)
        attentioned = (
            attn_weights.matmul(v).refine_names(..., 'H', 'T', 'D_head')
            .align_to(..., 'T', 'H', 'D_head')
            .flatten(['H', 'D_head'], 'D')
        )

        return self.out_lin(attentioned).refine_names(..., 'T', 'D')

(I) Refining the input tensor dims

def forward(self, query, key=None, value=None, mask=None):
    # (I)
    query = query.refine_names(..., 'T', 'D')

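The line query = query.refine_names(..., 'T', 'D') serves as enforceable documentation and lifts the input dimensions to being named. It checks that the last two dimensions can be refined to ['T', 'D'], preventing potentially silent or confusing size mismatch errors later down the line.

(II) Manipulating dimensions in prepare_head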

# (II)
def prepare_head(tensor):
    tensor = tensor.refine_names(..., 'T', 'D')
    return (tensor.unflatten('D', [('H', n_heads), ('D_head', dim_per_head)])
                  .align_to(..., 'H', 'T', 'D_head'))

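The first thing to note is how the code clearly states the input and output dimensions: the input tensor must end with the T and D dims, and the output tensor ends in the H, T, and D_head dims.

The second thing to note is how clearly the code describes what is going on. prepare_head takes the key, query, and value and splits the embedding dim into multiple heads, finally rearranging the dim order to be [..., 'H', 'T', 'D_head']. ParlAI implements prepare_head as follows, using view and transpose operations: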

def prepare_head(tensor):
    # input is [batch_size, seq_len, n_heads * dim_per_head]
    # output is [batch_size * n_heads, seq_len, dim_per_head]
    batch_size, seq_len, _ = tensor.size()
    tensor = tensor.view(batch_size, tensor.size(1), n_heads, dim_per_head)
    tensor = (
        tensor.transpose(1, 2)
        .contiguous()
        .view(batch_size * n_heads, seq_len, dim_per_head)
    )
    return tensor

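Our named tensor variant uses ops that, though more verbose, have more semantic meaning than view and transpose and include enforceable documentation in the form of names.

(III) Explicit broadcasting by names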

def ignore():
    # (III)
    attn_mask = (mask == 0).align_as(dot_prod)
    dot_prod.masked_fill_(attn_mask, -float(1e20))

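mask usually has dims [N, T] (in the case of self attention) or [N, T, T_key] (in the case of encoder attention), while dot_prod has dims [N, H, T, T_key]. To make mask broadcast correctly with dot_prod, we would usually unsqueeze dims 1 and -1 in the case of self attention, or unsqueeze dim 1 in the case of encoder attention. Using named tensors, we simply align attn_mask to dot_prod using align_as and stop worrying about where to unsqueeze dims.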
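
The unsqueeze positions quoted above follow mechanically from the names, which is exactly the bookkeeping that align_as takes over. The plain-Python sketch below derives them from tuples of names; unsqueeze_positions is a hypothetical helper for illustration, and it assumes the mask dims already appear in the same order as in dot_prod.

```python
def unsqueeze_positions(source_names, target_names):
    """Positions in `target_names` where a size-one dim must be inserted."""
    present = [name for name in target_names if name in source_names]
    if present != list(source_names):
        raise ValueError("source dims must appear in target, in the same order")
    return [i for i, name in enumerate(target_names)
            if name not in source_names]


dot_prod_names = ('N', 'H', 'T', 'T_key')
# Self attention: mask is [N, T]; position 3 is the last dim, i.e. -1.
print(unsqueeze_positions(('N', 'T'), dot_prod_names))           # [1, 3]
# Encoder attention: mask is [N, T, T_key].
print(unsqueeze_positions(('N', 'T', 'T_key'), dot_prod_names))  # [1]
```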

(IV) More dimension manipulation using align_to and flatten

def ignore():
    # (IV)
    attentioned = (
        attn_weights.matmul(v).refine_names(..., 'H', 'T', 'D_head')
        .align_to(..., 'T', 'H', 'D_head')
        .flatten(['H', 'D_head'], 'D')
    )

Here, as in (II), align_to and flatten are more semantically meaningful than view and transpose (despite being more verbose).

Running the example

n, t, d, h = 7, 5, 2 * 3, 3
query = torch.randn(n, t, d, names=('N', 'T', 'D'))
mask = torch.ones(n, t, names=('N', 'T'))
attn = MultiHeadAttention(h, d)
output = attn(query, mask=mask)
# works as expected!
print(output.names)

Output:

('N', 'T', 'D')

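The above works as expected. Furthermore, note that the code never mentions the name of the batch dimension. In fact, our MultiHeadAttention module is agnostic to the existence of batch dimensions.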

query = torch.randn(t, d, names=('T', 'D'))
mask = torch.ones(t, names=('T',))
output = attn(query, mask=mask)
print(output.names)

Output:

('T', 'D')

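Conclusion

Thank you for reading! Named tensors are still very much in development; if you have feedback and/or suggestions for improvement, please let us know by creating an issue.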

Total running time of the script: (0 minutes 0.144 seconds)
