1.3.1. The NumPy array object

1.3.1.1. What are NumPy and NumPy arrays?

1.3.1.1.1. NumPy arrays

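Python objects:
  • high-level number objects: integers, floating point
  • containers: lists (costless insertion and append), dictionaries (fast lookup)
NumPy provides:
  • extension package to Python for multi-dimensional arrays
  • closer to hardware (efficiency)
  • designed for scientific computation (convenience)
  • also known as array-oriented computing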
>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

For example, an array containing:

  • values of an experiment/simulation at discrete time steps
  • signal recorded by a measurement device, e.g. sound wave
  • pixels of an image, grey-level or colour
  • 3-D data measured at different X-Y-Z positions, e.g. MRI scan
  • ...

Why it is useful: Memory-efficient container that provides fast numerical operations.

In [1]: L = range(1000)
In [2]: %timeit [i**2 for i in L]
1000 loops, best of 3: 403 us per loop
In [3]: a = np.arange(1000)
In [4]: %timeit a**2
100000 loops, best of 3: 12.7 us per loop
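
The IPython timings above can also be reproduced in a plain Python script. The following is a minimal sketch using the standard-library timeit module; the repetition count (number=1000) is an arbitrary choice, and absolute timings will differ between machines and NumPy versions.

```python
import timeit

import numpy as np

L = range(1000)
a = np.arange(1000)

# Time 1000 repetitions of each approach (the repetition count is arbitrary).
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)

print("list comprehension: %.1f us per loop" % (t_list / 1000 * 1e6))
print("NumPy array:        %.1f us per loop" % (t_array / 1000 * 1e6))

# Both approaches compute the same values.
squares_match = [i**2 for i in L] == (a**2).tolist()
```

The array version is typically much faster because the loop over elements runs in compiled code rather than in the Python interpreter.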

1.3.1.1.2. NumPy Reference documentation

  • On the web: http://docs.scipy.org/

  • Interactive help:

    In [5]: np.array?
    
    String Form:<built-in function array>
    Docstring:
    array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0, ...
  • Looking for something (note that np.lookfor is only available in NumPy versions before 2.0):

    >>> np.lookfor('create array') 
    
    Search results for 'create array'
    ---------------------------------
    numpy.array
    Create an array.
    numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.

    In [6]: np.con*?
    
    np.concatenate
    np.conj
    np.conjugate
    np.convolve

1.3.1.1.3. Import conventions

The recommended convention to import numpy is:

>>> import numpy as np

1.3.1.2. Creating arrays

1.3.1.2.1. Manual construction of arrays

  • 1-D:

    >>> a = np.array([0, 1, 2, 3])
    
    >>> a
    array([0, 1, 2, 3])
    >>> a.ndim
    1
    >>> a.shape
    (4,)
    >>> len(a)
    4
  • 2-D, 3-D, ...:

    >>> b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
    
    >>> b
    array([[0, 1, 2],
           [3, 4, 5]])
    >>> b.ndim
    2
    >>> b.shape
    (2, 3)
    >>> len(b) # returns the size of the first dimension
    2
    >>> c = np.array([[[1], [2]], [[3], [4]]])
    >>> c
    array([[[1],
            [2]],

           [[3],
            [4]]])
    >>> c.shape
    (2, 2, 1)

Exercise: Simple arrays

  • Create a simple two dimensional array. First, redo the examples from above. And then create your own: how about odd numbers counting backwards on the first row, and even numbers on the second?
  • Use the functions len(), numpy.shape() on these arrays. How do they relate to each other? And to the ndim attribute of the arrays?

1.3.1.2.2. Functions for creating arrays

In practice, we rarely enter items one by one...

  • Evenly spaced:

    >>> a = np.arange(10) # 0 .. n-1  (!)
    
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> b = np.arange(1, 9, 2) # start, end (exclusive), step
    >>> b
    array([1, 3, 5, 7])
  • or by number of points:

    >>> c = np.linspace(0, 1, 6)   # start, end, num-points
    
    >>> c
    array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ])
    >>> d = np.linspace(0, 1, 5, endpoint=False)
    >>> d
    array([ 0. , 0.2, 0.4, 0.6, 0.8])
  • Common arrays:

    >>> a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
    
    >>> a
    array([[ 1., 1., 1.],
           [ 1., 1., 1.],
           [ 1., 1., 1.]])
    >>> b = np.zeros((2, 2))
    >>> b
    array([[ 0., 0.],
           [ 0., 0.]])
    >>> c = np.eye(3)
    >>> c
    array([[ 1., 0., 0.],
           [ 0., 1., 0.],
           [ 0., 0., 1.]])
    >>> d = np.diag(np.array([1, 2, 3, 4]))
    >>> d
    array([[1, 0, 0, 0],
           [0, 2, 0, 0],
           [0, 0, 3, 0],
           [0, 0, 0, 4]])
  • np.random: random numbers (Mersenne Twister PRNG):

    >>> a = np.random.rand(4)       # uniform in [0, 1)
    
    >>> a
    array([ 0.95799151, 0.14222247, 0.08777354, 0.51887998])
    >>> b = np.random.randn(4) # Gaussian
    >>> b
    array([ 0.37544699, -0.11425369, -0.47616538, 1.79664113])
    >>> np.random.seed(1234) # Setting the random seed
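
Setting the seed matters when results need to be reproducible. As a small sketch (the seed value 1234 is simply the one used above), resetting the global generator to the same seed before each draw yields identical "random" arrays:

```python
import numpy as np

np.random.seed(1234)          # fix the state of the global generator
first = np.random.rand(4)

np.random.seed(1234)          # reset to the same state
second = np.random.rand(4)

# The two draws are identical because the generator restarted from the same seed.
same_draws = np.array_equal(first, second)
print(same_draws)
```

Without the second call to np.random.seed, the second draw would continue the sequence and produce different values.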

Exercise: Creating arrays using functions

  • Experiment with arange, linspace, ones, zeros, eye and diag.
  • Create different kinds of arrays with random numbers.
  • Try setting the seed before creating an array with random values.
  • Look at the function np.empty. What does it do? When might this be useful?

1.3.1.3. Basic data types

You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2). This is due to a difference in the data-type used:

>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')

Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers. Note that, in the example above, NumPy auto-detects the data-type from the input.


You can explicitly specify which data-type you want:

>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')

The default data type of many array-creation functions (np.ones, np.zeros, np.eye, ...) is floating point:

>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

There are also other types:

Complex:
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')
Bool:
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')
Strings:
>>> f = np.array(['Bonjour', 'Hello', 'Hallo',])
>>> f.dtype # <--- strings containing max. 7 letters
dtype('S7')
Note: on Python 3 the same array reports a Unicode dtype such as dtype('<U7').
Much more:
  • int32
  • int64
  • uint32
  • uint64
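
To illustrate the earlier remark that different data-types allow data to be stored more compactly, the sketch below compares the memory footprint of the same values stored as int64 and as int8. The array length (100) is an arbitrary choice, and the conversion is only safe because every value fits in the int8 range.

```python
import numpy as np

a64 = np.arange(100, dtype=np.int64)
a8 = a64.astype(np.int8)      # safe here: all values fit in the range -128..127

print(a64.itemsize, a64.nbytes)   # bytes per element, total bytes
print(a8.itemsize, a8.nbytes)
```

The int8 copy holds the same values in one eighth of the memory; with values outside the int8 range, astype would silently wrap around, so the smaller type must be chosen with care.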

1.3.1.4. Basic visualization

Now that we have our first data arrays, we are going to visualize them.

Start by launching IPython:

$ ipython

Or the notebook:

$ ipython notebook

Once IPython has started, enable interactive plots:

>>> %matplotlib  

Or, from the notebook, enable plots in the notebook:

>>> %matplotlib inline 

The inline option is important for the notebook, so that plots are displayed in the notebook and not in a new window.

Matplotlib is a 2D plotting package. We can import its functions as below:

>>> import matplotlib.pyplot as plt  # the tidy way

And then use (note that you have to call show explicitly if you have not enabled interactive plots with %matplotlib):

>>> plt.plot(x, y)       # line plot    
>>> plt.show() # <-- shows the plot (not needed with interactive plots)

Or, if you have enabled interactive plots with %matplotlib:

>>> plt.plot(x, y)       # line plot

  • 1D plotting:

    >>> x = np.linspace(0, 3, 20)
    
    >>> y = np.linspace(0, 9, 20)
    >>> plt.plot(x, y) # line plot
    [<matplotlib.lines.Line2D object at ...>]
    >>> plt.plot(x, y, 'o') # dot plot
    [<matplotlib.lines.Line2D object at ...>]

    ../../_images/numpy_intro_1.png
  • 2D arrays (such as images):

    >>> image = np.random.rand(30, 30)
    
    >>> plt.imshow(image, cmap=plt.cm.hot)
    >>> plt.colorbar()
    <matplotlib.colorbar.Colorbar instance at ...>

    ../../_images/numpy_intro_2.png

See also

More in the matplotlib chapter.

Exercise: Simple visualizations

  • Plot some simple arrays: a cosine as a function of time and a 2D matrix.
  • Try using the gray colormap on the 2D matrix.

1.3.1.5. Indexing and slicing

The items of an array can be accessed and assigned to in the same way as other Python sequences (e.g. lists):

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(0, 2, 9)

Warning

Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.

The usual Python idiom for reversing a sequence is supported:

>>> a[::-1]
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

For multidimensional arrays, indices are tuples of integers:

>>> a = np.diag(np.arange(3))
>>> a
array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])
>>> a[1, 1]
1
>>> a[2, 1] = 10 # third row, second column
>>> a
array([[ 0, 0, 0],
       [ 0, 1, 0],
       [ 0, 10, 2]])
>>> a[1]
array([0, 1, 0])

Note

  • In 2D, the first dimension corresponds to rows, the second to columns.
  • For a multidimensional array a, a[0] is interpreted by taking all elements in the unspecified dimensions.
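
The two points above can be checked on a small example. This is only an illustrative sketch: the 3 x 4 array and the use of reshape to build it are assumptions made for the demonstration, not part of the text above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns (illustrative array)

row0 = a[0]          # first row; equivalent to a[0, :]
col1 = a[:, 1]       # second column
item = a[2, 3]       # single element: third row, fourth column

print(row0, col1, item)
```

Indexing with a single integer drops the first dimension, so a[0] is a 1-D array of length 4 here.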

Slicing: Arrays, like other Python sequences, can also be sliced:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])

Note that the last index is not included!

>>> a[:4]
array([0, 1, 2, 3])

All three slice components are not required: by default, start is 0, end is the last and step is 1:

>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
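
As a complement to the examples above, the following sketch spells out the defaults explicitly and also shows negative indices and a negative step, which follow the same rules as for Python lists. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(10)

# For a positive step, omitted components default to start=0, end=len(a), step=1.
same_head = np.array_equal(a[:4], a[0:4:1])
same_even = np.array_equal(a[::2], a[0:10:2])

# Negative indices count from the end of the array.
last_three = a[-3:]

# With a negative step the slice runs backwards; the end index is still excluded.
backwards = a[7:2:-2]

print(last_three, backwards)
```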

A small illustrated summary of NumPy indexing and slicing...

../../_images/numpy_indexing.png

You can also combine assignment and slicing:

>>> a = np.arange(10)
>>> a[5:] = 10
>>> a
array([ 0, 1, 2, 3, 4, 10, 10, 10, 10, 10])
>>> b = np.arange(5)
>>> a[5:] = b[::-1]
>>> a
array([0, 1, 2, 3, 4, 4, 3, 2, 1, 0])

Exercise: Indexing and slicing

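  • Try the different flavours of slicing, using start, end and step: starting from a linspace, try to obtain odd numbers counting forwards, and even numbers counting backwards.

  • Reproduce the slices in the diagram above. You may use the following expression to create the array: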

    >>> np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
    
    array([[ 0, 1, 2, 3, 4, 5],
           [10, 11, 12, 13, 14, 15],
           [20, 21, 22, 23, 24, 25],
           [30, 31, 32, 33, 34, 35],
           [40, 41, 42, 43, 44, 45],
           [50, 51, 52, 53, 54, 55]])
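
The expression above relies on np.newaxis to turn a 1-D array into a column, so that adding a row and a column produces a full 2-D grid. The sketch below breaks it into steps; the names cols, rows and grid are introduced here for illustration only.

```python
import numpy as np

cols = np.arange(6)                         # shape (6,): 0, 1, ..., 5
rows = np.arange(0, 51, 10)[:, np.newaxis]  # shape (6, 1): 0, 10, ..., 50

grid = rows + cols                          # the two shapes combine into (6, 6)
print(rows.shape, cols.shape, grid.shape)
```

Each entry satisfies grid[i, j] == 10 * i + j, which is why the array is convenient for checking what a given slice selects.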

Exercise: Array creation

Create the following arrays (with correct data types):

[[1, 1, 1, 1],
 [1, 1, 1, 1],
 [1, 1, 1, 2],
 [1, 6, 1, 1]]

[[0., 0., 0., 0., 0.],
 [2., 0., 0., 0., 0.],
 [0., 3., 0., 0., 0.],
 [0., 0., 4., 0., 0.],
 [0., 0., 0., 5., 0.],
 [0., 0., 0., 0., 6.]]

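Par on course: 3 statements for each

Hint: Individual array elements can be accessed similarly to a list, e.g. a[1] or a[1, 2].

Hint: Examine the docstring for diag.

Exercise: Tiling for array creation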

Skim through the documentation for np.tile, and use this function to construct the array:

[[4, 3, 4, 3, 4, 3],
 [2, 1, 2, 1, 2, 1],
 [4, 3, 4, 3, 4, 3],
 [2, 1, 2, 1, 2, 1]]

1.3.1.6. Copies and views

A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use np.may_share_memory() to check if two arrays share the same memory block. Note however, that this uses heuristics and may give you false positives.

When modifying the view, the original array is modified as well:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]
>>> b
array([0, 2, 4, 6, 8])
>>> np.may_share_memory(a, b)
True
>>> b[0] = 12
>>> b
array([12, 2, 4, 6, 8])
>>> a # (!)
array([12, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a = np.arange(10)
>>> c = a[::2].copy() # force a copy
>>> c[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.may_share_memory(a, c)
False

This behavior can be surprising at first sight... but it saves both memory and time.
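
Besides np.may_share_memory, there are two other ways to tell a view from a copy. The sketch below uses the base attribute of an array and np.shares_memory, which performs an exact check (and can therefore be slower on complicated memory layouts). The variable names are illustrative.

```python
import numpy as np

a = np.arange(10)
view = a[::2]            # slicing returns a view
dup = a[::2].copy()      # explicit copy

print(view.base is a)    # the view refers back to a
print(dup.base is None)  # the copy owns its own data
print(np.shares_memory(a, view), np.shares_memory(a, dup))
```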

Worked example: Prime number sieve

../../_images/prime-sieve.png

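Compute prime numbers in 0-99, with a sieve

  • Construct a shape (100,) boolean array is_prime, filled with True in the beginning:
>>> is_prime = np.ones((100,), dtype=bool)
  • Cross out 0 and 1, which are not primes:
>>> is_prime[:2] = 0
  • For each integer j starting from 2, cross out its higher multiples:
>>> N_max = int(np.sqrt(len(is_prime) - 1))
>>> for j in range(2, N_max + 1):
...     is_prime[2*j::j] = False
  • Skim through help(np.nonzero), and print the prime numbers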

  • Follow-up:

    • Move the above code into a script file named prime_sieve.py
    • Run it and check that it works
    • Use the optimization suggested in the sieve of Eratosthenes:
      1. Skip j which are already known to not be primes
      2. The first number to cross out is j^2
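
One possible way to put the follow-up steps together is sketched below. It is not the only solution: it simply applies the two optimizations listed above to the code from the worked example and uses np.nonzero to list the primes.

```python
import numpy as np

is_prime = np.ones((100,), dtype=bool)
is_prime[:2] = False                    # 0 and 1 are not primes

N_max = int(np.sqrt(len(is_prime) - 1))
for j in range(2, N_max + 1):
    if is_prime[j]:                     # 1. skip j already known not to be prime
        is_prime[j*j::j] = False        # 2. start crossing out at j**2

primes = np.nonzero(is_prime)[0]        # indices that are still True
print(primes)
```

Starting at j**2 is sufficient because every smaller multiple of j has a smaller prime factor and has already been crossed out.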

1.3.1.7. Fancy indexing

NumPy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This method is called fancy indexing. It creates copies, not views.

1.3.1.7.1. Using boolean masks

>>> np.random.seed(3)
>>> a = np.random.randint(0, 21, 15)
>>> a
array([10, 3, 8, 0, 19, 10, 11, 9, 10, 6, 0, 20, 12, 7, 14])
>>> (a % 3 == 0)
array([False, True, False, True, False, False, False, True, False,
       True, True, False, True, False, False], dtype=bool)
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or, a[a%3==0]
>>> extract_from_a # extract a sub-array with the mask
array([ 3, 0, 9, 6, 0, 12])

Indexing with a mask can be very useful to assign a new value to a sub-array:

>>> a[a % 3 == 0] = -1
>>> a
array([10, -1, 8, -1, 19, 10, 11, -1, 10, -1, -1, 20, -1, 7, 14])
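
Assigning through a mask, as above, modifies the original array in place. Extracting with a mask is different: the result is a copy, so changing it leaves the original untouched. The following sketch contrasts this with a slice, which is a view; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

selected = a[a % 2 == 0]        # boolean mask: the result is a copy
selected[0] = 99
after_mask_edit = int(a[0])     # a is unchanged
print(after_mask_edit)

sliced = a[::2]                 # slice: the result is a view
sliced[0] = 99
after_slice_edit = int(a[0])    # a was changed through the view
print(after_slice_edit)
```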

1.3.1.7.2. Indexing with an array of integers

>>> a = np.arange(0, 100, 10)
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

Indexing can be done with an array of integers, in which the same index may be repeated several times:

>>> a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list
array([20, 30, 20, 40, 20])

New values can be assigned with this kind of indexing:

>>> a[[9, 7]] = -100
>>> a
array([ 0, 10, 20, 30, 40, 50, 60, -100, 80, -100])
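
One subtlety arises when repeated indices are combined with in-place arithmetic: each index is updated only once, however often it appears. The sketch below shows this, together with np.add.at, which applies the operation once per occurrence. The counting scenario is an illustration chosen here, not part of the text above.

```python
import numpy as np

counts = np.zeros(3, dtype=int)
counts[[0, 0, 1]] += 1               # index 0 appears twice but is incremented once
print(counts)

counts_at = np.zeros(3, dtype=int)
np.add.at(counts_at, [0, 0, 1], 1)   # unbuffered: every occurrence is applied
print(counts_at)
```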

When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:

>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> idx.shape
(2, 2)
>>> a[idx]
array([[3, 4],
       [9, 7]])

The image below illustrates various fancy indexing applications:

../../_images/numpy_fancy_indexing.png

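Exercise: Fancy indexing

  • Again, reproduce the fancy indexing shown in the diagram above.
  • Use fancy indexing on the left and array creation on the right to assign values into an array, for instance by setting parts of the array in the diagram above to zero.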