1.2.5. Reusing code: scripts and modules

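So far, we have typed all instructions in the interpreter. For longer sets of instructions, we need to change track and write the code in text files (using a text editor), which we will call either scripts or modules. Use your favorite text editor (provided it offers syntax highlighting for Python), or the editor that comes with the scientific Python suite you may be using (e.g., Scite with Python(x,y)).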

1.2.5.1. Scripts

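Let us first write a script, that is, a file with a sequence of instructions that are executed each time the script is called. Instructions may be copied and pasted from the interpreter (but take care to respect indentation rules!).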

The extension for Python files is .py. Write or copy-and-paste the following lines in a file called test.py

message = "Hello how are you?"
for word in message.split():
    print(word)

Let us now execute the script interactively, that is, inside the IPython interpreter. This is perhaps the most common use of scripts in scientific computing.

Note

In IPython, the syntax to execute a script is %run script.py. For example,

In [1]: %run test.py
Hello
how
are
you?
In [2]: message
Out[2]: 'Hello how are you?'

The script has been executed. Moreover the variables defined in the script (such as message) are now available inside the interpreter’s namespace.

Other interpreters also offer the possibility to execute scripts (e.g., execfile in the plain Python interpreter).
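As a rough illustration of what executing a script amounts to in plain Python: execfile exists only in Python 2, and a common Python 3 substitute is to read the file and hand its content to exec. The sketch below assumes Python 3 and writes test.py to a temporary directory (a choice made here only to keep the example self-contained). It shows that variables defined by the script end up in the namespace in which it was executed.

```python
import os
import tempfile

# Write the test.py script to a temporary directory (assumption made
# for this example only; any existing script would do).
script_path = os.path.join(tempfile.mkdtemp(), "test.py")
with open(script_path, "w") as f:
    f.write('message = "Hello how are you?"\n'
            'for word in message.split():\n'
            '    print(word)\n')

# Execute the script inside a dictionary that plays the role of the
# interpreter's namespace.
namespace = {}
with open(script_path) as f:
    exec(compile(f.read(), script_path, "exec"), namespace)

# The variable defined by the script is now available in that namespace.
print(namespace["message"])
```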

It is also possible to execute this script as a standalone program, by running it inside a shell terminal (Linux/Mac console or Windows cmd console). For example, if we are in the same directory as the test.py file, we can execute this in a console:

$ python test.py
Hello
how
are
you?

Standalone scripts may also take command-line arguments.

In file.py:

import sys
print(sys.argv)

$ python file.py test arguments
['file.py', 'test', 'arguments']

Warning

Don’t implement option parsing yourself. Use modules such as optparse, argparse or docopt.
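As a minimal sketch of what this looks like with argparse from the standard library: the argument names (name, --repeat) and the greeting are invented for illustration, and the argument list is passed explicitly so that the example runs without a real command line.

```python
import argparse

# Hypothetical command-line interface, invented for illustration.
parser = argparse.ArgumentParser(description="Greet someone.")
parser.add_argument("name", help="who to greet")
parser.add_argument("--repeat", type=int, default=1,
                    help="number of greetings to print")

# Passing a list stands in for the real command line; call
# parser.parse_args() with no argument to read sys.argv[1:] instead.
args = parser.parse_args(["Emma", "--repeat", "2"])

for _ in range(args.repeat):
    print("Hello " + args.name)
```

With this approach, type conversion, default values and a --help message come for free, which is the main reason not to parse sys.argv by hand.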

1.2.5.2. Importing objects from modules

In [1]: import os
In [2]: os
Out[2]: <module 'os' from '/usr/lib/python2.6/os.pyc'>
In [3]: os.listdir('.')
Out[3]:
['conf.py',
'basic_types.rst',
'control_flow.rst',
'functions.rst',
'python_language.rst',
'reusing.rst',
'file_io.rst',
'exceptions.rst',
'workflow.rst',
'index.rst']

And also:

In [4]: from os import listdir

Importing shorthands:

In [5]: import numpy as np

Warning

from os import *

This is called the star import; please use it with caution:

  • Makes the code harder to read and understand: where do symbols come from?
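  • Makes it impossible to guess the functionality from the context and the name (hint: os.name is the name of the OS), and to make good use of tab completion.
  • Restricts the variable names you can use: os.name might override name, or vice versa.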
  • Creates possible name clashes between modules.
  • Makes the code impossible to statically check for undefined symbols.
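The name-clash problem listed above can be observed directly. The following sketch runs the star import inside a separate dictionary (so the current session is not affected) and checks which names were overwritten. The variable name and its value "Emma" are hypothetical.

```python
import os

# 'name' stands for a variable the user defined earlier (hypothetical).
namespace = {"name": "Emma"}

# Run the star import inside a separate dictionary so that the
# current session is left untouched.
exec("from os import *", namespace)

# The user's variable has been silently replaced by os.name ...
print(namespace["name"] == os.name)
# ... and the name 'open' now refers to the low-level os.open,
# which hides the built-in open().
print(namespace["open"] is os.open)
```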

Modules are thus a good way to organize code in a hierarchical way. Actually, all the scientific computing tools we are going to use are modules:

>>> import numpy as np # data arrays
>>> np.linspace(0, 10, 6)
array([ 0., 2., 4., 6., 8., 10.])
>>> import scipy # scientific computing

In Python(x,y), Ipython(x,y) executes the following imports at startup:

>>> import numpy
>>> import numpy as np
>>> from pylab import *
>>> import scipy

and it is not necessary to re-import these modules.

1.2.5.3. Creating modules

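If we want to write larger and better organized programs (compared to simple scripts), in which some objects are defined (variables, functions, classes) that we want to reuse several times, we have to create our own modules.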

Let us create a module demo, contained in the file demo.py:

"A demo module."

def print_b():
    "Prints b."
    print('b')

def print_a():
    "Prints a."
    print('a')

c = 2
d = 2

In this file, we defined two functions print_a and print_b. Suppose we want to call the print_a function from the interpreter. We could execute the file as a script, but since we just want to have access to the function print_a, we would rather import it as a module. The syntax is as follows.

In [1]: import demo
In [2]: demo.print_a()
a
In [3]: demo.print_b()
b

Importing the module gives access to its objects, using the module.object syntax. Don’t forget to put the module’s name before the object’s name, otherwise Python won’t recognize the instruction.
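A small sketch of what happens when the module prefix is omitted. It uses the standard math module rather than demo, only so that it runs without the demo.py file; the behaviour is the same for any module imported with a plain import statement.

```python
import math

# With the module prefix, the function is found.
print(math.sqrt(16.0))

try:
    sqrt(16.0)  # no module prefix: the name is unknown here
    prefix_needed = False
except NameError as err:
    prefix_needed = True
    print("NameError:", err)
```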

Introspection

In [4]: demo?
Type: module
Base Class: <type 'module'>
String Form: <module 'demo' from 'demo.py'>
Namespace: Interactive
File: /home/varoquau/Projects/Python_talks/scipy_2009_tutorial/source/demo.py
Docstring:
A demo module.
In [5]: who
demo
In [6]: whos
Variable Type Data/Info
------------------------------
demo module <module 'demo' from 'demo.py'>
In [7]: dir(demo)
Out[7]:
['__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'c',
'd',
'print_a',
'print_b']
In [8]: demo.
demo.__builtins__ demo.__init__ demo.__str__
demo.__class__ demo.__name__ demo.__subclasshook__
demo.__delattr__ demo.__new__ demo.c
demo.__dict__ demo.__package__ demo.d
demo.__doc__ demo.__reduce__ demo.print_a
demo.__file__ demo.__reduce_ex__ demo.print_b
demo.__format__ demo.__repr__ demo.py
demo.__getattribute__ demo.__setattr__ demo.pyc
demo.__hash__ demo.__sizeof__

Importing objects from modules into the main namespace

In [9]: from demo import print_a, print_b
In [10]: whos
Variable Type Data/Info
--------------------------------
demo module <module 'demo' from 'demo.py'>
print_a function <function print_a at 0xb7421534>
print_b function <function print_b at 0xb74214c4>
In [11]: print_a()
a

Warning

Module caching

Modules are cached: if you modify demo.py and re-import it in the old session, you will get the old version.

Solution:

In [10]: reload(demo)

In Python 3, reload is no longer a builtin, so you have to import the importlib module first and then do:

In [10]: importlib.reload(demo)
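The caching behaviour can be reproduced end to end with the following sketch, which assumes Python 3. The module name demo_reload and the temporary directory are choices made for this example. The second file content deliberately has a different length, so that a stale bytecode file cannot be mistaken for an up-to-date one.

```python
import importlib
import os
import sys
import tempfile

# Create a small module in a temporary directory (hypothetical name).
tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, "demo_reload.py")
with open(module_path, "w") as f:
    f.write("c = 2\n")

# Make the directory importable and import the module a first time.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import demo_reload
first = demo_reload.c

# Modify the file on disk.
with open(module_path, "w") as f:
    f.write("c = 2000\n")

# A second import statement does nothing: the module is cached.
import demo_reload
stale = demo_reload.c

# reload re-executes the file and updates the module object.
importlib.reload(demo_reload)
fresh = demo_reload.c

print(first, stale, fresh)
```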

1.2.5.4. '__main__' and module loading

Sometimes we want code to be executed when a module is run directly, but not when it is imported by another module. The test if __name__ == '__main__' allows us to check whether the module is being run directly.

File demo2.py:

def print_b():
    "Prints b."
    print('b')

def print_a():
    "Prints a."
    print('a')

# print_b() runs on import
print_b()

if __name__ == '__main__':
    # print_a() is only executed when the module is run directly.
    print_a()

Importing it:

In [11]: import demo2
b
In [12]: import demo2

Running it:

In [13]: %run demo2
b
a
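To see why the test works, it helps to look at the value of __name__ in both situations. The sketch below assumes Python 3 and a throwaway module called name_demo (a hypothetical name) that only records its own __name__. It then imports the module and also runs it as a script through runpy.run_path from the standard library.

```python
import importlib
import os
import runpy
import sys
import tempfile

# A module that only records the name it is executed under.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "name_demo.py")  # hypothetical module
with open(path, "w") as f:
    f.write("seen_name = __name__\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# Imported: __name__ is the module's own name.
import name_demo
imported_name = name_demo.seen_name

# Run directly (as "python name_demo.py" would do): __name__ is '__main__'.
run_globals = runpy.run_path(path, run_name="__main__")
direct_name = run_globals["seen_name"]

print(imported_name, direct_name)
```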

1.2.5.5. Scripts or modules? How to organize your code

Note

Rule of thumb

  • Sets of instructions that are called several times should be written inside functions for better code reusability.
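  • Functions (or other bits of code) that are called from several scripts should be written inside a module, so that the different scripts only need to import that module (do not copy and paste your functions into the different scripts!).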

1.2.5.5.1. How modules are found and imported

When the import mymodule statement is executed, Python searches for the module mymodule in a given list of directories. This list includes installation-dependent default paths (e.g., /usr/lib/python) as well as the directories specified by the environment variable PYTHONPATH.

The list of directories searched by Python is given by the sys.path variable:

In [1]: import sys
In [2]: sys.path
Out[2]:
['',
'/home/varoquau/.local/bin',
'/usr/lib/python2.7',
'/home/varoquau/.local/lib/python2.7/site-packages',
'/usr/lib/python2.7/dist-packages',
'/usr/local/lib/python2.7/dist-packages',
...]
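To check where a given module would be loaded from without importing it, one option in Python 3 is importlib.util.find_spec. The sketch below looks up the standard json package and a module name that is assumed not to exist (no_such_module_xyz is made up for the example).

```python
import importlib.util
import sys

# Number of directories currently on the search path.
print(len(sys.path))

# Locate a module on the search path without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)  # file the module would be loaded from

# A name that cannot be found yields None (hypothetical module name).
missing = importlib.util.find_spec("no_such_module_xyz")
print(missing)
```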

Modules must be located in the search path, therefore you can:

  • write your own modules within directories already defined in the search path (e.g. $HOME/.local/lib/python2.7/dist-packages). You may use symbolic links (on Linux) to keep the code somewhere else.

  • modify the environment variable PYTHONPATH to include the directories containing the user-defined modules.

    On Linux/Unix, add the following line to a file read by the shell at startup (e.g. /etc/profile, .profile)

    export PYTHONPATH=$PYTHONPATH:/home/emma/user_defined_modules
    

    On Windows, http://support.microsoft.com/kb/310519 explains how to handle environment variables.

  • or modify the sys.path variable itself within a Python script.

    import sys
    
    new_path = '/home/emma/user_defined_modules'
    if new_path not in sys.path:
        sys.path.append(new_path)

    This method is not very robust, however, because it makes the code less portable (user-dependent path) and because you have to add the directory to your sys.path each time you want to import from a module in this directory.

See also

See https://docs.python.org/tutorial/modules.html for more information about modules.

1.2.5.6. Packages

A directory that contains many modules is called a package. A package is a module with submodules (which can have submodules themselves, etc.). A special file called __init__.py (which may be empty) tells Python that the directory is a Python package, from which modules can be imported.

$ ls
cluster/ io/ README.txt@ stsci/
__config__.py@ LATEST.txt@ setup.py@ __svn_version__.py@
__config__.pyc lib/ setup.pyc __svn_version__.pyc
constants/ linalg/ setupscons.py@ THANKS.txt@
fftpack/ linsolve/ setupscons.pyc TOCHANGE.txt@
__init__.py@ maxentropy/ signal/ version.py@
__init__.pyc misc/ sparse/ version.pyc
INSTALL.txt@ ndimage/ spatial/ weave/
integrate/ odr/ special/
interpolate/ optimize/ stats/
$ cd ndimage
$ ls
doccer.py@ fourier.pyc interpolation.py@ morphology.pyc setup.pyc
doccer.pyc info.py@ interpolation.pyc _nd_image.so
setupscons.py@
filters.py@ info.pyc measurements.py@ _ni_support.py@
setupscons.pyc
filters.pyc __init__.py@ measurements.pyc _ni_support.pyc tests/
fourier.py@ __init__.pyc morphology.py@ setup.py@

From Ipython:

In [1]: import scipy
In [2]: scipy.__file__
Out[2]: '/usr/lib/python2.6/dist-packages/scipy/__init__.pyc'
In [3]: import scipy.version
In [4]: scipy.version.version
Out[4]: '0.7.0'
In [5]: import scipy.ndimage.morphology
In [6]: from scipy.ndimage import morphology
In [17]: morphology.binary_dilation?
Type: function
Base Class: <type 'function'>
String Form: <function binary_dilation at 0x9bedd84>
Namespace: Interactive
File: /usr/lib/python2.6/dist-packages/scipy/ndimage/morphology.py
Definition: morphology.binary_dilation(input, structure=None,
iterations=1, mask=None, output=None, border_value=0, origin=0,
brute_force=False)
Docstring:
Multi-dimensional binary dilation with the given structure.
An output array can optionally be provided. The origin parameter
controls the placement of the filter. If no structuring element is
provided an element is generated with a squared connectivity equal
to one. The dilation operation is repeated iterations times. If
iterations is less than 1, the dilation is repeated until the
result does not change anymore. If a mask is given, only those
elements with a true value at the corresponding mask element are
modified at each iteration.
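The layout described above can be reproduced on a small scale. The following sketch assumes Python 3 and builds a throwaway package in a temporary directory. The package name mypkg, the submodule tools and the function double are all invented for the example.

```python
import importlib
import os
import sys
import tempfile

# Build the directory layout:
#   <root>/mypkg/__init__.py   (empty file: marks the directory as a package)
#   <root>/mypkg/tools.py      (a submodule)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")  # hypothetical package name
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "tools.py"), "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

# Put the parent directory (not the package itself) on the search path.
sys.path.insert(0, root)
importlib.invalidate_caches()

# Submodules are reached with the dotted package.module syntax.
from mypkg import tools
result = tools.double(21)
print(result)
```

Note that it is the directory containing mypkg that has to be on the search path, not the mypkg directory itself.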

1.2.5.7. Good practices

  • Use meaningful object names

  • Indentation: no choice!

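    Indenting is compulsory in Python! Every command block following a colon bears an additional indentation level with respect to the previous line with a colon. One must therefore indent after def f(): or while:. At the end of such logical blocks, one decreases the indentation depth (and increases it again if a new block is entered).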

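    Strict respect of indentation is the price to pay for getting rid of the { or ; characters that delineate logical blocks in other languages. Improper indentation leads to errors such as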

    ------------------------------------------------------------
    
    IndentationError: unexpected indent (test.py, line 2)

    All this indentation business can be a bit confusing in the beginning. However, with clear indentation and no extra characters, the resulting code is very pleasant to read compared to other languages.

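  • Indentation depth: inside your text editor, you may choose to indent with any positive number of spaces (1, 2, 3, 4, ...). However, it is considered good practice to indent with 4 spaces. You may configure your editor to map the Tab key to a 4-space indentation. In Python(x,y), the editor is already configured this way.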

  • Style guidelines

    Long lines: you should not write very long lines that span more than (e.g.) 80 characters. Long lines can be broken with the \ character

    >>> long_line = "Here is a very very long line \
    ... that we break in two parts."

    Spaces

    Write well-spaced code: put whitespace after commas, around arithmetic operators, etc.:

    >>> a = 1 # yes
    >>> a=1 # too cramped

    A number of rules for writing “beautiful” code (and, more importantly, for using the same conventions as everybody else!) are given in the Style Guide for Python Code.
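Two of the points above can be tried out in a few lines. The sketch below first asks Python to compile a badly indented snippet, which triggers the IndentationError mentioned earlier. It then shows implicit line continuation inside parentheses, which the Style Guide for Python Code prefers over the \ character. The snippet contents and variable names are made up for the example.

```python
# A block with an unexpected extra indentation level on its second line.
bad_source = "x = 1\n    y = 2\n"

try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    error_message = err.msg
    print("IndentationError:", error_message)

# Implicit continuation: an expression in parentheses may span lines.
total = (1 + 2 + 3
         + 4 + 5)

# Adjacent string literals inside parentheses are joined into one string.
long_line = ("Here is a very very long line "
             "that we break in two parts.")

print(total)
print(long_line)
```

Unlike the backslash form, the parenthesized string does not pick up any stray whitespace from the continuation line.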


Quick read

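If you want to do a first quick pass through the Scipy lectures to learn the ecosystem, you can skip directly to the next chapter: NumPy: creating and manipulating numerical data.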

The remainder of this chapter is not necessary to follow the rest of the introduction. But be sure to come back and finish this chapter later.