12.4. zipfile: Work with ZIP archives

New in version 1.6.

Source code: Lib/zipfile.py


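The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append to, and list ZIP files. For a deeper understanding of the format, see the PKZIP Application Note.

This module does not currently handle multi-disk ZIP files. It can handle ZIP files that use the ZIP64 extensions (that is, ZIP files larger than 4 GB). It supports decryption of encrypted files in ZIP archives, but it cannot create encrypted files. Decryption is extremely slow because it is implemented in Python rather than C.

The module defines the following items: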

exception zipfile.BadZipfile

The error raised for bad ZIP files (old name: zipfile.error).
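As a minimal sketch of how this exception surfaces, the snippet below tries to open arbitrary bytes as an archive. The in-memory io.BytesIO buffer and its contents are assumptions made for the example; the exception name is the one documented here (newer Python versions also spell it BadZipFile).

```python
import io
import zipfile

# Arbitrary bytes that are not a ZIP archive (sample data for illustration).
bogus = io.BytesIO(b'this is not a ZIP archive, just some bytes')

try:
    zipfile.ZipFile(bogus, 'r')
    opened = True
except zipfile.BadZipfile:
    # Raised because no valid ZIP structures were found in the data.
    opened = False

print(opened)
```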

exception zipfile.LargeZipFile

The error raised when a ZIP file would require ZIP64 functionality but that has not been enabled.

class zipfile.ZipFile

The class for reading and writing ZIP files. See section ZipFile Objects for constructor details.

class zipfile.PyZipFile

Class for creating ZIP archives containing Python libraries.

class zipfile.ZipInfo([filename[, date_time]])

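Class used to represent information about a member of an archive. Instances of this class are returned by the getinfo() and infolist() methods of ZipFile objects. Most users of the zipfile module will not need to create these, but only use those created by this module. filename should be the full name of the archive member, and date_time should be a tuple containing six fields which describe the time of the last modification to the file; the fields are described in section ZipInfo Objects.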

zipfile.is_zipfile(filename)

Returns True if filename is a valid ZIP file, otherwise returns False. filename may also be a file or file-like object.

Changed in version 2.7: Support for file and file-like objects.
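The check can be sketched with in-memory file-like objects. The member name hello.txt and the sample bytes are made up for illustration; passing a path string instead of a buffer works the same way.

```python
import io
import zipfile

# Build a tiny archive in memory so the example needs no files on disk.
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, 'w')
zf.writestr('hello.txt', b'hello')
zf.close()

buf.seek(0)
is_zip = zipfile.is_zipfile(buf)  # a real archive
not_zip = zipfile.is_zipfile(
    io.BytesIO(b'this is just plain text, not a ZIP archive'))

print(is_zip)
print(not_zip)
```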

zipfile.ZIP_STORED

The numeric constant for an uncompressed archive member.

zipfile.ZIP_DEFLATED

The numeric constant for the usual ZIP compression method. This requires the zlib module. No other compression methods are currently supported.
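To illustrate the difference between the two constants, the following sketch writes the same data with each method and compares the resulting archive sizes. The helper archive_size and the sample data are hypothetical names introduced for this example, and it assumes zlib is available.

```python
import io
import zipfile

data = b'spam and eggs ' * 1000  # highly repetitive sample data


def archive_size(compression):
    """Return the size in bytes of an in-memory archive holding `data`."""
    buf = io.BytesIO()
    zf = zipfile.ZipFile(buf, 'w', compression)
    zf.writestr('data.txt', data)
    zf.close()
    return len(buf.getvalue())


stored_size = archive_size(zipfile.ZIP_STORED)      # no compression
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # needs zlib

print(stored_size)
print(deflated_size)
```

With repetitive input like this, the deflated archive should come out much smaller, while the stored archive is slightly larger than the raw data because of the ZIP headers.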

See also

PKZIP Application Note
Documentation on the ZIP file format by Phil Katz, the creator of the format and algorithms used.
Info-ZIP Home Page
Information about the Info-ZIP project’s ZIP archive programs and development libraries.

12.4.1. ZipFile Objects

class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

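Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. The mode parameter should be 'r' to read an existing file, 'w' to truncate and write a new file, or 'a' to append to an existing file. If mode is 'a' and file refers to an existing ZIP file, then additional files are added to it. If file does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as python.exe).

Changed in version 2.6: If mode is 'a' and the file does not exist at all, it is created.

compression is the ZIP compression method to use when writing the archive, and should be ZIP_STORED or ZIP_DEFLATED; unrecognized values will cause RuntimeError to be raised. If ZIP_DEFLATED is specified but the zlib module is not available, RuntimeError is also raised. The default is ZIP_STORED. If allowZip64 is True, zipfile will create ZIP files that use the ZIP64 extensions when the ZIP file is larger than 2 GB. If it is false (the default), zipfile will raise an exception when the ZIP file would require ZIP64 extensions. ZIP64 extensions are disabled by default because the default zip and unzip commands on Unix (the InfoZIP utilities) do not support these extensions.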

Changed in version 2.7.1: If the file is created with mode 'a' or 'w' and then closed without adding any files to the archive, the appropriate ZIP structures for an empty archive will be written to the file.

ZipFile is also a context manager and therefore supports the with statement. In the example, myzip is closed after the with statement’s suite is finished—even if an exception occurs:

from zipfile import ZipFile

with ZipFile('spam.zip', 'w') as myzip:
    myzip.write('eggs.txt')

New in version 2.7: Added the ability to use ZipFile as a context manager.
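The three mode values accepted by the constructor can be sketched as follows. The temporary directory and the names example.zip, first.txt and second.txt are assumptions made for the example.

```python
import os
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'example.zip')  # hypothetical archive name

# 'w' truncates and writes a new archive.
with zipfile.ZipFile(path, 'w') as zf:
    zf.writestr('first.txt', b'one')

# 'a' adds members to the existing archive instead of replacing it.
with zipfile.ZipFile(path, 'a') as zf:
    zf.writestr('second.txt', b'two')

# 'r' reads an existing archive.
with zipfile.ZipFile(path, 'r') as zf:
    names = zf.namelist()

print(names)
```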

ZipFile.close()

Close the archive file. You must call close() before exiting your program or essential records will not be written.

ZipFile.getinfo(name)

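Return a ZipInfo object with information about the archive member name. Calling getinfo() for a name not currently contained in the archive will raise a KeyError.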
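A short sketch of both outcomes, using an in-memory archive; the member names present.txt and absent.txt are made up for illustration.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('present.txt', b'data')

with zipfile.ZipFile(buf, 'r') as zf:
    info = zf.getinfo('present.txt')  # ZipInfo for an existing member
    try:
        zf.getinfo('absent.txt')      # not in the archive
        found_absent = True
    except KeyError:
        found_absent = False

print(info.filename)
print(found_absent)
```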

ZipFile.infolist()

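Return a list containing a ZipInfo object for each member of the archive. The objects are in the same order as their entries in the actual ZIP file on disk if an existing archive was opened.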

ZipFile.namelist()

Return a list of archive members by name.
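The relationship between infolist() and namelist() can be sketched like this. The members are deliberately written out of alphabetical order to show that the archive order is preserved; the names are assumptions for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('b.txt', b'bb')
    zf.writestr('a.txt', b'a')

with zipfile.ZipFile(buf, 'r') as zf:
    names = zf.namelist()
    # Each ZipInfo carries per-member metadata such as the uncompressed size.
    sizes = [(info.filename, info.file_size) for info in zf.infolist()]

print(names)
print(sizes)
```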

ZipFile.open(name[, mode[, pwd]])

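Extract a member from the archive as a file-like object (ZipExtFile). name is the name of the file in the archive, or a ZipInfo object. The mode parameter, if included, must be one of the following: 'r' (the default), 'U', or 'rU'. Choosing 'U' or 'rU' will enable universal newline support in the read-only object. pwd is the password used for encrypted files. Calling open() on a closed ZipFile will raise a RuntimeError.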

Note

The file-like object is read-only and provides the following methods: read(), readline(), readlines(), __iter__(), next().

Note

If the ZipFile was created by passing in a file-like object as the first argument to the constructor, then the object returned by open() shares the ZipFile’s file pointer. Under these circumstances, the object returned by open() should not be used after any additional operations are performed on the ZipFile object. If the ZipFile was created by passing in a string (the filename) as the first argument to the constructor, then open() will create a new file object that will be held by the ZipExtFile, allowing it to operate independently of the ZipFile.

Note

The open(), read() and extract() methods can take a filename or a ZipInfo object. You will appreciate this when trying to read a ZIP file that contains members with duplicate names.

New in version 2.6.
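As a minimal sketch of open(), the snippet below reads a member line by line through the returned file-like object. It uses the default 'r' mode only, since the 'U' modes are not available in every Python version; the member name lines.txt is made up for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('lines.txt', b'first\nsecond\n')

with zipfile.ZipFile(buf, 'r') as zf:
    member = zf.open('lines.txt')  # read-only file-like object
    lines = member.readlines()
    member.close()

print(lines)
```

Reading through open() avoids loading a large member into memory at once, which read() would do.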

ZipFile.extract(member[, path[, pwd]])

Extract a member from the archive to the current working directory; member must be its full name or a ZipInfo object. Its file information is extracted as accurately as possible. path specifies a different directory to extract to. pwd is the password used for encrypted files.

New in version 2.6.

Note

If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and C:\foo\bar becomes foo\bar on Windows. All ".." components in a member filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On Windows, illegal characters (:, <, >, |, ", ?, and *) are replaced by underscore (_).

ZipFile.extractall([path[, members[, pwd]]])

Extract all members from the archive to the current working directory. path specifies a different directory to extract to. members is optional and must be a subset of the list returned by namelist(). pwd is the password used for encrypted files.

Warning

Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of path, e.g. members that have absolute filenames starting with "/" or filenames with two dots "..".

Changed in version 2.7.4: The zipfile module attempts to prevent that. See extract() note.

New in version 2.6.
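In the spirit of the warning above, one possible approach is to inspect member names before extracting. The check below is a hypothetical pre-check written for this example, not something the module provides, and the archive contents and temporary target directory are assumptions.

```python
import io
import os
import tempfile
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('docs/readme.txt', b'read me')
    zf.writestr('notes.txt', b'notes')

target = tempfile.mkdtemp()  # extract here instead of the working directory
with zipfile.ZipFile(buf, 'r') as zf:
    # Hypothetical inspection step: flag absolute names and '..' components.
    suspicious = [name for name in zf.namelist()
                  if name.startswith('/') or '..' in name.split('/')]
    if not suspicious:
        zf.extractall(target)

extracted = os.path.isfile(os.path.join(target, 'docs', 'readme.txt'))
print(suspicious)
print(extracted)
```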

ZipFile.printdir()

Print a table of contents for the archive to sys.stdout.

ZipFile.setpassword(pwd)

Set pwd as the default password to extract encrypted files.

New in version 2.6.

ZipFile.read(name[, pwd])

Return the bytes of the file name in the archive. name is the name of the file in the archive, or a ZipInfo object. The archive must be open for read or append. pwd is the password used for encrypted files and, if specified, it will override the default password set with setpassword(). Calling read() on a closed ZipFile will raise a RuntimeError.

Changed in version 2.6: pwd was added, and name can now be a ZipInfo object.
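Both ways of naming a member for read() can be sketched as follows; greeting.txt and its contents are made up for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('greeting.txt', b'hello world')

with zipfile.ZipFile(buf, 'r') as zf:
    by_name = zf.read('greeting.txt')    # look up by member name
    by_info = zf.read(zf.infolist()[0])  # look up by ZipInfo object

print(by_name == by_info)
```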

ZipFile.testzip()

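Read all the files in the archive and check their CRCs and file headers. Return the name of the first bad file, or else return None. Calling testzip() on a closed ZipFile will raise a RuntimeError.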
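A minimal sketch of an integrity check on a freshly written, undamaged archive; the member names are assumptions for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('one.txt', b'1')
    zf.writestr('two.txt', b'22')

with zipfile.ZipFile(buf, 'r') as zf:
    # None means every member passed the CRC and header checks.
    first_bad = zf.testzip()

print(first_bad)
```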

ZipFile.write(filename[, arcname[, compress_type]])

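Write the file named filename to the archive, giving it the archive name arcname (by default, this will be the same as filename, but without a drive letter and with leading path separators removed). If given, compress_type overrides the value given for the compression parameter to the constructor for the new entry. The archive must be open with mode 'w' or 'a' - calling write() on a ZipFile created with mode 'r' will raise a RuntimeError. Calling write() on a closed ZipFile will raise a RuntimeError.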

Note

There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.

Note

Archive names should be relative to the archive root, that is, they should not start with a path separator.

Note

If arcname (or filename, if arcname is not given) contains a null byte, the name of the file in the archive will be truncated at the null byte.
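The effect of arcname and compress_type can be sketched as below. The source file eggs.txt, the archive spam.zip and the archive name breakfast/eggs.txt are all assumptions for the example, created in a temporary directory; ZIP_DEFLATED assumes zlib is available.

```python
import os
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'eggs.txt')
with open(src, 'wb') as f:
    f.write(b'eggs')

archive = os.path.join(tmpdir, 'spam.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    # Store the file under a relative archive name, compressed with deflate
    # even though the archive default is ZIP_STORED.
    zf.write(src, arcname='breakfast/eggs.txt',
             compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(archive, 'r') as zf:
    names = zf.namelist()
    content = zf.read('breakfast/eggs.txt')

print(names)
```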

ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])

Write the string bytes to the archive; zinfo_or_arcname is either the file name it will be given in the archive, or a ZipInfo instance. If it is an instance, at least the filename, date, and time must be given. If it is a name, the date and time are set to the current date and time. The archive must be opened with mode 'w' or 'a' - calling writestr() on a ZipFile created with mode 'r' will raise a RuntimeError. Calling writestr() on a closed ZipFile will raise a RuntimeError.

If given, compress_type overrides the value given for the compression parameter to the constructor for the new entry, or in the zinfo_or_arcname (if that is a ZipInfo instance).

Note

When passing a ZipInfo instance as the zinfo_or_arcname parameter, the compression method used will be that specified in the compress_type member of the given ZipInfo instance. By default, the ZipInfo constructor sets this member to ZIP_STORED.

Changed in version 2.7: The compress_type argument.
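The note about ZipInfo instances can be illustrated with a short sketch: one member is written from a ZipInfo whose compress_type is set explicitly, the other from a plain name. The member names and the timestamp are made up for the example, and ZIP_DEFLATED assumes zlib is available.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:  # archive default is ZIP_STORED
    info = zipfile.ZipInfo('report.txt', date_time=(2010, 1, 2, 3, 4, 6))
    info.compress_type = zipfile.ZIP_DEFLATED  # per-member override
    zf.writestr(info, b'report body ' * 100)
    zf.writestr('plain.txt', b'plain body')    # uses the archive default

with zipfile.ZipFile(buf, 'r') as zf:
    report_type = zf.getinfo('report.txt').compress_type
    plain_type = zf.getinfo('plain.txt').compress_type

print(report_type == zipfile.ZIP_DEFLATED)
print(plain_type == zipfile.ZIP_STORED)
```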

The following data attributes are also available:

ZipFile.debug

The level of debug output to use. This may be set from 0 (the default, no output) to 3 (the most output). Debugging information is written to sys.stdout.

ZipFile.comment

The comment text associated with the ZIP file. If assigning a comment to a ZipFile instance created with mode 'a' or 'w', this should be a string no longer than 65535 bytes. Comments longer than this will be truncated in the written archive when close() is called.
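A minimal sketch of setting and reading back the archive comment. The comment text is made up, and a bytes literal is used on the assumption that it is accepted both by this version and by newer Python versions, which require bytes here.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('a.txt', b'a')
    zf.comment = b'nightly build'  # written out when the archive is closed

with zipfile.ZipFile(buf, 'r') as zf:
    comment = zf.comment

print(comment)
```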

12.4.2. PyZipFile Objects

The PyZipFile constructor takes the same parameters as the ZipFile constructor. Instances have one method in addition to those of ZipFile objects.

PyZipFile.writepy(pathname[, basename])

Search for files *.py and add the corresponding file to the archive. The corresponding file is a *.pyo file if available, else a *.pyc file, compiling if necessary. If the pathname is a file, the filename must end with .py, and just the (corresponding *.py[co]) file is added at the top level (no path information). If the pathname is a file that does not end with .py, a RuntimeError will be raised. If it is a directory, and the directory is not a package directory, then all the files *.py[co] are added at the top level. If the directory is a package directory, then all *.py[co] are added under the package name as a file path, and if any subdirectories are package directories, all of these are added recursively. basename is intended for internal use only. The writepy() method makes archives with file names like this:

string.pyc                                # Top level name
test/__init__.pyc                         # Package directory
test/test_support.pyc                     # Module test.test_support
test/bogus/__init__.pyc                   # Subpackage directory
test/bogus/myfile.pyc                     # Submodule test.bogus.myfile
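The single-file case of writepy() can be sketched as follows. The module hello_mod.py and the archive name lib.zip are hypothetical, created in a temporary directory; whether the archived file ends in .pyc or .pyo depends on the interpreter and its optimization settings, so the sketch only prints the name it finds.

```python
import os
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
module_path = os.path.join(tmpdir, 'hello_mod.py')  # hypothetical module
with open(module_path, 'w') as f:
    f.write('GREETING = "hello"\n')

archive = os.path.join(tmpdir, 'lib.zip')
with zipfile.PyZipFile(archive, 'w') as pzf:
    # Compiles the module if necessary and adds the compiled file
    # at the top level of the archive (no path information).
    pzf.writepy(module_path)

with zipfile.ZipFile(archive, 'r') as zf:
    names = zf.namelist()

print(names)
```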

12.4.3. ZipInfo Objects

Instances of the ZipInfo class are returned by the getinfo() and infolist() methods of ZipFile objects. Each object stores information about a single member of the ZIP archive.

Instances have the following attributes:

ZipInfo.filename

Name of the file in the archive.

ZipInfo.date_time

The time and date of the last modification to the archive member. This is a tuple of six values:

Index Value
0 Year (>= 1980)
1 Month (one-based)
2 Day of month (one-based)
3 Hours (zero-based)
4 Minutes (zero-based)
5 Seconds (zero-based)

Note

The ZIP file format does not support timestamps before 1980.
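The six-field tuple can be sketched with a round trip through an in-memory archive. The member name and timestamp are made up; an even number of seconds is chosen on purpose so that the value read back matches exactly.

```python
import io
import zipfile

stamp = (1999, 12, 31, 23, 59, 58)  # year, month, day, hours, minutes, seconds

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(zipfile.ZipInfo('old.txt', date_time=stamp), b'old data')

with zipfile.ZipFile(buf, 'r') as zf:
    stored = zf.getinfo('old.txt').date_time

year, month, day, hours, minutes, seconds = stored
print(stored)
```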

ZipInfo.compress_type

Type of compression for the archive member.

ZipInfo.comment

Comment for the individual archive member.

ZipInfo.extra

Expansion field data. The PKZIP Application Note contains some comments on the internal structure of the data contained in this string.

ZipInfo.create_system

System which created ZIP archive.

ZipInfo.create_version

PKZIP version which created ZIP archive.

ZipInfo.extract_version

PKZIP version needed to extract archive.

ZipInfo.reserved

Must be zero.

ZipInfo.flag_bits

ZIP flag bits.

ZipInfo.volume

Volume number of file header.

ZipInfo.internal_attr

Internal attributes.

ZipInfo.external_attr

External file attributes.

ZipInfo.header_offset

Byte offset to the file header.

ZipInfo.CRC

CRC-32 of the uncompressed file.

ZipInfo.compress_size

Size of the compressed data.

ZipInfo.file_size

Size of the uncompressed file.
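Several of the attributes above can be inspected together. The sketch below checks the recorded CRC against zlib.crc32 and compares the two size fields; the member name and data are made up, and the mask on crc32 is there so the comparison also holds on versions where crc32 may return a signed value.

```python
import io
import zipfile
import zlib

data = b'abc' * 500  # 1500 bytes of repetitive sample data

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('abc.txt', data)

with zipfile.ZipFile(buf, 'r') as zf:
    info = zf.getinfo('abc.txt')

# CRC-32 of the uncompressed data, normalized to an unsigned 32-bit value.
crc_matches = info.CRC == (zlib.crc32(data) & 0xffffffff)

print(info.file_size)
print(info.compress_size)
print(crc_matches)
```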