19.5. XML Processing Modules

Python's interfaces for processing XML are grouped in the xml package.

Warning

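The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities.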

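It is important to note that modules in the xml package require that at least one SAX-compliant XML parser be available. The Expat parser is included with Python, so the xml.parsers.expat module will always be available.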
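As a minimal sketch of what the bundled Expat parser offers, the following Python 3 snippet registers three handlers and records the events produced for a small, trusted document. The helper name collect_events and the sample document are assumptions made for illustration only.

```python
import xml.parsers.expat


def collect_events(document):
    """Parse a trusted XML string with Expat and return the events seen."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Each handler appends a tuple describing the event it received.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    # The second argument marks this chunk as the final one.
    parser.Parse(document, True)
    return events


events = collect_events('<greeting lang="en">hello</greeting>')
for event in events:
    print(event)
```

Character data may arrive in more than one chunk, so code that needs the full text of an element should join the pieces rather than rely on a single callback.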

The documentation for the xml.dom and xml.sax packages is the definition of the Python bindings for the DOM and SAX interfaces.
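To illustrate the difference between the two bindings, here is a small Python 3 sketch that extracts the same data once through the DOM (via xml.dom.minidom) and once through SAX. The document, the TitleHandler class, and the variable names are invented for this example, and the input is assumed to be trusted.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical, trusted sample document.
DOCUMENT = "<catalog><book id='b1'>Dune</book><book id='b2'>Emma</book></catalog>"

# DOM: the whole document is loaded into a tree that can be queried.
dom = xml.dom.minidom.parseString(DOCUMENT)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]


# SAX: the parser pushes events to a handler as it reads the document.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # Text may be delivered in several pieces, so collect them.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False


handler = TitleHandler()
xml.sax.parseString(DOCUMENT.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles, sax_titles)
```

The DOM version is shorter but keeps the full tree in memory; the SAX version needs more code but only holds the state the handler chooses to keep.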

The XML handling submodules are:

19.6. XML vulnerabilities

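The XML processing modules are not secure against maliciously constructed data. An attacker can abuse vulnerabilities for, e.g., denial of service attacks, to access local files, to generate network connections to other machines, or to circumvent firewalls. The attacks on XML abuse unfamiliar features like inline DTD (document type definition) with entities.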

The following table gives an overview of the known attacks and whether the various modules are vulnerable to them.

kind                       sax   etree    minidom  pulldom  xmlrpc
billion laughs             Yes   Yes      Yes      Yes      Yes
quadratic blowup           Yes   Yes      Yes      Yes      Yes
external entity expansion  Yes   No (1)   No (2)   Yes      No (3)
DTD retrieval              Yes   No       No       Yes      No
decompression bomb         No    No       No       No       Yes
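  1. xml.etree.ElementTree doesn't expand external entities and raises a ParseError when an entity occurs.
  2. xml.dom.minidom doesn't expand external entities and simply returns the unexpanded entity verbatim.
  3. xmlrpclib doesn't expand external entities and omits them.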
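The behaviour described in note 1 can be observed with a short Python 3 sketch. The document and the try_parse helper are hypothetical; the system identifier is made up and, given the behaviour described above, is never fetched. In current Python 3 the exception class is xml.etree.ElementTree.ParseError.

```python
import xml.etree.ElementTree as ET

# Hypothetical document declaring an external entity and then using it.
DOCUMENT = (
    "<!DOCTYPE note ["
    '<!ENTITY ext SYSTEM "file:///nonexistent/example.txt">'
    "]>"
    "<note>&ext;</note>"
)


def try_parse(document):
    """Return a short status string instead of letting the error propagate."""
    try:
        ET.fromstring(document)
    except ET.ParseError as exc:
        return "rejected: {}".format(exc)
    return "parsed"


result = try_parse(DOCUMENT)
print(result)
```

This only covers external entities. As the table shows, ElementTree is still listed as vulnerable to the two entity expansion attacks, which rely on internal entities.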
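billion laughs / exponential entity expansion
The Billion Laughs attack (also known as exponential entity expansion) uses multiple levels of nested entities. Each entity refers to another entity several times, and the final entity definition contains a small string. Eventually the small string is expanded to several gigabytes. The exponential expansion consumes lots of CPU time, too.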
quadratic blowup entity expansion
A quadratic blowup attack is similar to a Billion Laughs attack; it abuses entity expansion, too. Instead of nested entities it repeats one large entity with a couple of thousand chars over and over again. The attack isn’t as efficient as the exponential case but it avoids triggering countermeasures of parsers against heavily nested entities.
external entity expansion
Entity declarations can contain more than just text for replacement. They can also point to external resources by public identifiers or system identifiers. System identifiers are standard URIs or can refer to local files. The XML parser retrieves the resource with e.g. HTTP or FTP requests and embeds the content into the XML document.
DTD retrieval
Some XML libraries like Python's xml.dom.pulldom retrieve document type definitions from remote or local locations. The feature has similar implications to the external entity expansion issue.
decompression bomb
The issue of decompression bombs (aka ZIP bombs) applies to all XML libraries that can parse compressed XML streams, such as gzipped HTTP streams or LZMA-compressed files. For an attacker it can reduce the amount of transmitted data by three orders of magnitude or more.
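One common mitigation for decompression bombs is to cap the amount of data a decompressor may produce before the result is handed to an XML parser. The following Python 3 sketch does this for gzip data with the standard zlib module. The function name bounded_gunzip and the 1024-byte limit are assumptions for illustration, not part of any XML module.

```python
import gzip
import zlib


def bounded_gunzip(data, limit):
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    # Adding 16 to the window size tells zlib to expect gzip framing.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # Ask for one byte more than the limit so that an overflow is detectable.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise ValueError("decompressed data exceeds %d bytes" % limit)
    return output


# A small payload passes through unchanged.
small = bounded_gunzip(gzip.compress(b"<doc/>"), 1024)

# One million repeated bytes compress to a tiny payload, yet are rejected.
bomb = gzip.compress(b"a" * 1_000_000)
try:
    bounded_gunzip(bomb, 1024)
    rejected = False
except ValueError:
    rejected = True

print(small, len(bomb), rejected)
```

Because the decompressor stops once the cap is reached, the oversized output is never fully materialized in memory.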

The documentation of defusedxml on PyPI has further information about all known attack vectors with examples and references.
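To make the scale of the two entity expansion attacks described above concrete, the arithmetic can be sketched in a few lines of Python 3. No XML is built or parsed here. The function names and all of the numbers are illustrative assumptions, not measurements.

```python
def exponential_expansion_size(seed_length, references_per_level, levels):
    """Characters produced when each level refers to the level below it
    `references_per_level` times and the innermost entity has `seed_length` chars."""
    return seed_length * references_per_level ** levels


def quadratic_expansion_size(entity_length, reference_count):
    """Characters produced when one entity of `entity_length` chars is
    referenced `reference_count` times, with no nesting."""
    return entity_length * reference_count


# Assumed figures: a 3-character seed, 10 references per level, 9 levels.
billion_laughs = exponential_expansion_size(3, 10, 9)

# Assumed figures: one 3,000-character entity referenced 100,000 times.
# A reference such as "&a;" costs 3 characters in the document itself.
quadratic = quadratic_expansion_size(3_000, 100_000)
quadratic_document = 3_000 + 100_000 * 3

print(billion_laughs)       # 3000000000
print(quadratic)            # 300000000
print(quadratic_document)   # 303000
```

With these assumed numbers, a document of roughly 300 KB expands to about 300 MB in the quadratic case, while the nested case reaches about 3 GB from a document of well under a kilobyte. If both the entity length and the number of references grow in proportion to the document size, the output grows with the square of that size, which is where the name "quadratic blowup" comes from.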

19.6.1. defused packages

These external packages are recommended for any code that parses untrusted XML data.

defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. The package also ships with example exploits and extended documentation on more XML exploits like XPath injection.
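The general idea of refusing risky XML features can be sketched with the standard library alone. The Python 3 example below is not defusedxml's implementation; it is a minimal illustration, under the assumption that rejecting every DOCTYPE declaration is acceptable for the application. The names DTDForbidden and parse_without_dtd are hypothetical.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical exception raised when a document contains a DOCTYPE."""


def parse_without_dtd(document):
    """Parse XML bytes with Expat, aborting as soon as a DOCTYPE starts.

    Returns the element names seen, in document order.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here propagates out of parser.Parse() before any
        # entity declared in the DTD can be used.
        raise DTDForbidden("DOCTYPE declaration %r is not allowed" % name)

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


print(parse_without_dtd(b"<a><b/></a>"))

try:
    parse_without_dtd(b'<!DOCTYPE a [<!ENTITY e "x">]><a>&e;</a>')
except DTDForbidden as exc:
    print("refused:", exc)
```

Rejecting the DOCTYPE outright rules out inline entity definitions, which the entity expansion attacks depend on. It also rejects legitimate documents that use a DTD, so it is a trade-off rather than a universal fix.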

defusedexpat provides a modified libexpat and patched replacement pyexpat extension module with countermeasures against entity expansion DoS attacks. Defusedexpat still allows a sane and configurable amount of entity expansions. The modifications will be merged into future releases of Python.

The workarounds and modifications are not included in patch releases as they break backward compatibility. After all, inline DTD and entity expansion are well-defined XML features.