Spiders Contracts

New in version 0.15.

This is a new feature (introduced in Scrapy 0.15) and may be subject to minor functionality/API updates. Check the release notes to be notified of updates.

Testing spiders can get particularly annoying, and while nothing prevents you from writing unit tests, the task gets cumbersome quickly. Scrapy offers an integrated way of testing your spiders by means of contracts.

This allows you to test each callback of your spider by hard-coding a sample URL and checking various constraints on how the callback processes the response. Each contract is prefixed with an @ and included in the docstring. See the following example:

def parse(self, response):
    """ This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://www.amazon.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """
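To illustrate how the @-prefixed lines of a docstring map to a contract name plus its whitespace-separated arguments, here is a minimal stdlib-only sketch. The helper extract_contracts is hypothetical and is not Scrapy's internal parser; it only assumes, as described above, that each contract sits on its own docstring line starting with @.

```python
import re


def extract_contracts(docstring):
    """Return (name, args) pairs for every docstring line starting with '@'.

    Hypothetical helper for illustration only; not Scrapy's own parser.
    """
    contracts = []
    for line in (docstring or "").splitlines():
        line = line.strip()
        match = re.match(r"@(\w+)\s*(.*)", line)
        if match:
            name, rest = match.groups()
            # Arguments are separated by whitespace, e.g. "items 1 16".
            contracts.append((name, rest.split()))
    return contracts


def parse(self, response):
    """ This function parses a sample response. Some contracts are mingled
    with this docstring.

    @url http://www.amazon.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """


for name, args in extract_contracts(parse.__doc__):
    print(name, args)
```

Lines that do not start with @ (the free-form description) are skipped, so prose and contracts can share the same docstring.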

This callback is tested using three built-in contracts:

class scrapy.contracts.default.UrlContract

This contract (@url) sets the sample URL used when checking other contract conditions for this spider. This contract is mandatory. All callbacks lacking this contract are ignored when running the checks:

@url url

class scrapy.contracts.default.ReturnsContract

This contract (@returns) sets lower and upper bounds for the items and requests returned by the spider. The upper bound is optional:

@returns item(s)|request(s) [min [max]]
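The bounds check can be sketched in plain Python as follows. This is an illustration, not Scrapy's ReturnsContract: Request is a hypothetical stand-in for scrapy.Request, items are modelled as plain dicts, and the defaults applied when min or max is omitted (a minimum of 1, no maximum) are an assumption of this sketch.

```python
import math


class Request:
    """Hypothetical stand-in for scrapy.Request, used only for type checks."""

    def __init__(self, url):
        self.url = url


# Map the first @returns argument to the type being counted (dicts model items).
KINDS = {"item": dict, "items": dict, "request": Request, "requests": Request}


def check_returns(output, kind, min_bound=1, max_bound=math.inf):
    """Count objects of `kind` in output and require min_bound <= count <= max_bound.

    Returns the count; raises AssertionError when it falls outside the bounds.
    The defaults (1, unbounded) are assumptions of this sketch.
    """
    obj_type = KINDS[kind]
    count = sum(1 for obj in output if isinstance(obj, obj_type))
    if not min_bound <= count <= max_bound:
        raise AssertionError(
            f"Returned {count} {kind}, expected {min_bound}..{max_bound}")
    return count


# "@returns items 1 16" and "@returns requests 0 0" from the example above.
output = [{"Title": "The Selfish Gene"}, {"Title": "The Extended Phenotype"}]
check_returns(output, "items", 1, 16)    # 2 items: passes
check_returns(output, "requests", 0, 0)  # 0 requests: passes
```

Setting both bounds to 0, as in "@returns requests 0 0", is how the example asserts that the callback yields no follow-up requests at all.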

class scrapy.contracts.default.ScrapesContract

This contract (@scrapes) checks that all the items returned by the callback have the specified fields:

@scrapes field_1 field_2 ...
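A minimal sketch of the field check, again in plain Python rather than Scrapy's own class: items are modelled as dicts, anything that is not a dict is treated as a non-item and skipped, and only the presence of each field name is tested, not its value.

```python
def check_scrapes(output, *fields):
    """Raise AssertionError if any dict item in output lacks one of fields.

    Non-item outputs (anything that is not a dict here) are ignored.
    """
    for item in output:
        if isinstance(item, dict):
            missing = [field for field in fields if field not in item]
            if missing:
                raise AssertionError("Missing fields: " + ", ".join(missing))


# "@scrapes Title Author Year Price": every item must carry all four keys.
# Price is None here; the sketch only checks that the key is present.
complete = {"Title": "The Selfish Gene", "Author": "Richard Dawkins",
            "Year": 1976, "Price": None}
check_scrapes([complete], "Title", "Author", "Year", "Price")  # passes
```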

Use the check command to run the contract checks.

Custom Contracts

If you find you need more power than the built-in Scrapy contracts, you can create and load your own contracts in the project by using the SPIDER_CONTRACTS setting:

SPIDER_CONTRACTS = {
    'myproject.contracts.ResponseCheck': 10,
    'myproject.contracts.ItemValidate': 10,
}

Each contract must inherit from scrapy.contracts.Contract and can override three methods:

class scrapy.contracts.Contract(method, *args)
Parameters:
  • method (function) – callback function to which the contract is associated
  • args (list) – list of arguments passed into the docstring (whitespace separated)
adjust_request_args(args)

This receives a dict as an argument containing the default arguments for the Request object. Must return the same or a modified version of it.
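A minimal sketch of how such hooks could be chained: each contract receives the keyword arguments for the sample request and returns a (possibly modified) version of them. The classes below are hypothetical stand-ins so the snippet runs without Scrapy; UrlLikeContract mimics what the @url contract is described as doing, while TagContract and its meta key are purely illustrative. The starting dict is likewise an assumption, not Scrapy's actual defaults.

```python
class Contract:
    """Stand-in for scrapy.contracts.Contract: stores the callback and args."""

    def __init__(self, method, *args):
        self.method = method
        self.args = list(args)

    def adjust_request_args(self, args):
        return args  # default: leave the request arguments unchanged


class UrlLikeContract(Contract):
    """Mimics @url: put the sample URL into the request arguments."""

    def adjust_request_args(self, args):
        args["url"] = self.args[0]
        return args


class TagContract(Contract):
    """Hypothetical: attach a tag to the request's meta dict."""

    def adjust_request_args(self, args):
        meta = dict(args.get("meta") or {})
        meta["tag"] = self.args[0]
        args["meta"] = meta
        return args


def build_request_args(callback, contracts):
    args = {"callback": callback}  # starting defaults assumed for this sketch
    for contract in contracts:
        # Each hook gets the previous hook's result and returns the next one.
        args = contract.adjust_request_args(args)
    return args


def parse(response):
    pass


contracts = [UrlLikeContract(parse, "http://www.example.com/"),
             TagContract(parse, "smoke-test")]
print(build_request_args(parse, contracts))
```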

pre_process(response)

This allows hooking in various checks on the response received from the sample request, before it is passed to the callback.

post_process(output)

This allows processing the output of the callback. Iterators are converted to lists before being passed to this hook.
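The listification matters because a generator can be consumed only once: if several contracts each iterated over the raw callback output, every contract after the first would see nothing. The stdlib-only sketch below (all function names are hypothetical) converts the output to a list once and hands the same list to each post_process-style hook.

```python
def parse(response):
    # A callback written as a generator, as spider callbacks often are.
    yield {"Title": "The Selfish Gene"}
    yield {"Title": "The Extended Phenotype"}


def count_items(output):
    return sum(1 for obj in output if isinstance(obj, dict))


def run_post_hooks(callback, response, hooks):
    output = list(callback(response))  # listify once, share with every hook
    return [hook(output) for hook in hooks]


# Both hooks see both items because they share the same list.
print(run_post_hooks(parse, None, [count_items, count_items]))  # [2, 2]

# Without listifying, a second pass over the same generator finds nothing.
gen = parse(None)
print(count_items(gen), count_items(gen))  # 2 0
```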

Here is a demo contract which checks the presence of a custom header in the response received. Raise scrapy.exceptions.ContractFail in order to get the failures pretty printed:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail

class HasHeaderContract(Contract):
    """ Demo contract which checks the presence of a custom header
        @has_header X-CustomHeader
    """

    name = 'has_header'

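    def pre_process(self, response):
        # self.args holds the header names listed after @has_header
        for header in self.args:
            if header not in response.headers:
                # Report the header that is actually missing
                raise ContractFail('%s not present' % header)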
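To put the contract above to use, it has to be registered in SPIDER_CONTRACTS and then referenced by its name attribute in a callback's docstring. In the fragment below, the module path myproject.contracts and the sample URL are placeholders; the callback is shown as a bare function so the fragment runs on its own, but in a real project it would be a method of your spider class.

```python
# settings.py: register the custom contract (module path is a placeholder).
SPIDER_CONTRACTS = {
    'myproject.contracts.HasHeaderContract': 10,
}


# In the spider: the callback opts in through its docstring.
def parse(self, response):
    """ Checks that the sample response carries a custom header.

    @url http://www.example.com/
    @has_header X-CustomHeader
    """
    pass
```

Running the check command would then fetch the sample URL and fail this callback if the response lacks X-CustomHeader.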