Journeyman Python:

Learn Python by reading well-designed software code


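The best way to learn any programming language or technology is to read other people's code. And what better way to learn than to read the code of some of the best-designed open source Python libraries? In this book, we will read selected code from Django, Flask and Pandas.

We will learn about topics such as decorators, context managers, generators, iterators and itertools, as well as the common design patterns used by Django, Flask and Pandas.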

How Pandas uses first class functions

Iterators, slicing and generators in SQLAlchemy

How Django uses decorators to simplify APIs

Understanding Python magic methods by reading Django queryset source code

What are magic methods?

Django querysets are amazing. We use them every day, but rarely think about the wonderful API they give us. Here are just some of the amazing properties of querysets:

  • You can take a slice of them, queryset[i:j], and only the required objects are pulled from the DB.
  • You can look up a specific object, queryset[i], and only the required object is pulled from the DB.
  • You can iterate over them, for user in users_queryset, as if they were a list.
  • You can AND or OR them and they apply the criteria at the SQL level.
  • You can use them like a boolean, if users_queryset: users_queryset.update(first_name="Batman")
  • You can pickle and unpickle them, even when the individual instances may not be picklable.
  • You can get a useful representation of the queryset in the Python CLI or IPython. Even if the queryset consists of thousands of records, only the first 20 records will be printed.

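Querysets get all these properties by implementing Python magic methods, also known as dunder methods. So why do you need these magic dunder methods? Because they make the API much cleaner to use.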

It is more intuitive to say if users_queryset: users_queryset.do_something() than if users_queryset.as_boolean: users_queryset.do_something(). Likewise, it is more intuitive to say queryset_1 & queryset_2 than queryset_1.do_and(queryset_2).

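Magic methods are methods implemented by a class that have a special meaning to the Python interpreter. Their names always start and end with __, and they are sometimes called dunder methods. (Dunder == double underscore).

QuerySet and related classes implement the following methods to get the properties we listed above.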

  • __getitem__: For queryset[i:j] and queryset[i]
  • __iter__: For for user in users_queryset
  • __and__ and __or__ for queryset_1 & queryset_2 and queryset_1 | queryset_2
  • __bool__ to use them like a boolean
  • __getstate__ and __setstate__ to pickle and unpickle them
  • __repr__ to get a useful representation and to limit the DB hit
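To make the list above concrete, here is a minimal sketch of a list-backed class that implements several of the same dunder methods. The Playlist class and its data are made up for illustration; it holds everything in memory and is not how Django implements querysets.

```python
class Playlist:
    """A tiny in-memory collection (hypothetical, for illustration only)."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __getitem__(self, k):
        # Supports both playlist[i] and playlist[i:j].
        if isinstance(k, slice):
            return Playlist(self._songs[k])
        return self._songs[k]

    def __iter__(self):
        # Supports `for song in playlist`.
        return iter(self._songs)

    def __and__(self, other):
        # playlist_1 & playlist_2: songs present in both, in our order.
        return Playlist(s for s in self._songs if s in other._songs)

    def __or__(self, other):
        # playlist_1 | playlist_2: our songs, then the other's new ones.
        extra = [s for s in other._songs if s not in self._songs]
        return Playlist(self._songs + extra)

    def __bool__(self):
        # Supports `if playlist:`.
        return bool(self._songs)

    def __repr__(self):
        return "<Playlist %r>" % self._songs


rock = Playlist(["Kashmir", "Paranoid", "Hurt"])
covers = Playlist(["Hurt", "Jolene"])

print(rock[0:2])      # <Playlist ['Kashmir', 'Paranoid']>
print(rock & covers)  # <Playlist ['Hurt']>
if rock | covers:
    print("there is something to play")
```

None of these call sites mention a method name; the interpreter maps the syntax onto the dunder methods.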

We will look at how Django 2.0 does it.

Implementing __getitem__

The code looks like this:

def __getitem__(self, k):
    """Retrieve an item or slice from the set of results."""
    if not isinstance(k, (int, slice)):
        raise TypeError
    assert ((not isinstance(k, slice) and (k >= 0)) or
            (isinstance(k, slice) and (k.start is None or k.start >= 0) and
             (k.stop is None or k.stop >= 0))), \
        "Negative indexing is not supported."

    if self._result_cache is not None:
        return self._result_cache[k]

    if isinstance(k, slice):
        qs = self._chain()
        if k.start is not None:
            start = int(k.start)
        else:
            start = None
        if k.stop is not None:
            stop = int(k.stop)
        else:
            stop = None
        qs.query.set_limits(start, stop)
        return list(qs)[::k.step] if k.step else qs

There is a lot going on here, but each if block is simple.

  • In the first block, we make sure the key is an int or a slice and that it has sane (non-negative) values.
  • In the second block, if _result_cache is filled, that is, the queryset has already been evaluated, we return the item or slice from the cache and skip hitting the DB again.
  • If _result_cache is not filled, we call qs.query.set_limits(start, stop), which sets the LIMIT and OFFSET in SQL.
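The idea of turning a slice into LIMIT and OFFSET can be sketched without a database. The LazyRows class, its as_sql method and the table name are all hypothetical; the sketch only handles step-less slices of the whole table and does not compose slices of slices.

```python
class LazyRows:
    """Records LIMIT/OFFSET instead of fetching rows (hypothetical sketch)."""

    def __init__(self, table, offset=0, limit=None):
        self.table = table
        self.offset = offset
        self.limit = limit

    def __getitem__(self, k):
        if not isinstance(k, slice):
            raise TypeError("this sketch only handles slices")
        if k.step is not None:
            raise ValueError("this sketch does not handle a slice step")
        start = 0 if k.start is None else int(k.start)
        stop = None if k.stop is None else int(k.stop)
        if start < 0 or (stop is not None and stop < 0):
            raise ValueError("Negative indexing is not supported.")
        # rows[start:stop] becomes OFFSET start, LIMIT stop - start.
        limit = None if stop is None else max(stop - start, 0)
        return LazyRows(self.table, offset=start, limit=limit)

    def as_sql(self):
        sql = "SELECT * FROM %s" % self.table
        if self.limit is not None:
            sql += " LIMIT %d" % self.limit
        if self.offset:
            sql += " OFFSET %d" % self.offset
        return sql


page = LazyRows("auth_user")[20:30]
print(page.as_sql())  # SELECT * FROM auth_user LIMIT 10 OFFSET 20
```

Nothing is fetched when the slice is taken; the slice only changes the query that would eventually run.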

Implementing __iter__

def __iter__(self):
    # ...
    self._fetch_all()
    return iter(self._result_cache)

Pretty straightforward: we populate the data, then use the builtin iter to return an iterator.

It is also instructive to look at FlatValuesListIterable.__iter__, which uses yield to implement __iter__.

class FlatValuesListIterable(BaseIterable):
    """
    Iterable returned by QuerySet.values_list(flat=True) that yields single
    values.
    """

    def __iter__(self):
        queryset = self.queryset
        compiler = queryset.query.get_compiler(queryset.db)
        for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
            yield row[0]
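Writing __iter__ as a generator function has a useful side effect: every call returns a fresh generator, so the same object can be iterated more than once. A minimal sketch, with a made-up FirstColumn class and in-memory rows standing in for the compiler results:

```python
class FirstColumn:
    """Yields the first value of each row (hypothetical stand-in)."""

    def __init__(self, rows):
        self.rows = rows

    def __iter__(self):
        # Because of `yield`, calling __iter__ returns a new generator.
        for row in self.rows:
            yield row[0]


names = FirstColumn([("alice", 30), ("bob", 25)])
print(list(names))  # ['alice', 'bob']
print(list(names))  # ['alice', 'bob'] again: a second, independent pass
```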

Implementing __and__ and __or__

The code looks like this:

def __and__(self, other):
    self._merge_sanity_check(other)
    if isinstance(other, EmptyQuerySet):
        return other
    if isinstance(self, EmptyQuerySet):
        return self
    combined = self._chain()
    combined._merge_known_related_objects(other)
    combined.query.combine(other.query, sql.AND)
    return combined

We do some sanity checks on the querysets, return early if either queryset is empty, and then apply SQL AND using combined.query.combine(other.query, sql.AND). __or__ is essentially the same, except that the queries are combined using combined.query.combine(other.query, sql.OR).
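The same operator-overloading idea can be sketched with plain strings. The Criteria class below is hypothetical and builds a WHERE fragment instead of a Django query object; the point is only that & and | return a new combined object rather than evaluating anything.

```python
class Criteria:
    """Holds a WHERE fragment; combining returns a new Criteria (sketch)."""

    def __init__(self, where):
        self.where = where

    def __and__(self, other):
        return Criteria("(%s) AND (%s)" % (self.where, other.where))

    def __or__(self, other):
        return Criteria("(%s) OR (%s)" % (self.where, other.where))


active = Criteria("is_active = 1")
staff = Criteria("is_staff = 1")

print((active & staff).where)  # (is_active = 1) AND (is_staff = 1)
print((active | staff).where)  # (is_active = 1) OR (is_staff = 1)
```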

Implementing __bool__

The code looks like this:

def __bool__(self):
    self._fetch_all()
    return bool(self._result_cache)

Pretty straightforward, _fetch_all() ensures that the queryset is evaluated, and _result_cache is filled. We then return the boolean equivalent of _result_cache, which means if there are any records, you will get a True.

Implementing __getstate__ and __setstate__

__getstate__ and __setstate__ look like this:

def __getstate__(self):
    # Force the cache to be fully populated.
    self._fetch_all()
    return {**self.__dict__, DJANGO_VERSION_PICKLE_KEY: get_version()}

def __setstate__(self, state):
    msg = None
    pickled_version = state.get(DJANGO_VERSION_PICKLE_KEY)
    if pickled_version:
        current_version = get_version()
        if current_version != pickled_version:
            msg = (
                "Pickled queryset instance's Django version %s does not "
                "match the current version %s." % (pickled_version, current_version)
            )
    else:
        msg = "Pickled queryset instance's Django version is not specified."

    if msg:
        warnings.warn(msg, RuntimeWarning, stacklevel=2)

    self.__dict__.update(state)

While pickling, we ensure data is populated, then use self.__dict__ to get queryset representation, and return it along with Django version. While unpickling, __setstate__ ensures that a warning is raised when pickled querysets are used across Django versions.

On a related note, {**self.__dict__, DJANGO_VERSION_PICKLE_KEY: get_version()} shows why you should move to Python 3. This syntax for merging dictionaries doesn't work in Python 2.
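The same pickling pattern can be tried on a small class. This is a minimal sketch: Snapshot, VERSION_KEY and CURRENT_VERSION are made-up names, and the version string is hard-coded rather than read from a library.

```python
import pickle
import warnings

VERSION_KEY = "_sketch_version"  # hypothetical key, not Django's
CURRENT_VERSION = "2.0"          # hard-coded for the sketch


class Snapshot:
    def __init__(self, rows):
        self.rows = rows

    def __getstate__(self):
        # Merge the instance state with one extra key (Python 3.5+ syntax).
        return {**self.__dict__, VERSION_KEY: CURRENT_VERSION}

    def __setstate__(self, state):
        if state.get(VERSION_KEY) != CURRENT_VERSION:
            warnings.warn(
                "Pickled with a different version.",
                RuntimeWarning,
                stacklevel=2,
            )
        self.__dict__.update(state)


restored = pickle.loads(pickle.dumps(Snapshot([1, 2, 3])))
print(restored.rows)  # [1, 2, 3]
```

pickle calls __getstate__ when dumping and hands the resulting dictionary to __setstate__ when loading, which is where the version check gets its chance to run.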

Implementing __repr__

The code for __repr__ looks like this:

def __repr__(self):
    data = list(self[:REPR_OUTPUT_SIZE + 1])
    if len(data) > REPR_OUTPUT_SIZE:
        data[-1] = "...(remaining elements truncated)..."
    return '<%s %r>' % (self.__class__.__name__, data)

This is straightforward, but has a few nice tricks worth looking at.

self[:REPR_OUTPUT_SIZE + 1] does slicing, which, because we implemented __getitem__, runs a ... LIMIT ... OFFSET ... query.

REPR_OUTPUT_SIZE ensures that we don't pull in the whole queryset just to display it; we pull only REPR_OUTPUT_SIZE + 1 records. On the next line, len(data) > REPR_OUTPUT_SIZE lets us check whether there were more records without hitting the DB again.
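The "fetch one extra record" trick works outside Django too. In this sketch, truncated_repr and fetch_numbers are hypothetical helpers, and the limit is set to 3 just to keep the output short.

```python
REPR_OUTPUT_SIZE = 3  # small value chosen for the sketch


def truncated_repr(name, fetch):
    # Ask for one more item than we intend to show.
    data = list(fetch(REPR_OUTPUT_SIZE + 1))
    if len(data) > REPR_OUTPUT_SIZE:
        # The extra item proves there is more; replace it with a marker.
        data[-1] = "...(remaining elements truncated)..."
    return "<%s %r>" % (name, data)


def fetch_numbers(limit):
    # Stands in for a query with a LIMIT clause.
    return range(100)[:limit]


print(truncated_repr("Numbers", fetch_numbers))
# <Numbers [0, 1, 2, '...(remaining elements truncated)...']>
```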

Final thoughts

Magic (dunder) methods provide a clean, straightforward way to give your classes a clean API. Despite their name, they don't contain any hidden magic, and should be used where it makes sense.

Understanding Python context managers by reading Django source code

Django comes with a bunch of useful context managers. We will read their source code to find out what context managers can do and how to implement them, including some best practices.

The three I use most are

  • transaction.atomic - To get an atomic transaction block
  • TestCase.settings - To change settings during a test run
  • connection.cursor - To get a raw cursor

connection.cursor is generally implemented in the actual DB backends such as psycopg2, so we will focus on transaction.atomic, TestCase.settings and a few other context managers.

What is a context manager?

Context managers are a code pattern for:

  • Step 1: Do something
  • Step 2: Do something else
  • Step 3: Final step, this step must be guaranteed to run.

For example when you say

with transaction.atomic():
    # This code executes inside a transaction.
    do_more_stuff()

What you really want is:

  • create a savepoint
  • do_more_stuff()
  • Commit or rollback the savepoint

Similarly, when you say (Inside a django.test.TestCase)

with self.settings(LOGIN_URL='/other/login/'):
    response = self.client.get('/sekrit/')

What you want is

  • Change settings to LOGIN_URL='/other/login/'
  • response = self.client.get('/sekrit/'), then assert something on the response with the changed setting.
  • Change settings back to what existed at start.

A context manager provides a clean API to enforce this three-step workflow.

Some non-Django context managers

The most common context manager is

with open('alice-in-wonderland.txt', 'r') as infile:
    lines = infile.readlines()
    do_something_more()

If you did not have the open context manager, you would need to do the below every time, because you need to ensure infile.close() is called.

# Open before the try block, so that finally never sees an unbound infile.
infile = open('alice-in-wonderland.txt', 'r')
try:
    lines = infile.readlines()
    do_something_more()
finally:
    infile.close()

Another common use is

a_lock = threading.Lock()

with a_lock:
    do_something_more()

And without a context manager, this would have been:

a_lock.acquire()
try:
    do_something_more()
finally:
    a_lock.release()

So at a high level, context managers are syntactic sugar for a try: ... finally: ... block. This is important, so I will repeat: context managers are syntactic sugar for a try: ... finally: ... block.
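To see the try/finally behaviour directly, here is a small sketch with a made-up Tracker class that records when its __enter__ and __exit__ run. The second half spells out by hand roughly what the with statement does; the real expansion also passes exception details to __exit__.

```python
class Tracker:
    """Records enter/exit calls (hypothetical, for illustration)."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not swallow the exception


tracker = Tracker()
try:
    with tracker:
        raise ValueError("boom")
except ValueError:
    pass
print(tracker.events)  # ['enter', 'exit']: exit ran despite the error

# Roughly the same thing, written by hand for the no-exception case.
manual = Tracker()
manual.__enter__()
try:
    pass  # the body of the with block would go here
finally:
    manual.__exit__(None, None, None)
print(manual.events)  # ['enter', 'exit']
```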

Implementing context managers

Context managers can be implemented as a class with two required methods and an optional __init__:

  • __enter__: what to do when the context starts
  • __exit__: what to do when the context ends
  • __init__: if your context manager requires arguments

Alternatively, you can use contextlib.contextmanager with yield statements to get a context manager. We will see an example in the next section.
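As a minimal sketch of the class-based approach, the hypothetical Tag class below takes arguments in __init__, does its setup in __enter__ and its teardown in __exit__. It writes to a plain list so the result is easy to inspect.

```python
class Tag:
    """Wraps whatever is appended inside the block in <name>...</name>."""

    def __init__(self, name, out):
        # __init__ receives the arguments passed in `with Tag(...)`.
        self.name = name
        self.out = out

    def __enter__(self):
        # Runs when the block starts; the return value is bound by `as`.
        self.out.append("<%s>" % self.name)
        return self.out

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, whether or not it raised.
        self.out.append("</%s>" % self.name)
        return False


out = []
with Tag("p", out) as body:
    body.append("hello")

print("".join(out))  # <p>hello</p>
```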

A simple Django context manager

In django/tests/backends/mysql/tests.py, Django implements a very simple context manager.

@contextmanager
def get_connection():
    new_connection = connection.copy()
    yield new_connection
    new_connection.close()

And then uses it like this:

def test_setting_isolation_level(self):
    with get_connection() as new_connection:
        new_connection.settings_dict['OPTIONS']['isolation_level'] = self.other_isolation_level
        self.assertEqual(
            self.get_isolation_level(new_connection),
            self.isolation_values[self.other_isolation_level]
        )

There is some code here which doesn't immediately concern us; let us just focus on with get_connection() as new_connection:

Using @contextmanager, here is what happened:

  • The part before the yield, new_connection = connection.copy(), handles the context setup.
  • The yield new_connection part makes new_connection available as the target of as new_connection.
  • The part after the yield, new_connection.close(), handles the context teardown.
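One detail worth knowing when writing your own: with @contextmanager, an exception raised inside the with block is re-raised at the yield, so code after the yield is skipped unless it sits in a finally clause. A minimal sketch, using a made-up FakeConnection in place of a real database connection:

```python
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a DB connection (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def managed_connection():
    conn = FakeConnection()   # setup
    try:
        yield conn            # value bound by `as`
    finally:
        conn.close()          # teardown, runs even if the block raises


try:
    with managed_connection() as conn:
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(conn.closed)  # True
```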

Let's look at TestCase.settings next, which uses the __enter__ / __exit__ protocol.

Implementing TestCase.settings

TestCase.settings is implemented as

def settings(self, **kwargs):
    """
    A context manager that temporarily sets a setting and reverts to the
    original value when exiting the context.
    """
    return override_settings(**kwargs)

There is a bit of class hierarchy to jump through, which takes us from

TestCase.settings -> override_settings -> TestContextDecorator

Skipping the part we don’t care about, we get

class TestContextDecorator:
    # ...
    def enable(self):
        raise NotImplementedError

    def disable(self):
        raise NotImplementedError

    def __enter__(self):
        return self.enable()

    def __exit__(self, exc_type, exc_value, traceback):
        self.disable()

And then override_settings implements .enable and .disable

class override_settings(TestContextDecorator):
    # ...
    def enable(self):
        # Keep this code at the beginning to leave the settings unchanged
        # in case it raises an exception because INSTALLED_APPS is invalid.
        if 'INSTALLED_APPS' in self.options:
            try:
                apps.set_installed_apps(self.options['INSTALLED_APPS'])
            except Exception:
                apps.unset_installed_apps()
                raise
        override = UserSettingsHolder(settings._wrapped)
        for key, new_value in self.options.items():
            setattr(override, key, new_value)
        self.wrapped = settings._wrapped
        settings._wrapped = override
        for key, new_value in self.options.items():
            setting_changed.send(sender=settings._wrapped.__class__,
                                 setting=key, value=new_value, enter=True)

    def disable(self):
        if 'INSTALLED_APPS' in self.options:
            apps.unset_installed_apps()
        settings._wrapped = self.wrapped
        del self.wrapped
        for key in self.options:
            new_value = getattr(settings, key, None)
            setting_changed.send(sender=settings._wrapped.__class__,
                                 setting=key, value=new_value, enter=False)

There is a lot of boilerplate here which is interesting, but skipping the state management, we see

class override_settings(TestContextDecorator):
    # ...
    def enable(self):
        # ...
        # This gets called by __enter__
        for key, new_value in self.options.items():
            setattr(override, key, new_value)
        self.wrapped = settings._wrapped
        settings._wrapped = override
        for key, new_value in self.options.items():
            setting_changed.send(sender=settings._wrapped.__class__,
                                 setting=key, value=new_value, enter=True)

    def disable(self):
        # ...
        # This gets called by __exit__
        for key in self.options:
            new_value = getattr(settings, key, None)
            setting_changed.send(sender=settings._wrapped.__class__,
                                 setting=key, value=new_value, enter=False)
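The enable/disable split can be reproduced in a few lines. In this sketch the Config class, the config object and override_config are all hypothetical; there are no signals and no lazy settings wrapper, only the save, override and restore steps.

```python
class Config:
    DEBUG = False
    LOGIN_URL = "/login/"


config = Config()  # stands in for a global settings object


class override_config:
    def __init__(self, **options):
        self.options = options

    def enable(self):
        # Remember the current values, then apply the overrides.
        self.saved = {key: getattr(config, key) for key in self.options}
        for key, new_value in self.options.items():
            setattr(config, key, new_value)

    def disable(self):
        # Put back exactly what was there before enable().
        for key, old_value in self.saved.items():
            setattr(config, key, old_value)
        del self.saved

    def __enter__(self):
        return self.enable()

    def __exit__(self, exc_type, exc_value, traceback):
        self.disable()


with override_config(LOGIN_URL="/other/login/"):
    inside = config.LOGIN_URL
after = config.LOGIN_URL

print(inside, after)  # /other/login/ /login/
```

Keeping enable and disable as ordinary methods means the same object can be driven by __enter__/__exit__, by a decorator, or by explicit calls in test setup and teardown.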

Implementing a context manager which can also be used as a decorator

Just as you can say with transaction.atomic():, you can get the same effect by using it as a decorator.

@transaction.atomic
def do_something():
    # this must run in a transaction
    ...

Using a context manager as a decorator is a common pattern, and Django does the same for atomic. contextlib.ContextDecorator makes this straightforward.

# class Atomic is implemented later
def atomic(using=None, savepoint=True):
    # Bare decorator: @atomic -- although the first argument is called
    # `using`, it's actually the function being decorated.
    if callable(using):
        return Atomic(DEFAULT_DB_ALIAS, savepoint)(using)
    # Decorator: @atomic(...) or context manager: with atomic(...): ...
    else:
        return Atomic(using, savepoint)

class Atomic(ContextDecorator):
    # There are a lot of complicated corner cases and error handling.
    # See the gory details in django/django/db/transaction.py
    def __init__(self, using, savepoint):
        self.using = using
        self.savepoint = savepoint

    def __enter__(self):
        connection = get_connection(self.using)
        # ...
        # sid = connection.savepoint()
        # connection.savepoint_ids.append(sid)

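    def __exit__(self, exc_type, exc_value, traceback):
        # Look the connection up again; __enter__'s local is not visible here.
        connection = get_connection(self.using)
        # Skip the gory details
        # ...
        sid = connection.savepoint_ids.pop()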
        if sid is not None:
            try:
                connection.savepoint_commit(sid)
            except DatabaseError:
                connection.savepoint_rollback(sid)

Final thoughts

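Context managers provide a simple API for a powerful construct. Even though they are just syntactic sugar, they make for an excellent API and, together with the contextlib module, are easy to implement.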
