Wrapping an Unbounded Iterator to Restrict Its Output

Credit: Tom Good

Problem

You need to filter the sequence produced by a potentially unbounded Python 2.2 iterator or limit the sequence length by a condition.

Solution

Python 2.2 generators are suitable for wrapping other generators (or other kinds of iterators) and tweaking their output—for example, by limiting the output’s length:

from _ _future_ _ import generators

def genWhile(g, condition):
    """ Run generator g, stopping when condition(g.next(  )) is false. condition
    can be any callable. genWhile returns an iterator. """
    g = iter(g)
    while 1:
        next = g.next(  )
        if condition(next):
            yield next
        else:
            return

def take(n, g):
    """ A subiterator limited to the first n items of g's sequence """
    g = iter(g)
    for i in range(n): yield g.next(  )

def drop(n, g):
    """ A subiterator removing the first n items from g's sequence """
    g = iter(g)
    for i in range(n): g.next(  )
    while 1: yield g.next(  )

# an example of an unbounded sequence generator
def genEven(  ):
    x = 0
    while 1:
        x += 2
        yield x

def main(  ):
    print [x for x in genWhile(genEven(  ), lambda x: x<12)]
    print [x for x in take(5, genEven(  ))]
    print [x for x in take(5, drop(5, genEven(  )))]

Discussion

With Python 2.2 and later, you can make iterators that return unbounded output (for example, see Recipe 17.11). By creating a wrapper generator that runs another iterator, you can restrict the resulting sequence to a defined subset. The g=iter(g) idiom at the start of each wrapper in this recipe ensures that you can polymorphically wrap sequences as well as iterators (remember, all generators return iterators, but not all iterators come from generators). The iter built-in function, new in Python 2.2, can be applied to any sequence (in which case, it yields an iterator on that sequence), to any iterator (in which it yields the same iterator on which it was called), or to user-defined objects whose classes define a special method _ _iter_ _ (in this case, iter(x) is the same as x._ _iter_ _).

The genEven generator, given in the recipe as an example, generates all positive even numbers. To see the positive even numbers less than 12, it would be tempting to write something like:

[x for x in genEven(  ) if x < 12]

But this approach does not work. A list-comprehension construct cannot know that in this specific case, once x becomes greater than 12, it will never become less than 12 again. So the list comprehension would keep looking, in case an item less than 12 appears, until genEven terminates (i.e., it would keep looking, and looping, forever). Instead, we can use the genWhile wrapper to get a similar effect, as shown in the main function of the recipe.

The take and drop wrappers are also quite useful, and are patterned on the homonymous functions of Haskell, a language whose semantics are all defined in terms of lazy evaluation. Iterators are Python’s systematic foray into the lazy evaluation field. Previous releases of Python had some ad hoc lazy-evaluation cases, such as xrange and xreadlines, but no systematic conceptual framework for them. Note that take limits sequence length, but drop doesn’t, and drop(n, g) is also an unlimited-sequence iterator, if g is.

Also, each of these wrappers can be freely used on unbounded iterators (such as those iter(s) gives from any sequence s). If any call to g.next from inside a wrapper raises a StopIteration, the exception simply propagates and thus stops the iteration of the wrapper without fuss. So, for example, take(n, g) does not ensure that it yields exactly n items but, rather, at most n.

See Also

Recipe 17.11 shows how to make unbounded iterators; Recipe 17.13 for a more systematic approach to wrapping generators and other iterators.

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.