BucketIterator

BucketIterator automatically shuffles the input sequences and buckets them into batches of similar length.

To enable batch processing, all the input sequences in a batch must have the same length. This is done by padding the shorter sequences to the length of the longest sequence in the batch. Consider the following batch of integer-encoded sequences:

[ [3, 15, 2, 7], 
  [4, 1], 
  [5, 5, 6, 8, 1] ]

Padded with zeros, the batch becomes the following:

[ [3, 15, 2, 7, 0],
  [4, 1, 0, 0, 0],
  [5, 5, 6, 8, 1] ]
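
The padding step can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation; the helper name `pad_batch` and its `pad_value` parameter are hypothetical, and 0 is assumed to be the padding index, as in the example above.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one.

    Hypothetical helper for illustration; pad_value=0 is an assumption
    that matches the padded example in the text.
    """
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


batch = [[3, 15, 2, 7], [4, 1], [5, 5, 6, 8, 1]]
padded = pad_batch(batch)

for row in padded:
    print(row)
```

Every row of `padded` has the same length, so the batch can be converted directly into a rectangular tensor.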

Padding is also most efficient when the sequences in a batch are of similar length, because fewer padding tokens are needed. BucketIterator handles all of this behind the scenes, which makes it a powerful abstraction for text processing.
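
To see why grouping by length helps, here is a minimal sketch that compares the amount of padding needed with and without bucketing. It is a simplified stand-in written in plain Python, not BucketIterator's actual algorithm: the functions `bucket_batches`, `naive_batches`, and `count_padding`, and the toy data, are all assumptions made for illustration.

```python
import random


def naive_batches(sequences, batch_size):
    """Split sequences into batches in their original order."""
    return [sequences[i:i + batch_size]
            for i in range(0, len(sequences), batch_size)]


def bucket_batches(sequences, batch_size, seed=0):
    """Sort by length, split into batches, then shuffle the batch order.

    Simplified illustration only: sequences of similar length end up in
    the same batch, while the order of batches is still randomized.
    """
    ordered = sorted(sequences, key=len)
    batches = naive_batches(ordered, batch_size)
    random.Random(seed).shuffle(batches)
    return batches


def count_padding(batches):
    """Count how many padding tokens are needed to make each batch rectangular."""
    total = 0
    for batch in batches:
        longest = max(len(seq) for seq in batch)
        total += sum(longest - len(seq) for seq in batch)
    return total


# Toy data: sequences with lengths 1, 9, 2, 8, 1, 9, 2, 8.
lengths = [1, 9, 2, 8, 1, 9, 2, 8]
data = [[1] * n for n in lengths]

naive_padding = count_padding(naive_batches(data, batch_size=2))
bucketed_padding = count_padding(bucket_batches(data, batch_size=2))

print("padding without bucketing:", naive_padding)
print("padding with bucketing:", bucketed_padding)
```

On this toy data, the bucketed batches need far fewer padding tokens than the batches built in the original order, which is the efficiency gain the paragraph above describes.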

We want the bucket sorting ...
