Text Processing in Python

Book Description

Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.

Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.

Here is some of what you will find in the book:

  • When do I use formal parsers to process structured and semi-structured data? Page 257

  • How do I work with full text indexing? Page 199

  • What patterns in text can be expressed using regular expressions? Page 204

  • How do I find a URL or an email address in text? Page 228

  • How do I process a report with a concrete state machine? Page 274

  • How do I parse, create, and manipulate internet formats? Page 345

  • How do I handle lossless and lossy compression? Page 454

  • How do I find codepoints in Unicode? Page 465




Table of Contents

  1. Copyright
  2. Preface
    1. 0.1. What Is Text Processing?
    2. 0.2. The Philosophy of Text Processing
    3. 0.3. What You'll Need to Use This Book
    4. 0.4. Conventions Used in This Book
    5. 0.5. A Word on Source Code Examples
    6. 0.6. External Resources
      1. 0.6.1. General Resources
      2. 0.6.2. Books
      3. 0.6.3. Software Directories
      4. 0.6.4. Specific Software
  3. Acknowledgments
  4. 1. Python Basics
    1. 1.1. Techniques and Patterns
      1. 1.1.1. Utilizing Higher-Order Functions in Text Processing
      2. 1.1.2. Exercise: More on combinatorial functions
        1. QUESTIONS
      3. 1.1.3. Specializing Python Datatypes
        1. PYTHONIC POLYMORPHISM
        2. ENHANCED OBJECTS
      4. 1.1.4. Base Classes for Datatypes
        1. METHODS
          1. object.__eq__(self, other)
          2. object.__ne__(self, other)
          3. object.__nonzero__(self)
          4. object.__len__(self)len(object)
          5. object.__repr__(self)repr(object)object.__str__(self)str(object)
        2. BUILT-IN FUNCTIONS
          1. open(fname [,mode [,buffering]])file(fname [,mode [,buffering]])
        3. METHODS AND ATTRIBUTES
          1. FILE.close()
          2. FILE.closed
          3. FILE.fileno()
          4. FILE.flush()
          5. FILE.isatty()
          6. FILE.mode
          7. FILE.name
          8. FILE.read([size=sys.maxint])
          9. FILE.readline([size=sys.maxint])
          10. FILE.readlines([size=sys.maxint])
          11. FILE.seek(offset [,whence=0])
          12. FILE.tell()
          13. FILE.truncate([size=0])
          14. FILE.write(s)
          15. FILE.writelines(lines)
          16. FILE.xreadlines()
        4. METHODS
          1. int.__and__(self, other)int.__rand__(self, other)
          2. int.__hex__(self)
          3. int.__invert__(self)
          4. int.__lshift__(self, other)int.__rlshift__(self, other)
          5. int.__oct__(self)
          6. int.__or__(self, other)int.__ror__(self, other)
          7. int.__rshift__(self, other)int.__rrshift__(self, other)
          8. int.__xor__(self, other)int.__rxor__(self, other)
        5. DIGRESSION
        6. CAPABILITIES
        7. METHODS
          1. float.__abs__(self)
          2. float.__add__(self, other)float.__radd__(self, other)
          3. float.__cmp__(self, other)
          4. float.__div__(self, other)float.__rdiv__(self, other)
          5. float.__divmod__(self, other)float.__rdivmod__(self, other)
          6. float.__floordiv__(self, other)float.__rfloordiv__(self, other)
          7. float.__mod__(self, other)float.__rmod__(self, other)
          8. float.__mul__(self, other)float.__rmul__(self, other)
          9. float.__neg__(self)
          10. float.__pow__(self, other)float.__rpow__(self, other)
          11. float.__sub__(self, other)float.__rsub__(self, other)
          12. float.__truediv__(self, other)float.__rtruediv__(self, other)
        8. METHODS
          1. complex.conjugate(self)
          2. complex.imag
          3. complex.real
        9. METHODS
          1. dict.__cmp__(self, other)UserDict.UserDict.__cmp__(self, other)
          2. dict.__contains__(self, x)UserDict.UserDict.__contains__(self, x)
          3. dict.__delitem__(self, x)UserDict.UserDict.__delitem__(self, x)
          4. dict.__getitem__(self, x)UserDict.UserDict.__getitem__(self, x)
          5. dict.__len__(self)UserDict.UserDict.__len__(self)
          6. dict.__setitem__(self, key, val)UserDict.UserDict.__setitem__(self, key, val)
          7. dict.clear(self)UserDict.UserDict.clear(self)
          8. dict.copy(self)UserDict.UserDict.copy(self)
          9. dict.get(self, key [,default=None])UserDict.UserDict.get(self, key [,default=None])
          10. dict.has_key(self, key)UserDict.UserDict.has_key(self, key)
          11. dict.items(self)UserDict.UserDict.items(self)dict.iteritems(self)UserDict.UserDict.iteritems(self)
          12. dict.keys(self)UserDict.UserDict.keys(self)dict.iterkeys(self)UserDict.UserDict.iterkeys(self)
          13. dict.popitem(self)UserDict.UserDict.popitem(self)
          14. dict.setdefault(self, key [,default=None])UserDict.UserDict.setdefault(self, key [,default=None])
          15. dict.update(self, other)UserDict.UserDict.update(self, other)
          16. dict.values(self)UserDict.UserDict.values(self)dict.itervalues(self)UserDict.UserDict.itervalues(self)
        10. METHODS
          1. list.__add__(self, other)UserList.UserList.__add__(self, other)tuple.__add__(self, other)list.__iadd__(self, other)UserList.UserList.__iadd__(self, other)
          2. list.__contains__(self, x)UserList.UserList.__contains__(self, x)tuple.__contains__(self, x)
          3. list.__delitem__(self, x)UserList.UserList.__delitem__(self, x)
          4. list.__delslice__(self, start, end)UserList.UserList.__delslice__(self, start, end)
          5. list.__getitem__(self, pos)UserList.UserList.__getitem__(self, pos)tuple.__getitem__(self, pos)
          6. list.__getslice__(self, start, end)UserList.UserList.__getslice__(self, start, end)tuple.__getslice__(self, start, end)
          7. list.__hash__(self)UserList.UserList.__hash__(self)tuple.__hash__(self)
          8. list.__len__(self)UserList.UserList.__len__(self)tuple.__len__(self)
          9. list.__mul__(self, num)UserList.UserList.__mul__(self, num)tuple.__mul__(self, num)list.__rmul__(self, num)UserList.UserList.__rmul__(self, num)tuple.__rmul__(self, num)list.__imul__(self, num)UserList.UserList.__imul__(self, num)
          10. list.__setitem__(self, pos, val)UserList.UserList.__setitem__(self, pos, val)
          11. list.__setslice__(self, start, end, other)UserList.UserList.__setslice__(self, start, end, other)
          12. list.append(self, item)UserList.UserList.append(self, item)
          13. list.count(self, item)UserList.UserList.count(self, item)
          14. list.extend(self, seq)UserList.UserList.extend(self, seq)
          15. list.index(self, item)UserList.UserList.index(self, item)
          16. list.insert(self, pos, item)UserList.UserList.insert(self, pos, item)
          17. list.pop(self [,pos=-1])UserList.UserList.pop(self [,pos=-1])
          18. list.remove(self, item)UserList.UserList.remove(self, item)
          19. list.reverse(self)UserList.UserList.reverse(self)
          20. list.sort(self [,cmpfunc])UserList.UserList.sort(self [,cmpfunc])
        11. METHODS
          1. str.__contains__(self, x)UserString.UserString.__contains__(self, x)
      5. 1.1.5. Exercise: Filling out the forms (or deciding not to)
        1. DISCUSSION
        2. QUESTIONS
      6. 1.1.6. Problem: Working with lines from a large file
        1. A CACHED LINE LIST
        2. A RANDOM LINE
    2. 1.2. Standard Modules
      1. 1.2.1. Working with the Python Interpreter
        1. FUNCTIONS
          1. copy.copy(obj)
          2. copy.deepcopy(obj)
        2. FUNCTIONS
          1. getopt.getopt(args, options [,long_options])
        3. ATTRIBUTES
          1. sys.argv
          2. sys.byteorder
          3. sys.copyright
          4. sys.hexversion
          5. sys.maxint
          6. sys.maxunicode
          7. sys.path
          8. sys.platform
          9. sys.stderrsys.__stderr__
          10. sys.stdinsys.__stdin__
          11. sys.stdoutsys.__stdout__
          12. sys.version
          13. sys.version_info
        4. FUNCTIONS
          1. sys.exit([code=0])
          2. sys.getdefaultencoding()
          3. sys.getrefcount(obj)
        5. BUILT-IN
          1. type(o)
        6. CONSTANTS
          1. types.BuiltinFunctionTypetypes.BuiltinMethodType
          2. types.BufferType
          3. types.ClassType
          4. types.CodeType
          5. types.ComplexType
          6. types.DictTypetypes.DictionaryType
          7. types.EllipsisType
          8. types.FileType
          9. types.FloatType
          10. types.FrameType
          11. types.FunctionTypetypes.LambdaType
          12. types.GeneratorType
          13. types.InstanceType
          14. types.IntType
          15. types.ListType
          16. types.LongType
          17. types.MethodTypetypes.UnboundMethodType
          18. types.ModuleType
          19. types.NoneType
          20. types.StringType
          21. types.TracebackType
          22. types.TupleType
          23. types.UnicodeType
          24. types.SliceType
          25. types.StringTypes
          26. types.TypeType
          27. types.XRangeType
      2. 1.2.2. Working with the Local Filesystem
        1. FUNCTIONS
          1. dircache.listdir(path)
          2. dircache.opendir(path)
          3. dircache.annotate(path, lst)
        2. FUNCTIONS
          1. filecmp.cmp(fname1, fname2 [,shallow=1 [,use_statcache=0]])
          2. filecmp.cmpfiles(dirname1, dirname2, fnamelist [,shallow=1 [,use_statcache=0]])
        3. CLASSES
          1. filecmp.dircmp(dirname1, dirname2 [,ignore=...[,hide=...]])
        4. METHODS AND ATTRIBUTES
          1. filecmp.dircmp.report()
          2. filecmp.dircmp.report_partial_closure()
          3. filecmp.dircmp.report_full_closure()
          4. filecmp.dircmp.left_list
          5. filecmp.dircmp.right_list
          6. filecmp.dircmp.common
          7. filecmp.dircmp.left_only
          8. filecmp.dircmp.right_only
          9. filecmp.dircmp.common_dirs
          10. filecmp.dircmp.common_files
          11. filecmp.dircmp.common_funny
          12. filecmp.dircmp.same_files
          13. filecmp.dircmp.diff_files
          14. filecmp.dircmp.funny_files
          15. filecmp.dircmp.subdirs
        5. FUNCTIONS
          1. fileinput.input([files=sys.argv[1:] [,inplace=0 [,backup=“.bak”]]])
          2. fileinput.close()
          3. fileinput.nextfile()
          4. fileinput.filelineno()
          5. fileinput.filename()
          6. fileinput.isfirstline()
          7. fileinput.isstdin()
          8. fileinput.lineno()
        6. CLASSES
          1. fileinput.FileInput([files [,inplace=0 [,backup=“.bak”]]])
        7. FUNCTIONS
          1. glob.glob(pat)
        8. FUNCTIONS
          1. linecache.getline(fname, linenum)
          2. linecache.clearcache()
          3. linecache.checkcache()
        9. FUNCTIONS
          1. os.path.abspath(pathname)
          2. os.path.basename(pathname)
          3. os.path.commonprefix(pathlist)
          4. os.path.dirname(pathname)
          5. os.path.exists(pathname)
          6. os.path.expanduser(pathname)
          7. os.path.expandvars(pathname)
          8. os.path.getatime(pathname)
          9. os.path.getmtime(pathname)
          10. os.path.getsize(pathname)
          11. os.path.isabs(pathname)
          12. os.path.isdir(pathname)
          13. os.path.isfile(pathname)
          14. os.path.islink(pathname)
          15. os.path.ismount(pathname)
          16. os.path.join(path1 [,path2 [...]])
          17. os.path.normcase(pathname)
          18. os.path.normpath(pathname)
          19. os.path.realpath(pathname)
          20. os.path.samefile(pathname1, pathname2)
          21. os.path.sameopenfile(fp1, fp2)
          22. os.path.split(pathname)
          23. os.path.splitdrive(pathname)
          24. os.path.walk(pathname, visitfunc, arg)
        10. FUNCTIONS
          1. shutil.copy(src, dst)
          2. shutil.copy2(src, dst)
          3. shutil.copyfile(src, dst)
          4. shutil.copyfileobj(fpsrc, fpdst [,buffer=-1])
          5. shutil.copymode(src, dst)
          6. shutil.copystat(src, dst)
          7. shutil.copytree(src, dst [,symlinks=0])
          8. shutil.rmtree(dirname [,ignore [,errorhandler]])
        11. FUNCTIONS
          1. stat.S_ISDIR(mode)
          2. stat.S_ISCHR(mode)
          3. stat.S_ISBLK(mode)
          4. stat.S_ISREG(mode)
          5. stat.S_ISFIFO(mode)
          6. stat.S_ISLNK(mode)
          7. stat.S_ISSOCK(mode)
        12. CONSTANTS
          1. stat.ST_MODE
          2. stat.ST_INO
          3. stat.ST_DEV
          4. stat.ST_NLINK
          5. stat.ST_UID
          6. stat.ST_GID
          7. stat.ST_SIZE
          8. stat.ST_ATIME
          9. stat.ST_MTIME
          10. stat.ST_CTIME
        13. FUNCTIONS
          1. tempfile.mktemp([suffix=“”])
          2. tempfile.TemporaryFile([mode=“w+b” [,buffsize=-1 [,suffix=“”]]])
        14. FUNCTIONS
          1. xreadlines.xreadlines(fp)
      3. 1.2.3. Running External Commands and Accessing OS Features
        1. FUNCTIONS
          1. commands.getoutput(cmd)
          2. commands.getstatusoutput(cmd)
          3. commands.getstatus(filename)
        2. FUNCTIONS
          1. os.access(pathname, operation)
          2. os.chdir(pathname)
          3. os.chmod(pathname, mode)
          4. os.chown(pathname, uid, gid)
          5. os.chroot(pathname)
          6. os.getcwd()
          7. os.getenv(var [,value=None])
          8. os.getpid()
          9. os.kill(pid, sig)
          10. os.link(src, dst)
          11. os.listdir(pathname)
          12. os.lstat(pathname)
          13. os.mkdir(pathname [,mode=0777])
          14. os.mkdirs(pathname [,mode=0777])
          15. os.mkfifo(pathname [,mode=0666])
          16. os.nice(increment)
          17. os.popen(cmd [,mode=“r” [,bufsize]])
          18. os.popen2(cmd [,mode [,bufsize]])
          19. os.popen3(cmd [,mode [,bufsize]])
          20. os.popen4(cmd [,mode [,bufsize]])
          21. os.putenv(var, value)
          22. os.readlink(linkname)
          23. os.remove(filename)
          24. os.removedirs(pathname)
          25. os.rename(src, dst)
          26. os.renames(src, dst)
          27. os.rmdir(pathname)
          28. os.startfile(path)
          29. os.stat(pathname)
          30. os.strerror(code)
          31. os.symlink(src, dst)
          32. os.system(cmd)
          33. os.tempnam([dir [,prefix]])
          34. os.tmpfile()
          35. os.uname()
          36. os.unlink(filename)
          37. os.utime(pathname, times)
        3. CONSTANTS AND ATTRIBUTES
          1. os.altsep
          2. os.curdir
          3. os.defpath
          4. os.environ
          5. os.linesep
          6. os.name
          7. os.pardir
          8. os.pathsep
          9. os.sep
      4. 1.2.4. Special Data Values and Formats
        1. FUNCTIONS
          1. random.betavariate(alpha, beta)
          2. random.choice(seq)
          3. random.cunifvariate(mean, arc)
          4. random.expovariate(lambda_)
          5. random.gamma(alpha, beta)
          6. random.gauss(mu, sigma)
          7. random.lognormvariate(mu, sigma)
          8. random.normalvariate(mu, sigma)
          9. random.paretovariate(alpha)
          10. random.random()
          11. random.randrange([start=0,] stop [,step=1])
          12. random.seed([x=time.time()])
          13. random.shuffle(seq [,random=random.random])
          14. random.uniform(min, max)
          15. random.vonmisesvariate(mu, kappa)
          16. random.weibullvariate(alpha, beta)
        2. FUNCTIONS
          1. struct.calcsize(fmt)
          2. struct.pack(fmt, v1 [,v2 [...]])
          3. struct.unpack(fmt, s)
        3. CONSTANTS AND ATTRIBUTES
          1. time.accept2dyear
          2. time.altzonetime.daylighttime.timezonetime.tzname
        4. FUNCTIONS
          1. time.asctime([tuple=time.localtime()])
          2. time.clock()
          3. time.ctime([seconds=time.time()])
          4. time.gmtime([seconds=time.time()])
          5. time.localtime([seconds=time.time()])
          6. time.mktime(tuple)
          7. time.sleep(seconds)
          8. time.strftime(format [,tuple=time.localtime()])
          9. time.strptime(s [,format=“%a %b %d %H:%M:%S %Y”])
          10. time.time()
    3. 1.3. Other Modules in the Standard Library
      1. __builtin__
      2. 1.3.1. Serializing and Storing Python Objects
        1. FUNCTIONS
          1. DBM.open(fname [,flag=“r” [,mode=0666]])
        2. METHODS
          1. DBM.close()
          2. DBM.first()
          3. DBM.has_key(key)
          4. DBM.keys()
          5. DBM.last()
          6. DBM.next()
          7. DBM.previous()
          8. DBM.sync()
        3. MODULES
          1. anydbm
          2. bsddb
          3. dbhash
          4. dbm
          5. dumbdbm
          6. gdbm
          7. whichdb
        4. FUNCTIONS
          1. pickle.dump(o, file [,bin=0])cPickle.dump(o, file [,bin=0])
          2. pickle.dumps(o [,bin=0])cPickle.dumps(o [,bin=0])
          3. pickle.load(file)cPickle.load(file)
          4. pickle.loads(s)cPickle.loads(s)
          5. marshal
        5. FUNCTIONS
          1. pprint.isreadable(o)
          2. pprint.isrecursive(o)
          3. pprint.pformat(o)
          4. pprint.pprint(o [,stream=sys.stdout])
        6. CLASSES
          1. pprint.PrettyPrinter(width=80, depth=..., indent=1, stream=sys.stdout)
        7. METHODS
        8. CLASSES
          1. repr.Repr()
        9. ATTRIBUTES
          1. repr.maxlevel
          2. repr.maxdictrepr.maxlistrepr.maxtuple
          3. repr.maxlong
          4. repr.maxstring
          5. repr.maxother
        10. FUNCTIONS
          1. repr.repr(o)
          2. repr.repr_TYPE(o, level)
      3. 1.3.2. Platform-Specific Operations
        1. _winreg
        2. AE
        3. aepack
        4. aetypes
        5. applesingle
        6. buildtools
        7. calendar
        8. Carbon.AE, Carbon.App, Carbon.CF, Carbon.Cm, Carbon.Ctl, Carbon.Dlg, Carbon.Evt, Carbon.Fm, Carbon.Help, Carbon.List, Carbon.Menu, Carbon.Mlte, Carbon.Qd, Carbon.Qdoffs, Carbon.Qt, Carbon.Res, Carbon.Scrap, Carbon.Snd, Carbon.TE, Carbon.Win
        9. cd
        10. cfmfile
        11. ColorPicker
        12. ctb
        13. dl
        14. EasyDialogs
        15. fcntl
        16. findertools
        17. fl, FL, flp
        18. fm, FM
        19. fpectl
        20. FrameWork, MiniAEFrame
        21. gettext
        22. grp
        23. locale
        24. mac, macerrors, macpath
        25. macfs, macfsn, macostools
        26. MacOS
        27. macresource
        28. macspeech
        29. mactty
        30. mkcwproject
        31. msvcrt
        32. Nav
        33. nis
        34. pipes
        35. PixMapWrapper
        36. posix, posixfile
        37. preferences
        38. pty
        39. pwd
        40. pythonprefs
        41. py_resource
        42. quietconsole
        43. resource
        44. syslog
        45. tty, termios, TERMIOS
        46. W
        47. waste
        48. winsound
        49. xdrlib
      4. 1.3.3. Working with Multimedia Formats
        1. aifc
        2. al, AL
        3. audioop
        4. chunk
        5. colorsys
        6. gl, DEVICE, GL
        7. imageop
        8. imgfile
        9. jpeg
        10. rgbimg
        11. sunau
        12. sunaudiodev, SUNAUDIODEV
        13. videoreader
        14. wave
      5. 1.3.4. Miscellaneous Other Modules
        1. array
        2. atexit
        3. BaseHTTPServer, SimpleHTTPServer, SimpleXMLRPCServer, CGIHTTPServer
        4. Bastion
        5. bisect
        6. cmath
        7. cmd
        8. code
        9. codeop
        10. compileall
        11. compile, compile.ast, compile.visitor
        12. copy_reg
        13. curses, curses.ascii, curses.panel, curses.textpad, curses.wrapper
        14. dircache
        15. dis
        16. distutils
        17. doctest
        18. errno
        19. fpformat
        20. gc
        21. getpass
        22. imp
        23. inspect
        24. keyword
        25. math
        26. mutex
        27. new
        28. pdb
        29. popen2
        30. profile
        31. pstats
        32. pyclbr
        33. pydoc
        34. py_compile
        35. Queue
        36. readline, rlcompleter
        37. rexec
        38. sched
        39. signal
        40. site, user
        41. statcache
        42. statvfs
        43. thread, threading
        44. Tkinter, ScrolledText, Tix, turtle
        45. traceback
        46. unittest
        47. warnings
        48. weakref
        49. whrandom
  5. 2. Basic String Operations
    1. 2.1. Some Common Tasks
      1. 2.1.1. Problem: Quickly sorting lines on custom criteria
      2. 2.1.2. Problem: Reformatting paragraphs of text
      3. 2.1.3. Problem: Column statistics for delimited or flat-record files
      4. 2.1.4. Problem: Counting characters, words, lines, and paragraphs
      5. 2.1.5. Problem: Transmitting binary data as ASCII
      6. 2.1.6. Problem: Creating word or letter histograms
      7. 2.1.7. Problem: Reading a file backwards by record, line, or paragraph
        1. QUESTIONS
    2. 2.2. Standard Modules
      1. 2.2.1. Basic String Transformations
        1. CONSTANTS
          1. string.digits
          2. string.hexdigits
          3. string.octdigits
          4. string.lowercase
          5. string.uppercase
          6. string.letters
          7. string.punctuation
          8. string.whitespace
          9. string.printable
        2. FUNCTIONS
          1. string.atof(s=...)
          2. string.atoi(s=...[,base=10])
          3. string.atol(s=...[,base=10])
          4. string.capitalize(s=...)“”.capitalize()
          5. string.capwords(s=...)“”.title()
          6. string.center(s=..., width=...)“”.center(width)
          7. string.count(s, sub [,start [,end]])“”.count(sub [,start [,end]])
          8. “”.endswith(suffix [,start [,end]])
          9. string.expandtabs(s=...[,tabsize=8])“”.expandtabs([,tabsize=8])
          10. string.find(s, sub [,start [,end]])“”.find(sub [,start [,end]])
          11. string.index(s, sub [,start [,end]])“”.index(sub [,start [,end]])
          12. “”.isalpha()
          13. “”.isalnum()
          14. “”.isdigit()
          15. “”.islower()
          16. “”.isspace()
          17. “”.istitle()
          18. “”.isupper()
          19. string.join(words=...[,sep=“ ”])“”.join(words)
          20. string.joinfields(...)
          21. string.ljust(s=..., width=...)“”.ljust(width)
          22. string.lower(s=...)“”.lower()
          23. string.lstrip(s=...)“”.lstrip([chars=string.whitespace])
          24. string.maketrans(from, to)
          25. string.replace(s=..., old=..., new=...[,maxsplit=...])“”.replace(old, new [,maxsplit])
          26. string.rfind(s, sub [,start [,end]])“”.rfind(sub [,start [,end]])
          27. string.rindex(s, sub [,start [,end]])“”.rindex(sub [,start [,end]])
          28. string.rjust(s=..., width=...)“”.rjust(width)
          29. string.rstrip(s=...)“”.rstrip([chars=string.whitespace])
          30. string.split(s=...[,sep=...[,maxsplit=...]])“”.split([,sep [,maxsplit]])
          31. string.splitfields(...)
          32. “”.splitlines([keepends=0])
          33. “”.startswith(prefix [,start [,end]])
          34. string.strip(s=...)“”.strip([chars=string.whitespace])
          35. string.swapcase(s=...)“”.swapcase()
          36. string.translate(s=..., table=...[,deletechars=“”])“”.translate(table [,deletechars=“”])
          37. string.upper(s=...)“”.upper()
          38. string.zfill(s=..., width=...)
      2. 2.2.2. Strings as Files, and Files as Strings
        1. CLASSES
          1. mmap.mmap(fileno, length [,tagname]) (Windows)mmap.mmap(fileno, length [,flags=MAP_SHARED, prot=PROT_READ|PROT_WRITE])
        2. METHODS
          1. mmap.mmap.close()
          2. mmap.mmap.find(sub [,pos])
          3. mmap.mmap.flush([offset, size])
          4. mmap.mmap.move(target, source, length)
          5. mmap.mmap.read(num)
          6. mmap.mmap.read_byte()
          7. mmap.mmap.readline()
          8. mmap.mmap.resize(newsize)
          9. mmap.mmap.seek(offset [,mode])
          10. mmap.mmap.size()
          11. mmap.mmap.tell()
          12. mmap.mmap.write(s)
          13. mmap.mmap.write_byte(c)
        3. CONSTANTS
          1. cStringIO.InputType
          2. cStringIO.OutputType
        4. CLASSES
          1. StringIO.StringIO([buf=...])cStringIO.StringIO([buf])
        5. METHODS
          1. StringIO.StringIO.close()cStringIO.StringIO.close()
          2. StringIO.StringIO.flush()cStringIO.StringIO.flush()
          3. StringIO.StringIO.getvalue()cStringIO.StringIO.getvalue()
          4. StringIO.StringIO.isatty()cStringIO.StringIO.isatty()
          5. StringIO.StringIO.read([num])cStringIO.StringIO.read([num])
          6. StringIO.StringIO.readline([length=...])cStringIO.StringIO.readline([length])
          7. StringIO.StringIO.readlines([sizehint=...])cStringIO.StringIO.readlines([sizehint])
          8. cStringIO.StringIO.reset()
          9. StringIO.StringIO.seek(offset [,mode=0])cStringIO.StringIO.seek(offset [,mode])
          10. StringIO.StringIO.tell()cStringIO.StringIO.tell()
          11. StringIO.StringIO.truncate([len=0])cStringIO.StringIO.truncate([len])
          12. StringIO.StringIO.write(s=...)cStringIO.StringIO.write(s)
          13. StringIO.StringIO.writelines(list=...)cStringIO.StringIO.writelines(list)
      3. 2.2.3. Converting Between Binary and ASCII
        1. FUNCTIONS
          1. base64.encode(input=..., output=...)
          2. base64.encodestring(s=...)
          3. base64.decode(input=..., output=...)
          4. base64.decodestring(s=...)
        2. FUNCTIONS
          1. binascii.a2b_base64(s)
          2. binascii.a2b_hex(s)
          3. binascii.a2b_hqx(s)
          4. binascii.a2b_qp(s [,header=0])
          5. binascii.a2b_uu(s)
          6. binascii.b2a_base64(s)
          7. binascii.b2a_hex(s)
          8. binascii.b2a_hqx(s)
          9. binascii.b2a_qp(s [,quotetabs=0 [,istext=1 [header=0]]])
          10. binascii.b2a_uu(s)
          11. binascii.crc32(s [,crc])
          12. binascii.crc_hqx(s, crc)
          13. binascii.hexlify(s)
          14. binascii.rlecode_hqx(s)
          15. binascii.rledecode_hqx(s)
          16. binascii.unhexlify(s)
        3. EXCEPTIONS
          1. binascii.Error
          2. binascii.Incomplete
        4. FUNCTIONS
          1. binhex.binhex(inp=..., out=...)
          2. binhex.hexbin(inp=...[,out=...])
        5. CLASSES
        6. FUNCTIONS
          1. quopri.encode(input, output, quotetabs)
          2. quopri.encodestring(s [,quotetabs=0])
          3. quopri.decode(input=..., output=...[,header=0])
          4. quopri.decodestring(s [,header=0])
        7. FUNCTIONS
          1. uu.encode(in, out, [name=...[,mode=0666]])
          2. uu.decode(in [,out_file=...[,mode=...]])
      4. 2.2.4. Cryptography
        1. mxCrypto, amkCrypto
        2. Python Cryptography
        3. M2Crypto
        4. fcrypt
        5. FUNCTIONS
          1. crypt.crypt(passwd, salt)
        6. CONSTANTS
          1. md5.MD5Type
        7. CLASSES
          1. md5.new([s])
          2. md5.md5([s])
        8. METHODS
          1. md5.copy()
          2. md5.digest()
          3. md5.hexdigest()
          4. md5.update(s)
        9. CLASSES
          1. rotor.newrotor(key [,numrotors])
        10. METHODS
          1. rotor.decrypt(s)
          2. rotor.decryptmore(s)
          3. rotor.encrypt(s)
          4. rotor.encryptmore(s)
          5. rotor.setkey(key)
        11. CLASSES
          1. sha.new([s])
          2. sha.sha([s])
        12. METHODS
          1. sha.copy()
          2. sha.digest()
          3. sha.hexdigest()
          4. sha.update(s)
      5. 2.2.5. Compression
        1. CLASSES
          1. gzip.GzipFile([filename=...[,mode=“rb” [,compresslevel=9 [,fileobj=...]]]])
          2. gzip.open(filename=...[,mode='rb' [,compresslevel=9]])
        2. METHODS AND ATTRIBUTES
          1. gzip.close()
          2. gzip.flush()
          3. gzip.isatty()
          4. gzip.myfileobj
          5. gzip.read([num])
          6. gzip.readline([length])
          7. gzip.readlines([sizehint=...])
          8. gzip.write(s)
          9. gzip.writelines(list)
        3. CONSTANTS
        4. FUNCTIONS
          1. zipfile.is_zipfile(filename=...)
        5. CLASSES
          1. zipfile.PyZipFile(pathname)
          2. zipfile.ZipFile(file=...[,mode='r' [,compression=ZIP_STORED]])
          3. zipfile.ZipInfo()
        6. METHODS AND ATTRIBUTES
          1. zipfile.ZipFile.close()
          2. zipfile.ZipFile.getinfo(name=...)
          3. zipfile.ZipFile.infolist()
          4. zipfile.ZipFile.namelist()
          5. zipfile.ZipFile.printdir()
          6. zipfile.ZipFile.read(name=...)
          7. zipfile.ZipFile.testzip()
          8. zipfile.ZipFile.write(filename=...[,arcname=...[,compress_type=...]])
          9. zipfile.ZipFile.writestr(zinfo=..., bytes=...)
          10. zipfile.ZipFile.NameToInfo
          11. zipfile.ZipFile.compression
          12. zipfile.ZipFile.debug = 0
          13. zipfile.ZipFile.filelist
          14. zipfile.ZipFile.filename
          15. zipfile.ZipFile.fp
          16. zipfile.ZipFile.mode
          17. zipfile.ZipFile.start_dir
          18. zipfile.ZipInfo.CRC
          19. zipfile.ZipInfo.comment
          20. zipfile.ZipInfo.compress_size
          21. zipfile.ZipInfo.compress_type
          22. zipfile.ZipInfo.create_system
          23. zipfile.ZipInfo.create_version
          24. zipfile.ZipInfo.date_time
          25. zipfile.ZipInfo.external_attr
          26. zipfile.ZipInfo.extract_version
          27. zipfile.ZipInfo.file_offset
          28. zipfile.ZipInfo.file_size
          29. zipfile.ZipInfo.filename
          30. zipfile.ZipInfo.header_offset
          31. zipfile.ZipInfo.volume
        7. EXCEPTIONS
          1. zipfile.error
          2. zipfile.BadZipFile
        8. CONSTANTS
          1. zlib.ZLIB_VERSION
          2. zlib.Z_BEST_COMPRESSION = 9
          3. zlib.Z_BEST_SPEED = 1
          4. zlib.Z_HUFFMAN_ONLY = 2
        9. FUNCTIONS
          1. zlib.adler32(s [,crc])
          2. zlib.compress(s [,level])
          3. zlib.crc32(s [,crc])
          4. zlib.decompress(s [,winsize [,buffsize]])
        10. CLASS FACTORIES
          1. zlib.compressobj([level])
          2. zlib.decompressobj([winsize])
        11. METHODS AND ATTRIBUTES
          1. zlib.compressobj.compress(s)
          2. zlib.compressobj.flush([mode])
          3. zlib.decompressobj.unused_data
          4. zlib.decompressobj.decompress(s)
          5. zlib.decompressobj.flush()
        12. EXCEPTIONS
          1. zlib.error
      6. 2.2.6. Unicode
        1. ascii, us-ascii
        2. base64
        3. latin-1, iso-8859-1
        4. quopri
        5. rot13
        6. utf-7
        7. utf-8
        8. utf-16
        9. utf-16-le
        10. utf-16-be
        11. unicode-escape
        12. raw-unicode-escape
        13. strict
        14. ignore
        15. replace
        16. u“”.encode([enc [,errmode]]), “”.encode([enc [,errmode]])
        17. unicode(s [,enc [,errmode]])
        18. unichr(cp)
        19. codecs.open(filename=...[,mode='rb' [,encoding=...[,errors='strict' [,buffering=1]]]])
        20. codecs.EncodedFile(file=..., data_encoding=...[,file_encoding=...[,errors='strict']])
        21. FUNCTIONS
          1. unicodedata.bidirectional(unichr)
          2. unicodedata.category(unichr)
          3. unicodedata.combining(unichr)
          4. unicodedata.decimal(unichr [,default])
          5. unicodedata.decomposition(unichr)
          6. unicodedata.digit(unichr [,default])
          7. unicodedata.lookup(name)
          8. unicodedata.mirrored(unichr)
          9. unicodedata.name(unichr)
          10. unicodedata.numeric(unichr [,default])
    3. 2.3. Solving Problems
      1. 2.3.1. Exercise: Many ways to take out the garbage
        1. DISCUSSION
        2. QUESTIONS
      2. 2.3.2. Exercise: Making sure things are what they should be
        1. DISCUSSION
        2. QUESTIONS
      3. 2.3.3. Exercise: Finding needles in haystacks (full-text indexing)
        1. DISCUSSION
        2. QUESTIONS
  6. 3. Regular Expressions
    1. 3.1. A Regular Expression Tutorial
      1. 3.1.1. Just What Is a Regular Expression, Anyway?
      2. 3.1.2. Matching Patterns in Text: The Basics
      3. 3.1.3. Matching Patterns in Text: Intermediate
      4. 3.1.4. Advanced Regular Expression Extensions
    2. 3.2. Some Common Tasks
      1. 3.2.1. Problem: Making a text block flush left
      2. 3.2.2. Problem: Summarizing command-line option documentation
      3. 3.2.3. Problem: Detecting duplicate words
      4. 3.2.4. Problem: Checking for server errors
      5. 3.2.5. Problem: Reading lines with continuation characters
      6. 3.2.6. Problem: Identifying URLs and email addresses in texts
      7. 3.2.7. Problem: Pretty-printing numbers
    3. 3.3. Standard Modules
      1. 3.3.1. Versions and Optimizations
      2. 3.3.2. Simple Pattern Matching
        1. FUNCTIONS
          1. fnmatch.fnmatch(s, pat)
          2. fnmatch.fnmatchcase(s, pat)
          3. fnmatch.filter(lst, pat)
      3. 3.3.3. Regular Expression Modules
        1. FUNCTIONS
          1. reconvert.convert(s)
        2. PATTERN SUMMARY
        3. ATOMIC OPERATORS
          1. Plain symbol
          2. Escape: “\”
          3. Grouping operators: “(”, “)”
          4. Backreference: “\d”, “\dd”
          5. Character classes: “[”, “]”
          6. Digit character class: “\d”
          7. Non-digit character class: “\D”
          8. Alphanumeric character class: “\w”
          9. Non-alphanumeric character class: “\W”
          10. Whitespace character class: “\s”
          11. Non-whitespace character class: “\S”
          12. Wildcard character: “.”
          13. Beginning of line: “^”
          14. Beginning of string: “\A”
          15. End of line: “$”
          16. End of string: “\Z”
          17. Word boundary: “\b”
          18. Non-word boundary: “\B”
          19. Alternation operator: “|”
        4. QUANTIFIERS
          1. Universal quantifier: “*”
          2. Non-greedy universal quantifier: “*?”
          3. Existential quantifier: “+”
          4. Non-greedy existential quantifier: “+?”
          5. Potentiality quantifier: “?”
          6. Non-greedy potentiality quantifier: “??”
          7. Exact numeric quantifier: “{num}”
          8. Lower-bound quantifier: “{min,}”
          9. Bounded numeric quantifier: “{min,max}”
          10. Non-greedy bounded quantifier: “{min,max}?”
        5. GROUP-LIKE PATTERNS
          1. Pattern modifiers: “(?Limsux)”
          2. Comments: “(?#...)”
          3. Non-backreferenced atom: “(?:...)”
          4. Positive Lookahead assertion: “(?=...)”
          5. Negative Lookahead assertion: “(?!...)”
          6. Positive Lookbehind assertion: “(?<=...)”
          7. Negative Lookbehind assertion: “(?<!...)”
          8. Named group identifier: “(?P<name>)”
          9. Named group backreference: “(?P=name)”
        6. CONSTANTS
          1. re.I, re.IGNORECASE
          2. re.L, re.LOCALE
          3. re.M, re.MULTILINE
          4. re.S, re.DOTALL
          5. re.U, re.UNICODE
          6. re.X, re.VERBOSE
          7. re.engine
        7. FUNCTIONS
          1. re.escape(s)
          2. re.findall(pattern=..., string=...)
          3. re.purge()
          4. re.split(pattern=..., string=...[,maxsplit=0])
          5. re.sub(pattern=..., repl=..., string=...[,count=0])
          6. re.subn(pattern=..., repl=..., string=...[,count=0])
        8. CLASS FACTORIES
          1. re.compile(pattern=...[,flags=...])
          2. re.match(pattern=..., string=...[,flags=...])
          3. re.search(pattern=..., string=...[,flags=...])
        9. METHODS AND ATTRIBUTES
          1. re.compile.findall(s)
          2. re.compile.flags
          3. re.compile.groupindex
          4. re.compile.match(s [,start [,end]])
          5. re.compile.pattern
          6. re.compile.search(s [,start [,end]])
          7. re.compile.split(s [,maxsplit])
          8. re.compile.sub(repl, s [,count=0])
          9. re.compile.subn()
          10. re.match.end([group]), re.search.end([group])
          11. re.match.endpos, re.search.endpos
          12. re.match.expand(template), re.search.expand(template)
          13. re.match.group([group [,...]]), re.search.group([group [,...]])
          14. re.match.groupdict([defval]), re.search.groupdict([defval])
          15. re.match.groups([defval]), re.search.groups([defval])
          16. re.match.lastgroup, re.search.lastgroup
          17. re.match.lastindex, re.search.lastindex
          18. re.match.pos, re.search.pos
          19. re.match.re, re.search.re
          20. re.match.span([group]), re.search.span([group])
          21. re.match.start([group]), re.search.start([group])
          22. re.match.string, re.search.string
        10. EXCEPTIONS
          1. re.error
  7. 4. Parsers and State Machines
    1. 4.1. An Introduction to Parsers
      1. 4.1.1. When Data Becomes Deep and Texts Become Stateful
      2. 4.1.2. What Is a Grammar?
      3. 4.1.3. An EBNF Grammar for IF/THEN/END Structures
      4. 4.1.4. Pencil-and-Paper Parsing
      5. 4.1.5. Exercise: Some variations on the language
    2. 4.2. An Introduction to State Machines
      1. 4.2.1. Understanding State Machines
      2. 4.2.2. Text Processing State Machines
      3. 4.2.3. When Not to Use a State Machine
        1. Sidebar: A digression on functional programming
      4. 4.2.4. When to Use a State Machine
      5. 4.2.5. An Abstract State Machine Class
      6. 4.2.6. Processing a Report with a Concrete State Machine
      7. 4.2.7. Subgraphs and State Reuse
      8. 4.2.8. Exercise: Finding other solutions
    3. 4.3. Parser Libraries for Python
      1. 4.3.1. Specialized Parsers in the Standard Library
        1. ConfigParser
        2. difflib, .../Tools/scripts/ndiff.py
        3. formatter
        4. htmllib
        5. multifile
        6. parser, symbol, token, tokenize
        7. robotparser
        8. sgmllib
        9. shlex
        10. tabnanny
      2. 4.3.2. Low-Level State Machine Parsing
        1. BENCHMARKS
          1. Example: Buyer/Order Report Parsing
          2. Tag table state in buyers
          3. Subtable states in buyers
          4. Example: Marking up smart ASCII
        2. DEBUGGING A TAG TABLE
        3. CONSTANTS
          1. mx.TextTools.a2z, mx.TextTools.a2z_set
          2. mx.TextTools.A2Z, mx.TextTools.A2Z_set
          3. mx.TextTools.umlaute, mx.TextTools.umlaute_set
          4. mx.TextTools.Umlaute, mx.TextTools.Umlaute_set
          5. mx.TextTools.alpha, mx.TextTools.alpha_set
          6. mx.TextTools.german_alpha, mx.TextTools.german_alpha_set
          7. mx.TextTools.number, mx.TextTools.number_set
          8. mx.TextTools.alphanumeric, mx.TextTools.alphanumeric_set
          9. mx.TextTools.white, mx.TextTools.white_set
          10. mx.TextTools.newline, mx.TextTools.newline_set
          11. mx.TextTools.formfeed, mx.TextTools.formfeed_set
          12. mx.TextTools.whitespace, mx.TextTools.whitespace_set
          13. mx.TextTools.any, mx.TextTools.any_set
        4. COMMANDS
        5. UNCONDITIONAL COMMANDS
          1. mx.TextTools.Fail, mx.TextTools.Jump
          2. mx.TextTools.Skip, mx.TextTools.Move
        6. MATCHING PARTICULAR CHARACTERS
          1. mx.TextTools.AllIn, mx.TextTools.AllInSet, mx.TextTools.AllInCharSet
          2. mx.TextTools.AllNotIn
          3. mx.TextTools.Is
          4. mx.TextTools.IsNot
          5. mx.TextTools.IsIn, mx.TextTools.IsInSet, mx.TextTools.IsInCharSet
          6. mx.TextTools.IsNotIn
        7. MATCHING SEQUENCES
          1. mx.TextTools.Word
          2. mx.TextTools.WordStart, mx.TextTools.sWordStart, mx.TextTools.WordEnd, mx.TextTools.sWordEnd
          3. mx.TextTools.sFindWord
          4. mx.TextTools.EOF
        8. COMPOUND MATCHES
          1. mx.TextTools.Table, mx.TextTools.SubTable
          2. mx.TextTools.TableInList, mx.TextTools.SubTableInList
          3. mx.TextTools.Call
          4. mx.TextTools.CallArg
        9. MODIFIERS
          1. mx.TextTools.CallTag
          2. mx.TextTools.AppendMatch
          3. mx.TextTools.AppendToTagobj
          4. mx.TextTools.AppendTagobj
          5. mx.TextTools.LookAhead
        10. CLASSES
          1. mx.TextTools.BMS(word [,translate]), mx.TextTools.FS(word [,translate]), mx.TextTools.TextSearch(word [,translate [,algorithm=BOYERMOORE]])
          2. mx.TextTools.CharSet(definition)
        11. METHODS AND ATTRIBUTES
          1. mx.TextTools.BMS.search(s [,start [,end]]), mx.TextTools.FS.search(s [,start [,end]]), mx.TextTools.TextSearch.search(s [,start [,end]])
          2. mx.TextTools.BMS.find(s [,start [,end]]), mx.TextTools.FS.find(s [,start [,end]]), mx.TextTools.TextSearch.find(s [,start [,end]])
          3. mx.TextTools.BMS.findall(s [,start [,end]]), mx.TextTools.FS.findall(s [,start [,end]]), mx.TextTools.TextSearch.findall(s [,start [,end]])
          4. mx.TextTools.BMS.match, mx.TextTools.FS.match, mx.TextTools.TextSearch.match
          5. mx.TextTools.BMS.translate, mx.TextTools.FS.translate, mx.TextTools.TextSearch.translate
          6. mx.TextTools.CharSet.contains(c)
          7. mx.TextTools.CharSet.search(s [,direction [,start=0 [,stop=len(s)]]])
          8. mx.TextTools.CharSet.match(s [,direction [,start=0 [,stop=len(s)]]])
          9. mx.TextTools.CharSet.split(s [,start=0 [,stop=len(text)]])
          10. mx.TextTools.CharSet.splitx(s [,start=0 [,stop=len(text)]])
          11. mx.TextTools.CharSet.strip(s [,where=0 [,start=0 [,stop=len(s)]]])
        12. FUNCTIONS
          1. mx.TextTools.cmp(t1, t2)
          2. mx.TextTools.invset(s)
          3. mx.TextTools.set(s [,includechars=1])
          4. mx.TextTools.tag(s, table [,start [,end [,taglist]]])
        13. UTILITY FUNCTIONS
          1. mx.TextTools.charsplit(s, char, [start [,end]])
          2. mx.TextTools.collapse(s, sep=' ')
          3. mx.TextTools.countlines(s)
          4. mx.TextTools.find(s, search_obj, [start, [,end]])
          5. mx.TextTools.findall(s, search_obj [,start [,end]])
          6. mx.TextTools.hex2str(hexstr)
          7. mx.TextTools.is_whitespace(s [,start [,end]])
          8. mx.TextTools.isascii(s)
          9. mx.TextTools.join(joinlist [,sep=“” [,start [,end]]])
          10. mx.TextTools.lower(s)
          11. mx.TextTools.prefix(s, prefixes [,start [,stop [,translate]]])
          12. mx.TextTools.multireplace(s, replacements [,start [,stop]])
          13. mx.TextTools.replace(s, old, new [,start [,stop]])
          14. mx.TextTools.setfind(s, set [,start [,end]])
          15. mx.TextTools.setsplit(s, set [,start [,stop]])
          16. mx.TextTools.setsplitx(text, set [,start=0, stop=len(text)])
          17. mx.TextTools.splitat(s, char, [n=1 [,start [end]]])
          18. mx.TextTools.splitlines(s)
          19. mx.TextTools.splitwords(s)
          20. mx.TextTools.str2hex(s)
          21. mx.TextTools.suffix(s, suffixes [,start [,stop [,translate]]])
          22. mx.TextTools.upper(s)
      3. 4.3.3. High-Level EBNF Parsing
        1. Example: Marking up smart ASCII (Redux)
        2. GENERATING AND USING A TAGLIST
        3. THE TAGLIST AND THE OUTPUT
        4. GRAMMAR
        5. DECLARATION PATTERNS
        6. LITERALS
          1. Literal string
          2. Character class: “[”, “]”
        7. QUANTIFIERS
          1. Universal quantifier: “*”
          2. Existential quantifier: “+”
          3. Potentiality quantifier: “?”
          4. Lookahead quantifier: “?”
          5. Error on Failure: “!”
        8. STRUCTURES
          1. Alternation operator: “/”
          2. Sequence operator: “,”
          3. Negation operator: “-”
          4. Grouping operators: “(”, “)”
        9. USEFUL PRODUCTIONS
          1. simpleparse.common.calendar_names
          2. simpleparse.common.chartypes
          3. simpleparse.common.comments
          4. simpleparse.common.iso_date
          5. simpleparse.common.iso_date_loose
          6. simpleparse.common.numbers
          7. simpleparse.common.phonetics
          8. simpleparse.common.strings
          9. simpleparse.common.timezone_names
        10. GOTCHAS
      4. 4.3.4. High-Level Programmatic Parsing
        1. Example: Marking up smart ASCII (yet again)
        2. GENERATING A TOKEN LIST
          1. Parsing a token list
        3. LEX
        4. YACC
        5. MORE ON PLY PARSERS
          1. Error Recovery
          2. The Parser State Machine
          3. Precedence and Associativity
  8. 5. Internet Tools and Techniques
    1. 5.1. Working with Email and Newsgroups
      1. 5.1.1. Manipulating and Creating Message Texts
        1. CLASSES
          1. email.MIMEBase.MIMEBase(maintype, subtype, **params)
          2. email.MIMENonMultipart.MIMENonMultipart(maintype, subtype, **params)
          3. email.MIMEMultipart.MIMEMultipart([subtype=“mixed” [boundary, [,*subparts [,**params]]]])
          4. email.MIMEAudio.MIMEAudio(audiodata [,subtype [,encoder [,**params]]])
          5. email.MIMEImage.MIMEImage(imagedata [,subtype [,encoder [,**params]]])
          6. email.MIMEText.MIMEText(text [,subtype [,charset]])
        2. FUNCTIONS
          1. email.message_from_file(file [,_class=email.Message.Message [,strict=0]])
          2. email.message_from_string(s [,_class=email.Message.Message [,strict=0]])
        3. FUNCTIONS
          1. email.Encoders.encode_quopri(mess)
          2. email.Encoders.encode_base64(mess)
          3. email.Encoders.encode_7or8bit(mess)
        4. CLASSES
          1. email.Generator.Generator(file [,mangle_from_=1 [,maxheaderlen=78]])
          2. email.Generator.DecodedGenerator(file [,mangle_from_ [,maxheaderlen [,fmt]]])
        5. METHODS
          1. email.Generator.Generator.clone(), email.Generator.DecodedGenerator.clone()
          2. email.Generator.Generator.flatten(mess [,unixfrom=0]), email.Generator.DecodedGenerator.flatten(mess [,unixfrom=0])
          3. email.Generator.Generator.write(s), email.Generator.DecodedGenerator.write(s)
        6. CLASSES
          1. email.Header.Header([s=“” [,charset [,maxlinelen=76 [,header_name=“”[,continuation_ws=“ ”]]]]])
        7. METHODS
          1. email.Header.Header.append(s [,charset])
          2. email.Header.Header.encode(), email.Header.Header.__str__()
        8. FUNCTIONS
          1. email.Header.decode_header(header)
          2. email.Header.make_header(decoded_seq [,maxlinelen [,header_name [,continuation_ws]]])
        9. FUNCTIONS
          1. email.Iterators.body_line_iterator(mess)
          2. email.Iterators.typed_subpart_iterator(mess [,maintype=“text” [,subtype]])
          3. email.Iterators._structure(mess [,file=sys.stdout])
        10. CLASSES
          1. email.Message.Message()
        11. METHODS AND ATTRIBUTES
          1. email.Message.Message.add_header(field, value [,**params])
          2. email.Message.Message.as_string([unixfrom=0])
          3. email.Message.Message.attach(mess)
          4. email.Message.Message.del_param(param [,header=“Content-Type” [,requote=1]])
          5. email.Message.Message.epilogue
          6. email.Message.Message.get_all(field [,failobj=None])
          7. email.Message.Message.get_boundary([failobj=None])
          8. email.Message.Message.get_charsets([failobj=None])
          9. email.Message.Message.get_content_charset([failobj=None])
          10. email.Message.Message.get_content_maintype()
          11. email.Message.Message.get_content_subtype()
          12. email.Message.Message.get_content_type()
          13. email.Message.Message.get_default_type()
          14. email.Message.Message.get_filename([failobj=None])
          15. email.Message.Message.get_param(param [,failobj [,header=...[,unquote=1]]])
          16. email.Message.Message.get_params([,failobj=None [,header=...[,unquote=1]]])
          17. email.Message.Message.get_payload([i [,decode=0]])
          18. email.Message.Message.get_unixfrom()
          19. email.Message.Message.is_multipart()
          20. email.Message.Message.preamble
          21. email.Message.Message.replace_header(field, value)
          22. email.Message.Message.set_boundary(s)
          23. email.Message.Message.set_default_type(ctype)
          24. email.Message.Message.set_param(param, value [,header=“Content-Type” [,requote=1 [,charset [,language]]]])
          25. email.Message.Message.set_payload(payload [,charset=None])
          26. email.Message.Message.set_type(ctype [,header=“Content-Type” [,requote=1]])
          27. email.Message.Message.set_unixfrom(s)
          28. email.Message.Message.walk()
        12. CLASSES
          1. email.Parser.Parser([_class=email.Message.Message [,strict=0]])
          2. email.Parser.HeaderParser([_class=email.Message.Message [,strict=0]])
        13. METHODS
          1. email.Parser.Parser.parse(file [,headersonly=0]), email.Parser.HeaderParser.parse(file [,headersonly=0])
          2. email.Parser.Parser.parsestr(s [,headersonly=0]), email.Parser.HeaderParser.parsestr(s [,headersonly=0])
        14. FUNCTIONS
          1. email.Utils.decode_rfc2231(s)
          2. email.Utils.encode_rfc2231(s [,charset [,language]])
          3. email.Utils.formataddr(pair)
          4. email.Utils.formatdate([timeval [,localtime=0]])
          5. email.Utils.getaddresses(addresses)
          6. email.Utils.make_msgid([seed])
          7. email.Utils.mktime_tz(tuple)
          8. email.Utils.parseaddr(address)
          9. email.Utils.parsedate(datestr)
          10. email.Utils.parsedate_tz(datestr)
          11. email.Utils.quote(s)
          12. email.Utils.unquote(s)
      2. 5.1.2. Communicating with Mail Servers
        1. CLASSES
          1. imaplib.IMAP4([host=“localhost” [,port=143]])
        2. METHODS
          1. imaplib.IMAP4.close()
          2. imaplib.IMAP4.expunge()
          3. imaplib.IMAP4.fetch(message_set, message_parts)
          4. imaplib.IMAP4.list([dirname=“” [,pattern=“*”]])
          5. imaplib.IMAP4.login(user, passwd)
          6. imaplib.IMAP4.logout()
          7. imaplib.IMAP4.search(charset, criterion1 [,criterion2 [,...]])
          8. imaplib.IMAP4.select([mbox=“INBOX” [,readonly=0]])
        3. CLASSES
          1. poplib.POP3(host [,port=110])
        4. METHODS
          1. poplib.POP3.apop(user, secret)
          2. poplib.POP3.dele(messnum)
          3. poplib.POP3.pass_(password)
          4. poplib.POP3.quit()
          5. poplib.POP3.retr(messnum)
          6. poplib.POP3.rset()
          7. poplib.POP3.top(messnum, lines)
          8. poplib.POP3.stat()
          9. poplib.POP3.user(username)
        5. CLASSES
          1. smtplib.SMTP([host=“localhost” [,port=25]])
        6. METHODS
          1. smtplib.SMTP.login(user, passwd)
          2. smtplib.SMTP.quit()
          3. smtplib.SMTP.sendmail(from_, to_, mess [,mail_options=[] [,rcpt_options=[]]])
      3. 5.1.3. Message Collections and Message Parts
        1. CLASSES
          1. mailbox.UnixMailbox(file [,factory=rfc822.Message])
          2. mailbox.PortableUnixMailbox(file [,factory=rfc822.Message])
          3. mailbox.BabylMailbox(file [,factory=rfc822.Message])
          4. mailbox.MmdfMailbox(file [,factory=rfc822.Message])
          5. mailbox.MHMailbox(dirname [,factory=rfc822.Message])
          6. mailbox.Maildir(dirname [,factory=rfc822.Message])
        2. FUNCTIONS
          1. mimetypes.guess_type(url [,strict=0])
          2. mimetypes.guess_extension(type [,strict=0])
          3. mimetypes.init([list-of-files])
          4. mimetypes.read_mime_types(fname)
        3. ATTRIBUTES
          1. mimetypes.common_types
          2. mimetypes.inited
          3. mimetypes.encodings_map
          4. mimetypes.knownfiles
          5. mimetypes.suffix_map
          6. mimetypes.types_map
    2. 5.2. World Wide Web Applications
      1. 5.2.1. Common Gateway Interface
        1. A CGI PRIMER
        2. CLASSES
          1. cgi.FieldStorage([fp=sys.stdin [,headers [,ob [,environ=os.environ [,keep_blank_values=0 [,strict_parsing=0]]]]]])
        3. METHODS
          1. cgi.FieldStorage.getfirst(key [,default=None])
          2. cgi.FieldStorage.getlist(key [,default=None])
          3. cgi.FieldStorage.getvalue(key [,default=None])
        4. ATTRIBUTES
          1. cgi.FieldStorage.file
          2. cgi.FieldStorage.filename
          3. cgi.FieldStorage.list
          4. cgi.FieldStorage.value, cgi.MiniFieldStorage.value
        5. METHODS
          1. cgitb.enable([display=1 [,logdir=None [,context=5]]])
      2. 5.2.2. Parsing, Creating, and Manipulating HTML Documents
        1. ATTRIBUTES
          1. htmlentitydefs.entitydefs
        2. CLASSES
          1. HTMLParser.HTMLParser()
        3. METHODS AND ATTRIBUTES
          1. HTMLParser.HTMLParser.close()
          2. HTMLParser.HTMLParser.feed(data)
          3. HTMLParser.HTMLParser.getpos()
          4. HTMLParser.HTMLParser.handle_charref(name)
          5. HTMLParser.HTMLParser.handle_comment(data)
          6. HTMLParser.HTMLParser.handle_data(data)
          7. HTMLParser.HTMLParser.handle_decl(data)
          8. HTMLParser.HTMLParser.handle_endtag(tag)
          9. HTMLParser.HTMLParser.handle_entityref(name)
          10. HTMLParser.HTMLParser.handle_pi(data)
          11. HTMLParser.HTMLParser.handle_startendtag(tag, attrs)
          12. HTMLParser.HTMLParser.handle_starttag(tag, attrs)
          13. HTMLParser.HTMLParser.lasttag
          14. HTMLParser.HTMLParser.reset()
      3. 5.2.3. Accessing Internet Resources
        1. FUNCTIONS
          1. urllib.urlopen(url [,data])
          2. urllib.urlretrieve(url [,fname [,reporthook [,data]]])
          3. urllib.quote(s [,safe=“/”])
          4. urllib.quote_plus(s [,safe=“/”])
          5. urllib.unquote(s)
          6. urllib.unquote_plus(s)
          7. urllib.urlencode(query)
        2. CLASSES
          1. urllib.URLopener([proxies [,**x509]])
          2. urllib.FancyURLopener([proxies [,**x509]])
        3. METHODS AND ATTRIBUTES
          1. urllib.FancyURLopener.get_user_passwd(host, realm)
          2. urllib.URLopener.open(url [,data]), urllib.FancyURLopener.open(url [,data])
          3. urllib.URLopener.open_unknown(url [,data]), urllib.FancyURLopener.open_unknown(url [,data])
          4. urllib.FancyURLopener.prompt_user_passwd(host, realm)
          5. urllib.URLopener.retrieve(url [,fname [,reporthook [,data]]]), urllib.FancyURLopener.retrieve(url [,fname [,reporthook [,data]]])
          6. urllib.URLopener.version, urllib.FancyURLopener.version
        4. FUNCTIONS
          1. urlparse.urlparse(url [,def_scheme=“” [,fragments=1]])
          2. urlparse.urlunparse(tup)
          3. urlparse.urljoin(base, file)
    3. 5.3. Synopses of Other Internet Modules
      1. 5.3.1. Standard Internet-Related Tools
        1. asyncore
        2. Cookie
        3. email.Charset
        4. ftplib
        5. gopherlib
        6. httplib
        7. ic, icopen
        8. icopen
        9. imghdr
        10. mailcap
        11. mhlib
        12. mimetools
        13. MimeWriter
        14. mimify
        15. netrc
        16. nntplib
        17. nsremote
        18. rfc822
        19. select
        20. sndhdr
        21. socket
        22. SocketServer
        23. telnetlib
        24. urllib2
        25. webbrowser
      2. 5.3.2. Third-Party Internet Related Tools
        1. Quixote
        2. Twisted
        3. Zope
    4. 5.4. Understanding XML
      1. THE DATA MODEL
      2. OTHER XML FEATURES
      3. 5.4.1. Python Standard Library XML Modules
        1. xml.dom
        2. xml.dom.minidom
        3. xml.dom.pulldom
        4. xml.parsers.expat
        5. xml.sax
        6. xml.sax.handler
        7. xml.sax.saxutils
        8. xml.sax.xmlreader
        9. xmllib
        10. xmlrpclib, SimpleXMLRPCServer
      4. 5.4.2. Third-Party XML-Related Tools
        1. gnosis.xml.indexer
        2. gnosis.xml.objectify
        3. gnosis.xml.pickle
        4. gnosis.xml.validity
        5. PyXML
        6. PYX
        7. 4Suite
        8. yaml
  9. A. A Selective and Impressionistic Short Review of Python
    1. A.1. What Kind of Language Is Python?
    2. A.2. Namespaces and Bindings
      1. A.2.1. Assignment and Dereferencing
      2. A.2.2. Function and Class Definitions
      3. A.2.3. import Statements
      4. A.2.4. for Statements
      5. A.2.5. except Statements
    3. A.3. Datatypes
      1. A.3.1. Simple Types
        1. bool
        2. int
        3. long
        4. float
        5. complex
        6. string
        7. unicode
      2. A.3.2. String Interpolation
      3. A.3.3. Printing
      4. A.3.4. Container Types
        1. tuple
        2. list
        3. dict
        4. sets.Set
      5. A.3.5. Compound Types
        1. class instance
    4. A.4. Flow Control
      1. A.4.1. if/then/else Statements
      2. A.4.2. Boolean Shortcutting
      3. A.4.3. for/continue/break Statements
      4. A.4.4. map(), filter(), reduce(), and List Comprehensions
      5. A.4.5. while/else/continue/break Statements
      6. A.4.6. Functions, Simple Generators, and the yield Statement
      7. A.4.7. Raising and Catching Exceptions
      8. A.4.8. Data as Code
        1. eval(s [,globals=globals() [,locals=locals()]])
        2. exec
        3. __import__(s [,globals=globals() [,locals=locals() [,fromlist]]])
        4. input([prompt])
        5. raw_input([prompt])
    5. A.5. Functional Programming
      1. A.5.1. Emphasizing Expressions Using lambda
      2. A.5.2. Special List Functions
        1. zip(seq1 [,seq2 [,...]])
        2. enumerate(collection)
      3. A.5.3. List-Application Functions as Flow Control
      4. A.5.4. Extended Call Syntax and apply()
  10. B. A Data Compression Primer
    1. B.1. Introduction
    2. B.2. Lossless and Lossy Compression
    3. B.3. A Data Set Example
    4. B.4. Whitespace Compression
    5. B.5. Run-Length Encoding
    6. B.6. Huffman Encoding
    7. B.7. Lempel-Ziv Compression
    8. B.8. Solving the Right Problem
    9. B.9. A Custom Text Compressor
    10. B.10. References
  11. C. Understanding Unicode
    1. C.1. Some Background on Characters
    2. C.2. What Is Unicode?
    3. C.3. Encodings
    4. C.4. Declarations
    5. C.5. Finding Codepoints
    6. C.6. Resources
  12. D. A State Machine for Adding Markup to Text
  13. E. Glossary