Text Processing in Python

Book description

Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.

Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.

Here is some of what you will find in this book:

  • When do I use formal parsers to process structured and semi-structured data? Page 257

  • How do I work with full text indexing? Page 199

  • What patterns in text can be expressed using regular expressions? Page 204

  • How do I find a URL or an email address in text? Page 228

  • How do I process a report with a concrete state machine? Page 274

  • How do I parse, create, and manipulate internet formats? Page 345

  • How do I handle lossless and lossy compression? Page 454

  • How do I find codepoints in Unicode? Page 465




    Table of contents

    1. Copyright
    2. Preface
      1. 0.1. What Is Text Processing?
      2. 0.2. The Philosophy of Text Processing
      3. 0.3. What You'll Need to Use This Book
      4. 0.4. Conventions Used in This Book
      5. 0.5. A Word on Source Code Examples
      6. 0.6. External Resources
        1. 0.6.1. General Resources
        2. 0.6.2. Books
        3. 0.6.3. Software Directories
        4. 0.6.4. Specific Software
    3. Acknowledgments
    4. 1. Python Basics
      1. 1.1. Techniques and Patterns
        1. 1.1.1. Utilizing Higher-Order Functions in Text Processing
        2. 1.1.2. Exercise: More on combinatorial functions
          1. QUESTIONS
        3. 1.1.3. Specializing Python Datatypes
          1. PYTHONIC POLYMORPHISM
          2. ENHANCED OBJECTS
        4. 1.1.4. Base Classes for Datatypes
          1. METHODS
            1. object.__eq__(self, other)
            2. object.__ne__(self, other)
            3. object.__nonzero__(self)
            4. object.__len__(self), len(object)
            5. object.__repr__(self), repr(object), object.__str__(self), str(object)
          2. BUILT-IN FUNCTIONS
            1. open(fname [,mode [,buffering]]), file(fname [,mode [,buffering]])
          3. METHODS AND ATTRIBUTES
            1. FILE.close()
            2. FILE.closed
            3. FILE.fileno()
            4. FILE.flush()
            5. FILE.isatty()
            6. FILE.mode
            7. FILE.name
            8. FILE.read([size=sys.maxint])
            9. FILE.readline([size=sys.maxint])
            10. FILE.readlines([size=sys.maxint])
            11. FILE.seek(offset [,whence=0])
            12. FILE.tell()
            13. FILE.truncate([size=0])
            14. FILE.write(s)
            15. FILE.writelines(lines)
            16. FILE.xreadlines()
          4. METHODS
            1. int.__and__(self, other), int.__rand__(self, other)
            2. int.__hex__(self)
            3. int.__invert__(self)
            4. int.__lshift__(self, other), int.__rlshift__(self, other)
            5. int.__oct__(self)
            6. int.__or__(self, other), int.__ror__(self, other)
            7. int.__rshift__(self, other), int.__rrshift__(self, other)
            8. int.__xor__(self, other), int.__rxor__(self, other)
          5. DIGRESSION
          6. CAPABILITIES
          7. METHODS
            1. float.__abs__(self)
            2. float.__add__(self, other), float.__radd__(self, other)
            3. float.__cmp__(self, other)
            4. float.__div__(self, other), float.__rdiv__(self, other)
            5. float.__divmod__(self, other), float.__rdivmod__(self, other)
            6. float.__floordiv__(self, other), float.__rfloordiv__(self, other)
            7. float.__mod__(self, other), float.__rmod__(self, other)
            8. float.__mul__(self, other), float.__rmul__(self, other)
            9. float.__neg__(self)
            10. float.__pow__(self, other), float.__rpow__(self, other)
            11. float.__sub__(self, other), float.__rsub__(self, other)
            12. float.__truediv__(self, other), float.__rtruediv__(self, other)
          8. METHODS
            1. complex.conjugate(self)
            2. complex.imag
            3. complex.real
          9. METHODS
            1. dict.__cmp__(self, other), UserDict.UserDict.__cmp__(self, other)
            2. dict.__contains__(self, x), UserDict.UserDict.__contains__(self, x)
            3. dict.__delitem__(self, x), UserDict.UserDict.__delitem__(self, x)
            4. dict.__getitem__(self, x), UserDict.UserDict.__getitem__(self, x)
            5. dict.__len__(self), UserDict.UserDict.__len__(self)
            6. dict.__setitem__(self, key, val), UserDict.UserDict.__setitem__(self, key, val)
            7. dict.clear(self), UserDict.UserDict.clear(self)
            8. dict.copy(self), UserDict.UserDict.copy(self)
            9. dict.get(self, key [,default=None]), UserDict.UserDict.get(self, key [,default=None])
            10. dict.has_key(self, key), UserDict.UserDict.has_key(self, key)
            11. dict.items(self), UserDict.UserDict.items(self), dict.iteritems(self), UserDict.UserDict.iteritems(self)
            12. dict.keys(self), UserDict.UserDict.keys(self), dict.iterkeys(self), UserDict.UserDict.iterkeys(self)
            13. dict.popitem(self), UserDict.UserDict.popitem(self)
            14. dict.setdefault(self, key [,default=None]), UserDict.UserDict.setdefault(self, key [,default=None])
            15. dict.update(self, other), UserDict.UserDict.update(self, other)
            16. dict.values(self), UserDict.UserDict.values(self), dict.itervalues(self), UserDict.UserDict.itervalues(self)
          10. METHODS
            1. list.__add__(self, other), UserList.UserList.__add__(self, other), tuple.__add__(self, other), list.__iadd__(self, other), UserList.UserList.__iadd__(self, other)
            2. list.__contains__(self, x), UserList.UserList.__contains__(self, x), tuple.__contains__(self, x)
            3. list.__delitem__(self, x), UserList.UserList.__delitem__(self, x)
            4. list.__delslice__(self, start, end), UserList.UserList.__delslice__(self, start, end)
            5. list.__getitem__(self, pos), UserList.UserList.__getitem__(self, pos), tuple.__getitem__(self, pos)
            6. list.__getslice__(self, start, end), UserList.UserList.__getslice__(self, start, end), tuple.__getslice__(self, start, end)
            7. list.__hash__(self), UserList.UserList.__hash__(self), tuple.__hash__(self)
            8. list.__len__(self), UserList.UserList.__len__(self), tuple.__len__(self)
            9. list.__mul__(self, num), UserList.UserList.__mul__(self, num), tuple.__mul__(self, num), list.__rmul__(self, num), UserList.UserList.__rmul__(self, num), tuple.__rmul__(self, num), list.__imul__(self, num), UserList.UserList.__imul__(self, num)
            10. list.__setitem__(self, pos, val), UserList.UserList.__setitem__(self, pos, val)
            11. list.__setslice__(self, start, end, other), UserList.UserList.__setslice__(self, start, end, other)
            12. list.append(self, item), UserList.UserList.append(self, item)
            13. list.count(self, item), UserList.UserList.count(self, item)
            14. list.extend(self, seq), UserList.UserList.extend(self, seq)
            15. list.index(self, item), UserList.UserList.index(self, item)
            16. list.insert(self, pos, item), UserList.UserList.insert(self, pos, item)
            17. list.pop(self [,pos=-1]), UserList.UserList.pop(self [,pos=-1])
            18. list.remove(self, item), UserList.UserList.remove(self, item)
            19. list.reverse(self), UserList.UserList.reverse(self)
            20. list.sort(self [,cmpfunc]), UserList.UserList.sort(self [,cmpfunc])
          11. METHODS
            1. str.__contains__(self, x), UserString.UserString.__contains__(self, x)
        5. 1.1.5. Exercise: Filling out the forms (or deciding not to)
          1. DISCUSSION
          2. QUESTIONS
        6. 1.1.6. Problem: Working with lines from a large file
          1. A CACHED LINE LIST
          2. A RANDOM LINE
      2. 1.2. Standard Modules
        1. 1.2.1. Working with the Python Interpreter
          1. FUNCTIONS
            1. copy.copy(obj)
            2. copy.deepcopy(obj)
          2. FUNCTIONS
            1. getopt.getopt(args, options [,long_options])
          3. ATTRIBUTES
            1. sys.argv
            2. sys.byteorder
            3. sys.copyright
            4. sys.hexversion
            5. sys.maxint
            6. sys.maxunicode
            7. sys.path
            8. sys.platform
            9. sys.stderr, sys.__stderr__
            10. sys.stdin, sys.__stdin__
            11. sys.stdout, sys.__stdout__
            12. sys.version
            13. sys.version_info
          4. FUNCTIONS
            1. sys.exit([code=0])
            2. sys.getdefaultencoding()
            3. sys.getrefcount(obj)
          5. BUILT-IN
            1. type(o)
          6. CONSTANTS
            1. types.BuiltinFunctionType, types.BuiltinMethodType
            2. types.BufferType
            3. types.ClassType
            4. types.CodeType
            5. types.ComplexType
            6. types.DictType, types.DictionaryType
            7. types.EllipsisType
            8. types.FileType
            9. types.FloatType
            10. types.FrameType
            11. types.FunctionType, types.LambdaType
            12. types.GeneratorType
            13. types.InstanceType
            14. types.IntType
            15. types.ListType
            16. types.LongType
            17. types.MethodType, types.UnboundMethodType
            18. types.ModuleType
            19. types.NoneType
            20. types.StringType
            21. types.TracebackType
            22. types.TupleType
            23. types.UnicodeType
            24. types.SliceType
            25. types.StringTypes
            26. types.TypeType
            27. types.XRangeType
        2. 1.2.2. Working with the Local Filesystem
          1. FUNCTIONS
            1. dircache.listdir(path)
            2. dircache.opendir(path)
            3. dircache.annotate(path, lst)
          2. FUNCTIONS
            1. filecmp.cmp(fname1, fname2 [,shallow=1 [,use_statcache=0]])
            2. filecmp.cmpfiles(dirname1, dirname2, fnamelist [,shallow=1 [,use_statcache=0]])
          3. CLASSES
            1. filecmp.dircmp(dirname1, dirname2 [,ignore=...[,hide=...]])
          4. METHODS AND ATTRIBUTES
            1. filecmp.dircmp.report()
            2. filecmp.dircmp.report_partial_closure()
            3. filecmp.dircmp.report_full_closure()
            4. filecmp.dircmp.left_list
            5. filecmp.dircmp.right_list
            6. filecmp.dircmp.common
            7. filecmp.dircmp.left_only
            8. filecmp.dircmp.right_only
            9. filecmp.dircmp.common_dirs
            10. filecmp.dircmp.common_files
            11. filecmp.dircmp.common_funny
            12. filecmp.dircmp.same_files
            13. filecmp.dircmp.diff_files
            14. filecmp.dircmp.funny_files
            15. filecmp.dircmp.subdirs
          5. FUNCTIONS
            1. fileinput.input([files=sys.argv[1:] [,inplace=0 [,backup=".bak"]]])
            2. fileinput.close()
            3. fileinput.nextfile()
            4. fileinput.filelineno()
            5. fileinput.filename()
            6. fileinput.isfirstline()
            7. fileinput.isstdin()
            8. fileinput.lineno()
          6. CLASSES
            1. fileinput.FileInput([files [,inplace=0 [,backup=".bak"]]])
          7. FUNCTIONS
            1. glob.glob(pat)
          8. FUNCTIONS
            1. linecache.getline(fname, linenum)
            2. linecache.clearcache()
            3. linecache.checkcache()
          9. FUNCTIONS
            1. os.path.abspath(pathname)
            2. os.path.basename(pathname)
            3. os.path.commonprefix(pathlist)
            4. os.path.dirname(pathname)
            5. os.path.exists(pathname)
            6. os.path.expanduser(pathname)
            7. os.path.expandvars(pathname)
            8. os.path.getatime(pathname)
            9. os.path.getmtime(pathname)
            10. os.path.getsize(pathname)
            11. os.path.isabs(pathname)
            12. os.path.isdir(pathname)
            13. os.path.isfile(pathname)
            14. os.path.islink(pathname)
            15. os.path.ismount(pathname)
            16. os.path.join(path1 [,path2 [...]])
            17. os.path.normcase(pathname)
            18. os.path.normpath(pathname)
            19. os.path.realpath(pathname)
            20. os.path.samefile(pathname1, pathname2)
            21. os.path.sameopenfile(fp1, fp2)
            22. os.path.split(pathname)
            23. os.path.splitdrive(pathname)
            24. os.path.walk(pathname, visitfunc, arg)
          10. FUNCTIONS
            1. shutil.copy(src, dst)
            2. shutil.copy2(src, dst)
            3. shutil.copyfile(src, dst)
            4. shutil.copyfileobj(fpsrc, fpdst [,buffer=-1])
            5. shutil.copymode(src, dst)
            6. shutil.copystat(src, dst)
            7. shutil.copytree(src, dst [,symlinks=0])
            8. shutil.rmtree(dirname [,ignore [,errorhandler]])
          11. FUNCTIONS
            1. stat.S_ISDIR(mode)
            2. stat.S_ISCHR(mode)
            3. stat.S_ISBLK(mode)
            4. stat.S_ISREG(mode)
            5. stat.S_ISFIFO(mode)
            6. stat.S_ISLNK(mode)
            7. stat.S_ISSOCK(mode)
          12. CONSTANTS
            1. stat.ST_MODE
            2. stat.ST_INO
            3. stat.ST_DEV
            4. stat.ST_NLINK
            5. stat.ST_UID
            6. stat.ST_GID
            7. stat.ST_SIZE
            8. stat.ST_ATIME
            9. stat.ST_MTIME
            10. stat.ST_CTIME
          13. FUNCTIONS
            1. tempfile.mktemp([suffix=""])
            2. tempfile.TemporaryFile([mode="w+b" [,buffsize=-1 [,suffix=""]]])
          14. FUNCTIONS
            1. xreadlines.xreadlines(fp)
        3. 1.2.3. Running External Commands and Accessing OS Features
          1. FUNCTIONS
            1. commands.getoutput(cmd)
            2. commands.getstatusoutput(cmd)
            3. commands.getstatus(filename)
          2. FUNCTIONS
            1. os.access(pathname, operation)
            2. os.chdir(pathname)
            3. os.chmod(pathname, mode)
            4. os.chown(pathname, uid, gid)
            5. os.chroot(pathname)
            6. os.getcwd()
            7. os.getenv(var [,value=None])
            8. os.getpid()
            9. os.kill(pid, sig)
            10. os.link(src, dst)
            11. os.listdir(pathname)
            12. os.lstat(pathname)
            13. os.mkdir(pathname [,mode=0777])
            14. os.makedirs(pathname [,mode=0777])
            15. os.mkfifo(pathname [,mode=0666])
            16. os.nice(increment)
            17. os.popen(cmd [,mode="r" [,bufsize]])
            18. os.popen2(cmd [,mode [,bufsize]])
            19. os.popen3(cmd [,mode [,bufsize]])
            20. os.popen4(cmd [,mode [,bufsize]])
            21. os.putenv(var, value)
            22. os.readlink(linkname)
            23. os.remove(filename)
            24. os.removedirs(pathname)
            25. os.rename(src, dst)
            26. os.renames(src, dst)
            27. os.rmdir(pathname)
            28. os.startfile(path)
            29. os.stat(pathname)
            30. os.strerror(code)
            31. os.symlink(src, dst)
            32. os.system(cmd)
            33. os.tempnam([dir [,prefix]])
            34. os.tmpfile()
            35. os.uname()
            36. os.unlink(filename)
            37. os.utime(pathname, times)
          3. CONSTANTS AND ATTRIBUTES
            1. os.altsep
            2. os.curdir
            3. os.defpath
            4. os.environ
            5. os.linesep
            6. os.name
            7. os.pardir
            8. os.pathsep
            9. os.sep
        4. 1.2.4. Special Data Values and Formats
          1. FUNCTIONS
            1. random.betavariate(alpha, beta)
            2. random.choice(seq)
            3. random.cunifvariate(mean, arc)
            4. random.expovariate(lambda_)
            5. random.gamma(alpha, beta)
            6. random.gauss(mu, sigma)
            7. random.lognormvariate(mu, sigma)
            8. random.normalvariate(mu, sigma)
            9. random.paretovariate(alpha)
            10. random.random()
            11. random.randrange([start=0,] stop [,step=1])
            12. random.seed([x=time.time()])
            13. random.shuffle(seq [,random=random.random])
            14. random.uniform(min, max)
            15. random.vonmisesvariate(mu, kappa)
            16. random.weibullvariate(alpha, beta)
          2. FUNCTIONS
            1. struct.calcsize(fmt)
            2. struct.pack(fmt, v1 [,v2 [...]])
            3. struct.unpack(fmt, s)
          3. CONSTANTS AND ATTRIBUTES
            1. time.accept2dyear
            2. time.altzone, time.daylight, time.timezone, time.tzname
          4. FUNCTIONS
            1. time.asctime([tuple=time.localtime()])
            2. time.clock()
            3. time.ctime([seconds=time.time()])
            4. time.gmtime([seconds=time.time()])
            5. time.localtime([seconds=time.time()])
            6. time.mktime(tuple)
            7. time.sleep(seconds)
            8. time.strftime(format [,tuple=time.localtime()])
            9. time.strptime(s [,format="%a %b %d %H:%M:%S %Y"])
            10. time.time()
      3. 1.3. Other Modules in the Standard Library
        1. __builtin__
        2. 1.3.1. Serializing and Storing Python Objects
          1. FUNCTIONS
            1. DBM.open(fname [,flag="r" [,mode=0666]])
          2. METHODS
            1. DBM.close()
            2. DBM.first()
            3. DBM.has_key(key)
            4. DBM.keys()
            5. DBM.last()
            6. DBM.next()
            7. DBM.previous()
            8. DBM.sync()
          3. MODULES
            1. anydbm
            2. bsddb
            3. dbhash
            4. dbm
            5. dumbdbm
            6. gdbm
            7. whichdb
          4. FUNCTIONS
            1. pickle.dump(o, file [,bin=0]), cPickle.dump(o, file [,bin=0])
            2. pickle.dumps(o [,bin=0]), cPickle.dumps(o [,bin=0])
            3. pickle.load(file), cPickle.load(file)
            4. pickle.loads(s), cPickle.loads(s)
            5. marshal
          5. FUNCTIONS
            1. pprint.isreadable(o)
            2. pprint.isrecursive(o)
            3. pprint.pformat(o)
            4. pprint.pprint(o [,stream=sys.stdout])
          6. CLASSES
            1. pprint.PrettyPrinter(width=80, depth=..., indent=1, stream=sys.stdout)
          7. METHODS
          8. CLASSES
            1. repr.Repr()
          9. ATTRIBUTES
            1. repr.maxlevel
            2. repr.maxdict, repr.maxlist, repr.maxtuple
            3. repr.maxlong
            4. repr.maxstring
            5. repr.maxother
          10. FUNCTIONS
            1. repr.repr(o)
            2. repr.repr_TYPE(o, level)
        3. 1.3.2. Platform-Specific Operations
          1. _winreg
          2. AE
          3. aepack
          4. aetypes
          5. applesingle
          6. buildtools
          7. calendar
          8. Carbon.AE, Carbon.App, Carbon.CF, Carbon.Cm, Carbon.Ctl, Carbon.Dlg, Carbon.Evt, Carbon.Fm, Carbon.Help, Carbon.List, Carbon.Menu, Carbon.Mlte, Carbon.Qd, Carbon.Qdoffs, Carbon.Qt, Carbon.Res, Carbon.Scrap, Carbon.Snd, Carbon.TE, Carbon.Win
          9. cd
          10. cfmfile
          11. ColorPicker
          12. ctb
          13. dl
          14. EasyDialogs
          15. fcntl
          16. findertools
          17. fl, FL, flp
          18. fm, FM
          19. fpectl
          20. FrameWork, MiniAEFrame
          21. gettext
          22. grp
          23. locale
          24. mac, macerrors, macpath
          25. macfs, macfsn, macostools
          26. MacOS
          27. macresource
          28. macspeech
          29. mactty
          30. mkcwproject
          31. msvcrt
          32. Nav
          33. nis
          34. pipes
          35. PixMapWrapper
          36. posix, posixfile
          37. preferences
          38. pty
          39. pwd
          40. pythonprefs
          41. py_resource
          42. quietconsole
          43. resource
          44. syslog
          45. tty, termios, TERMIOS
          46. W
          47. waste
          48. winsound
          49. xdrlib
        4. 1.3.3. Working with Multimedia Formats
          1. aifc
          2. al, AL
          3. audioop
          4. chunk
          5. colorsys
          6. gl, DEVICE, GL
          7. imageop
          8. imgfile
          9. jpeg
          10. rgbimg
          11. sunau
          12. sunaudiodev, SUNAUDIODEV
          13. videoreader
          14. wave
        5. 1.3.4. Miscellaneous Other Modules
          1. array
          2. atexit
          3. BaseHTTPServer, SimpleHTTPServer, SimpleXMLRPCServer, CGIHTTPServer
          4. Bastion
          5. bisect
          6. cmath
          7. cmd
          8. code
          9. codeop
          10. compileall
          11. compiler, compiler.ast, compiler.visitor
          12. copy_reg
          13. curses, curses.ascii, curses.panel, curses.textpad, curses.wrapper
          14. dircache
          15. dis
          16. distutils
          17. doctest
          18. errno
          19. fpformat
          20. gc
          21. getpass
          22. imp
          23. inspect
          24. keyword
          25. math
          26. mutex
          27. new
          28. pdb
          29. popen2
          30. profile
          31. pstats
          32. pyclbr
          33. pydoc
          34. py_compile
          35. Queue
          36. readline, rlcompleter
          37. rexec
          38. sched
          39. signal
          40. site, user
          41. statcache
          42. statvfs
          43. thread, threading
          44. Tkinter, ScrolledText, Tix, turtle
          45. traceback
          46. unittest
          47. warnings
          48. weakref
          49. whrandom
    5. 2. Basic String Operations
      1. 2.1. Some Common Tasks
        1. 2.1.1. Problem: Quickly sorting lines on custom criteria
        2. 2.1.2. Problem: Reformatting paragraphs of text
        3. 2.1.3. Problem: Column statistics for delimited or flat-record files
        4. 2.1.4. Problem: Counting characters, words, lines, and paragraphs
        5. 2.1.5. Problem: Transmitting binary data as ASCII
        6. 2.1.6. Problem: Creating word or letter histograms
        7. 2.1.7. Problem: Reading a file backwards by record, line, or paragraph
          1. QUESTIONS
      2. 2.2. Standard Modules
        1. 2.2.1. Basic String Transformations
          1. CONSTANTS
            1. string.digits
            2. string.hexdigits
            3. string.octdigits
            4. string.lowercase
            5. string.uppercase
            6. string.letters
            7. string.punctuation
            8. string.whitespace
            9. string.printable
          2. FUNCTIONS
            1. string.atof(s=...)
            2. string.atoi(s=...[,base=10])
            3. string.atol(s=...[,base=10])
            4. string.capitalize(s=...), "".capitalize()
            5. string.capwords(s=...), "".title()
            6. string.center(s=..., width=...), "".center(width)
            7. string.count(s, sub [,start [,end]]), "".count(sub [,start [,end]])
            8. "".endswith(suffix [,start [,end]])
            9. string.expandtabs(s=...[,tabsize=8]), "".expandtabs([tabsize=8])
            10. string.find(s, sub [,start [,end]]), "".find(sub [,start [,end]])
            11. string.index(s, sub [,start [,end]]), "".index(sub [,start [,end]])
            12. "".isalpha()
            13. "".isalnum()
            14. "".isdigit()
            15. "".islower()
            16. "".isspace()
            17. "".istitle()
            18. "".isupper()
            19. string.join(words=...[,sep=" "]), "".join(words)
            20. string.joinfields(...)
            21. string.ljust(s=..., width=...), "".ljust(width)
            22. string.lower(s=...), "".lower()
            23. string.lstrip(s=...), "".lstrip([chars=string.whitespace])
            24. string.maketrans(from, to)
            25. string.replace(s=..., old=..., new=...[,maxsplit=...]), "".replace(old, new [,maxsplit])
            26. string.rfind(s, sub [,start [,end]]), "".rfind(sub [,start [,end]])
            27. string.rindex(s, sub [,start [,end]]), "".rindex(sub [,start [,end]])
            28. string.rjust(s=..., width=...), "".rjust(width)
            29. string.rstrip(s=...), "".rstrip([chars=string.whitespace])
            30. string.split(s=...[,sep=...[,maxsplit=...]]), "".split([sep [,maxsplit]])
            31. string.splitfields(...)
            32. "".splitlines([keepends=0])
            33. "".startswith(prefix [,start [,end]])
            34. string.strip(s=...), "".strip([chars=string.whitespace])
            35. string.swapcase(s=...), "".swapcase()
            36. string.translate(s=..., table=...[,deletechars=""]), "".translate(table [,deletechars=""])
            37. string.upper(s=...), "".upper()
            38. string.zfill(s=..., width=...)
        2. 2.2.2. Strings as Files, and Files as Strings
          1. CLASSES
            1. mmap.mmap(fileno, length [,tagname]) (Windows), mmap.mmap(fileno, length [,flags=MAP_SHARED, prot=PROT_READ|PROT_WRITE])
          2. METHODS
            1. mmap.mmap.close()
            2. mmap.mmap.find(sub [,pos])
            3. mmap.mmap.flush([offset, size])
            4. mmap.mmap.move(target, source, length)
            5. mmap.mmap.read(num)
            6. mmap.mmap.read_byte()
            7. mmap.mmap.readline()
            8. mmap.mmap.resize(newsize)
            9. mmap.mmap.seek(offset [,mode])
            10. mmap.mmap.size()
            11. mmap.mmap.tell()
            12. mmap.mmap.write(s)
            13. mmap.mmap.write_byte(c)
          3. CONSTANTS
            1. cStringIO.InputType
            2. cStringIO.OutputType
          4. CLASSES
            1. StringIO.StringIO([buf=...]), cStringIO.StringIO([buf])
          5. METHODS
            1. StringIO.StringIO.close(), cStringIO.StringIO.close()
            2. StringIO.StringIO.flush(), cStringIO.StringIO.flush()
            3. StringIO.StringIO.getvalue(), cStringIO.StringIO.getvalue()
            4. StringIO.StringIO.isatty(), cStringIO.StringIO.isatty()
            5. StringIO.StringIO.read([num]), cStringIO.StringIO.read([num])
            6. StringIO.StringIO.readline([length=...]), cStringIO.StringIO.readline([length])
            7. StringIO.StringIO.readlines([sizehint=...]), cStringIO.StringIO.readlines([sizehint])
            8. cStringIO.StringIO.reset()
            9. StringIO.StringIO.seek(offset [,mode=0]), cStringIO.StringIO.seek(offset [,mode])
            10. StringIO.StringIO.tell(), cStringIO.StringIO.tell()
            11. StringIO.StringIO.truncate([len=0]), cStringIO.StringIO.truncate([len])
            12. StringIO.StringIO.write(s=...), cStringIO.StringIO.write(s)
            13. StringIO.StringIO.writelines(list=...), cStringIO.StringIO.writelines(list)
        3. 2.2.3. Converting Between Binary and ASCII
          1. FUNCTIONS
            1. base64.encode(input=..., output=...)
            2. base64.encodestring(s=...)
            3. base64.decode(input=..., output=...)
            4. base64.decodestring(s=...)
          2. FUNCTIONS
            1. binascii.a2b_base64(s)
            2. binascii.a2b_hex(s)
            3. binascii.a2b_hqx(s)
            4. binascii.a2b_qp(s [,header=0])
            5. binascii.a2b_uu(s)
            6. binascii.b2a_base64(s)
            7. binascii.b2a_hex(s)
            8. binascii.b2a_hqx(s)
            9. binascii.b2a_qp(s [,quotetabs=0 [,istext=1 [,header=0]]])
            10. binascii.b2a_uu(s)
            11. binascii.crc32(s [,crc])
            12. binascii.crc_hqx(s, crc)
            13. binascii.hexlify(s)
            14. binascii.rlecode_hqx(s)
            15. binascii.rledecode_hqx(s)
            16. binascii.unhexlify(s)
          3. EXCEPTIONS
            1. binascii.Error
            2. binascii.Incomplete
          4. FUNCTIONS
            1. binhex.binhex(inp=..., out=...)
            2. binhex.hexbin(inp=...[,out=...])
          5. CLASSES
          6. FUNCTIONS
            1. quopri.encode(input, output, quotetabs)
            2. quopri.encodestring(s [,quotetabs=0])
            3. quopri.decode(input=..., output=...[,header=0])
            4. quopri.decodestring(s [,header=0])
          7. FUNCTIONS
            1. uu.encode(in, out, [name=...[,mode=0666]])
            2. uu.decode(in [,out_file=...[,mode=...]])
        4. 2.2.4. Cryptography
          1. mxCrypto, amkCrypto
          2. Python Cryptography
          3. M2Crypto
          4. fcrypt
          5. FUNCTIONS
            1. crypt.crypt(passwd, salt)
          6. CONSTANTS
            1. md5.MD5Type
          7. CLASSES
            1. md5.new([s])
            2. md5.md5([s])
          8. METHODS
            1. md5.copy()
            2. md5.digest()
            3. md5.hexdigest()
            4. md5.update(s)
          9. CLASSES
            1. rotor.newrotor(key [,numrotors])
          10. METHODS
            1. rotor.decrypt(s)
            2. rotor.decryptmore(s)
            3. rotor.encrypt(s)
            4. rotor.encryptmore(s)
            5. rotor.setkey(key)
          11. CLASSES
            1. sha.new([s])
            2. sha.sha([s])
          12. METHODS
            1. sha.copy()
            2. sha.digest()
            3. sha.hexdigest()
            4. sha.update(s)
        5. 2.2.5. Compression
          1. CLASSES
            1. gzip.GzipFile([filename=...[,mode="rb" [,compresslevel=9 [,fileobj=...]]]])
            2. gzip.open(filename=...[,mode='rb' [,compresslevel=9]])
          2. METHODS AND ATTRIBUTES
            1. gzip.close()
            2. gzip.flush()
            3. gzip.isatty()
            4. gzip.myfileobj
            5. gzip.read([num])
            6. gzip.readline([length])
            7. gzip.readlines([sizehint=...])
            8. gzip.write(s)
            9. gzip.writelines(list)
          3. CONSTANTS
          4. FUNCTIONS
            1. zipfile.is_zipfile(filename=...)
          5. CLASSES
            1. zipfile.PyZipFile(pathname)
            2. zipfile.ZipFile(file=...[,mode='r' [,compression=ZIP_STORED]])
            3. zipfile.ZipInfo()
          6. METHODS AND ATTRIBUTES
            1. zipfile.ZipFile.close()
            2. zipfile.ZipFile.getinfo(name=...)
            3. zipfile.ZipFile.infolist()
            4. zipfile.ZipFile.namelist()
            5. zipfile.ZipFile.printdir()
            6. zipfile.ZipFile.read(name=...)
            7. zipfile.ZipFile.testzip()
            8. zipfile.ZipFile.write(filename=...[,arcname=...[,compress_type=...]])
            9. zipfile.ZipFile.writestr(zinfo=..., bytes=...)
            10. zipfile.ZipFile.NameToInfo
            11. zipfile.ZipFile.compression
            12. zipfile.ZipFile.debug = 0
            13. zipfile.ZipFile.filelist
            14. zipfile.ZipFile.filename
            15. zipfile.ZipFile.fp
            16. zipfile.ZipFile.mode
            17. zipfile.ZipFile.start_dir
            18. zipfile.ZipInfo.CRC
            19. zipfile.ZipInfo.comment
            20. zipfile.ZipInfo.compress_size
            21. zipfile.ZipInfo.compress_type
            22. zipfile.ZipInfo.create_system
            23. zipfile.ZipInfo.create_version
            24. zipfile.ZipInfo.date_time
            25. zipfile.ZipInfo.external_attr
            26. zipfile.ZipInfo.extract_version
            27. zipfile.ZipInfo.file_offset
            28. zipfile.ZipInfo.file_size
            29. zipfile.ZipInfo.filename
            30. zipfile.ZipInfo.header_offset
            31. zipfile.ZipInfo.volume
          7. EXCEPTIONS
            1. zipfile.error
            2. zipfile.BadZipFile
          8. CONSTANTS
            1. zlib.ZLIB_VERSION
            2. zlib.Z_BEST_COMPRESSION = 9
            3. zlib.Z_BEST_SPEED = 1
            4. zlib.Z_HUFFMAN_ONLY = 2
          9. FUNCTIONS
            1. zlib.adler32(s [,crc])
            2. zlib.compress(s [,level])
            3. zlib.crc32(s [,crc])
            4. zlib.decompress(s [,winsize [,buffsize]])
          10. CLASS FACTORIES
            1. zlib.compressobj([level])
            2. zlib.decompressobj([winsize])
          11. METHODS AND ATTRIBUTES
            1. zlib.compressobj.compress(s)
            2. zlib.compressobj.flush([mode])
            3. zlib.decompressobj.unused_data
            4. zlib.decompressobj.decompress(s)
            5. zlib.decompressobj.flush()
          12. EXCEPTIONS
            1. zlib.error
        6. 2.2.6. Unicode
          1. ascii, us-ascii
          2. base64
          3. latin-1, iso-8859-1
          4. quopri
          5. rot13
          6. utf-7
          7. utf-8
          8. utf-16
          9. utf-16-le
          10. utf-16-be
          11. unicode-escape
          12. raw-unicode-escape
          13. strict
          14. ignore
          15. replace
          16. u“”.encode([enc [,errmode]]), “”.encode([enc [,errmode]])
          17. unicode(s [,enc [,errmode]])
          18. unichr(cp)
          19. codecs.open(filename=...[,mode='rb' [,encoding=...[,errors='strict' [,buffering=1]]]])
          20. codecs.EncodedFile(file=..., data_encoding=...[,file_encoding=...[,errors='strict']])
          21. FUNCTIONS
            1. unicodedata.bidirectional(unichr)
            2. unicodedata.category(unichr)
            3. unicodedata.combining(unichr)
            4. unicodedata.decimal(unichr [,default])
            5. unicodedata.decomposition(unichr)
            6. unicodedata.digit(unichr [,default])
            7. unicodedata.lookup(name)
            8. unicodedata.mirrored(unichr)
            9. unicodedata.name(unichr)
            10. unicodedata.numeric(unichr [,default])
      3. 2.3. Solving Problems
        1. 2.3.1. Exercise: Many ways to take out the garbage
          1. DISCUSSION
          2. QUESTIONS
        2. 2.3.2. Exercise: Making sure things are what they should be
          1. DISCUSSION
          2. QUESTIONS
        3. 2.3.3. Exercise: Finding needles in haystacks (full-text indexing)
          1. DISCUSSION
          2. QUESTIONS
    6. 3. Regular Expressions
      1. 3.1. A Regular Expression Tutorial
        1. 3.1.1. Just What Is a Regular Expression, Anyway?
        2. 3.1.2. Matching Patterns in Text: The Basics
        3. 3.1.3. Matching Patterns in Text: Intermediate
        4. 3.1.4. Advanced Regular Expression Extensions
      2. 3.2. Some Common Tasks
        1. 3.2.1. Problem: Making a text block flush left
        2. 3.2.2. Problem: Summarizing command-line option documentation
        3. 3.2.3. Problem: Detecting duplicate words
        4. 3.2.4. Problem: Checking for server errors
        5. 3.2.5. Problem: Reading lines with continuation characters
        6. 3.2.6. Problem: Identifying URLs and email addresses in texts
        7. 3.2.7. Problem: Pretty-printing numbers
      3. 3.3. Standard Modules
        1. 3.3.1. Versions and Optimizations
        2. 3.3.2. Simple Pattern Matching
          1. FUNCTIONS
            1. fnmatch.fnmatch(s, pat)
            2. fnmatch.fnmatchcase(s, pat)
            3. fnmatch.filter(lst, pat)
        3. 3.3.3. Regular Expression Modules
          1. FUNCTIONS
            1. reconvert.convert(s)
          2. PATTERN SUMMARY
          3. ATOMIC OPERATORS
            1. Plain symbol
            2. Escape: “\”
            3. Grouping operators: “(”, “)”
            4. Backreference: “\d”, “\dd”
            5. Character classes: “[”, “]”
            6. Digit character class: “\d”
            7. Non-digit character class: “\D”
            8. Alphanumeric character class: “\w”
            9. Non-alphanumeric character class: “\W”
            10. Whitespace character class: “\s”
            11. Non-whitespace character class: “\S”
            12. Wildcard character: “.”
            13. Beginning of line: “^”
            14. Beginning of string: “\A”
            15. End of line: “$”
            16. End of string: “\Z”
            17. Word boundary: “\b”
            18. Non-word boundary: “\B”
            19. Alternation operator: “|”
          4. QUANTIFIERS
            1. Universal quantifier: “*”
            2. Non-greedy universal quantifier: “*?”
            3. Existential quantifier: “+”
            4. Non-greedy existential quantifier: “+?”
            5. Potentiality quantifier: “?”
            6. Non-greedy potentiality quantifier: “??”
            7. Exact numeric quantifier: “{num}”
            8. Lower-bound quantifier: “{min,}”
            9. Bounded numeric quantifier: “{min,max}”
            10. Non-greedy bounded quantifier: “{min,max}?”
          5. GROUP-LIKE PATTERNS
            1. Pattern modifiers: “(?Limsux)”
            2. Comments: “(?#...)”
            3. Non-backreferenced atom: “(?:...)”
            4. Positive Lookahead assertion: “(?=...)”
            5. Negative Lookahead assertion: “(?!...)”
            6. Positive Lookbehind assertion: “(?<=...)”
            7. Negative Lookbehind assertion: “(?<!...)”
            8. Named group identifier: “(?P<name>)”
            9. Named group backreference: “(?P=name)”
          6. CONSTANTS
            1. re.I, re.IGNORECASE
            2. re.L, re.LOCALE
            3. re.M, re.MULTILINE
            4. re.S, re.DOTALL
            5. re.U, re.UNICODE
            6. re.X, re.VERBOSE
            7. re.engine
          7. FUNCTIONS
            1. re.escape(s)
            2. re.findall(pattern=..., string=...)
            3. re.purge()
            4. re.split(pattern=..., string=...[,maxsplit=0])
            5. re.sub(pattern=..., repl=..., string=...[,count=0])
            6. re.subn(pattern=..., repl=..., string=...[,count=0])
          8. CLASS FACTORIES
            1. re.compile(pattern=...[,flags=...])
            2. re.match(pattern=..., string=...[,flags=...])
            3. re.search(pattern=..., string=...[,flags=...])
          9. METHODS AND ATTRIBUTES
            1. re.compile.findall(s)
            2. re.compile.flags
            3. re.compile.groupindex
            4. re.compile.match(s [,start [,end]])
            5. re.compile.pattern
            6. re.compile.search(s [,start [,end]])
            7. re.compile.split(s [,maxsplit])
            8. re.compile.sub(repl, s [,count=0])
            9. re.compile.subn()
            10. re.match.end([group]), re.search.end([group])
            11. re.match.endpos, re.search.endpos
            12. re.match.expand(template), re.search.expand(template)
            13. re.match.group([group [,...]]), re.search.group([group [,...]])
            14. re.match.groupdict([defval]), re.search.groupdict([defval])
            15. re.match.groups([defval]), re.search.groups([defval])
            16. re.match.lastgroup, re.search.lastgroup
            17. re.match.lastindex, re.search.lastindex
            18. re.match.pos, re.search.pos
            19. re.match.re, re.search.re
            20. re.match.span([group]), re.search.span([group])
            21. re.match.start([group]), re.search.start([group])
            22. re.match.string, re.search.string
          10. EXCEPTIONS
            1. re.error
    7. 4. Parsers and State Machines
      1. 4.1. An Introduction to Parsers
        1. 4.1.1. When Data Becomes Deep and Texts Become Stateful
        2. 4.1.2. What Is a Grammar?
        3. 4.1.3. An EBNF Grammar for IF/THEN/END Structures
        4. 4.1.4. Pencil-and-Paper Parsing
        5. 4.1.5. Exercise: Some variations on the language
      2. 4.2. An Introduction to State Machines
        1. 4.2.1. Understanding State Machines
        2. 4.2.2. Text Processing State Machines
        3. 4.2.3. When Not to Use a State Machine
          1. Sidebar: A digression on functional programming
        4. 4.2.4. When to Use a State Machine
        5. 4.2.5. An Abstract State Machine Class
        6. 4.2.6. Processing a Report with a Concrete State Machine
        7. 4.2.7. Subgraphs and State Reuse
        8. 4.2.8. Exercise: Finding other solutions
      3. 4.3. Parser Libraries for Python
        1. 4.3.1. Specialized Parsers in the Standard Library
          1. ConfigParser
          2. difflib, .../Tools/scripts/ndiff.py
          3. formatter
          4. htmllib
          5. multifile
          6. parser, symbol, token, tokenize
          7. robotparser
          8. sgmllib
          9. shlex
          10. tabnanny
        2. 4.3.2. Low-Level State Machine Parsing
          1. BENCHMARKS
            1. Example: Buyer/Order Report Parsing
            2. Tag table state in buyers
            3. Subtable states in buyers
            4. Example: Marking up smart ASCII
          2. DEBUGGING A TAG TABLE
          3. CONSTANTS
            1. mx.TextTools.a2z, mx.TextTools.a2z_set
            2. mx.TextTools.A2Z, mx.TextTools.A2Z_set
            3. mx.TextTools.umlaute, mx.TextTools.umlaute_set
            4. mx.TextTools.Umlaute, mx.TextTools.Umlaute_set
            5. mx.TextTools.alpha, mx.TextTools.alpha_set
            6. mx.TextTools.german_alpha, mx.TextTools.german_alpha_set
            7. mx.TextTools.number, mx.TextTools.number_set
            8. mx.TextTools.alphanumeric, mx.TextTools.alphanumeric_set
            9. mx.TextTools.white, mx.TextTools.white_set
            10. mx.TextTools.newline, mx.TextTools.newline_set
            11. mx.TextTools.formfeed, mx.TextTools.formfeed_set
            12. mx.TextTools.whitespace, mx.TextTools.whitespace_set
            13. mx.TextTools.any, mx.TextTools.any_set
          4. COMMANDS
          5. UNCONDITIONAL COMMANDS
            1. mx.TextTools.Fail, mx.TextTools.Jump
            2. mx.TextTools.Skip, mx.TextTools.Move
          6. MATCHING PARTICULAR CHARACTERS
            1. mx.TextTools.AllIn, mx.TextTools.AllInSet, mx.TextTools.AllInCharSet
            2. mx.TextTools.AllNotIn
            3. mx.TextTools.Is
            4. mx.TextTools.IsNot
            5. mx.TextTools.IsIn, mx.TextTools.IsInSet, mx.TextTools.IsInCharSet
            6. mx.TextTools.IsNotIn
          7. MATCHING SEQUENCES
            1. mx.TextTools.Word
            2. mx.TextTools.WordStart, mx.TextTools.sWordStart, mx.TextTools.WordEnd, mx.TextTools.sWordEnd
            3. mx.TextTools.sFindWord
            4. mx.TextTools.EOF
          8. COMPOUND MATCHES
            1. mx.TextTools.Table, mx.TextTools.SubTable
            2. mx.TextTools.TableInList, mx.TextTools.SubTableInList
            3. mx.TextTools.Call
            4. mx.TextTools.CallArg
          9. MODIFIERS
            1. mx.TextTools.CallTag
            2. mx.TextTools.AppendMatch
            3. mx.TextTools.AppendToTagobj
            4. mx.TextTools.AppendTagobj
            5. mx.TextTools.LookAhead
          10. CLASSES
            1. mx.TextTools.BMS(word [,translate]), mx.TextTools.FS(word [,translate]), mx.TextTools.TextSearch(word [,translate [,algorithm=BOYERMOORE]])
            2. mx.TextTools.CharSet(definition)
          11. METHODS AND ATTRIBUTES
            1. mx.TextTools.BMS.search(s [,start [,end]]), mx.TextTools.FS.search(s [,start [,end]]), mx.TextTools.TextSearch.search(s [,start [,end]])
            2. mx.TextTools.BMS.find(s [,start [,end]]), mx.TextTools.FS.find(s [,start [,end]]), mx.TextTools.TextSearch.search(s [,start [,end]])
            3. mx.TextTools.BMS.findall(s [,start [,end]]), mx.TextTools.FS.findall(s [,start [,end]]), mx.TextTools.TextSearch.search(s [,start [,end]])
            4. mx.TextTools.BMS.match, mx.TextTools.FS.match, mx.TextTools.TextSearch.match
            5. mx.TextTools.BMS.translate, mx.TextTools.FS.translate, mx.TextTools.TextSearch.match
            6. mx.TextTools.CharSet.contains(c)
            7. mx.TextTools.CharSet.search(s [,direction [,start=0 [,stop=len(s)]]])
            8. mx.TextTools.CharSet.match(s [,direction [,start=0 [,stop=len(s)]]])
            9. mx.TextTools.CharSet.split(s [,start=0 [,stop=len(text)]])
            10. mx.TextTools.CharSet.splitx(s [,start=0 [,stop=len(text)]])
            11. mx.TextTools.CharSet.strip(s [,where=0 [,start=0 [,stop=len(s)]]])
          12. FUNCTIONS
            1. mx.TextTools.cmp(t1, t2)
            2. mx.TextTools.invset(s)
            3. mx.TextTools.set(s [,includechars=1])
            4. mx.TextTools.tag(s, table [,start [,end [,taglist]]])
          13. UTILITY FUNCTIONS
            1. mx.TextTools.charsplit(s, char [,start [,end]])
            2. mx.TextTools.collapse(s, sep=' ')
            3. mx.TextTools.countlines(s)
            4. mx.TextTools.find(s, search_obj [,start [,end]])
            5. mx.TextTools.findall(s, search_obj [,start [,end]])
            6. mx.TextTools.hex2str(hexstr)
            7. mx.TextTools.is_whitespace(s [,start [,end]])
            8. mx.TextTools.isascii(s)
            9. mx.TextTools.join(joinlist [,sep=“” [,start [,end]]])
            10. mx.TextTools.lower(s)
            11. mx.TextTools.prefix(s, prefixes [,start [,stop [,translate]]])
            12. mx.TextTools.multireplace(s, replacements [,start [,stop]])
            13. mx.TextTools.replace(s, old, new [,start [,stop]])
            14. mx.TextTools.setfind(s, set [,start [,end]])
            15. mx.TextTools.setsplit(s, set [,start [,stop]])
            16. mx.TextTools.setsplitx(text, set [,start=0 [,stop=len(text)]])
            17. mx.TextTools.splitat(s, char [,n=1 [,start [,end]]])
            18. mx.TextTools.splitlines(s)
            19. mx.TextTools.splitwords(s)
            20. mx.TextTools.str2hex(s)
            21. mx.TextTools.suffix(s, suffixes [,start [,stop [,translate]]])
            22. mx.TextTools.upper(s)
        3. 4.3.3. High-Level EBNF Parsing
          1. Example: Marking up smart ASCII (Redux)
          2. GENERATING AND USING A TAGLIST
          3. THE TAGLIST AND THE OUTPUT
          4. GRAMMAR
          5. DECLARATION PATTERNS
          6. LITERALS
            1. Literal string
            2. Character class: “[”, “]”
          7. QUANTIFIERS
            1. Universal quantifier: “*”
            2. Existential quantifier: “+”
            3. Potentiality quantifier: “?”
            4. Lookahead quantifier: “?”
            5. Error on Failure: “!”
          8. STRUCTURES
            1. Alternation operator: “/”
            2. Sequence operator: “,”
            3. Negation operator: “-”
            4. Grouping operators: “(”, “)”
          9. USEFUL PRODUCTIONS
            1. simpleparse.common.calendar_names
            2. simpleparse.common.chartypes
            3. simpleparse.common.comments
            4. simpleparse.common.iso_date
            5. simpleparse.common.iso_date_loose
            6. simpleparse.common.numbers
            7. simpleparse.common.phonetics
            8. simpleparse.common.strings
            9. simpleparse.common.timezone_names
          10. GOTCHAS
        4. 4.3.4. High-Level Programmatic Parsing
          1. Example: Marking up smart ASCII (yet again)
          2. GENERATING A TOKEN LIST
            1. Parsing a token list
          3. LEX
          4. YACC
          5. MORE ON PLY PARSERS
            1. Error Recovery
            2. The Parser State Machine
            3. Precedence and Associativity
    8. 5. Internet Tools and Techniques
      1. 5.1. Working with Email and Newsgroups
        1. 5.1.1. Manipulating and Creating Message Texts
          1. CLASSES
            1. email.MIMEBase.MIMEBase(maintype, subtype, **params)
            2. email.MIMENonMultipart.MIMENonMultipart(maintype, subtype, **params)
            3. email.MIMEMultipart.MIMEMultipart([subtype=“mixed” [,boundary [,*subparts [,**params]]]])
            4. email.MIMEAudio.MIMEAudio(audiodata [,subtype [,encoder [,**params]]])
            5. email.MIMEImage.MIMEImage(imagedata [,subtype [,encoder [,**params]]])
            6. email.MIMEText.MIMEText(text [,subtype [,charset]])
          2. FUNCTIONS
            1. email.message_from_file(file [,_class=email.Message.Message [,strict=0]])
            2. email.message_from_string(s [,_class=email.Message.Message [,strict=0]])
          3. FUNCTIONS
            1. email.Encoders.encode_quopri(mess)
            2. email.Encoders.encode_base64(mess)
            3. email.Encoders.encode_7or8bit(mess)
          4. CLASSES
            1. email.Generator.Generator(file [,mangle_from_=1 [,maxheaderlen=78]])
            2. email.Generator.DecodedGenerator(file [,mangle_from_ [,maxheaderlen [,fmt]]])
          5. METHODS
            1. email.Generator.Generator.clone(), email.Generator.DecodedGenerator.clone()
            2. email.Generator.Generator.flatten(mess [,unixfrom=0]), email.Generator.DecodedGenerator.flatten(mess [,unixfrom=0])
            3. email.Generator.Generator.write(s), email.Generator.DecodedGenerator.write(s)
          6. CLASSES
            1. email.Header.Header([s=“” [,charset [,maxlinelen=76 [,header_name=“”[,continuation_ws=“ ”]]]]])
          7. METHODS
            1. email.Header.Header.append(s [,charset])
            2. email.Header.Header.encode(), email.Header.Header.__str__()
          8. FUNCTIONS
            1. email.Header.decode_header(header)
            2. email.Header.make_header(decoded_seq [,maxlinelen [,header_name [,continuation_ws]]])
          9. FUNCTIONS
            1. email.Iterators.body_line_iterator(mess)
            2. email.Iterators.typed_subpart_iterator(mess [,maintype=“text” [,subtype]])
            3. email.Iterators._structure(mess [,file=sys.stdout])
          10. CLASSES
            1. email.Message.Message()
          11. METHODS AND ATTRIBUTES
            1. email.Message.Message.add_header(field, value [,**params])
            2. email.Message.Message.as_string([unixfrom=0])
            3. email.Message.Message.attach(mess)
            4. email.Message.Message.del_param(param [,header=“Content-Type” [,requote=1]])
            5. email.Message.Message.epilogue
            6. email.Message.Message.get_all(field [,failobj=None])
            7. email.Message.Message.get_boundary([failobj=None])
            8. email.Message.Message.get_charsets([failobj=None])
            9. email.Message.Message.get_content_charset([failobj=None])
            10. email.Message.Message.get_content_maintype()
            11. email.Message.Message.get_content_subtype()
            12. email.Message.Message.get_content_type()
            13. email.Message.Message.get_default_type()
            14. email.Message.Message.get_filename([failobj=None])
            15. email.Message.Message.get_param(param [,failobj [,header=...[,unquote=1]]])
            16. email.Message.Message.get_params([failobj=None [,header=...[,unquote=1]]])
            17. email.Message.Message.get_payload([i [,decode=0]])
            18. email.Message.Message.get_unixfrom()
            19. email.Message.Message.is_multipart()
            20. email.Message.Message.preamble
            21. email.Message.Message.replace_header(field, value)
            22. email.Message.Message.set_boundary(s)
            23. email.Message.Message.set_default_type(ctype)
            24. email.Message.Message.set_param(param, value [,header=“Content-Type” [,requote=1 [,charset [,language]]]])
            25. email.Message.Message.set_payload(payload [,charset=None])
            26. email.Message.Message.set_type(ctype [,header=“Content-Type” [,requote=1]])
            27. email.Message.Message.set_unixfrom(s)
            28. email.Message.Message.walk()
          12. CLASSES
            1. email.Parser.Parser([_class=email.Message.Message [,strict=0]])
            2. email.Parser.HeaderParser([_class=email.Message.Message [,strict=0]])
          13. METHODS
            1. email.Parser.Parser.parse(file [,headersonly=0]), email.Parser.HeaderParser.parse(file [,headersonly=0])
            2. email.Parser.Parser.parsestr(s [,headersonly=0]), email.Parser.HeaderParser.parsestr(s [,headersonly=0])
          14. FUNCTIONS
            1. email.Utils.decode_rfc2231(s)
            2. email.Utils.encode_rfc2231(s [,charset [,language]])
            3. email.Utils.formataddr(pair)
            4. email.Utils.formatdate([timeval [,localtime=0]])
            5. email.Utils.getaddresses(addresses)
            6. email.Utils.make_msgid([seed])
            7. email.Utils.mktime_tz(tuple)
            8. email.Utils.parseaddr(address)
            9. email.Utils.parsedate(datestr)
            10. email.Utils.parsedate_tz(datestr)
            11. email.Utils.quote(s)
            12. email.Utils.unquote(s)
        2. 5.1.2. Communicating with Mail Servers
          1. CLASSES
            1. imaplib.IMAP4([host=“localhost” [,port=143]])
          2. METHODS
            1. imaplib.IMAP4.close()
            2. imaplib.IMAP4.expunge()
            3. imaplib.IMAP4.fetch(message_set, message_parts)
            4. imaplib.IMAP4.list([dirname=“” [,pattern=“*”]])
            5. imaplib.IMAP4.login(user, passwd)
            6. imaplib.IMAP4.logout()
            7. imaplib.IMAP4.search(charset, criterion1 [,criterion2 [,...]])
            8. imaplib.IMAP4.select([mbox=“INBOX” [,readonly=0]])
          3. CLASSES
            1. poplib.POP3(host [,port=110])
          4. METHODS
            1. poplib.POP3.apop(user, secret)
            2. poplib.POP3.dele(messnum)
            3. poplib.POP3.pass_(password)
            4. poplib.POP3.quit()
            5. poplib.POP3.retr(messnum)
            6. poplib.POP3.rset()
            7. poplib.POP3.top(messnum, lines)
            8. poplib.POP3.stat()
            9. poplib.POP3.user(username)
          5. CLASSES
            1. smtplib.SMTP([host=“localhost” [,port=25]])
          6. METHODS
            1. smtplib.SMTP.login(user, passwd)
            2. smtplib.SMTP.quit()
            3. smtplib.SMTP.sendmail(from_, to_, mess [,mail_options=[] [,rcpt_options=[]]])
        3. 5.1.3. Message Collections and Message Parts
          1. CLASSES
            1. mailbox.UnixMailbox(file [,factory=rfc822.Message])
            2. mailbox.PortableUnixMailbox(file [,factory=rfc822.Message])
            3. mailbox.BabylMailbox(file [,factory=rfc822.Message])
            4. mailbox.MmdfMailbox(file [,factory=rfc822.Message])
            5. mailbox.MHMailbox(dirname [,factory=rfc822.Message])
            6. mailbox.Maildir(dirname [,factory=rfc822.Message])
          2. FUNCTIONS
            1. mimetypes.guess_type(url [,strict=0])
            2. mimetypes.guess_extension(type [,strict=0])
            3. mimetypes.init([list-of-files])
            4. mimetypes.read_mime_types(fname)
          3. ATTRIBUTES
            1. mimetypes.common_types
            2. mimetypes.inited
            3. mimetypes.encodings_map
            4. mimetypes.knownfiles
            5. mimetypes.suffix_map
            6. mimetypes.types_map
      2. 5.2. World Wide Web Applications
        1. 5.2.1. Common Gateway Interface
          1. A CGI PRIMER
          2. CLASSES
            1. cgi.FieldStorage([fp=sys.stdin [,headers [,ob [,environ=os.environ [,keep_blank_values=0 [,strict_parsing=0]]]]]])
          3. METHODS
            1. cgi.FieldStorage.getfirst(key [,default=None])
            2. cgi.FieldStorage.getlist(key [,default=None])
            3. cgi.FieldStorage.getvalue(key [,default=None])
          4. ATTRIBUTES
            1. cgi.FieldStorage.file
            2. cgi.FieldStorage.filename
            3. cgi.FieldStorage.list
            4. cgi.FieldStorage.value, cgi.MiniFieldStorage.value
          5. METHODS
            1. cgitb.enable([display=1 [,logdir=None [,context=5]]])
        2. 5.2.2. Parsing, Creating, and Manipulating HTML Documents
          1. ATTRIBUTES
            1. htmlentitydefs.entitydefs
          2. CLASSES
            1. HTMLParser.HTMLParser()
          3. METHODS AND ATTRIBUTES
            1. HTMLParser.HTMLParser.close()
            2. HTMLParser.HTMLParser.feed(data)
            3. HTMLParser.HTMLParser.getpos()
            4. HTMLParser.HTMLParser.handle_charref(name)
            5. HTMLParser.HTMLParser.handle_comment(data)
            6. HTMLParser.HTMLParser.handle_data(data)
            7. HTMLParser.HTMLParser.handle_decl(data)
            8. HTMLParser.HTMLParser.handle_endtag(tag)
            9. HTMLParser.HTMLParser.handle_entityref(name)
            10. HTMLParser.HTMLParser.handle_pi(data)
            11. HTMLParser.HTMLParser.handle_startendtag(tag, attrs)
            12. HTMLParser.HTMLParser.handle_starttag(tag, attrs)
            13. HTMLParser.HTMLParser.lasttag
            14. HTMLParser.HTMLParser.reset()
        3. 5.2.3. Accessing Internet Resources
          1. FUNCTIONS
            1. urllib.urlopen(url [,data])
            2. urllib.urlretrieve(url [,fname [,reporthook [,data]]])
            3. urllib.quote(s [,safe=“/”])
            4. urllib.quote_plus(s [,safe=“/”])
            5. urllib.unquote(s)
            6. urllib.unquote_plus(s)
            7. urllib.urlencode(query)
          2. CLASSES
            1. urllib.URLopener([proxies [,**x509]])
            2. urllib.FancyURLopener([proxies [,**x509]])
          3. METHODS AND ATTRIBUTES
            1. urllib.FancyURLopener.get_user_passwd(host, realm)
            2. urllib.URLopener.open(url [,data]), urllib.FancyURLopener.open(url [,data])
            3. urllib.URLopener.open_unknown(url [,data]), urllib.FancyURLopener.open_unknown(url [,data])
            4. urllib.FancyURLopener.prompt_user_passwd(host, realm)
            5. urllib.URLopener.retrieve(url [,fname [,reporthook [,data]]]), urllib.FancyURLopener.retrieve(url [,fname [,reporthook [,data]]])
            6. urllib.URLopener.version, urllib.FancyURLopener.version
          4. FUNCTIONS
            1. urlparse.urlparse(url [,def_scheme=“” [,fragments=1]])
            2. urlparse.urlunparse(tup)
            3. urlparse.urljoin(base, file)
      3. 5.3. Synopses of Other Internet Modules
        1. 5.3.1. Standard Internet-Related Tools
          1. asyncore
          2. Cookie
          3. email.Charset
          4. ftplib
          5. gopherlib
          6. httplib
          7. ic, icopen
          8. icopen
          9. imghdr
          10. mailcap
          11. mhlib
          12. mimetools
          13. MimeWriter
          14. mimify
          15. netrc
          16. nntplib
          17. nsremote
          18. rfc822
          19. select
          20. sndhdr
          21. socket
          22. SocketServer
          23. telnetlib
          24. urllib2
          25. webbrowser
        2. 5.3.2. Third-Party Internet Related Tools
          1. Quixote
          2. Twisted
          3. Zope
      4. 5.4. Understanding XML
        1. THE DATA MODEL
        2. OTHER XML FEATURES
        3. 5.4.1. Python Standard Library XML Modules
          1. xml.dom
          2. xml.dom.minidom
          3. xml.dom.pulldom
          4. xml.parsers.expat
          5. xml.sax
          6. xml.sax.handler
          7. xml.sax.saxutils
          8. xml.sax.xmlreader
          9. xmllib
          10. xmlrpclib, SimpleXMLRPCServer
        4. 5.4.2. Third-Party XML-Related Tools
          1. gnosis.xml.indexer
          2. gnosis.xml.objectify
          3. gnosis.xml.pickle
          4. gnosis.xml.validity
          5. PyXML
          6. PYX
          7. 4Suite
          8. yaml
    9. A. A Selective and Impressionistic Short Review of Python
      1. A.1. What Kind of Language Is Python?
      2. A.2. Namespaces and Bindings
        1. A.2.1. Assignment and Dereferencing
        2. A.2.2. Function and Class Definitions
        3. A.2.3. import Statements
        4. A.2.4. for Statements
        5. A.2.5. except Statements
      3. A.3. Datatypes
        1. A.3.1. Simple Types
          1. bool
          2. int
          3. long
          4. float
          5. complex
          6. string
          7. unicode
        2. A.3.2. String Interpolation
        3. A.3.3. Printing
        4. A.3.4. Container Types
          1. tuple
          2. list
          3. dict
          4. sets.Set
        5. A.3.5. Compound Types
          1. class instance
      4. A.4. Flow Control
        1. A.4.1. if/then/else Statements
        2. A.4.2. Boolean Shortcutting
        3. A.4.3. for/continue/break Statements
        4. A.4.4. map(), filter(), reduce(), and List Comprehensions
        5. A.4.5. while/else/continue/break Statements
        6. A.4.6. Functions, Simple Generators, and the yield Statement
        7. A.4.7. Raising and Catching Exceptions
        8. A.4.8. Data as Code
          1. eval(s [,globals=globals() [,locals=locals()]])
          2. exec
          3. __import__(s [,globals=globals() [,locals=locals() [,fromlist]]])
          4. input([prompt])
          5. raw_input([prompt])
      5. A.5. Functional Programming
        1. A.5.1. Emphasizing Expressions Using lambda
        2. A.5.2. Special List Functions
        3. zip(seq1 [,seq2 [,...]])
          1. enumerate(collection)
        4. A.5.3. List-Application Functions as Flow Control
        5. A.5.4. Extended Call Syntax and apply()
    10. B. A Data Compression Primer
      1. B.1. Introduction
      2. B.2. Lossless and Lossy Compression
      3. B.3. A Data Set Example
      4. B.4. Whitespace Compression
      5. B.5. Run-Length Encoding
      6. B.6. Huffman Encoding
      7. B.7. Lempel-Ziv Compression
      8. B.8. Solving the Right Problem
      9. B.9. A Custom Text Compressor
      10. B.10. References
    11. C. Understanding Unicode
      1. C.1. Some Background on Characters
      2. C.2. What Is Unicode?
      3. C.3. Encodings
      4. C.4. Declarations
      5. C.5. Finding Codepoints
      6. C.6. Resources
    12. D. A State Machine for Adding Markup to Text
    13. E. Glossary

    Product information

    • Title: Text Processing in Python
    • Author(s): David Mertz
    • Release date: June 2003
    • Publisher(s): Addison-Wesley Professional
    • ISBN: None