
Learning Python, 4th Edition

By Mark Lutz. Published by O'Reilly Media, Inc.

Chapter 4. Introducing Python Object Types

This chapter begins our tour of the Python language. In an informal sense, in Python, we do things with stuff. “Things” take the form of operations like addition and concatenation, and “stuff” refers to the objects on which we perform those operations. In this part of the book, our focus is on that stuff, and the things our programs can do with it.

Somewhat more formally, in Python, data takes the form of objects—either built-in objects that Python provides, or objects we create using Python or external language tools such as C extension libraries. Although we’ll firm up this definition later, objects are essentially just pieces of memory, with values and sets of associated operations.

Because objects are the most fundamental notion in Python programming, we’ll start this chapter with a survey of Python’s built-in object types.

By way of introduction, however, let’s first establish how this chapter fits into the overall Python picture. From a more concrete perspective, Python programs can be decomposed into modules, statements, expressions, and objects, as follows:

  1. Programs are composed of modules.

  2. Modules contain statements.

  3. Statements contain expressions.

  4. Expressions create and process objects.

The discussion of modules in Chapter 3 introduced the highest level of this hierarchy. This part’s chapters begin at the bottom, exploring both built-in objects and the expressions you can code to use them.

Why Use Built-in Types?

If you’ve used lower-level languages such as C or C++, you know that much of your work centers on implementing objects—also known as data structures—to represent the components in your application’s domain. You need to lay out memory structures, manage memory allocation, implement search and access routines, and so on. These chores are about as tedious (and error-prone) as they sound, and they usually distract from your program’s real goals.

In typical Python programs, most of this grunt work goes away. Because Python provides powerful object types as an intrinsic part of the language, there’s usually no need to code object implementations before you start solving problems. In fact, unless you have a need for special processing that built-in types don’t provide, you’re almost always better off using a built-in object instead of implementing your own. Here are some reasons why:

  • Built-in objects make programs easy to write. For simple tasks, built-in types are often all you need to represent the structure of problem domains. Because you get powerful tools such as collections (lists) and search tables (dictionaries) for free, you can use them immediately. You can get a lot of work done with Python’s built-in object types alone.

  • Built-in objects are components of extensions. For more complex tasks, you may need to provide your own objects using Python classes or C language interfaces. But as you’ll see in later parts of this book, objects implemented manually are often built on top of built-in types such as lists and dictionaries. For instance, a stack data structure may be implemented as a class that manages or customizes a built-in list.

  • Built-in objects are often more efficient than custom data structures. Python’s built-in types employ already optimized data structure algorithms that are implemented in C for speed. Although you can write similar object types on your own, you’ll usually be hard-pressed to get the level of performance built-in object types provide.

  • Built-in objects are a standard part of the language. In some ways, Python borrows both from languages that rely on built-in tools (e.g., LISP) and languages that rely on the programmer to provide tool implementations or frameworks of their own (e.g., C++). Although you can implement unique object types in Python, you don’t need to do so just to get started. Moreover, because Python’s built-ins are standard, they’re always the same; proprietary frameworks, on the other hand, tend to differ from site to site.
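The second point can be sketched in a few lines. The class below is a minimal, hypothetical example (the names Stack, push, and pop are illustrative, not from any library) of a stack that manages a built-in list instead of laying out its own memory; class syntax itself is covered in Part VI.

```python
class Stack:
    """Hypothetical LIFO stack that delegates all storage to a built-in list."""

    def __init__(self):
        self._items = []            # the built-in list does the real work

    def push(self, item):
        self._items.append(item)    # add to the end of the list

    def pop(self):
        return self._items.pop()    # remove and return the most recent item

    def __len__(self):
        return len(self._items)     # let len() report the stack's size


s = Stack()
s.push('spam')
s.push(42)
print(s.pop(), len(s))   # 42 1
```

All the memory management, resizing, and element access come from the list; the class only decides which list operations to expose.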

In other words, not only do built-in object types make programming easier, but they’re also more powerful and efficient than most of what can be created from scratch. Regardless of whether you implement new object types, built-in objects form the core of every Python program.

Python’s Core Data Types

Table 4-1 previews Python’s built-in object types and some of the syntax used to code their literals—that is, the expressions that generate these objects.[12] Some of these types will probably seem familiar if you’ve used other languages; for instance, numbers and strings represent numeric and textual values, respectively, and files provide an interface for processing files stored on your computer.

Table 4-1. Built-in objects preview

Object type                     Example literals/creation
------------------------------  ------------------------------------------------------
Numbers                         1234, 3.1415, 3+4j, Decimal, Fraction
Strings                         'spam', "guido's", b'a\x01c'
Lists                           [1, [2, 'three'], 4]
Dictionaries                    {'food': 'spam', 'taste': 'yum'}
Tuples                          (1, 'spam', 4, 'U')
Files                           myfile = open('eggs', 'r')
Sets                            set('abc'), {'a', 'b', 'c'}
Other core types                Booleans, types, None
Program unit types              Functions, modules, classes (Part IV, Part V, Part VI)
Implementation-related types    Compiled code, stack tracebacks (Part IV, Part VII)

Table 4-1 isn’t really complete, because everything we process in Python programs is a kind of object. For instance, when we perform text pattern matching in Python, we create pattern objects, and when we perform network scripting, we use socket objects. These other kinds of objects are generally created by importing and using modules and have behavior all their own.

As we’ll see in later parts of the book, program units such as functions, modules, and classes are objects in Python too—they are created with statements and expressions such as def, class, import, and lambda and may be passed around scripts freely, stored within other objects, and so on. Python also provides a set of implementation-related types such as compiled code objects, which are generally of interest to tool builders more than application developers; these are also discussed in later parts of this text.

We usually call the other object types in Table 4-1 core data types, though, because they are effectively built into the Python language—that is, there is specific expression syntax for generating most of them. For instance, when you run the following code:

>>> 'spam'

you are, technically speaking, running a literal expression that generates and returns a new string object. There is specific Python language syntax to make this object. Similarly, an expression wrapped in square brackets makes a list, one in curly braces makes a dictionary, and so on. Even though, as we’ll see, there are no type declarations in Python, the syntax of the expressions you run determines the types of objects you create and use. In fact, object-generation expressions like those in Table 4-1 are generally where types originate in the Python language.
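A quick way to see that literal syntax alone determines an object’s type is to ask the type built-in about a few of the literals from Table 4-1. This sketch simply loops over them; the list name examples is arbitrary.

```python
# Each literal below creates a new object; its syntax alone decides the type.
examples = [1234, 3.1415, 3+4j, 'spam', b'a\x01c',
            [1, 2], {'food': 'spam'}, (1, 'spam'), {'a', 'b'}]

for obj in examples:
    print(repr(obj), '->', type(obj).__name__)
```

No declarations appear anywhere: the quotes, brackets, braces, and parentheses are the only type information Python needs.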

Just as importantly, once you create an object, you bind its operation set for all time—you can perform only string operations on a string and list operations on a list. As you’ll learn, Python is dynamically typed (it keeps track of types for you automatically instead of requiring declaration code), but it is also strongly typed (you can perform on an object only operations that are valid for its type).
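To make the two halves of that statement concrete, here is a short sketch: a string accepts string operations, mixing it with an integer raises TypeError instead of silently converting, and the name S can still be rebound to an integer afterward (the dynamic half). The exact TypeError message varies between Python versions, so it isn’t shown here.

```python
S = 'spam'
print(S.upper())            # a string operation on a string: SPAM

mixed_ok = True
try:
    S + 1                   # adding an int to a str is not a valid string operation
except TypeError as exc:
    mixed_ok = False
    print('TypeError:', exc)

print(S + str(1))           # convert explicitly instead: spam1

S = 42                      # the name itself has no type; it can be rebound freely
print(S + 1)                # 43
```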

Functionally, the object types in Table 4-1 are more general and powerful than what you may be accustomed to. For instance, you’ll find that lists and dictionaries alone are powerful data representation tools that obviate most of the work you do to support collections and searching in lower-level languages. In short, lists provide ordered collections of other objects, while dictionaries store objects by key; both lists and dictionaries may be nested, can grow and shrink on demand, and may contain objects of any type.
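As a small preview of the nesting and resizing just described, the sketch below builds a dictionary of lists and changes it in place. The names menu, breakfast, and so on are arbitrary; list and dictionary methods are covered properly later in this chapter and in Chapter 8.

```python
# Lists and dictionaries nest, resize on demand, and hold any type of object.
menu = {'breakfast': ['spam', 'eggs'], 'lunch': []}

menu['lunch'].append('ham')                     # grow a nested list in place
menu['dinner'] = ['spam', 2, {'drink': 'tea'}]  # grow the dict; mixed types, more nesting
menu['breakfast'].remove('eggs')                # shrink a nested list
del menu['lunch']                               # shrink the dictionary

print(sorted(menu))                  # ['breakfast', 'dinner']
print(menu['dinner'][2]['drink'])    # tea: index into the list, then key into the dict
```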

We’ll study each of the object types in Table 4-1 in detail in upcoming chapters. Before digging into the details, though, let’s begin by taking a quick look at Python’s core objects in action. The rest of this chapter provides a preview of the operations we’ll explore in more depth in the chapters that follow. Don’t expect to find the full story here—the goal of this chapter is just to whet your appetite and introduce some key ideas. Still, the best way to get started is to get started, so let’s jump right into some real code.

Numbers

If you’ve done any programming or scripting in the past, some of the object types in Table 4-1 will probably seem familiar. Even if you haven’t, numbers are fairly straightforward. Python’s core object set includes the usual suspects: integers (numbers without a fractional part), floating-point numbers (roughly, numbers with a decimal point in them), and more exotic numeric types (complex numbers with imaginary parts, fixed-precision decimals, rational fractions with numerator and denominator, and full-featured sets).

Although it offers some fancier options, Python’s basic number types are, well, basic. Numbers in Python support the normal mathematical operations. For instance, the plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**) are used for exponentiation:

>>> 123 + 222                    # Integer addition
345
>>> 1.5 * 4                      # Floating-point multiplication
6.0
>>> 2 ** 100                     # 2 to the power 100
1267650600228229401496703205376

Notice the last result here: Python 3.0’s integer type automatically provides extra precision for large numbers like this when needed (in 2.6, a separate long integer type handles numbers too large for the normal integer type in similar ways). You can, for instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably shouldn’t try to print the result—with more than 300,000 digits, you may be waiting awhile!

>>> len(str(2 ** 1000000))       # How many digits in a really BIG number?
301030
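The session above reflects Python 3.0. Python 3.11 and later (along with security releases of some earlier versions) by default refuse to convert integers with more than 4,300 digits to strings, raising ValueError as a guard against denial-of-service inputs. The sys.set_int_max_str_digits function lifts that limit, with 0 meaning “no limit”; note that the setting applies to the whole interpreter. The helper name digit_count below is hypothetical, and this is only a sketch of one way to keep the example working across versions.

```python
import sys

def digit_count(n):
    """Hypothetical helper: number of decimal digits in a non-negative int n."""
    # Python 3.11+ raises ValueError when converting an int with more than
    # 4300 digits to str; passing 0 disables that limit interpreter-wide.
    # Versions without this function have no limit, so skip the call there.
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)
    return len(str(n))

print(digit_count(2 ** 100))      # 31
print(digit_count(2 ** 20000))    # 6021, well past the default 4300-digit limit
```

Calling digit_count(2 ** 1000000) gives the 301030 shown above, though converting that many digits may take a noticeable moment on older interpreters.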

Once you start experimenting with floating-point numbers, you’re likely to stumble across something that may look a bit odd on first glance:

>>> 3.1415 * 2                   # repr: as code
6.2830000000000004
>>> print(3.1415 * 2)            # str: user-friendly
6.283

The first result isn’t a bug; it’s a display issue. It turns out that there are two ways to print every object: with full precision (as in the first result shown here), and in a user-friendly form (as in the second). Formally, the first form is known as an object’s as-code repr, and the second is its user-friendly str. The difference can matter when we step up to using classes; for now, if something looks odd, try displaying it with a call to the print built-in. (The full-precision output above is from Python 3.0; Python 3.1 and later use a shorter repr for floating-point numbers, so both lines display 6.283 there.)
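The repr and str built-ins produce these two forms directly. Strings show the difference most plainly, because only the as-code form includes quotes:

```python
text = 'spam'

print(repr(text))                 # as-code form, with quotes: 'spam'
print(str(text))                  # user-friendly form, no quotes: spam

# The interactive prompt echoes results using repr; print() uses str.
print([repr(text), str(text)])    # ["'spam'", 'spam']
```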

Besides expressions, there are a handful of useful numeric modules that ship with Python—modules are just packages of additional tools that we import to use:

>>> import math
>>> math.pi
3.1415926535897931
>>> math.sqrt(85)
9.2195444572928871

The math module contains more advanced numeric tools as functions, while the random module performs random number generation and random selections (here, from a Python list, introduced later in this chapter):

>>> import random
>>> random.random()
0.59268735266273953
>>> random.choice([1, 2, 3, 4])
1
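Because random’s results change from run to run, the values shown above will differ on your machine. When a repeatable sequence is needed (for tests, say), seeding the generator is one common approach; in this minimal sketch the seed value 42 is arbitrary.

```python
import random

random.seed(42)                                   # fix the generator's starting state
first = [random.randint(1, 6) for _ in range(5)]  # five simulated die rolls

random.seed(42)                                   # reset to the same state ...
second = [random.randint(1, 6) for _ in range(5)]

print(first == second)                            # ... and get the same rolls: True
print(random.choice(['spam', 'eggs', 'ham']))     # still a pick from the list
```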

Python also includes more exotic numeric objects—such as complex, fixed-precision, and rational numbers, as well as sets and Booleans—and the third-party open source extension domain has even more (e.g., matrixes and vectors). We’ll defer discussion of these types until later in the book.
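For a brief taste before those later chapters, the sketch below uses the standard decimal and fractions modules along with complex, set, and Boolean values; the comments note what Python 3 prints for each line.

```python
from decimal import Decimal
from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                   # False: binary floats are approximate
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True: decimal arithmetic
print(Fraction(1, 3) + Fraction(1, 6))                    # 1/2: exact rational result
print(abs(3 + 4j))                                        # 5.0: magnitude of a complex number
print({1, 2, 3} & {2, 3, 4})                              # {2, 3}: set intersection
print(True + True)                                        # 2: Booleans act like 1 and 0
```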

So far, we’ve been using Python much like a simple calculator; to do better justice to its built-in types, let’s move on to explore strings.

Strings

Strings are used to record textual information as well as arbitrary collections of bytes. They are our first example of what we call a sequence in Python—that is, a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative position. Strictly speaking, strings are sequences of one-character strings; other types of sequences include lists and tuples, covered later.

Sequence Operations

As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string, we can verify its length with the built-in len function and fetch its components with indexing expressions:

>>> S = 'Spam'
>>> len(S)               # Length
4
>>> S[0]                 # The first item in S, indexing by zero-based position
'S'
>>> S[1]                 # The second item from the left
'p'

In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on.

Notice how we assign the string to a variable named S here. We’ll go into detail on how this works later (especially in Chapter 6), but Python variables never need to be declared ahead of time. A variable is created when you assign it a value, may be assigned any type of object, and is replaced with its value when it shows up in an expression. It must also have been previously assigned by the time you use its value. For the purposes of this chapter, it’s enough to know that we need to assign an object to a variable in order to save it for later use.
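These rules can be checked directly: assignment creates a name, the same name can later refer to a different type of object, and using a name that was never assigned raises NameError. The names X and Y here are arbitrary.

```python
X = 'Spam'          # assignment creates the name X
print(X[0])         # S
X = 42              # the same name can later refer to any type of object
print(X + 1)        # 43

unassigned_failed = False
try:
    print(Y)        # Y was never assigned
except NameError as exc:
    unassigned_failed = True
    print('NameError:', exc)
```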

In Python, we can also index backward, from the end—positive indexes count from the left, and negative indexes count back from the right:

>>> S[-1]                # The last item from the end in S
'm'
>>> S[-2]                # The second to last item from the end
'a'

Formally, a negative index is simply added to the string’s size, so the following two operations are equivalent (though the first is easier to code and less easy to get wrong):

>>> S[-1]                # The last item in S
'm'
>>> S[len(S)-1]          # Negative indexing, the hard way
'm'

Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal—anywhere that Python expects a value, we can use a literal, a variable, or any expression. Python’s syntax is completely general this way.
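A small sketch tying these points together: every negative index matches its len-based equivalent, any expression can supply an index, and an offset past the end raises IndexError rather than returning something.

```python
S = 'Spam'

# A negative index i is treated as len(S) + i, so each pair here matches.
for i in range(1, len(S) + 1):
    assert S[-i] == S[len(S) - i]

i = 1
print(S[i], S[len(S) // 2], S[-i])    # any expression can compute an index: p a m

out_of_range = False
try:
    S[len(S)]                         # valid offsets run from 0 to len(S) - 1
except IndexError:
    out_of_range = True
print(out_of_range)                   # True
```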

In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:

>>> S                     # A 4-character string
'Spam'
>>> S[1:3]                # Slice of S from offsets 1 through 2 (not 3)
'pa'

Probably the easiest way to think of slices is that they are a way to extract an entire section of a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 3 − 1) as a new string. The effect is to slice or “parse out” the two characters in the middle.
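Two consequences of the X[I:J] rule are easy to verify: when both bounds fall inside the sequence, the slice holds exactly J − I items, and, unlike indexing, slicing quietly clips bounds that run past the end instead of raising an error.

```python
S = 'Spam'

I, J = 1, 3
print(S[I:J], len(S[I:J]) == J - I)    # pa True

# Out-of-range slice bounds are clipped, not reported as errors.
print(repr(S[2:100]), repr(S[10:20]))  # 'am' ''
```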

In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:

>>> S[1:]                 # Everything past the first (1:len(S))
'pam'
>>> S                     # S itself hasn't changed
'Spam'
>>> S[0:3]                # Everything but the last
'Spa'
>>> S[:3]                 # Same as S[0:3]
'Spa'
>>> S[:-1]                # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:]                  # All of S as a top-level copy (0:len(S))
'Spam'

Note how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you’ll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.
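With a list, the [:] copy matters because lists can change. This sketch shows that changing the copy leaves the original alone, and also what “top-level” means: objects nested inside the list are shared by both copies, not duplicated.

```python
L = [1, 2, 3]
C = L[:]                  # a new list with the same items
C.append(4)               # changing the copy ...
print(L, C)               # ... leaves the original alone: [1, 2, 3] [1, 2, 3, 4]

nested = [[1], [2]]
shallow = nested[:]       # top-level copy: the inner lists are shared
shallow[0].append(99)     # so changing an inner list shows through both
print(nested)             # [[1, 99], [2]]
```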

Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string) and repetition (making a new string by repeating another):

>>> S
'Spam'
>>> S + 'xyz'             # Concatenation
'Spamxyz'
>>> S                     # S is unchanged
'Spam'
>>> S * 8                 # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'

Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism later in the book—in sum, the meaning of an operation depends on the objects being operated on. As you’ll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren’t constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you’ll learn more about it later on our tour.
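A tiny illustration of this polymorphism: the function below (the name double is hypothetical, and functions are covered in Part IV) never mentions a type, so it works for anything that supports +, with + meaning whatever it means for the object passed in.

```python
def double(x):
    # Hypothetical helper: works for any object that supports +;
    # the meaning of + depends on the type of x.
    return x + x

print(double(21))         # 42 (numeric addition)
print(double('Spam'))     # SpamSpam (string concatenation)
print(double([1, 2]))     # [1, 2, 1, 2] (list concatenation)
```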

Immutability

Notice that in the prior examples, we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python—they cannot be changed in-place after they are created. For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go (as you’ll see later), this isn’t as inefficient as it may sound:

>>> S
'Spam'
>>> S[0] = 'z'             # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment

>>> S = 'z' + S[1:]        # But we can run expressions to make new objects
>>> S
'zpam'

Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are not (they can be changed in-place freely). Among other things, immutability can be used to guarantee that an object remains constant throughout your program.
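One way to see what immutability buys you is sketched below. It uses a second name, which previews the shared references covered later in the book. Rebinding a string name builds a new object and leaves the old one alone. An in-place change to a list, by contrast, is visible through every name that refers to that list:

```python
S = 'Spam'
T = S                    # T and S now reference the same string object
S = 'z' + S[1:]          # builds a new string and rebinds S to it
print(S, T)              # zpam Spam: the original string is untouched

L = ['S', 'p', 'a', 'm']
M = L                    # M and L reference the same list object
L[0] = 'z'               # changes that list object in place
print(L, M)              # ['z', 'p', 'a', 'm'] ['z', 'p', 'a', 'm']
```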

Type-Specific Methods

Every string operation we’ve studied so far is really a sequence operation—that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods—functions attached to the object, which are triggered with a call expression.

For example, the string find method is the basic substring search operation (it returns the offset of the passed-in substring, or −1 if it is not present), and the string replace method performs global searches and replacements:

>>> S.find('pa')            # Find the offset of a substring
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ')  # Replace occurrences of a substring with another
'SXYZm'
>>> S
'Spam'

Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results—because strings are immutable, we have to do it this way. String methods are the first line of text-processing tools in Python. Other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:

>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')          # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper()                # Upper- and lowercase conversions
'SPAM'

>>> S.isalpha()              # Content tests: isalpha, isdigit, etc.
True

>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line = line.rstrip()     # Remove whitespace characters on the right side
>>> line
'aaa,bbb,ccccc,dd'
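The methods above are often chained together. The small sketch below uses a comma-separated record of made-up sample data. It strips the trailing newline and splits the line into fields. It then applies a case conversion and a content test to individual fields:

```python
record = 'bob,dev,40\n'                  # made-up sample data
fields = record.rstrip().split(',')      # ['bob', 'dev', '40']
name = fields[0].upper()                 # 'BOB'
age_is_number = fields[2].isdigit()      # True: '40' is all digits
print(name, fields[1], age_is_number)    # BOB dev True
```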

Strings also support an advanced substitution operation known as formatting, available as both an expression (the original) and a string method call (new in 2.6 and 3.0):

>>> '%s, eggs, and %s' % ('spam', 'SPAM!')          # Formatting expression (all)
'spam, eggs, and SPAM!'

>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')    # Formatting method (2.6, 3.0)
'spam, eggs, and SPAM!'
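The sketch below is a rough comparison, not an exhaustive tour of either syntax. Both formatting forms handle numbers as well as strings, and both accept format details such as the number of decimal places:

```python
name, count = 'spam', 3
old = '%s x %d' % (name, count)          # formatting expression
new = '{0} x {1}'.format(name, count)    # formatting method
print(old, '|', new)                     # spam x 3 | spam x 3

pi_old = '%.2f' % 3.14159                # two digits after the decimal point
pi_new = '{0:.2f}'.format(3.14159)
print(pi_old, pi_new)                    # 3.14 3.14
```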

One note here: although sequence operations are generic, methods are not—although some types share some method names, string method operations generally work only on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). Finding the tools you need among all these categories will become more natural as you use Python more, but the next section gives a few tips you can use right now.
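The short sketch below makes the layering concrete. The generic `len` built-in works on a string and a list alike. The type-specific `upper` method exists only on the string, so calling it on a list raises an AttributeError. (`hasattr` is a built-in that reports whether an object has a given attribute.)

```python
text, items = 'spam', [1, 2, 3]

lengths = (len(text), len(items))                         # generic: works on both
has_upper = (hasattr(text, 'upper'), hasattr(items, 'upper'))
print(lengths, has_upper)                                 # (4, 3) (True, False)

print(text.upper())                                       # SPAM
try:
    items.upper()                                         # lists have no upper method
except AttributeError as exc:
    print('AttributeError:', exc)
```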

Getting Help

The methods introduced in the prior section are a representative, but small, sample of what is available for string objects. In general, this book is not exhaustive in its look at object methods. For more details, you can always call the built-in dir function, which returns a list of all the attributes available for a given object. Because methods are function attributes, they will show up in this list. Assuming S is still the string, here are its attributes on Python 3.0 (Python 2.6 varies slightly):

>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier',
'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

You probably won’t care about the names with underscores in this list until later in the book, when we study operator overloading in classes. They represent the implementation of the string object and are available to support customization. In general, leading and trailing double underscores are the naming pattern Python uses for implementation details. The names without underscores in this list are the callable methods on string objects.

The dir function simply gives the methods’ names. To ask what they do, you can pass them to the help function:

>>> help(S.replace)
Help on built-in function replace:

replace(...)
    S.replace (old, new[, count]) -> str

    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.

help is one of a handful of interfaces to a system of code that ships with Python known as PyDoc—a tool for extracting documentation from objects. Later in the book, you’ll see that PyDoc can also render its reports in HTML format.

You can also ask for help on an entire string (e.g., help(S)), but you may get more help than you want to see—i.e., information about every string method. It’s generally better to ask about a specific method.

For more details, you can also consult Python’s standard library reference manual or commercially published reference books, but dir and help are the first line of documentation in Python.
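Because `dir` returns an ordinary list of strings, you can filter it yourself. The sketch below uses a list comprehension, a tool introduced later in this chapter. It keeps only the names that don't start with an underscore, which leaves the callable string methods. The exact list varies between Python versions, so no output is shown:

```python
S = 'spam'
methods = [name for name in dir(S) if not name.startswith('_')]
print(methods)                     # method names only; contents vary by version

# The docstring that help() displays is also available directly
print(S.replace.__doc__)
```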

Other Ways to Code Strings

So far, we’ve looked at the string object’s sequence operations and type-specific methods. Python also provides a variety of ways for us to code strings, which we’ll explore in greater depth later. For instance, special characters can be represented as backslash escape sequences:

>>> S = 'A\nB\tC'            # \n is end-of-line, \t is tab
>>> len(S)                   # Each stands for just one character
5

>>> ord('\n')                # \n is a byte with the binary value 10 in ASCII
10

>>> S = 'A\0B\0C'            # \0, a binary zero byte, does not terminate string
>>> len(S)
5

Python allows strings to be enclosed in single or double quote characters (they mean the same thing). It also allows multiline string literals enclosed in triple quotes (single or double)—when this form is used, all the lines are concatenated together, and end-of-line characters are added where line breaks appear. This is a minor syntactic convenience, but it’s useful for embedding things like HTML and XML code in a Python script:

>>> msg = """ aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc"""
>>> msg
' aaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc'

Python also supports a raw string literal that turns off the backslash escape mechanism (such string literals start with the letter r), as well as Unicode strings for internationalized text. In 3.0, the basic str string type handles Unicode too (which makes sense, given that ASCII text is a simple kind of Unicode), and a bytes type represents raw byte strings. In 2.6, Unicode is a separate type, and str handles both 8-bit strings and binary data. Files are also changed in 3.0 to return and accept str for text and bytes for binary data. We’ll meet all these special string forms in later chapters.
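Here is a quick sketch of the raw-string and Unicode points above. It assumes Python 3.X semantics; 2.6 treats these literals differently. The same characters mean different things with and without the r prefix. Encoding a str that contains a non-ASCII character yields a bytes object that can be longer than the text it encodes:

```python
normal = 'C:\new'              # \n is an escape here: one newline character
raw = r'C:\new'                # raw: the backslash and the n stay separate
print(len(normal), len(raw))   # 5 6

text = 'caf\u00e9'             # a 4-character str ending in e-acute
data = text.encode('utf-8')    # bytes: UTF-8 uses 2 bytes for that last character
print(len(text), len(data))    # 4 5
print(data)                    # b'caf\xc3\xa9'
print(data.decode('utf-8') == text)   # True: decoding restores the str
```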

Pattern Matching

One point worth noting before we move on is that none of the string object’s methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book’s scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:

>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello    Python world')
>>> match.group(1)
'Python '

This example searches for a substring that begins with the word “Hello,” followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word “world.” If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes:

>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
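The re module's splitting and replacement calls follow the same pattern. The small sketch below uses made-up sample strings. `re.split` breaks a string wherever a pattern matches, and `re.sub` replaces every match:

```python
import re

# Split on either a slash or a colon
parts = re.split('[/:]', '/usr/home:lumberjack')
print(parts)                     # ['', 'usr', 'home', 'lumberjack']

# Collapse each run of spaces or tabs into a single space
text = re.sub('[ \t]+', ' ', 'Hello    Python \t world')
print(text)                      # Hello Python world
```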

Pattern matching is a fairly advanced text-processing tool by itself, but there is also support in Python for even more advanced text and language processing, including XML parsing and natural language analysis. I’ve already said enough about strings for this tutorial, though, so let’s move on to the next type.

Lists

The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable—unlike strings, lists can be modified in-place by assignment to offsets as well as a variety of list method calls.

Sequence Operations

Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that the results are usually lists instead of strings. For instance, given a three-item list:

>>> L = [123, 'spam', 1.23]          # A list of three different-type objects
>>> len(L)                           # Number of items in the list
3

we can index, slice, and so on, just as for strings:

>>> L[0]                             # Indexing by position
123

>>> L[:-1]                           # Slicing a list returns a new list
[123, 'spam']

>>> L + [4, 5, 6]                    # Concatenation makes a new list too
[123, 'spam', 1.23, 4, 5, 6]

>>> L                                # We're not changing the original list
[123, 'spam', 1.23]

Type-Specific Operations

Python’s lists are related to arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint—the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:

>>> L.append('NI')                  # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']

>>> L.pop(2)                        # Shrinking: delete an item in the middle
1.23

>>> L                               # "del L[2]" deletes from a list too
[123, 'spam', 'NI']

Here, the list append method expands the list’s size and inserts an item at the end; the pop method (or an equivalent del statement) then removes an item at a given offset, causing the list to shrink. Other list methods insert an item at an arbitrary position (insert), remove a given item by value (remove), and so on. Because lists are mutable, most list methods also change the list object in-place, instead of creating a new one:

>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']

The list sort method here, for example, orders the list in ascending fashion by default, and reverse reverses it—in both cases, the methods modify the list directly.
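The short sketch below covers the other list methods mentioned above. `insert` and `remove` also work in place. In-place methods such as `sort` return None rather than a new list. The `sorted` built-in (met later in this chapter) leaves its input alone and returns a new sorted list:

```python
M = ['bb', 'aa', 'cc']
M.insert(1, 'zz')          # insert at offset 1: ['bb', 'zz', 'aa', 'cc']
M.remove('bb')             # remove by value: ['zz', 'aa', 'cc']
result = M.sort()          # sorts M in place...
print(result, M)           # None ['aa', 'cc', 'zz']: ...and returns None

N = ['bb', 'aa', 'cc']
S = sorted(N)              # sorted() builds a new list instead
print(N, S)                # ['bb', 'aa', 'cc'] ['aa', 'bb', 'cc']
```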

Bounds Checking

Although lists have no fixed size, Python still doesn’t allow us to reference items that are not present. Indexing off the end of a list is always a mistake, but so is assigning off the end:

>>> L
[123, 'spam', 'NI']

>>> L[99]
...error text omitted...
IndexError: list index out of range

>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range

This is intentional, as it’s usually an error to try to assign off the end of a list (and a particularly nasty one in the C language, which doesn’t do as much error checking as Python). Rather than silently growing the list in response, Python reports an error. To grow a list, we call list methods such as append instead.
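The sketch below shows the safe alternatives. It catches the IndexError that an out-of-range assignment raises and grows the list with `append` instead. It also shows one related behavior worth knowing: unlike indexing, slicing quietly clips out-of-range bounds instead of raising an error:

```python
L = [123, 'spam', 'NI']

try:
    L[99] = 1                  # assigning off the end raises IndexError
except IndexError as exc:
    print('IndexError:', exc)

L.append(1)                    # the supported way to grow a list
print(L)                       # [123, 'spam', 'NI', 1]

print(L[1:99])                 # ['spam', 'NI', 1]: slice bounds are clipped
```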

Nesting

One nice feature of Python’s core data types is that they support arbitrary nesting—we can nest them in any combination, and as deeply as we like (for example, we can have a list that contains a dictionary, which contains another list, and so on). One immediate application of this feature is to represent matrixes, or “multidimensional arrays” in Python. A list with nested lists will do the job for basic applications:

>>> M = [[1, 2, 3],               # A 3 × 3 matrix, as nested lists
         [4, 5, 6],               # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Here, we’ve coded a list that contains three other lists. The effect is to represent a 3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:

>>> M[1]                          # Get row 2
[4, 5, 6]

>>> M[1][2]                       # Get row 2, then get item 3 within the row
6

The first operation here fetches the entire second row, and the second grabs the third item within that row. Stringing together index operations takes us deeper and deeper into our nested-object structure.[13]
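Nested structures can also be processed with nested loops. The sketch below previews the for loop covered later in this chapter. It adds up every item in the 3 × 3 matrix by stepping through each row, then through each item in that row:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

total = 0
for row in M:               # each row is itself a list
    for item in row:        # step through the items in that row
        total += item
print(total)                # 45

print(M[2][0], M[-1][-1])   # 7 9: first item of row 3, last item of the last row
```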

Comprehensions

In addition to sequence operations and list methods, Python includes a more advanced operation known as a list comprehension expression, which turns out to be a powerful way to process structures like our matrix. Suppose, for instance, that we need to extract the second column of our sample matrix. It’s easy to grab rows by simple indexing because the matrix is stored by rows, but it’s almost as easy to get a column with a list comprehension:

>>> col2 = [row[1] for row in M]             # Collect the items in column 2
>>> col2
[2, 5, 8]

>>> M                                        # The matrix is unchanged
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

List comprehensions derive from set notation; they are a way to build a new list by running an expression on each item in a sequence, one at a time, from left to right. List comprehensions are coded in square brackets (to tip you off to the fact that they make a list) and are composed of an expression and a looping construct that share a variable name (row, here). The preceding list comprehension means basically what it says: “Give me row[1] for each row in matrix M, in a new list.” The result is a new list containing column 2 of the matrix.

List comprehensions can be more complex in practice:

>>> [row[1] + 1 for row in M]                 # Add 1 to each item in column 2
[3, 6, 9]

>>> [row[1] for row in M if row[1] % 2 == 0]  # Filter out odd items
[2, 8]

The first operation here, for instance, adds 1 to each item as it is collected, and the second uses an if clause to filter odd numbers out of the result using the % modulus expression (remainder of division). List comprehensions make new lists of results, but they can be used to iterate over any iterable object. Here, for instance, we use list comprehensions to step over a hardcoded list of coordinates and a string:

>>> diag = [M[i][i] for i in [0, 1, 2]]      # Collect a diagonal from matrix
>>> diag
[1, 5, 9]

>>> doubles = [c * 2 for c in 'spam']        # Repeat characters in a string
>>> doubles
['ss', 'pp', 'aa', 'mm']

List comprehensions, and relatives like the map and filter built-in functions, are a bit too involved for me to say more about them here. The main point of this brief introduction is to illustrate that Python includes both simple and advanced tools in its arsenal. List comprehensions are an optional feature, but they tend to be handy in practice and often provide a substantial processing speed advantage. They also work on any type that is a sequence in Python, as well as some types that are not. You’ll hear much more about them later in this book.

As a preview, though, you’ll find that in recent Pythons, comprehension syntax in parentheses can also be used to create generators that produce results on demand (the sum built-in, for instance, sums items in a sequence):

>>> G = (sum(row) for row in M)              # Create a generator of row sums
>>> next(G)                                  # iter(G) not required here
6
>>> next(G)                                  # Run the iteration protocol
15
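The sketch below shows how generators differ from lists. Each next call computes one more result, and a call after the last result raises StopIteration. Once a generator is used up it produces nothing more, so a second pass needs a new generator:

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

G = (sum(row) for row in M)        # nothing is computed yet
print(next(G), next(G), next(G))   # 6 15 24: one result per next call
try:
    next(G)                        # a fourth call: no rows left
except StopIteration:
    print('StopIteration')

G2 = (sum(row) for row in M)
print(list(G2))                    # [6, 15, 24]: list runs the whole iteration
print(list(G2))                    # []: a generator is used up after one pass
```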

The map built-in can do similar work, by generating the results of running items through a function. Wrapping it in list forces it to return all its values in Python 3.0:

>>> list(map(sum, M))                        # Map sum over items in M
[6, 15, 24]
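For simple cases, the comprehension, map, and the related filter built-in are largely interchangeable. The sketch below computes the same results each way; `is_even` is a hypothetical helper of our own. In 3.X both map and filter return iterables, so list is used to collect their results:

```python
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

sums_comp = [sum(row) for row in M]          # comprehension
sums_map = list(map(sum, M))                 # map, with list() to collect results
print(sums_comp == sums_map)                 # True

def is_even(n):
    return n % 2 == 0

col2 = [row[1] for row in M]                 # [2, 5, 8]
evens_comp = [x for x in col2 if is_even(x)]
evens_filter = list(filter(is_even, col2))   # keeps items where is_even is true
print(evens_comp, evens_filter)              # [2, 8] [2, 8]
```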

In Python 3.0, comprehension syntax can also be used to create sets and dictionaries:

>>> {sum(row) for row in M}                  # Create a set of row sums
{24, 6, 15}

>>> {i : sum(M[i]) for i in range(3)}        # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}

In fact, lists, sets, and dictionaries can all be built with comprehensions in 3.0:

>>> [ord(x) for x in 'spaam']                # List of character ordinals
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'}                # Sets remove duplicates
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'}             # Dictionary keys are unique
{'a': 97, 'p': 112, 's': 115, 'm': 109}
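The sketch below makes the duplicate handling in the last two results explicit. It uses `enumerate`, a built-in not otherwise covered in this section, which pairs each item with its offset. The set keeps one copy of each character. In the dictionary, a later assignment to the same key overwrites the earlier one:

```python
letters = 'spaam'
unique = {c for c in letters}                        # set: the second 'a' is dropped
positions = {c: i for i, c in enumerate(letters)}    # later 'a' overwrites earlier
print(len(unique))           # 4
print(positions['a'])        # 3: the offset of the second 'a'
```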

To understand objects like generators, sets, and dictionaries, though, we must move ahead.

Dictionaries

Python dictionaries are something completely different (Monty Python reference intended). They are not sequences at all, but are instead known as mappings. Mappings are also collections of other objects, but they store objects by key instead of by relative position. In fact, mappings don’t maintain any reliable left-to-right order; they simply map keys to associated values. Dictionaries, the only mapping type among Python’s core object types, are also mutable: they may be changed in-place and can grow and shrink on demand, like lists.

Mapping Operations

When written as literals, dictionaries are coded in curly braces and consist of a series of “key: value” pairs. Dictionaries are useful anytime we need to associate a set of values with keys—to describe the properties of something, for instance. As an example, consider the following three-item dictionary (with keys “food,” “quantity,” and “color”):

>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}

We can index this dictionary by key to fetch and change the keys’ associated values. The dictionary index operation uses the same syntax as that used for sequences, but the item in the square brackets is a key, not a relative position:

>>> D['food']              # Fetch value of key 'food'
'Spam'

>>> D['quantity'] += 1     # Add 1 to 'quantity' value
>>> D
{'food': 'Spam', 'color': 'pink', 'quantity': 5}

Although the curly-braces literal form does see use, it is perhaps more common to see dictionaries built up in different ways. The following code, for example, starts with an empty dictionary and fills it out one key at a time. Unlike out-of-bounds assignments in lists, which are forbidden, assignments to new dictionary keys create those keys:

>>> D = {}
>>> D['name'] = 'Bob'      # Create keys by assignment
>>> D['job']  = 'dev'
>>> D['age']  = 40

>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}

>>> print(D['name'])
Bob

Here, we’re effectively using dictionary keys as field names in a record that describes someone. In other applications, dictionaries can also be used to replace searching operations—indexing a dictionary by key is often the fastest way to code a search in Python. As we'll learn later, dictionaries may also be made by passing keyword arguments to the type name: dict(name='Bob', job='dev', age=40) makes the same dictionary.
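In the quick sketch below, three construction styles build dictionaries that compare equal. Passing a list of key/value pairs to dict is one more standard form beyond the two shown in the text. Dictionary equality compares keys and values, not the order in which keys were added:

```python
D1 = {}
D1['name'] = 'Bob'
D1['job'] = 'dev'
D1['age'] = 40

D2 = dict(name='Bob', job='dev', age=40)                     # keyword arguments
D3 = dict([('name', 'Bob'), ('job', 'dev'), ('age', 40)])    # key/value pairs
print(D1 == D2 == D3)     # True
```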

Nesting Revisited

In the prior example, we used a dictionary to describe a hypothetical person, with three keys. Suppose, though, that the information is more complex. Perhaps we need to record a first name and a last name, along with multiple job titles. This leads to another application of Python’s object nesting in action. The following dictionary, coded all at once as a literal, captures more structured information:

>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'job':  ['dev', 'mgr'],
           'age':  40.5}

Here, we again have a three-key dictionary at the top (keys “name,” “job,” and “age”), but the values have become more complex: a nested dictionary for the name to support multiple parts, and a nested list for the job to support multiple roles and future expansion. We can access the components of this structure much as we did for our matrix earlier, but this time some of our indexes are dictionary keys, not list offsets:

>>> rec['name']                         # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}

>>> rec['name']['last']                 # Index the nested dictionary
'Smith'

>>> rec['job']                          # 'job' is a nested list
['dev', 'mgr']
>>> rec['job'][-1]                      # Index the nested list
'mgr'

>>> rec['job'].append('janitor')        # Expand Bob's job description in-place
>>> rec
{'age': 40.5, 'job': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith',
'first': 'Bob'}}

Notice how the last operation here expands the nested job list—because the job list is a separate piece of memory from the dictionary that contains it, it can grow and shrink freely (object memory layout will be discussed further later in this book).

The real reason for showing you this example is to demonstrate the flexibility of Python’s core data types. As you can see, nesting allows us to build up complex information structures directly and easily. Building a similar structure in a low-level language like C would be tedious and require much more code: we would have to lay out and declare structures and arrays, fill out values, link everything together, and so on. In Python, this is all automatic—running the expression creates the entire nested object structure for us. In fact, this is one of the main benefits of scripting languages like Python.
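The same indexing chains work on larger structures. The sketch below uses made-up sample records; the second person, Sue Jones, is invented for illustration. It builds a list of nested dictionaries and digs into it. It also shows that a nested list reached through another name is the same object, so in-place changes made through either name are visible through both:

```python
people = [
    {'name': {'first': 'Bob', 'last': 'Smith'}, 'job': ['dev', 'mgr']},
    {'name': {'first': 'Sue', 'last': 'Jones'}, 'job': ['dev']},
]

# Index through the nesting: list offset, then dictionary keys
print(people[1]['name']['last'])      # Jones
print(people[0]['job'][-1])           # mgr

jobs = people[0]['job']               # a second reference to the nested list
jobs.append('janitor')                # changes that list in place
print(people[0]['job'])               # ['dev', 'mgr', 'janitor']
```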

Just as importantly, in a lower-level language we would have to be careful to clean up all of the object’s space when we no longer need it. In Python, when we lose the last reference to the object—by assigning its variable to something else, for example—all of the memory space occupied by that object’s structure is automatically cleaned up for us:

>>> rec = 0                             # Now the object's space is reclaimed

Technically speaking, Python has a feature known as garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code. In the standard CPython implementation, the space is reclaimed immediately, as soon as the last reference to an object is removed. We’ll study how this works later in this book; for now, it’s enough to know that you can use objects freely, without worrying about creating their space or cleaning up as you go.[14]
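If you want to watch reclamation happen, one option is the standard weakref module. The sketch below assumes CPython, where reclamation happens as soon as the last reference goes away. Other implementations may defer it, so the callback could run later there. The sketch uses a trivial class because plain list and dict instances can't be weakly referenced:

```python
import weakref

class Record:
    """A trivial class; its instances support weak references."""
    pass

reclaimed = []
rec = Record()
weakref.finalize(rec, reclaimed.append, 'reclaimed')   # run when the object is freed

print(reclaimed)       # []: the object is still referenced by rec
rec = 0                # drop the last reference to the Record object
print(reclaimed)       # ['reclaimed'] in CPython, right away
```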

Sorting Keys: for Loops

As we’ve already seen, dictionaries are mappings, so they support accessing items only by key. However, they also support type-specific operations with method calls that are useful in a variety of common use cases.

As mentioned earlier, because dictionaries are not sequences, they don’t maintain any dependable left-to-right order. This means that if we make a dictionary and print it back, its keys may come back in a different order than that in which we typed them:

>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}

What do we do, though, if we do need to impose an ordering on a dictionary’s items? One common solution is to grab a list of keys with the dictionary keys method, sort that with the list sort method, and then step through the result with a Python for loop (be sure to press the Enter key twice after coding the for loop below—as explained in Chapter 3, an empty line means “go” at the interactive prompt, and the prompt changes to “...” on some interfaces):

>>> Ks = list(D.keys())                # Unordered keys list
>>> Ks                                 # A list in 2.6, "view" in 3.0: use list()
['a', 'c', 'b']

>>> Ks.sort()                          # Sorted keys list
>>> Ks
['a', 'b', 'c']

>>> for key in Ks:                     # Iterate through sorted keys
        print(key, '=>', D[key])       # <== press Enter twice here

a => 1
b => 2
c => 3

This is a three-step process, although, as we’ll see in later chapters, in recent versions of Python it can be done in one step with the newer sorted built-in function. The sorted call returns the result and sorts a variety of object types, in this case sorting dictionary keys automatically:

>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> for key in sorted(D):
        print(key, '=>', D[key])

a => 1
b => 2
c => 3

Besides showcasing dictionaries, this use case serves to introduce the Python for loop. The for loop is a simple and efficient way to step through all the items in a sequence and run a block of code for each item in turn. A user-defined loop variable (key, here) is used to reference the current item each time through. The net effect in our example is to print the unordered dictionary’s keys and values, in sorted-key order.
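The sketch below shows a few more sorted variations. Sorting the dictionary's items gives key/value pairs in key order, and reverse=True sorts in descending order. A key function, such as the dictionary's own get method, sorts keys by their values; the scores data is made up. Since Python 3.7, dictionaries do preserve insertion order. That is the order in which keys were added, not sorted order, so sorting is still how you get keys in key order:

```python
D = {'a': 1, 'c': 3, 'b': 2}

for key, value in sorted(D.items()):     # (key, value) pairs, ordered by key
    print(key, '=>', value)              # a => 1, then b => 2, then c => 3

print(sorted(D, reverse=True))           # ['c', 'b', 'a']

scores = {'bob': 40, 'sue': 35, 'ann': 50}
print(sorted(scores, key=scores.get))    # ['sue', 'bob', 'ann']: by value
```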

The for loop, and its more general cousin the while loop, are the main ways we code repetitive tasks as statements in our scripts. Really, though, the for loop (like its relative the list comprehension, which we met earlier) is a sequence operation. It works on any object that is a sequence and, like the list comprehension, even on some things that are not. Here, for example, it is stepping across the characters in a string, printing the uppercase version of each as it goes:

>>> for c in 'spam':
        print(c.upper())

S
P
A
M

Python’s while loop is a more general sort of looping tool, not limited to stepping across sequences:

>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1

spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!

We’ll discuss looping statements, syntax, and tools in depth later in the book.

Iteration and Optimization

If the last section’s for loop looks like the list comprehension expression introduced earlier, it should: both are really general iteration tools. In fact, both will work on any object that follows the iteration protocol—a pervasive idea in Python that essentially means a physically stored sequence in memory, or an object that generates one item at a time in the context of an iteration operation. An object falls into the latter category if it responds to the iter built-in with an object that advances in response to next. The generator comprehension expression we saw earlier is such an object.
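To make the protocol concrete, here is a simplified sketch of what a for loop does internally. `manual_for` is a hypothetical name, and real for loops are implemented inside the interpreter, not like this. The loop gets an iterator with iter, calls next until it raises StopIteration, and runs the loop body on each item:

```python
def manual_for(iterable, action):
    """Simplified model of a for loop: iter() once, then next() until done."""
    it = iter(iterable)          # ask the object for an iterator
    while True:
        try:
            item = next(it)      # fetch the next item
        except StopIteration:    # the iterator signals that it is exhausted
            break
        action(item)             # the "loop body"

letters = []
manual_for('spam', letters.append)     # strings are iterable
print(letters)                         # ['s', 'p', 'a', 'm']

keys = []
manual_for({'x': 1}, keys.append)      # iterating a dictionary yields its keys
print(keys)                            # ['x']
```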

I’ll have more to say about the iteration protocol later in this book. For now, keep in mind that every Python tool that scans an object from left to right uses the iteration protocol. This is why the sorted call used in the prior section works on the dictionary directly: we don’t have to call the keys method to get a sequence, because dictionaries are iterable objects whose iterators return successive keys in response to next.

This also means that any list comprehension expression, such as this one, which computes the squares of a list of numbers:

>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]

can always be coded as an equivalent for loop that builds the result list manually by appending as it goes:

>>> squares = []
>>> for x in [1, 2, 3, 4, 5]:          # This is what a list comprehension does
        squares.append(x ** 2)         # Both run the iteration protocol internally

>>> squares
[1, 4, 9, 16, 25]

The list comprehension, though, and related functional programming tools like map and filter, will generally run faster than a for loop today (perhaps even twice as fast)—a property that could matter in your programs for large data sets. Having said that, though, I should point out that performance measures are tricky business in Python because it optimizes so much, and performance can vary from release to release.

A major rule of thumb in Python is to code for simplicity and readability first and worry about performance later, after your program is working, and after you’ve proved that there is a genuine performance concern. More often than not, your code will be quick enough as it is. If you do need to tweak code for performance, though, Python includes tools to help you out, including the time and timeit modules and the profile module. You’ll find more on these later in this book, and in the Python manuals.
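The sketch below uses the timeit module mentioned above to time the two squares-building approaches from this section. It checks that they agree and prints the totals. The actual numbers depend on your machine and Python release, so none are shown:

```python
import timeit

def with_comprehension():
    return [x ** 2 for x in range(1000)]

def with_loop():
    result = []
    for x in range(1000):
        result.append(x ** 2)
    return result

same = with_comprehension() == with_loop()    # both build the same list
print(same)                                   # True

# timeit calls each function `number` times and returns total seconds (a float)
t_comp = timeit.timeit(with_comprehension, number=1000)
t_loop = timeit.timeit(with_loop, number=1000)
print('comprehension: %.4f s, loop: %.4f s' % (t_comp, t_loop))
```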

Missing Keys: if Tests

One other note about dictionaries before we move on. Although we can assign to a new key to expand a dictionary, fetching a nonexistent key is still a mistake:

>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> D['e'] = 99                      # Assigning new keys grows dictionaries
>>> D
{'a': 1, 'c': 3, 'b': 2, 'e': 99}

>>> D['f']                           # Referencing a nonexistent key is an error
...error text omitted...
KeyError: 'f'

This is what we want—it’s usually a programming error to fetch something that isn’t really there. But in some generic programs, we can’t always know what keys will be present when we write our code. How do we handle such cases and avoid errors? One trick is to test ahead of time. The dictionary in membership expression allows us to query the existence of a key and branch on the result with a Python if statement (as with the for, be sure to press Enter twice to run the if interactively here):

>>> 'f' in D
False

>>> if not 'f' in D:
        print('missing')

missing

I’ll have much more to say about the if statement and statement syntax in general later in this book, but the form we’re using here is straightforward: it consists of the word if, followed by an expression that is interpreted as a true or false result, followed by a block of code to run if the test is true. In its full form, the if statement can also have an else clause for a default case, and one or more elif (else if) clauses for other tests. It’s the main selection tool in Python, and it’s the way we code logic in our scripts.

Still, there are other ways to avoid accessing nonexistent keys: the get method (a conditional index with a default); the Python 2.X has_key method (which is no longer available in 3.0); the try statement (a tool we’ll first meet in Chapter 10 that catches and recovers from exceptions altogether); and the if/else expression (essentially, an if statement squeezed onto a single line). Here are a few examples:

>>> value = D.get('x', 0)                      # Index but with a default
>>> value
0
>>> value = D['x'] if 'x' in D else 0          # if/else expression form
>>> value
0
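Two of the alternatives named above aren't shown in the examples, so here is a brief sketch of each. The first is a try statement that catches the KeyError and substitutes a default. The second is a full if statement with elif and else clauses, as described earlier:

```python
D = {'a': 1, 'c': 3, 'b': 2, 'e': 99}

try:
    value = D['x']            # raises KeyError: 'x' is not a key
except KeyError:
    value = 0                 # recover with a default
print(value)                  # 0

if 'x' in D:
    status = 'present'
elif 'e' in D:
    status = 'no x, but e is there'
else:
    status = 'neither'
print(status)                 # no x, but e is there
```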

We’ll save the details on such alternatives until a later chapter. For now, let’s move on to tuples.

Tuples

The tuple object (pronounced “toople” or “tuhple,” depending on who you ask) is roughly like a list that cannot be changed—tuples are sequences, like lists, but they are immutable, like strings. Syntactically, they are coded in parentheses instead of square brackets, and they support arbitrary types, arbitrary nesting, and the usual sequence operations:

>>> T = (1, 2, 3, 4)            # A 4-item tuple
>>> len(T)                      # Length
4

>>> T + (5, 6)                  # Concatenation
(1, 2, 3, 4, 5, 6)

>>> T[0]                        # Indexing, slicing, and more
1

Tuples also have two type-specific callable methods in Python 3.0, but not nearly as many as lists:

>>> T.index(4)                  # Tuple methods: 4 appears at offset 3
3
>>> T.count(4)                  # 4 appears once
1

The primary distinction for tuples is that they cannot be changed once created. That is, they are immutable sequences:

>>> T[0] = 2                    # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment

Like lists and dictionaries, tuples support mixed types and nesting, but they don’t grow and shrink because they are immutable:

>>> T = ('spam', 3.0, [11, 22, 33])
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'

Why Tuples?

So, why have a type that is like a list, but supports fewer operations? Frankly, tuples are not generally used as often as lists in practice, but their immutability is the whole point. If you pass a collection of objects around your program as a list, it can be changed anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity constraint that is convenient in programs larger than those we’ll write here. We’ll talk more about tuples later in the book. For now, though, let’s jump ahead to our last major core type: the file.
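To make the integrity point concrete, here is a small sketch with a hypothetical add_item function that tries to modify whatever collection it receives. A list passed in is changed in place, while a tuple simply cannot be:

```python
def add_item(items):
    # Hypothetical helper that tries to change the collection it is given
    items.append('extra')       # Works for lists; tuples have no append method

L = ['a', 'b']
add_item(L)
print(L)                        # ['a', 'b', 'extra']: the caller's list was changed

T = ('a', 'b')
try:
    add_item(T)
except AttributeError as exc:   # Tuples refuse in-place changes
    print('tuple unchanged:', T, '|', exc)
```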

Files

File objects are Python code’s main interface to external files on your computer. Files are a core type, but they’re something of an oddball—there is no specific literal syntax for creating them. Rather, to create a file object, you call the built-in open function, passing in an external filename and a processing mode as strings. For example, to create a text output file, you would pass in its name and the 'w' processing mode string to write data:

>>> f = open('data.txt', 'w')      # Make a new file in output mode
>>> f.write('Hello\n')             # Write strings of characters to it
6
>>> f.write('world\n')             # Returns number of characters written in Python 3.0
6
>>> f.close()                      # Close to flush output buffers to disk

This creates a file in the current directory and writes text to it (the filename can be a full directory path if you need to access a file elsewhere on your computer). To read back what you just wrote, reopen the file in 'r' processing mode, for reading text input—this is the default if you omit the mode in the call. Then read the file’s content into a string, and display it. A file’s contents are always a string in your script, regardless of the type of data the file contains:

>>> f = open('data.txt')           # 'r' is the default processing mode
>>> text = f.read()                # Read entire file into a string
>>> text
'Hello\nworld\n'

>>> print(text)                    # print interprets control characters
Hello
world

>>> text.split()                   # File content is always a string
['Hello', 'world']

Other file object methods support additional features we don’t have time to cover here. For instance, file objects provide more ways of reading and writing (read accepts an optional byte size, readline reads one line at a time, and so on), as well as other tools (seek moves to a new file position). As we’ll see later, though, the best way to read a file today is to not read it at all—files provide an iterator that automatically reads line by line in for loops and other contexts.
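As a rough sketch of those extra tools, the following script writes a two-line file, reads one line with readline, rewinds with seek, and then lets the file iterator step through the lines in a for loop. The temporary directory is an assumption made so the example doesn’t overwrite anything in your current directory:

```python
import os
import tempfile

# Build a throwaway path (the temp directory is just for this demo)
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

f = open(path, 'w')
f.write('Hello\nworld\n')
f.close()

f = open(path)                  # 'r' (text input) is the default mode
first = f.readline()            # 'Hello\n': one line at a time
f.seek(0)                       # Move back to the start of the file
lines = []
for line in f:                  # File iterator: reads line by line, no read call
    lines.append(line.rstrip())
f.close()

print(repr(first), lines)       # 'Hello\n' ['Hello', 'world']
```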

We’ll meet the full set of file methods later in this book, but if you want a quick preview now, run a dir call on any open file and a help on any of the method names that come back:

>>> dir(f)
[ ...many names omitted...
'buffer', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty',
'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline',
'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write',
'writelines']

>>> help(f.seek)
...try it and see...

Later in the book, we’ll also see that files in Python 3.0 draw a sharp distinction between text and binary data. Text files represent content as strings and perform Unicode encoding and decoding automatically, while binary files represent content as a special bytes string type and allow you to access file content unaltered (the following partial example assumes there is already a binary file in your current directory):

>>> data = open('data.bin', 'rb').read()       # Open binary file
>>> data                                       # bytes string holds binary data
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8]
b'spam'

Although you won’t generally need to care about this distinction if you deal only with ASCII text, Python 3.0’s strings and files are an asset if you deal with internationalized applications or byte-oriented data.
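The binary example assumes data.bin already exists. One way to create a file with exactly those bytes is the standard library’s struct module, which isn’t covered at this point in the book; the format string '>i4sH' used below (a big-endian 4-byte integer, 4 raw bytes, and a 2-byte unsigned short) is our own assumption about how such a file might be laid out:

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')   # Throwaway location

# Pack a 4-byte big-endian int, 4 raw bytes, and a 2-byte unsigned short
packed = struct.pack('>i4sH', 7, b'spam', 8)

out = open(path, 'wb')          # 'b' means binary: no Unicode encoding step
out.write(packed)
out.close()

data = open(path, 'rb').read()  # Read back unaltered as a bytes object
print(data)                     # b'\x00\x00\x00\x07spam\x00\x08'
print(data[4:8])                # b'spam'
print(struct.unpack('>i4sH', data))   # (7, b'spam', 8)
```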

Other File-Like Tools

The open function is the workhorse for most file processing you will do in Python. For more advanced tasks, though, Python comes with additional file-like tools: pipes, FIFOs, sockets, keyed-access files, persistent object shelves, descriptor-based files, relational and object-oriented database interfaces, and more. Descriptor files, for instance, support file locking and other low-level tools, and sockets provide an interface for networking and interprocess communication. We won’t cover many of these topics in this book, but you’ll find them useful once you start programming Python in earnest.

Other Core Types

Beyond the core types we’ve seen so far, there are others that may or may not qualify for membership in the set, depending on how broadly it is defined. Sets, for example, are a recent addition to the language that are neither mappings nor sequences; rather, they are unordered collections of unique and immutable objects. Sets are created by calling the built-in set function or using new set literals and expressions in 3.0, and they support the usual mathematical set operations (the choice of new {...} syntax for set literals in 3.0 makes sense, since sets are much like the keys of a valueless dictionary):

>>> X = set('spam')                 # Make a set out of a sequence in 2.6 and 3.0
>>> Y = {'h', 'a', 'm'}             # Make a set with new 3.0 set literals
>>> X, Y
({'a', 'p', 's', 'm'}, {'a', 'h', 'm'})

>>> X & Y                           # Intersection
{'a', 'm'}

>>> X | Y                           # Union
{'a', 'p', 's', 'h', 'm'}

>>> X - Y                           # Difference
{'p', 's'}

>>> {x ** 2 for x in [1, 2, 3, 4]}  # Set comprehensions in 3.0
{16, 1, 4, 9}
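Because sets keep only one copy of each item, they are handy for stripping duplicates and for membership tests. A short sketch with made-up data follows; the results are passed through sorted only to get a predictable display order, since sets themselves are unordered:

```python
names = ['bob', 'sue', 'bob', 'ann', 'sue']   # Made-up data with duplicates

unique = set(names)             # Duplicates collapse: each item appears once
print(sorted(unique))           # ['ann', 'bob', 'sue']
print('ann' in unique)          # True: membership test

X = set('spam')
Y = {'h', 'a', 'm'}
print(sorted(X - Y), sorted(Y - X))   # ['p', 's'] ['h']: difference is one-sided
```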

In addition, Python recently grew a few new numeric types: decimal numbers (fixed-precision floating-point numbers) and fraction numbers (rational numbers with both a numerator and a denominator). Both can be used to work around the limitations and inherent inaccuracies of floating-point math:

>>> 1 / 3                           # Floating-point (use .0 in Python 2.6)
0.33333333333333331
>>> (2/3) + (1/2)
1.1666666666666665

>>> import decimal                  # Decimals: fixed precision
>>> d = decimal.Decimal('3.141')
>>> d + 1
Decimal('4.141')

>>> decimal.getcontext().prec = 2
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.33')

>>> from fractions import Fraction  # Fractions: numerator+denominator
>>> f = Fraction(2, 3)
>>> f + 1
Fraction(5, 3)
>>> f + Fraction(1, 2)
Fraction(7, 6)
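A classic way to see the floating-point limitation is to add 0.1 three times and subtract 0.3. The sketch below compares the three numeric types on that calculation; note that the Decimal values are built from strings, since Decimal(0.1) would inherit the float’s binary rounding error:

```python
from decimal import Decimal
from fractions import Fraction

f_sum = 0.1 + 0.1 + 0.1 - 0.3                  # Floating point: tiny residue
d_sum = Decimal('0.1') * 3 - Decimal('0.3')    # Decimal: exact for these inputs
q_sum = Fraction(1, 10) * 3 - Fraction(3, 10)  # Fraction: exact rational math

print(f_sum)    # A tiny nonzero value, not 0.0
print(d_sum)    # 0.0
print(q_sum)    # 0
```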

Python also comes with Booleans (with predefined True and False objects that are essentially just the integers 1 and 0 with custom display logic), and it has long supported a special placeholder object called None commonly used to initialize names and objects:

>>> 1 > 2, 1 < 2                    # Booleans
(False, True)
>>> bool('spam')
True

>>> X = None                        # None placeholder
>>> print(X)
None
>>> L = [None] * 100                # Initialize a list of 100 Nones
>>> L
[None, None, None, None, None, None, None, None, None, None, None, None,
None, None, None, None, None, None, None, None, ...a list of 100 Nones...]
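Since True and False are really the integers 1 and 0, they work in arithmetic, and the usual way to test for the None placeholder is with the is operator. A brief sketch (the slots name is just for illustration):

```python
print(True + 1)                 # 2: True behaves like the integer 1
print(isinstance(True, int))    # True: bool is a subclass of int
print(True == 1, False == 0)    # True True

X = None
print(X is None)                # True: test for the placeholder with "is"

slots = [None] * 3              # Preallocate a small list of Nones
slots[1] = 'spam'               # Fill in one slot by offset
print(slots)                    # [None, 'spam', None]
```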

How to Break Your Code’s Flexibility

I’ll have more to say about all of Python’s object types later, but one merits special treatment here. The type object, returned by the type built-in function, is an object that gives the type of another object; its result differs slightly in 3.0, because types have merged with classes completely (something we’ll explore in the context of “new-style” classes in Part VI). Assuming L is still the list of the prior section:

# In Python 2.6:

>>> type(L)                         # Types: type of L is list type object
<type 'list'>
>>> type(type(L))                   # Even types are objects
<type 'type'>

# In Python 3.0:

>>> type(L)                         # 3.0: types are classes, and vice versa
<class 'list'>
>>> type(type(L))                   # See Chapter 31 for more on class types
<class 'type'>

Besides allowing you to explore your objects interactively, the practical application of this is that it allows code to check the types of the objects it processes. In fact, there are at least three ways to do so in a Python script:

>>> if type(L) == type([]):         # Type testing, if you must...
        print('yes')

yes
>>> if type(L) == list:             # Using the type name
        print('yes')

yes
>>> if isinstance(L, list):         # Object-oriented tests
        print('yes')

yes

Now that I’ve shown you all these ways to do type testing, however, I am required by law to tell you that doing so is almost always the wrong thing to do in a Python program (and often a sign of an ex-C programmer first starting to use Python!). The reason why won’t become completely clear until later in the book, when we start writing larger code units such as functions, but it’s a (perhaps the) core Python concept. By checking for specific types in your code, you effectively break its flexibility—you limit it to working on just one type. Without such tests, your code may be able to work on a whole range of types.

This is related to the idea of polymorphism mentioned earlier, and it stems from Python’s lack of type declarations. As you’ll learn, in Python, we code to object interfaces (operations supported), not to types. Not caring about specific types means that code is automatically applicable to many of them—any object with a compatible interface will work, regardless of its specific type. Although type checking is supported—and even required, in some rare cases—you’ll see that it’s not usually the “Pythonic” way of thinking. In fact, you’ll find that polymorphism is probably the key idea behind using Python well.
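Here is a small sketch of the difference, using two hypothetical functions. total relies only on an interface (indexing, slicing, and +), so it works on lists, tuples, and strings alike; total_checked adds a type test and thereby rejects everything but lists:

```python
def total(items):
    # Polymorphic: works for any sequence whose items support +
    result = items[0]
    for item in items[1:]:
        result = result + item
    return result

def total_checked(items):
    # Type test: breaks flexibility by rejecting tuples, strings, and so on
    if not isinstance(items, list):
        raise TypeError('list required')
    return total(items)

print(total([1, 2, 3]))         # 6
print(total(('sp', 'am')))      # 'spam': a tuple of strings works too
print(total('abc'))             # 'abc': even a string is fine
print(total_checked([1, 2]))    # 3
```

Calling total_checked((1, 2)) raises TypeError even though total would handle that tuple perfectly well.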

User-Defined Classes

We’ll study object-oriented programming in Python—an optional but powerful feature of the language that cuts development time by supporting programming by customization—in depth later in this book. In abstract terms, though, classes define new types of objects that extend the core set, so they merit a passing glance here. Say, for example, that you wish to have a type of object that models employees. Although there is no such specific core type in Python, the following user-defined class might fit the bill:

>>> class Worker:
         def __init__(self, name, pay):          # Initialize when created
             self.name = name                    # self is the new object
             self.pay  = pay
         def lastName(self):
             return self.name.split()[-1]        # Split string on blanks
         def giveRaise(self, percent):
             self.pay *= (1.0 + percent)         # Update pay in-place

This class defines a new kind of object that will have name and pay attributes (sometimes called state information), as well as two bits of behavior coded as functions (normally called methods). Calling the class like a function generates instances of our new type, and the class’s methods automatically receive the instance being processed by a given method call (in the self argument):

>>> bob = Worker('Bob Smith', 50000)             # Make two instances
>>> sue = Worker('Sue Jones', 60000)             # Each has name and pay attrs
>>> bob.lastName()                               # Call method: bob is self
'Smith'
>>> sue.lastName()                               # sue is the self subject
'Jones'
>>> sue.giveRaise(.10)                           # Updates sue's pay
>>> sue.pay
66000.0

The implied “self” object is why we call this an object-oriented model: there is always an implied subject in functions within a class. In a sense, though, the class-based type simply builds on and uses core types—a user-defined Worker object here, for example, is just a collection of a string and a number (name and pay, respectively), plus functions for processing those two built-in objects.
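One way to see that a Worker instance is built from core objects is to inspect its attributes with the built-in vars function, which returns the instance’s attribute dictionary. The sketch repeats the Worker class so it runs on its own:

```python
class Worker:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

bob = Worker('Bob Smith', 50000)
print(vars(bob))                      # Attribute dict holding a str and an int
print(type(bob.name), type(bob.pay))  # <class 'str'> <class 'int'>
```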

The larger story of classes is that their inheritance mechanism supports software hierarchies that lend themselves to customization by extension. We extend software by writing new classes, not by changing what already works. You should also know that classes are an optional feature of Python, and simpler built-in types such as lists and dictionaries are often better tools than user-coded classes. This is all well beyond the bounds of our introductory object-type tutorial, though, so consider this just a preview; for full disclosure on user-defined types coded with classes, you’ll have to read on to Part VI.
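As a preview of customization by extension, here is a sketch of a hypothetical Manager subclass that adds an extra bonus on each raise. Worker itself is left untouched; Manager inherits lastName and reuses Worker.giveRaise rather than copying its code (the 10% default bonus is an arbitrary choice for illustration):

```python
class Worker:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Manager(Worker):
    # Hypothetical subclass: customizes giveRaise without editing Worker
    def giveRaise(self, percent, bonus=0.10):
        Worker.giveRaise(self, percent + bonus)   # Extend, don't replace

tom = Manager('Tom Jones', 50000)
tom.giveRaise(.10)              # 10% raise plus 10% bonus
print(tom.lastName())           # 'Jones': inherited from Worker
print(round(tom.pay, 2))        # 60000.0
```

Calling the superclass method explicitly, as in Worker.giveRaise(self, ...), is one common way to extend inherited behavior instead of rewriting it.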

And Everything Else

As mentioned earlier, everything you can process in a Python script is a type of object, so our object type tour is necessarily incomplete. However, even though everything in Python is an “object,” only those types of objects we’ve met so far are considered part of Python’s core type set. Other types in Python either are objects related to program execution (like functions, modules, classes, and compiled code), which we will study later, or are implemented by imported module functions, not language syntax. The latter of these also tend to have application-specific roles—text patterns, database interfaces, network connections, and so on.

Moreover, keep in mind that the objects we’ve met here are objects, but not necessarily object-oriented—a concept that usually requires inheritance and the Python class statement, which we’ll meet again later in this book. Still, Python’s core objects are the workhorses of almost every Python script you’re likely to meet, and they usually are the basis of larger noncore types.

Chapter Summary

And that’s a wrap for our concise data type tour. This chapter has offered a brief introduction to Python’s core object types and the sorts of operations we can apply to them. We’ve studied generic operations that work on many object types (sequence operations such as indexing and slicing, for example), as well as type-specific operations available as method calls (for instance, string splits and list appends). We’ve also defined some key terms, such as immutability, sequences, and polymorphism.

Along the way, we’ve seen that Python’s core object types are more flexible and powerful than what is available in lower-level languages such as C. For instance, Python’s lists and dictionaries obviate most of the work you do to support collections and searching in lower-level languages. Lists are ordered collections of other objects, and dictionaries are collections of other objects that are indexed by key instead of by position. Both dictionaries and lists may be nested, can grow and shrink on demand, and may contain objects of any type. Moreover, their space is automatically cleaned up as you go.

I’ve skipped most of the details here in order to provide a quick tour, so you shouldn’t expect all of this chapter to have made sense yet. In the next few chapters, we’ll start to dig deeper, filling in details of Python’s core object types that were omitted here so you can gain a more complete understanding. We’ll start off in the next chapter with an in-depth look at Python numbers. First, though, another quiz to review.

Test Your Knowledge: Quiz

We’ll explore the concepts introduced in this chapter in more detail in upcoming chapters, so we’ll just cover the big ideas here:

  1. Name four of Python’s core data types.

  2. Why are they called “core” data types?

  3. What does “immutable” mean, and which three of Python’s core types are considered immutable?

  4. What does “sequence” mean, and which three types fall into that category?

  5. What does “mapping” mean, and which core type is a mapping?

  6. What is “polymorphism,” and why should you care?

Test Your Knowledge: Answers

  1. Numbers, strings, lists, dictionaries, tuples, files, and sets are generally considered to be the core object (data) types. Types, None, and Booleans are sometimes classified this way as well. There are multiple number types (integer, floating point, complex, fraction, and decimal) and multiple string types (simple strings and Unicode strings in Python 2.X, and text strings and byte strings in Python 3.X).

  2. They are known as “core” types because they are part of the Python language itself and are always available; to create other objects, you generally must call functions in imported modules. Most of the core types have specific syntax for generating the objects: 'spam', for example, is an expression that makes a string and determines the set of operations that can be applied to it. Because of this, core types are hardwired into Python’s syntax. In contrast, you must call the built-in open function to create a file object.

  3. An “immutable” object is an object that cannot be changed after it is created. Numbers, strings, and tuples in Python fall into this category. While you cannot change an immutable object in-place, you can always make a new one by running an expression.

  4. A “sequence” is a positionally ordered collection of objects. Strings, lists, and tuples are all sequences in Python. They share common sequence operations, such as indexing, concatenation, and slicing, but also have type-specific method calls.

  5. The term “mapping” denotes an object that maps keys to associated values. Python’s dictionary is the only mapping type in the core type set. Mappings do not maintain any left-to-right positional ordering; they support access to data stored by key, plus type-specific method calls.

  6. “Polymorphism” means that the meaning of an operation (like a +) depends on the objects being operated on. This turns out to be a key idea (perhaps the key idea) behind using Python well—not constraining code to specific types makes that code automatically applicable to many types.



[12] In this book, the term literal simply means an expression whose syntax generates an object—sometimes also called a constant. Note that the term “constant” does not imply objects or variables that can never be changed (i.e., this term is unrelated to C++’s const or Python’s “immutable”—a topic explored in the section Immutability).

[13] This matrix structure works for small-scale tasks, but for more serious number crunching you will probably want to use one of the numeric extensions to Python, such as the open source NumPy system. Such tools can store and process large matrixes much more efficiently than our nested list structure. NumPy has been said to turn Python into the equivalent of a free and more powerful version of the Matlab system, and organizations such as NASA, Los Alamos, and JPMorgan Chase use this tool for scientific and financial tasks. Search the Web for more details.

[14] Keep in mind that the rec record we just created really could be a database record, when we employ Python’s object persistence system—an easy way to store native Python objects in files or access-by-key databases. We won’t go into details here, but watch for discussion of Python’s pickle and shelve modules later in this book.
