Python Data Handling - A Deeper Dive
Manipulating data is a core part of writing almost any Python program. To represent data, Python provides a small collection of built-in types such as lists, sets, dictionaries, and classes. The standard library's collections module adds further objects that are commonly used to solve a variety of data-related problems. Finally, third-party libraries such as NumPy and pandas provide additional data handling resources.
In this live training, we’re going to take a deeper look at data representation in Python. Topics will include performance tradeoffs, common programming idioms, and details about Python’s underlying object model.
What you'll learn and how you can apply it
- Learn how and when to use different built-in types, depending on the problem being addressed.
- Gain a much deeper awareness of how different types are implemented and their associated costs.
- Write much more efficient and elegant code for manipulating data.
This training course is for you because...
- You want to improve the way you write Python data handling scripts.
- You’re a data scientist and you want to expand your Python knowledge beyond standard tools such as NumPy and pandas.
- You’ve written programs for handling data, but have run into various performance-related problems.
Prerequisites
- This course assumes a prior introduction to Python programming. Attendees should know the basics of editing, running, and debugging simple programs.
- Some prior exposure to NumPy or pandas is useful but not required.
Materials and downloads needed in advance of class:
- Python 3.6, NumPy, and pandas are recommended.
- Installing the “Anaconda Python” distribution for Python 3.6 will satisfy all requirements.
The Python Programming Language (video)
The timeframes are only estimates and may vary according to how the class is progressing.
Segment 1 Data Structure Shootout (30 min)
- Instructor will describe different techniques for representing records.
- Participants will run an experiment to find the most efficient data representation for a large CSV file.
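For a flavor of the kind of comparison Segment 1 involves, the sketch below builds the same record four ways and reports the shallow size of each. It is an illustration only and is not taken from the course materials; the `Stock` and `SlotStock` names, the field names, and the sample values are all hypothetical.

```python
import sys
from collections import namedtuple

# Hypothetical record: one row of a CSV file (name, shares, price).
row = ("AA", 100, 32.2)

# 1. Plain tuple
as_tuple = row

# 2. Dictionary keyed by field name
as_dict = {"name": row[0], "shares": row[1], "price": row[2]}

# 3. Named tuple (tuple storage with attribute access)
Stock = namedtuple("Stock", ["name", "shares", "price"])
as_namedtuple = Stock(*row)

# 4. Class with __slots__ (no per-instance __dict__)
class SlotStock:
    __slots__ = ("name", "shares", "price")

    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

as_slots = SlotStock(*row)

def shallow_sizes():
    """Return the shallow size in bytes of each representation.

    sys.getsizeof counts only the container itself, not the field
    values it refers to, and the numbers vary by Python version
    and platform.
    """
    candidates = {
        "tuple": as_tuple,
        "dict": as_dict,
        "namedtuple": as_namedtuple,
        "slots class": as_slots,
    }
    return {label: sys.getsizeof(obj) for label, obj in candidates.items()}

for label, size in shallow_sizes().items():
    print(label, size)
```

Multiplied across every row of a large CSV file, small per-record differences like these can add up to a substantial difference in total memory use.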
Segment 2 The collections module (25 min)
- Instructor will describe a few common data handling problems and show solutions using the collections module.
- Participants will use collections to answer a few questions about the data from Segment 1.
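As a rough illustration of the problems the collections module is suited to, the sketch below tabulates and groups a handful of records with `Counter` and `defaultdict`. The rows are made-up sample data, not the course dataset.

```python
from collections import Counter, defaultdict

# Hypothetical rows of (name, shares, price).
rows = [
    ("AA", 100, 32.2),
    ("IBM", 50, 91.1),
    ("AA", 50, 30.0),
    ("MSFT", 200, 51.23),
    ("IBM", 100, 70.44),
]

# Tabulate: total shares held per name.
shares_by_name = Counter()
for name, shares, price in rows:
    shares_by_name[name] += shares

# Group: collect all rows that share the same name.
rows_by_name = defaultdict(list)
for row in rows:
    rows_by_name[row[0]].append(row)

print(shares_by_name.most_common(1))   # largest holding
print(len(rows_by_name["IBM"]))        # number of IBM rows
```

Both objects remove the "check whether the key exists first" step that the same code would need with a plain dictionary.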
Segment 3 Python Object Model (30 min)
- Instructor will describe how Python handles objects, along with some implementation details of Python containers.
- Participants will apply this knowledge to handle the data read in Segment 1 more efficiently.
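Two aspects of the object model that bear on data handling are that names and containers hold references rather than copies, and that equal strings can be made to share a single object. The sketch below illustrates both using only the standard library; it is not taken from the course materials, and the date string is a made-up sample value.

```python
import sys

# Assignment binds a second name to the same object; it does not copy.
a = [1, 2, 3]
b = a            # same list object as a
c = list(a)      # new list object (shallow copy)
b.append(4)

print(a is b, a)   # a sees the change made through b
print(c is a, c)   # c is unaffected

# Strings built at runtime (for example, fields parsed from a file)
# are usually separate objects even when their values are equal.
s1 = "".join(["2017", "-01-01"])
s2 = "".join(["2017", "-01-01"])

# sys.intern returns one shared object per distinct string value,
# which can reduce memory when a column repeats the same values.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(s1 == s2, i1 is i2)
```

Whether sharing objects this way pays off depends on how often values repeat in the data, so it is worth measuring rather than assuming.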
Break (15 min)
Segment 4 Thinking in Functions (25 min)
- Instructor will describe common programming idioms associated with a functional programming style, including list comprehensions, map, and reduce.
- Participants will try some simple data handling experiments to reinforce concepts.
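The idioms named in Segment 4 can be sketched as follows: the same per-item calculation written as a list comprehension and with `map`, followed by a `reduce` that folds the results into one value. The `shares` and `prices` values are hypothetical sample data.

```python
from functools import reduce

# Hypothetical sample data.
shares = [100, 50, 200]
prices = [32.2, 91.1, 51.23]

# List comprehension: cost of each holding.
costs_comprehension = [s * p for s, p in zip(shares, prices)]

# map: the same calculation, applying a function across both sequences.
costs_map = list(map(lambda s, p: s * p, shares, prices))

# reduce: fold the per-item costs into a single total.
total_cost = reduce(lambda acc, cost: acc + cost, costs_comprehension, 0.0)

print(costs_comprehension == costs_map)
print(total_cost)
```

For a plain sum, the built-in `sum` is the more idiomatic choice; `reduce` is shown here only because it generalizes to other combining functions.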
Segment 5 Thinking in Columns (30 min)
- Instructor will describe an alternative view of data based on arrays and columns, introducing NumPy arrays, pandas DataFrames, and common array-oriented programming idioms.
- Participants will rework some earlier examples using arrays and column-oriented thinking.
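The row-to-column shift in thinking can be sketched with the standard library alone. The example below deliberately uses plain lists rather than NumPy or pandas so that it runs without extra installs; those libraries apply the same idea with compact, typed arrays and vectorized operations. The rows are made-up sample data.

```python
# Hypothetical rows of (name, shares, price).
rows = [
    ("AA", 100, 32.2),
    ("IBM", 50, 91.1),
    ("MSFT", 200, 51.23),
]

# Pivot: one list per field instead of one tuple per record.
names, shares, prices = (list(column) for column in zip(*rows))
columns = {"name": names, "shares": shares, "price": prices}

# Whole-column operations replace per-record loops.
total_shares = sum(columns["shares"])
costs = [s * p for s, p in zip(columns["shares"], columns["price"])]

# A boolean mask selects entries from another column.
mask = [s >= 100 for s in columns["shares"]]
big_names = [n for n, keep in zip(columns["name"], mask) if keep]

print(total_shares)
print(big_names)
```

Storing each field contiguously is what lets array libraries keep a column in a single typed buffer and operate on it without a Python-level loop.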
Segment 6 Thinking in Streams (25 min)
- Instructor will introduce stream processing as a useful idiom for solving a variety of data handling problems. Topics will include Python iteration, generator functions, and generator expressions.
- Participants will reformulate earlier examples to utilize a stream-processing approach.
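A stream-processing pipeline of the kind Segment 6 describes can be sketched as a chain of generators in which each stage handles one record at a time. The function names `read_rows` and `convert` and the in-memory CSV text are hypothetical stand-ins for a real file.

```python
import csv
import io

def read_rows(lines):
    """Generator function: yield one parsed CSV row at a time."""
    for row in csv.reader(lines):
        yield row

def convert(rows):
    """Generator function: convert string fields to typed values."""
    for name, shares, price in rows:
        yield name, int(shares), float(price)

# Hypothetical stand-in for an open file; any iterable of lines works.
data = io.StringIO("AA,100,32.2\nIBM,50,91.1\nMSFT,200,51.23\n")

# Nothing is read until the final stage starts consuming.
records = convert(read_rows(data))

# Generator expression: keep only the large holdings.
large = (record for record in records if record[1] >= 100)

# The reduction at the end pulls records through the whole pipeline.
total_shares = sum(shares for _, shares, _ in large)
print(total_shares)
```

Because only one record is in flight at a time, memory use stays roughly constant regardless of file size, at the cost of the stream being consumable only once.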