Data Science Essentials in Python

Book Description

Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.

Table of Contents

  1. Acknowledgments
  2. Preface
    1. About This Book
    2. About the Audience
    3. About the Software
    4. Notes on Quotes
    5. The Book Forum
    6. Your Turn
  3. 1. What Is Data Science?
    1. Unit 1. Data Analysis Sequence
    2. Unit 2. Data Acquisition Pipeline
    3. Unit 3. Report Structure
    4. Your Turn
  4. 2. Core Python for Data Science
    1. Unit 4. Understanding Basic String Functions
    2. Unit 5. Choosing the Right Data Structure
    3. Unit 6. Comprehending Lists Through List Comprehension
    4. Unit 7. Counting with Counters
    5. Unit 8. Working with Files
    6. Unit 9. Reaching the Web
    7. Unit 10. Pattern Matching with Regular Expressions
    8. Unit 11. Globbing File Names and Other Strings
    9. Unit 12. Pickling and Unpickling Data
    10. Your Turn
  5. 3. Working with Text Data
    1. Unit 13. Processing HTML Files
    2. Unit 14. Handling CSV Files
    3. Unit 15. Reading JSON Files
    4. Unit 16. Processing Texts in Natural Languages
    5. Your Turn
  6. 4. Working with Databases
    1. Unit 17. Setting Up a MySQL Database
    2. Unit 18. Using a MySQL Database: Command Line
    3. Unit 19. Using a MySQL Database: pymysql
    4. Unit 20. Taming Document Stores: MongoDB
    5. Your Turn
  7. 5. Working with Tabular Numeric Data
    1. Unit 21. Creating Arrays
    2. Unit 22. Transposing and Reshaping
    3. Unit 23. Indexing and Slicing
    4. Unit 24. Broadcasting
    5. Unit 25. Demystifying Universal Functions
    6. Unit 26. Understanding Conditional Functions
    7. Unit 27. Aggregating and Ordering Arrays
    8. Unit 28. Treating Arrays as Sets
    9. Unit 29. Saving and Reading Arrays
    10. Unit 30. Generating a Synthetic Sine Wave
    11. Your Turn
  8. 6. Working with Data Series and Frames
    1. Unit 31. Getting Used to Pandas Data Structures
    2. Unit 32. Reshaping Data
    3. Unit 33. Handling Missing Data
    4. Unit 34. Combining Data
    5. Unit 35. Ordering and Describing Data
    6. Unit 36. Transforming Data
    7. Unit 37. Taming Pandas File I/O
    8. Your Turn
  9. 7. Working with Network Data
    1. Unit 38. Dissecting Graphs
    2. Unit 39. Network Analysis Sequence
    3. Unit 40. Harnessing Networkx
    4. Your Turn
  10. 8. Plotting
    1. Unit 41. Basic Plotting with PyPlot
    2. Unit 42. Getting to Know Other Plot Types
    3. Unit 43. Mastering Embellishments
    4. Unit 44. Plotting with Pandas
    5. Your Turn
  11. 9. Probability and Statistics
    1. Unit 45. Reviewing Probability Distributions
    2. Unit 46. Recollecting Statistical Measures
    3. Unit 47. Doing Stats the Python Way
    4. Your Turn
  12. 10. Machine Learning
    1. Unit 48. Designing a Predictive Experiment
    2. Unit 49. Fitting a Linear Regression
    3. Unit 50. Grouping Data with K-Means Clustering
    4. Unit 51. Surviving in Random Decision Forests
    5. Your Turn
  13. A1. Further Reading
  14. A2. Solutions to Single-Star Projects
  15. Bibliography