Haskell Data Analysis Cookbook

Book Description

Explore intuitive data analysis techniques and powerful machine learning methods using over 130 practical recipes

In Detail

This book takes you through all the steps involved in data analysis. It pairs Haskell with data modeling through carefully chosen examples that feature some of the most popular machine learning techniques.

You will begin by learning how to obtain and clean data from various sources. You will then learn how to use data structures such as trees and graphs. The core of data analysis lies in the topics covering statistical techniques, parallelism, concurrency, and machine learning algorithms, along with various examples of visualizing and exporting results. By the end of the book, you will have the techniques you need to use Haskell effectively for data analysis.

What You Will Learn

  • Obtain and analyze raw data from various sources including text files, CSV files, databases, and websites
  • Implement practical tree and graph algorithms on various datasets
  • Apply statistical methods such as moving average and linear regression to understand patterns
  • Fiddle with parallel and concurrent code to speed up and simplify time-consuming algorithms
  • Find clusters in data using some of the most popular machine learning algorithms
  • Manage results by visualizing or exporting data

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. Haskell Data Analysis Cookbook
      1. Table of Contents
      2. Haskell Data Analysis Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. The Hunt for Data
        1. Introduction
        2. Harnessing data from various sources
          1. How to do it...
            1. News
            2. Private
            3. Academic
            4. Nonprofits
            5. The United States government
        3. Accumulating text data from a file path
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Catching I/O code faults
          1. How to do it…
          2. How it works…
          3. There's more…
        5. Keeping and representing data from a CSV file
          1. Getting ready
          2. How to do it...
          3. How it works...
        6. Examining a JSON file with the aeson package
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
        7. Reading an XML file using the HXT package
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Capturing table rows from an HTML page
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Understanding how to perform HTTP GET requests
          1. Getting ready
          2. How to do it...
          3. How it works…
          4. See also…
        10. Learning how to perform HTTP POST requests
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Traversing online directories for data
          1. Getting ready
          2. How to do it...
          3. How it works...
        12. Using MongoDB queries in Haskell
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        13. Reading from a remote MongoDB server
          1. Getting ready
          2. How to do it...
          3. See also
        14. Exploring data from a SQLite database
          1. Getting ready
          2. How to do it…
      9. 2. Integrity and Inspection
        1. Introduction
        2. Trimming excess whitespace
          1. How to do it...
          2. How it works...
          3. There's more…
        3. Ignoring punctuation and specific characters
          1. How to do it...
          2. There's more...
        4. Coping with unexpected or missing input
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        5. Validating records by matching regular expressions
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Lexing and parsing an e-mail address
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Deduplication of nonconflicting data items
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        8. Deduplication of conflicting data items
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        9. Implementing a frequency table using Data.List
          1. How to do it...
          2. How it works...
          3. See also
        10. Implementing a frequency table using Data.MultiSet
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Computing the Manhattan distance
          1. Getting ready
          2. How to do it...
          3. See also
        12. Computing the Euclidean distance
          1. Getting ready
          2. How to do it...
          3. See also
        13. Comparing scaled data using the Pearson correlation coefficient
          1. How to do it...
          2. How it works...
        14. Comparing sparse data using cosine similarity
          1. How to do it...
          2. See also
      10. 3. The Science of Words
        1. Introduction
        2. Displaying a number in another base
          1. How to do it...
          2. How it works...
          3. See also
        3. Reading a number from another base
          1. How to do it...
          2. How it works...
          3. See also
        4. Searching for a substring using Data.ByteString
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
        5. Searching a string using the Boyer-Moore-Horspool algorithm
          1. How to do it...
          2. How it works...
          3. There's more...
          4. See also
        6. Searching a string using the Rabin-Karp algorithm
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Splitting a string on lines, words, or arbitrary tokens
          1. Getting ready
          2. How to do it...
        8. Finding the longest common subsequence
          1. Getting ready
          2. How to do it...
          3. How it works...
        9. Computing a phonetic code
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        10. Computing the edit distance
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        11. Computing the Jaro-Winkler distance between two strings
          1. Getting ready
          2. How to do it...
          3. See also
        12. Finding strings within one-edit distance
          1. Getting ready
          2. How to do it...
          3. There's more...
          4. See also
        13. Fixing spelling mistakes
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
      11. 4. Data Hashing
        1. Introduction
        2. Hashing a primitive data type
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Hashing a custom data type
          1. Getting ready
          2. How to do it…
          3. There's more…
          4. See also
        4. Running popular cryptographic hash functions
          1. Getting ready
          2. How to do it…
          3. See also
        5. Running a cryptographic checksum on a file
          1. Getting ready
          2. How to do it…
          3. See also
        6. Performing fast comparisons between data types
          1. How to do it…
        7. Using a high-performance hash table
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Using Google's CityHash hash functions for strings
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Computing a Geohash for location coordinates
          1. Getting ready
          2. How to do it…
        10. Using a bloom filter to remove unique items
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Running MurmurHash, a simple but speedy hashing algorithm
          1. Getting ready
          2. How to do it…
        12. Measuring image similarity with perceptual hashes
          1. Getting ready
          2. How to do it…
          3. How it works…
      12. 5. The Dance with Trees
        1. Introduction
        2. Defining a binary tree data type
          1. Getting ready
          2. How to do it...
          3. See also
        3. Defining a rose tree (multiway tree) data type
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Traversing a tree depth-first
          1. Getting ready
          2. How to do it...
          3. How it works…
          4. See also
        5. Traversing a tree breadth-first
          1. Getting ready
          2. How to do it...
          3. How it works…
          4. See also
        6. Implementing a Foldable instance for a tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        7. Calculating the height of a tree
          1. Getting ready
          2. How to do it...
          3. How it works...
        8. Implementing a binary search tree data structure
          1. How to do it...
          2. How it works...
          3. See also
        9. Verifying the order property of a binary search tree
          1. Getting ready
          2. How to do it...
          3. How it works...
        10. Using a self-balancing tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more…
        11. Implementing a min-heap data structure
          1. Getting started
          2. How to do it...
          3. There's more…
        12. Encoding a string using a Huffman tree
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        13. Decoding a Huffman code
          1. Getting ready
          2. How to do it...
          3. See also
      13. 6. Graph Fundamentals
        1. Introduction
        2. Representing a graph from a list of edges
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        3. Representing a graph from an adjacency list
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        4. Conducting a topological sort on a graph
          1. Getting ready
          2. How to do it...
        5. Traversing a graph depth-first
          1. How to do it...
        6. Traversing a graph breadth-first
          1. How to do it...
        7. Visualizing a graph using Graphviz
          1. Getting ready
          2. How to do it...
        8. Using Directed Acyclic Word Graphs
          1. Getting ready
          2. How to do it...
        9. Working with hexagonal and square grid networks
          1. Getting started
          2. How to do it...
        10. Finding maximal cliques in a graph
          1. Getting started
          2. How to do it...
          3. How it works...
        11. Determining whether any two graphs are isomorphic
          1. Getting started
          2. How to do it...
      14. 7. Statistics and Analysis
        1. Introduction
        2. Calculating a moving average
          1. Getting ready
          2. How to do it…
          3. There's more…
          4. See also
        3. Calculating a moving median
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Approximating a linear regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Approximating a quadratic regression
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        6. Obtaining the covariance matrix from samples
          1. Getting ready
          2. How to do it…
        7. Finding all unique pairings in a list
          1. How it works…
          2. See also
        8. Using the Pearson correlation coefficient
          1. Getting ready
          2. How to do it…
        9. Evaluating a Bayesian network
          1. Getting ready
          2. How to do it…
        10. Creating a data structure for playing cards
          1. Getting ready
          2. How to do it…
        11. Using a Markov chain to generate text
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Creating n-grams from a list
          1. How to do it…
        13. Creating a neural network perceptron
          1. Getting ready
          2. How to do it…
      15. 8. Clustering and Classification
        1. Introduction
        2. Implementing the k-means clustering algorithm
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        3. Implementing hierarchical clustering
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        4. Using a hierarchical clustering library
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Finding the number of clusters
          1. Getting ready
          2. How to do it…
        6. Clustering words by their lexemes
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Classifying the parts of speech of words
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        8. Identifying key words in a corpus of text
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Training a parts-of-speech tagger
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Implementing a decision tree classifier
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Implementing a k-Nearest Neighbors classifier
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Visualizing points using Graphics.EasyPlot
          1. Getting ready
          2. How to do it…
          3. How it works…
      16. 9. Parallel and Concurrent Design
        1. Introduction
        2. Using the Haskell Runtime System options
          1. How to do it…
          2. How it works…
          3. There's more…
        3. Evaluating a procedure in parallel
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Controlling parallel algorithms in sequence
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Forking I/O actions for concurrency
          1. How to do it…
          2. See also
        6. Communicating with a forked I/O action
          1. Getting ready
          2. How to do it…
          3. See also
        7. Killing forked threads
          1. How to do it…
          2. How it works...
        8. Parallelizing pure functions using the Par monad
          1. Getting ready
          2. How to do it…
          3. There's more…
          4. See also
        9. Mapping over a list in parallel
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        10. Accessing tuple elements in parallel
          1. How to do it…
          2. There's more…
          3. See also
        11. Implementing MapReduce to count word frequencies
          1. Getting ready
          2. How to do it…
        12. Manipulating images in parallel using Repa
          1. Getting ready
          2. How to do it…
          3. How it works…
        13. Benchmarking runtime performance in Haskell
          1. How to do it…
          2. See also
        14. Using the criterion package to measure performance
          1. Getting ready
          2. How to do it…
          3. How it works…
        15. Benchmarking runtime performance in the terminal
          1. Getting ready
          2. How to do it…
          3. See also
      17. 10. Real-time Data
        1. Introduction
        2. Streaming Twitter for real-time sentiment analysis
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Reading IRC chat room messages
          1. Getting ready
          2. How to do it…
          3. See also
        4. Responding to IRC messages
          1. Getting ready
          2. How to do it…
          3. See also
        5. Polling a web server for latest updates
          1. How to do it…
        6. Detecting real-time file directory changes
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Communicating in real time through sockets
          1. How to do it…
          2. How it works…
        8. Detecting faces and eyes through a camera stream
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Streaming camera frames for template matching
          1. Getting ready
          2. How to do it…
          3. There's more…
      18. 11. Visualizing Data
        1. Introduction
        2. Plotting a line chart using Google's Chart API
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Plotting a pie chart using Google's Chart API
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Plotting bar graphs using Google's Chart API
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        5. Displaying a line graph using gnuplot
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        6. Displaying a scatter plot of two-dimensional points
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Interacting with points in a three-dimensional space
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        8. Visualizing a graph network
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        9. Customizing the looks of a graph network diagram
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Rendering a bar graph in JavaScript using D3.js
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        11. Rendering a scatter plot in JavaScript using D3.js
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        12. Diagramming a path from a list of vectors
          1. Getting ready
          2. How to do it…
          3. How it works…
      19. 12. Exporting and Presenting
        1. Introduction
        2. Exporting data to a CSV file
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Exporting data as JSON
          1. Getting ready
          2. How to do it…
          3. There's more…
          4. See also
        4. Using SQLite to store data
          1. Getting Ready
          2. How to do it…
          3. See also
        5. Saving data to a MongoDB database
          1. Getting ready
          2. How to do it…
          3. See also
        6. Presenting results in an HTML web page
          1. Getting ready
          2. How to do it…
          3. See also
        7. Creating a LaTeX table to display results
          1. Getting Ready
          2. How to do it…
          3. See also
        8. Personalizing messages using a text template
          1. Getting ready
          2. How to do it…
        9. Exporting matrix values to a file
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      20. Index