Clojure Data Analysis Cookbook

Book Description

Over 110 recipes to help you dive into the world of practical data analysis using Clojure

  • Get a handle on the torrent of data the modern Internet has created

  • Recipes for every stage from collection to analysis

  • A practical approach to analyzing data to help you make informed decisions

In Detail

    Data is everywhere, and it is increasingly important to be able to draw insights from it that we can act on. Using Clojure for data collection and analysis, this book shows you how to gain fresh insights and perspectives from your data through a collection of practical, structured recipes.

    "The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether you are scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.

    You'll learn how to acquire data, clean it up, and transform it into useful graphs that can then be analyzed and published to the Internet. Coverage includes advanced topics such as processing data concurrently, applying powerful statistical techniques such as Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.

    Table of Contents

    1. Clojure Data Analysis Cookbook
      1. Table of Contents
      2. Clojure Data Analysis Cookbook
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Importing Data for Analysis
        1. Introduction
        2. Creating a new project
          1. Getting ready
          2. How to do it...
          3. How it works...
        3. Reading CSV data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        4. Reading JSON data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Reading data from Excel with Incanter
          1. Getting ready
          2. How to do it…
        6. Reading data from JDBC databases
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Reading XML data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
            1. Navigating structures with zippers
            2. Processing in a pipeline
            3. Comparing XML and JSON
        8. Scraping data from tables in web pages
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Scraping textual data from web pages
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Reading RDF data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        11. Reading RDF data with SPARQL
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        12. Aggregating data from different formats
          1. Getting ready
          2. How to do it…
            1. Creating the triple store
            2. Scraping exchange rates
            3. Loading currency data and tying it all together
          3. How it works…
      9. 2. Cleaning and Validating Data
        1. Introduction
        2. Cleaning data with regular expressions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also...
        3. Maintaining consistency with synonym maps
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also…
        4. Identifying and removing duplicate data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Normalizing numbers
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Rescaling values
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Normalizing dates and times
          1. Getting ready
          2. How to do it…
          3. There's more…
        8. Lazily processing very large data sets
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Sampling from very large data sets
          1. How to do it…
            1. Sampling by percentage
            2. Sampling exactly
          2. How it works…
        10. Fixing spelling errors
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Parsing custom data formats
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Validating data with Valip
          1. Getting ready
          2. How to do it…
          3. How it works…
      10. 3. Managing Complexity with Concurrent Programming
        1. Introduction
        2. Managing program complexity with STM
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Managing program complexity with agents
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        4. Getting better performance with commute
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Combining agents and STM
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Maintaining consistency with ensure
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Introducing safe side effects into the STM
          1. Getting ready
          2. How to do it…
        8. Maintaining data consistency with validators
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Tracking processing with watchers
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Debugging concurrent programs with watchers
          1. Getting ready
          2. How to do it…
          3. There's more...
        11. Recovering from errors in agents
          1. How to do it…
            1. Failing on errors
            2. Continuing on errors
            3. Using a custom error handler
          2. There's more...
        12. Managing input with sized queues
          1. How to do it…
          2. How it works...
      11. 4. Improving Performance with Parallel Programming
        1. Introduction
        2. Parallelizing processing with pmap
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        3. Parallelizing processing with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Partitioning Monte Carlo simulations for better pmap performance
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Estimating with Monte Carlo simulations
            2. Chunking data for pmap
        5. Finding the optimal partition size with simulated annealing
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Parallelizing with reducers
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        7. Generating online summary statistics with reducers
          1. Getting ready
          2. How to do it…
        8. Harnessing your GPU with OpenCL and Calx
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Writing the GPU code in C
            2. Wrapping it in Calx
          4. There's more…
        9. Using type hints
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Benchmarking with Criterium
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      12. 5. Distributed Data Processing with Cascalog
        1. Introduction
        2. Distributed processing with Cascalog and Hadoop
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Querying data with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Distributing data with Apache HDFS
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Parsing CSV files with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Complex queries with Cascalog
          1. Getting ready
          2. How to do it…
        7. Aggregating data with Cascalog
          1. Getting ready
          2. How to do it…
          3. There's more…
        8. Defining new Cascalog operators
          1. Getting ready
          2. How to do it…
            1. Creating Map operators
            2. Creating Map concatenation operations
            3. Creating filter operators
            4. Creating buffer operators
            5. Creating aggregate operators
            6. Creating parallel aggregate operators
        9. Composing Cascalog queries
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Handling errors in Cascalog workflows
          1. Getting ready
          2. How to do it…
        11. Transforming data with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Executing Cascalog queries in the Cloud with Pallet
          1. Getting ready
          2. How to do it...
          3. How it works...
      13. 6. Working with Incanter Datasets
        1. Introduction
        2. Loading Incanter's sample datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
        3. Loading Clojure data structures into datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Viewing datasets interactively with view
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        5. Converting datasets to matrices
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        6. Using infix formulas in Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Selecting columns with $
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        8. Selecting rows with $
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Filtering datasets with $where
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Grouping data with $group-by
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Saving datasets to CSV and JSON
          1. Getting ready
          2. How to do it…
            1. Saving data as CSV
            2. Saving data as JSON
          3. How it works…
          4. See also
        12. Projecting from multiple datasets with $join
          1. Getting ready
          2. How to do it…
          3. How it works…
      14. 7. Preparing for and Performing Statistical Data Analysis with Incanter
        1. Introduction
        2. Generating summary statistics with $rollup
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Differencing variables to show changes
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Scaling variables to simplify variable relationships
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Working with time series data with Incanter Zoo
          1. Getting ready
          2. How to do it…
          3. There's more...
        6. Smoothing variables to decrease noise
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Validating sample statistics with bootstrapping
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Modeling linear relationships
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Modeling non-linear relationships
          1. Getting ready
          2. How to do it…
          3. How it works...
        10. Modeling multimodal Bayesian distributions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
        11. Finding data errors with Benford's law
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      15. 8. Working with Mathematica and R
        1. Introduction
        2. Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Setting up Mathematica to talk to Clojuratica for Windows
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Calling Mathematica functions from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Sending matrices to Mathematica from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Evaluating Mathematica scripts from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Creating functions from Mathematica
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Processing functions in parallel in Mathematica
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Setting up R to talk to Clojure
          1. Getting ready
          2. How to do it…
            1. Setting up R
            2. Setting up Clojure
          3. How it works…
        10. Calling R functions from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Passing vectors into R
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Evaluating R files from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        13. Plotting in R from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      16. 9. Clustering, Classifying, and Working with Weka
        1. Introduction
        2. Loading CSV and ARFF files into Weka
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Filtering and renaming columns in Weka datasets
          1. Getting ready
          2. How to do it…
            1. Renaming columns
            2. Removing columns
            3. Hiding columns
          3. How it works…
        4. Discovering groups of data using K-means clustering
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Clustering with K-means
            2. Analyzing the results
            3. Building macros
          4. See also
        5. Finding hierarchical clusters in Weka
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Clustering with SOMs in Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Classifying data with decision trees
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Classifying data with the Naive Bayesian classifier
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Classifying data with support vector machines
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Finding associations in data with the Apriori algorithm
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      17. 10. Graphing in Incanter
        1. Introduction
        2. Creating scatter plots with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also
        3. Creating bar charts with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Graphing non-numeric data in bar charts
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Creating histograms with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Creating function plots with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Adding equations to Incanter charts
          1. Getting ready
          2. How to do it…
          3. There's more…
        8. Adding lines to scatter charts
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Customizing charts with JFreeChart
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Saving Incanter graphs to PNG
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Using PCA to graph multi-dimensional data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Creating dynamic charts with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
      18. 11. Creating Charts for the Web
        1. Introduction
        2. Serving data with Ring and Compojure
          1. Getting ready
          2. How to do it…
            1. Configuring and setting up the web application
            2. Serving data
            3. Defining routes and handlers
            4. Running the server
          3. How it works…
          4. There's more…
        3. Creating HTML with Hiccup
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Setting up to use ClojureScript
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Creating scatter plots with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Creating bar charts with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Creating histograms with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Visualizing graphs with force-directed layouts
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Creating interactive visualizations with D3
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      19. Index