Clojure Data Analysis Cookbook - Second Edition

Book Description

Dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process

In Detail

As data invades more and more of life and business, the need to analyze it effectively has never been greater. With Clojure and this book, you'll soon be getting to grips with every aspect of data analysis. You'll start with practical recipes that show you how to load and clean your data, then get concise instructions to perform all the essential analysis tasks from basic statistics to sophisticated machine learning and data clustering algorithms. Get a more intuitive handle on your data through hands-on visualization techniques that allow you to provide interesting, informative, and compelling reports, and use Clojure to publish your findings to the Web.

What You Will Learn

  • Read data from a variety of data formats

  • Transform data to make it more useful and easier to analyze

  • Process data concurrently and in parallel for faster performance

  • Harness multiple computers to analyze big data

  • Use powerful data analysis libraries such as Incanter, Hadoop, and Weka to get things done quickly

  • Apply powerful clustering and data mining techniques to better understand your data

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

    1. Clojure Data Analysis Cookbook Second Edition
      1. Table of Contents
      2. Clojure Data Analysis Cookbook Second Edition
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Downloading the color images of this book
          3. Errata
          4. Piracy
          5. Questions
      8. 1. Importing Data for Analysis
        1. Introduction
        2. Creating a new project
          1. Getting ready
          2. How to do it...
          3. How it works...
        3. Reading CSV data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Reading JSON data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Reading data from Excel with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Reading data from JDBC databases
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        7. Reading XML data into Incanter datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
            1. Navigating structures with zippers
            2. Processing in a pipeline
            3. Comparing XML and JSON
        8. Scraping data from tables in web pages
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Scraping textual data from web pages
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Reading RDF data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        11. Querying RDF data with SPARQL
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Aggregating data from different formats
          1. Getting ready
          2. How to do it…
            1. Creating the triple store
            2. Scraping exchange rates
            3. Loading currency data and tying it all together
          3. How it works…
          4. See also
      9. 2. Cleaning and Validating Data
        1. Introduction
        2. Cleaning data with regular expressions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
          5. See also
        3. Maintaining consistency with synonym maps
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Identifying and removing duplicate data
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Regularizing numbers
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Calculating relative values
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Parsing dates and times
          1. Getting ready
          2. How to do it…
          3. There's more…
        8. Lazily processing very large data sets
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Sampling from very large data sets
          1. Getting ready
          2. How to do it…
            1. Sampling by percentage
            2. Sampling exactly
          3. How it works…
        10. Fixing spelling errors
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        11. Parsing custom data formats
          1. Getting ready
          2. How to do it…
          3. How it works…
        12. Validating data with Valip
          1. Getting ready
          2. How to do it…
          3. How it works…
      10. 3. Managing Complexity with Concurrent Programming
        1. Introduction
        2. Managing program complexity with STM
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Managing program complexity with agents
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        4. Getting better performance with commute
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Combining agents and STM
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Maintaining consistency with ensure
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Introducing safe side effects into the STM
          1. Getting ready
          2. How to do it…
        8. Maintaining data consistency with validators
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        9. Monitoring processing with watchers
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Debugging concurrent programs with watchers
          1. Getting ready
          2. How to do it…
          3. There's more...
        11. Recovering from errors in agents
          1. How to do it…
            1. Failing on errors
            2. Continuing on errors
            3. Using a custom error handler
          2. There's more...
        12. Managing large inputs with sized queues
          1. How to do it…
          2. How it works...
      11. 4. Improving Performance with Parallel Programming
        1. Introduction
        2. Parallelizing processing with pmap
          1. How to do it…
          2. How it works…
          3. There's more…
          4. See also
        3. Parallelizing processing with Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Partitioning Monte Carlo simulations for better pmap performance
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Estimating with Monte Carlo simulations
            2. Chunking data for pmap
        5. Finding the optimal partition size with simulated annealing
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Combining function calls with reducers
          1. Getting ready
          2. How to do it…
            1. What happened here?
          3. There's more...
          4. See also
        7. Parallelizing with reducers
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        8. Generating online summary statistics for data streams with reducers
          1. Getting ready
          2. How to do it…
        9. Using type hints
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        10. Benchmarking with Criterium
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
      12. 5. Distributed Data Processing with Cascalog
        1. Introduction
        2. Initializing Cascalog and Hadoop for distributed processing
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also
        3. Querying data with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Distributing data with Apache HDFS
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Parsing CSV files with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Executing complex queries with Cascalog
          1. Getting ready
          2. How to do it…
        7. Aggregating data with Cascalog
          1. Getting ready
          2. How to do it…
          3. There's more…
        8. Defining new Cascalog operators
          1. Getting ready
          2. How to do it…
            1. Creating map operators
            2. Creating map concatenation operators
            3. Creating filter operators
            4. Creating buffer operators
            5. Creating aggregate operators
            6. Creating parallel aggregate operators
        9. Composing Cascalog queries
          1. Getting ready
          2. How to do it…
          3. How it works…
        10. Transforming data with Cascalog
          1. Getting ready
          2. How to do it…
          3. How it works…
      13. 6. Working with Incanter Datasets
        1. Introduction
        2. Loading Incanter's sample datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
        3. Loading Clojure data structures into datasets
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also…
        4. Viewing datasets interactively with view
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also…
        5. Converting datasets to matrices
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also…
        6. Using infix formulas in Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Selecting columns with $
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also…
        8. Selecting rows with $
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Filtering datasets with $where
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Grouping data with $group-by
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Saving datasets to CSV and JSON
          1. Getting ready
          2. How to do it…
            1. Saving data as CSV
            2. Saving data as JSON
          3. How it works…
          4. See also…
        12. Projecting from multiple datasets with $join
          1. Getting ready
          2. How to do it…
          3. How it works…
      14. 7. Statistical Data Analysis with Incanter
        1. Introduction
        2. Generating summary statistics with $rollup
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Working with changes in values
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Scaling variables to simplify variable relationships
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Working with time series data with Incanter Zoo
          1. Getting ready
          2. How to do it…
          3. There's more...
        6. Smoothing variables to decrease variation
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Validating sample statistics with bootstrapping
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Modeling linear relationships
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Modeling non-linear relationships
          1. Getting ready
          2. How to do it…
          3. How it works...
        10. Modeling multinomial Bayesian distributions
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more...
        11. Finding data errors with Benford's law
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      15. 8. Working with Mathematica and R
        1. Introduction
        2. Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        3. Setting up Mathematica to talk to Clojuratica for Windows
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Calling Mathematica functions from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        5. Sending matrixes to Mathematica from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        6. Evaluating Mathematica scripts from Clojuratica
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Creating functions from Mathematica
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Setting up R to talk to Clojure
          1. Getting ready
          2. How to do it…
            1. Setting up R
            2. Setting up Clojure
          3. How it works…
        9. Calling R functions from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Passing vectors into R
          1. Getting ready
          2. How to do it…
          3. How it works…
        11. Evaluating R files from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        12. Plotting in R from Clojure
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      16. 9. Clustering, Classifying, and Working with Weka
        1. Introduction
        2. Loading CSV and ARFF files into Weka
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also…
        3. Filtering, renaming, and deleting columns in Weka datasets
          1. Getting ready
          2. How to do it…
            1. Renaming columns
            2. Removing columns
            3. Hiding columns
          3. How it works…
        4. Discovering groups of data using K-Means clustering
          1. Getting ready
          2. How to do it…
          3. How it works…
            1. Clustering with K-Means
            2. Analyzing the results
            3. Building macros
          4. See also…
        5. Finding hierarchical clusters in Weka
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Clustering with SOMs in Incanter
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        7. Classifying data with decision trees
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        8. Classifying data with the Naive Bayesian classifier
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Classifying data with support vector machines
          1. Getting ready
          2. How to do it…
          3. There's more…
        10. Finding associations in data with the Apriori algorithm
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      17. 10. Working with Unstructured and Textual Data
        1. Introduction
        2. Tokenizing text
          1. Getting ready
          2. How to do it…
          3. How it works…
        3. Finding sentences
          1. Getting ready
          2. How to do it…
          3. How it works…
        4. Focusing on content words with stoplists
          1. Getting ready
          2. How to do it…
        5. Getting document frequencies
          1. Getting ready
          2. How to do it…
        6. Scaling document frequencies by document size
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Scaling document frequencies with TF-IDF
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Finding people, places, and things with Named Entity Recognition
          1. Getting ready
          2. How to do it…
          3. How it works…
        9. Mapping documents to a sparse vector space representation
          1. Getting ready
          2. How to do it…
        10. Performing topic modeling with MALLET
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. See also…
        11. Performing naïve Bayesian classification with MALLET
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
          5. See also…
      18. 11. Graphing in Incanter
        1. Introduction
        2. Creating scatter plots with Incanter
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
          5. See also
        3. Graphing non-numeric data in bar charts
          1. Getting ready
          2. How to do it...
          3. How it works...
        4. Creating histograms with Incanter
          1. Getting ready
          2. How to do it...
          3. How it works...
        5. Creating function plots with Incanter
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        6. Adding equations to Incanter charts
          1. Getting ready
          2. How to do it...
          3. There's more...
        7. Adding lines to scatter charts
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        8. Customizing charts with JFreeChart
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. See also
        9. Customizing chart colors and styles
          1. Getting ready
          2. How to do it...
        10. Saving Incanter graphs to PNG
          1. Getting ready
          2. How to do it...
          3. How it works...
        11. Using PCA to graph multi-dimensional data
          1. Getting ready
          2. How to do it...
          3. How it works...
          4. There's more...
        12. Creating dynamic charts with Incanter
          1. Getting ready
          2. How to do it...
          3. How it works...
      19. 12. Creating Charts for the Web
        1. Introduction
        2. Serving data with Ring and Compojure
          1. Getting ready
          2. How to do it…
            1. Configuring and setting up the web application
            2. Serving data
            3. Defining routes and handlers
            4. Running the server
          3. How it works…
          4. There's more…
        3. Creating HTML with Hiccup
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        4. Setting up to use ClojureScript
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        5. Creating scatter plots with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        6. Creating bar charts with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
        7. Creating histograms with NVD3
          1. Getting ready
          2. How to do it…
          3. How it works…
        8. Creating time series charts with D3
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        9. Visualizing graphs with force-directed layouts
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
        10. Creating interactive visualizations with D3
          1. Getting ready
          2. How to do it…
          3. How it works…
          4. There's more…
      20. Index