You are previewing Clojure Programming.

Clojure Programming

Cover of Clojure Programming by Chas Emerick... Published by O'Reilly Media, Inc.
  1. Clojure Programming
  2. SPECIAL OFFER: Upgrade this ebook with O’Reilly
  3. Preface
    1. Who Is This Book For?
      1. Engaged Java Developers
      2. Ruby, Python, and Other Developers
    2. How to Read This Book
      1. Start with Practical Applications of Clojure
      2. Start from the Ground Up with Clojure’s Foundational Concepts
    3. Who’s “We”?
      1. Chas Emerick
      2. Brian Carper
      3. Christophe Grand
    4. Acknowledgments
      1. And Last, but Certainly Far from Least
    5. Conventions Used in This Book
    6. Using Code Examples
    7. Safari® Books Online
    8. How to Contact Us
  4. 1. Down the Rabbit Hole
    1. Why Clojure?
    2. Obtaining Clojure
    3. The Clojure REPL
    4. No, Parentheses Actually Won’t Make You Go Blind
    5. Expressions, Operators, Syntax, and Precedence
    6. Homoiconicity
    7. The Reader
      1. Scalar Literals
      3. Whitespace and Commas
      4. Collection Literals
      5. Miscellaneous Reader Sugar
    8. Namespaces
    9. Symbol Evaluation
    10. Special Forms
      1. Suppressing Evaluation: quote
      2. Code Blocks: do
      3. Defining Vars: def
      4. Local Bindings: let
      5. Destructuring (let, Part 2)
      6. Creating Functions: fn
      7. Conditionals: if
      8. Looping: loop and recur
      9. Referring to Vars: var
      10. Java Interop: . and new
      11. Exception Handling: try and throw
      12. Specialized Mutation: set!
      13. Primitive Locking: monitor-enter and monitor-exit
    11. Putting It All Together
      1. eval
    12. This Is Just the Beginning
  5. I. Functional Programming and Concurrency
    1. 2. Functional Programming
      1. What Does Functional Programming Mean?
      2. On the Importance of Values
      3. First-Class and Higher-Order Functions
      4. Composition of Function(ality)
      5. Pure Functions
      6. Functional Programming in the Real World
    2. 3. Collections and Data Structures
      1. Abstractions over Implementations
      2. Concise Collection Access
      3. Data Structure Types
      4. Immutability and Persistence
      5. Metadata
      6. Putting Clojure’s Collections to Work
      7. In Summary
    3. 4. Concurrency and Parallelism
      1. Shifting Computation Through Time and Space
      2. Parallelism on the Cheap
      3. State and Identity
      4. Clojure Reference Types
      5. Classifying Concurrent Operations
      6. Atoms
      7. Notifications and Constraints
      8. Refs
      9. Vars
      10. Agents
      11. Using Java’s Concurrency Primitives
      12. Final Thoughts
  6. II. Building Abstractions
    1. 5. Macros
      1. What Is a Macro?
      2. Writing Your First Macro
      3. Debugging Macros
      4. Syntax
      5. When to Use Macros
      6. Hygiene
      7. Common Macro Idioms and Patterns
      8. The Implicit Arguments: &env and &form
      9. In Detail: -> and ->>
      10. Final Thoughts
    2. 6. Datatypes and Protocols
      1. Protocols
      2. Extending to Existing Types
      3. Defining Your Own Types
      4. Implementing Protocols
      5. Protocol Introspection
      6. Protocol Dispatch Edge Cases
      7. Participating in Clojure’s Collection Abstractions
      8. Final Thoughts
    3. 7. Multimethods
      1. Multimethods Basics
      2. Toward Hierarchies
      3. Hierarchies
      4. Making It Really Multiple!
      5. A Few More Things
      6. Final Thoughts
  7. III. Tools, Platform, and Projects
    1. 8. Organizing and Building Clojure Projects
      1. Project Geography
      2. Build
      3. Final Thoughts
    2. 9. Java and JVM Interoperability
      1. The JVM Is Clojure’s Foundation
      2. Using Java Classes, Methods, and Fields
      3. Handy Interop Utilities
      4. Exceptions and Error Handling
      5. Type Hinting for Performance
      6. Arrays
      7. Defining Classes and Implementing Interfaces
      8. Using Clojure from Java
      9. Collaborating Partners
    3. 10. REPL-Oriented Programming
      1. Interactive Development
      2. Tooling
      3. Debugging, Monitoring, and Patching Production in the REPL
      4. Limitations to Redefining Constructs
      5. In Summary
  8. IV. Practicums
    1. 11. Numerics and Mathematics
      1. Clojure Numerics
      2. Clojure Mathematics
      3. Equality and Equivalence
      4. Optimizing Numeric Performance
      5. Visualizing the Mandelbrot Set in Clojure
    2. 12. Design Patterns
      1. Dependency Injection
      2. Strategy Pattern
      3. Chain of Responsibility
      4. Aspect-Oriented Programming
      5. Final Thoughts
    3. 13. Testing
      1. Immutable Values and Pure Functions
      2. clojure.test
      3. Growing an HTML DSL
      4. Relying upon Assertions
    4. 14. Using Relational Databases
      2. Korma
      3. Hibernate
      4. Final Thoughts
    5. 15. Using Nonrelational Databases
      1. Getting Set Up with CouchDB and Clutch
      2. Basic CRUD Operations
      4. _changes: Abusing CouchDB as a Message Queue
      5. À la Carte Message Queues
      6. Final Thoughts
    6. 16. Clojure and the Web
      1. The “Clojure Stack”
      2. The Foundation: Ring
      3. Routing Requests with Compojure
      4. Templating
      5. Final Thoughts
    7. 17. Deploying Clojure Web Applications
      1. Java and Clojure Web Architecture
      2. Running Web Apps Locally
      3. Web Application Deployment
      4. Going Beyond Simple Web Application Deployment
  9. V. Miscellanea
    1. 18. Choosing Clojure Type Definition Forms Wisely
    2. 19. Introducing Clojure into Your Workplace
      1. Just the Facts…
      2. Emphasize Productivity
      3. Emphasize Community
      4. Be Prudent
    3. 20. What’s Next?
      1. (dissoc Clojure 'JVM)
      2. 4Clojure
      3. Overtone
      4. core.logic
      5. Pallet
      6. Avout
      7. Clojure on Heroku
  10. Index
  11. About the Authors
  12. Colophon
  13. SPECIAL OFFER: Upgrade this ebook with O’Reilly
O'Reilly logo

Parallelism on the Cheap

We’ll be examining all of Clojure’s flexible concurrency facilities in a bit, one of which—agents—can be used to orchestrate very efficient parallelization of workloads. However, sometimes you may find yourself wanting to parallelize some operation with as little ceremony as possible.

The flexibility of Clojure’s seq abstraction[128] makes implementing many routines in terms of processing sequences very easy. For example, say we had a function that uses a regular expression to find and return phone numbers found within other strings:

(defn phone-numbers
  (re-seq #"(\d{3})[\.-]?(\d{3})[\.-]?(\d{4})" string))
;= #'user/phone-numbers
(phone-numbers " Sunil: 617.555.2937, Betty: 508.555.2218")
;= (["617.555.2937" "617" "555" "2937"] ["508.555.2218" "508" "555" "2218"])

Simple enough, and applying it to any seq of strings is easy, fast, and effective. These seqs could be loaded from disk using slurp and file-seq, or be coming in as messages from a message queue, or be the results obtained by retrieving large chunks of text from a database. To keep things simple, we can dummy up a seq of 100 strings, each about 1MB in size, suffixed with some phone numbers:

(def files (repeat 100
                   (apply str
                     (concat (repeat 1000000 \space)
                             "Sunil: 617.555.2937, Betty: 508.555.2218"))))

Let’s see how fast we can get all of the phone numbers from all of these “files”:

(time (dorun (map phone-numbers files)))  1
; "Elapsed time: 2460.848 msecs"

We’re using dorun here to fully realize the lazy seq produced by map and simultaneously release the results of that realization since we don’t want to have all of the found phone numbers printed to the REPL.

This is parallelizable though, and trivially so. There is a cousin of mappmap – that will parallelize the application of a function across a sequence of values, returning a lazy seq of results just like map:

(time (dorun (pmap phone-numbers files)))  1
; "Elapsed time: 1277.973 msecs"

Run on a dual-core machine, this roughly doubles the throughput compared to the use of map in the prior example; for this particular task and dataset, roughly a 4x improvement could be expected on a four-core machine, and so on. Not bad for a single-character change to a function name! While this might look magical, it’s not; pmap is simply using a number of futures—calibrated to suit the number of CPU cores available—to spread the computation involved in evaluating phone-numbers for each file across each of those cores.

This works for many operations, but you still must use pmap judiciously. There is a degree of overhead associated with parallelizing operations like this. If the operation being parallelized does not have a significant enough runtime, that overhead will dominate the real work being performed; this can make a naive application of pmap slower than the equivalent use of map:

(def files (repeat 100000
                   (apply str
                     (concat (repeat 1000 \space)
                             "Sunil: 617.555.2937, Betty: 508.555.2218"))))

(time (dorun (map phone-numbers files)))
; "Elapsed time: 2649.807 msecs"
(time (dorun (pmap phone-numbers files)))
; "Elapsed time: 2772.794 msecs"

The only change we’ve made here is to the data: each string is now around 1K in size instead of 1MB in size. Even though the total amount of work is the same (there are more “files”), the parallelization overhead outstrips the gains we get from putting each evaluation of phone-numbers onto a different future/core. Because of this overhead, it is very common to see speedups of something less than Nx (where N is the number of CPU cores available) when using pmap. The lesson is clear: use pmap when the operation you’re performing is parallelizable in the first place, and is significant enough for each value in the seq that its workload will eclipse the process coordination inherent in its parallelization. Trying to force pmap into service where it’s not warranted can be disastrous.

There is often a workaround for such scenarios, however. You can often efficiently parallelize a relatively trivial operation by chunking your dataset so that each unit of parallelized work is larger. In the above example, the unit of work is just 1K of text; however, we can take steps to ensure that the unit of work is larger, so that each value processed by pmap is a seq of 250 1K strings, thus boosting the work done per future dispatch and cutting down on the parallelization overhead:

(time (->> files
        (partition-all 250)
        (pmap (fn [chunk] (doall (map phone-numbers chunk))))  1
        (apply concat)
; "Elapsed time: 1465.138 msecs"

map will return a lazy seq, so we use doall to force the realization of that lazy seq within the scope of the function provided to pmap. Otherwise, phone-numbers would never be called at all in parallel, leaving the work of applying it to each string to whatever process might have consumed the lazy seq later.

By changing the chunk size of our workload, we’ve regained the benefits of parallelization even though our per-operation computation complexity dropped substantially when applied to many more smaller strings.

Two other parallelism constructs are built on top of pmap: pcalls and pvalues. The former evaluates any number of no-arg functions provided as arguments, returning a lazy sequence of their return values; the latter is a macro that does the same, but for any number of expressions.

[128] Which we discussed in Sequences.

The best content for your career. Discover unlimited learning on demand for around $1/day.