Cover by Brian Carper, Christophe Grand, Chas Emerick

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

O'Reilly logo

Homoiconicity

Clojure code is composed of literal representations of its own data structures and atomic values; this characteristic is formally called homoiconicity, or more casually, code-as-data.[6] This is a significant simplification compared to most other languages, which also happens to enable metaprogramming facilities to a much greater degree than languages that are not homoiconic. To understand why, we’ll need to talk some about languages in general and how their code relates to their internal representations.

Recall that a REPL’s first stage is to read code provided to it by you. Every language has to provide a way to transform that textual representation of code into something that can be compiled and/or evaluated. Most languages do this by parsing that text into an abstract syntax tree (AST). This sounds more complicated than it is: an AST is simply a data structure that represents formally what is manifested concretely in text. For example, Figure 1-1 shows some examples of textual language and possible transformations to their corresponding syntax trees.[7]

Sample transformations from textual language to formal models

Figure 1-1. Sample transformations from textual language to formal models

These transformations from a textual manifestation of language to an AST are at the heart of how languages are defined, how expressive they are, and how well-suited they are to the purpose of relating to the world within which they are designed to be used. Much of the appeal of domain-specific languages springs from exactly this point: if you have a language that is purpose-built for a given field of use, those that have expertise in that field will find it far easier to define and express what they wish in that language compared to a general-purpose language.

The downside of this approach is that most languages do not provide any way to control their ASTs; the correspondence between their textual syntax and their ASTs is defined solely by the language implementers. This prompts clever programmers to conjure up clever workarounds in order to maximize the expressivity and utility of the textual syntax that they have to work with:

  • Code generation

  • Textual macros and preprocessors (used to legendary effect by C and C++ programmers for decades now)

  • Compiler plug-ins (as in Scala, Project Lombok for Java, Groovy’s AST transformations, and Template Haskell)

That’s a lot of incidental complexity—complexity introduced solely because language designers often view textual syntax as primary, leaving formal models of it to be implementation-specific (when they’re exposed at all).

Clojure (like all Lisps) takes a different path: rather than defining a syntax that will be transformed into an AST, Clojure programs are written using Clojure data structures that represent that AST directly. Consider the requiresRole... example from Figure 1-1, and see how a Clojure transliteration of the example is an AST for it (recalling the call semantics of function position in Clojure lists).

image with no caption

The fact that Clojure programs are represented as data means that Clojure programs can be used to write and transform other Clojure programs, trivially so. This is the basis for macros—Clojure’s metaprogramming facility—a far different beast than the gloriously painful hack that are C-style macros and other textual preprocessors, and the ultimate escape hatch when expressivity or domain-specific notation is paramount. We explore Clojure macros in Chapter 5.

In practical terms, the direct correspondence between code and data means that the Clojure code you write in the REPL or in a text source file isn’t text at all: you are programming using Clojure data structure literals. Recall the simple averaging function from Example 1-2:

(defn average
  [numbers]
  (/ (apply + numbers) (count numbers)))

This isn’t just a bunch of text that is somehow transformed into a function definition through the operation of a black box; this is a list data structure that contains four values: the symbol defn, the symbol average, a vector data structure containing the symbol numbers, and another list that comprises the function’s body. Evaluating that list data structure is what defines the function.



[6] Clojure is by no means the only homoiconic language, nor is homoiconicity a new concept. Other homoiconic languages include all other Lisps, all sorts of machine language (and therefore arguably Assembly language as well), Postscript, XSLT and XQuery, Prolog, R, Factor, Io, and more.

[7] The natural language parse tree was mostly lifted from http://en.wikipedia.org/wiki/Parse_tree.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required