Preface

For most of its history, Facebook has held internal hackathons every few months. For hackathons, engineers are encouraged to come up with ideas that aren’t related to their day jobs—they form teams and try to make something cool in the span of a day or two.

One hackathon in November 2007 resulted in an interesting experiment: a tool that could convert PHP programs into equivalent C++ programs and then compile them with a C++ compiler. The idea was that the C++ program would run a lot faster than the PHP original, as it could take advantage of all the optimization work that has gone into C++ compilers over the years.

This possibility was of great interest to Facebook. It was gaining a lot of new users, and supporting more users requires more CPU cycles. As you run out of available CPU cycles, unless you buy more CPUs, which gets very expensive, you have to find a way to consume fewer CPU cycles per user. Facebook’s entire web frontend was written in PHP, and any way to get that PHP code to consume fewer CPU cycles was welcome.

Over the next seven years, the project grew far beyond its hackathon origins. As a PHP-to-C++ transformer called HPHPc, in 2009 it became the sole execution engine powering Facebook’s web servers. In early 2010, it was open sourced under the name HipHop for PHP. And then, starting in 2010, an entirely new approach to execution—just-in-time compilation to machine code, with no C++ involved—grew out of HPHPc’s codebase, and eventually superseded it. This just-in-time compiler, called the HipHop Virtual Machine, or HHVM for short, took over Facebook’s entire web server fleet in early 2013. The original PHP-to-C++ transformer is gone; it is not deployed anywhere and its code has been deleted.

The origins of Hack are entirely separate. Its roots are in a project that attempted to use static analysis on PHP to automatically detect potential security bugs. Fairly soon, it turned out that the nature of PHP makes it fundamentally difficult to get static analysis that’s deep enough to be useful. Thus the idea of “strict mode” was born: a modification of PHP, with some features, such as references, removed and a sophisticated type system added. Authors of PHP code could opt into strict mode, gaining stronger checking of their code while retaining full interoperability.

Hack’s direction since then belies its origin as a type system on top of PHP. It has gained new features with significant effects on the way Hack code is structured, like asynchronous (async) functions. It has added new features specifically meant to make the type system more powerful, like collections. Philosophically, it’s a different language from PHP, carving out a new position in the space of programming languages.

This is how we got where we are today: Hack, a modern, dynamic programming language with robust static typechecking, executing on HHVM, a just-in-time compilation runtime engine with full PHP compatibility and interoperability.

What Are Hack and HHVM?

Hack and HHVM are closely related, and there has occasionally been some confusion as to what exactly the terms refer to.

Hack is a programming language. It’s based on PHP, shares much of PHP’s syntax, and is designed to be fully interoperable with PHP. However, it would be severely limiting to think of Hack as nothing more than some decoration on top of PHP. Hack’s main feature is robust static typechecking, which is enough of a difference from PHP to qualify Hack as a language in its own right. Hack is useful for developers working on an existing PHP codebase, and has many affordances for that situation, but it’s also an excellent choice for ground-up development of a new project.

Beyond static typechecking, Hack has several other features that PHP doesn’t have, and most of this book is about those features: async functions, XHP, and many more. It also intentionally lacks a handful of PHP’s features, to smooth some rough edges.

HHVM is an execution engine. It supports both PHP and Hack, and it lets the two languages interoperate: code written in PHP can call into Hack code, and vice versa. When executing PHP, it’s intended to be usable as a drop-in replacement for the standard PHP interpreter from PHP.net. This book has a few chapters that cover HHVM: how to configure and deploy it, and how to use it to debug and profile your code.

Finally, separate from HHVM, there is the Hack typechecker: a program that can analyze Hack code (but not PHP code) for type errors, without running it. The typechecker doesn’t really have a name, other than the command you use to run it, hh_client. I’ll refer to it as “the Hack typechecker” or just “the typechecker.”

As of now, HHVM is the only execution engine that runs Hack, which is why the two may sometimes be conflated.

Who This Book Is For

This book is for readers who are comfortable with programming. It spends no time explaining concepts common to many programming languages, like control flow, data types, functions, and object-oriented programming.

Hack is a descendant of PHP. This book doesn’t specifically explain common PHP syntax, except in areas where Hack differs, so basic knowledge of PHP is helpful. If you’ve never used PHP, you’ll still be able to understand much of the code in this book if you have experience with other programming languages. The syntax is generally very straightforward to understand.

For those with PHP experience, nothing here requires having worked on a complex, high-traffic PHP website. Hack is useful for codebases of all sizes, from simple standalone scripts to multimillion-line web apps like Facebook.

There is some material that assumes familiarity with typical web app tasks like querying relational databases and memcached (in Chapter 6) and generating HTML (in Chapter 7). You can skip these parts if they’re not relevant to you, but they require no knowledge that you wouldn’t get from experience with even a small, basic web app.

I hope to make this book not just an explanation of how things are, but also of how they came to be that way. Programming language design is a hard problem; it’s essentially the art of navigating hundreds of trade-offs at once. It’s also subject to a surprising range of pragmatic concerns like backward compatibility, and Hack is no exception. If you’re at all interested in a case study of how one programming language made its way through an unusual set of constraints, this book should provide what you’re looking for.

Philosophy

There are a few principles that underlie the design of both Hack and HHVM, which can help you understand how things came to be the way they are.

Program Types

There is a single observation about programs that informs both HHVM’s approach to optimizing and executing code, and Hack’s approach to verifying it. That is: behind most programs in dynamically typed languages, a statically typed program is hiding.

Consider this code, which works as both PHP and Hack:

for ($i = 0; $i < 10; $i++) {
  echo $i + 100;
}

Although it’s not explicitly stated anywhere, it’s obvious to any human reader that $i is always an integer. The computer science term for this is that $i is monomorphic: it only ever has one type. A typechecker could make use of this property to verify that the expression $i + 100 makes sense. An execution engine could make use of this property to compile $i + 100 into efficient machine code to do the addition.

A loop variable may seem like a trivial example, but it turns out that in real-world PHP codebases, most values are monomorphic. This makes intuitive sense, because you can’t do much with a value—do arithmetic on it, index into it, call methods on it, etc.—without knowing what its type is. Most code, even in dynamically typed languages, does not check the type of each value before doing anything with it, which means that there must be hidden assumptions about the types of values. If the code mostly runs without runtime type errors, then those hidden assumptions must be true most of the time.

HHVM’s approach is to assume that this observation usually holds, and to compile PHP and Hack to machine code accordingly. Because it compiles programs while they are running, it knows the types flowing through each piece of code it’s about to compile. It outputs machine code that assumes those types: in the loop above, when compiling the expression $i + 100, HHVM would see that $i is an integer and use a single hardware addition instruction to do the addition.
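
As a minimal sketch of such hidden assumptions, consider the function below. It is a hypothetical example, not taken from any real codebase; the name order_total and its parameters are made up for illustration. Nothing in it declares a type, yet the body only makes sense if $prices is an array of numbers and $quantity is a number. It is written as plain PHP so that it can be run on a stock interpreter.

```php
<?php
// Hypothetical helper for illustration only. No types are declared,
// but the code silently assumes that $prices is an array of numbers
// and that $quantity is a number.
function order_total($prices, $quantity) {
  $total = 0;
  foreach ($prices as $price) {
    // Multiplication and addition only make sense on numeric values.
    $total += $price * $quantity;
  }
  return $total;
}

echo order_total(array(5, 10, 15), 2), "\n"; // prints 60
```

As long as every caller passes integers, $price, $quantity, and $total each only ever hold one type, which is the kind of pattern a just-in-time compiler could observe at runtime and specialize for.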

The purpose of Hack, meanwhile, is to bring the hidden statically typed program into the light. It makes some types explicit with annotations, and verifies the rest with type inference. The idea is that Hack doesn’t significantly constrain existing PHP programs; rather, it makes the behavior that the programs already had explicit, and exposes it to robust static analysis.

This point is worth repeating: Hack’s static typing is not supposed to require a different style of programming. The language is designed to give you a better way to express the programs you were already writing.
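
Here is a rough sketch of what making a type explicit can look like, using a hypothetical function add_offset that does not appear elsewhere in this book. Parameter and return type annotations of this simple form are accepted by Hack, and also by PHP 7 and later, so the sketch uses a <?php opener to run on a stock PHP interpreter; a Hack file would begin with <?hh instead.

```php
<?php
// Hypothetical function for illustration only. The annotations state
// explicitly what the earlier loop only implied: the value being
// added to is always an integer, and so is the result.
function add_offset(int $i): int {
  return $i + 100;
}

for ($i = 0; $i < 3; $i++) {
  echo add_offset($i), "\n";
}
```

The body is the same expression as in the earlier loop. The annotations do not change how the code is written; they only record what was already true of it.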

Gradual Migration

Hack originated in the shadow of a multimillion-line PHP codebase. There’s no way to convert a codebase of that size from one language to another in one fell swoop, no matter how similar the languages are, so Hack has evolved with very gradual migration paths from PHP. Hack code can use functions and classes written in PHP, and vice versa. For every feature of Hack, there is a seamless way for code that uses it to interact with code that doesn’t use it.

In addition, the standard Hack/HHVM distribution comes with tools to do automated migration of PHP to Hack. It also includes a tool that transpiles Hack into PHP, for use by library authors who want to migrate to Hack while preserving a way for non-HHVM users to use their code. These tools are described in detail in Chapter 10.

HHVM, for its part, is intended to run PHP code identically to the standard PHP interpreter. The first step in migrating a PHP codebase to Hack is to switch to running that PHP code on HHVM. The only significant code changes that should be required in this step are around extensions: not all PHP and Zend extensions are compatible with HHVM. There should be no changes required because of differing behavior in the core language.

Make no mistake, though: despite its origins, Hack is an excellent choice if you’re starting a new project from scratch. In fact, you’ll get the most benefit out of Hack that way: the language is at its best when a codebase is 100% Hack.

How the Book Is Organized

The central feature of Hack is static typechecking. It cuts broadly across all of Hack’s other features, and is the most significant difference between Hack and PHP. The book starts by exploring that topic in detail in Chapter 1. Almost everything else in the book depends on an understanding of the content in that chapter, so if you haven’t seen Hack before, I very strongly recommend reading it thoroughly. That content is supplemented by Chapter 2, which discusses a particularly interesting part of Hack’s type system.

The rest of Hack’s features are mostly orthogonal to each other. Chapter 3 explains several of Hack’s smaller features. Chapter 4 shows the few PHP features that are gone from Hack, and explains why. Chapter 5 explains how and why to use Hack’s collection classes. Chapter 6 explains Hack’s support for multitasking, and Chapter 7 explains Hack’s syntax and library for generating HTML sanely and securely.

Chapter 8 covers the process of setting up, configuring, deploying, and monitoring HHVM. Chapter 9 covers the HHVM interactive debugger, hphpd. And finally, Chapter 10 explores some of the tools for working with Hack code, including a PHP-to-Hack migration tool and a Hack-to-PHP transpiler.

Versions

This book is about Hack and HHVM version 3.9, which was released on August 18, 2015. (HHVM and the Hack typechecker live in the same codebase, and are released as a single package.) By the time you read this, there will already be newer versions available. However, 3.9 is a long-term support release; it will be updated with security and bug fixes for 48 weeks after its release.

HHVM 3.9 implements PHP 5.6 semantics. It supports all of the features new in PHP 5.6—constant scalar expressions, variadic functions, the exponentiation operator, etc. These features are present in Hack 3.9 as well. In general, as new versions of PHP come out, HHVM adds support for the new features and semantics, for Hack code as well as PHP code.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/hack-and-hhvm.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Content Updates

January 15, 2016

  • Updated the book’s reference version of HHVM and Hack to 3.9, released on August 18, 2015.

  • Throughout the book, fixed errors in invocations of array_map() in code examples (the argument order was wrong).

  • In Chapter 1, added documentation of the noreturn return type and the classname type. Updated section on meth_caller() to reflect lifting of restrictions. Rewrote some parts of the section on enforcement of type annotations at runtime to clarify some parts and reflect language changes.

  • In Chapter 3, added documentation of type constants. Updated section on array shapes to reflect language changes. Updated section on silencing typechecker errors to reflect the addition of the HH_IGNORE_ERROR comment.

  • In Chapter 4, noted that Hack does not allow returning from a finally block (which was true before version 3.9), and that PHP 7 has deprecated old-style constructors.

  • In Chapter 5, fixed some minor errors in the collections API reference.

  • In Chapter 6, updated section on async MySQL API to reflect addition of %L placeholders.

Acknowledgments

First and foremost, this book obviously wouldn’t exist without the efforts, spanning many years, of everyone who has worked on HipHop, HHVM, and Hack. This includes both current and former Facebook employees, as well as members of the open source community. There are far too many to name them all here, but all of their contributions helped make Hack and HHVM what they are today.

Not only do these projects represent the product of a huge amount of effort, but they are also the rewards for significant risks. None of these projects were “sure things” when they were started, and all of them have spent a fair bit of time fighting for their own continued existence. The story I know best, from experience, is HHVM’s. For the better part of two years, the HHVM team strove to get HHVM’s performance up to parity with HipHop, knowing that if they didn’t succeed, they would forfeit all of that work. The engineers and managers who drove the projects forward, despite such risks, deserve special recognition; it’s never easy to stake years of one’s own and others’ careers on speculative things like this. Particular thanks are due to the creators: Haiping Zhao, of HipHop; Keith Adams, Jason Evans, and Drew Paroski, of HHVM; and Julien Verlaguet, of Hack.

Now, about this book. I’m grateful to have gotten the chance to write it; I suspect that not a lot of software companies or teams would be thrilled at the idea of letting one of their engineers spend seven months writing prose instead of software. A few individuals deserve credit for helping get this thing off the ground and shepherding it along. In alphabetical order, they are: Alma Chao, Todd Gascon, Joel Marcey, James Pearce, Joel Pobar, and Paul Tarjan.

Big thanks are also due to the Hack and HHVM team members who reviewed this book’s early drafts. In alphabetical order, they are: Fred Emmott, Bill Fumerola, Eugene Letuchy, Alex Malyshev, Joel Marcey, Jez Ng, Jan Oravec, Dwayne Reeves, Julien Verlaguet, and Josh Watzman. This book was immensely improved by their feedback. Any mistakes are mine, not theirs.
