Chapter 4. Good Actor Design

Understanding how to build good actor systems begins at the small scale. The large-scale structures that we build are often the source of our greatest successes or failures in any software project, regardless of whether it uses Akka. But if we don’t first understand the smaller building blocks that go into those larger structures, we run the risk of small mistakes propagating throughout the system in a way that makes the code unmaintainable. This problem isn’t unique to Akka. If you try to write code using a functional language without first understanding the basics of functional programming, you will run into similar problems.

In this chapter, we talk about how to avoid such issues and apply good principles and practices to building your actors and your system.

Starting Small

As just mentioned, understanding how to build good actor systems must begin at the small scale. We can talk about how to build and connect large-scale actor constructs together, but if we lack a good design at the small scale, we can still easily end up with bottlenecks, blocking operations, and spaghetti code that can cause the application to fail.

The most common mistakes people make when working with Akka happen in the small scale rather than the large end of the spectrum. Even though these problems tend to be smaller in scope than architectural problems, when compounded together they can become significant. Whether it is accidentally exposing mutable state, or closing over that mutable state in a future, it is easy to make mistakes that can render your actor system incomprehensible. You need to remember that although actors address some of the issues you encounter when building concurrent systems, they work only when you use them correctly. If you use them incorrectly, they can create just as many concurrency bugs as any other system. And even though actors can help to address some problems when building concurrent systems, they don’t solve every problem. So just as with any concurrent system, you need to watch out for bottlenecks, race conditions, and other problems that might come up.

Think about our sample project management system. In that application, we have separated out bounded contexts like the scheduling service, the people management service, and the project management service. But when we deploy the system we discover something unexpected. The user management service, which was supposed to be a very simple data management service, is not living up to expectations. We are losing data unexpectedly. Digging deeper, we can see that the system was built using a mix of different concurrency paradigms. There are actors and futures mixed together without consideration for what that means. As a result, there are race conditions. We are closing over mutable state inside of multiple threads. We have a series of concurrency problems that could have been avoided.

In the meantime, looking deeper it’s evident that our scheduling service isn’t handling the load. We expected to have to scale it, due to the fairly complex logic it involves, but now it appears that it is running so poorly that it is jeopardizing the project. We can mitigate the problem by scaling, but even then it might prove to be too slow. Investigating further, it’s clear that the application is only capable of handling a small number of concurrent projects. It turns out we have introduced blocking operations that are consuming all the threads, and this in turn is crippling the application.

There are patterns and principles in Akka that you can use to help prevent problems like these. The key is knowing when and where to apply those patterns. When used incorrectly, the patterns can create just as many problems as they solve. But if you apply them correctly, they will give you a strong foundation upon which to build the rest of your system.

Encapsulating State in Actors

One of the key features of actors is their ability to manage state in a thread-safe manner. This is an important feature to understand, but you also need to be aware that there is more than one way to manage that state, and each option might have its uses, depending on the situation. It is also valuable to know those patterns so that if you look into someone else’s code you can recognize the pattern and understand the motivation behind it.

Encapsulating State by Using Fields

Perhaps the simplest option for managing state is including the state in mutable private fields inside the actor. This is relatively straightforward to implement. It will be familiar to you if you are used to a accustomed imperative style of programming. It is also one of the more common ways mutable state is represented when looking at examples people provide in books, courses, or online.

In our project scheduling domain, we have the concept of a person. A person consists of many different data fields that might be altered. This includes data like first name, last name, business roles, and so on. You can update these fields individually or as a group. A basic implementation of this might look like the following:

object Person {
  case class SetFirstName(name: String)
  case class SetLastName(name: String)
  case class AddRole(role: Role)
}

class Person extends Actor {
  import Person._
  private var firstName: Option[String] = None
  private var lastName: Option[String] = None
  private var roles: Set[Role] = Set.empty

  override def receive:Receive = {
    case SetFirstName(name) => firstName = Some(name)
    case SetLastName(name) => lastName = Some(name)
    case AddRole(role) => roles = roles + role
  }
}

The basic format will be familiar to those accustomed to object-oriented programming and writing getters and setters in languages like Java. If you’re more functional-minded, the presence of mutable state might make you cringe a little, but within the context of an actor, mutable state is safe because it is protected by the Actor Model. If you dig into the standard Scala libraries, you will find that mutable state is actually quite common in what are otherwise pure functions. The mutable state in that case is isolated to within the function and is therefore not shared among different threads. Here, we have isolated the mutable state to within an actor, and due to the single-threaded illusion, that state is protected from concurrent access.

This approach is a good one to use when you are trying to model something simple. If only a few mutable fields or values are involved and the logic that mutates those fields is minimal, it might be a good candidate for this type of approach.

The problem with this approach is that as the complexity increases, it begins to become unmanageable. In our example of an actor representing a person, what happens when it expands and you need to keep state for various pieces of additional information like the address, phone number, and so on? You are adding more fields to the actor, all of which are mutable. They are still safe, but over time as this grows, it could begin to look ugly. Consider as well what happens if the logic around those things is nontrivial—for example, if you begin having to include validation logic for when you set a person’s address. Your actor is becoming bloated. The more you add, the worse it will become, as illustrated here:

class Person extends Actor {
  private var firstName: Option[String] = None
  private var lastName: Option[String] = None
  private var address: Option[String] = None
  private var phoneNumber: Option[String] = None
  private var roles: Set[Role] = Set.empty

  override def receive:Receive = {
    case SetFirstName(name) =>
      firstName = Some(name)
    case SetLastName(name) =>
      lastName = Some(name)
    case SetAddress(address) =>
      validateAddress(address)
      address = Some(address)
    case SetPhoneNumber(phoneNumber) =>
      validatePhoneNumber(phoneNumber)
      phoneNumber = Some(phoneNumber)
    case AddRole(role) => roles = roles + role
  }
}

Another issue with this is that actors are inherently more difficult to test than other constructs in Scala because they are concurrent, and this concurrency introduces complexity into your tests. As the complexity of the code increases, so too does the complexity of the tests. Soon, you might find yourself in a situation in which your test code is becoming difficult to understand and tough to maintain, and that poses a problem.

One way to solve this problem is to break the logic out of the actor itself. You can create traits that can be stacked onto the actor. These traits can contain all the logic to mutate the values as appropriate. The actor then becomes little more than a concurrency mechanism. All of the logic has been extracted to helper traits that can be written and tested as pure functions. Within the actor, you then need worry only about the mutability and concurrency aspects. Take a look:

trait PhoneNumberValidation {
    def validatePhoneNumber(phoneNumber: Option[String]) = {
      ...
    }
}

trait AddressValidation {
    def validateAddress(address: Option[String]) = {
      ...
    }
}

class Person extends Actor with PhoneNumberValidation with AddressValidation {
  ...
}

This technique makes it possible for you to simplify the actor. It also reduces the number of things you need to test concurrently. However, it does nothing to reduce the number of fields that are being mutated or the tests that need to happen around the mutation of those fields. Still, it is a valuable tool to have in your toolbox to keep your actors from growing too large and unmanageable.

So how can you reduce some of that mutable state? How can you take an actor that has many mutable parts and reduce it to a smaller number? There are a few options. One is to reevaluate the structure. Maybe you didn’t actually want a single actor for all of the fields. In some cases, it might be beneficial to separate this into multiple actors, each managing a single field or group of fields. Whether you should do this depends largely on the domain. This is less of a technical question and more of a domain-driven design (DDD) question. You need to look at the domain and determine whether the fields in question are typically mutated as a group or as a single field. Are they logically part of a single entity in the domain or do they exist as separate entities? This will help to decide whether to separate additional actors.

But what if you determine that they are logically part of the same entity and therefore the same actor? Is it possible to reduce some of this mutable state within a single actor?

Encapsulating State by Using “State” Containers

In the previous example, you can extract some of the logic from Account into a separate, nonactor-based PersonalInformation class. You could extract this into a class or set of classes that are much more type safe, easier to test, and a good deal more functional. This leaves you with just a single var to store the entire state rather than having multiple ones, as shown here:

object Person {
  case class PersonalInformation(
    firstName: Option[FirstName] = None,
    lastName: Option[LastName] = None,
    address: Option[Address] = None,
    phoneNumber: Option[PhoneNumber] = None,
    roles: Set[Role] = Set.empty
  )
}

class Person extends Actor {
  private var personalInformation = PersonalInformation()

  override def receive: Receive = ...
}

This is an improvement. You now have a PersonalInformation class that is much easier to test. You can put any validation logic that you want into that state object and call methods on it as necessary. You can write the logic in an immutable fashion, and you can test the logic as pure functions. In fact, if later you decide that you don’t want to use an actor, you could eliminate the actor altogether without having to change much (or anything) about PersonalInformation. You could even choose not to declare PersonalInformation within the companion object for the actor. You could opt instead to move that logic to another location in the same package or move it to a different package or module.

This is a nice technique to use when the logic of your model becomes complicated. It provides the means for you to extract your domain logic completely so that it is encapsulated in the PersonalInformation object and your actor becomes pure infrastructure. This reduces everything to a single var that is still protected by the single-threaded illusion. But what if you want to eliminate that var completely? Can you do that?

Encapsulating State by Using become

Another approach to maintaining this state is by using the become feature to store that state. This particular technique also falls more in line with the idea discussed earlier that behavior and state changes are in fact one and the same thing. Using this technique, you can eliminate the var completely without sacrificing much in the way of performance. Let’s take a look at the code:

object Person {
  case class PersonalInformation(
    firstName: Option[FirstName] = None,
    lastName: Option[LastName] = None,
    address: Option[Address] = None,
    phoneNumber: Option[PhoneNumber] = None,
    roles: Set[Role] = Set.empty
  )
}

class Person extends Actor {

  override def receive: Receive = updated(PersonalInformation())

  private def updated(personalInformation: PersonalInformation):Receive = {
    case SetFirstName(firstName: FirstName) =>
      context.become(updated(personalInformation
        .copy(firstName = Some(firstName))))
    ...
  }
}

This completely eliminates any need for a var. Now, rather than altering the state by manipulating that var, you are instead altering it by changing the behavior. This changes the behavior so that the next message will have a different PersonalInformation than the previous one. This is nice because it has a slightly more “functional” feel to it (no more vars), but it also maps better to our understanding of the Actor Model. Remember, in the Actor Model, state and behavior are the same thing. Here, we can see how that is reflected in code.

Be wary though. Although this technique has its uses and can be quite valuable, particularly when building finite-state machines, the code is more complex and takes more work to understand. It isn’t as straightforward as manipulating a var. It might be more complex to follow what is happening in this code if you need to debug it, especially if there are other behavior transitions involved.

So, when should you use this approach rather than using a var? This technique is best saved for cases for which you have multiple behaviors, and more specifically, when those behaviors can contain different state objects. Let’s modify the example slightly. In the example, most of the values are set as Options. This is because when you create your Person, those values might not yet be initialized, but they might be initialized later. So, you need a way to capture that fact. But what if, on analyzing the domain, you realize that a person was always created with all of that information present? How could you take advantage of your actor to capture that fact? Here’s one way:

object Person {
   case class PersonalInformation(firstName: FirstName,
                                  lastName: LastName,
                                  address: Address,
                                  phoneNumber: PhoneNumber,
                                  roles: Set[Role] = Set.empty
                                 )

   case class Create(personalInformation: PersonalInformation)
 }

class Person extends Actor {

   override def receive: Receive = {
     case Create(personalInformation) =>
       context.become(created(personalInformation))
   }

   private def created(personalInformation: PersonalInformation): Receive = {
       ...
   }
}

In this example, there are two new states. There is an initial state, and a created state. In the initial state, the personal information has not been provided. The actor therefore accepts only one command (Create). After you transition to the created state, you can go back to handling the other messages, which allows you to set individual fields. This eliminates the need for optional values because your actor can exist in only one of two states: it is either unpopulated or fully populated. In this case, the state object is valid only in specific states; in this example, it’s the “created” state. You could use a var and set it as an option to capture this fact, but then you would need to constantly check for the value of that option to ensure that it is what you expect. By capturing the state by using become you ensure that you handle only message types that are valid in the current state. This also ensures that the state is always what you expect without requiring additional checks.

When there are other multiple behavior transitions for which each transition might have different state, it is often better to capture that state by using become because in this case it simplifies the logic rather than making it more complex. It reduces the cognitive overhead.

Using combinations of become, mutable fields, and functional state objects, you can provide rich actors that accurately capture the application’s domain. Immutable state objects make it possible for you to create rich domain constructs that can be fully tested independently of your actors. You can use become to ensure that your actors can exist only in a set of valid states. You don’t need to deal with creating default values for things that are of no concern. And when the actors are simple enough, you can code them by using more familiar constructs like mutable fields with immutable collections.

Mixing Futures with Actors

Futures are an effective way to introduce concurrency into a system. But when you combine them with Akka, you need to be careful. Futures provide one model of concurrency, whereas Akka provides a different model. Each model has certain design principles that it adheres to, and those principles don’t always work well together. You should always try to avoid mixing concurrency models within a single context, but sometimes you might find yourself in a situation in which it is required. For those times, you need to be careful to follow the design patterns that allow you to use futures with actors safely.

The heart of the problem is that you are combining two very different models of concurrency. Futures treat concurrency quite differently from actors. They don’t respect the single-threaded illusion that actors provide. It therefore becomes easy to break that illusion when using futures with actors. Let’s look at some very simple ways by which we can break the single-threaded illusion, beginning with this example:

trait AvailabilityCalendarRepository {
  def find(resourceId: Resource): Future[AvailabilityCalendar]
}

This is a very simple use of a future. In this case, we have chosen a future to account for the fact that the repository can access a database, and that might take time and can fail. This is an appropriate use of a future and by itself it doesn’t create any issues.

The problem in this case arises when you try to use the repository in the context of an actor. Suppose that in order to bring the usage of this repository into your actor system, you decide to create an Actor wrapper around it. Our initial naive implementation might look like the following:

object AvailabilityCalendarWrapper {
  case class Find(resourceId: Resource)
}

class AvailabilityCalendarWrapper(calendarRepository:
   AvailabilityCalendarRepository) extends Actor {
  import AvailabilityCalendarWrapper._

  override def receive: Receive = {
    case Find(resourceId) =>
      calendarRepository
        .find(resourceId)
        .foreach(result => sender() ! result)
        // WRONG! The sender may have changed!
  }
}

This seems fairly simple. The code receives a Find message, extracts the ResourceId, and makes a call to the repository. That result is then returned to the sender. The problem here is that it breaks the single-threaded illusion. The foreach in this code is operating on a future. This means that when you run this code, you are potentially working within a different thread. This could mean that the actor has moved on and is processing another message by the time that foreach runs. And that means the sender might have changed. It might not be the actor that you were expecting.

There are multiple ways by which you can fix this problem. You could create a temporary value to hold the correct sender reference, such as in the following:

case Find(resourceId) =>
  val replyTo = sender()
  calendarRepository
    .find(resourceId)
    .foreach(result => replyTo ! result)

This theoretically fixes the issue. ReplyTo will be set to the value of sender() at the point at which you are interested in it and it won’t change after that. But this isn’t ideal for a number of reasons. What if you weren’t just accessing the sender? What if there were other mutable states that you need to manipulate? How would you deal with that? But there is a yet more fundamental problem with the preceding code. Without looking at the signature of the repository, how can you verify that you are working with a future? That foreach could just as easily be operating on an Option or a collection. And although you can switch to using a for comprehension or change to using the onComplete callback, it still doesn’t really highlight the fact that this operation has become multithreaded.

A better solution is to use the Pipe pattern in Akka. With the Pipe pattern you can take a future and “pipe” the result of that future to another actor. The result is that the actor will receive the result of that future at some point or a Status.Failure if the future fails to complete. If you alter the previous code to use the Pipe pattern, it could look like the following:

import akka.pattern.pipe

object AvailabilityCalendarWrapper {
  case class Find(resourceId: Resource)
}

class AvailabilityCalendarWrapper(calendarRepository:
   AvailabilityCalendarRepository) extends Actor {
  import AvailabilityCalendarWrapper._

  override def receive: Receive = {
    case Find(resourceId) => calendarRepository.find(resourceId).pipeTo(sender())
  }
}

There are multiple advantages to using the Pipe pattern. First, this pattern allows you to maintain the single-threaded illusion. Because the result of the future is sent back to the actor as just another message, you know that when you receive it, you won’t need any concurrent access to the state of the actor. In this example, the sender is resolved when you call the pipeTo, but the message isn’t sent until the future completes. This means that the sender will remain correct even though the actor might move on to process other messages.

The other benefit of this pattern is that it is explicit. There is no question when you look at this code as to whether concurrency is happening. It’s evident just by looking at the pipeTo that this operation will complete in the future rather than immediately. That is the very definition of the Pipe pattern. You don’t need to click through to the signature of the repository to know that there is a future involved.

But let’s take it a step further. What if you didn’t want to immediately send the result to the sender? What if you want to first perform some other operations on it, such as modifying the data so that it adheres to a different format? How does this affect this example? Again, let’s look at the naive approach first:

import akka.pattern.pipe

object AvailabilityCalendarWrapper {
  case class Find(resourceId: Resource)
  case class ModifiedCalendar(...)
}

class AvailabilityCalendarWrapper(calendarRepository:
   AvailabilityCalendarRepository) extends Actor {
  import AvailabilityCalendarWrapper._

  private def modifyResults(calendar: AvailabilityCalendar):
     ModifiedCalendar = {
    // Perform various modifications on the calendar
    ModifiedCalendar(...)
  }

  override def receive: Receive = {
    case Find(resourceId) =>
      calendarRepository
        .find(resourceId)
        .map(result => modifyResults)
        .pipeTo(sender())
  }
}

As before, at first glance this looks OK. The code is still using the Pipe pattern so the sender is safe. But we have introduced a .map on the future. Within this .map we are calling a function. The problem now is that, again, .map is operating within a separate thread. This too creates an opportunity to break the single-threaded illusion. Initially the code might be entirely safe. Later, however, someone might decide to access the sender within the modifyResults, or you might decide to access some other mutable state in the calendar. Because it is inside of a function in the AvailabilityCalendarWrapper, you might assume that it is safe to access that mutable state. It is not obvious that this function is actually being called within a future. It is far too easy to accidentally make a change to this code without realizing that you were operating in a separate execution context. So how do you solve that? Here’s one way:

import akka.pattern.pipe

object AvailabilityCalendarWrapper {
  case class Find(resourceId: Resource)
  case class ModifiedCalendar(...)
}

class AvailabilityCalendarWrapper(calendarRepository:
   AvailabilityCalendarRepository) extends Actor {
  import AvailabilityCalendarWrapper._

  override def receive: Receive = {
    case Find(resourceId) =>
       calendarRepository.find(resourceId).pipeTo(self)(sender())
    case AvailabilityCalendar(...) =>
      // Perform various modifications on the calendar
      sender() ! ModifiedCalendar(...)
  }
}

This example eliminates the .map in this case and brings back the pipeTo immediately after the future resolves. This time, though, we pipe back to self and include the sender as a secondary argument in the pipeTo. This allows you to maintain the link to the original sender. Again, this is more explicit about what portions of the operation are happening in the future and what portions are happening in the present. There is no longer a possibility of breaking the single-threaded illusion.

Sometimes, you might find yourself in a situation in which the future returns something too generic, like an Integer or a String. In this case, simply using the Pipe pattern by itself can make the code confusing. Your actor is going to need to receive a simple type that is not very descriptive. It would be better if you could enrich the type in some way, perhaps wrapping it in a message. In this case, it is appropriate to use a .map on the future for the purpose of converting the message to the appropriate wrapper, as long as you keep that operation as simple and as isolated as possible. For example:

ourFuture.map(result => Wrapper(result)).pipeTo(self)

The only operation we are making within the .map is to construct the wrapper type. Nothing more. This type of .map on a future is considered to be low risk enough that you can use it within an actor. The truth is that even this code introduces some risk. Where is Wrapper defined? Does it access mutable state somewhere during its construction? It is still possible to make a mistake with this code, but as long as you are following best practices, it will be highly unlikely. For that reason this code is generally considered acceptable.

You need to keep in mind that within an actor there are many things that might be considered mutable state. The “sender” is certainly one form of mutable state. A mutable var or a mutable collection is also mutable state. The actor’s context object also contains mutable state. Context.become, for example, is making use of mutable state. For that matter an immutable val that is declared in the parameters of a receive method could be mutable state because the receive method can change. Even access to a database that supports locking still constitutes mutable state. Although it might be “safe,” it is in fact mutating some state of the system.

In general, you should favor pure functions within actors wherever possible. These pure functions will never read or write any of the state of the actor, instead relying on the parameters passed in, and returning any modified values from the function. These can then be safely used to modify the mutable state within the actor. Pure functions can also be extracted out of the actor itself, which can ensure that you don’t access any mutable state in the actor itself. This makes it possible for you to use those functions in a thread-safe manner within the context of an actor.

When you do find yourself using impure functions within an actor, you need to ensure that you do so within the context of the single-threaded illusion. To do so, you should always prefer the Pipe pattern when working with futures. Sometimes, it can be beneficial to create a wrapper like you did for the repository. This wrapper isolates the future so that you only worry about it in one place and the rest of the system can ignore it. But even better is when you can avoid the future completely and use other techniques in its place.

Ask Pattern and Alternatives

The Ask pattern is a common pattern in Akka. Its usage is very simple and it serves very specific use cases. Let’s quickly review how it works.

Suppose that you want to make use of the “wrapper” actor that was introduced earlier. You want to send the “Find” message, and then you need the result of that. To do this using the Ask pattern, you might have something like the following:

import akka.pattern.ask
import akka.actor.Timeout
import scala.concurrent.duration._

implicit val timeout = Timeout(5.seconds)
val resultFuture = (availabilityCalendar ?
   AvailabilityCalendarWrapper.find(resourceId)).mapTo[AvailabilityCalendar]

In this example, you are using the Ask pattern to obtain a Future[Any]. It then uses the mapTo function to convert that future into a Future[AvailabilityCalendar].

This is a very helpful pattern when you have an operation for which you need to guarantee success within a certain amount of time. In the event of a failure, you will receive a failed future. In the event that the operation takes too long or doesn’t complete, you will also receive a failed future. This can be very useful for situations in which a user is waiting on the other end. The Ask pattern therefore can be quite common when building web apps or REST APIs or other applications that have an agreed-upon time limit.

Problems with Ask

You need to be careful with this approach, though. On its own, there is nothing wrong with it, and in many cases it represents exactly what you want. The problems arise when you begin to overuse this pattern. The Ask pattern has a lot of familiarity. It bears similarity to the way functional programming works. You call a function with some set of parameters, and that function returns a value. That’s what we are doing here, but instead of a function, we are talking to an actor. But where the functional model emphasizes a request/response pattern, the Actor Model usually works better with a “Tell, Don’t Ask” approach.

Let’s explore an example that shows how overusing the Ask pattern can break down. For this example, we will leave our domain behind so that we can focus on just the problem.

Suppose that you have a series of actors. Each actor has the job of performing some small task and then passing the result to the next actor in a pipeline. At the end, you want some original actor to receive the results of the processing. Let’s also assume that the entire operation needs to happen within a 5-second window. Let’s take a look at the code:

import akka.pattern.{ask, pipe}

case class ProcessData(data: String)
case class DataProcessed(data: String)

class Stage1() extends Actor {

  val nextStage = context.actorOf(Props(new Stage2()))

  override def receive: Receive = {
    case ProcessData(data) =>
      val processedData = processData(data)
      implicit val timeout = Timeout(5.seconds)
      (nextStage ? ProcessData(processedData)).pipeTo(sender())
  }
}

class Stage2 extends Actor {

  val nextStage = context.actorOf(Props(new Stage3()))

  override def receive: Receive = {
    case ProcessData(data) =>
      val processedData = processData(data)
      implicit val timeout = Timeout(5.seconds)
      (nextStage ? ProcessData(processedData)).pipeTo(sender())
  }
}

class Stage3 extends Actor {

  override def receive: Receive = {
    case ProcessData(data) =>
      val processedData = processData(data)
      sender ! DataProcessed(processedData)
  }
}

This code will certainly work and get the job done. But there a few oddities present in the solution. The first issue is the timeouts. Each stage, except for the last, has a 5-second timeout. But here is the problem. The first phase uses 5 seconds. But presumably the second phase will consume some amount of that 5 seconds. Maybe it consumes 1 second. Therefore, when you go to send a message to the next phase, you don’t really want a 5-second timeout anymore; you want a 4-second timeout. Because if the third phase takes 4.5 seconds, the second phase would succeed but the entire process would still fail because the first timeout of 5 seconds would be exceeded. The problem here is that the first timeout, the original 5 seconds, is important to the application. It is critical that this operation succeeds within the 5-second window. But all the other timeouts after that are irrelevant. Whether they take 10 seconds or 2 seconds doesn’t matter, it’s only that single 5-second timeout that has value. Because we have overused the Ask pattern here, we are forced to introduce timeouts at each stage of the process. These arbitrary timeouts create confusion in the code. It would be better if we could modify the solution so that only the first timeout is necessary.

The key here is that every time you introduce a timeout into the system, you need to think about whether that timeout is necessary. Does it have meaning? If the timeout fails, is there some action that can be taken to correct the problem? Do you need to inform someone in the event of a failure? If you answer “yes” to these questions, the Ask pattern might be the right solution. But, if there is already a timeout at another level to handle this case, perhaps you should try to avoid the Ask pattern and instead allow the existing mechanism to handle the failure for you.

Accidental Complexity

There is another problem with the code in the previous example. There is a complexity involved in using the Ask pattern. Behind the scenes, it is creating a temporary actor that is responsible for waiting for the results. This temporary actor is cheap, but not free. We end up with something like that shown in Figure 4-1.

Overusing Ask
Figure 4-1. Overusing Ask

In the figure, you see the details happening inside the seemingly simple Ask.

We have created a more complex message flow than we really need. We have introduced at least two temporary actors that must be maintained by the system. Granted, that happens behind the scenes without our intervention, but it still consumes resources. And if you look at the diagram and try to reason about it, you can see that there is additional complexity here that you don’t really want. In reality, what you would prefer is to bypass the middle actors and send the response directly to the sender. What you really want is to Tell, not Ask.

Alternatives to Ask

There are multiple ways that you could fix this pipeline. One option, given the trivial nature of the example, would be to use the forward method. By forwarding the message rather than using the Tell operator, you ensure that the final stage in the pipeline has a reference to the original sender. This eliminates the need for the timeouts and it would eliminate the temporary actors acting as middlemen, as demonstrated here:

class Stage1() extends Actor {

  val nextStage = context.actorOf(Props(new Stage2()))

  override def receive: Receive = {
    case ProcessData(data) =>
      val processedData = processData(data)
      nextStage.forward(ProcessData(processedData))
  }
}

Another approach to eliminating the Ask would be to pass a replyTo actor along as part of the messages. You could include a reference to the actor that is expecting the response. Then, at any stage of the pipeline you would have access to that actor to send it the results or perhaps to back out of the pipeline early if necessary:

case class ProcessData(data: String, replyTo: ActorRef)
case class DataProcessed(data: String)

class Stage1() extends Actor {

  val nextStage = context.actorOf(Props(new Stage2()))

  override def receive: Receive = {
    case ProcessData(data, replyTo) =>
      val processedData = processData(data)
      nextStage ! ProcessData(processedData, replyTo)
  }
}

class Stage2 extends Actor {

  val nextStage = context.actorOf(Props(new Stage3()))

  override def receive: Receive = {
    case ProcessData(data, replyTo) =>
      val processedData = processData(data)
      nextStage ! ProcessData(processedData, replyTo)
  }
}

class Stage3 extends Actor {

  override def receive: Receive = {
    case ProcessData(data, replyTo) =>
      val processedData = processData(data)
      replyTo ! DataProcessed(processedData)
  }
}

A final approach would be to pass a Promise as part of the message. Here, you would send the Promise through the pipeline so that the final stage of the pipe could simply complete the Promise. The top level of the chain would then have access to the future that Promise is completing, and you could resolve the future and deal with it appropriately (using pipeTo). The following example shows how to do this:

case class ProcessData(data: String, response: Promise[String])

class Stage1() extends Actor {

  val nextStage = context.actorOf(Props(new Stage2()))

  override def receive: Receive = {
    case ProcessData(data, response) =>
      val processedData = processData(data)
      implicit val timeout = Timeout(5.seconds)
      nextStage ! ProcessData(processedData, response)
  }
}

class Stage2 extends Actor {

  val nextStage = context.actorOf(Props(new Stage3()))

  override def receive: Receive = {
    case ProcessData(data, response) =>
      val processedData = processData(data)
      implicit val timeout = Timeout(5.seconds)
      nextStage ! ProcessData(processedData, response)
  }
}

class Stage3 extends Actor {

  override def receive: Receive = {
    case ProcessData(data, response) =>
      val processedData = processData(data)
      response.complete(Success(processedData))
  }
}

Each of these approaches is valid, and depending on the situation, one might fit better than the other. The point is to realize that you should use Ask only in very specific circumstances. Ask is designed to be used for situations in which either you need to communicate with an actor from outside the system, or you need to use a request/response–style approach and a timeout is desirable. If you find you need to introduce a timeout into a situation that doesn’t warrant one, you are probably using the wrong approach.

Commands Versus Events

Messages to and from actors can be broken down into two broad categories: commands and events.

A command is a request for something to happen in the future, which might or might not happen; for instance, an actor might choose to reject or ignore a command if it violates some business rule.

An event is something that records an action that has already taken place. It’s in the past, and can’t be changed—other actors or elements of the system can react to it or not, but they cannot modify it.

It is often helpful to break down the actor’s protocol—the set of classes and objects that represents the messages this actor understands and emits—into commands and events so that it is clear what is inbound and what is outbound from this actor. Keep in mind, of course, that an actor could consume both commands and events, and emit both as well.

You can often think of an actor’s message protocol as its API. You are defining the inputs and outputs to the actor. The inputs are usually defined as commands, whereas the outputs are usually defined as events. Let’s look at a quick example:

object ProjectScheduler {
  case class ScheduleProject(project: Project)
  case class ProjectScheduled(project: Project)
}

Here, we have defined a very simple message protocol for the ProjectScheduler actor. ScheduleProject is a command. We are instructing the ProjectScheduler to perform an operation. That operation is not yet complete and therefore could fail. On the other hand, ProjectScheduled is an event. It represents something that happened in the past. It can’t fail because it has already happened. ScheduleProject is the input to the actor and ProjectScheduled is the output. If we weren’t using actors and we wanted to represent this functionally, it could look something like this:

class ProjectScheduler {
  def execute(command: ScheduleProject): ProjectScheduled = ...
}

Understanding the difference between commands and events in a system is important. It can help you to eliminate dependencies. As a small example, consider the following frequently asked question: why is the ProjectScheduled part of the ProjectScheduler’s message protocol rather than being part of the protocol for the actor that receives it? The simple answer is that it is an event rather than a command. The more complete answer, though, is that it avoids a bidirectional dependency. If we moved it to the message protocol of the actor that receives it, the sender would need to know about the ProjectScheduler’s message protocol, and the ProjectScheduler would need to know about the sender’s message protocol. By keeping the commands and the resulting events in the same message protocol, the ProjectScheduler doesn’t need to know anything about the sender. This helps to keep your application decoupled.

A common practice, as just illustrated, is to put the message protocol for the actor into its companion object. Sometimes, it is desirable to share a message protocol among multiple actors. In this case, a protocol object can be created to contain the messages instead, as shown here:

object ProjectSchedulerProtocol {
  case class ScheduleProject(project: Project)
  case class ProjectScheduled(project: Project)
}

Constructor Dependency Injection

When you create Props for your actor, you can include an explicit reference to another actor. This is simple to do, and is often a good approach.

Here is an example of direct injection of another ActorRef when an actor is created:

  val peopleActor: ActorRef = ...
  val projectsActor: ActorRef = system.actorOf(ProjectsActor.props(peopleActor))

However, if your actors have a number of references, or the order of instantiation means that the necessary reference isn’t available until after the receiving actor is already constructed, you must look for other means.

actorSelection via Path

One option is to refer to the destination actor via its path, as opposed to its reference. This requires that you know the path, of course, possibly accessing it via a static value in the destination actor’s companion object.

A path can be resolved only to an actorSelection, however, not an actual actor reference. If you want access to the actor reference, you must send the identify message, and have the reply give us the reference. Or, you simply send your messages to the actor selection directly.

Actor selections have their own problems, though. An actor selection is unverified until you send the Identify message. This means that there might or might not be an actor on the other end. This in turn means that when you send a message to a selection, there is no guarantee that there will be an actor on the other end to receive it. In addition, a selection can include wildcards. This means that there might actually be more than one actor on the other end, and when you send a message you are in fact broadcasting it to all actors. This might not be what you want.

Actor selections, when not used carefully, can also lead to increased dependencies. Often, if you are using an actor selection, it implies that you have not carefully thought about your actor hierarchy. You are falling back to an actor selection because you need to reference an actor that is part of a different hierarchy. This creates coupling between different areas of your actor system and makes them more difficult to break apart or distribute later. You can think of this as being similar to littering your code with singleton objects, which is usually considered a bad practice.

Actor reference as a message

A better way to go for more complex dependencies between actors is to send the actor reference as a message to the actor that needs it—for example, you effectively “introduce yourself” to the actor that needs to send a message to you by encapsulating an actor reference in a specific type (so that it can be differentiated by the receive method of the destination actor), and then holding the passed value in the destination actor for later use.

This technique is essentially what we introduced when we used a replyTo actor in the message protocol when we were discussing the Ask pattern. The replyTo is a way to introduce one actor to another so that the first actor knows where to send messages.

This method demands a little more thought, requiring you to design your hierarchy in such a way that the necessary actors can talk to one another. This additional care is good because it means that you are thinking about the overall structure of the application, rather than just worrying about the individual actors. But you need to be careful here, as well. If you find that you’re spending too much effort in passing reference to actors around through messages, it might still mean that you have a poorly defined hierarchy.

Conclusion

In summary, by keeping in mind a few basic principles and techniques, you can create actors that retain cohesion, avoid unnecessary blocking and complexity, and allow the data flow within your actor system to be as smooth and concurrent as possible.

Now that you have seen how individual actors can best be structured, we will raise our level of abstraction to consider the flow of data through the entire system, and how correct actor interactions support that flow.

Get Applied Akka Patterns now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.