Chapter 4. Moving Legacy Apps to PaaS

Everyone has skeletons in their closets. Skeletons come in the form of technical debt: the applications written in COBOL, the spaghetti code we always meant to clean up, the undocumented application a coworker wrote before leaving, and all those applications that have been around for 10 years and that nobody understands anymore. These are commonly called legacy applications.

Developing apps in the cloud requires a new worldview and often adds a few different programming paradigms. While legacy applications can be adapted to work within PaaS, there is a common set of challenges to make them run well.

Fortunately, the changes needed to run your legacy apps in PaaS are also known as best practices today. These are the same changes you would need to make to your app if you just wanted to make it more reliable and robust, whether or not you ran it on a PaaS.

Initial Considerations

When you’re developing applications for the cloud, there are certain assumptions you need to be aware of. This is true whether you’re developing on IaaS or PaaS.

If you’re deploying legacy apps to IaaS, the main benefit is the ability to provision virtual machines quickly. The downside is that those virtual machines are usually ephemeral: they can go down at any time, and they are not inherently reliable or robust. This forces you to build redundancy into your application, running it across many different servers.

This way of thinking is different from that in shared or dedicated hosting environments, where you assume that servers do not typically go down. In non-virtualized environments, one can often run an application with all of its services on a single server. Larger applications may put the database services on a separate machine. Frequently, there is no need to deal with replication issues, data consistency issues, or session management issues. When users upload code to these systems, the implicit assumption is that it’s only on one server. This doesn’t hold true when you have to build systems across many different servers.

Sidestepping Potential Problems

When working with IaaS, one has to assume that any server could go down at any minute, as they often do. PaaS offerings, which are typically built on IaaS, have usually thought out some of these potential problems for you. Building out N-tier application logic can be hard, and that is why PaaS can be so valuable; N-tier is built into PaaS from the start.

With PaaS, you don’t have to figure out how to configure Apache on 20 different servers; you don’t have to figure out how to load balance across those servers; you don’t have to worry about MySQL slaves and master-master replication. Nor do you need to be concerned about redundancy, heartbeats, and failovers, as you would with IaaS.

However, with PaaS, you still have to write your code with the assumption that any individual instance of your app may go down at any moment. That forces you to think about building applications in a way that isn’t consistent with some of the traditional methods of web development, especially for legacy applications. You will also need to take into consideration frameworks and content management systems (WordPress, Drupal, Joomla!) that haven’t yet gotten up to speed with this new way of thinking and haven’t yet incorporated ways to manage multi-instance applications.

We will cover specific examples of how to address these problems later in this chapter.

You have to make those changes yourself, and you have to start thinking not only about how your application will act if a few or even many of your instances go down, but also about how your application will act if the filesystem is not consistent across different instances.

Common Questions to Ask Yourself

Here are some common questions you will need to think about for legacy applications as related to the hard drive:

  • If you can’t rely on the hard drive being persistent or consistent, what does that mean for your application?

  • If you’re uploading content, where does it go? Where does it stay?

  • What other assumptions does your application make about the hard drive?

  • Where are your sessions kept? Many times, session state is held on the hard drive, and if you’re running tens, dozens, or hundreds of instances, what does that mean from a user perspective?

  • Will users stay logged in as they move from page to page, or will they be logged out whenever a different instance serves their request?

Even More Legacy Code Issues

Another consideration when moving to PaaS is long-running processes. In traditional development methodology, there has rarely been a problem with long-running processes operating sequentially. Your application typically will just take longer to load, using a large amount of processing power up front, but this can be hidden behind the scenes with Ajax or other frontend tricks. However, when you utilize PaaS, you have to think about long-running processes differently. In PaaS, long-running processes should run independently of and asynchronously from your main application logic. In fact, some PaaS providers enforce this by killing your main application if it runs for too long. Not all legacy applications were created to work that way.

These are just a few of the ways in which traditional development paradigms have evolved and how cloud paradigms are much different than the original ones. Now let’s put those ideas into specific contexts.

Overview

What follows in this chapter is a set of the most common things you need to think about to transition legacy applications into PaaS. They are broken out into the following sections:

Asset Hosting

How do you deal with uploaded content (images/videos/music)?

Session Management

How do you deal with session data?

Caching

How do you incorporate modern caching techniques?

Asynchronous Processing

How do you handle long-running processes?

SQL

What considerations are there for SQL in PaaS?

NoSQL

How can you take advantage of modern trends of NoSQL?

Miscellaneous Gotchas

Asset Hosting

When users upload their content, whether it’s a profile image, a movie, an attachment, or any kind of file, it is considered an asset. Traditionally, it hasn’t been a problem to host those assets locally to the application (on the hard drive), since if the server crashes, you can restart it and your files will remain persisted. In modern cloud hosting, this assumption is no longer true. If a server crashes, depending on your IaaS or PaaS provider, the filesystem is often ephemeral and will not be restored on restart.

With many application platforms hosted in the cloud, IaaS or PaaS, when you restart your application or systems, you lose your uploaded content.

The upside to this is that new instances of your app are very easy to spin up; this is an essential piece of building on PaaS. The tradeoff is that handling uploaded content in a more robust way becomes not just a good idea but a necessity. It’s one of the first things you have to deal with when you’re moving your legacy application into PaaS.

In terms of asset hosting, when you’re dealing with a legacy application or a content management system like WordPress or Drupal, you’re dealing with systems that tend to store items on the disk and in a specific file format. So, the challenges depend on whether you are taking an existing application built from scratch and turning that into a modern cloud application for PaaS, or trying to host WordPress, Drupal, or other CMS sites in the cloud. If you’re doing the former, the general process is to use a blob storage system, also known as object storage.

All About Blob

There are two essential kinds of storage for assets: blob (binary large object) storage, also known as object storage, and file storage. File storage is the familiar kind of storage system used in traditional development. Blob storage works differently. Blobs are more like a key/value store for files, and are accessed through APIs like Amazon S3, Windows Azure Blob Storage, Google Cloud Storage, Rackspace Cloud Files, or OpenStack Swift.

When a user uploads an asset to your application, a temporary file is usually created and then moved to a more permanent location in the filesystem. With blob storage, instead of moving the file to a folder, you upload the asset to its final location (usually through a REST API), and the API gives you back a unique URL for it. Instead of storing the object on the disk, you now store that URL as a reference; the asset itself lives in a persistent storage mechanism.

One of the benefits of using an object storage system is that all the files you upload are automatically replicated across many different servers (up to seven or eight copies may exist in many different parts of the world). This makes it very difficult to lose data; the content is far less likely to be damaged by any single disk failure, and you don’t have to worry about backups as acutely as you would if your data were uploaded to your local disk.

There are even more added benefits depending on which object storage you use. Content delivery networks (CDNs) can speed up the delivery of your assets so that not only are they hosted in many different locations, but also, depending on where in the world you are, they can be served to you from the one that’s closest to you. This can make the whole experience of consuming your assets and downloading the web pages feel a lot faster.

Because of this redundancy and CDN support, blob storage is a good idea in general: it adds speed, reliability, and robustness to your website. And it’s not too difficult to implement. As you’ll see in these code examples, the up-front effort required to deal with asset hosting is more than worth the investment in terms of what you get out of it.

PHP with Amazon S3

Amazon has libraries for using S3 in Ruby, Java, Python, .NET, and mobile phones. Here is an example of how easy S3 is to integrate with a PHP application. This code will not work out of the box because it only contains the relevant snippets for understanding the flow of code. To get code that is fully ready to go, you will need to go to Amazon’s Sample Code & Libraries page, which has more detailed instructions for how to use the code. However, for the purposes of illustration, once the library is incorporated into your application it is not difficult to use:

<?php
// S3.php is available from Amazon's AWS website
if (!class_exists('S3')) require_once('S3.php');

// Amazon gives you credentials if registered for S3
// Best practice is to make these ENV vars in a PaaS
$s3 = new S3(
    getenv("AWS_ACCESS_KEY"),
    getenv("AWS_SECRET_KEY")
);

// A "bucket" is analogous to the name of a folder
// It is a way to collect similar things together
$bucket = "MyPaaSAppBucket";

// Create the bucket
$s3->putBucket($bucket, S3::ACL_PUBLIC_READ);

// Assuming the file is POST'ed as a form element
// called "file", <input type="file" name="file" />

// Name the uploaded file. Use basename() and validate
// further in real code; trusting a user-supplied name
// as-is invites path tricks and collisions.
$file_name = basename($_FILES['file']['name']);

// Upload the file
$s3->putObjectFile(
    $_FILES['file']['tmp_name'],
    $bucket,
    $file_name,
    S3::ACL_PUBLIC_READ
);

$url = 'http://'.$bucket.'.s3.amazonaws.com/'.$file_name;

Node.js with Azure Blob Service

Like Amazon, Microsoft has libraries for using its Blob service in Ruby, Java, .NET, Python, PHP, and mobile phones. Here is an example of how easy Azure Blob is to integrate with a Node.js application. Again, this code will not work out of the box; you will need to go to Microsoft Azure’s website for the code you need and more detailed instructions for how to use it. However, for illustration purposes, once the library is incorporated into your application it is not difficult to use:

// Azure in Node is available from npm install azure
var azure = require('azure');

// Azure gives you credentials if registered for blob
// Best practice is to make these ENV vars in a PaaS
// They will be called AZURE_STORAGE_ACCOUNT and
// AZURE_STORAGE_ACCESS_KEY

// Create a service and container to gather assets
var containerName = "myPaaSContainer";
var blobService = azure.createBlobService();
var container = blobService.createContainerIfNotExists(
    containerName,
    function(error){
        if(!error){
            // Container exists and is private
        }
    }
);

// The name of the uploaded object
var blobName = "myimage";

// The file name, possibly the uploaded temp file
var fileName = "/path/to/myimage.jpeg";

var blob = blobService.createBlockBlobFromFile(
    containerName,
    blobName,
    fileName,
    function(error){
        if(!error){
            // File has been uploaded
        }
    }
);

var url =
    "http://" + process.env.AZURE_STORAGE_ACCOUNT +
    ".blob.core.windows.net/" + containerName + "/" + blobName;

Generalized Asset Hosting Functions in Ruby for Rackspace Cloud Files

When integrating any blob functionality into your application, you will typically be doing various functions repeatedly, like uploading files and returning their corresponding URLs for storage in your database. To achieve maximum portability, it generally makes sense to add a layer of abstraction around this often-used code. That way, if you want to use a different object storage provider in the future, you can change your code mainly in one place rather than many.

You may organize these functions in a class if you are using an object-oriented language, or you may simply have some basic functions accessed globally.

The simple example class that follows is written in object-oriented Ruby. It contains some basic logic for working with Rackspace Cloud Files, but it could easily be ported to S3, Azure Blob, or any other object storage without affecting code that depends on it.

There are some libraries, such as Fog, that have this logic already encapsulated in them:

# Usage: o = ObjectStorage.new(container_name)
#        o.upload(file, blob_name)
#        return o.url
require "cloudfiles" # Rackspace's ruby-cloudfiles gem
class ObjectStorage
    def initialize(name)
        @@connection ||= CloudFiles::Connection.new(
            :username => ENV["USERNAME"],
            :api_key => ENV["API_KEY"]
        )
        @container = @@connection.container(name)
    end

    def upload(file, blob_name)
        @object = @container.create_object(blob_name, false)
        @object.write(file)
    end

    def url
        @object.public_url
    end
end

Uploading with Plug-ins

When you’re dealing with a content management system (CMS), the process can be different. Systems like WordPress, Drupal, and Joomla! have plug-ins, so instead of rewriting code, you can install a plug-in that ties directly into blob services like S3. The plug-in’s upload mechanism stores files directly in object storage and gives you back a URL. This improves your blog or CMS’s load time, and it gives you a more scalable site to which you can add more instances. When the load balancer sends a request to an instance that doesn’t have the uploaded content on its disk, you’ll never see the dreaded 404 error.

Each of these CMSs maintains a plug-in directory with the latest code and documentation, which will allow you to integrate object storage with your site quickly and easily.

Session Management

Sessions are incredibly important to think about when scaling applications in PaaS. Unless configured otherwise, sessions are almost always stored in a temporary file on the hard drive. This is fine if your application runs on only one server, but with PaaS you can easily start many instances of your app, which means it can be running on many different servers. To users of your app, this can make it look like they’ve suddenly been logged out for no reason.

A session is created, sometimes automatically (e.g., in PHP and Rails), by creating a unique random token. For the sake of explanation, let’s say the token is XYZTOKEN and is stored in a cookie called MYAPP_SESSION_ID. Your PHP application will automatically know to look for a cookie named MYAPP_SESSION_ID, and if it finds that cookie, it takes the value and looks for the file corresponding to that token value. Again for simplification, let’s say there is a directory in /tmp called phpsessions and the token is a simple one-to-one relationship. Any arbitrary data for that user will then be saved in a hash and persisted to the file /tmp/phpsessions/XYZTOKEN. This is an insecure simplification of what really happens, but it is a good illustration of the overall process.

If you are running your application on many different servers, you cannot depend on the filesystem to be the same on all of them. Therefore, you need to store session data in a different way.

With PaaS, there are three typical places you can store sessions. There are pros and cons to each type of session management:

Encrypted cookies for sessions
  • Examples: default mechanism in Rails, plug-ins available for many other frameworks

  • Pros: very fast and no need to run any external services

  • Cons: limited amount of data can be stored, not available for every web framework

NoSQL storage for sessions
  • Examples: memcached, MongoDB

  • Pros: fast, you can store as much data as you want, supported by most frameworks

  • Cons: dependency on external services that you may not use for anything but sessions

SQL storage for sessions
  • Examples: MySQL, PostgreSQL

  • Pros: you can utilize the same SQL database you are using within your applications already

  • Cons: slower than the other two alternatives

There are many ways to implement these session management tools. They are highly dependent on the technology and framework you choose to develop with.

PHP

In PHP, you can overwrite the session handler to use any technology you want through the session_set_save_handler() function. If you are using the Zend Framework, there is a simple way to connect sessions to a database like MySQL using Zend_Session_SaveHandler_DbTable. Other PHP frameworks have similar functionality built in, or you can write it yourself fairly easily. Here is an annotated example from the PHP documentation showing how to write sessions to files; the same interface lets you target a database or NoSQL store instead:

<?php
class MySessionHandler implements SessionHandlerInterface
{
    private $savePath; // where to save session files

    // initialize the session object and create a directory
    // for session files if necessary
    public function open($savePath, $sessionName)
    {
        $this->savePath = $savePath;
        if (!is_dir($this->savePath)) {
            mkdir($this->savePath, 0777);
        }
        return true;
    }
    // no need to do anything when you close out a session
    // object
    public function close()
    {
        return true;
    }
    // for any given session id ($id) return the data
    // stored in the session file on disk
    public function read($id)
    {
        return (string) @file_get_contents("$this->savePath/sess_$id");
    }
    // write session data to a session file
    public function write($id, $data)
    {
        return file_put_contents("$this->savePath/sess_$id", $data) !== false;
    }
    // when you want to delete a session, delete the
    // session file containing the data
    public function destroy($id)
    {
        $file = "$this->savePath/sess_$id";
        if (file_exists($file)) {
            unlink($file);
        }

        return true;
    }
    // garbage collect sessions objects older than a given
    // amount of time
    public function gc($maxlifetime)
    {
        foreach (glob("$this->savePath/sess_*") as $file) {
            if (filemtime($file) + $maxlifetime < time() && file_exists($file)) {
                unlink($file);
            }
        }

        return true;
    }
}

$handler = new MySessionHandler();
session_set_save_handler($handler, true);
session_start();

// proceed to set and retrieve values by key from $_SESSION

Node.js

In Node.js, encrypted cookies can be implemented using a variety of npm modules. If you use the Connect middleware, galette and cookie-sessions are two plug-ins that give you encrypted cookie functionality.

Ruby

In Rails, the default session mechanism is an encrypted cookie, but if you want to change it to a Mongo or MySQL service, all you have to do is edit config/initializers/session_store.rb.
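For illustration, here is roughly what that initializer looks like in a Rails 3 application. MyApp and the cookie key are placeholders for your application’s own names:

# config/initializers/session_store.rb
# Default: cookie-based sessions, no server-side state
MyApp::Application.config.session_store :cookie_store,
    :key => '_myapp_session'

# To keep sessions in your SQL database instead, switch to
# the Active Record store:
# MyApp::Application.config.session_store :active_record_store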

Java

In Java, Tomcat has built-in clustering capabilities, whereby every session is replicated automatically to every Tomcat instance. This can be nice because enabling it does not require significant code changes. The downsides are that session replication adds network overhead as the cluster grows, and that this Tomcat feature requires sticky sessions in the load balancer, which not every PaaS enables. If you want to try using encrypted cookies in Java, take a look at Marc Fasel’s SessionInCookie application and the associated blog post, which has more details.

Caching

Caching is a very important part of making web architecture scalable, and it can take many different forms.

A crude but effective form of caching is to take dynamic content that’s generated server-side, write it to a file on disk, and then serve that file from disk. Caching to disk minimizes the CPU work done per request, making responses extremely fast; it can increase the number of requests per second your website can handle by orders of magnitude. Cached data can also be stored in RAM or NoSQL, which removes even more latency from this process and makes it more scalable across instances of your application.
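The following sketch shows this read-through pattern in Ruby. The cache directory, key scheme, and render_home_page helper are illustrative assumptions, not a prescribed layout:

require "fileutils"

# Naive read-through page cache: serve rendered HTML from
# disk if present; otherwise generate it, save it, return it.
def cached_page(key)
    path = File.join("tmp", "page_cache", "#{key}.html")
    return File.read(path) if File.exist?(path)

    html = yield # expensive server-side rendering happens here
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, html)
    html
end

# Usage: html = cached_page("home") { render_home_page }

Keep in mind that on a PaaS this file lives on one instance’s ephemeral disk, which is exactly the consistency issue discussed later in this section.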

Alternatively, another caching technique is to take fragments of the generated code and store them in RAM or a NoSQL cache. Used correctly, this technique can also increase the speed of your website by an order of magnitude, creating a blend of dynamic and static content that combines speed with flexibility.

Depending on the language you are writing code in, it is often also possible to precompile source code. In PHP, solutions like APC will increase the performance of your application by parsing and compiling your code ahead of time. In Java, compilation is a necessary step, so this does not apply, but for many dynamic languages (like Python and Node.js) there is added value in a precompiled cache step.

Filling In the Pieces

Depending on individual circumstances, caching is a technology that may not need to be migrated from your legacy application into your cloud-based PaaS in order for you to move forward. Why? Because often caching is built in such a way that if a piece of HTML is missing—if it doesn’t exist on the disk, or doesn’t exist where it is expected to be—the system will compensate by regenerating the correct content.

For example, if you have three different instances of your code and the cache has only been populated to disk on one of them, when a request happens to hit an instance that doesn’t have a cached copy yet, the system will generate the content and save it to that instance’s disk; the next time a request comes in to that instance, the content won’t have to be generated again.

The biggest downside to relying on this is that it can lead to situations where a user sees different content when reloading a cached page. If one instance has a copy that was made a week ago and another has a copy made three days ago and another copy is fresh, every reload may present different information.

This is not always a problem, so from a caching perspective, migrating a legacy app is not necessarily going to create a poor user experience. Still, anytime you put something on a disk, that disk can crash, or the data might not be consistent across all the disks; what is cached on one server might differ from what’s cached on another, which could result in an inconsistent user experience.

When moving legacy applications to PaaS, you can use the opportunity to implement best practices and create a central cache storage for your application, using SQL or NoSQL databases to store that data instead of a disk. So, instead of making calls to disk, you can use technology like memcached, Redis, MongoDB, or CouchDB. In some cases, you might cache the data directly into a SQL database, putting it into MySQL or PostgreSQL.

With the MySQL and PostgreSQL options, the upside is that you usually already have connections to your database that you can use, and you don’t need a dependency on yet another external service. So if you’re not using memcached or another NoSQL service anywhere else in your application, it may make more sense to simply cache in your MySQL database.

However, as you look more closely at performance and scalability, the benefits of memcached and other key/value stores become clear: data can usually be retrieved far faster, because it is stored directly in RAM, the fastest place to get it. Most SQL databases also keep frequently used data in RAM to an extent, so if your dataset fits within what the database caches in memory, you may not see a substantial difference between memcached and MySQL. As your datasets grow, however, the database may become a bottleneck down the road.

One of the great benefits of using PaaS is how simple it is to add more services to your application. For example, if you avoided MongoDB because you did not want to run and scale it yourself, PaaS will do that for you, giving you the flexibility to try services you otherwise might have stayed away from.

Caching with memcached in PHP

Integrating a caching client is generally a very easy process. In PHP, if you are on Ubuntu, all you need to do is run sudo apt-get install php5-memcached to install the memcached client libraries for PHP. Then you can use the following code to get and set values in the memcached key/value store:

<?php
$cache = new Memcached();

// memcached allows multiple servers, best to
// keep the names in environment variables
$cache->addServer(
    getenv("MEMCACHED_SERVER"), getenv("MEMCACHED_PORT")
);

// set
$cache->set("foo", "Hello World!");

// get
echo $cache->get("foo");
?>

Caching with MongoDB in Node.js

Caching with a NoSQL database like MongoDB is generally as easy as using a simple RAM hash system like memcached. This is because, essentially, these are all key/value stores, and caching is a primitive form of distributed key/value storage. (This is not completely true, especially when it comes to expiration of key/value pairs, but it can naively be treated as such for most uses.)

In Node.js, there is an npm module called mongodb that provides a MongoDB client. Simply running npm install mongodb will install it:

// pass the connection string in through an env var
// (the variable name MONGO_URL is just a convention)
var mongourl = process.env.MONGO_URL;
require('mongodb').connect(mongourl, function(err, db) {
    var collection = db.collection('cache');

    // set
    collection.insert({ 
        key: "foo",
        value: "Hello World!"
    });

    // get
    collection.findOne({ key: "foo" },
        function(err, item) {}
    );
});

Generalized Caching Functions in Ruby for Redis

When integrating any caching functionality into your application, you will typically be doing various functions repeatedly, like getting and setting key/value pairs. For maximum portability, it generally makes sense to add a layer of abstraction around this often-used code. That way, if you want to use a different object storage provider in the future, you can change your code mainly in one place rather than many.

You may organize these functions in a class if you are using an object-oriented language, or you may simply have some basic functions accessed globally.

The simple example class that follows is written in object-oriented Ruby. It contains some basic logic for working with Redis but could easily be ported to MySQL, memcached, or any other technology that can be used with caching:

gem "redis"
require "redis"

# Usage: cache = ObjectCache.new
#        cache["key"] = "value"
#        return cache["key"]
class ObjectCache
    def initialize
        @connection = Redis.new(
            # e.g. REDIS_URL=redis://host:6379
            :url => ENV["REDIS_URL"]
        )
    end
    end

    def [](key)
        @connection[key]
    end

    def []=(key, value)
        @connection[key] = value
    end
end

Asynchronous Processing

Long-running tasks—tasks that take a lot of CPU, RAM, or processing power—should be moved into the background so that they don’t affect the user’s experience on a website. Platform-as-a-Service providers will often kill long-running web-facing processes. This makes asynchronous processing a requirement rather than simply a best practice for moving legacy applications to the cloud.

Serving Up Stored Data

Here’s an example. You have a long list of data, maybe RSS URLs, that is processed through many high-latency API calls (regularly polling RSS feeds from various sources). Because that data needs to be presented very quickly, you don’t want to be gathering it in real time while the user is waiting to view it.

Another example is processing images or videos. When a user uploads an image, an application may want to resize it, compress it, or perform any number of other tasks. Depending on the size of the file, processing can take a large amount of RAM and CPU, and a long time (minutes or even hours); a user of your application should not have to wait while that happens in real time. Processing images and videos should be done asynchronously, with the result pushed to the user as the processing finishes.

To accomplish this, you need asynchronous processes that will gather and calculate in the background, using as much time and as much CPU as needed. The data can be stored either in the database or in your caching mechanism. Once stored, it can be accessed quickly and directly in real time from the web frontend application.

How to Create Asynchronous Processes

The generic technique for setting up long-running processes is actually quite simple. Here is some pseudocode for before:

for each task in tasks
do
    // could take a while
    process the task
end

and after:

for each task in tasks
do
    // only takes a millisecond
    queue task for processing
end

The processing code will look like this:

loop
do
    grab task from queue
    process the task
end
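To make this pattern concrete, here is a minimal Ruby sketch using a Redis list as the queue. The queue name, the REDIS_URL variable, and the process helper are assumptions for illustration only:

require "json"
require "redis"

redis = Redis.new(:url => ENV["REDIS_URL"])

# In the web process: enqueue instead of processing inline.
# LPUSH returns almost instantly.
def enqueue_task(redis, task)
    redis.lpush("tasks", task.to_json)
end

# In a separate worker process: block until a task arrives,
# then do the long-running work outside any web request.
loop do
    _list, payload = redis.brpop("tasks")
    task = JSON.parse(payload)
    process(task) # placeholder for your actual work
end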

More Advanced Scheduling of Background Tasks

As you become more familiar with incorporating background processing into your applications, you may want to get fancier with it. You can incorporate open source systems like Celery for Python, which provides an asynchronous task queue based on distributed message passing and also supports scheduling tasks at certain times. Celery uses RabbitMQ, Redis, Beanstalk, MongoDB, or CouchDB as a backend for its service.

In Ruby, there is a similar project called Resque, backed by Redis. These projects have the added flexibility of giving insight into the health and state of your queue, which is critical as the scale of your applications grows.
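To give a taste of Resque, a job is just a class with a queue name and a self.perform method; the class and queue names in this sketch are illustrative:

require "resque"

# Workers pick this class up from Redis and run perform
class ImageResizeJob
    @queue = :images

    def self.perform(image_id)
        # long-running resize work happens here, in a
        # worker process rather than in a web request
    end
end

# In the web app: enqueue and return immediately
Resque.enqueue(ImageResizeJob, 42)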

Aside from open source projects, there are even some third-party services that specialize in background tasks, like Iron.io’s IronWorker. IronWorker has libraries in Ruby, Python, PHP, Node.js, Go, Java, and .NET, and unlike Celery it does not require you to have any integration with a service like RabbitMQ or Redis. It is a fully managed service that takes your processing code and runs it for you in the cloud on servers managed by Iron.io.

SQL

When you’re dealing with legacy applications and looking to move them into PaaS, it is important to be aware of the abilities and limitations of the SQL services provided by the platforms that are available to you.

The Dilemma of Stored Procedures

A common “gotcha” in dealing with databases from legacy systems is stored procedures. Stored procedures are common in legacy applications, but the hosted databases on many platforms are not very friendly to them; with many PaaS providers, they are disabled entirely. Using stored procedures is also no longer considered a best practice; although they were widely used in the past, modern programming methodologies discourage them.

Stored procedures are functions stored in the database itself that can be used as first-class citizens in SQL queries against that database. A trivial example might be a procedure that adds two columns together; you can select the result of that stored procedure and get the combination of those columns without having to do the processing in your application code.

Of course, stored procedures can get a lot more complicated and offset much of the computational effort of processing data into the database instead of the code.
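As a rough illustration of that trivial example, using the Ruby mysql2 gem (the function, table, and column names here are hypothetical, and many PaaS databases will reject the CREATE FUNCTION step outright):

require "mysql2"

client = Mysql2::Client.new(
    :host     => ENV["DB_HOST"],
    :username => ENV["DB_USER"],
    :password => ENV["DB_PASS"],
    :database => ENV["DB_NAME"]
)

# The function lives in the database itself
client.query(
    "CREATE FUNCTION add_cols(a INT, b INT) RETURNS INT
     DETERMINISTIC RETURN a + b"
)

# The database does the computation; the code just selects it
results = client.query("SELECT add_cols(price, tax) AS total FROM orders")
results.each { |row| puts row["total"] }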

One of the big problems with stored procedures is that they create one-offs that are hard to remember and hard to keep track of, which makes code harder to debug and especially hard to run tests against. When you are running a local version of the database—in addition to a QA version, a development version, and a production version—it is incredibly difficult and untenable to maintain, distribute, and change stored procedures across all of them all the time. The rise of unit testing was one of the nails in the coffin of stored procedures: the inability to easily and effectively run unit tests against them across all these databases took them out of fashion.

If stored procedures are impossible to remove from your application, there are solutions. PaaS providers have started to look into third-party dedicated databases, and some even offer dedicated databases themselves. There are now providers that offer MySQL-as-a-Service on a dedicated basis, including Xeround and ClearDB. Even Amazon has a MySQL service called RDS, which is a dedicated and managed MySQL environment. Typically, when you have a dedicated MySQL instance, stored procedures are allowed.

So, if you can’t get around the dilemma of stored procedures, there are ways that you can still migrate your legacy applications into the cloud. However, the solutions can get expensive, and it might be harder to set up slaves and ensure consistency across multiple databases as you grow.

NoSQL

In legacy applications, NoSQL is one of the easier pieces to migrate into a PaaS, because there is no concept of stored procedures and very little overhead involved. Most NoSQL services present the same interface inside a PaaS as outside one. Whether your key/value storage is as simple as memcached or as robust as Riak, it will typically work very similarly within a cloud environment. In fact, a number of the NoSQL options can help you in many ways, such as with asset hosting, caching, and even asynchronous processing. Leveraging those key/value stores in various ways can actually help when you’re moving your legacy applications into PaaS.

Here is an incomplete list of NoSQL databases that you can choose from:

  • MongoDB

  • CouchDB

  • Redis

  • Cassandra

  • Riak

  • HBase

  • Amazon SimpleDB

  • Amazon DynamoDB

  • Azure Table Storage

Miscellaneous Gotchas

When you’re migrating any legacy application into PaaS, you must consider how your application will survive and how the user experience will look if the disk goes away or if any one individual server dies.

PaaS can deal with managing, load balancing, network failover, and heartbeat monitoring of your individual services. It can do much of the work for you, but it cannot rewrite your legacy code for you (yet). It cannot know which dependencies in your code involve writing to the disk. Many times there are small pieces of code in a long-forgotten library that make assumptions about where information is stored. Trying to port and migrate your applications into PaaS can often feel like looking for a needle in a haystack. But once you have done so, the benefits are large and important: you are making your application conform to practices that, cloud or not, are simply known throughout the industry as best practices.

When you are prototyping applications or building them in the early stages, you might never think that it’s going to get to the point where you’ll need to migrate to PaaS. This leads us to consider another gotcha.

The Optimization Trap

One of the easiest traps to fall into when writing or rewriting your code is premature optimization: building features that you think are going to be needed, or optimizing aspects of your app before they actually need to be optimized. The topics and actions we have discussed so far in this chapter, however, do not fall into the camp of premature optimization. They fall into the category of best practices, because any application that starts to grow is going to need them, whether or not it runs on PaaS.

Premature optimization is making an optimization that may or may not matter at scale. All of the topics we’ve covered in this chapter will always matter, and should be considered when you’re building your applications from scratch so that you don’t end up having to port more legacy code somewhere down the line.

Starting from Scratch

Is it better just to throw out your legacy applications and start from scratch?

It depends. The considerations will be different for every application, for every developer, for every team. There are certainly going to be cases and situations where it makes more sense to redo your code from scratch. There will also be cases where you will not be able to do so, since you might be dependent on code from systems like WordPress or Drupal; you might be so dependent on legacy code that throwing it out and starting from scratch is simply not an option.

There are also cases in which an application has grown to a size that makes it very difficult to throw out all the existing code. Other projects might have pieces of code that can be thrown out, enabling you to decouple and create smaller applications that provide services—file uploads, for example—independently of the main code itself. Again, it definitely depends on the situation.

A Final Note on Legacy Apps

Look for all the places where your code depends either directly on the disk or on any individual server. Consider storing data that is needed across many instances in RAM, and audit every place where such a dependency could negatively affect the user experience.
