
Embedding Perl in HTML with Mason by Ken Williams, Dave Rolsky

Chapter 10. Scalable Design

So now that you know how to do things with Mason, it’s time to start thinking about how to do things cleanly, scalably, and maintainably. Mason is a good tool, but it is not magic, and you still need to think about design when you use it.

Modules Versus Components

Mason is a powerful tool for generating content. Its easy templating syntax, powerful component structures, and features like autohandlers, dhandlers, and component inheritance combine to make it much like Perl itself: it makes easy things easy, and difficult things possible.

However, exactly like Perl itself, the facilities it provides can make it all too tempting to do things the easy way, and Mason makes no attempt to enforce any sort of discipline in your design. Instead, this is your responsibility as a programmer and application designer. This is where the responsibility always lies, no matter what language or tool you are using.

Though Mason is at its core a text templating tool, it also provides much more functionality. One such piece of functionality is that individual components are almost exactly like subroutines. They can be called anywhere in your processing and they can, in turn, call other components, generate output, and/or return values to the caller. And, like Perl’s subroutines, variables defined inside a component are lexically scoped to that component.

It is this similarity between components and subroutines that can lead to design trouble. As long-time Mason users, we have come to believe that Mason components should be used almost exclusively for generating output. For data processing, we believe that Perl modules are the better solution. In our experience, this division of labor leads to long-term benefits in maintainability and clarity of design.

When we say “generating output,” we mean generating binary or text output of any sort (HTML, XML, plain text, images, etc.) to be sent somewhere (STDOUT, a web client, etc.). In a web environment, this includes things like sending redirect headers or custom error responses as well as HTML. When we say “data processing,” we mean the work of retrieving data from an external data source such as a database, processing data and constructing useful objects or data structures, doing calculations, implementing business logic, or munging data.

Our exception to this rule is when the data processing is entirely part of the UI that Mason is generating. For example, in a web context, it may be necessary to do some munging of POSTed data or to translate data from the manner in which it is presented in the UI to a format suitable for your backend.

But Mason is not the right tool for all jobs, and it should not form the entire infrastructure of any project.

The rest of this discussion will assume a web environment, as that is Mason’s primary domain, though this discussion can apply to any environment in which Mason could be used.

Another important goal is to minimize duplication of code. You will never eliminate this entirely, but this should always be your goal. Duplicated code leads to bugs when one piece changes and the other doesn’t, increases the difficulty of understanding the entire code base, and increases implementation time for bug fixes and changes.

Obviously, the line between generating output and data processing is extremely blurry. Given that fact, perhaps the best goal is to reduce the data processing in Mason components to the minimal amount necessary to properly generate output. All other application logic should be placed in Perl modules and called from your components.

The line that needs to be drawn is one that makes the code flow in both your modules and your components as natural as possible. We don’t want to go into impossible contortions in order to eliminate four lines of processing from a component, nor do we want to put knowledge about Mason or our components into our modules. Like all design tasks, there is as much art as skill involved.

For example, as mentioned before, we consider it entirely appropriate for Mason components to handle incoming request argument processing. A component could use these arguments to determine what library function to call or what object to instantiate. It might also use these arguments to change the way it generates output, for example if there were a parameter indicating that no images should be included on a page.

There is little reason to handle this particular processing task with a module. Indeed, this would be creating exactly the kind of dependency we believe is so problematic in using Mason for application logic. Your modules should be generically useful and if they depend on being called by Mason components, they are useless outside of the Mason environment.

What exactly is the danger of blurring these lines? Well, Mason is a fine system for generating HTML or other forms of output. However, let’s assume that you plan to also provide your data via an email interface. A user may write an email to you with a specific body such as “fetch file 1,” and your application will respond with the contents of file 1.

In a case such as this, you just want to execute some application logic to fetch a file and then spit it out to your mailer. It is unlikely that any of Mason’s powerful features would be necessary in order to perform this task; in fact, Mason would probably get in the way.
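As a minimal sketch of what such an email handler might look like without Mason, consider the following. The fetch_file function and its in-memory file table are hypothetical stand-ins for the real application logic, and reading the incoming message and sending the reply are left out entirely.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical application function. A real one would read from the
# application's file store rather than from this hard-coded table.
sub fetch_file {
    my ($id) = @_;
    my %files = (
        1 => "contents of file 1\n",
        2 => "contents of file 2\n",
    );
    return $files{$id};
}

# Turn an email body such as "fetch file 1" into the body of the reply.
sub reply_for {
    my ($body) = @_;
    if ( $body =~ /^\s*fetch\s+file\s+(\d+)\s*$/i ) {
        my $id       = $1;
        my $contents = fetch_file($id);
        return defined $contents ? $contents : "No such file: $id\n";
    }
    return "Unrecognized request\n";
}

# Imagine the result is handed to the mailer instead of printed.
print reply_for("fetch file 1");
```

Nothing here needs a templating system: the request is parsed, a plain function is called, and the result is passed along.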

Another example can illustrate this issue further. Suppose we want to build an application to serve as the backend for a new web site focused on news about Hong Kong movies. Suppose, too, that we sensibly decide to make a single component to generate a story box. A story box has a headline, an author, and the first 500 characters of the story. If the story is longer than that, the box also has a link to the full text.

Here’s the HTML-making portion of the component:

<h1><% $story{headline} | h %></h1>

written by <b><% $story{author} | h %></b>

<% substr ($story{body}, 0, 500) | h %>
% if ( length $story{body} > 500 ) {
<a href="full_story.html?story_id=<% $story{story_id} %>">
Read the full story</a>
% }

Pretty simple, no? The component contains some application logic, of course. It checks the length of the story’s body and changes the output depending on it. But the real question is where the %story hash comes from. Let’s assume that we call another component to get it. So then we have this:

 my %story = $m->comp('get_newest_story.mas');

So what’s the problem? Well, there is none as long as the only time you want to get the newest story is in a Mason environment. But what if you wanted to send out the top story anytime someone sent an email to you at newest_story@hkmovienews.example.com?

Hmm, let’s write a quick program to do that:

#!/usr/bin/perl -w

use HTML::Mason::Interp;

my $outbuf;
my $interp = HTML::Mason::Interp->new( out_method => \$outbuf );

my %story = $interp->exec('/path/to/get_newest_story.mas');

# imagine the mail is sent

Not so bad, we suppose. Here are some issues to consider:

  • You just loaded a couple of thousand lines of Perl code in order to do a simple database fetch and then send an email. And because this email interface has become quite popular, it’s happening a few times every minute. Your sysadmin is looking for you and she’s carrying a big spiked club!

  • The return value of $interp->exec() may not be what you’d expect. If the component you called did an $m->abort('something') internally, the return value will be 'something'. This works fine when using the Mason ApacheHandler code, but it isn’t what you expected in this situation.

  • If any component you call (or that it calls) references $r (the Apache request object), it will fail spectacularly. It’s nice to feel free to access $r in your components, but if you were trying to make a multipurpose Mason system you’d have to be sure not to use $r in any component that might be used outside of a web context, and you would feel fettered and stifled.

Now imagine that you multiply this by 40 more data processing and application logic components. Then remember that if you try to do 'perldoc get_newest_story' from the command line, it won’t do anything! And remember that you have 40 separate files, one per API call. Now imagine that you take advantage of Mason’s inheritance and other fancy features in your data processing code. Now imagine trying to debug this later.

If, however, you put the 'get_newest_story' functionality into a module, you could call this module from both your component and your email sending program, looking something like this:

#!/usr/bin/perl -w

use MyApplication;

my %story = MyApplication->get_newest_story( );

The advantages include:

  • You can easily preload your shared library code in the main Apache server at startup, so that the compiled code is shared among the child processes, which saves memory.

  • Performancewise, calling a subroutine in a module is much more lightweight than calling a Mason component. A Mason component call involves calling a subroutine and also performing a bunch of overhead tasks like checking the age of the component file, checking required arguments and types, and so on.

  • Perl modules have well-known mechanisms for documentation and regression testing. Psychologically, we feel that an API is more stable when we have a documented module that instantiates it. A tree of components feels more mutable, and we hate feeling as if we’ve built a shaky house of logic that we don’t necessarily understand in the end.
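To make this concrete, here is a minimal sketch of what such a module might look like. Only the MyApplication name and the get_newest_story method come from the example above; the in-memory story list and its posted field are assumptions standing in for a real database query.

```perl
package MyApplication;

use strict;
use warnings;

=head1 NAME

MyApplication - story retrieval for the movie news site (illustrative sketch)

=head1 SYNOPSIS

    my %story = MyApplication->get_newest_story();

=cut

# Hypothetical stand-in for the real data source. A production version
# would query the database here instead.
my @stories = (
    { story_id => 1, posted => 1000, headline => 'Older story',
      author   => 'A. Writer', body => 'First body' },
    { story_id => 2, posted => 2000, headline => 'Newer story',
      author   => 'B. Writer', body => 'Second body' },
);

=head2 get_newest_story

Returns the most recently posted story as a hash with the keys
story_id, posted, headline, author, and body.

=cut

sub get_newest_story {
    my $class = shift;

    # Sort newest first and keep only the first element.
    my ($newest) = sort { $b->{posted} <=> $a->{posted} } @stories;
    return %$newest;
}

package main;

# The same call works from a Mason component or from a mail script.
my %story = MyApplication->get_newest_story();
print "$story{headline} by $story{author}\n";
```

Because the documentation is embedded as POD, perldoc MyApplication would work once the package is saved as MyApplication.pm somewhere on Perl’s search path, and the method can be exercised from an ordinary test script.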

The Other Side

However, that’s not to say you don’t lose anything. Here’s a summary of a number of arguments we’ve heard on the possible advantages of using Mason components for data processing, along with our responses.

Data processing in Mason components provides developers with a unified way of writing both display and processing code. This is especially appreciated by less experienced developers not accustomed to writing modules.

Perl modules are one of the fundamental tools for writing reusable code and creating maintainable applications. It may be convenient to use Mason for data processing in the short term, but in the long term you’ll be better served by moving to a more formalized approach involving separate mechanisms for processing and display.

For rapid development environments, it’s hands-down faster to create a new component, and you are less likely to have a merge conflict with another person’s work.

Once a module is created, adding a new function or method to it is fairly trivial, but the initial process does require some thought. And yes, merge conflicts are more likely when using version control because you will have fewer files, though in our experience this is not terribly common.

Mason has support for private versions of processing code. One person said that where they work everyone has a version of the site checked out from version control and views his version through TransHandler magic via <name>.dev.example.com. Developers can change their own version of the processing components and preview the changes. If the processing code were in modules, every developer would need his own Perl interpreter, thus a separate server.

It is possible, though not completely trivial, to provide every developer a unique copy of the modules in his own server. This can be more of a maintenance hassle, particularly when adding new developers, though some automation can eliminate the hassle. Again, this is a case of spending time up front as an investment in the future. This issue is discussed in Chapter 11.

For example, giving each developer his own Apache daemon is relatively easy, running it on a unique high-numbered port. Each developer’s server can then use the developer’s local copies of the code, modules, and components, so the developer can work in isolation and feel free to break things without slowing anyone else down.

Or, just as easily, each developer can run a daemon locally on his own computer, perhaps connecting to a central test database or even running an RDBMS locally.[21]

Most importantly, nothing can replace solid coding guidelines, development practices, and testing, coupled with tools like version control.

Components give you many fringe benefits over Perl subroutines: named argument passing and checking, result caching, a lightweight hierarchical naming structure, component logging, and so on.

We can’t really argue with this. It’s true. However, we have yet to find ourselves really wishing for this functionality when developing application logic. Named arguments are nice, but CPAN provides several nice solutions for validating named arguments, including Params::Validate, which Mason uses internally.
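As a rough sketch of named argument checking in a module method, assuming Params::Validate is installed from CPAN: the get_story method and its story_id and max_length parameters are hypothetical and exist only for illustration.

```perl
package MyApplication;

use strict;
use warnings;
use Params::Validate qw(validate SCALAR);

# Hypothetical method; its name and parameters are illustrative only.
sub get_story {
    my $class = shift;

    # story_id is required; max_length is optional and defaults to 500.
    my %p = validate(
        @_,
        {
            story_id   => { type => SCALAR, regex => qr/^\d+$/ },
            max_length => { type => SCALAR, regex => qr/^\d+$/, default => 500 },
        }
    );

    # A real implementation would look the story up. This sketch just
    # returns the validated arguments.
    return %p;
}

package main;

my %args = MyApplication->get_story( story_id => 42 );
print "story_id=$args{story_id} max_length=$args{max_length}\n";

# A missing or malformed argument makes validate() die with a message.
my $ok = eval { MyApplication->get_story( story_id => 'abc' ); 1 };
print "rejected bad story_id\n" unless $ok;
```

This covers roughly the same ground as a component’s argument declarations: required versus optional arguments, defaults, and basic type checks.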

There have been times when shoving data processing into a Mason component was exactly what we’ve needed. The code sits there right next to the code that calls it, not off in site_perl/, which should usually have some tight controls over what gets put in it. In a matter of seconds you can try things out without worrying about module naming, namespace collisions, server restarts, and so forth. Then when you’ve had a chance to think about what a good interface should be like, you can migrate the code to a module. It’s all well and good to extol the virtues of good planning, but the creative process is seldom very plannable unless you’ve done a similar task before.

On yet another hand, you can always maintain your own module directories and add them to Perl’s search path via a quick use lib.
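A sketch of that approach, using a hypothetical private directory that you would replace with the location of your own checkout:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical private module directory; substitute your own path.
use lib '/home/dev/myapp/lib';

# use lib adds the directory to the front of Perl's search path at
# compile time, so modules there are found before system-wide copies.
print "$_\n" for @INC;
```

A later use MyApplication; in the same program would then find a copy under that directory before any copy installed in site_perl/.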

We are certainly not advocates of the “design everything and make sure it’s perfect before coding” school of design. Our points are more about the end product than the development process itself. Your process should lead to the creation of clean, maintainable code. If you make a mess while writing it, we certainly won’t criticize as long as it gets cleaned up in the end.

Our summary is simple. Writing your application logic and data processing as Mason components is a shortcut that can bite you later. Like many design trade-offs, it speeds up initial time to release while guaranteeing maintenance pain in the future.

[21] Though a local RDBMS may be more trouble than it’s worth with a high-maintenance RDBMS like Oracle.
