Building an SLA

One of the main reasons for web operators to collect end-user data is to build an SLA. Even if you don't have a formal SLA with clients, you should have internal targets for uptime and page latency, because site speed has a direct impact on business experience.

User-facing SLAs have several components (see Table 11-2). You need to be specific about these so that there's no doubt whether an SLA was violated when someone claims that a problem occurred.

Table 11-2. The elements of a user-facing SLA

| SLA component | What it means | How it's expressed | Example |
| --- | --- | --- | --- |
| Task being measured | The thing being tested: the business process or function itself | This is usually expressed as a name or description of the test; avoid using just the URL or page name, because that makes the test harder to read. | "Updating a contact record" |
| Metric being calculated | The element of latency that's being computed. If you can't control it, it shouldn't be in your SLA. | This is a measurement that is specific and can be reproduced across systems. You should know, for example, that "page load time" means "from the first DNS lookup to the browser's onLoad event." | "Host latency" |
| Calculation | The math used to generate the number | Unfortunately, this is usually an average. Don't do this. Averages suck. Insist on a percentile (or at the very least a trimmed mean), so that a single bad measurement won't ruin an otherwise good month. | "95th percentile" |
| Valid times | The times and days when the metric is valid. If you don't include this, you won't have room ... | | |
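
To make the "Calculation" row concrete, here is a minimal Python sketch of how a single bad measurement distorts an average while leaving a percentile or a trimmed mean untouched. The sample latencies, the 500 ms target, and the helper names `percentile` and `trimmed_mean` are illustrative assumptions, not values or code from the book. The percentile uses the nearest-rank definition, which is only one of several in common use, so an SLA should state which one applies.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def trimmed_mean(samples, trim_percent=5):
    """Mean after discarding trim_percent percent of the samples
    from each end of the sorted list."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= trim_percent < 50:
        raise ValueError("trim_percent must be in the range [0, 50)")
    ordered = sorted(samples)
    k = len(ordered) * trim_percent // 100
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


# Hypothetical host latency samples in milliseconds: 99 ordinary
# measurements plus one 60-second outlier (for example, a timeout).
latencies_ms = [200.0] * 99 + [60000.0]
target_ms = 500.0  # assumed SLA target, for illustration only

mean_ms = statistics.fmean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
trimmed_ms = trimmed_mean(latencies_ms, 5)

print(f"mean:            {mean_ms:.1f} ms  (meets target: {mean_ms <= target_ms})")
print(f"95th percentile: {p95_ms:.1f} ms  (meets target: {p95_ms <= target_ms})")
print(f"5% trimmed mean: {trimmed_ms:.1f} ms  (meets target: {trimmed_ms <= target_ms})")
```

With these made-up samples, the one outlier pushes the average past the target even though 99 of the 100 measurements were fast, while the 95th percentile and the trimmed mean both stay at the typical value.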
