I’m writing a series of posts about Generalizing Apdex. This is #14. To minimize confusion, section numbers in the current spec are accompanied by the section symbol, like this: §1. The corresponding section numbers in the generalized spec, Apdex-G, are enclosed in square brackets, like this: [1].
The current Apdex specification defines an attractively simple scoring rule in which measurements falling within the zones Satisfied, Tolerating, and Frustrated receive scores of 1, ½, and 0 respectively. Enhancements have been proposed that would involve retaining the core approach to scoring while allowing for more than two targets (or thresholds), thereby creating more than three zones, and using finer scoring gradations (but still between 0 and 1) for rating measurements that fall within those zones.
In my previous posts in this series, I have stated that the notion of classifying all measurements into one of three performance zones is a core feature of Apdex that should be retained, because three-category classification schemes are common to many measurement and reporting domains. I have also made the case for allowing more thresholds, and thereby defining each performance zone as the union of one or more distinct performance intervals. For the conclusion of that discussion, see Generalizing the Apdex Thresholds.
In this post I will consider whether the Apdex formula itself should be generalized. In particular, should Apdex-G accommodate other scoring rules within the Tolerating Zone, and if so, how?
I am going to approach that question by reviewing §4 of the current spec (shown in the left-hand column below), and proposing text for the corresponding paragraphs of Apdex-G (shown on the right).
Describing the Formula
In composing the introductory paragraph and section [4.1] of Apdex-G, in addition to removing references to response times, my goal was to reduce the amount of repetition in the spec. Rather than describing concepts that have already been defined in earlier sections, I decided to simply refer to the appropriate descriptions or definitions. In this case, the relevant paragraphs should all be found in section  Apdex Inputs. I have not yet drafted these, but when I do, I will make sure they fit properly with section .
§4. Calculating the Index
The Apdex does not entail new measurements – rather it is a new way to represent existing measurements, calculated by counting the measurement samples in each of the performance zones.
§4.1 The Apdex Formula
The Apdex is calculated for each report group using the following equation:
A report group that defines a set of measurement samples, and a target threshold T (seconds) between the satisfied-tolerating zones of performance
F defines the threshold between the tolerating-frustrated zones of performance,
F = 4T
There are counts of response time measurement samples within the above defined performance zones of:
Satisfied_Count = number of satisfied response time samples,
Tolerating_Count = number of tolerating response time samples,
Total_Samples = number of all samples in the report group
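To make the counting step concrete, here is a minimal Python sketch of how a tool might sort response times into zones using T and F = 4T. The function names and the treatment of samples that fall exactly on a threshold (counted in the better zone) are my assumptions for illustration, not wording taken from the spec.

```python
from collections import Counter


def classify(response_time: float, t: float) -> str:
    """Assign one response time (seconds) to a zone, with F = 4T."""
    f = 4 * t
    if response_time <= t:   # assumption: a sample equal to T is satisfied
        return "satisfied"
    if response_time <= f:   # assumption: a sample equal to F is tolerating
        return "tolerating"
    return "frustrated"


def zone_counts(samples: list, t: float) -> dict:
    """Count the samples of one report group in each zone."""
    counts = Counter(classify(s, t) for s in samples)
    return {
        "Satisfied_Count": counts["satisfied"],
        "Tolerating_Count": counts["tolerating"],
        "Total_Samples": len(samples),
    }


print(zone_counts([0.5, 1.0, 2.0, 4.0, 4.5], t=1.0))
```

Frustrated samples need no separate counter here, because they are whatever remains of Total_Samples.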
[4] Calculating the Index
The Apdex does not entail new measurements; rather, it is a new way to represent an existing set of measurements, reflecting the degree to which those measurements achieve designated targets.
[4.1] The Standard Apdex Formula
The Apdex index is calculated as follows. Given:
Measurement data that meets the requirements of section [3.1]
A report group comprising measurement samples, defined according to section [3.2]
Three performance zones (Satisfied, Tolerating, Frustrated), defined according to sections [3.3] and [3.4]
An allocation process that assigns each sample to a performance zone, and counts all samples, so that:
Total_Samples is the number of all samples in the report group
Satisfied_Count is the number of report group samples in the Satisfied Zone
Tolerating_Count is the number of report group samples in the Tolerating Zone
Then the Apdex index for the report group is:
Apdex = (Satisfied_Count + Tolerating_Count/2) / Total_Samples
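As a quick check of the standard formula, the following Python sketch computes the index from the three counts named above. The function name and the input validation are my own additions; the draft spec text does not say how a tool should react to an empty report group.

```python
def apdex(satisfied_count: int, tolerating_count: int, total_samples: int) -> float:
    """Standard formula: (Satisfied_Count + Tolerating_Count/2) / Total_Samples."""
    if total_samples <= 0:
        # Assumption: an empty report group has no meaningful index.
        raise ValueError("report group must contain at least one sample")
    if satisfied_count + tolerating_count > total_samples:
        raise ValueError("zone counts exceed Total_Samples")
    return (satisfied_count + tolerating_count / 2) / total_samples


# 60 satisfied, 30 tolerating and 10 frustrated samples out of 100
print(apdex(60, 30, 100))  # 0.75
```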
Pros and Cons of Configurable Scoring
-- Is Apdex Sharp Enough? Alan Ackers, Apdex Symposium, 2008 [ACKE08]
One way to describe an Apdex score is as the weighted proportion of satisfactory samples in a report group. Samples in the Satisfied Zone have weights of 1, those in the Frustrated Zone have weights of 0. I see no reason why those weights should change. But today, all samples in the Tolerating Zone have weights of ½, regardless of their values.
Alan Ackers and Neil Gunther [GUNT09] discuss further partitioning the Tolerating Zone into sub-zones, to make the Apdex score more responsive to the distribution of tolerating samples. Similar ideas, like scoring measurements within the Tolerating Zone on a sliding scale between 1 and 0, have been raised during informal discussions of Apdex at conferences like CMG. All such proposals assign progressively higher weights to samples approaching the Satisfied Zone, and lower weights to those approaching the Frustrated Zone. Ackers and others have described some pros and cons of this idea, which I summarize below:
- Less exposure to boundary effects: This is unlikely to matter when reporting on transaction response times, which tend to vary widely. But when measuring a rapid and well-defined activity like a database lookup, rounding effects can produce clumps of similar or identical samples. In that situation, a small change to an Apdex threshold can move a large group into a different performance zone, significantly altering the Apdex score. A graduated scoring function reduces the impact of this type of boundary effect, because samples close to the Satisfied or Frustrated thresholds already have weights close to 1 or 0 respectively.
- Harder to sell Apdex politically: The formula is not as simple conceptually, making it harder to explain and justify to business management as a performance indicator.
- Computational complexity: The sample sorting process is far more involved, because more buckets and comparisons are needed. The more complex calculation requires additional processor usage and results in slower calculations, which are not good for tools that need to calculate Apdex in real time.
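The boundary effect in the first point is easy to reproduce. The Python sketch below uses invented data (90 lookups rounded to 10 ms, 10 slow ones at 50 ms) and a hypothetical linear sliding scale, then moves the Satisfied threshold from 10 ms to 9 ms while holding the Frustrated threshold at 40 ms. All names and numbers are assumptions chosen for illustration.

```python
def standard_score(samples, t, f):
    """Standard Apdex weights of 1, 1/2 and 0 (lower values are better)."""
    weights = [1.0 if x <= t else 0.5 if x <= f else 0.0 for x in samples]
    return sum(weights) / len(weights)


def graduated_score(samples, t, f):
    """Hypothetical linear sliding scale: weight 1 at t, falling to 0 at f."""
    weights = [
        1.0 if x <= t else (f - x) / (f - t) if x <= f else 0.0
        for x in samples
    ]
    return sum(weights) / len(weights)


# Invented data: a clump of rounded 10 ms lookups plus a few slow ones.
lookups_ms = [10] * 90 + [50] * 10

for t in (10, 9):
    print(t,
          round(standard_score(lookups_ms, t, 40), 3),
          round(graduated_score(lookups_ms, t, 40), 3))
```

With these numbers the standard score falls from 0.90 to 0.45 when the threshold moves by 1 ms, while the graduated score only moves from 0.90 to about 0.87.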
In practice, for most measurement domains, if the average of all weights used (which must be values between 1 and 0) remains ½, then I think applying any graduated scoring system within the Tolerating Zone would be unlikely to make much difference to the result of an Apdex calculation. In that case, the current method of using a constant weight of ½ for all samples in the Tolerating Zone is preferable, because it is much simpler to compute.
On the other hand, we do want Apdex-G to be applicable to any measurement domain that uses a three-category classification scheme. And allowing an Addendum the option of specifying a more general rule for scoring samples within the Tolerating Zone does not seem to undermine any fundamental characteristic of Apdex.
Therefore I propose that Apdex-G adopt the current Apdex formula as the standard scoring method, but also allow an Addendum to substitute an alternative scoring function (such as a sliding scale) for the factor Tolerating_Count/2 in the standard formula. In the draft spec for Apdex-G (continued below), I have included that option as section [4.1.2], following a note on the standard formula that I have carried over from the current Apdex spec, with some clarifying wording changes.
The Formula in Action
Note that measurements in the frustrated zone are counted in the number of total user samples in the denominator. To achieve the optimal Apdex value of 1.00, all users must experience satisfactory performance. If some users see tolerating or frustrating performance, then the index rapidly dips below 1.00. For example, if 80% of users are satisfied and 10% are tolerating, while the remaining 10% are frustrated, the index is 0.85.
[4.1.1] The Standard Formula in Action
Note that measurements in the Frustrated Zone are counted in Total_Samples, the denominator of the formula, but do not contribute to the numerator.
Another way to describe the Apdex index is as the weighted proportion of satisfactory samples in the report group. Samples in the Satisfied Zone have weights of 1, those in the Tolerating Zone have weights of ½, and those in the Frustrated Zone have weights of 0.
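The weighted-proportion view can be checked against the example from the current spec (80% satisfied, 10% tolerating, 10% frustrated). In this small Python sketch the zone labels and the function name are illustrative assumptions; the index is simply the mean of the per-sample weights.

```python
# Per-zone weights of the standard formula.
WEIGHTS = {"satisfied": 1.0, "tolerating": 0.5, "frustrated": 0.0}


def apdex_from_zones(zones):
    """Mean weight of all samples, given each sample's zone label."""
    return sum(WEIGHTS[z] for z in zones) / len(zones)


zones = ["satisfied"] * 80 + ["tolerating"] * 10 + ["frustrated"] * 10
print(apdex_from_zones(zones))  # 0.85
```

The frustrated samples add nothing to the sum but still count toward the number of samples, which is why they pull the index down.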
[4.1.2] Alternative Scoring in the Tolerating Zone
If the characteristics of a particular measurement and reporting domain justify using graduated weights within the Tolerating Zone, an Addendum may substitute an alternative scoring function (such as a sliding scale) for the factor Tolerating_Count/2 in the standard formula.
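One way a tool might support this option is to accept the Tolerating Zone scoring rule as a parameter that defaults to the constant ½. The Python sketch below is only an illustration under that assumption: apdex_g, linear_scale, the zone labels and the thresholds 2 and 8 are all hypothetical, and the linear scale assumes a domain where lower values are better.

```python
from typing import Callable, Iterable


def apdex_g(
    samples: Iterable[float],
    classify: Callable[[float], str],
    tolerating_score: Callable[[float], float] = lambda _sample: 0.5,
) -> float:
    """Apdex with a replaceable scoring rule for the Tolerating Zone."""
    total = 0
    numerator = 0.0
    for sample in samples:
        total += 1
        zone = classify(sample)
        if zone == "satisfied":
            numerator += 1.0
        elif zone == "tolerating":
            weight = tolerating_score(sample)
            if not 0.0 <= weight <= 1.0:
                raise ValueError("tolerating weights must lie between 0 and 1")
            numerator += weight
        # Frustrated samples add 0 but still count in the denominator.
    if total == 0:
        raise ValueError("report group must contain at least one sample")
    return numerator / total


def linear_scale(t: float, f: float) -> Callable[[float], float]:
    """Hypothetical sliding scale: weight 1 at t, falling linearly to 0 at f."""
    return lambda x: (f - x) / (f - t)


def classify(x: float) -> str:
    """Illustrative zones with thresholds 2 and 8 (lower is better)."""
    return "satisfied" if x <= 2 else "tolerating" if x <= 8 else "frustrated"


spread = [1, 2, 3.5, 5, 6.5, 9]      # tolerating samples spread evenly
clustered = [1, 2.6, 2.6, 2.6, 9]    # tolerating samples near the Satisfied Zone
for group in (spread, clustered):
    print(apdex_g(group, classify), apdex_g(group, classify, linear_scale(2, 8)))
```

For the evenly spread group the two rules give the same index, because the sliding-scale weights average ½. For the clustered group the sliding scale gives roughly 0.74 against 0.5 for the standard rule.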
Errors and Exceptions
Not all measurements reflect normal operation of the system or process under study. And because the intent of an Apdex index is to track the degree to which a system under study is achieving designated targets, errors and exceptions cannot be ignored. Indeed, they should be counted whenever possible, because they usually indicate that a system or process failed to achieve a target performance level.
Apdex-G provides some general guidelines for handling exceptions, which may be refined and extended by an Addendum to cover domain-specific errors and exceptions. The discussion in the current spec will be reviewed for inclusion in Apdex-R.
§4.2 Dealing with Exceptions
User aborts are factored into the above equation. A user abort occurs when a user enters a new inquiry before the system responds with the original inquiry. A user-generated abort stops the timing of the Task. Therefore, user aborts can fall into any of the satisfied, tolerating, frustrated zones. If a tool can detect a clear server-generated abort, then it is handled differently. Server aborts (e.g., TCP closes within a Task) are counted as a frustrated sample regardless of the Task time measurement.
Some tools may have the optional capability to interpret the application to a greater level of detail than the minimal Task boundary. For example, they may be able to detect user relevant information at the layer of the application logic. If the tool can detect Task errors, then these application errors (e.g. Web page 404 replies) are counted as frustrated samples.
[4.2] Dealing with Exceptions
When a Report Group contains samples marked as errors or exceptions, tools performing the Apdex calculation should classify those measurements as follows:
(a) Exceptions caused by intentional user intervention may be classified as Satisfied, Tolerating, or Frustrated in the same way as any other measurement, if the necessary field(s) are present in the sample. If the field(s) required to perform classification are absent, user-generated exceptions should be classified as Satisfied.
(b) Exceptions indicating abnormal system or process behavior may be classified as Tolerating if the system or process immediately returned to normal operation without requiring abnormal intervention.
(c) Exceptions indicating a system or process failure should be classified as Frustrated.
(d) Exceptions indicating measurement errors should be discarded from the Report Group.
(e) Exceptions not amenable to classification should be discarded from the Report Group.
An addendum may specify domain-specific refinements to these general guidelines.
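Rules (a) through (e) amount to a small decision table. The Python sketch below shows one possible encoding; the enum names are hypothetical, and it assumes the tool chooses to apply the optional Tolerating classification in rule (b). A return value of None means the sample is discarded from the Report Group.

```python
from enum import Enum
from typing import Optional


class Zone(Enum):
    SATISFIED = "satisfied"
    TOLERATING = "tolerating"
    FRUSTRATED = "frustrated"


class ExceptionKind(Enum):
    USER_INTERVENTION = "a"   # intentional user intervention
    TRANSIENT = "b"           # abnormal behavior, immediate recovery
    FAILURE = "c"             # system or process failure
    MEASUREMENT_ERROR = "d"   # the measurement itself is faulty
    UNCLASSIFIABLE = "e"      # not amenable to classification


def classify_exception(
    kind: ExceptionKind, measured_zone: Optional[Zone] = None
) -> Optional[Zone]:
    """Return the zone for an exception sample, or None to discard it.

    measured_zone is the zone the sample would get from its ordinary
    fields, or None when those fields are absent.
    """
    if kind is ExceptionKind.USER_INTERVENTION:
        # Rule (a): classify normally if possible, otherwise Satisfied.
        return measured_zone if measured_zone is not None else Zone.SATISFIED
    if kind is ExceptionKind.TRANSIENT:
        # Rule (b): assumption that the tool opts in to Tolerating.
        return Zone.TOLERATING
    if kind is ExceptionKind.FAILURE:
        return Zone.FRUSTRATED  # rule (c)
    return None                 # rules (d) and (e): discard


print(classify_exception(ExceptionKind.USER_INTERVENTION))
print(classify_exception(ExceptionKind.MEASUREMENT_ERROR))
```

Discarded samples should also be left out of Total_Samples, since they are removed from the Report Group before the index is calculated.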
As usual, all these proposals are open for public discussion. Please use the comment form below to contribute any comments, suggestions, or questions.