I’m writing a series of posts about Generalizing Apdex. This is #6.
The current Apdex specification is entirely focused on application response time, specifically on the response time of Tasks and Task Chains. However, people have already adopted the Apdex method as a convenient way to report on other metrics, including network “turns”, bandwidth, and VOIP quality scores. These adaptations demonstrate the strength and adaptability of the core Apdex concepts. But because such ad hoc extensions fall outside the standard, they are likely to be implemented in inconsistent ways. By generalizing the Apdex spec, we aim to rectify that situation, bringing a wide class of metrics in a wide range of measurement domains within the scope of the standard.
Which Apdex features are candidates to be generalized? To answer that question, I will review the current Apdex specification. For clarity, I will use the section symbol (§) when referring to specific sections or subsections of the spec. During this pass, my goal is only to identify the major aspects of the current standard to be reviewed and reworked, not to address the actual structure or language of the generalized spec. It will be much easier to produce precise language once the concepts are clear.
§1. Introduction
This section introduces the document and its relationship to the Alliance and to other documents. The Alliance considered this area when it began working on generalization in 2007. So I will begin by adopting an earlier decision as a working assumption, refining it with some new terminology to simplify our discussions:
We should separate the core Apdex method into its own specification, and create distinct spec documents that contain any rules associated with domain-specific applications of the Apdex method. I will refer to the domain-specific components of the spec as addenda.
To simplify the discussion here, I will also introduce letter suffixes to denote specific elements of an Apdex spec. This idea is modeled on the naming scheme used by the Transaction Processing Performance Council for their TPC benchmarks.
I will refer to the new core specification as Apdex-G. This document will contain definitions and notation for all the core Apdex features we decide to generalize, such as targets, zones, and scoring. If I need to refer to a domain-specific addendum, I will use a suffix appropriate to its particular domain, such as Apdex-R for response time, Apdex-V for VOIP quality, Apdex-S for service quality, Apdex-N for network performance, etc.
This naming scheme, and my particular choices of letter suffixes, are arbitrary editorial conveniences at this point; they have never been discussed by the Alliance. However, there may be marketing benefit from defining different versions of Apdex in this way. The existence of separately identified versions of the Apdex standard may help to emphasize its broad applicability. If you have opinions or alternative suggestions as to the best way to name and distinguish the components of a generalized Apdex spec, please contribute them in a comment to this post.
Finally, while we’re reviewing §1, we should at least note that the name–Application Performance Index–is not a neutral one:
While it can be reasonably applied to many metrics in many measurement domains, there are probably some in which the Apdex method would be more applicable than the name itself. Does that mean we should not seek to apply Apdex in those domains? I don’t think so; it’s sufficient that the name reflect the history of the method. In practice this may not be a real issue, but the origin of Apdex should be noted somewhere in the introduction to Apdex-G.
§2. Index Overview
[§2 (preamble), §2.3, §2.4] These sections are specific to measurements of application responsiveness; they belong in Apdex-R.
[§2.5] This section introduces the concepts of Apdex thresholds (T and F) and performance zones (Satisfied, Tolerating, and Frustrated), but in the context of application responsiveness. This must be generalized, with all discussion of application responsiveness moved to Apdex-R. In particular, Apdex-G must allow for separately configurable thresholds and, potentially, for configurable zone alignment.
The present spec (§2.5) contains the rule that the value of F is four times the value of T. This rule of thumb, while convenient, is not generally applicable outside the application responsiveness domain; it must be moved to Apdex-R. Since F cannot, in general, be derived from T, Apdex-G must permit both T and F to be specified separately, and make no assumptions about which is the larger value. In some measurement domains, larger values are more desirable.
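To make the idea concrete, here is a minimal Python sketch of separately specified thresholds. Everything in it is my own illustration rather than proposed spec language: the `Thresholds` class, the `response_time_thresholds` helper, and the choice to treat a measurement exactly equal to a threshold as belonging to the better zone are all assumptions. The direction of "better" is inferred from whether T is smaller or larger than F.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container: T and F are supplied independently."""
    t: float  # boundary between Satisfied and Tolerating
    f: float  # boundary between Tolerating and Frustrated

    def zone(self, value: float) -> str:
        if self.t == self.f:
            raise ValueError("T and F must differ")
        if self.t < self.f:
            # Smaller values are better (for example, response times).
            if value <= self.t:
                return "Satisfied"
            if value <= self.f:
                return "Tolerating"
            return "Frustrated"
        # Larger values are better (for example, a quality score).
        if value >= self.t:
            return "Satisfied"
        if value >= self.f:
            return "Tolerating"
        return "Frustrated"


def response_time_thresholds(t: float) -> Thresholds:
    """Apdex-R style convenience: the F = 4T rule lives in the addendum."""
    return Thresholds(t=t, f=4 * t)


# Illustrative values only; the quality-score thresholds are made up.
rt = response_time_thresholds(2.0)
quality = Thresholds(t=4.0, f=3.0)
print(rt.zone(5.0), quality.zone(3.5))
```

The point of the sketch is the division of labor: the core classification makes no assumption about how F relates to T, while the four-times rule survives as a domain-specific default.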
Separately Configurable Thresholds will be the subject of a future post.
These changes are unlikely to encounter any objections. They have been proposed for several years within the Apdex community, and are already employed in some Apdex-based reporting. For example, Keynote Systems implemented them in their VOIP quality reports. See Clarity in Voice Performance Measurements with Apdex, presented by Ken Harker at the Apdex Symposium in December 2007 [PPT file].
But more complicated requirements may arise:
Up to now, Apdex has been applied exclusively to report metrics whose measure of “quality” decreases (or increases) monotonically. For response times, smaller numbers are always better. For VOIP MOS scores, larger numbers are always better. But this characteristic does not necessarily apply to every metric that could be reported using Apdex. We can envisage Apdex being used to report on metrics for which:
- closeness to a target value represents “quality,” with user satisfaction being tied to the distance of a measurement (above or below) from the target
- the zones are ordered differently, such as Tolerating > Satisfied > Frustrated (this can be true for service quality scores, for example)
To accommodate these cases, we would need Apdex-G to provide more flexibility when configuring targets and zones. See also “Configurable Scoring” under §4 below.
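One way to picture this extra flexibility is to describe the zones as an ordered list of boundaries with a label for each interval between them. The Python sketch below is an assumption of mine, not a proposal from the Alliance: the function name `make_zone_map`, the example boundary values, and the convention that a value equal to a boundary falls into the interval below it are all hypothetical.

```python
from bisect import bisect_left


def make_zone_map(boundaries, labels):
    """Return a function mapping a measurement to a zone label.

    boundaries: strictly increasing values that split the number line.
    labels: one label per interval, so len(labels) == len(boundaries) + 1.
    """
    boundaries = list(boundaries)
    if boundaries != sorted(set(boundaries)):
        raise ValueError("boundaries must be strictly increasing")
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one label per interval")

    def zone(value):
        # A value equal to a boundary is assigned to the interval below it.
        return labels[bisect_left(boundaries, value)]

    return zone


# Closeness to a target: Satisfied near 21, Tolerating a little further out.
near_target = make_zone_map(
    [18, 20, 22, 24],
    ["Frustrated", "Tolerating", "Satisfied", "Tolerating", "Frustrated"],
)

# A different ordering: Tolerating > Satisfied > Frustrated.
reordered = make_zone_map([3.0, 4.5], ["Frustrated", "Satisfied", "Tolerating"])

print(near_target(21), near_target(23), reordered(5.0))
```

Both of the cases listed above fit the same representation, which suggests that a single notation for zone alignment might be enough; whether that notation is too complex for the simple cases is a separate question.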
Configurable Zone Alignment will be the subject of a future post, in which I will aim to address all requirements for separately configurable threshold values.
[§2.6] How the Index Works: This needs minor wording changes to remove any references to ‘seconds’.
§3. Apdex Calculation Inputs
[§3.1] Response Time Measurements: most of this section must be moved to Apdex-R. It can be replaced by more general guidelines about measurement tools.
[§3.2] The Report Group definition must be generalized, removing references to Tasks and Task Chains.
[§3.3] The entire discussion of Thresholds must be generalized, removing references to ‘seconds’ and incorporating any decisions made about “Configurable Zone Alignment” (§2.5).
[§3.4] Number of Measurement Samples: The core ideas in this section can remain, with minor wording changes. However, the minimum number of samples is probably domain-specific, so that rule should be moved to the addenda. Apdex-R should specify a minimum of 100 samples.
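As a small illustration of moving this rule into the addenda, the sketch below keeps the sample-size check generic and lets each addendum supply its own minimum. The `Addendum` class and `sample_size_ok` function are hypothetical names of mine; only the figure of 100 samples for Apdex-R comes from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Addendum:
    """Hypothetical record of the domain-specific rules in one addendum."""
    name: str
    min_samples: int


APDEX_R = Addendum(name="Apdex-R", min_samples=100)


def sample_size_ok(n_samples: int, addendum: Addendum) -> bool:
    """Core check; the minimum itself is supplied by the addendum."""
    if n_samples < 0:
        raise ValueError("sample count cannot be negative")
    return n_samples >= addendum.min_samples


print(sample_size_ok(250, APDEX_R), sample_size_ok(40, APDEX_R))
```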
§4. Calculating the Index
The present spec defines an attractively simple scoring rule in which measurements falling within the zones Satisfied, Tolerating, and Frustrated receive scores of 1, ½, and 0 respectively. Alan Ackers and Neil Gunther have discussed possible enhancements that would involve retaining the core approach to scoring while allowing for more than two targets (or thresholds), thereby creating more than three zones, and using finer scoring gradations (but still between 0 and 1) for rating measurements that fall within those zones.
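The scoring rule, and the kind of extension described above, can be sketched in a few lines of Python. This is my own illustration under stated assumptions: the index is computed as the mean of the per-sample scores, the function name `apdex_score` is hypothetical, and the five-zone labels and weights are invented purely to show finer gradations between 0 and 1.

```python
from collections import Counter

# The present rule: 1, 1/2, and 0 for the three zones.
CLASSIC_WEIGHTS = {"Satisfied": 1.0, "Tolerating": 0.5, "Frustrated": 0.0}


def apdex_score(zones, weights=CLASSIC_WEIGHTS):
    """Average the per-zone scores over all samples."""
    zones = list(zones)
    if not zones:
        raise ValueError("no samples to score")
    if any(not 0.0 <= w <= 1.0 for w in weights.values()):
        raise ValueError("zone scores must lie between 0 and 1")
    counts = Counter(zones)
    unknown = set(counts) - set(weights)
    if unknown:
        raise ValueError(f"no score defined for zones: {sorted(unknown)}")
    return sum(weights[z] * n for z, n in counts.items()) / len(zones)


# 60 Satisfied, 30 Tolerating, 10 Frustrated: (60 + 30/2) / 100
samples = ["Satisfied"] * 60 + ["Tolerating"] * 30 + ["Frustrated"] * 10
classic = apdex_score(samples)

# Hypothetical five-zone scheme with finer gradations.
FIVE_ZONE_WEIGHTS = {"A": 1.0, "B": 0.75, "C": 0.5, "D": 0.25, "E": 0.0}
five_zone = apdex_score(["A", "B", "C", "D", "E"], FIVE_ZONE_WEIGHTS)

print(classic, five_zone)
```

Seen this way, the existing three-zone rule is just one weight table among many, which is why the scoring and zone questions have to be settled together.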
The changes necessary to permit these kinds of generalization are closely related to those discussed under “Configurable Zone Alignment” (§2.5) and the two topics must be considered together.
Configurable Scoring will be the subject of a future post.
[§4.1] The description of the Apdex Formula must be rewritten to incorporate any decisions made about “Configurable Zone Alignment” and “Configurable Scoring”.
[§4.2] This discussion of Exceptions must be moved to Apdex-R. I’m not sure if we need to replace it with any more general equivalent.
§5. Reporting the Index
[§5.1, §5.2, §5.3] The rules for Displaying the Apdex Value and Indicating Sample Size must be generalized to handle “Separately Configurable Thresholds” (§2.5). Also, these requirements of the present Apdex standard may need to be modified to allow Apdex to be applied in many other domains.
The issues to be considered in the generalization process are (1) the degree of generalization to be accommodated, (2) the notation to be used to describe the more general targets, zones, and scoring, and (3) because the more general notation will necessarily be more complex, whether to retain any simpler notation for use in simpler cases like those defined in the present spec. This is an area for further research; I will try to identify some specific examples of the requirements of other domains.
Configurable Reporting will be the subject of a future post.
[§5.4] Additional Reporting Rules
The thresholds defining the Apdex Rating Bands (Excellent, Good, Fair, Poor, Unacceptable) are already optional in §5.4 of the present spec. If they are implemented, however, the details must follow the specification.
According to Jonathan Becher, such grades are an essential feature of a KPI. However, I suspect that the Apdex thresholds, and the names of the five rating bands they define, may need to be configurable so that an Apdex implementation can conform to existing practice within a measurement domain, or within a user organization.
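Here is a rough Python sketch of what configurable rating bands might look like. The `rating` function and the traffic-light example are hypothetical. The default lower bounds (0.94, 0.85, 0.70, 0.50) reflect my recollection of the bands in the present spec and should be treated as an assumption to be checked against §5.4.

```python
# (lower bound, band name); default values are an assumption, see above.
DEFAULT_BANDS = [
    (0.94, "Excellent"),
    (0.85, "Good"),
    (0.70, "Fair"),
    (0.50, "Poor"),
    (0.00, "Unacceptable"),
]


def rating(score, bands=DEFAULT_BANDS):
    """Return the name of the highest band whose lower bound the score meets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("an Apdex score must lie between 0 and 1")
    for lower, name in sorted(bands, reverse=True):
        if score >= lower:
            return name
    raise ValueError("bands do not cover this score")


# A hypothetical organization-specific scheme with three bands.
TRAFFIC_LIGHT = [(0.90, "Green"), (0.60, "Amber"), (0.00, "Red")]

print(rating(0.75), rating(0.75, TRAFFIC_LIGHT))
```

Because the bands are just data, an implementation could conform to local practice by swapping the table without touching the scoring logic.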
Configurable Rating Bands will be the subject of a future post.
§6. References
We should review this section and decide what is appropriate as the Apdex-G spec evolves. One possibility is to convert this section into a “History” section, unless we have references that support the concepts behind the generalized version of Apdex. There is some flexibility in the choice and allocation of explanatory background material to sections §1, §2, and §6, so I intend to defer decisions on the exact composition of those sections until later in the process.
§7. Glossary
See my previous post, An Extensible Apdex Glossary. My current view is that all Apdex spec documents should share a common glossary; any other approach seems likely to be a maintenance headache.
I will be following up with these posts (but not in this order). I am introducing an unobtrusive color convention for the seven parts of Apdex-G; this may turn out to be useful during detailed discussions of components of the new spec:
- Section 1: Generalizing the Apdex Language
- Section 1: Apdex-G Section  Introduction
- Section 2: Apdex-G Section  Index Overview
- Section 2: Separately Configurable Thresholds
- Section 2: Configurable Zone Alignment
- Section 2: Generalizing the Apdex Thresholds
- Section 3: Apdex-G Section  Calculation Inputs
- Section 4: Configurable Scoring in Apdex-G
- Section 5: Configurable Reporting in Apdex-G
- Section 5: Report Groups and Quality Ratings in Apdex-G
- Section 6: Apdex-G Background and References
- Section 7: An Extensible Apdex Glossary
These posts will discuss Apdex-G at the next level of detail, and I will link each post as it is published. Otherwise I do not plan to extend or refine this post unless I discover an error or omission. However, my conclusions in this post are certainly open for public comment. If you post a question or suggestion below, I will respond.