Action graphs and user performance analysis
Article code | Publication year | Length of English article |
---|---|---|
28077 | 2013 | 27 pages (PDF) |
Publisher: Elsevier - Science Direct
Journal : International Journal of Human-Computer Studies, Volume 71, Issue 3, March 2013, Pages 276–302
English abstract
A user operating an interactive system performs actions such as “pressing a button” and these actions cause state transitions in the system. However, to perform an action, a user has to do what amounts to a state transition themselves, from the state of having completed the previous action to the state of starting to perform the next action; this user transition is out of step with the system's transition. This paper introduces action graphs, an elegant way of making user transitions explicit in the arcs of a graph derived from the system specification. Essentially, a conventional transition system has arcs labelled in the form “user performs action A” whereas an action graph has arcs labelled in the form “having performed action P, the user performs Q.” Action graphs support many modelling techniques (such as GOMS, KLM or shortest paths) that could have been applied to the user's actions or to the system graph, but because an action graph combines both, the modelling techniques can be used more powerfully. Action graphs can be used to directly apply user performance metrics and hence perform formal evaluations of interactive systems. The Fitts Law is one of the simplest and most robust of such user modelling techniques, and is used as an illustration of the value of action graphs in this paper. Action graphs can help analyze particular tasks, any sample of tasks, or all possible tasks a device supports, the last of which would be impractical for empirical evaluations. This is an important result for analyzing safety critical interactive systems, where it is important to cover all possible tasks in testing even when doing so is not feasible using human participants because of the complexity of the system. An algorithm is presented for the construction of action graphs.
Action graphs are then used to study devices (a consumer device, a digital multimeter, an infusion pump) and results suggest that: optimal time is correlated with keystroke count, and that keyboard layout has little impact on optimal times. Many other applications of action graphs are suggested.
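The abstract's use of the Fitts Law to cost a movement between two buttons can be sketched as follows. This is a minimal illustration, assuming the widely used Shannon formulation T = a + b·log2(D/W + 1); the function names, the constants `a` and `b`, and the button coordinates are hypothetical placeholders, not the parameters the paper gives in its Appendix A.1.

```python
import math

def fitts_time(distance, width, a=0.0, b=0.1):
    """Movement time in seconds under the Shannon form of Fitts' law.

    a and b are placeholder constants (assumptions), not the paper's values.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return a + b * math.log2(distance / width + 1.0)

def sequence_time(buttons, centres, width, a=0.0, b=0.1):
    """Sum the movement times between consecutive button presses.

    Only finger movement between presses is counted; the first press
    is assumed to cost nothing (a simplifying assumption).
    """
    total = 0.0
    for p, q in zip(buttons, buttons[1:]):
        (x1, y1), (x2, y2) = centres[p], centres[q]
        total += fitts_time(math.hypot(x2 - x1, y2 - y1), width, a, b)
    return total

# Hypothetical two-button panel, coordinates in millimetres.
centres = {"A": (0.0, 0.0), "B": (30.0, 40.0)}
print(sequence_time(["A", "B", "A"], centres, width=10.0))
```

Because the cost is an ordinary function of the previous and next button, any other programmable model (KLM operators, financial cost) could be substituted without changing the surrounding analysis.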
English introduction
Predicting how users will perform using an interactive system is a key part of the science of HCI as well as a practical part of usability analysis. This paper introduces action graphs, which generalise finite state machines to allow analysis of user actions. The dimensions of buttons and their separation along with an action graph can be used to predict time or other costs the user incurs for any sequence of activities. Since times are calculated using programs, any programmable function can be used, such as the Fitts Law, KLM or other model (even financial costs). This paper provides an algorithm (in Java) to convert a standard model into an action graph; our work is reproducible and could be embedded into analysis tools. This makes a significant advance on our previous work (Thimbleby, 2007a and Gimblett and Thimbleby, 2010). Almost any interactive system can be analyzed with action graphs, though the example case studies in this paper are based on “control panel” devices with a small keypad, rather than typewriter (QWERTY)-based devices; thus this paper is not explicitly concerned with information-based applications (word processing, data entry, diaries, address books, etc.) but with the control of systems (such as instrumentation, medical devices, consumer devices). That said, an abstract view of a complex application such as a word processor may have interesting control features, say, in its dialog or menu structures, which would be amenable to action graph analysis. Our action graph case studies suggest that optimal task time and keystroke counts are correlated and, surprisingly, that keyboard layout is not a significant factor in optimal task times. However, such results are but a small contribution of the paper, since action graphs can be used to explore many further issues.
1.1. Conceptual background: tasks, generalisations and abstractions
This paper presents a mathematical framework to address certain HCI questions, and its main benefits are that it permits a complete and automatic analysis of certain issues previously beyond the reach of researchers (except in the very simplest of cases). As a piece of mathematics, it is correct; the key question, then, is whether it can be applied to HCI in an appropriate and useful way. By way of comparison, “addition” is correct mathematically, but whether and to what extent it can be usefully applied to real-world questions, say, about money and cash depends on various non-mathematical, or at least “non-addition” issues. For example, in “the real world” there are inflation and interest and bank charges and even thieves, so money in a bank account does not quite follow the usual laws of addition without a lot of qualification. Cash, however, is more familiar than HCI theory and certainly far clearer than action graphs, which this paper introduces! We will therefore use the very familiar territory of cash as a conceptual bridge to help make some of the HCI issues of action graphs clearer: very familiar issues with cash and addition have interestingly analogous issues in the less familiar territory of action graphs. The big picture could be put like this: although one would hardly think of dismissing the abstract idea of addition because of the technicalities of inflation, it might be tempting to dismiss action graphs because of “their” problems when in fact the problems are more to do with the complexity of HCI. In particular, the rigour of action graphs highlights many boundary problems that deserve more research, in much the same way that an apparent failure of addition on your bank account might reveal a thief or something even more interesting at work that deserves closer investigation rather than dismissing theory that does not cover everything.
If different sorts of coins are to be added for a cash value, they should be treated with different values. In this paper, we use our approach to add times due to finger movement, but it could also be used to add times (or even cash values) from other sources. We use the Fitts Law to estimate times for a user to do tasks, but we could have used, for instance, KLM (Card et al., 1980), which would add further types of time values. Mathematically, this is trivial, but for the first paper introducing the approach it adds a level of complexity; in fact, we side-step this complexity by emphasising lower bounds on times. KLM would increase timing estimates, but does not affect hard results from lower bounds. (The second case study, discussed in B.1, introduces “button hold operators,” which shows that generalizations like KLM are trivial.)
If cash (e.g., from a loan) is to be added, it may have a time-dependent value. We assume the user interface has a fixed physical layout, as occurs on physical devices such as industrial control panels. The mathematics can handle dynamic, soft key layouts, too, but for the purposes of this paper such dynamic features introduce unnecessary complexity.
If very large amounts of cash are to be added, a computer program may overflow and give incorrect results. We use a computer to perform calculations with action graphs, and as such, we are limited to work within the practical limitations of computers. This means there are some interactive systems that are too complex to be satisfactorily analyzed, but we would argue that such systems raise HCI questions of a different nature than our approach is intended to handle. Moreover, a system that is too complex for a computer to analyze is possibly too complicated for conventional concepts of usability to be applied.
If people do not declare all their capital and cash flow, one will obtain incorrect results.
People often ignore illiquid capital, because they are only interested in cash, or perhaps because they are trying to pay smaller insurance premiums. In other words, one has to be clear what the task is, and then analyze it correctly. Our approach uses action graphs. Any task a user performs on a device changes the state of the device's action graph; thus, every task corresponds to a state change. Just as there are some types of monetary value one may not wish to declare, there are some types of state change that one may (or may not) consider to be valid tasks. For example, a type of task one might want to ignore when analysing a ticket machine is “press buttons, insert cash, but do not get a ticket.” Undoubtedly the device has a sequence of states corresponding to this failed task! For some analyses, one might want to know the time cost to the user of failure (presumably it would be very frustrating for it to take a long time before the user discovers they cannot get a ticket), and for other analyses one might wish to ignore it. From a computational perspective, both choices are easy: we can define tasks as any state change, or define tasks as any state change ending with dispensing a ticket, or we can impose any other task criterion that interests us.
How people wish to use their cash is a question of economics, not just of addition. What tasks a user wishes to perform is beyond the scope of this paper.
Some people may be quite happy not knowing exactly how much cash they have; they do not need to use addition (adding up coins), they just shake and listen to the piggy bank, or use some other heuristic to check they have enough to live by. Although action graphs give precise answers to certain HCI questions, indeed questions that previously were impractical to address in their full generality, they do not address all HCI concerns. They are another tool for the toolbox, not a replacement toolbox.
Not everybody uses cash; how does addition work with cheques, credit cards, shares, banknotes and other forms of money? A natural question is to ask whether action graphs can handle continuous systems such as virtual reality, speech, action games, and so forth. This question is rather like saying, “I can see how addition works with coins, but how does it work with paper money?” (The answer is, you first need to be able to convert Arabic numerals into numbers.) Yes, action graphs can handle continuous systems; you first need to decide an abstraction that ensures the action graphs measure the values of interest, and just like converting the text on a banknote into a value, one will need to convert the duration of (say) a music track into a number. How that is done is an issue beyond action graphs, but once obtained, the numbers then plug into action graphs and analysis can proceed exactly as described in this paper. In fact, it is unlikely that action graphs will help much with sharp usability issues here (does the length of a track affect the usability of a music player?), but the music industry might wish to use action graphs to model costs and profits obtainable over the period while the user is downloading and listening to a track.
Finally, we often want to know the value of a pile of cash, and it is natural to add up its value to find out; addition is obviously useful. The question is, for any mathematical technique, does it tell us things we did not or could not know without it? Finally, are action graphs worth the effort? They are a new, simple technique that answers certain HCI questions; in that sense they are another contribution to the HCI literature. More specifically, when we started analysing large interactive systems we thought that cross over, which is defined below, would be a problem; it is a design issue where satisficing users may choose unexpected strategies. (In fact, action graphs were invented to handle cross over.)
It turns out that for all devices we have now analysed, cross over is not a significant factor in estimating task times. This is quite a surprise, and ironically suggests that estimates of task times for these types of device can be obtained without action graphs! The preceding comments have hopefully made the philosophical orientation easier to understand, but the comparison with something so mundane might accidentally make the approach seem equally trivial. In fact, the methodology used in this paper spans disciplines, drawing them closer. We develop some theory and analyse systems, which is superficially like and broadly similar to cognitive modelling—distinctively, cognitive modelling is usually completed with an empirical evaluation, but our approach does not rely on direct human-based evaluation, though some of our analysis relies on published results from empirical experiments. The literature on cognitive modelling cannot be briefly summarised, but see Card et al. (1983), Gray et al. (1993), Grossman et al. (2007), Kieras and Meyer (2000), Kieras et al. (1997), St. Amant and Horton (2007), Meyer et al. (1988), and Matessa et al. (2003). Research methodology in HCI owes much to the conception of science stemming from Francis Bacon (and his ideas as refined particularly by John Stuart Mill) and is empirical: put briefly, since we do not know adequate theories a priori, we should explore the world inductively to determine them. In contrast, Isaac Newton's innovation was to start with simple assumptions, explore the mathematical consequences, then turn to real conditions ( Cohen et al., 2008). If you start from the world, as Bacon recommends, you perhaps never achieve clarity, whereas with Newton's approach, you start with clarity then determine how applicable it is. Following Newton's style, then, the methodology of this paper is to start with mathematics with explicit assumptions, and then to explore the consequences of those assumptions. 
Real case studies (see Section 5.1 for the main case study, and additional case studies provided in Appendix B) show the value of the approach, but the approach can be applied far more widely. Obviously, while necessary, this alone is not sufficient for a useful contribution; therefore, we also argue that the results we achieve are unexpected and insightful.
Inevitably, Baconian science is driven by what is easy to measure. In an empirical experiment time is easy to measure, but from a system perspective keystroke count is easy. The differences in these practical considerations should encourage research on the tradeoffs between the various approaches. For example, in many contexts time is crucial, but in many others low error rate is crucial. Almost certainly, reducing keystroke counts will have a better impact on overall error rate (e.g., if the probability of error per keystroke is $p$, then the probability of an error-free sequence of $n$ keystrokes is $(1-p)^n$; this is exponential in $n$, and therefore reducing $n$ is indicated to reduce error rate); conversely, requiring users to work faster (reducing time) may increase error rates. Now from a system perspective, keystroke count is easy, even trivial, to measure, but this is not sufficient for many purposes; we need new methods to broaden the scope and impact of system-based theories.
1.2. HCI background
Newell and Card (1985, p. 237) said “striving to develop a theory that does task analysis by calculation is the key to hardening the science [of HCI],” and writing a decade later MacKenzie (1995) anticipated a future scenario: “something like this: A user interface is patched together in story-board fashion with a series of screens (with their associated soft buttons, pull-down menus, icons, etc.) and interconnecting links. The designer puts the embedded model into “analyse mode” and “works” the interface—positioning, drawing, selecting, and “doing” a series of typical operations.
When finished, the embedded model furnishes a predicted minimum performance time for the tasks (coincident with a nominal or programmable error rate). The designer moves, changes, or scales objects and screens and requests a reanalysis of the same task.” Systems like CogTool (John and Salvucci, 2005) are already a great help for designers working from storyboards, but (to date) they only evaluate specific, sequential tasks composed of relatively few steps. This paper will show how to predict optimal times a skilled user would not be able to do better than for any or all tasks, or from benchmark collections of tasks, composed of any number of choices and steps—all without the designer having to patch a story-board together or “work” the user interface as MacKenzie envisaged. Of course it remains possible to obtain estimated times for particular sequences of user actions (e.g., from story-boarded sequences) if desired. The importance of “automatically” becomes apparent when analyzing devices with thousands or more states: there are then millions of potential tasks. Card et al.'s (1983) classic The Psychology of Human–Computer Interaction argues that reducing expert time is a key principle of user interface design. Expert users often want “short cuts” such as special keystroke combinations that save work, presumably to save time as much as to reduce the number of actions they have to do. Projects such as Ernestine were driven by the conviction that “time is money” and that it was worth redesigning user interfaces to make them faster to use (Gray et al., 1993). There is considerable evidence that users optimize timings (e.g., Appert et al., 2004 and Gray and Boehm-Davis, 2000), and eventually will treat optimal or nearly optimal interaction as routine. Howes et al. (2009) give evidence that optimal time is a predictor of actual skilled performance time: people are adaptive, and with practice they improve. 
(Bailey et al., 2009 provide a review of usability testing and high-impact metrics.) In safety critical domains, conventional empirical experiments cannot cover all features of devices even of modest complexity; usability inevitably gets relegated to “look and feel” or focuses on a few tasks. Thorough empirical exploration is not possible except for the most trivial of devices. Although action graphs are only a start, more development in analytical approaches is needed to extend the scope of HCI further into systematic analysis, particularly when there is a requirement to do so, as in safety critical domains. Conventional user evaluation is costly (to pay human participants, buy laboratory time, and to manage the experiments) and must be performed later in the design cycle, after a prototype system has been made available. At this stage, insights are less likely to be fed back into the design: many decisions have already been made, and if the system works well enough to evaluate it, why not ship it? Indeed, production pressures typically mean that companies ignore poor usability provided that systems appear good enough to be shipped. In many environments, then, improving usability has negligible priority after a system “works,” for when a system appears functional it is unlikely to be revised even if revision could achieve usability gains. As soon as a specification of an interactive system is available, or as soon as program code is written, a system model can be obtained (Thimbleby and Oladimeji, 2009 and Gimblett and Thimbleby, 2010) that can be used to generate action graphs automatically—this approach is extremely useful in an iterative design process, since the model can be continually regenerated for analysis as the design is modified. 
Thus, the approach lends itself to predictive analysis, which can have a significant influence on a design because it can be used earlier, cheaper, faster and more often, and at a design phase when improvements are easier to implement. There is a great need for quantitative predictions about user performance with designs well before actual experiments with users can be contemplated. This is the key point: predictive estimates of lower bounds on time are relevant when designing for, or analyzing, skilled behavior, because skilled behavior cannot do better than the theoretical lower bounds. Other research based on this premise includes Pirolli (2007), and Gray and Boehm-Davis (2000). Illustrating these issues, in fact, most relevant published work to date (including Kieras et al. (1997), Appert et al. (2004), and John and Salvucci (2005)) is based on analyzing manually predefined scenarios: that is, given a particular sequence of user actions, estimate the time a user would take to achieve a specific goal. Menu selection (Cockburn et al., 2007 and St. Amant and Horton, 2007) is a special case where the goal is to make a selection, and where each selection has only one way to make it. Petri nets have been used, but most published papers (e.g., Lacaze et al., 2002) only show single-step times, not times for arbitrarily long sequences of actions that this paper handles, though some papers (e.g., St. Amant and Horton, 2007) explore linear sequences of actions. In all cases the system modelling seems to be limited by the difficulty of precise manual analysis; for example, St. Amant and Horton (2007) note that system features, which they ignore, such as short-cuts, would complicate their analysis. We have no such problems in this paper, because our approach is fast, general and completely automatic.
1.3. Action graphs versus KLM and CogTool
Researchers using action graphs or methods such as KLM (Card et al., 1980 and John and Salvucci, 2005) may do the same sorts of things, so it is natural to make a comparison between the approaches. KLM is usually a manual technique for estimating task times from user behavior, keystrokes, mouse movements and mental operations. It relies on having a task breakdown. Action graphs can provide this task breakdown for any or all tasks a system supports; action graphs allow KLM (or any related analysis approach) to be automated, and allow KLM to be applied without manual effort. In particular, in areas where coverage is required (e.g., for safety critical interactive systems), action graphs allow every task (perhaps millions of tasks) to be analysed automatically for any device. Previously, this has not been possible except, perhaps, in very limited contexts. CogTool is an interactive tool (with a graphical user interface) with a much more sophisticated underlying model than KLM; it is much easier to use and more accurate. CogTool allows researchers, system designers and usability professionals to build a story board of a proposed or actual system, and then run a sophisticated psychological model (using ACT/R) on it. A researcher thus obtains realistic estimates of task times (along with breakdowns) from CogTool. ACT/R is a very complex program (because it is a very realistic human performance model), and CogTool uses it as a black box. Action graphs are a theoretical model, very similar to finite state machines. They allow interactive systems to be implemented and analysed, with the advantage over finite state machines that they directly support analysis of sequences of user actions. CogTool is open source and runs on commercial Macintosh and PC platforms. Action graphs are theoretical and completely described in the present paper; they are therefore “open source” for all practical purposes.
CogTool is quite a complex system, but the CogTool web site (cogtool.hcii.cs.cmu.edu) provides substantial documentation, downloads, and access to the CogTool user community. In a sense action graphs are simple and elegant, but unfortunately they rely on multidisciplinary knowledge, graph theory, algorithms and HCI, so although they are “simple” they have a comparable learning curve to CogTool. In contrast, the present paper is the only documentation on action graphs. An interesting contrast between CogTool and action graphs is that you have to understand CogTool to use it, but action graphs could be used inside an HCI analysis program without the user of that program knowing anything about action graphs: action graphs are a means to an end, not an end in itself. CogTool could use action graphs as a means of implementing story boards and supporting ACT/R (in the present paper we use action graphs with Fitts Law, but any measure, for instance provided by ACT/R could be used). In fact, CogTool effectively implements a single path through an action graph, as the sequence of ACT/R-annotated actions a story board represents. Thus CogTool analyzes single paths through story boards, whereas action graphs are a natural representation to explore all or any subset of paths, including a single path. Since action graphs allow automatic analysis of all paths a user might take using a system, they can be used to support analysis of safety critical systems, where coverage (i.e., checking every feature) is essential. KLM and CogTool cannot do this, though if either KLM or CogTool was implemented using action graphs, it would become feasible to explore alternative user strategies, optimal behavior as well as user error. Since CogTool relies on building a story board by hand, it is impractical to analyze many design alternatives; the story boards tend to be very small in comparison with action graphs, which have no real practical limitations on size. 
On the other hand, the story board is a natural, visual representation of interaction, and this approach makes CogTool very appealing to its user community. Because of the underlying ACT/R model, the analysis of the single story board is thorough and insightful, though exploring alternatives (and keeping track of them) is tedious. Using CogTool seriously in iterative design would be burdensome: as changes to a design are made, the story boards need to be revised and this will unavoidably create a version control problem with the requirements or specification of the target system. In contrast, action graphs are used to specify a system, and how that system is originated is outside their scope. A story board would only give one (or possibly a few) alternative paths, and this would not be sufficient. In the present paper, complete system models are automatically derived from running programs using discovery (Thimbleby and Oladimeji, 2009 and Gimblett and Thimbleby, 2010), though one could equally obtain system models from specifications (written in any of the many formal specification languages that generate FSMs or BDDs).
1.4. Notations and the role of appendices
The body of this paper assumes a breadth of knowledge covering the Fitts Law, graph theory, lower and upper bounds, order notation, and algorithms. While the ideas may particularly inspire HCI researchers, the paper is also likely to be read in depth by programmers implementing tools based on the ideas. Unfortunately there is a confusing variety of assumptions and notations used in wider literature, so some short appendices have been provided to supply a coherent summary of and short introduction to the standard notations and concepts used in this paper. These brief appendices also provide references for further reading on the topics.
Appendix A.1 Fitts Law; brief introduction and details of parameters used in the present paper.
Appendix A.2 Upper and lower bounds, including fastman/slowman and bracketing.
Appendix A.3 Order notation, including $\Omega$ notation.
Appendix A.4 Graph theory and state machines.
Appendix A.5 Least cost algorithms, and why action graphs are needed for user performance analysis.
The paper develops a theory, then applies it to explore some real case studies. The main case study is presented in the body of the paper, but several other case studies are provided in Appendix B, primarily to support the argument that the main case study has the properties ascribed to it because it is typical, rather than arising by chance or (worse!) by contrivance or special selection. (We also vary the case studies to explore some more extreme keyboard layouts, see Appendix B.2.) While good conventional HCI experiments take care to control for variability in human users, we are unaware of other HCI experiments that similarly try to manage variability in device design; the space of device design is largely unexplored territory. Appendix C expands potential critiques of the case study experiments, details that would perhaps have been too technical or too distracting within the body of the paper (which already has a substantial further work section, Section 6), as well as exploring some further thought experiments.
Appendix B Additional case studies.
Appendix C Additional further work and critiques of basic results.
1.5. Graph theory device models and notations
The theory developed in this paper will generally be embedded in a tool, such as CogTool (John and Salvucci, 2005), so a typical user (e.g., an HCI professional) need not know any technical details. However, in this paper, we need to develop and justify the approach. Readers unfamiliar with graph theory notation may wish to refer to Appendix A.4. We represent an interactive device as a graph: a set of vertices $V$ (states), a set of user actions $A$, and a transition relation $T \subseteq V \times A \times V$. It is suggestive to represent elements $\langle u, a, v \rangle$ of the graph by $u \xrightarrow{a} v$.
In words, the notation $u \xrightarrow{a} v$ means that if the device is in state $u$ and the user does action $a$, the device will transition to state $v$. In graph theoretic terminology, $u \to v$ is an arc and $a$ its label. A sequence of transitions $u \xrightarrow{a} v$, then $v \xrightarrow{b} w \ldots$ is more concisely represented by $u \xrightarrow{a} v \xrightarrow{b} w \ldots$ When we are not concerned with the details of the intermediate steps $a, b, \ldots$ (what actions are, what intermediate states are visited, and how many states are visited), we use the notation $u \leadsto w$. Actions $A$ define names and the geometry of targets (i.e., physical details of the button, its name, shape and location) to perform those actions. For systems with timeouts (like “reset if user does nothing for 10 s” or “hold button for 2 s”) actions in $A$ define the appropriate timings. The model allows for soft keys and touch screens that can display changing, moving, or expanding targets for the user to press or mouse click on; $A$ is enlarged accordingly to accommodate each variation of input actions, simply by having a distinct action $a \in A$ for each unique user action. Thus if the “same” button can appear in different places (a common strategy to stop users habituating, and, say, clicking OK without checking a warning), we still need each place to have a separate action for our analysis.
We use the following notation for properties of sequences of actions, $\sigma$:
$|\sigma|_\#$: the number of user actions; thus if $A, B, C$ are actions then $|ABC|_\# = 3$.
$\lfloor\sigma\rfloor_T$: the lower bound time to perform the actions.
$\sigma_1 \equiv \sigma_2$: if the two sequences of actions have the same effect on the system.
In this paper, the system model $M$ and the initial state $s_i$ will be readily understood from the context; the standard notation $M, s_i \models \mathit{formula}$ would be used in more formal presentations.
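The device model above, and the idea of arcs of the form “having performed P, the user performs Q,” can be sketched in a few lines. This is not the paper's Java algorithm (which is not reproduced in this excerpt) but one plausible reading, assumed for illustration: every transition in $T$ becomes a vertex, and two transitions are joined when the first ends in the state where the second starts. The function names, the toy device and the `cost` function are all hypothetical.

```python
import heapq
import itertools
import math
from collections import defaultdict

def build_action_graph(transitions, cost):
    """Map each transition (u, p, v) to its successors (v, q, w),
    each weighted by cost(p, q): the cost of doing q after p."""
    out = defaultdict(list)
    for t in transitions:
        out[t[0]].append(t)
    return {t: [(n, cost(t[1], n[1])) for n in out[t[2]]] for t in transitions}

def lower_bound_time(action_graph, start, goal):
    """Least total cost of any action sequence from start to goal.

    Dijkstra's algorithm over transitions; costs must be non-negative.
    The first press is assumed to be free (a simplifying assumption).
    """
    if start == goal:
        return 0.0
    tie = itertools.count()  # tie-breaker so the heap never compares transitions
    best = {t: 0.0 for t in action_graph if t[0] == start}
    heap = [(0.0, next(tie), t) for t in best]
    heapq.heapify(heap)
    while heap:
        d, _, t = heapq.heappop(heap)
        if d > best.get(t, math.inf):
            continue  # stale heap entry
        if t[2] == goal:
            return d
        for nxt, w in action_graph[t]:
            if d + w < best.get(nxt, math.inf):
                best[nxt] = d + w
                heapq.heappush(heap, (d + w, next(tie), nxt))
    return math.inf

# Hypothetical device: s0 -> s2 takes either A,B (two presses, far apart)
# or A,A,A (three presses on the same button).
T = {("s0", "A", "s1"), ("s1", "B", "s2"), ("s1", "A", "s3"), ("s3", "A", "s2")}
position = {"A": 0.0, "B": 100.0}

def cost(p, q):
    # Placeholder cost: fixed press time plus a distance-proportional term.
    return 0.1 + 0.05 * abs(position[p] - position[q])

graph = build_action_graph(T, cost)
print(lower_bound_time(graph, "s0", "s2"))
```

In this toy device the cheapest sequence in time uses three presses of one button rather than the two-press route, which is the kind of divergence between time-optimal and count-optimal sequences that a plain state graph with per-action weights cannot express.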
English conclusion
A skilled user's performance is limited by the optimal bounds on user performance, as determined by the device design. Usability depends on efficient use of interactive systems, and to design efficient systems requires analysis or evaluation of the time complexity of the designs with due consideration of relevant trade-offs, such as error rate. This paper introduced action graphs and gave a theory and algorithm for obtaining lower bounds on task times. The work is placed within a standard mathematical framework, using graph theory. Once the framework is implemented (by combining this paper with standard algorithms), it has essentially no overhead in use, whereas empirical approaches always take organisation and participant time, so the marginal cost of analysis is negligible; in particular, during iterative design, analyses can easily be performed repeatedly to explore variations. Models for lower bounds on time and count give different sequences of user actions to achieve the same tasks and thus need not necessarily be correlated, for the reasons discussed in Section 2. We showed that when operating a pushbutton device, the best possible user time may not be the obvious $t = \Omega(\text{lower bound on count})$. We first illustrated the idea with a simple, illustrative device (Section 2), but for the various real devices studied, even though they have cross overs, the following linear relation applies: $\text{low bound user time} \approx 0.2 \times (\text{lower bound button count})\ \text{s}$. We conclude it may not be worth estimating times (unless the application requires exact times) when lower bounds on button counts are easier to measure. Indeed, lower bounds on button counts for any task can be calculated with no uncertainty: the measures are objective and do not depend on particular users, training, or other experimental variables. Lower bounds on button counts can be used to optimize user interface design for skilled use.
The caveat of course is that, for other domains than panel user interfaces, this result should first be checked—and this can be done using action graphs. Of course, empirical work can still calibrate an analysis, and could do so incrementally over a period of time, replacing calculated values of f with actual user times for those transitions (which would of course then include thinking time, and other delays that KLM, GOMS, CogTool, etc., estimate), but, typically, absolute time differences are of no more value in design trade-off considerations than relative improvements, for which no calibration is necessary. Further, we explored alternative button layouts. The erroneous intuition that button layout must have a large effect on task performance is possibly because we over-rate button location or we over-rate a few salient tasks, ignoring the larger number of alternative tasks that are affected in compensating ways by changing layout. Overall, the layout does not seem to matter much for a balanced portfolio of tasks. An advantage of a rigorous approach such as this paper followed is that many assumptions are made explicit, and therefore now beg to be explored in further research. It is unsurprising, then, that this paper raises numerous points that deserve further examination; Section 6 (which continues in Appendix C) presents selected ideas for future work that build on the ideas and results presented here in the main paper.
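The conclusion's rule of thumb can be illustrated with a short sketch: a breadth-first search gives the lower bound on button count for a task, and the reported coefficient of about 0.2 s per press turns that count into a rough lower-bound time. The function names and the toy transition set are hypothetical; the 0.2 figure is the one quoted above for the panel devices studied and, as the caveat notes, should be checked before being applied to other domains.

```python
from collections import deque

SECONDS_PER_PRESS = 0.2  # coefficient quoted in the conclusion for the devices studied

def lower_bound_count(transitions, start, goal):
    """Fewest button presses that take the device from start to goal.

    transitions is an iterable of (state, action, next_state) triples.
    Returns None when goal cannot be reached from start.
    """
    successors = {}
    for u, _action, v in transitions:
        successors.setdefault(u, set()).add(v)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for v in successors.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

def estimated_low_bound_time(count):
    """Rough lower-bound time in seconds from a press count."""
    return SECONDS_PER_PRESS * count

# Hypothetical three-state device.
device = [("off", "power", "idle"), ("idle", "menu", "settings"),
          ("settings", "back", "idle"), ("idle", "power", "off")]
presses = lower_bound_count(device, "off", "settings")
print(presses, estimated_low_bound_time(presses))
```

Since the count depends only on the device model, it can be recomputed for every pair of states each time the design changes, with no participants or calibration involved.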