The Effect of Alternative Supplementary Presentation Formats on Balanced Scorecard Judgments
Publisher: Elsevier - Science Direct
Journal : International Journal of Accounting Information Systems, Volume 6, Issue 3, September 2005, Pages 159–176
Using the Balanced Scorecard (BSC) to evaluate performance is a complex judgment task involving a large set of performance measures. Commercially available BSC software provides capabilities for delivering decision guidance to the user through the design of supplementary information displays. This study compares the judgment performance of decision makers who received BSC data for two divisions of a simulated company in one of three formats: (1) individual BSCs for each division, (2) individual divisional BSCs supplemented by a side-by-side tabular summary of each division's performance, and (3) individual divisional BSCs supplemented by a side-by-side graphical comparison of each division's performance. Providing supplemental tabular displays did not improve judgment consensus, relative to viewing separate displays, but did improve consistency between performance evaluation and bonus allocation decisions. Consensus for participants using supplemental graphical displays was lower than that for participants viewing separate displays for each division. Further, participants given supplemental graphical displays exhibited lower judgment consensus and lower consistency between performance evaluation and bonus allocation decisions than those using supplemental tables. Inconsistency and lack of consensus are likely to create perceptions of unfairness, thereby increasing dissatisfaction with and resistance to continued use of the BSC. Thus, our findings indicate the need for care when designing and implementing BSC decision aids.
Many organizations have implemented the Balanced Scorecard (BSC) both to measure performance and as a tool to better implement and monitor strategy (Frigo and Krumwiede, 2000 and Silk, 1998). Prior research investigating use of the Balanced Scorecard (BSC) to evaluate performance and allocate compensation (Banker et al., 2004, Dilla and Steinbart, 2005, Lipe and Salterio, 2000 and Lipe and Salterio, 2002), however, indicates that decision makers do not fully utilize all the measures included in the BSC when making decisions. Failure to fully attend to all the data in a BSC may be due, in part, to its complex nature: typically, a BSC contains 4–7 measures in each of four categories. This suggests the potential for creating decision aids that improve the quality of judgments made using the BSC.

One way to provide decision guidance in complex judgment tasks is to alter or supplement the manner in which information is displayed (Silver, 1990 and Silver, 1991). Such informative guidance can improve decision making by making it easier to attend to relevant portions of the information set. Commercially available BSC software, such as QPR ScoreCard (QPR, 2004), provides tools for creating a variety of supplementary displays of BSC data. A recent study by Banker et al. (2004) found that supplementing BSCs with strategy maps showing how various measures relate to different aspects of the organization's strategy increases the attention paid to those measures when evaluating divisional performance. This suggests that supplementary information displays may indeed be effective decision aids for BSC users.

The supplemental display tools incorporated in commercially available BSC software (e.g., QPR, 2004) allow the user to easily create a variety of tabular and graphical displays. Although this capability is a featured selling point of BSC software, research has not examined the effect of supplementary display formats on decision quality.
This study therefore investigates whether providing alternative supplementary displays improves judgments made using the BSC. It also investigates whether presenting such displays in tabular or graphical format affects judgment quality. We examine two measures of judgment quality: consistency between individual performance evaluation and compensation decisions and consensus among users' performance evaluation decisions. It is especially important to consider consistency and consensus as dimensions of decision quality because of their effect on perceptions. Compensation judgments that are inconsistent with performance evaluations, or performance evaluations that lack consensus, may be viewed as unfair and arbitrary. Managers react negatively to use of the BSC as a performance evaluation tool if they perceive that the evaluation process is subjective and unfair (Malina and Selto, 2001). Indeed, Ittner et al. (2003) found that dissatisfaction with the manner in which the BSC affected compensation led one major financial institution to quit using it for that purpose. Improving BSC decision quality through supplemental information displays may thus affect the effectiveness and acceptance of the BSC as an evaluation and strategic management tool.

Providing supplementary information displays may improve judgment consistency and consensus by making it easier to compare individual BSC measures across divisions. Users provided with only individual BSCs for each division may attend to only a subset of the measures. This failure to consider the entire set of relevant measures may reduce consistency between performance evaluation and compensation decisions, because decision makers may attend to different subsets of information when performing each task. Also, using different subsets of information may reduce between-user judgment consensus.
By making it easier for decision makers to attend to all of the measures in a BSC instead of varying subsets, supplemental displays should improve both individual judgment consistency and between-user judgment consensus. The format of supplementary information displays may also affect judgment quality. Previous information display research suggests that the benefits of graphs versus tables depend on the nature of the task. Cognitive fit theory (Umanath and Vessey, 1994, Vessey, 1991 and Vessey and Galletta, 1991) indicates that graphs are more useful for tasks that require identifying and understanding relationships and for making comparisons (i.e., spatial tasks), while tables are more useful for tasks that require extracting specific values and combining them into an overall judgment (i.e., symbolic tasks). Using the BSC to evaluate divisional performance and make bonus recommendations is a complex decision involving both types of tasks. It is therefore unclear whether supplemental tabular or graphical displays of BSC data will better aid decision makers.

Our results indicate that supplemental display format is important. Compared to individual divisional BSC displays, supplemental tabular displays improved consistency between performance evaluation and bonus allocation decisions, but did not affect consensus. In contrast, supplemental graphical displays decreased judgment consensus, but did not affect consistency. Also, both judgment consistency and consensus were higher with supplemental tabular than with graphical displays.

The remainder of this paper is organized as follows. The next section presents the background and theoretical development of our hypotheses. Sections 3 and 4 describe the experimental method and present our results. The final section summarizes and discusses the implications of our findings.
Conclusion
H1A and H2A predict that decision makers who receive supplementary comparative tabular or graphical displays of BSC data will exhibit greater judgment consensus than individuals who are given only separate BSCs for each division. H3A tests whether consensus will differ depending upon the format (graphs versus tables) used to display supplementary BSC information. To test these hypotheses, we first performed an ANOVA with pairwise consensus as the dependent variable, and then performed specific comparisons between experimental conditions. Table 4 presents the results of this analysis.

Panel A of Table 4 shows that the information display factor has an overall significant effect on consensus (F(2, 120) = 3.53; p = .03). Panel B of Table 4 shows mean consensus by information display condition. Planned comparisons show no significant difference in mean pairwise consensus between participants given supplemental tables and those viewing divisional BSCs only (p = .58). Thus, H1A is not supported. Inspection of the cell means shows that participants given supplemental graphs exhibited lower consensus than did unaided decision makers. Post hoc analysis confirms that this difference is marginally significant (p = .08). Thus, supplemental graphs affected consensus in the opposite direction to that predicted in H2A. Participants receiving supplemental graphs exhibited lower mean pairwise consensus than participants provided supplemental tabular displays (p = .01). Thus, H3A is rejected: providing supplementary graphs reduces consensus relative to providing the same information in the form of supplementary tables. There was also a significant common-by-unique-measures interaction (F(1, 120) = 28.86; p < .01) on pairwise consensus.
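To make the pairwise consensus measure concrete, the sketch below defines consensus as the mean Pearson correlation over all pairs of participants' evaluation scores within a condition. This is a minimal illustration, not the study's actual procedure; the function names and the three participants' scores are hypothetical.

```python
from itertools import combinations

def pearson(x, y):
    # Pearson correlation between two equal-length score vectors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def mean_pairwise_consensus(judgments):
    # Mean correlation over all participant pairs; judgments is a list
    # of per-participant score vectors (one score per evaluation case).
    pairs = list(combinations(judgments, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical scores for three participants over four evaluation cases
scores = [[5, 3, 4, 2], [4, 2, 5, 1], [5, 4, 3, 2]]
print(round(mean_pairwise_consensus(scores), 3))  # → 0.691
```

Higher values indicate that participants rank-order the cases more similarly; the display-condition means reported in Table 4 can then be compared with an ANOVA on these pairwise scores.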
Mean pairwise consensus was higher, on average, when one division scored higher than the other on both common and unique measures (mean across conditions = .629) than when one division had better common measures but the other division scored higher on the unique measures (mean across conditions = .373).3

H1B and H2B predict that decision makers who receive supplementary comparative tabular or graphical displays of BSC data will exhibit greater consistency between their performance evaluation and bonus allocation decisions than individuals who are given only separate BSCs for each division. H3B addresses whether decision makers who receive supplementary comparative graphical displays of BSC data will exhibit a different level of judgment consistency than individuals who are given supplementary tabular displays. We performed two tests of these hypotheses. The first test uses a log–linear model to compare proportions of participants whose divisional evaluations were directionally consistent with their bonus allocations across information display conditions. The second test compares correlations between divisional evaluation differences and bonus allocations across information display conditions. Panel A of Table 5 displays proportions of consistent responses by experimental condition; Panel B of Table 5 displays correlations between divisional evaluation differences and bonus allocations across information display conditions. The overall display format main effect on the proportion of consistent responses was significant (χ2(df = 2) = 9.24; p = .01). The proportion of consistent responses was higher for participants given supplemental tables than for participants given only separate divisional BSCs (p = .01). The correlation between evaluation differences and bonuses was also higher with supplemental tables (p = .01). Together, these results indicate support for H1B.
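The directional-consistency measure underlying the first test can be sketched as follows: a response counts as consistent when the participant's bonus split favors the same division that received the higher evaluation. The helper name and the four participants' data are invented purely for illustration.

```python
def consistency_proportion(eval_diffs, bonus_diffs):
    # Proportion of participants whose bonus allocation favors the same
    # division as their performance evaluation (same sign on both diffs).
    # eval_diffs[i]  = participant i's evaluation of division 1 minus division 2
    # bonus_diffs[i] = participant i's bonus to division 1 minus division 2
    consistent = sum(1 for e, b in zip(eval_diffs, bonus_diffs) if e * b > 0)
    return consistent / len(eval_diffs)

# Hypothetical data: three of four participants are directionally consistent
print(consistency_proportion([2, -1, 3, 1], [10, -5, -2, 4]))  # → 0.75
```

Proportions computed this way per display condition are then compared across conditions, as in Panel A of Table 5.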
The proportion of consistent responses for participants given supplementary graphs was not significantly different from that of unaided participants (p = .86), nor was there a significant difference (p = .57) between the two groups in terms of the correlation between performance evaluation differences and bonus allocations. Thus, H2B is not supported. The proportion of consistent responses for participants given supplementary graphs was lower than that for participants who received supplemental tables (p = .003). The correlation between performance evaluation differences and bonus allocations was also lower for participants given supplementary graphs than for participants given supplementary tables (p = .002). Thus, the results indicate rejection of H3B: participants given supplementary graphs exhibited significantly lower consistency between their performance evaluation and bonus allocation decisions than did participants given supplementary tabular displays.

The log–linear analysis of proportions of consistent responses also disclosed a marginally significant three-way interaction between information display, common measures, and unique measures (χ2(df = 2) = 5.44; p = .07). This interaction appears to be largely driven by the low proportion (.273) of consistent responses observed in the divisional BSC only condition for participants who were given a scenario where WorkWear performed better on common measures and RadWear performed better on unique measures. Results in this cell may have caused the overall proportion of consistent responses to be lower for participants receiving divisional BSCs only as opposed to those receiving supplemental tables. To determine the effect of this cell on the overall results, we performed a follow-up analysis where we deleted all participants who viewed scenarios where WorkWear performed better on common measures and RadWear performed better on unique measures.
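A standard way to test whether two independent correlations differ, as in the between-condition comparisons above, is the Fisher r-to-z transformation. The sketch below assumes hypothetical correlations and group sizes, not the study's actual values.

```python
import math

def fisher_z_compare(r1, n1, r2, n2):
    # Two-tailed z test for the difference between two independent
    # Pearson correlations, using the Fisher r-to-z transformation.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 1 - math.erf(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical: r = .75 (tables) vs r = .35 (graphs), 44 participants each
z, p = fisher_z_compare(0.75, 44, 0.35, 44)
print(round(z, 2), round(p, 3))  # → 2.75 0.006
```

A small p value here would indicate, as in the reported results, that the evaluation-bonus correlation genuinely differs between display conditions rather than by sampling noise.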
Judgment consistency did not differ between the divisional BSC only (.813) and supplemental tables (.882) conditions for the reduced data set (p = .44). Consistency remained marginally higher for supplemental tables (.882) than for supplemental graphs (.697) (p = .06). The results therefore suggest that the significant results observed for H1B and H3B are driven in part by judgment differences in the scenario where WorkWear performed better on common measures and RadWear performed better on unique measures.

To test H1C, H2C and H3C, we conducted a 3 × 2 × 2 ANOVA with our task difficulty measure as the dependent variable. Information display format did not have a significant effect on perceived effort (F(2, 120) = .11; p = .90). Thus, neither H1C nor H2C was supported, and H3C was not rejected. A significant common-by-unique-measures effect was observed, however (F(1, 120) = 10.86; p = .001). Participants rated the task as easier to perform on average in conditions where one division had higher scores than the other for both common and unique measures (mean across conditions = 2.86) than in conditions where one division had higher common measures and the other higher unique measures (mean across conditions = 1.99). These results may account for the earlier finding that consensus was higher when one division performed better than the other on all measures as opposed to one division performing better on common measures and the other performing better on unique measures. Increased difficulty in processing the latter pattern of data may have created more variability in judgments.
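For reference, the F statistic behind ANOVA results like those above can be computed by hand. The sketch below is a one-way simplification of the factorial design, with three invented groups of difficulty ratings standing in for the three display conditions.

```python
def one_way_anova(groups):
    # F statistic and degrees of freedom for a one-way ANOVA:
    # between-group mean square divided by within-group mean square.
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# Hypothetical difficulty ratings for three display conditions
f, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 1), df_b, df_w)  # → 13.0 2 6
```

A large F relative to the F(df_b, df_w) distribution signals that the condition means differ beyond within-condition noise; the paper's near-zero F(2, 120) = .11 for display format correspondingly signals no effect on perceived effort.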