Does the Delphi process lead to increased accuracy in group-based judgmental forecasts or does it simply induce consensus amongst judgmental forecasters?
Article code | Publication year | English article length |
---|---|---|
1029 | 2011 | 10-page PDF |
Publisher: Elsevier - Science Direct
Journal: Technological Forecasting and Social Change, Volume 78, Issue 9, November 2011, Pages 1671–1680
English Abstract
We investigate the relative impact of internal Delphi process factors (including panelists' degree of confidence, expertise, and majority/minority positioning) and an external factor, richness of feedback, on opinion change and the subsequent accuracy of judgmental forecasts. We found that panelists who had low confidence in their judgmental forecast and/or who were in a minority were more likely to change their opinion than those who were more confident and/or in a majority. The addition of rationales, or reasons, to the numeric feedback had little impact upon panelists' final forecasts, despite the quality of panelists' rationales being significantly positively correlated with accurate forecasts and thus of potential use for improving forecasts over Delphi rounds. Rather, the effect of rationales was similar to that of confidence: to pull panelists towards the majority opinion regardless of its correctness. We conclude that majority opinion is the strongest influence on panelists' opinion change in both the ‘standard’ Delphi and Delphi-with-reasons. We make some suggestions for improved variants of the Delphi-with-reasons technique that should help reduce majority influence and thereby permit reasoned arguments to exert their proper pull on opinion change, resulting in forecast accuracy improvements over Delphi rounds.
English Introduction
As can be seen from the papers in this Special Issue, the Delphi technique is quite frequently applied to a range of judgment problems with the expectation that judgment accuracy will be improved relative to unstructured group judgment. However, while research supports the general conclusion that Delphi usually leads to opinion change in the direction of greater accuracy, increased accuracy is by no means guaranteed. In their review of the literature evaluating the Delphi technique, Rowe and Wright [1] found that Delphi outperformed both the statistical average of group members (by twelve studies to two, with two ties) and standard interacting groups (by five studies to one, with two ties) in terms of accuracy. Rowe, Wright and Bolger [2] attributed the equivocal nature of the obtained results to the highly variable formats and implementations of the Delphi technique. The effectiveness of the Delphi technique thus appears to depend on the particular nature of its implementation, such as the number of iterations, the type of feedback, and the size and constitution of groups. Further, the nature of the Delphi implementation interacts with the type of task that it is applied to and with the characteristics of the group members (e.g. their expertise and confidence).

In order to get the best improvement in judgment accuracy from applying Delphi, it would be advantageous to know what implementation of the technique works best, and under what conditions. To develop the answers, it seems essential to have the guidance of a theoretical framework, rather than blindly trying different forms of Delphi in a variety of settings. Rowe and Wright [3] conceptualized both judgment change and changes in judgmental accuracy as coming about through the action of internal Delphi process factors related to the individual panelist (e.g., (i) the degree of expertise possessed by individual panelists, and (ii) each panelist's degree of confidence in a judgment) and external factors related to the particular application of the Delphi technique (e.g., the nature of feedback given to panelists between Delphi rounds and the nature of the task, whether ‘intellective’ or ‘judgmental’). Intellective tasks are those in which deduction of an already-existing truth is the focus of attention; this is often the paradigm in laboratory-based evaluations of Delphi, and in such settings judgmental accuracy is easily measured. By contrast, judgmental tasks often involve forecasting: here, the judgmental forecaster can only explain and defend judgments, since the outcome has not yet occurred. Such settings are often those found in real-world Delphi applications. Wright and Rowe [4] provide a discussion of these task taxonomies.

To our knowledge, only two research studies of Delphi have been conducted within Rowe and Wright's framework: Rowe and Wright [3] and Rowe, Wright and McColl [5]. These two studies both used short-term forecasting tasks and focused on investigating the effects of increasing the richness of feedback on accuracy improvement over Delphi rounds. In many real-world applications of Delphi, the feedback procedure consists solely of reporting the numerical means and ranges of panelists' opinions. In the two research studies, feedback was enriched by asking judges to supply a written rationale for each judgment and, in addition, their confidence in that judgment.
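To make the two feedback regimes concrete, the sketch below contrasts the standard numeric-only feedback (the panel's mean and range) with the enriched variant that additionally carries each panelist's confidence and written rationale. The data, function names, and summary format are illustrative assumptions; the studies themselves specify no implementation.

```python
# A minimal sketch of the two Delphi feedback regimes discussed above.
# All data and function names are hypothetical.
from statistics import mean

def numeric_feedback(estimates):
    """'Standard' Delphi feedback: the panel's mean and range only."""
    return {"mean": mean(estimates),
            "range": (min(estimates), max(estimates)),
            "n": len(estimates)}

def enriched_feedback(responses):
    """Feedback enriched with each panelist's confidence and rationale."""
    summary = numeric_feedback([r["estimate"] for r in responses])
    summary["rationales"] = [(r["confidence"], r["rationale"]) for r in responses]
    return summary

# Round-1 responses from a hypothetical three-person panel:
round1 = [
    {"estimate": 62, "confidence": 0.9, "rationale": "Strong recent form."},
    {"estimate": 55, "confidence": 0.4, "rationale": "Key player injured."},
    {"estimate": 70, "confidence": 0.6, "rationale": "Home ground advantage."},
]
print(numeric_feedback([r["estimate"] for r in round1]))
print(enriched_feedback(round1))
```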
These studies also looked at the effects of expertise and confidence on opinion change, with the expectation that confidence and expertise would be positively related to each other and negatively related to the tendency to revise opinion. Rowe et al. [5] also analyzed whether opinion change was greater when more people supported the alternative position to that chosen by the focal panelist (panelists had to predict one of two alternative outcomes). The two studies produced conflicting findings regarding the effectiveness of providing rationales for judgments. Rowe and Wright [3] found more improvement when arguments were given, as predicted, but Rowe, Wright and McColl [5] found no advantage in eliciting and then exchanging rationales between panelists. However, a post-task examination of participants' written rationales in the latter study revealed a large number of rationales that simply repeated opinions (e.g., “I think it is more than likely…”) rather than offering causal arguments (e.g., “I think that this is more likely because…”); it may therefore be that most panelists in this study were unable to provide persuasive rationales for their forecasts. The two studies did, however, produce similar findings regarding the other variables: expertise and confidence were both found to be unrelated to opinion change, and there was no relationship between confidence and expertise (whether expertise was measured objectively, in terms of proportion correct, or by self-report).

Studies on advice taking surveyed by Bonaccio and Dalal [6] indicate that recommendations from an advisor to a decision maker (or judge, in these authors' terms) carry more weight if accompanied by rationales. In so-called judge-advisor system (JAS) studies, participants are typically randomly allocated to the role of either judge or advisor on a decision task. After forming a view or opinion on an issue, the judge is then presented with the views of advisors. Many JAS studies have found a bias toward favoring one's own initial opinion as judge over the opinions of advisors. According to Yaniv [7], this bias arises because judges have access to the full rationale underpinning their own opinion or decision but have only incomplete knowledge of advisors' rationales. In contrast, Kruger [8] contends that this bias is due to egocentricity, whereby a judge adheres to a default belief in the inherent superiority of his or her own judgment. In short, the extent to which the discounting of advice arises from differential access to rationales as opposed to egocentric bias is still an open research question.

In the current study, we compare the effects of Delphi panelists in Australia exchanging rationales for their predictions with a standard-practice Delphi setup, which does not entail the exchange of rationales or reasons. We do this by choosing a judgmental prediction task in which many of our participants have a strong day-to-day interest: football match outcomes. According to Alomes [9], Australian Rules Football acts as a social glue in Australia, cutting across many social and gender divides and enticing many segments of the community to know a great deal about the competition. Most workplaces and secondary schools in the southern Australian state of Victoria have tipping competitions, and a great deal of media attention is devoted to discussing forthcoming matches and analyzing the results of past matches.
In this context, a number of participants are likely to be ‘expert’ in the judgmental prediction task that we utilized (at least relative to the rather heterogeneous set of events used by Rowe et al. [5]), and some of the panelists should, in principle, be able to provide convincing rationales for judgmental predictions of match outcomes (e.g., based on past and current form, availability of key players, home ground advantage, etc.). In addition, we wished to investigate, in greater detail, the influence of internal Delphi process factors that might operate on opinion change. To this end, we calculated some additional measures of support for a particular predicted match outcome (tip): whether a panelist was in a minority or majority position; whether average confidence was greater for the individual panelist's tip or for the alternative tip; and, where appropriate, whether the rationale for a particular panelist's tip was stronger (and more similar, i.e. showing greater consensus with the rationales preferred by other panelists) than for the alternative tip. These new variables, plus confidence in each tip and objective expertise, were then subjected to analyses, not performed in the earlier studies, that used logistic regression to determine the relationship between these variables and the propensity to change opinion; a sketch of this kind of analysis follows below.
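The sketch below illustrates the type of logistic-regression analysis just described: modelling each panelist's propensity to change opinion between rounds as a function of confidence, minority/majority position, and expertise. The data are simulated merely to mirror the direction of the reported findings (low confidence and minority position raising the odds of opinion change, expertise having no effect); the variable names and simulation parameters are hypothetical, not the authors' dataset.

```python
# A sketch of a logistic regression on propensity to change opinion.
# Hypothetical simulated data, not the paper's dataset.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200  # panelist-by-match observations

confidence = rng.uniform(0, 1, n)    # self-rated confidence in own tip
in_minority = rng.integers(0, 2, n)  # 1 if the panelist's tip was a minority position
expertise = rng.uniform(0, 1, n)     # proportion of past tips correct

# Simulate the qualitative pattern reported in the paper: low confidence
# and a minority position increase the odds of changing one's tip, while
# expertise has no effect.
logit_p = -0.5 - 2.0 * confidence + 1.5 * in_minority + 0.0 * expertise
changed = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(np.column_stack([confidence, in_minority, expertise]))
model = sm.Logit(changed, X).fit(disp=False)
print(model.summary(xname=["const", "confidence", "in_minority", "expertise"]))
```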
English Conclusion
In this study, we found that opinion change across Delphi rounds was low, particularly in the numeric-only feedback treatment compared to the numeric feedback-plus-rationales treatment. Nevertheless, we found, using logistic regression, that in both treatments those panelists who had low confidence in their tip and/or who were in a minority were more likely to change their opinion than those panelists who were more confident and/or in a majority. The addition of rationales to the numeric feedback had little impact upon opinion change toward the “correctly” tipped team but did tend to pull panelists towards the majority opinion, even though the quality of other panelists' rationales was significantly positively correlated with the more valid tip. We infer that the provision of high-quality rationales, although a strong indicator of the validity of proffered opinion, was overshadowed by the provision of many low-quality rationales: recall that 52% of rationales were rated to be of the lowest quality. Clearly, opinion change in a panelist was more influenced by majority opinion (if the panelist had low confidence in his or her own tip) than by the quality of others' opinions, even when quality could be appropriately inferred by evaluating the proffered rationale for a particular tip.

The current set of results contrasts, to some degree, with the earlier process studies of Delphi. Rowe and Wright [3] found greater improvement in accuracy over Delphi rounds when feedback between rounds included other panelists' rationales rather than solely numeric feedback. By contrast, Rowe et al. [5] found that the inclusion of rationales was ineffectual in promoting more appropriate opinion change. Rowe et al.'s [5] strongest result was the persuasive influence (for better and for worse) of majority opinion on individual panelists' opinion change and subsequent validity. The current study thus confirms and extends Rowe et al.'s results: convergence of opinion over Delphi rounds does not necessarily imply a linked improvement in forecasting accuracy.

Our result is also congruent with the weight of general research on group processes (i.e. including, but not restricted to, Delphi). Kerr and Tindale's [12] review of group-based judgmental forecasting focuses on how to aggregate individual opinions to achieve an accurate group-based judgment. These authors distinguish between the use of judgment in intellective tasks and in judgmental forecasting. Recall that, in intellective tasks, the deduction of the (already-existing) truth is the focus of attention, whereas in judgmental forecasting the forecasters can only explain and defend their judgments, since the forecast outcome has not yet occurred. Kerr and Tindale's review concludes that pre-existing majority opinions will generally determine group consensus decisions in judgmental forecasting tasks, arguing that only in intellective tasks, where those group members who favor the correct answer can explain or demonstrate why they are correct, is a correct minority likely to be persuasive. Note that information about the opinions of other group members in the studies reviewed by Kerr and Tindale ranged from anonymous statistical feedback to face-to-face discussion.
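The reported link between rationale quality and tip validity is the kind of relationship a point-biserial correlation captures (a Pearson correlation with a dichotomous outcome). The sketch below uses simulated data that merely mirrors the reported direction of the effect; the rating scale, sample size, and effect size are illustrative assumptions.

```python
# A sketch of testing whether moderator-rated rationale quality tracks
# tip correctness. Hypothetical simulated data.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(1)
quality = rng.integers(0, 4, 300)     # rated rationale quality, 0 (lowest) to 3
# Simulate the reported direction: better rationales accompany correct tips.
p_correct = 0.4 + 0.12 * quality
correct = rng.binomial(1, p_correct)  # 1 if the accompanying tip was correct

r, p = pointbiserialr(correct, quality)
print(f"point-biserial r = {r:.2f}, p = {p:.3g}")
```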
Because of the disproportionate power of majority opinion to determine the subsequent group position in judgmental forecasting tasks, Kerr and Tindale argued that only those aggregation methods that facilitate information exchange between group members are likely to be beneficial over and above a statistical averaging of prior opinions, since information exchange provides the enabling conditions for group members to recognize errors in justifications of judgments. Our current study provided enabling conditions for such recognition of errors in reasoning, but we found that majority opinion (both correct and incorrect) was, once again, the strongest influence on panelists' opinion change. Arguably, this result may be a consequence of our failure to find either a task, or panelists, that facilitated the generation of strong argumentation that would allow either (i) the recognition of errors in one's own reasoning, or (ii) the recognition of better-quality reasoning in the justifications of other panelists' opinions. Such recognition could, in principle, then be used to change one's own opinion towards greater accuracy. Importantly, Rowe et al. [5] also appeared not to find the “right” task and/or participants. It follows that further studies need to be conducted, with other tasks and participants, in order to ascertain the prevalence, in practice, of tasks where significant accuracy improvement can be realised by eliciting and feeding back panelists' rationales. Alternatively, it may be that in many real-world Delphi applications panelists cannot generate good arguments for their proffered opinions. In such instances, majority opinion may have a deleterious impact on opinion change, and on the linked convergence of panelists' opinions, over Delphi rounds. It is important that future research studies aim to see whether additional ‘enabling conditions’ can be created to allow good arguments to overcome the pull of the majority and thus stimulate opinion change towards increased accuracy.

We believe that the best cue to expertise, and therefore to the best predictions, is the presence of strong arguments. We also believe that panelists are able to evaluate the quality of argumentation. It follows that group-based forecasting will, in principle, be improved solely by the exchange of reasoned argumentation, and that all other influences on opinion change, such as dominance, confidence, and majority consensus, should be eliminated from the feedback exchanged between panelists over Delphi rounds. To date, standard Delphi applications entail the procedural inclusion of anonymity in the Delphi process (which acts to remove bias due to factors such as individual panelist dominance, a known source of ‘process loss’ in interacting groups). On the basis of our current analysis, other process improvements to Delphi practice are now warranted. We advocate that Delphi applications should include the exchange of panelists' rationales and argumentation but, in addition, advocate that:

− Argumentation that does not explicate clear causation should be eliminated by the Delphi process moderator.

− Similar or duplicate argumentation (i.e., that generated by more than one panelist) should be combined such that argumentation generated by a single panelist, a minority of panelists, or a majority of panelists cannot be identified as such (a naive code sketch of such filtering and merging appears at the end of this section).

− Confidence in panelists' predictions should not be elicited or exchanged between panel members, due to the poor relationship between confidence and expertise.

However, such changes and additions to practice-based Delphi applications need to be systematically tested in future research studies using, we suggest, the analytic methods employed in the present study.

In conclusion, we note that many studies have demonstrated convergence in panelists' opinions, over rounds, in real-world applications of the Delphi method. But very few studies have evaluated the validity of such convergence of opinion; see [1] for a review of the extant studies. Given the mixed pattern of evidence that has now emerged across the few process-orientated studies that have investigated the Delphi technique (only the current study and [3] and [5]), further investigation is now necessary in order to evaluate the validity of Delphi as a group-based method for aggregating individual opinions and judgments. Delphi is very widely applied in practice, as the range of papers included in this Special Issue of Technological Forecasting and Social Change amply illustrates, but the question of whether the standard Delphi feedback procedure, consisting solely of reporting the numerical means and ranges of panelists' opinions, actually improves judgmental accuracy is now debatable.
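As referenced in the recommendations above, the following is a naive sketch of moderator-side processing of rationales: dropping argumentation that states no cause, and merging near-duplicates so that neither authorship nor the number of panelists behind an argument is identifiable. The causal-language heuristic, the similarity threshold, and the example rationales are illustrative assumptions, not a procedure specified in the paper.

```python
# A naive sketch of moderator-side rationale processing: filter out
# non-causal argumentation and merge near-duplicate rationales so the
# feedback carries no authorship or head-count information.
from difflib import SequenceMatcher

# Crude heuristic for 'explicates clear causation': requires a causal cue.
CAUSAL_CUES = ("because", "since", "due to", "as a result")

def is_causal(rationale: str) -> bool:
    text = rationale.lower()
    return any(cue in text for cue in CAUSAL_CUES)

def merge_rationales(rationales, threshold=0.8):
    """Return one representative per cluster of similar causal rationales."""
    kept = []
    for r in filter(is_causal, rationales):
        if not any(SequenceMatcher(None, r.lower(), k.lower()).ratio() >= threshold
                   for k in kept):
            kept.append(r)
    return kept  # unattributed, uncounted list fed back to the panel

raw = [
    "I think it is more than likely.",                         # no cause: dropped
    "Team A will win because three key players are injured.",
    "Team A will win because their key players are injured.",  # near-duplicate: merged
    "Team B has home ground advantage, so due to that they should win.",
]
for r in merge_rationales(raw):
    print("-", r)
```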