Judgment change during Delphi-like procedures: The role of majority influence, expertise, and trust
|Article Code||Publication Year||English Article||Persian Translation||Word Count|
|965||2005||23-page PDF||Order it||Not calculated|
Publisher : Elsevier - Science Direct
Journal : Technological Forecasting and Social Change, Volume 72, Issue 4, May 2005, Pages 377–399
This study investigates individual opinion change and judgmental accuracy in Delphi-like groups. Results reveal that the accuracy of judgmental probability forecasts increases over Delphi rounds (in terms of proportion correct and appropriateness of confidence) when statistical summaries or written rationales are provided from other members of an individual's nominal group, but does not increase in a control iteration condition (without feedback). Additionally, subjects who gave more appropriate probability forecasts on the first round exhibited least opinion change, although measures of confidence were unrelated to opinion change. Results also show that majority opinion exerts strong opinion pull on minority opinion even when the majority favours an incorrect answer (irrespective of the nature of feedback provided). The implications of these results for the utility and conduct of the Delphi technique are discussed, in particular, with respect to selecting panellists and choosing an appropriate feedback format.
The Delphi technique is a forecasting tool that was developed to allow the benefits of canvassing multiple judges without the often-corresponding deficits associated with group interaction that may arise from social processes. It is a structured group process, in which individuals are required to give numerical judgments or forecasts over a number of rounds, with feedback being provided from the anonymous other members of the panel, and the final aggregate being taken as the process output. It is not, however, a method intended to force consensus per se—response stability rather than consensus is the signal to cease additional polling, with disagreement (as indicated by, for example, a bipolar distribution of responses) accepted as informative. Delphi's effectiveness over comparative procedures, at least in terms of judgmental accuracy, has generally been demonstrated. In a review of empirical studies of Delphi, Rowe and Wright found that Delphi groups outperformed ‘statistical’ groups (which involve the aggregation of the judgments of noninteracting individuals) in 12 studies, underperformed these in two, and ‘tied’ in two others, while Delphi outperformed standard interacting groups in five studies, underperformed in one, and ‘tied’ in two. This trend is all the more impressive given that many laboratory studies of Delphi effectiveness have used simplified versions of the technique (e.g., with limited feedback) in simplified contexts (e.g., using nonexpert, student subjects) that might be anticipated to undermine the virtues of the technique. We return to this issue shortly. Although research suggests that Delphi allows improved judgment compared to alternative methods, as demonstrated in these ‘technique comparison’ studies, the reasons for this are still unclear, given a relative dearth of ‘process’ studies that have attempted to establish the precise mechanism for improvement in Delphi.
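The round structure described above—anonymous numerical forecasts, a statistical summary fed back, revision, and a final aggregate taken as the output—can be illustrated with a minimal sketch. The panel size, forecast values, and the choice of the median as the aggregate here are assumptions for illustration, not details drawn from the study.

```python
# A minimal sketch of one Delphi-style round with 'statistical' feedback,
# using a hypothetical five-member panel and made-up forecast values.
from statistics import median

def delphi_round_output(estimates):
    """Aggregate a round of anonymous numerical forecasts by the median."""
    return median(estimates)

round1 = [120, 150, 90, 200, 140]       # hypothetical first-round forecasts
feedback = delphi_round_output(round1)  # statistical summary fed back to panellists

# Panellists revise anonymously after seeing the feedback (illustrative values);
# the final round's aggregate is taken as the process output.
round2 = [130, 145, 110, 170, 140]
output = delphi_round_output(round2)
print(feedback, output)
```

In practice the feedback may also include dispersion measures (e.g., quartile ranges) or, as in the ‘Reasons’ condition discussed below, panellists' written rationales rather than numbers.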
In this study, we attempt to advance understanding of how Delphi improves judgmental performance.
English Conclusion
This study has aimed to reproduce and extend the findings of Rowe and Wright and, in doing so, has attempted to maintain task characteristics across studies (as much as possible) to increase the chance of generalising results between them. In terms of reproducing the earlier findings, however, the success of the present study is limited and lends further weight to the general thesis of Rowe and Wright—that subtle changes in technique composition, group membership, and other situational factors are liable to have substantial effects on how and whether Delphi will aid judgmental accuracy. Indeed, the present study appears to demonstrate just how sensitive results are to any manipulation of task environment. For example, in spite of the fact that both the present study and the previous one involved short-term forecasting tasks using groups of five (student) subjects who made predictions about newsworthy political, economic, and international events, the Iteration condition in the present study led to no improvement in forecasting across rounds when previously it did. Although the two feedback conditions in the present study are undoubtedly somewhat dissimilar to those conditions in the earlier study, the same cannot be said with respect to the Iteration condition, in which precisely the same instructions and subject requirements were used. The most probable explanation for the above outcome is that there were subtle differences between the two tasks and sets of performance measures that were not controlled for across studies. One difference that might be of importance concerns the nature of elicited subject responses: the previous study required quantitative responses (e.g., the number of seats to be won by a named political party in a forthcoming election), while the present study required qualitative ones (i.e., a choice of one answer from two possibilities).
For example, it may be that subjects are more amenable to change and improvement when they are able to alter an estimate that they might recognise, on reconsideration, to be exaggerated but may feel a certain defensiveness about admitting a definite, categorical, and undisputed error—something that would be implied in making a prediction change to the converse of two exclusive and exhaustive options. Another difference might lie in the relative difficulty of the items being forecast in the two studies. The essential nature of the cross-study task differences here would seem a topic worthy of further investigation. Returning to the results of the present study, evidence was found for an improvement in cross-round accuracy in the two feedback conditions—a trend that was significant in the Statistical case. With respect to the propensity of subjects to change predictions over rounds, there were no significant differences between the three experimental conditions, although (as in the previous study) the Iteration condition led to a higher proportion of mean changes than either of the feedback conditions. To the extent that results from the two studies are generalisable, this trend indicates that feedback may actually serve as an inhibitor of change, perhaps by inducing a defensiveness in panellists. Our feedback formats were deliberately simplified so as not to confound the effects of Statistical and Reasons feedback, but the generally prescribed Delphi method uses both types, and it would be interesting to assess in a future study how panellists responded to both types in one process (e.g., whether one feedback type proved more influential than another). In terms of the appropriateness of changes over rounds, however, the present results bear little similarity to those of Rowe and Wright.
In the previous study, an association was found between increasing propensity of subjects to change judgments and increasing accuracy improvement, in both feedback conditions (but not in the Iteration condition), contrasting with the present results, where a similar association was revealed in the Statistical and Iteration conditions but not in the Reasons condition. Rowe and Wright proposed that feedback allows good forecasters to identify themselves, while providing information to direct the changes of the less expert—an explanation that might still account for the outcomes of the Statistical condition but does not explain the lack of influence of the Reasons feedback or the positive association found in the Iteration condition between these measures. Explanation of these discrepancies appears to require recourse once more to the task characteristics of the present study. The general ineffectuality of the Reasons feedback across a number of performance measures may derive from the sheer number of reasons that our subjects were required to generate. Computation of the Brier measure requires a large number of probability judgments from subjects. This may have led to a degree of overload on the subjects, with a consequent decrease in the quality of arguments and their subsequent ability to appropriately influence opinion change in other panellists. Indeed, it is worth noting that posttask examination of subjects' written rationales revealed a large number of reasons that simply reported opinions (‘I think that this is more likely…’) rather than causal arguments (‘I think that this is more likely because…’). Information of the former type is arguably less rich than that of the latter type and arguably less useful for subjects in terms of discriminating relative expertise. With respect to the role of ‘objective expertise’ in nominal groups, success at replicating the results of Rowe and Wright's study was once more variable. 
As in the previous study, it was found that high relative expertise was associated with a low propensity to change predictions over rounds—an association that was significant in the Statistical condition and a nonsignificant trend in the Reasons condition. A similar trend was also found in the Iteration condition, although the general effectiveness of the Iteration approach proved inferior here to that of the feedback approaches in terms of overall improvement in aggregate accuracy across rounds. In Rowe and Wright's study, mean first-round confidence, as obtained through rating scales, did not appear to be an especially good predictor of objective expertise. In the present study, an attempt was made to replicate and extend consideration of the role of confidence in nominal groups, particularly by using more fine-grained measures of the appropriateness of confidence (i.e., Brier scores). As in the previous study, little evidence was found of any relationship between subjects' mean first-round confidence and either their first-round accuracy or their propensity to change predictions over rounds. This result argues against the use of confidence measures as discriminants of expertise (for panellist selection). Consideration of calibration graphs revealed the nature of the miscalibration, with subjects generally exhibiting overconfidence—a bias that has been frequently reported. Providing feedback (either Reasons or Statistical) encouraged more appropriate ratings (i.e., better Brier scores). The final factor that was considered in the present study was the influence of majorities and minorities. Unsurprisingly, it was found that the propensity of subjects to change their predictions over rounds was significantly related to the degree of support or opposition (in the nominal group) for the subjects' initial predictions.
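The Brier score used above as a measure of the appropriateness of confidence is the mean squared difference between a stated probability and the 0/1 outcome, so lower scores indicate better-calibrated forecasts. A minimal sketch, with hypothetical forecast and outcome values (not data from the study), shows how overconfidence inflates the score:

```python
# Brier score for binary probability forecasts: mean squared error between
# the stated probability and the 0/1 outcome; lower is better.
def brier_score(probs, outcomes):
    """probs: forecast probabilities for an event; outcomes: 1 if it occurred, else 0."""
    if len(probs) != len(outcomes):
        raise ValueError("each forecast needs an outcome")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Hypothetical example: on the same outcomes, an overconfident forecaster
# (probabilities near 1) scores worse than a more cautious one.
outcomes = [1, 0, 1, 0]
overconfident = [0.95, 0.90, 0.95, 0.90]
cautious = [0.70, 0.60, 0.70, 0.60]
print(brier_score(overconfident, outcomes))  # worse (higher) score
print(brier_score(cautious, outcomes))       # better (lower) score
```

Because each score averages squared errors over many items, stable estimates require many probability judgments per subject—consistent with the point above that computing the Brier measure demanded a large number of responses from panellists.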
The expectation, however, that accurate minorities (excluding overt social group pressure) could pull inaccurate majorities towards the correct position was not realised; majorities, whether accurate or otherwise, exerted a significant pull on minorities to the consensual position, even when that position was fallacious. The pervasive effect of majorities is, however, likely to be influenced by the degree of expertise possessed by subjects; although subjects showed a fair degree of competence in this particular forecasting task (with a hit rate of approximately two-thirds correct), the study of the behaviour of more-expert subjects might conceivably yield different results, with subjects of a higher base level of expertise perhaps being more able to resist majority influences in appropriate situations. This is particularly important, inasmuch as it is not Delphi's intent to force consensus. Also of interest with regard to majority/minority influence are the effects of different sized groups and hence the potentially different magnitudes of opposition and support. The impacts of relative expertise and group size on opinion change and judgment accuracy are areas that ought to be considered in future studies. For practitioners, the implications of the results of this study are several-fold. First, evidence suggests that confidence is not a good predictor of expertise and hence should not be used as a selection device (e.g., to choose among a list of experts). Second and perhaps obviously, practitioners should take care in choosing their experts, inasmuch as evidence suggests that the ‘better’ the expert is, the more appropriately they are likely to respond to feedback. Third, they should be aware that the power of the majority is not totally undermined in Delphi, and hence, convergence of opinion over Delphi rounds will not necessarily imply improved forecasting accuracy in every case.
And fourth, the water-muddying results of this study suggest that the practitioner should think carefully about the nature of feedback they provide, what information it might contain, and how their panellists might react to it, inasmuch as a variety of studies (including this one) give contrary results as to whether iteration alone, the feedback of statistical averages, or the use of panellist arguments will lead to most improved performance over rounds. In future studies, we hope to disentangle the complex interactions between feedback type, expertise, panellist personality, and accuracy measures.