Consensus measurement in Delphi studies: Review and implications for future quality assurance
|Article code||Year of publication||English article||Persian translation||Word count|
|1047||2012||12-page PDF||Order||9130 words|
Publisher: Elsevier - Science Direct
Journal : Technological Forecasting and Social Change, Volume 79, Issue 8, October 2012, Pages 1525–1536
Consensus measurement plays an important role in Delphi research. Although it is not the technique's aim, the measurement has to be considered an important component of Delphi analyses and data interpretation. During the past 60 years, the Delphi multi-round survey procedure has been widely and successfully used to aggregate expert opinions on future developments and incidents. This paper is dedicated to how consensus (and dissent) has been measured since the technique's emergence in the 1960s and which criteria have been used. The review also includes a description of its relationship with the measurement of stability over Delphi rounds, although the major focus lies on the concept of consensus. In an extensive literature review, 15 types of measure were identified and classified for measuring consensus (and/or stability) in detail. The research reveals that there are obvious deficits in the practice and rigour of consensus measurement for Delphi research: mistakes in statistical tests or their premises have even been made. This article gives a broad understanding of the consensus concept, shows strengths and weaknesses as well as premises of different types of measure and concludes with lessons learned. Its major contribution is therefore on improving the future quality of consensus-oriented Delphi studies.
Mankind has always desired to know what the future will be like. Throughout history, people have consulted chosen individuals who were said to be able to envision the future. Among the most famous prophesiers are Michel de Nostredame (1503–1566) and Marie Anne Adélaide Lenormand (1772–1843). For over a thousand years, oracles shaped the lives of Romans and Greeks. From the eighth century B.C. until the third century A.D., people primarily consulted oracles regarding fortune, success, marital affairs, professional advancement, and judicial disputes. In these times, oracular sites were spread all over Greece. The two greatest were in Delphi, associated with Apollo, and in Dodona, associated with Zeus. The Greek word Delphoi means “hollow” or “womb”. Historians interpret it as a reference to Gaia, the great mother of all creatures on Earth or the primordial Earth goddess, in the Ancient Greek religion.
In the 1950s, the term Delphi was adopted by the U.S. RAND Corporation for its research purposes. The RAND Corporation was a research institution that initially focused on national security issues and later concentrated on scientific, educational, and charitable endeavours for public welfare. Within the scope of “Project Delphi”, RAND researchers developed a structured written survey in order to estimate bombing requirements. For confidentiality reasons, the contents of the experiment were not published until 10 years later, by Dalkey and Helmer in their article “An Experimental Application of the Delphi Method to the Use of Experts”. Project Delphi was sponsored by the United States Air Force and included the application of “expert opinion to the selection, from the viewpoint of a Soviet strategic planner, of an optimal U.S. industrial target system and to the estimation of the number of A-bombs required to reduce the munitions output by a prescribed amount” [4, p.458]. 
The expert panel consisted of seven specialists in the areas of economics, physics, systems analysis, and electronics. Dalkey and Helmer reported that the experts' first evaluation of possible industry targets did not result in consensus. However, in a second estimate, consensus was achieved and the procedure was said to have yielded more reliable results than comparable techniques. Shortly after the technique's introduction to the public in 1963, various studies using the technique on non-military issues followed. Since the 1950s, the usage of the Delphi survey method has undergone different stages of development, which were described by Rieger:
1. Secrecy/obscurity (1950s): exclusive application in the military context
2. Novelty (1960s): declassification by the U.S. military and introduction to the public
3. Popularity (1970–1975): spread to Western Europe, Eastern Europe, and Asia; major forecasting tool in business
4. Scrutiny (1975–1980): critical evaluation of the technique's reliability and validity
5. Continuity (1980–1986): acceptance in science and practice; stable application patterns
After a time of stagnation in the 1980s, the Delphi technique received increasing interest again in the early 1990s. As an extensive literature review by Landeta shows, this trend prevails. In total, 414 Delphi-related articles were published in the two major databases “Science Direct” and “ABI/Inform” between 1995 and 1999. This number increased to 677 articles in the period between 2000 and 2004. Similar patterns can be observed for book publications. Google Ngram Viewer displays graphs showing how often certain phrases have occurred over time in a collection of books (generated in July 2009). For the period from 1950 to 2008, a bigram search for “Delphi study” in a sample of more than one million English books reveals a strong increase from almost zero in 1960 to a peak in 1980, followed by a decrease to about half of the 1980 frequency around 1990. 
Since then, the term “Delphi study” has been used increasingly and after 2005 reached a level higher than in all previous years. More recent applications concentrate on the web-based implementation of the Delphi procedure, but still follow the technique's fundamental rationale and consider consensus measurement to be a key component of analysis. The facts presented here underline that the Delphi technique is widely accepted as a research technique today and that its value has been scientifically and practically proven.
This paper contributes to research on the Delphi technique in three ways. First, it identifies and explains 15 different types of measures related to the field of consensus measurement and/or stability over Delphi rounds. Second, it gives an overview of the criteria that have been defined for these 15 different types of measurement. Third, it presents an overall assessment that provides guidelines for future Delphi work. The overall research questions are therefore: (1) How has consensus been measured in Delphi studies from the technique's emergence in the 1960s until today? (2) Which levels of measurement have been used to define consensus? (3) Which implications should be considered for future quality assurance in Delphi research? The research presented here is therefore conceptual in nature and represents a comprehensive literature review across multiple disciplines and research strands.
The remainder of the paper is organized as follows: Section 2 summarizes the fundamental characteristics and rationale of Delphi in order to set the basis for the following core sections on consensus measurement. Section 3 is dedicated to general definitions of “consensus”, its relationship with stability and the differentiation from dissent-oriented Delphi studies. Sections 4 and 5 provide an overview of different types of consensus measurement and the defined consensus criteria. This includes subjective and descriptive types of measurement as well as inferential statistics. 
Conclusions will be stated in Section 6; limitations and future research will be addressed in Section 7.
Conclusion
The previous sections have presented the results of an extensive literature review on consensus measurement in Delphi studies, including accompanying tests for stability. This review revealed that a general standard for how to measure consensus in Delphi studies does not yet exist. Researchers have applied subjective criteria as well as descriptive and inferential statistics to measure consensus and convergence. Especially in the case of inferential statistics, basic assumptions have been violated or tests have been wrongly conducted. Also, there is often a mistaken impression that consensus has to be considered the major aim and stopping criterion of a Delphi round. In fact, numerous researchers have illustrated why stability should be tested first, before terminating Delphi rounds based on consensus. Thus, the measurement of consensus alone is not sufficient for Delphi studies, all the more so since consensus is not the primary aim of the Delphi method. In order to fully exploit the data, Delphi facilitators should test for both stability, for example by the chi-square (χ²) test for independence or by changes in the coefficient of variation in subsequent rounds, and the level of agreement, for example by the interquartile range. Independent of its usage as a stopping criterion, consensus measurement has to be considered a key component of Delphi data analysis and interpretation, as does the measurement of dissent. Both consensus-oriented and dissent-oriented analyses, such as analyses of opposing group views grounded in the data, should be applied in a complementary way in order to obtain a deeper understanding of the data. Murphy et al. analyzed group behavior in Delphi research and identified three scenarios, which are the most likely depending on the initial situation. If there is a majority view, this is likely to determine the final decision. 
On the other hand, if there is an initial consensus among panellists, the final group opinion may shift to a more extreme view. Should there be a split view initially, panellists will move toward one of the two views, resulting in subgroups. More cohesive subgroups will, in turn, lead to a lower chance of achieving consensus. In such cases, researchers should try to find valuable results in dissent-oriented analysis, such as statistical outliers and extreme points, bi- or multipolar distributions, or group comparisons. Nevertheless, even though certain statements may not result in consensus among panellists, the process generally helps to clarify an issue. A good example is the study conducted by Spinelli. Although the author found no significant convergence of opinions, the results indicated several valuable trends over the three-round period. Finally, researchers should bear in mind that in addition to consensus statistics, other analyses, such as scatter plots, analyses of subgroups, or impact analyses, may also lead to interesting results in Delphi studies.
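The recommendation above to test both stability and the level of agreement can be illustrated with a minimal Python sketch. It uses the change in the coefficient of variation between two rounds as the stability check (rather than the chi-square test for independence) and the interquartile range as the agreement check. The function names, the example ratings, and both thresholds (`iqr_limit`, `cv_change_limit`) are hypothetical choices made for illustration only; they are not criteria taken from the reviewed studies.

```python
from statistics import mean, pstdev, quantiles


def interquartile_range(ratings):
    """Return Q3 - Q1 of one item's ratings in a single round."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return q3 - q1


def coefficient_of_variation(ratings):
    """Return standard deviation divided by mean (ratings must have a non-zero mean)."""
    return pstdev(ratings) / mean(ratings)


def assess_item(prev_round, curr_round, iqr_limit=1.0, cv_change_limit=0.1):
    """Check stability across two rounds and agreement in the latest round.

    Both limits are illustrative assumptions, not established standards.
    """
    cv_change = abs(
        coefficient_of_variation(curr_round) - coefficient_of_variation(prev_round)
    )
    iqr = interquartile_range(curr_round)
    return {
        "cv_change": cv_change,
        "iqr": iqr,
        "stable": cv_change <= cv_change_limit,  # responses no longer shifting
        "consensus": iqr <= iqr_limit,           # responses tightly clustered
    }


# Hypothetical ratings of one projection on a 5-point scale by eight panellists.
round_2 = [2, 3, 3, 4, 4, 4, 5, 5]
round_3 = [3, 4, 4, 4, 4, 4, 4, 5]

result = assess_item(round_2, round_3)
print(result)
```

With these made-up ratings, the item meets the agreement criterion in the latest round while the spread of responses is still changing between rounds, which is the situation in which stopping on consensus alone would be premature.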