Statistical power and measurement allocation in ergonomic intervention studies assessing upper trapezius EMG amplitude: a case study of assembly work
Article code | Publication year | Pages (English article)
---|---|---
6779 | 2002 | 13 pages (PDF)
Publisher: Elsevier - Science Direct
Journal: Journal of Electromyography and Kinesiology, Volume 12, Issue 1, February 2002, Pages 45–57
English Abstract
The present study aimed at exploring the statistical power of ergonomic intervention studies using electromyography (EMG) from the upper trapezius muscle. Data from a previous study of cyclic assembly work were reanalyzed with respect to exposure variability between subjects, between days, and within days. On the basis of this information, the precision and power of different data collection strategies were explored. A sampling strategy comprising four registrations of about two minutes each (i.e. two work cycles) on one day per subject resulted in coefficients of variation between subjects on the 10-, 50-, and 90-APDF-percentiles of 0.44, 0.31, and 0.29, respectively. The corresponding necessary numbers of subjects in a study aiming at detecting a 20% exposure difference between two independent groups of equal size were 154, 78, and 68, respectively (p≤0.05, power 0.80). Multiple measurement days per subject would improve power, but only to a marginal extent beyond 4 days of recording. Increasing the number of recordings per day would have minor effects. Bootstrap resampling of the data set revealed that estimates of variability and power were associated with considerable uncertainty. The present results, in combination with an overview of other occupational studies, show that common-size investigations using trapezius EMG percentiles are at great risk of suffering from insufficient statistical power, even if the expected intervention effect is substantial. The paper suggests a procedure for retrieving and using exposure variability information as an aid when planning studies, and for allocating measurements efficiently.
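The sample sizes quoted in the abstract follow from the standard relation between the between-subjects coefficient of variation, the relative difference to be detected, and the chosen significance and power levels. The sketch below is a minimal illustration, not the paper's own calculation: it uses the common normal approximation for a two-sample, two-tailed comparison and reproduces the reported totals to within a few subjects, the remaining gap presumably reflecting a t-distribution correction.

```python
# Approximate total study size (two equal groups) needed to detect a given
# relative difference, given the between-subjects coefficient of variation (CV_S).
from scipy.stats import norm

def total_subjects(cv_between, rel_diff, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # normal approximation
    n_per_group = 2 * (z * cv_between / rel_diff) ** 2
    return 2 * n_per_group

for cv in (0.44, 0.31, 0.29):
    print(f"CV_S = {cv:.2f}: about {total_subjects(cv, 0.20):.0f} subjects in total")
# Prints roughly 152, 75, and 66, close to the 154, 78, and 68 quoted above.
```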
English Introduction
Changes in mechanical exposure (physical work load) have been used as proxies for expected health effects in a large number of ergonomic intervention studies. This approach has been prominent in controlled experimental studies of specific characteristics of a job, e.g. tools [4] and [43], work station design [31] or work pace [38], as well as in field studies of production systems with a fast turnover of production processes or labour [7] and [20]. One important reason is that changes in exposure are in general easier to assess and interpret as the specific result of an intervention than health outcomes.

In recent years, it has become increasingly evident that direct recordings of mechanical exposure are superior to self-reports or observations as regards accuracy and resolution [49] and [56]. Substantial efforts have been invested in developing personal monitors allowing mechanical exposure data to be continuously sampled and stored for hours during occupational work [3] and [24].

Surface electromyography (EMG) has been extensively used in working life research. In 1995, a review identified 97 internationally published papers using recordings of EMG from the upper trapezius muscle, several of which compared EMG amplitudes during different conditions in simulated or real work [39]. Thus, upper trapezius EMG amplitude may serve as a prominent example of a directly recorded exposure variable in ergonomic intervention studies. However, the use of EMG requires considerable resources in terms of equipment, competence and time. Studies assessing upper trapezius EMG amplitude have therefore in general been conducted on small groups, rarely exceeding 15 subjects, as shown by a comprehensive selection of manual handling studies (Table 1). In addition, a majority of the selected studies reported an exposure standard deviation between subjects beyond 50% of the group mean EMG amplitude. The combination of a small study size and a large exposure variability leads to a risk of low statistical power, i.e. an insufficient ability of the study to detect statistically significant differences in exposure.

Analyses of power and optimal allocation of measurement resources are widely accepted study design tools, giving guidance on the investment of measurement resources necessary to reach an acceptable chance of success in a planned study, as well as on an efficient use of these resources. So far, however, these tools have been only sporadically discussed in the context of ergonomic intervention studies in general, and even less so with a specific focus on EMG [5] and [37]. Conventional power analysis accepts that group mean values of exposure may fluctuate according to an observed variance within and between subjects, but it does not take into account that this exposure variance is in itself a stochastic variable, subject to random error [9]. The influence of this additional source of uncertainty on power estimates has not previously been discussed with reference to ergonomic intervention studies.

The present investigation had the general purpose of illustrating a procedure for considering statistical power and resource allocation in the design of studies comparing two independent groups. These issues were explored with respect to assessments of upper trapezius EMG amplitude. Data were obtained from a previous study with a repeated-measures design, which allowed values of mean exposure and variance components to be derived, as well as their statistical distributions.
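As a rough illustration of the last point, the sketch below shows one way variance components for a balanced repeated-measures design (registrations nested within days, days nested within subjects) can be estimated by the method of moments from nested ANOVA mean squares. This is not the paper's own code; the data frame layout and column names are assumptions for illustration.

```python
# Method-of-moments variance components for a balanced nested design:
# registrations ("quanta") within days within subjects.
import pandas as pd

def nested_variance_components(df, value="emg", subject="subject", day="day"):
    """Return (s_s^2, s_d^2, s_q^2): between-subjects, between-days,
    and within-day variance components."""
    n_s = df[subject].nunique()
    n_d = df.groupby(subject)[day].nunique().iloc[0]
    n_q = len(df) // (n_s * n_d)

    grand = df[value].mean()
    subj_mean = df.groupby(subject)[value].transform("mean")
    day_mean = df.groupby([subject, day])[value].transform("mean")

    # Nested ANOVA mean squares; summing per-row squared deviations already
    # includes the n_q and n_d * n_q multipliers of the textbook formulas.
    ms_s = ((subj_mean - grand) ** 2).sum() / (n_s - 1)
    ms_d = ((day_mean - subj_mean) ** 2).sum() / (n_s * (n_d - 1))
    ms_q = ((df[value] - day_mean) ** 2).sum() / (n_s * n_d * (n_q - 1))

    s2_q = ms_q
    s2_d = max((ms_d - ms_q) / n_q, 0.0)
    s2_s = max((ms_s - ms_d) / (n_d * n_q), 0.0)
    return s2_s, s2_d, s2_q

# Hypothetical usage: one row per registration, e.g.
# df = pd.DataFrame({"subject": [...], "day": [...], "emg": [...]})
# s2_s, s2_d, s2_q = nested_variance_components(df)
```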
English Conclusion
The present paper shows that intervention studies using upper trapezius EMG amplitude percentiles are at a high risk of having low statistical power. Study populations in excess of what is usually considered feasible are needed to detect EMG changes of a magnitude which is common in industrial contexts. For instance, a 20% decrease in the 50-APDF-percentile was found to accompany a 20% decrease in work pace in a study of assembly work [38]; a changed arm rest resulted in a 33% reduction in EMG amplitude in a study of simulated crane operation [4]; and a comprehensive company-driven intervention in an assembly system caused median exposure to change by 31% in the study by Bao et al. [7].

Low power occurs when exposure variability between and within subjects is large compared to the expected exposure difference between groups. Different sources of variability may influence the normalized EMG amplitude percentiles of a specific task, as investigated in the present study. The within-day variance (s_q²) may be explained mainly as a sign of flexible motor control patterns. The between-days component (s_d²) probably includes a major contribution from methodological sources, e.g. associated with replacement of electrodes and reproduction of the normalization procedure. Differences between individuals in working technique, together with further methodological variance, add up to the variance between subjects (s_s²). Overviews of probable sources of variance in EMG percentiles have been provided by Veiersted [52] and [53] and by Aarås et al. [1].

Methodological sources of EMG amplitude variability within and between subjects have been the concern of several recent investigations. A previous reanalysis of part of the present data material using a Monte Carlo technique suggested that errors associated with normalization were responsible for 2–7% of the total variance of the group mean exposure, depending on the number of reference contractions [36]. Thus, normalization alone introduced a CVS of 0.04 to 0.08 on the 50-APDF-percentile [cf. Eq. (2)], as compared to the total CVS of 0.26 (cf. Table 3). Hansson et al. [24] demonstrated that normalization using a submaximal reference contraction, i.e. an RVE procedure as applied in the present study, implied a smaller 'between-subjects' variance among cleaners and office workers than the common MVE procedure using a maximal reference. Although the 'between-subjects' variance in the cited study included between-days as well as between-subjects components, the result may be taken as an indication that the 'anthropometric' RVE scale leads to more homogeneous groups than the 'capacity'-related MVE scale, at least in jobs mainly consisting of holding and moving the arms with minor or sporadic hand-held loads. This notion has been confirmed in a large study of laminate sheet handling [5], while two smaller studies gave conflicting evidence [6] and [40]. A study by Veiersted [52] indicated that small inaccuracies in electrode positioning can be a potent source of amplitude variability among standardized contractions repeated on different days. However, later experiments suggest that the positioning effect almost disappears in normalized EMG [27]. Together, the studies above indicate that a major part of the CVS on EMG amplitude percentiles is due to 'biological' rather than methodological sources.
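The step from "2–7% of the total variance" to "a CVS of 0.04 to 0.08" follows because variances, not coefficients of variation, are additive: a component responsible for a fraction f of the total variance contributes roughly CV_total·√f to the coefficient of variation. A quick check under that assumption, as a sketch using the figures quoted above:

```python
# Rough check: CV contribution of a component that accounts for a fraction f
# of the total variance, assuming the variance fractions apply to the same
# quantity as the total CVS of 0.26.
cv_total = 0.26
for f in (0.02, 0.07):
    print(f"variance fraction {f:.0%} -> CV contribution {cv_total * f ** 0.5:.2f}")
# Prints 0.04 and 0.07, in line with the 0.04 to 0.08 range quoted above.
```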
The present data were collected in the laboratory using strictly standardized procedures, and subjects performed a single, controlled work task. Exposure variability can therefore be expected to be larger than found here, e.g. in field studies of individuals with less stereotyped work tasks. A review of several other studies of manual handling tasks confirms that the CVS value of 0.26 obtained for the 50-APDF-percentile in the present material was indeed small (Table 1). Thus, studies in the field may suffer from an even lower power than that suggested by the present data, unless study populations are large. This is confirmed by the detectable effects shown in the last column of Table 1, calculated on the basis of Eq. (1). The most sensitive study design is able to detect a change of 23% from the mean exposure in the reference condition, employing 16 subjects in all [31]. The largest study in the selection is slightly less sensitive, even though it would include 156 subjects in all [23]. In general, the detectable difference is beyond 50% of the mean.

Some of the studies in Table 1 were designed to compare exposure between two independent groups or conditions [2], [7], [23], [26], [48] and [51]. In all but one [23] of these studies, the investigated contrast in the 50-APDF-percentile was smaller than the detectable difference according to Table 1. Thus, insufficient power may well explain why inconclusive findings were reported in all these cases. In one study, power calculations were made in advance, but exposure variability proved to be unexpectedly large [51]. Only the study by Hansson and colleagues managed to demonstrate a significant exposure difference between two groups: laminate sheet handlers and office workers [23]. In this unique case, the detectable difference of 31% of the mean (cf. Table 1) was safely below the observed exposure difference of 57% between the groups, even when adjusting for the power-reducing effect of the office group being smaller than the industrial group.

Studies using the 10-APDF-percentile of EMG amplitude seem to be at an even greater risk of suffering from low power than studies employing the 50-percentile (as reported in Table 1), since CVS has been shown by the present and other investigations to be considerably larger for the 10-percentile (Table 3; [2], [5], [6] and [24]). On the other hand, a larger detectable effect (as a fraction of the mean value) may be considered acceptable in studies using the 10-percentile, since it might still imply a sufficient sensitivity of the study design in absolute terms, as measured, e.g., in %MVE. Other EMG-based exposure parameters besides percentiles have been suggested to be indicative of musculoskeletal disorder risk, in particular the occurrence of periods at very low EMG amplitude levels ('gaps' [19], [26] and [55]). However, the statistical properties of these parameters seem to be no better than those of APDF-percentiles [1], [42] and [53], and studies based on 'gap' parameters may also suffer from major problems in reaching sufficient power and sensitivity with feasible study sizes.

The detectable differences shown in Table 1 and Fig. 2(b) were calculated for studies comparing two groups of equal size by means of a two-sample, two-tailed t-test, and under a number of assumptions concerning data distributions and exposure variability, as described in the methods section. The present data did not contradict the assumption of independent and identical distributions of data at all levels within each factor, i.e. subjects, days, and quanta. Assessments of power and sensitivity may be conducted for other study designs, as well as for data sets and sampling strategies violating these assumptions [13], but this falls beyond the scope of the present paper.
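The "detectable difference" referred to above can be sketched as the inverse of the sample-size relation: the smallest relative group difference detectable with the two-sample, two-tailed t-test at p ≤ 0.05 and power 0.80, given the between-subjects coefficient of variation and the total number of subjects. The code below is a minimal illustration using the normal approximation; only the CVS of 0.26 comes from the present material, and the larger CV is a hypothetical field-study value, not a figure from Table 1.

```python
# Minimal detectable relative difference between two equal groups
# (normal approximation to the two-sample, two-tailed t-test).
from scipy.stats import norm

def detectable_difference(cv_between, n_total, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_group = n_total / 2
    return z * cv_between * (2 / n_per_group) ** 0.5

print(detectable_difference(0.26, 16))  # ~0.36 with CV_S = 0.26 and 16 subjects;
                                        # the exact t-based value is about 0.40
print(detectable_difference(0.50, 16))  # ~0.70 with a hypothetical field CV_S of 0.50
```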
Measuring for multiple days on every subject may, according to the present results, improve power somewhat, while collection of additional data within each day seems to have marginal effects beyond the sampling of about ten minutes of EMG [cf. Fig. 2(a)]. The effect of increasing sample duration or the number of samples per day may be more prominent in studies of jobs composed of several tasks, since the within-day variance s_q² is expected to be larger in that case. Very few studies have, however, presented data on the within-day variance of EMG amplitude parameters during occupational work [1], [35] and [53].

A major gain in power may, however, be obtained if subjects engage in a study as their own controls in a 'paired' design. A number of ergonomic studies have used this option, some of which are included in Table 1 [17], [31], [32], [37] and [46]. The paired design relies on the reasonable expectation that pairwise exposures in the two compared conditions will correlate. The gain in sensitivity compared to an unpaired design depends on the size of this correlation [30] and [37]. Unfortunately, paired EMG studies in the literature very rarely report pairwise correlations. Unpublished data by Mathiassen and colleagues suggest the value to be about 0.6 for 50-APDF-percentiles of EMG amplitude collected on two different days in subjects performing modifications of the present assembly task. With a correlation of 0.6, the detectable difference of 40% obtained when comparing eight subjects with another group of similar size (see above) would be reduced to 27% if the same eight subjects were instead assessed again on a different day. The correlation would probably increase, leading to a further reduction in the detectable difference, if all data from a particular subject were collected during the same day, thus eliminating variance introduced by re-application of electrodes and re-normalization.
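A sketch of the arithmetic behind this reduction (again a normal approximation, not the paper's exact t-based calculation): with n pairs and a within-pair correlation rho, the variance of the mean paired difference is 2·σ²·(1−ρ)/n, versus 2·σ²/n for two independent groups of n subjects, so the detectable relative difference shrinks by roughly a factor √(1−ρ).

```python
# Detectable relative difference for unpaired (rho = 0, n subjects per group)
# and paired (rho > 0, n pairs) designs, normal approximation.
from scipy.stats import norm

def detectable(cv, n, rho=0.0, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * cv * (2 * (1 - rho) / n) ** 0.5

print(detectable(0.26, 8))           # ~0.36 unpaired (about 0.40 with the exact t-test)
print(detectable(0.26, 8, rho=0.6))  # ~0.23 paired (about 0.27 as reported above)
```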
While paired designs are feasible even in the field for studies of, for example, work station design or differences between tools or tasks [2], [24] and [40], they are less practicable in investigations on a longer time scale of, for example, company-based interventions. One major problem in this respect is that the intervention may in itself imply changes in the workforce [20]. Evidently, paired designs cannot be applied to comparisons between companies or occupations.

Paired designs operating at the group level may still suffer from the statistical consequences of a large between-subjects variance, because individuals do not react coherently to an intervention [21], [37] and [46]. One alternative is to relax the conventional attempt to interpret results in terms of conditions in a general population, and instead use statistical procedures pertaining to case studies in limited groups [33]. Another alternative is to explore whether the study population can be divided into sub-groups with homogeneous reaction patterns. The ultimate extension of this approach is to analyze intervention effects only at the individual level. In that case, statistical power and detectable effects depend only on the exposure variability within subjects, since each individual constitutes his own 'group'. Thus, the sensitivity of the study will be increased as compared to both non-paired and paired group-based designs, although at the expense of results not being generalizable to a population.

As revealed by the bootstrap procedure, considerable uncertainty characterized the estimates of mean exposure, variance components and power derived from the parent material. The reported sizes of confidence intervals may apply to many EMG studies, although they will decrease in larger data sets [47]. Besides being an important descriptive tool, the intervals indicate useful scenarios to be considered when planning studies. Thus, it was recently proposed, on the basis of Monte Carlo simulations, that study sizes should be decided on the basis of an upper confidence limit of the exposure variability between subjects rather than the observed central value [9]. This means that the upper 95% confidence limits on study size, n_s, presented in Table 3 may be safer estimates of the true needs than the values estimated directly from the parent data. Some of the confidence intervals obtained by bootstrapping could also have been obtained with analytical procedures. Thus, Eq. (3) expresses the precision of an estimated mean exposure at the group level on the basis of the variance components of exposure. The 95% confidence limits on the mean, obtained by applying Eq. (3) to the assumedly normally distributed parent data set of 50-APDF-percentiles, resembled those obtained by bootstrapping: 39.3%RVE to 60.9%RVE (analytical estimate) as compared to 41.6%RVE to 59.3%RVE (bootstrapping, cf. Table 3). In the case of variance components and coefficients of variation, confidence intervals can also be estimated using analytical procedures. These procedures are, however, approximate and assume normality [47] and [50], and distribution-free bootstrap intervals offer an attractive alternative [16].

The present paper implicitly proposed a general strategy for considering power issues and measurement allocation when planning a comparative study of two groups. Although the strategy was exemplified using upper trapezius EMG data from cyclic assembly work, it is generally applicable to any study of group differences in mechanical exposure. The basic equations (1), (2), (3) and (4) can be applied whenever data on mean exposure and variance components are available. The bootstrap procedure adds important information concerning the stability of estimates of power and measurement allocation obtained from the parent data set. The strategy comprised the following major steps, which are suggested as a 'best practice':

1. Retrieve mean exposures and variance components for the exposure measures of interest, either from a pilot study or from the literature. In principle, multiple measurements are required at every source of variance (subjects, days within subject, registrations within day), but valuable information may still be obtained from less exhaustive data.
2. Analyze the relationship between the desired sensitivity of the planned study, i.e. its ability to detect differences of a certain size, and the required measurement effort in terms of subjects and recordings per subject.
3. Estimate the sensitivity of the reached conclusions to random fluctuations in the data material, for instance using bootstrap resampling (see the sketch after this list).
4. Compare the necessary study size with the limits set by resources or feasibility, and consider whether the constrained study is worthwhile performing at all.
5. If the study design is to be implemented, determine the measurement strategy offering the best possible power for the invested effort. This step may require economic considerations beyond the scope of the present paper.
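As a minimal illustration of step 3, the sketch below computes nonparametric bootstrap confidence limits for the between-subjects coefficient of variation by resampling whole subjects with replacement. The array of per-subject exposure values and the simulated demo data are assumptions for illustration; the paper's bootstrap resampled the full nested data set (subjects, days and registrations), which this simplification does not reproduce.

```python
# Bootstrap confidence interval for the between-subjects coefficient of
# variation, resampling subjects with replacement.
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_cv_ci(subject_values, n_boot=5000, ci=0.95):
    x = np.asarray(subject_values, dtype=float)
    cvs = np.empty(n_boot)
    for b in range(n_boot):
        sample = rng.choice(x, size=x.size, replace=True)
        cvs[b] = sample.std(ddof=1) / sample.mean()
    lo, hi = np.percentile(cvs, [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    return lo, hi

# Hypothetical demo: ten subjects with a 50-APDF-percentile of about 50 %RVE
# and a between-subjects CV of roughly 0.26.
demo = rng.normal(50, 13, size=10)
print(bootstrap_cv_ci(demo))
```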