Developing an item bank to measure deliberate self-harm behaviours: facilitating tailored scales and computer adaptive testing for specific research and clinical purposes
|Article code||Publication year||English article||Persian translation||Word count|
|36845||2014||8-page PDF||Order||Not calculated|
Publisher: Elsevier - Science Direct
Journal : Psychiatry Research, Volume 217, Issue 3, 30 July 2014, Pages 240–247
Abstract
The purpose of this study was to investigate the application of item banking to questionnaire items intended to measure Deliberate Self-Harm (DSH) behaviours. The Rasch measurement model was used to evaluate behavioural items extracted from seven published DSH scales administered to 568 Australians aged 18–30 years (62% university students, 21% mental health patients, and 17% community members). Ninety-four items were calibrated in the item bank (including 12 items with differential item functioning for gender and age). Tailored scale construction was demonstrated by extracting scales covering different combinations of DSH methods but with the same raw score for each person location on the latent DSH construct. A simulated computer adaptive test (starting with common self-harm methods to minimise presentation of extreme behaviours) demonstrated that 11 items (on average) were needed to achieve a standard error of measurement of 0.387 (corresponding to a Cronbach's Alpha of 0.85). This study lays the groundwork for advancing DSH measurement to an item bank approach with the flexibility to measure a specific definitional orientation (e.g., non-suicidal self-injury) or a broad continuum of self-harmful acts, as appropriate to a particular research/clinical purpose.
1. Introduction
Early theorists have defined Deliberate Self-harm (DSH) (also referred to as self-harm) as an act of bodily self-harm that is intentional, direct and immediate (Babiker and Arnold, 1997; Kreitman, 1977) with a non-fatal outcome (Morgan, 1979). In more recent times, several specific DSH definitions have been proposed that can be distinguished according to the dimensions of method, intent, lethality and outcome (Ougrin and Zundel, 2009), ranging from a narrow set of visible tissue damage acts performed in the absence of a desire to die (Nock, 2010) to broad spectrums of self-injurious behaviours with multiple intentions (Skegg, 2005). There is on-going debate about the relative merits of the various definitions (De Leo, 2011; Wilkinson and Goodyer, 2011), and some researchers argue that no clear picture of the epidemiology of DSH can be gained until clinicians and researchers agree on a definition of DSH (Rodham and Hawton, 2009).
An example of a narrow DSH definition (covering visible tissue damage acts with non-suicide intent) is Non-Suicidal Self-Injury (NSSI) (Nock, 2010). An independent mental health disorder based on the NSSI definition has been included in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association, 2013) as a condition requiring further study. The argument for NSSI as a disorder is based, in part, on the proposition that the methods of DSH most associated with NSSI (viz., mild to moderate forms of visible tissue damage) (Wilkinson and Goodyer, 2011) may form a distinct grouping on a latent continuum of self-harmful behaviours (Ougrin and Zundel, 2009). An example of a less restricted definition of DSH includes self-poisoning and self-injury, irrespective of suicide intent or type of motivation (Hawton and James, 2005).
Many researchers and clinicians argue the merits of adopting a broader definition of DSH because of the difficulties in reliably measuring intent (Ougrin and Zundel, 2009), the lack of support for inferring intent from choice of method (Nada-Raja et al., 2004), and the frequent situation of suicide and non-suicide related self-harm occurring in the same individual (Nock et al., 2006). Regardless of the specific definition of DSH, it is common practice to measure engagement in self-harm by interview schedules and self-report tests that include a behavioural scale comprised of specific methods of self-harm (Borschmann et al., 2012). Scale developers select items that accord with their adopted conceptualisation of DSH. Scales that conform to the narrow NSSI definition are generally restricted to items related to visible tissue damaging acts (e.g., cutting, bruising, scratching and burning) (Gratz, 2001). Scales that are consistent with less restrictive definitions of DSH are likely to combine visible tissue damaging methods with highly dangerous methods (e.g., swallowing dangerous substances) (Croyle and Waltz, 2007), self-harm without visible injury (e.g., exercised an injury to cause harm) (Vrouva et al., 2011), lack of self-care (e.g., taking too little medication to cause harm) (Nixon et al., 2002), and deliberate recklessness to cause harm (e.g., driven recklessly to cause harm) (Sansone et al., 1998). Most published DSH behavioural scales exhibit good content validity (at least within their particular definitional orientation), reliability, and external validity (Borschmann et al., 2012). Our recent investigation of six frequently used DSH scales found strong evidence for the unidimensionality of the item sets contained in each scale (Latimer et al., 2013). 
It should be noted that some published scales exhibit high levels of local dependency (i.e., a person's response on one item influences their response on another item) (Latimer et al., 2013), thereby inflating estimates of reliability (Boyle, 1991). Consistent with their overall sound psychometric properties, published DSH behavioural scales have made a major contribution to advancing our understanding of DSH. They assist clinicians to accurately identify the range of past methods, which is the recommended first step in the assessment of DSH (Skegg, 2005), and they are preferred to open-ended questions, which are likely to under-estimate the range of self-harm acts (Nock, 2010). In research studies, they inform DSH prevalence rates, which are based on the endorsement of at least one method of self-harm (Plener et al., 2009), and a count of the different methods over a person's lifetime appears to provide a very useful estimate of a person's location on a latent DSH construct, with high scores indicating a progression to more extreme forms of DSH (Latimer et al., 2013). Their application in both clinical and research settings is likely to increase with the emerging evidence that a count of past methods of DSH has the strongest association with psychopathology (compared to frequency and recency of any one method) (Nock et al., 2006), and it appears to be the best single predictor of future DSH (Glenn and Klonsky, 2011).
However, there are several shortcomings associated with published DSH behavioural scales. First, existing scales cover different combinations of methods (as a consequence of being orientated to specific definitions of DSH), thus preventing the comparison of prevalence rates across studies and the equating of clinical cut-off scores (Gratz, 2001; Rodham and Hawton, 2009).
Second, some scales are based on narrow definitions of DSH and provide less information about the range of past and present methods of DSH compared to scales based on more expansive definitions (Latimer et al., 2012). Third, concerns have been raised about the potential for scales to cause distress (or arouse curiosity) as a result of the presentation of a fixed set of methods (a necessary feature of traditional pencil and paper formats) to all respondents regardless of their level of experience in DSH (Patton et al., 1997; Zetterqvist et al., 2013). Consequently, ethics committees may be reluctant to approve research designs involving a checklist of specific methods (Swannell et al., 2014) and more research is required to fully inform the risks associated with in-depth DSH assessments (Reynolds et al., 2006; Whitlock et al., 2013). Fourth, many DSH scale developers (Croyle and Waltz, 2007; Nock et al., 2007) do not report Cronbach's Alpha for behavioural items, thereby making it difficult for researchers and clinicians to select a scale with the required level of measurement precision for group level research (Cronbach's Alpha greater than 0.70) or for individual level research/clinical assessment (Cronbach's Alpha greater than 0.85) (Ponterotto and Ruckdeschel, 2007).
It may be possible to advance the measurement of specific DSH methods by the development of a unidimensional item bank based on modern test theory (Hambleton et al., 2005). This approach has the potential to maintain the significant past achievements of the published scales while addressing at least some of the limitations that impact on their use in both clinical and research settings.
First, a DSH item bank may allow the extraction of tailored scales (covering different combinations of DSH methods) with the same raw score for each person location on the latent DSH construct (Boekkooi-Timminga, 1990; Hambleton et al., 2005), thus allowing for the direct comparison of prevalence rates and clinical cut-off scores. Second, it may be possible to tailor extracted scales to a particular clinical purpose (e.g., escalation from mild to extreme forms of visible tissue damage) (Favazza, 2006) or a specific research task (e.g., relationship between visible tissue damage acts and highly dangerous methods) (Nada-Raja et al., 2004) rather than to a specific definition of DSH. Third, a DSH item bank may facilitate computer adaptive testing (CAT) applications (Hambleton et al., 2005) to minimise the presentation of items that are beyond the experience of the person taking the test (e.g., extreme methods that she/he has not even thought about). This innovation would be consistent with the recommendation for a graduated assessment of DSH (Nock, 2010), and would address ethical concerns about pencil and paper formats (Patton et al., 1997; Zetterqvist et al., 2013). Fourth, it may be possible to configure DSH CAT applications to provide specific levels of measurement precision (Choi et al., 2012a) that correspond to the values of Cronbach's Alpha required for various research and clinical tasks (Ponterotto and Ruckdeschel, 2007).
A unidimensional DSH item bank (and associated applications of tailored scale extraction and CAT) requires the calibration of a sufficiently expansive set of DSH methods on the same latent DSH construct (Hambleton et al., 2005). We have successfully co-calibrated six published DSH scales on a common measurement metric for the purpose of providing raw score conversion tables (Latimer et al., 2012). An additional outcome from the co-calibration is a pool of candidate items for a DSH item bank.
The item pool covers all three accepted groupings of DSH behaviours proposed by Skegg (2005), namely, DSH by the use of dangerous methods, DSH by the use of visible tissue damage, and DSH by the use of other methods without visible injury (including lack of self-care and risky behaviours to cause harm). All such items are expected to relate to the same latent DSH construct, consistent with a continuum model of DSH behaviours (Nock, 2010). The major objective of the present study was therefore to develop an item bank covering specific methods of self-harm extracted from published tests of DSH. A second objective was to investigate the utility of the item bank by extracting tailored scales and by simulating a CAT application.
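As a rough illustration of the measurement idea behind such an item bank, the sketch below uses the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person location and the item location. The item locations are invented for illustration and are not estimates from this study; the function names are ours. The sketch shows that two scales whose items have the same spread of locations give the same expected raw score at any person location.

```python
import math

def rasch_probability(theta, b):
    """Probability of endorsing an item located at b for a person located at theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta, item_locations):
    """Expected count of endorsed items at theta (the test characteristic curve)."""
    return sum(rasch_probability(theta, b) for b in item_locations)

# Hypothetical item locations in logits (illustrative values only).
scale_a = [-1.0, 0.0, 1.0, 2.0]
scale_b = [2.0, 1.0, 0.0, -1.0]  # stands in for different items with the same spread

for theta in (-1.0, 0.0, 1.5):
    print(theta,
          round(expected_raw_score(theta, scale_a), 3),
          round(expected_raw_score(theta, scale_b), 3))
```

Under these assumptions a raw score (a count of endorsed methods) maps to the same person location on either scale, which is the property that makes a conversion table unnecessary.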
English conclusion
3. Results
3.1. Development of item bank
In Step 1, the seven DSH scales were shown to be strictly unidimensional when analysed as testlets, formed by summating all the items in each scale into one item so that each scale becomes one testlet (see analyses A, B and C in Table 2). In the pool of 98 candidate items, 27 items were free of local dependency (LD) and 71 items (in 17 clusters) exhibited LD.
Table 2. Results of item banking and extraction of scales.
|Step||Analysis||Item set||No. of items||Item–trait interaction χ2||d.f.||P||Item fit, mean (S.D.)||Person fit, mean (S.D.)||Alpha||% sign. t-tests||Banked items|
|1||A||Common scales||2 [a]||28.053||16||0.031||−0.347 (2.517)||−0.460 (0.813)||0.83||1.49%||n.a.|
|1||B||Set A scales||4 [b]||42.161||32||0.108||−0.038 (1.234)||−0.451 (0.944)||0.90||3.99%||n.a.|
|1||C||Set B scales||5 [c]||59.681||40||0.023||0.242 (1.574)||−0.501 (1.046)||0.91||2.99%||n.a.|
|2||D||Core item set||38||260.096||214||0.017||−0.636 (1.096)||−0.248 (0.556)||0.83 [d]; 0.82 [e]||2.48% [d]; 0.71% [e]||38|
|3||E||Set aside set 1||14||78.213||56||0.027||−0.445 (0.998)||−0.257 (0.730)||0.75||0.53%||9|
|3||F||Set aside set 2||14||90.950||70||0.047||−0.443 (0.833)||−0.219 (0.621)||0.74||1.06%||5|
|3||G||Set aside set 3||14||72.621||56||0.067||−0.605 (0.806)||−0.288 (0.712)||0.71||0.53%||5|
|3||H||Set aside set 4||11||56.436||53||0.348||−0.326 (0.711)||−0.250 (0.707)||0.64||0.35%||3|
|3||I||Set aside set 5||17||115.803||85||0.015||−0.268 (0.843)||−0.231 (0.678)||0.67||1.24%||8|
|3||J||Set aside set 6||12||50.200||45||0.275||−0.570 (0.894)||−0.271 (0.703)||0.65||0.18%||3|
|3||K||Set aside set 7||13||77.765||62||0.085||−0.289 (1.205)||−0.222 (0.686)||0.68||1.42%||4|
|3||L||Set aside set 8||15||97.405||70||0.017||−0.650 (1.015)||−0.252 (0.638)||0.74||0.18%||6|
|3||M||Set aside set 9||10||42.346||39||0.329||−0.663 (0.758)||−0.271 (0.686)||0.61||0.00%||1|
|4||N||Re-check set 1||13||113.648||89||0.040||−0.307 (1.207)||−0.239 (0.725)||0.66 [f]||0.53% [f]||10 [g]|
|4||O||Re-check set 2||10||65.873||56||0.172||−0.547 (0.645)||−0.281 (0.698)||0.65 [f]||0.00% [f]||4 [g]|
|4||P||Re-check set 3||10||56.893||46||0.130||−0.393 (0.535)||−0.294 (0.700)||0.55 [f]||0.53% [f]||4 [g]|
|4||Q||Re-check set 4||10||53.449||48||0.273||−0.431 (0.989)||−0.278 (0.688)||0.55 [f]||0.00% [f]||4 [g]|
|4||R||Re-check set 5||9||41.246||38||0.331||−0.517 (0.646)||−0.288 (0.693)||0.57 [f]||0.35% [f]||2 [g]|
|n.a.||S||Extracted scale 1||22||105.499||88||0.098||−0.513 (0.722)||−0.199 (0.479)||0.80||1.60%||n.a.|
|n.a.||T||Extracted scale 2||22||118.174||88||0.018||−0.488 (0.958)||−0.207 (0.568)||0.82||1.06%||n.a.|
|Ideal values|| || || || || ||P > Bonferroni-adjusted p value [h]||0.0 (1.0)||0.0 (1.0)||>0.80||<5%|| |
[a] Common scales as testlets. [b] Set A scales as testlets. [c] Set B scales as testlets. [d] Set A items in core item set. [e] Set B items in core item set. [f] Calculated before items split for DIF. [g] Split items. [h] 0.05 divided by number of items.
In Step 2, a candidate set of 44 items for the core item set was compiled from the 27 items free of LD and one item from each of the 17 clusters of items with LD, with the remaining 54 items with LD set aside for Step 3. In the 44 items, Item 34 (SHI-22) (abused alcohol) and Item 52 (SHIF-16) (bitten your fingernails … bleeding or pain) exhibited misfit due to a lack of discrimination. That is, the probability of response was the same across all overall levels of DSH. The two misfitting items were re-evaluated in Step 3.
Also in Step 2, two items showed differential item functioning (DIF) for gender, namely, Item 30 (SHI-22) (cut; females>males), and Item 45 (SHI-22) (lost job on purpose; males>females). Two items showed DIF for age, namely, Item 35 (SHI-22) (driven recklessly; 20 years and over >18–19 years) and Item 39 (SHI-22) (been promiscuous; 20 years and over >18–19 years). The four items with DIF were re-evaluated in Step 4. The remaining 38 items (called the core item set) showed adequate fit to the Rasch model (see analysis D in Table 2).
In Step 3, nine set aside item sets (each one free of LD) were constructed from the remaining 54 items with LD that were not analysed in Step 2.
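Footnote h of Table 2 gives the criterion for the item–trait interaction as 0.05 divided by the number of items. A minimal sketch of that check is shown here, using the item counts and P values of three rows of Table 2 as reported; the function names are ours and this is not the software used in the study.

```python
def bonferroni_threshold(n_items, alpha=0.05):
    """Bonferroni-adjusted significance level: alpha divided by the number of items."""
    return alpha / n_items

def adequate_item_trait_fit(p_value, n_items):
    """True when the reported P exceeds the Bonferroni-adjusted threshold."""
    return p_value > bonferroni_threshold(n_items)

# (number of items, reported P) for analyses A, D and I in Table 2.
table2_rows = {"A": (2, 0.031), "D": (38, 0.017), "I": (17, 0.015)}

for label, (n_items, p_value) in table2_rows.items():
    print(label,
          round(bonferroni_threshold(n_items), 5),
          adequate_item_trait_fit(p_value, n_items))
```

Because the threshold shrinks as the number of items grows, a P value such as 0.017 for the 38-item core set still lies above the adjusted level even though it is below 0.05.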
Following the deletion of items with individual misfit and the removal of items with DIF for re-evaluation in Step 4 (see details in following paragraphs), each set aside set was anchored by items from the core item set. Anchor items were selected according to the following standards: (1) free of LD with any other item, (2) maximised spread of item locations, and (3) no extreme locations. After anchoring, all set aside item sets showed adequate fit to the Rasch model (see analyses E–M in Table 2). Step 3 re-evaluated the two items identified with misfit in Step 2 by including them in one of the nine set aside item sets. Item 34 (SHI-22) (abused alcohol) was no longer misfitting and therefore retained in the item bank. Item 52 (SHIF-16) (bitten your fingernails … bleeding or pain) was confirmed as misfitting. Step 3 identified three additional misfitting items, namely, Item 46 (SHI-22, attempted suicide) with deviations in individual class intervals, Item 51 (SHIF-16, interfered with the healing of a wound, such as by repeatedly pulling off scabs) exhibiting under-discrimination, and Item 62 (SHIF-16, cut other areas of your body …) showing over-discrimination. Items 46, 51, 52 and 62 were excluded from the item bank. Also in Step 3, eight items were detected with DIF, including four cutting items biased to females (extracted from SHIF-16, SITBI-11, DSHI-16 and SIQTR-5) and three self-hitting items biased to males (extracted from ISAS-12, SHIF-16 and SITBI-11). The remaining DIF item (bit yourself; extracted from SITBI-11) was biased to younger people. Step 4 re-evaluated the four items identified with DIF in Step 2 and the eight items identified with DIF in Step 3. Because some items with DIF also exhibited LD, it was necessary to construct five re-check item sets (see analyses N–R in Table 2). DIF was confirmed for all 12 items and required item splitting to provide an estimate of item location for sub-groups (males vs females, 18–19 years vs 20 years and over). 
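Item splitting can be pictured as replacing one response column with one column per sub-group, where people outside the sub-group are treated as not having been administered that item, so that a separate location can be estimated for each sub-group. The sketch below is a generic illustration with invented responses; it is not the procedure as implemented in the software used in the study, and the function name is hypothetical.

```python
def split_item(responses, groups):
    """Split one item's 0/1 responses into one column per group.

    responses: list of 0/1 scores, one per person
    groups:    list of group labels, one per person
    Returns a dict mapping each group label to a column in which people
    from other groups are marked None (structurally missing).
    """
    if len(responses) != len(groups):
        raise ValueError("responses and groups must have the same length")
    columns = {}
    for label in sorted(set(groups)):
        columns[label] = [r if g == label else None
                          for r, g in zip(responses, groups)]
    return columns

# Invented responses to a single item and invented group labels.
item_responses = [1, 0, 1, 1, 0]
gender = ["female", "male", "female", "male", "female"]

split = split_item(item_responses, gender)
print(split["female"])  # [1, None, 1, None, 0]
print(split["male"])    # [None, 0, None, 1, None]
```

With two sub-groups per item, 12 items with DIF yield 24 split items, which matches the count reported for the re-check sets.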
The split items in each set were anchored on the metric for the core item set. The re-check item sets (see analyses N–R in Table 2) showed adequate fit to the Rasch model, noting the need to conduct the PCA and t-tests prior to item splitting.
In Step 5, all items calibrated in the previous steps were compiled into a unidimensional item bank. To summarise the previous steps, 38 items were provided by the core item set (Step 2), 44 items were recovered from the set aside item sets (Step 3), and 24 split items (from 12 items with DIF) were provided by the re-check item sets (Step 4).
3.2. Extraction of tailored scales
Two tailored scales were extracted from the item bank. Scale 1 (22 items, mostly constructed from items extracted from SASII-16 and SHIF-16) covered mild and moderate visible tissue damaging methods (11 items), highly dangerous methods (nine items), and other self-harmful methods without visible injury (two items). Scale 2 (22 items, mostly constructed from items extracted from SHIF-16, SITBI-11 and DSHI-16) comprised mild, moderate and extreme visible tissue damaging methods (18 items) in combination with highly dangerous methods (two items) and other self-harmful methods without visible injury (two items). The two extracted scales showed adequate fit to the Rasch model and good internal consistency, with Cronbach's Alpha of 0.80 or greater (see analyses S and T in Table 2). Because of the equal dispersion of item locations, both scales provided the same raw scores for each person location on the latent construct, thus avoiding the need for a raw score conversion table.
3.3. CAT simulation
The CAT simulations were conducted for 1000 cases and the 82 items calibrated on the common metric in Steps 2 and 3.
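The figure of 82 items, and the bank totals reported for Step 5, follow from the counts given in the preceding steps. As a quick cross-check, the arithmetic below uses only numbers stated in the text, the abstract and Table 2; the variable names are ours.

```python
# Step 1: candidate pool and local dependency (LD).
candidate_pool = 98
free_of_ld = 27
with_ld = 71
ld_clusters = 17

# Step 2: one item per LD cluster joins the LD-free items.
step2_candidates = free_of_ld + ld_clusters   # 44 candidate core items
set_aside = with_ld - ld_clusters             # 54 items left for Step 3
misfit_in_step2 = 2                           # Items 34 and 52
dif_in_step2 = 4                              # Items 30, 45, 35 and 39
core_items = step2_candidates - misfit_in_step2 - dif_in_step2

# Step 3: banked items per set aside set (analyses E to M in Table 2).
banked_from_set_aside = [9, 5, 5, 3, 8, 3, 4, 6, 1]
recovered_items = sum(banked_from_set_aside)

# Step 4: split items per re-check set (analyses N to R in Table 2).
split_per_recheck_set = [10, 4, 4, 4, 2]
split_items = sum(split_per_recheck_set)
dif_items = 12
excluded_items = 4                            # Items 46, 51, 52 and 62

cat_items = core_items + recovered_items      # items used in the CAT simulation
bank_items = cat_items + dif_items            # items calibrated in the bank

print(core_items, recovered_items, split_items, cat_items, bank_items)
```

The totals agree with the text: 38 core items, 44 recovered items, 24 split items from 12 items with DIF, 82 items for the CAT simulation, and 94 calibrated items, which is the candidate pool of 98 minus the four excluded items.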
The 12 items with DIF (Step 4) were not included as this would require a different order of item presentation in sub-groups (according to the different estimates of location provided by the item splitting procedure) and this feature is not presently available in Firestar-D software. The first item administered in the simulations was selected from items at the lower end of the latent construct. The first CAT simulation showed that 11 items, on average, were administered for a SEM of 0.387 (approximating to a Cronbach's Alpha of 0.85), and the second simulation showed that six items, on average, were administered for a SEM of 0.521 (approximating to a Cronbach's Alpha of 0.72). In comparison, values of Cronbach's Alpha (based on 0,1 scores for lifetime presence) reported by DSH scale developers include 0.82 for the DSHI-16 (Gratz, 2001), 0.84 for the ISAS-12 (Klonsky and Glenn, 2009), and 0.90 for the SHI-22 (Sansone et al., 2006).
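The logic of a CAT run with a SEM stopping rule can be sketched as follows. This is a simplified illustration and not the Firestar-D procedure: the 82 item locations are invented and evenly spaced, the person estimate is a posterior mean under an assumed standard normal prior, the first item is simply the lowest-located item, and each later item is the remaining one closest to the current estimate (the most informative item under the Rasch model). The conversion from SEM to reliability assumes a person standard deviation of 1; under that assumption 1 − SEM² gives about 0.85 for a SEM of 0.387 and about 0.73 for a SEM of 0.521, close to the values quoted above. The exact conversion used in the study is not stated in this excerpt.

```python
import math
import random

def p_endorse(theta, b):
    """Rasch probability of endorsing an item located at b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap_estimate(responses, grid):
    """Posterior mean and SD of theta on a grid (standard normal prior assumed)."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, target_sem, rng):
    """Administer items until the posterior SD reaches target_sem or the bank runs out."""
    grid = [i / 10.0 for i in range(-60, 61)]
    remaining = sorted(bank)
    responses = []
    next_b = remaining.pop(0)  # start at the lower end of the construct
    while True:
        x = 1 if rng.random() < p_endorse(true_theta, next_b) else 0
        responses.append((next_b, x))
        theta, sem = eap_estimate(responses, grid)
        if sem <= target_sem or not remaining:
            return theta, sem, len(responses)
        # Rasch item information p(1 - p) peaks where the item location equals theta.
        next_b = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(next_b)

def reliability_from_sem(sem, person_sd=1.0):
    """Approximate reliability as 1 - (SEM / person SD)^2 (unit SD is an assumption)."""
    return 1.0 - (sem / person_sd) ** 2

# Hypothetical bank: 82 item locations evenly spaced from -4 to 4 logits.
hypothetical_bank = [-4.0 + 8.0 * i / 81 for i in range(82)]
rng = random.Random(2014)
theta_hat, sem, n_items = simulate_cat(0.5, hypothetical_bank, 0.387, rng)
print(n_items, round(sem, 3), round(reliability_from_sem(0.387), 3))
```

The number of items administered in this sketch depends on the invented bank and the assumed prior, so it should not be expected to reproduce the averages of 11 and six items reported for the study's simulations.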