Bayesian hierarchical modeling of the temporal dynamics of subjective well-being: A 10-year longitudinal study
Article code | Publication year | Pages in English article |
---|---|---|
38044 | 2015 | 14-page PDF |
Publisher: Elsevier - ScienceDirect
Journal : Journal of Research in Personality, Volume 59, December 2015, Pages 1–14
English abstract
Abstract This study demonstrates, for the first time, how Bayesian hierarchical modeling can be applied to yield novel insights into the long-term temporal dynamics of subjective well-being (SWB). Several models were proposed and examined using Bayesian methods. The models were assessed using a sample of Australian adults (n = 1081) who provided annual SWB scores on between 5 and 10 occasions. The best fitting models involved a probit transformation, allowed error variance to vary across participants, and did not include a lag parameter. Including a random linear and quadratic effect resulted in only a small improvement over the intercept only model. Examination of individual-level fits suggested that most participants were stable with a small subset exhibiting patterns of systematic change.
English introduction
1. Introduction Researchers have long been interested in the long-term stability and change of subjective well-being (SWB). Test–retest correlations from longitudinal data (Schimmack & Oishi, 2005) and twin studies (Lykken & Tellegen, 1996), together with the generally small long-term effect of major life events, all attest to the stability of SWB over time. However, test–retest correlations do decline as test–retest intervals increase (Schimmack & Oishi, 2005), and more recent work suggests that some life events lead to long-term changes in SWB for some people. To explain these temporal dynamics, several theoretical models of SWB have been proposed (e.g., Brickman and Campbell, 1971, Cummins, 2015, Easterlin, 2003 and Headey and Wearing, 1989). Underpinning the evidence for these theoretical models are various statistical approaches that have been used to analyze longitudinal datasets (e.g., Charles et al., 2001, Easterlin, 2003, Ehrhardt et al., 2000, Headey and Wearing, 1989, Helliwell, 2003, Lucas and Donnellan, 2007, Mroczek and Spiro, 2005 and Orth et al., 2010). In particular, various hierarchical modeling and latent variable approaches have provided insights into the nature of SWB dynamics. While these statistical models have provided useful insights, they also have their limitations. In particular, they have tended to rely on standard distributional assumptions and used a limited set of model comparison tools. More recently, researchers in a wide range of fields, including psychology, have begun to explore the potential of the Bayesian approach to model estimation and comparison (e.g., Anglim and Wynton, 2015, Averell and Heathcote, 2011, Elliott et al., 2005, Lee, 2008 and Nikodijevic et al., 2015). Software such as BUGS, JAGS, and Stan has made flexible Bayesian model specification more accessible to applied quantitative researchers by reducing the need for the user to specify an algorithm for parameter estimation.
Furthermore, the Bayesian approach offers a range of powerful model comparison tools which include model recovery, measures of fit with advanced penalties for model complexity, and checks on whether models recover theoretically important features of the data (Gelman et al., 2013). However, despite their increased accessibility, such models have not yet been applied to longitudinal SWB research. Thus, the purpose of this paper is to apply the Bayesian approach in order to parsimoniously model the features of long-term change in SWB. We propose several alternative models and show how a Bayesian approach to estimation and model comparison provides novel insights into the temporal dynamics of SWB. We estimate models and apply this approach to 10 years of SWB data from a large representative sample of Australian adults. 1.1. Subjective well-being (SWB): An overview Subjective well-being (SWB) commonly refers to a broad range of emotional reactions and cognitive evaluations that represent an individual’s assessment of their overall life quality (Diener, Suh, Lucas, & Smith, 1999). When measured either by a single global life satisfaction item or by a composite scale based on satisfaction with multiple domains of life (e.g., the Personal Wellbeing Index, International Wellbeing Group, 2013), several robust findings have particular relevance to the current investigation. First, most people report feeling positive about their lives most of the time (Cummins, 1998, Cummins, 2003 and Cummins, 2013). Second, positive mood provides an explanation for this stability with the combination of happiness, contentment, and alertness accounting for up to 80% of SWB variance (Blore, Stokes, Mellor, Firth, & Cummins, 2011). 
Third, from the perspective of homeostatic theory, individual differences in this positive affect form the basis of an affective set-point (Tomyn & Cummins, 2011), and when emotions create a level of SWB different from set-point, a homeostatic system is activated with responsibility for returning SWB to set-point (Cummins, Li, Wooden, & Stokes, 2014). An essential feature of SWB that can be understood as a consequence of the above is that it tends to be fairly stable over time. Hartmann (1934) provided initial evidence of this, reporting a one-month test–retest correlation of .70 in self-reported general happiness among college students. By the 1970s it was clear that considerable levels of stability in SWB extend over several years (Andrews and Withey, 1976 and Palmore and Kivett, 1977). A meta-analysis by Schimmack and Oishi (2005) obtained average test–retest correlations for multi-item scales at 1 year of around r = .60, and at 10 years of around r = .35, but estimates based on more than 5 years were based on small sample sizes. Supporting a partial genetic basis for this stability, Lykken and Tellegen (1996) found much larger SWB intraclass correlations for monozygotic twins (r = .44) than for dizygotic twins (r = .08). Finally, many major life events appear to have only a temporary effect on SWB (Headey and Wearing, 1989 and Suh et al., 1996). Adding to the understanding of these trends, several strands of evidence suggest that SWB measurement for a given individual is more than just sampling from a stationary distribution. Test–retest correlations do tend to decline somewhat over time, and even over one-year intervals such correlations are typically less than internal consistency measures of reliability. Furthermore, covariance models that seek to partial out trait and auto-regressive variance have estimated that auto-regressive factors explain almost as much variance as traits (Lucas & Donnellan, 2007).
Additional auto-regressive variance may be explained by extreme life events, like approaching death (Gerstorf et al., 2008), marital transition (Lucas, Clark, Georgellis, & Diener, 2003), and acquiring a disability (Lucas, 2007). Finally, studies of overall age effects do suggest that small but meaningful changes in SWB occur over the life course (e.g. Mastekaasa & Moum, 1984). Despite the demonstration of such small changes, it is the overall stability of SWB over time that has led researchers to propose various stabilizing mechanisms (Cummins, 1995 and Cummins et al., 2003). For example, Brickman and Campbell (1971) proposed that people adjust expectations to changing circumstances while Headey and Wearing, 1989 and Headey and Wearing, 1992 proposed that stable personality traits systematically influence the experience and perception of life events which, in turn, influences SWB. Finally, Cummins (2015) proposed that homeostatically protected mood (HPMood) set-points are the key to SWB stability, where systematic change in SWB is caused by homeostatic failure, when an individual’s resources are insufficient to effectively counter the level of experienced challenge. Such failure, however, is usually an acute event, with SWB normally recovering to the level of its set point. 1.2. Longitudinal statistical models of SWB Researchers have applied a range of statistical models to study the long-term temporal dynamics of SWB (for a review, see Eid & Kutscher, 2014). Such models have almost always included a random intercept and generally adopt either a latent growth curve (e.g., Helson et al., 2002 and Orth et al., 2010) or a hierarchical modeling approach (e.g., Lucas and Donnellan, 2011 and Mroczek and Spiro, 2005). 
Stochastic change is typically modeled using a lag parameter, whereas systematic change is commonly modeled using random linear and quadratic effects, although discrete change and growth-mixture models have also been employed (Mancini et al., 2011 and Wang, 2007). In particular, trait-state-error models (Kenny & Zautra, 2001) include parameters representing stable and lag components, as well as a state component which includes both occasion specific variance and measurement error (for a review, see Cole, Martin, & Steiger, 2005). In contrast to latent growth curve models, hierarchical models have the benefit of easily incorporating unequal numbers of observations per participant, as well as placing the emphasis on predicting the criterion variable. A range of other approaches include iterative procedures to explore set points (Cummins et al., 2014), models designed to capture changes in test–retest structure over time (Fraley & Roberts, 2005), and models of momentary measurement error and short to medium-term response biases (Ehrhardt et al., 2000). Despite the popularity and insights gained from traditional hierarchical and latent growth curve approaches, they both have several limitations. First, many such models are incorporated into software which makes assumptions that are both difficult to modify and inappropriate for SWB data. For example, individuals differ in within-person variability, but standard models assume that variability is constant over individuals. Second, the data generating process implied by such models is rarely evaluated in terms of whether it captures theoretically relevant features of longitudinal SWB data, as described earlier. Such features include degree of change, distributions of individual scores, and distribution of person-level means. Third, models are only sometimes compared, which in turn raises a number of challenges related to evaluating model complexity. 
To overcome these limitations, a Bayesian data analytic approach provides a promising framework for refining longitudinal models of SWB.

1.3. Bayesian hierarchical modeling Bayesian hierarchical methods are increasingly applied in psychology to model repeated measures data (e.g., Anglim and Wynton, 2015, Averell and Heathcote, 2011, Lee, 2008 and Nikodijevic et al., 2015). Adoption of Bayesian methods has been aided by increased computational power, refinement of algorithms, accessible software (e.g., WinBUGS, JAGS, and Stan) and textbooks relevant to a general applied quantitative audience (e.g., Gelman and Hill, 2007, Gelman et al., 2013 and Kruschke, 2010). While Elliott et al. (2005) performed a Bayesian analysis of short term mood data, we are not aware of any attempt to apply Bayesian hierarchical methods to the study of long-term temporal dynamics of SWB. The Bayesian hierarchical approach incorporates all the advantages of standard hierarchical modeling, but also offers several additional benefits. First, it allows substantial flexibility in defining the probability model proposed to underlie the data generating process. For example, the distribution of residuals is not required to be constant or normal. Similarly, the distribution of person-level parameters is not required to be normal. Second, the Bayesian approach provides a useful set of model evaluation tools. In particular, posterior predictive checks provide a powerful way of defining theoretical properties of interest and assessing whether a candidate model adequately recovers these features. This moves beyond simple measures of fit, and seeks to assess the model on a range of theoretically important features using posterior predictive checks (for an overview, see Gelman et al., 1996 and Kruschke, 2013).
The checks involve (a) defining a set of features of the sample data that a model should capture; (b) simulating data from the model; and (c) evaluating the model based on whether it produces simulated data that capture the features of the sample data. While general measures of fit are typically based on the global likelihood of the model, often with a penalty based on model complexity, posterior predictive checks allow more weight to be assigned to particular features when evaluating model performance. Such checks also can provide greater guidance regarding how the model could be improved. They also highlight the relative strengths of different models. For example, posterior predictive checks have been applied to models of SAT scores (Sinharay & Stern, 2003) and learning (Anglim & Wynton, 2015). To inform this approach, we now consider what features of longitudinal SWB data should be captured by a successful model. 1.4. Features of the temporal dynamics of SWB There are several important temporal dynamics of SWB that a comprehensive model should capture. First, there is the distribution of person-level means. While people normally have positive levels of SWB, there is a general negative skew. Second, there is the distribution of person-level standard deviations. That is, people differ in the extent to which they fluctuate around their set-point over time (Cummins et al., 2014) and thus individual SWB levels differ from occasion to occasion. This is a neglected feature of emotional dynamics in many statistical models. Additional consideration needs to be given to the relationship between mean levels of SWB and the standard deviation. This particularly applies to the degree to which lower standard deviations are caused by scale range restriction, whereby people with means closer to scale end points show less variability. Third, there is within-person change in terms of what we label stochastic and systematic change. 
Stochastic change refers to random change from a previous state, and is typically represented by some form of lag parameter. Systematic change refers to long-term trends that are commonly represented using linear or quadratic models, although many functional forms are possible. Finally, there is the distribution of observation-level residuals. Such residuals are likely to be negatively skewed, showing that while people are generally positive, they may also experience short term periods where SWB is much lower than their person-level mean. 1.5. The present study The primary aim of the present study is to demonstrate how the Bayesian approach can be used as a framework to refine and evaluate models of the long-term temporal dynamics of SWB. Illustrating the flexibility of the Bayesian approach, we propose several innovations to existing models of SWB. The primary innovations are to (a) allow within-person error variance to vary between people; (b) include both lag and polynomial trend effects in one model; and (c) include a probit transformation of SWB. The focus of analyses is on comparing the 16 models that result by crossing the four model features of within-person error variance (fixed or random), lag effect (present or absent), polynomial effects (random linear, and quadratic excluded or included), and transformation (probit transformed or untransformed). These models were applied to data from 10 waves of annually collected longitudinal SWB data. The sample size, the number of waves, the use of a multi-item scale to measure SWB, and the use of a representative sample with a broad cross-section of ages, made the longitudinal dataset well-suited to evaluate the competing models. Traditional models have often incorporated polynomial trends or lag effects but almost never incorporated random error or formally evaluated implications of transformations. 
Thus, we used comparisons of model fits and posterior predictive checks to assess the degree to which random within-person error and probit transformations resulted in superior model fit and whether these modifications altered the importance of lag or polynomial effects. We predicted that using a probit transformation and allowing within-person error variance to vary between people would substantially improve model fit. The need for lag or polynomial effects was more of an open question. Whether such parameters are required to parsimoniously represent the temporal dynamics of SWB will be a test-case for the idea of SWB set points.
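The 16 candidate models come from fully crossing four two-level features. A minimal sketch of that design is given below; the dictionary keys and level labels are hypothetical names chosen for illustration and are not taken from the paper's code.

```python
from itertools import product

# Hypothetical labels for the four crossed model features described in the text.
FEATURES = {
    "error": ("fixed", "random"),                 # within-person error variance
    "lag": ("absent", "present"),                 # lag effect
    "polynomial": ("intercept_only", "random_linear_quadratic"),
    "transform": ("untransformed", "probit"),
}


def enumerate_models(features=FEATURES):
    """Return one dict per model in the full factorial crossing."""
    names = list(features)
    return [dict(zip(names, combo)) for combo in product(*features.values())]


models = enumerate_models()
print(len(models))
for spec in models[:3]:
    print(spec)
```

Each dictionary could then be passed to whatever routine builds and fits the corresponding model.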
English conclusion
3. Results 3.1. Descriptive statistics Cronbach’s alpha reliability was consistently high for the SWB scale. Calculated for each measurement occasion (i.e., 1–10), the mean alpha was .87 (range: .86–.89). A factor analysis of data from the first measurement occasion showed clear support for a one-factor solution with item loadings ranging from .60 to .79. The intraclass correlation (ICC1) was .72, indicating that 72% of variance in SWB was due to differences between people. When each participant’s scores were averaged over all available time points, the mean was 7.50 (SD = 1.25, skew = −0.95) with 95% of participants having means on the positive side of the scale (i.e., >5.0). Within-person standard deviations were moderate and highly skewed with a mean of 0.66 (SD = 0.39, skew = 1.80). The correlation between person-level means and person-level standard deviations was r = −.48, indicating that lower levels of SWB were associated with greater volatility in SWB. Further descriptive statistics are provided in Table 4, and are discussed further under posterior predictive checks. Fig. 1 shows the distributions of all observations, person-level means, and deviations from person-level means (unstandardized and standardized, i.e., divided by person-level SD) for both raw and probit transformed SWB. It highlights how the probit transformation reduces the skew in the person-level means. It also highlights the kurtosis in unstandardized deviations. However, after division by person-level standard deviations, kurtosis is less prominent. This adds support for allowing person-level standard deviations to vary, as kurtosis of unstandardized observations may result from a mixture of normal distributions with different SDs.

Fig. 1. Histograms showing the distribution for observations, person-level means, and deviation of observations from person-level means, and standardized (Z) deviations from person-level means for both raw and probit transformed SWB.

Estimates of stability using test–retest correlations were examined for different subsets of the data based on number of waves provided. This involved comparing the correlation between baseline and one year with the correlation between baseline and final wave. Test–retest correlations were as follows: (a) Five or more waves (baseline with 1 year: r = .77; baseline with 4 years: r = .68, n = 1081; $\Delta r = .77 - .68 = .09$). (b) Seven or more waves (baseline with 1 year: r = .76; baseline with 6 years: r = .67, n = 559; $\Delta r = .09$). (c) Nine or more waves (baseline with 1 year: r = .76; baseline with 8 years: r = .70, n = 189; $\Delta r = .07$). In summary, while using different subsets of participants and time-points yields slightly different estimates, in all cases the reduction in test–retest correlations over many years was small. This suggests that there is some, but not much, systematic or stochastic change.

3.2. General temporal trends Fig. 2 shows the relationship between age and SWB using all observations and a model fit with 95% credible interval for group-level data based on the probit transformed, polynomial, no lag, random error model. The plot highlights several features of well-being data. First, the model is negatively skewed with most observations in the positive range. Group-level model fits show that SWB generally fluctuated between about 0.2 or 0.3 either side of 7.5. In terms of age trends, SWB is lower for people aged in their 40s and early 50s, as has been commonly reported (e.g. Mastekaasa & Moum, 1984), followed by increases in their 60s and 70s and a decline in their early 80s. There are not enough observations to determine whether SWB is systematically higher in the 20s or lower in the late 80s.

Fig. 2. The relationship between age and subjective well-being. Points are raw data after a small amount of random noise has been applied to reduce the overlap of points and better highlight the density of the data. Line of best fit and error band represent predictions and 95% credible intervals for estimates of group-level age effect based on the probit transformed, polynomial, no lag, random error model. Predictions are truncated to only show data with at least 100 data points above or below the given age.

Fig. 3 shows the relationship between measurement occasion and SWB scores, at the individual level, for a random sample of 15 participants who provided 10 years of data. Focusing only on the pattern of the data, the overall impression is one of underlying stability, with clear individual differences in the average levels of SWB. There are also a small number of cases that seem to incrementally change relative to their previous observations. This pattern is consistent with some systematic influence. Finally, there are a few instances where SWB dropped abruptly and was typically restored on the subsequent measurement occasion.

Fig. 3. Relationship between measurement occasion and SWB for a random sample of participants who provided data for all 10 measurement occasions. Each cell is one participant where participant ID is shown above. The same participants are used in both (a) and (b). These two models represent the two best fitting models. In each cell the dark line indicates the expected value and the shaded area is the 95% credible interval for the given participant on the given measurement occasion based on the probit transformed, no lag, random error model with either (a) only a random-intercept or (b) random polynomial (i.e., random intercept, linear, and quadratic terms).
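The person-level summaries in Section 3.1 (person means, within-person SDs, and their correlation) and the probit transformation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes scores on a 0 to 10 scale are first rescaled to a proportion and then passed through the inverse standard normal CDF, and the three participants in `toy_scores` are made-up data.

```python
from statistics import NormalDist, mean, stdev


def probit_transform(score, low=0.0, high=10.0):
    """Map a bounded score to the real line via the inverse normal CDF.

    Assumption: the score is rescaled to a proportion of the scale range.
    Scores exactly at `low` or `high` are undefined and raise an error.
    """
    proportion = (score - low) / (high - low)
    return NormalDist().inv_cdf(proportion)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def person_summaries(scores_by_person):
    """Return parallel lists of person-level means and within-person SDs."""
    means = [mean(s) for s in scores_by_person.values()]
    sds = [stdev(s) for s in scores_by_person.values()]
    return means, sds


# Hypothetical toy data: three participants, five annual scores each.
toy_scores = {
    "p1": [8.0, 8.2, 7.9, 8.1, 8.0],
    "p2": [5.0, 6.5, 4.0, 7.0, 5.5],
    "p3": [9.0, 9.1, 8.9, 9.0, 9.2],
}
means, sds = person_summaries(toy_scores)
print(pearson(means, sds))      # negative when lower means go with larger SDs
print(probit_transform(7.5))    # a raw score of 7.5 on the probit scale
```

Under the rescaling assumed here, a raw score of 7.5 maps to roughly 0.67 on the probit scale, which is close to the mean intercept reported for the probit transformed models in Table 3.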
3.3. Model estimates The set of models were estimated based on (a) whether or not a probit transformation was applied; (b) whether a lag effect was estimated; (c) whether person-level error was allowed to vary over people; and (d) whether random linear and quadratic terms were included. Table 1 presents model fit statistics for all models. While deviance based measures of fit cannot be compared across untransformed and transformed variable models, the general pattern of fit statistics was similar. Allowing the within-person standard deviation to vary over people led to a large reduction in deviance, and despite the penalty associated with estimating additional parameters, the Deviance Information Criterion (DIC) was also much lower. Inclusion of a lag effect increased the deviance and DIC. Adding random linear and quadratic terms led to a massive reduction in the deviance, but only a small reduction in DIC, and this reduction in DIC appeared somewhat smaller for the transformed data.

Table 1. Overall model fit statistics for candidate models.

Lag | Error | Random effects | Untransformed deviance | Untransformed penalty | Untransformed DIC | Transformed deviance | Transformed penalty | Transformed DIC |
---|---|---|---|---|---|---|---|---|
No lag | Fixed | Intercept | 16,730 | 1024 | 17,754 | −1851 | 1024 | −827 |
No lag | Random | Intercept | 13,873 | 2000 | 15,873 | −3848 | 2083 | −1765 |
No lag | Fixed | Polynomial | 15,626 | 1708 | 17,334 | −3259 | 2032 | −1227 |
No lag | Random | Polynomial | 12,709 | 2992 | **15,702** | −5409 | 3565 | **−1845** |
Lag | Fixed | Intercept | 16,981 | 1030 | 18,010 | −1602 | 1029 | −573 |
Lag | Random | Intercept | 14,265 | 1944 | 16,210 | −3517 | 2067 | −1450 |
Lag | Fixed | Polynomial | 16,107 | 1559 | 17,666 | −3102 | 1998 | −1104 |
Lag | Random | Polynomial | 13,348 | 2713 | 16,061 | −5156 | 3477 | −1680 |

Note. Model fit statistics cannot be compared across transformed and untransformed data. Models with smaller deviance are better fitting and models with larger penalties are more complex. DIC incorporates both fit and complexity, whereby models with smaller DIC are generally preferred. Smallest DIC for transformed and untransformed data are shown in bold.
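As a reading aid for Table 1, DIC is conventionally the mean deviance plus the complexity penalty. The sketch below checks that relationship against the reported values (allowing a difference of 1 for rounding) and picks the lowest-DIC model on each scale. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# (lag, error, random effects) -> (deviance, penalty, reported DIC), from Table 1.
UNTRANSFORMED = {
    ("no lag", "fixed", "intercept"): (16730, 1024, 17754),
    ("no lag", "random", "intercept"): (13873, 2000, 15873),
    ("no lag", "fixed", "polynomial"): (15626, 1708, 17334),
    ("no lag", "random", "polynomial"): (12709, 2992, 15702),
    ("lag", "fixed", "intercept"): (16981, 1030, 18010),
    ("lag", "random", "intercept"): (14265, 1944, 16210),
    ("lag", "fixed", "polynomial"): (16107, 1559, 17666),
    ("lag", "random", "polynomial"): (13348, 2713, 16061),
}
TRANSFORMED = {
    ("no lag", "fixed", "intercept"): (-1851, 1024, -827),
    ("no lag", "random", "intercept"): (-3848, 2083, -1765),
    ("no lag", "fixed", "polynomial"): (-3259, 2032, -1227),
    ("no lag", "random", "polynomial"): (-5409, 3565, -1845),
    ("lag", "fixed", "intercept"): (-1602, 1029, -573),
    ("lag", "random", "intercept"): (-3517, 2067, -1450),
    ("lag", "fixed", "polynomial"): (-3102, 1998, -1104),
    ("lag", "random", "polynomial"): (-5156, 3477, -1680),
}


def dic_is_consistent(rows, tolerance=1):
    """Check DIC == deviance + penalty up to rounding in the reported table."""
    return all(abs(dev + pen - dic) <= tolerance for dev, pen, dic in rows.values())


def best_model(rows):
    """Return the model key with the smallest reported DIC."""
    return min(rows, key=lambda key: rows[key][2])


for label, rows in (("untransformed", UNTRANSFORMED), ("transformed", TRANSFORMED)):
    print(label, dic_is_consistent(rows), best_model(rows))
```

Because deviance is scale dependent, the comparison is only made within each dictionary, never across the two.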
Thus, based on DIC the preferred model had random error, no lag effect, and a random polynomial effect. In general, including both the random linear and quadratic effects gives the model a great deal more flexibility. The improvement in fit from including the random polynomial effect was less for random error models than for fixed error models, and for transformed rather than untransformed data. It may be that this flexibility is needed more for the untransformed and fixed error data in order to recover from other problems related to outliers and skewness. Furthermore, the improvement in DIC from including the polynomial effect was relatively small compared to whether random error or the lag effect was included. Parameter estimates for the random error, no lag, random intercept and random polynomial models are presented in Table 2 for the raw scale and Table 3 for the probit transformed scale. For all models the mean within-person standard deviation (mean of $\sigma_i$) is a little over half the standard deviation of person-intercepts ($\sigma_{\beta_{\text{intercept}}}$), highlighting that between-person variability in SWB was much greater than within-person variability. The ratio of the standard deviation of $\sigma_i$ to the mean of $\sigma_i$ captures how much individuals differed in their within-person standard deviations. In particular, it was slightly larger for the untransformed model (0.49 = 0.34/0.69) than for the transformed model (0.40 = 0.08/0.20). The negative correlation between intercept and SD between-person coefficients was also smaller for the probit transformed variable.

Table 2. Parameter estimates for untransformed, random error, no lag models.

Parameter | Symbol | Intercept model M | Intercept model lower CI | Intercept model upper CI | Polynomial model M | Polynomial model lower CI | Polynomial model upper CI |
---|---|---|---|---|---|---|---|
Mean of intercepts | $\mu_{\beta_{\text{intercept}}}$ | 7.56 | 7.48 | 7.64 | 7.54 | 7.46 | 7.62 |
Mean of Beta SDs | $\mu_{\beta_{\text{SD}}}$ | −0.47 | −0.50 | −0.44 | −0.55 | −0.58 | −0.51 |
Mean of SDs | $\sigma_i$ | 0.69 | 0.68 | 0.71 | 0.65 | 0.64 | 0.67 |
SD of intercepts | $\sigma_{\beta_{\text{intercept}}}$ | 1.19 | 1.13 | 1.24 | 1.19 | 1.13 | 1.25 |
SD of linear | $\sigma_{\beta_{\text{linear}}}$ | | | | 0.09 | 0.08 | 0.10 |
SD of quadratic | $\sigma_{\beta_{\text{quadratic}}}$ | | | | 0.04 | 0.03 | 0.04 |
SD of Beta SDs | $\sigma_{\beta_{\text{SD}}}$ | 0.45 | 0.42 | 0.48 | 0.48 | 0.45 | 0.51 |
SD of SDs | $\sigma_i$ | 0.34 | 0.32 | 0.36 | 0.34 | 0.32 | 0.37 |
Correlation | $\text{cor}(\beta_{\text{intercept}}, \beta_{\text{SD}})$ | −0.68 | −0.73 | −0.62 | −0.67 | −0.72 | −0.62 |
Age linear effect | $\theta$ | 0.008810 | 0.003640 | 0.013850 | 0.012490 | 0.006420 | 0.018900 |
Age quadratic | $\theta$ | −0.000410 | −0.000620 | −0.000210 | −0.000280 | −0.000510 | −0.000050 |
Age cubic | $\theta$ | −0.000020 | −0.000030 | −0.000010 | −0.000020 | −0.000030 | −0.000010 |

Note. M, Lower CI, and Upper CI are the mean and lower and upper 95% credible intervals of the posterior density estimate of parameters. Intercept, linear and quadratic effects refer to the random effects of time, whereas age effects are fixed effects of age.

Table 3. Parameter estimates for probit transformed, random error, no lag models.

Parameter | Symbol | Intercept model M | Intercept model lower CI | Intercept model upper CI | Polynomial model M | Polynomial model lower CI | Polynomial model upper CI |
---|---|---|---|---|---|---|---|
Mean of intercepts | $\mu_{\beta_{\text{intercept}}}$ | 0.67 | 0.65 | 0.69 | 0.66 | 0.64 | 0.69 |
Mean of Beta SDs | $\mu_{\beta_{\text{SD}}}$ | −1.68 | −1.71 | −1.65 | −1.79 | −1.82 | −1.75 |
Mean of SDs | $\sigma_i$ | 0.20 | 0.20 | 0.20 | 0.18 | 0.18 | 0.18 |
SD of intercepts | $\sigma_{\beta_{\text{intercept}}}$ | 0.34 | 0.33 | 0.36 | 0.34 | 0.33 | 0.36 |
SD of linear | $\sigma_{\beta_{\text{linear}}}$ | | | | 0.04 | 0.03 | 0.04 |
SD of quadratic | $\sigma_{\beta_{\text{quadratic}}}$ | | | | 0.02 | 0.02 | 0.02 |
SD of Beta SDs | $\sigma_{\beta_{\text{SD}}}$ | 0.36 | 0.34 | 0.39 | 0.39 | 0.36 | 0.42 |
SD of SDs | $\sigma_i$ | 0.08 | 0.07 | 0.08 | 0.07 | 0.07 | 0.08 |
Correlation | $\text{cor}(\beta_{\text{intercept}}, \beta_{\text{SD}})$ | −0.21 | −0.29 | −0.13 | −0.21 | −0.30 | −0.13 |
Age linear effect | $\theta$ | 0.002490 | 0.000870 | 0.004080 | 0.004250 | 0.002280 | 0.006180 |
Age quadratic | $\theta$ | −0.000160 | −0.000230 | −0.000100 | −0.000120 | −0.000200 | −0.000040 |
Age cubic | $\theta$ | −0.000010 | −0.000010 | 0.000000 | −0.000010 | −0.000010 | 0.000000 |

Note. M, Lower CI, and Upper CI are the mean and lower and upper 95% credible intervals of the posterior density estimate of parameters. Intercept, linear and quadratic effects refer to the random effects of time, whereas age effects are fixed effects of age.

The group-level age effects were centered on the rounded mean age of all observations (62 years). Thus, the mean of the intercept ($\mu_{\beta_{\text{intercept}}}$) can be interpreted as the expected SWB value at that age. Linear and quadratic terms were small and significant in both transformed and untransformed models, while the cubic term was only significant in the untransformed model. The group-level trend line is summarized in Fig. 2. As mentioned above, mean deviance and DIC were not reduced by the inclusion of the lag parameter. Given the emphasis of existing trait-state-error models in the literature on an auto-regressive term, we further investigated this result. Across the eight models, estimates of the lag effect ranged from .03 to .24. In general, random error models had a slightly larger lag effect (.03–.05 larger than fixed error models), and models that included the random polynomial effect had a much smaller lag effect (.10–.17 smaller than intercept models). For the random error probit intercept model, when a lag effect was included the estimate of the lag effect was .24, 95% CI [.20, .28]. Given that it was a significant lag effect, it was surprising that mean deviance was greater when a lag effect was added to the model. To further investigate this result, we temporarily fixed the lag effect to be the sample estimate (.24), but the model without the lag effect still had a much lower mean deviance. Examination of model fits showed how the small changes in predictions implied by the lag effect did not visually seem to capture the model predictions. One plausible explanation of this result is that our approach does not treat the time before the first observation as known.
While we could have discarded the first observation as outcome data and used it as the basis for estimating the lag effect, such a lag effect would be confounded by the fact that this first time point would contribute substantially to the reliability of estimation of the participant-level mean and thus should be positive purely because of this artifact. As a comparison, we also compared parameter estimates using the above Bayesian approach with standard frequentist multilevel modeling approaches using the lme4 package in R (Bates, Maechler, Bolker, & Walker, 2013) in general and the nlme package in R (Pinheiro, Bates, DebRoy, & Sarkar, 2013) to examine the lag model. For the random intercept models without lag effects, the frequentist parameter estimates were almost identical to the corresponding Bayesian estimates. Lag models and linear and quadratic models generally required some re-specification to be implemented, but parameter estimates were similar. The models with random standard deviations could not readily be implemented. 3.4. Individual-level model predictions and credible intervals Fig. 3 shows the model fits and 95% credible intervals for the two best performing models, i.e., the probit transformed, no lag, random error models with either (a) a random intercept or (b) a random polynomial. The credible intervals show how these two models capture individual differences in within-person variation. They also capture the skewness of that variation, as person-level means deviate from the scale mid-point. Comparing the model fits for the two models is particularly useful as the random intercept model embodies a strict form of set-point theory whereas the random polynomial model allows for systematic trends over multiple years. It is worth noting that the random intercept model does allow for small changes over time due to group-level age related changes.
Overall, there are several cases where the polynomial model appears to provide improved fit to the data. However, the polynomial model also has substantially greater flexibility, which may result in curves capturing noise. Table 1 shows how the polynomial model resulted in a massive reduction in the model deviance, but only a small reduction in the parsimony-adjusted measure of fit, DIC. The model fits in Fig. 3 appear consistent with this interpretation, whereby polynomial curves are fitting a mixture of systematic change and noise in the data.

3.5. Posterior predictive checks

To further evaluate the proposed models we ran posterior predictive checks. Interpreting posterior predictive checks involves first interpreting the sample statistics of interest when applied to the sample data and then examining the degree to which the models are able to generate simulated data with statistics consistent with the sample data.

3.5.1. Statistics on sample data

The sample data (see the “dataset” column of Table 4) show that the typical within-person standard deviation (0.66) was around half the standard deviation of the means (1.25). They also show that the SD of within-person SDs (0.39) is fairly large relative to the mean of the within-person SDs (0.66), thereby suggesting the importance of including a parameter that captures this variability. The mean of means of 7.50 combined with the SD of within-person means (1.25) is consistent with previous reports that 95% of the data from such samples lie above the scale arithmetic average of 5.0 due to the combined influence of SWB set points (Cummins et al., 2014) and homeostasis (Cummins, 2015).

Table 4. Posterior predictive checks for models: Untransformed data.
Statistic | Dataset | No lag, Intercept, FE | No lag, Intercept, RE | No lag, Polynomial, FE | No lag, Polynomial, RE | Lag, Intercept, FE | Lag, Intercept, RE | Lag, Polynomial, FE | Lag, Polynomial, RE |
---|---|---|---|---|---|---|---|---|---|
Mean(mean_i) | 7.50 | 7.50 | 7.50 | 7.51 | 7.51 | 7.50 | 7.51 | 7.51 | 7.50 |
SD(mean_i) | 1.25 | 1.27 | 1.22 | 1.26 | 1.23 | 1.26 | 1.21 | 1.26 | 1.21 |
Skew(mean_i) | −0.94 | −0.01 | −0.11 | 0.00 | −0.10 | 0.00 | −0.18 | 0.00 | −0.14 |
Mean(SD_i) | 0.66 | 0.73 | 0.66 | 0.73 | 0.67 | 0.72 | 0.66 | 0.73 | 0.67 |
SD(SD_i) | 0.39 | 0.23 | 0.38 | 0.23 | 0.38 | 0.23 | 0.38 | 0.23 | 0.38 |
Mean(AR1_i) | −0.06 | −0.15 | −0.15 | −0.09 | −0.08 | −0.06 | −0.04 | −0.06 | −0.04 |
Mean(skew_i) | −0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Mean(outlier_i) | 1.08 | 1.16 | 1.05 | 1.17 | 1.07 | 1.15 | 1.06 | 1.16 | 1.06 |
SD(outlier_i) | 0.72 | 0.40 | 0.64 | 0.43 | 0.64 | 0.41 | 0.64 | 0.42 | 0.64 |
SD(beta_age_i) | 0.22 | 0.18 | 0.18 | 0.20 | 0.19 | 0.20 | 0.21 | 0.20 | 0.20 |
Cor(mean_i, SD_i) | −0.48 | 0.00 | −0.50 | 0.00 | −0.48 | 0.00 | −0.51 | 0.00 | −0.49 |

Note. FE = Fixed error (within-person SD); RE = Random error (within-person SD). Values indicate the mean statistic for samples generated from posterior density estimates. Dataset is the statistic for the sample data. Mean, SD, Skew, and Cor are functions of participant-level statistics. Arguments in parentheses with subscript i indicate a property of the individual (e.g., mean SWB score for the ith individual). See Section 2 for details.

The absence of a positive lag 1 autocorrelation is somewhat surprising (mean lag 1 autocorrelation of −.06). Several theories suggest that SWB stability involves restoring values to a baseline or set point. Additionally, there is much evidence that life events can temporarily alter well-being, and that, if the level of challenge is strong and maintained, the reduced level of SWB can become chronic. Where such effects are operating, a positive lag 1 autocorrelation would be expected. This is not evident in these data.

Negative skew was present at both the between-person and within-person levels. However, the skew in between-person means (−0.94) was much larger than the mean skew of within-person data (−0.17).
Thus, while people occasionally experience SWB outside their set-point range, they are more likely to experience a chronic reduction rather than a chronic elevation.

Interpreting the outlier statistic is a little more challenging, but it does indicate that the average maximum deviation from a person’s mean is 1.08 units, which is a little under double the mean of SDs. It may also be that the SD of the outliers is a little larger than one might normally expect based on normal distribution assumptions. This reflects the non-normality of the actual data distribution and the fact that few people have substantial outliers, and it is in conformity with homeostasis theory.

When a linear regression was fit to individual SWB trajectories, the mean linear change was close to zero, but at the individual level some values increased and some decreased. However, a certain amount of variability in individual-level linear change is due to error in estimating such a coefficient for each participant on only 5–10 data points.

The correlation between the within-person means and within-person SDs was −.48. This shows that greater variability is associated with lower means. Scale constraints are only part of the reason. Specifically, scale constraints dictate that the maximum possible standard deviation for person i on a 0–10 scale is $\sqrt{\bar{y}_{i.}(10 - \bar{y}_{i.})}$, where $\bar{y}_{i.}$ is the person-level mean for person i. To inspect the degree to which the correlation between means and SDs was driven by scale constraints, we compared the sample correlation between person-level means and (a) raw person-level SDs (r = −.48) or (b) person-level SDs divided by the maximum possible SD given the person’s scale mean (r = −.25). Thus, about half of the correlation was explained by scale constraints. The remainder is consistent with predictions based on homeostasis theory.

3.5.2.
Model recovery of statistics

Results of the posterior predictive checks for candidate models are presented in Table 4 for untransformed data and Table 5 for transformed data. The values for the models are the mean statistic for the posterior samples, and statistics are bolded where they appear to capture the sample statistic or are notably better than other models at capturing the sample statistic. This bolding is purely a device to facilitate interpretation and a more sophisticated interpretation should emphasize the proximity of the statistic of the model simulated data to the sample dataset.

Table 5. Posterior predictive checks for models: Probit transformed data.

Statistic | Dataset | No lag, Intercept, FE | No lag, Intercept, RE | No lag, Polynomial, FE | No lag, Polynomial, RE | Lag, Intercept, FE | Lag, Intercept, RE | Lag, Polynomial, FE | Lag, Polynomial, RE |
---|---|---|---|---|---|---|---|---|---|
Mean(mean_i) | 7.50 | 7.48 | 7.49 | 7.48 | 7.49 | 7.48 | 7.50 | 7.48 | 7.49 |
SD(mean_i) | 1.25 | 1.22 | 1.22 | 1.23 | 1.23 | 1.22 | 1.21 | 1.23 | 1.22 |
Skew(mean_i) | −0.94 | −0.54 | −0.56 | −0.54 | −0.55 | −0.54 | −0.57 | −0.55 | −0.56 |
Mean(SD_i) | 0.66 | 0.70 | 0.66 | 0.73 | 0.70 | 0.69 | 0.66 | 0.73 | 0.69 |
SD(SD_i) | 0.39 | 0.26 | 0.37 | 0.29 | 0.37 | 0.26 | 0.37 | 0.29 | 0.37 |
Mean(AR1_i) | −0.06 | −0.15 | −0.15 | −0.05 | −0.03 | −0.06 | −0.05 | −0.04 | −0.01 |
Mean(skew_i) | −0.17 | −0.12 | −0.10 | −0.12 | −0.11 | −0.12 | −0.10 | −0.12 | −0.11 |
Mean(outlier_i) | 1.08 | 1.11 | 1.05 | 1.18 | 1.12 | 1.10 | 1.04 | 1.17 | 1.11 |
SD(outlier_i) | 0.72 | 0.45 | 0.61 | 0.54 | 0.64 | 0.46 | 0.61 | 0.53 | 0.63 |
SD(beta_age_i) | 0.22 | 0.17 | 0.17 | 0.21 | 0.20 | 0.20 | 0.20 | 0.21 | 0.20 |
Cor(mean_i, SD_i) | −0.48 | −0.52 | −0.48 | −0.49 | −0.49 | −0.51 | −0.48 | −0.49 | −0.49 |

Note. FE = Fixed error (within person SD); RE = Random error (within person SD). Values indicate mean statistic for samples generated from posterior density estimates. Dataset is the statistic for the sample data. Mean, SD, Skew and Cor are functions of participant-level statistics. Arguments in parentheses with subscript i indicate that it is a property of the individual (e.g., mean SWB score for ith individual). See Section 2 for details.

Several major points can be made.
First, models where standard deviations were allowed to vary were better able to capture within-person standard deviations, outliers, and the correlation between the means and standard deviations. Interestingly, the correlation between person-level means and person-level standard deviations was captured with a fixed standard deviation when the probit transformation was applied. Second, the autocorrelation and SD of linear change statistics provide indices of within-person change. Without a lag or polynomial parameter, the lag 1 autocorrelation and the SD of linear change were slightly underestimated. Introduction of either the lag effect or the polynomial terms seemed sufficient to capture these statistics. Third, models of transformed data were much better at capturing the skewness in the data. That said, while the untransformed model implies that there is no skew, the probit transformed model captured only about two thirds of the skew in the sample data.

Thus, overall, the random error transformed models with lag or polynomial effects performed best in the posterior predictive checks, as determined by greater recovery of sample statistics. The transformed intercept-only random error model preferred by DIC also performed well, but appeared to fail to capture small amounts of more systematic within-person change. Based on principles of parsimony, the posterior predictive checks would favor the lag model over the polynomial model, because the lag model involves a single fixed parameter, whereas the polynomial model estimates two additional parameters per participant.
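As an illustration of the participant-level statistics discussed in Section 3.5, the following Python sketch computes several of them for a dataset stored as one array of scores per participant, including the correlation between person-level means and SDs before and after dividing each SD by its maximum possible value on a 0–10 scale, sqrt(mean × (10 − mean)). It is a minimal sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the lag 1 autocorrelation uses a common sample estimator that may differ from the definition in Section 2, and the data are synthetic values generated only so that the example runs.

```python
import numpy as np

SCALE_MAX = 10.0  # SWB scores lie on a 0-10 scale


def lag1_autocorr(y):
    """Sample lag 1 autocorrelation (ASSUMED estimator, see the lead-in)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(d ** 2)
    return np.sum(d[:-1] * d[1:]) / denom if denom > 0 else np.nan


def max_possible_sd(mean):
    """Largest SD attainable on a 0-10 scale for a given mean."""
    return np.sqrt(mean * (SCALE_MAX - mean))


def ppc_statistics(series):
    """Summarise participant-level statistics for a list of 1-D score arrays."""
    means = np.array([np.mean(y) for y in series])
    sds = np.array([np.std(y, ddof=1) for y in series])
    ar1 = np.array([lag1_autocorr(y) for y in series])
    limit = max_possible_sd(means)
    ok = limit > 0  # a mean of exactly 0 or 10 leaves no room to vary
    relative_sds = sds[ok] / limit[ok]
    return {
        "mean_of_means": means.mean(),
        "sd_of_means": means.std(ddof=1),
        "mean_of_sds": sds.mean(),
        "sd_of_sds": sds.std(ddof=1),
        "mean_ar1": np.nanmean(ar1),
        "cor_mean_sd": np.corrcoef(means, sds)[0, 1],
        "cor_mean_relative_sd": np.corrcoef(means[ok], relative_sds)[0, 1],
    }


# SYNTHETIC data for illustration only (not the study's data):
# 300 people, each with 5 to 10 annual scores clipped to the 0-10 scale.
rng = np.random.default_rng(0)
n_people = 300
person_mean = np.clip(rng.normal(7.5, 1.25, size=n_people), 0.5, 9.5)
person_sd = rng.gamma(shape=3.0, scale=0.22, size=n_people)
n_obs = rng.integers(5, 11, size=n_people)  # upper bound is exclusive
observed = [
    np.clip(rng.normal(m, s, size=k), 0.0, SCALE_MAX)
    for m, s, k in zip(person_mean, person_sd, n_obs)
]

stats = ppc_statistics(observed)
for name, value in stats.items():
    print(f"{name:>22s}: {value: .3f}")
```

In a full posterior predictive check, the same function would be applied to each dataset simulated from the posterior, and the resulting statistics would be compared with those of the sample data.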