The Gini coefficient is among the most popular and widely used measures of income inequality in economic studies, with various extensions and applications in finance and other related areas. This paper studies confidence intervals on the Gini coefficient for simple random samples, using the normal approximation, the bootstrap percentile and bootstrap-t methods, and the empirical likelihood method. Through both theory and simulation studies it is shown that the intervals based on normal or bootstrap approximation are less satisfactory for samples of small or moderate size than the bootstrap-calibrated empirical likelihood ratio confidence intervals, which perform well for all sample sizes. Results for stratified random sampling are also presented.
Income inequality has long been an active research area in economic studies. Among various measures of income inequality proposed in the statistical and economic literature, the Gini coefficient, G, is probably the most popular and widely used measure. It originated from Gini's mean difference (Gini, 1912; Gini, 1936), and is closely related to the Lorenz curve, a popular tool for describing the size distribution of income and wealth. Lorenz curves are also widely used in economic analysis (Kakwani, 1977).
Let F(y) = P(Y ≤ y) be the cumulative distribution function of a nonnegative continuous random variable Y. We will refer to Y as the income variable. Let X and Y be two independent random variables following the same distribution F(y). The Gini mean difference is then defined as
$$D = E|X - Y| = \int_0^{+\infty}\int_0^{+\infty} |x - y| \, dF(x) \, dF(y).$$
The value of D is the average absolute difference of incomes of two randomly selected individuals and hence reflects the income inequality in the population. Noting that 0 ≤ D ≤ 2μ, where $\mu = E(Y) = \int_0^{+\infty} y \, dF(y)$ is the population mean income, the Gini coefficient, G, is defined as the normalized mean difference, i.e., G = D / (2μ) ∈ [0, 1], which can be equivalently written as (David, 1968)
$$G = \frac{1}{\mu}\int_0^{+\infty} \{2F(y) - 1\} \, y \, dF(y). \tag{1}$$
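As an illustration of the two equivalent expressions for G, the following minimal sketch computes sample analogues of D / (2μ) and of equation (1) for a small data set. The function names are hypothetical, and the choice of the mid-point empirical distribution function (i − 0.5)/n at the i-th order statistic is an assumption made here so that the two sample versions agree exactly; it is not necessarily the estimator studied in later sections.

```python
def gini_mean_difference(y):
    """Sample analogue of D = E|X - Y|: the mean of |y_i - y_j| over all
    n^2 ordered pairs (pairs with i == j contribute zero)."""
    n = len(y)
    return sum(abs(a - b) for a in y for b in y) / (n * n)


def gini_from_mean_difference(y):
    """G = D / (2 * mu), with D and mu replaced by their sample versions."""
    mu = sum(y) / len(y)
    return gini_mean_difference(y) / (2 * mu)


def gini_from_equation_1(y):
    """Sample analogue of equation (1): (1/mu) * mean of {2F(y) - 1} * y.

    Assumption for this sketch: F at the i-th order statistic is replaced
    by the mid-point empirical CDF (i - 0.5) / n.
    """
    ys = sorted(y)
    n = len(ys)
    mu = sum(ys) / n
    total = sum((2 * (i - 0.5) / n - 1) * v for i, v in enumerate(ys, start=1))
    return total / (n * mu)


incomes = [1, 2, 3, 4]
print(gini_from_mean_difference(incomes))  # prints 0.25
print(gini_from_equation_1(incomes))       # prints 0.25
```

The pairwise version costs O(n²) operations, whereas the version based on equation (1) needs only a sort, which is why rank-based forms are usually preferred for larger samples.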
The Gini coefficient is also closely related to another popular measure of income inequality, the Lorenz curve (Lorenz, 1905; Sendler, 1979). Let $F^{-1}(t) = \inf\{\xi : F(\xi) \ge t\}$ for $t \in [0, 1]$. The Lorenz curve based on the income distribution F(⋅) is then defined as
$$L(\alpha, F) = \frac{1}{\mu}\int_0^{\alpha} F^{-1}(t) \, dt = \frac{1}{\mu}\int_0^{F^{-1}(\alpha)} x \, dF(x)$$
for α ∈ [0, 1]. The Gini coefficient G is equal to twice the area between the 45-degree line and the Lorenz curve, i.e., $G = 2\{0.5 - \int_0^1 L(\alpha, F) \, d\alpha\}$.
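The area interpretation can be checked numerically. The sketch below, with hypothetical function names, builds an empirical Lorenz curve from a sample (cumulative income shares at the population shares k/n, joined by straight lines) and evaluates 2{0.5 − area under the curve} with the trapezoidal rule. The piecewise-linear interpolation is an assumption of this illustration.

```python
def lorenz_points(y):
    """Return the points (k/n, share of total income held by the k poorest)
    for k = 0, ..., n, i.e. an empirical version of L(alpha, F)."""
    ys = sorted(y)
    n = len(ys)
    total = sum(ys)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, v in enumerate(ys, start=1):
        running += v
        points.append((k / n, running / total))
    return points


def gini_from_lorenz(y):
    """G = 2 * (0.5 - area under the Lorenz curve), with the area computed
    by the trapezoidal rule on the piecewise-linear empirical curve."""
    pts = lorenz_points(y)
    area = 0.0
    for (a0, l0), (a1, l1) in zip(pts, pts[1:]):
        area += (a1 - a0) * (l0 + l1) / 2
    return 2 * (0.5 - area)


incomes = [1, 2, 3, 4]
for alpha, share in lorenz_points(incomes):
    print(alpha, share)
print(gini_from_lorenz(incomes))
```

For the incomes 1, 2, 3, 4 the cumulative shares are 0.1, 0.3, 0.6 and 1, the area under the curve is 0.375, and the resulting value of G is 0.25 up to floating-point rounding.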
There exists an extensive literature on the Gini measure of income inequality. In addition to various applications and extensions in economic studies, statistical investigations have focused largely on variance estimation; see, for instance, Glasser, 1962, Sandström et al., 1985, Sandström et al., 1988, Yitzhaki, 1991 and Karagiannis & Kovacevic, 2000, among others. In particular, Yitzhaki (1991) calculated jackknife variance estimators of the plug-in moment estimator, $\hat{G}$, of G, under simple random sampling and stratified random sampling. However, confidence intervals for the Gini coefficient have not been studied by previous authors, with the exception of Sandström et al. (1988), where 95% normal approximation confidence intervals based on three variance estimators were briefly mentioned.
This paper presents confidence intervals on the Gini coefficient, G, using normal and bootstrap approximations and empirical likelihood (EL) based methods. We first consider the case of independent and identically distributed (iid) samples (or simple random samples when the sampling fraction is negligible), and then extend the results to stratified random sampling. In Section 2, we establish the asymptotic normality of the point estimator, $\hat{G}$, of G and construct confidence intervals on G based on the normal approximation. Confidence intervals on G based on the bootstrap percentile and the bootstrap-t methods are also given. In Section 3, the limiting distribution of the EL ratio statistic is established and the EL ratio confidence intervals are presented. A bootstrap-calibrated EL confidence interval on G is also presented. Results of a limited simulation study on the finite sample performance of the proposed confidence intervals are reported in Section 4. Extensions to stratified random sampling are outlined in Section 5. Proofs of theorems are relegated to Appendix A.
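To give a concrete picture of one of the interval methods named above, the following is a minimal sketch of a generic bootstrap percentile interval for G from an iid sample. It is an illustration under stated assumptions rather than the procedure of Section 2: the function names, the rank-based point estimator, the number of resamples and the rule for picking the percentile order statistics are all choices made here.

```python
import random


def gini(y):
    """Rank-based sample Gini coefficient (assumed form for this sketch):
    sum_i (2i - n - 1) * y_(i) / (n * sum_i y_i), with y_(i) sorted."""
    ys = sorted(y)
    n = len(ys)
    total = sum(ys)
    return sum((2 * i - n - 1) * v for i, v in enumerate(ys, start=1)) / (n * total)


def bootstrap_percentile_ci(y, level=0.95, n_boot=2000, seed=0):
    """Resample y with replacement n_boot times, recompute the Gini
    coefficient each time, and return the empirical alpha/2 and
    1 - alpha/2 quantiles of the bootstrap estimates."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(y)
    estimates = sorted(gini(rng.choices(y, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    lo_idx = min(max(round(alpha / 2 * n_boot), 0), n_boot - 1)
    hi_idx = min(max(round((1 - alpha / 2) * n_boot) - 1, 0), n_boot - 1)
    return estimates[lo_idx], estimates[hi_idx]
```

A call such as `bootstrap_percentile_ci(sample, level=0.95)` on a list of positive incomes returns a pair (lower, upper). The sketch assumes every resample has a positive total income, which holds when all observations are positive.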