In this paper we tackle the problem of estimating the power-law tail exponent of income distributions using Hill's estimator. A subsample semi-parametric bootstrap procedure that minimizes the mean squared error is used to choose the power-law cutoff value optimally. This technique is applied to personal income data for Australia and Italy.
Since Pareto it has been recognized that a power law provides a good fit for the distribution of high incomes [1]. Pareto's law asserts that the complementary cumulative distribution is $1 - F(y) = (u/y)^{\alpha}$, with y⩾u, where u>0 is the threshold value of the distribution and α>0 serves as an index of the inequality of the distribution. Such a distribution is usually fitted by judging the degree of linearity in a double logarithmic plot of the empirical and theoretical distribution functions, so that the estimation of the threshold u does not follow a neutral procedure. Moreover, recent studies have criticized the reliability of this geometrical method by showing that methods based on a linear fit tend to provide biased estimates of the power-law exponent, while maximum likelihood estimation produces more accurate and robust estimates [2] and [3]. Hill proposed a conditional maximum likelihood estimator for α based on the k largest order statistics for non-negative data with a Pareto tail [4]. That is, if y[n]⩾y[n-1]⩾⋯⩾y[n-k]⩾⋯⩾y[1], with y[i] denoting the ith order statistic, are the sample elements put in descending order, then Hill's estimator is
$$\hat{\alpha}_n(k) = \left[ \frac{1}{k} \sum_{i=1}^{k} \left( \log y_{[n-i+1]} - \log y_{[n-k]} \right) \right]^{-1} \tag{1}$$
where n is the sample size and k an integer value in [1,n]. Unfortunately, the finite-sample properties of the estimator (Eq. (1)) depend crucially on the choice of k: increasing k reduces the variance because more data are used, but it increases the bias because the power-law is assumed to hold only in the extreme tail.
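As a minimal sketch of Eq. (1), the following Python function computes Hill's estimate from the k largest observations. The function name `hill_estimator`, the simulated Pareto sample (α = 2, u = 1) and the choice k = 500 are illustrative assumptions, not part of the original analysis; k is restricted here to 1 ⩽ k < n so that the threshold order statistic y[n-k] exists.

```python
import numpy as np

def hill_estimator(y, k):
    """Hill estimate of the tail exponent alpha from the k largest observations."""
    y = np.sort(np.asarray(y, dtype=float))  # ascending: y[0] <= ... <= y[n-1]
    n = y.size
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = y[n - k - 1]  # y[n-k] in the 1-based notation of the text
    top_k = y[n - k:]         # the k largest values, y[n-k+1], ..., y[n]
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    # Inverse of the mean log-excess over the threshold, as in Eq. (1)
    return 1.0 / np.mean(np.log(top_k) - np.log(threshold))

# Illustrative data (assumption): exact Pareto sample with alpha = 2 and u = 1,
# drawn by inverse transform from uniforms on (0, 1].
rng = np.random.default_rng(0)
sample = (1.0 - rng.random(5000)) ** (-1.0 / 2.0)
print("Hill estimate with k = 500:", hill_estimator(sample, 500))
```

For an exact Pareto sample the estimate should lie close to the true exponent for a wide range of k; for data that follow a power law only in the tail, the estimate drifts as k grows, which is the bias described above.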
Over the last 20 years, estimation of the Pareto index has received considerable attention in extreme value statistics [5]. All of the proposed estimators, including Hill's estimator, are based on the assumption that the number of upper-tail observations to be included, k, is known. In practice, k is unknown; therefore, the first task is to identify which values are really extreme. Tools from exploratory data analysis, such as the quantile-quantile plot and the mean excess plot, may help in detecting graphically the quantile y[n-k] above which the Pareto relationship is valid; however, they do not provide any formal computable method and, because they impose an arbitrary threshold, they give only very rough estimates of the range of extreme values.
Given the bias-variance trade-off for Hill's estimator, a general and formal approach to determining the best value of k is to minimize the Mean Squared Error (MSE) between $\hat{\alpha}_n(k)$ and the theoretical value α. Unfortunately, in empirical studies the theoretical value of α is not known. Therefore, an approximation to the sampling distribution of Hill's estimator is required. To this end, a number of innovative techniques in the statistical analysis of extreme values propose adopting the powerful bootstrap tool to find the optimal number of order statistics adaptively [6], [7], [8] and [9]. Capitalizing on these recent advances in the extreme value statistics literature, in this paper we adopt a subsample semi-parametric bootstrap algorithm in order to make a reasonable and more automated selection of the extreme quantiles useful for studying the upper tail of income distributions, and to arrive at less ambiguous estimates of α. This methodology is described in Section 2, and its application to Australian and Italian income data [10] and [11] is given in Section 3. Some concluding remarks are reported in Section 4.
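The bias-variance trade-off behind the MSE criterion can be illustrated with a small Monte Carlo experiment in which α is known by construction. This is only a toy sketch, not the subsample bootstrap procedure of Section 2: the Burr-type distribution (tail exponent α = 2), the sample size, the number of replications, and the helper names `burr_sample` and `hill_path` are all assumptions made for illustration. In empirical work α is unknown, which is why a bootstrap approximation of the MSE is needed.

```python
import numpy as np

ALPHA = 2.0  # true tail exponent of the simulated data (assumed for illustration)

def burr_sample(rng, n, c=2.0, kappa=1.0):
    """Draw n values with survival function (1 + y**c)**(-kappa); tail exponent c*kappa."""
    u = 1.0 - rng.random(n)  # uniform on (0, 1]
    return (u ** (-1.0 / kappa) - 1.0) ** (1.0 / c)

def hill_path(y, ks):
    """Hill estimates of alpha for each k in ks, using the k largest observations."""
    logs = np.log(np.sort(np.asarray(y, dtype=float))[::-1])  # descending order
    cum = np.cumsum(logs)  # cum[k-1] = sum of the k largest log-values
    ks = np.asarray(ks)
    return 1.0 / (cum[ks - 1] / ks - logs[ks])  # logs[k] is the log of y[n-k]

rng = np.random.default_rng(1)
n, replications = 2000, 200
ks = np.arange(10, 1001, 10)

# One row of Hill estimates per simulated sample, one column per value of k
estimates = np.array([hill_path(burr_sample(rng, n), ks) for _ in range(replications)])
bias_sq = (estimates.mean(axis=0) - ALPHA) ** 2
variance = estimates.var(axis=0)
mse = ((estimates - ALPHA) ** 2).mean(axis=0)  # equals bias_sq + variance
k_opt = ks[np.argmin(mse)]
print("k minimizing the simulated MSE:", k_opt)
```

In this setting the variance term dominates for small k and the squared bias dominates for large k, so the simulated MSE is minimized at an intermediate value of k.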
In this paper, we have considered the problem of estimating the power-law tail exponent of income distributions, and we have adopted a subsample semi-parametric bootstrap procedure in order to arrive at less ambiguous estimates of α. This methodology has been applied empirically to personal income distribution data for Australia and Italy. The reliability and robustness of the results have been tested by running repeated bootstrap replications and comparing the variability of the estimates through a jackknife method.
From the economic point of view, this technique for estimating the Pareto tail index of the income distribution is expected to allow a deeper understanding both of how cyclical fluctuations in economic activity affect factor income shares and of the channels through which these effects work through the size distribution of income. These issues are relevant for modeling the income process in the high-end tail of the distribution.