A subsampled double bootstrap for massive data
Date
2015
Abstract
The bootstrap is a popular and powerful method for assessing the precision
of estimators and inferential methods. However, for massive datasets, which are increasingly
prevalent, the bootstrap becomes prohibitively expensive to compute, and
its feasibility is questionable even on modern parallel computing platforms. Recently,
Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed the Bag of Little Bootstraps
(BLB), a method for massive data that is more computationally scalable with
little sacrifice of statistical accuracy. Building on BLB and the idea of the fast double
bootstrap, we propose a new resampling method, the subsampled double bootstrap,
for both independent data and time series data. We establish consistency of the subsampled
double bootstrap under mild conditions for both the independent and dependent
cases. Methodologically, the subsampled double bootstrap improves on BLB in
running time and sample coverage, and it is more automatic to implement, with fewer tuning
parameters for a given time budget. Its advantages relative to BLB and the bootstrap are
also demonstrated in numerical simulations and a data illustration.
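
To make the idea concrete, the following is a minimal sketch of a subsampled double bootstrap for the sample mean with independent data. It is an illustration under stated assumptions, not the procedure as specified in the paper: the abstract does not give the algorithm, so the steps here (draw a small random subset, draw a single full-size resample from that subset, record the scaled difference between the two estimates, repeat) reflect a common reading of the method. The function name `sdb_mean_ci`, the subset size `b`, the iteration count `n_iter`, and the basic-bootstrap interval construction are all hypothetical choices made for this example.

```python
import numpy as np


def sdb_mean_ci(x, b, n_iter=500, alpha=0.05, seed=0):
    """Sketch of a subsampled-double-bootstrap interval for the mean.

    Assumed scheme (not taken from the abstract): each iteration draws one
    subset of size b without replacement, then ONE resample of size n from
    that subset, and records the root sqrt(n) * (resample mean - subset mean).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    roots = np.empty(n_iter)

    for j in range(n_iter):
        # First level: a small random subset of the full data.
        subset = rng.choice(x, size=b, replace=False)
        # Second level: a size-n resample of the subset, stored as
        # multinomial counts so only b values are ever touched.
        counts = rng.multinomial(n, np.full(b, 1.0 / b))
        theta_subset = subset.mean()
        theta_resample = counts @ subset / n
        roots[j] = np.sqrt(n) * (theta_resample - theta_subset)

    # Basic bootstrap interval built from the pooled roots.
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    theta_hat = x.mean()
    return theta_hat - q_hi / np.sqrt(n), theta_hat - q_lo / np.sqrt(n)
```

In this sketch each iteration costs O(b) rather than O(n), and every iteration uses a fresh subset, so more of the full sample is visited as the time budget grows. A call such as `sdb_mean_ci(x, b=int(len(x) ** 0.7))` returns a lower and an upper bound; the exponent 0.7 is only an example value, not a recommendation from the source.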
Keywords
big data, resampling, subsampling, computational cost