A subsampled double bootstrap for massive data
Abstract
The bootstrap is a popular and powerful method for assessing the precision
of estimators and inferential methods. However, for massive datasets, which are increasingly
prevalent, the bootstrap becomes prohibitively costly in computation, and
its feasibility is questionable even with modern parallel computing platforms. Recently,
Kleiner, Talwalkar, Sarkar, and Jordan (2014) proposed a method called BLB (Bag
of Little Bootstraps) for massive data, which is more computationally scalable with
little sacrifice of statistical accuracy. Building on BLB and the idea of the fast double
bootstrap, we propose a new resampling method, the subsampled double bootstrap,
for both independent data and time series data. We establish consistency of the subsampled
double bootstrap under mild conditions for both the independent and dependent
cases. Methodologically, the subsampled double bootstrap is superior to BLB in terms
of running time, sample coverage, and automatic implementation with fewer tuning
parameters for a given time budget. Its advantage relative to BLB and the bootstrap is
also demonstrated in numerical simulations and a data illustration.
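The resampling scheme described in the abstract can be illustrated with a minimal sketch for i.i.d. data: repeatedly draw a small random subset of size b, draw a single full-size (n) resample from it via multinomial weights, and record the centered root. Function and variable names here are our own, and the centering convention follows one common reading of subset-based bootstrap methods; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def subsampled_double_bootstrap(data, estimator, b, s, seed=None):
    """Illustrative sketch of a subsampled double bootstrap for i.i.d. data.

    For each of s rounds: draw a random subset of size b, then draw one
    full-size (n) resample from it, represented by multinomial counts, and
    record the root (resample estimate minus subset estimate). The spread
    of the s roots approximates the sampling distribution of the estimator.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    roots = np.empty(s)
    for i in range(s):
        subset = rng.choice(data, size=b, replace=False)
        theta_sub = estimator(subset, np.ones(b))
        # A single size-n resample from the subset, stored as counts
        # (avoids materializing n points, so each round costs O(b)).
        counts = rng.multinomial(n, np.full(b, 1.0 / b))
        roots[i] = estimator(subset, counts) - theta_sub
    return roots

def weighted_mean(x, w):
    return np.average(x, weights=w)

# Toy example: standard error of the sample mean, n = 100000
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)
roots = subsampled_double_bootstrap(data, weighted_mean, b=1_000, s=200, seed=1)
# roots.std(ddof=1) approximates the true SE of the mean, 1/sqrt(n)
print(roots.std(ddof=1))
```

Because each round evaluates the estimator only once (on b distinct points with weights), the per-round cost scales with b rather than n, which is the computational point made in the abstract.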
Keywords
big data, resampling, subsampling, computational cost
