Parametric and Nonparametric Learners for Adaptive Experiments Under Interference
Loading...
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Alternative Title(s)
Abstract
This dissertation develops a framework and a family of both parametric and nonparametric algorithms for the multi-armed bandit (MAB) problem under batched interference. In the first chapter we start with a rigorous overview of the stochastic bandit problem without interference and review the classic Thompson sampling algorithm (Thompson 1933) with the Bayesian regret analysis. In the second chapter, as our first novel contribution, we formalize the stochastic bandit problem under batched interference and define the stochastic bandit, the arm vectors that target different optimals, and the corresponding notions of regret. We then propose four parametric extensions of Thompson sampling, we call them the i.i.d., global-one arm, clustered-one arm, and clustered-multi arm learners, each differing in its strengths across different interference scenarios. Finally, we give some Bayesian regret analysis for the global-one arm learner and end the chapter with numerical simulations that show the strengths and weaknesses of each algorithm. Parametric learners depend on parametric assumptions. In Chapter 3, we propose nonparametric versions of our learners based on Nonparametric Thompson Sampling (Riou and Honda 2020) and the Subsampling Duelling Algorithm (Baudry, Kaufmann and Maillard 2020). These are non-trivial extensions and very flexible algorithms without any parametric assumptions. In the fourth and final chapter we turn to application, we study the problem of learning optimal prices under interference. This is motivated by a real life example of street hawkers in Bangladesh. Here a buyer's purchase decision is influenced by prices offered to other buyers she is socially connected to. We apply both the parametric and nonparametric learners in this setting that maximizes the seller's revenue over a horizon. Together, our contributions open new connections among bandit algorithms, interference, and dynamic pricing that may be useful in practice.
Description
Table of contents
Keywords
Multi-armed bandit, Adaptive experiments, Thompson sampling, Interference, Clustering, Nonparametric bandits, Resampling, Dynamic pricing
Subjects based on RSWK
Multi-armed bandit, Resampling, Algorithmus, Anwendung, Lernendes System, Dynamic Pricing
