Parametric and nonparametric learners for adaptive experiments under interference

Hossain, Shaikh Tanvir

Parametric and nonparametric learners for adaptive experiments under interference

Dateien

Dissertation_Hossain.pdf (6.24 MB)

Datum

2026

Autor:innen

Hossain, Shaikh Tanvir

Zusammenfassung

This dissertation develops a framework and a family of both parametric and nonparametric algorithms for the multi-armed bandit (MAB) problem under batched interference. In the first chapter we start with a rigorous overview of the stochastic bandit problem without interference and review the classic Thompson sampling algorithm (Thompson 1933) with the Bayesian regret analysis. In the second chapter, as our first novel contribution, we formalize the stochastic bandit problem under batched interference and define the stochastic bandit, the arm vectors that target different optimals, and the corresponding notions of regret. We then propose four parametric extensions of Thompson sampling, we call them the i.i.d., global-one arm, clustered-one arm, and clustered-multi arm learners, each differing in its strengths across different interference scenarios. Finally, we give some Bayesian regret analysis for the global-one arm learner and end the chapter with numerical simulations that show the strengths and weaknesses of each algorithm. Parametric learners depend on parametric assumptions. In Chapter 3, we propose nonparametric versions of our learners based on Nonparametric Thompson Sampling (Riou and Honda 2020) and the Subsampling Duelling Algorithm (Baudry, Kaufmann and Maillard 2020). These are non-trivial extensions and very flexible algorithms without any parametric assumptions. In the fourth and final chapter we turn to application, we study the problem of learning optimal prices under interference. This is motivated by a real life example of street hawkers in Bangladesh. Here a buyer's purchase decision is influenced by prices offered to other buyers she is socially connected to. We apply both the parametric and nonparametric learners in this setting that maximizes the seller's revenue over a horizon. Together, our contributions open new connections among bandit algorithms, interference, and dynamic pricing that may be useful in practice.

Schlagwörter

Multi-armed bandit, Adaptive experiments, Thompson sampling, Interference, Clustering, Nonparametric bandits, Resampling, Dynamic pricing

Schlagwörter nach RSWK

Multi-armed bandit, Resampling, Algorithmus, Anwendung, Lernendes System, Dynamic Pricing

URI

http://hdl.handle.net/2003/44960
http://dx.doi.org/10.17877/DE290R-26727

Sammlungen

Lehrstuhl Volkswirtschaftslehre (Makroökonomie)

Komplettanzeige

Parametric and nonparametric learners for adaptive experiments under interference

Dateien

Datum

Autor:innen

Zeitschriftentitel

ISSN der Zeitschrift

Bandtitel

Verlag

Sonstige Titel

Zusammenfassung

Beschreibung

Inhaltsverzeichnis

Schlagwörter

Schlagwörter nach RSWK

Zitierform

URI

Sammlungen

Befürwortung

Review

Ergänzt durch

Referenziert von