Multi-armed Bandit Allocation Indices

by Gittins, John; Glazebrook, Kevin; Weber, Richard

Edition: 2nd

ISBN13: 9780470670026

ISBN10: 0470670029

Format: Hardcover

Pub. Date: 2011-03-21

Publisher(s): Wiley

List Price: $148.21

Summary

Statisticians are familiar with bandit problems, operational researchers with scheduling problems, and economists with problems of resource allocation. Most such problems are computationally intractable, so exact solutions are unobtainable except for small-scale problems. This is particularly true under conditions of uncertainty. This book shows that there is, however, a large class of allocation problems for which the optimal solution is expressible in terms of a priority index, which is defined for each of the competing projects independently of the properties of the other projects. Such problems are therefore solved once the appropriate index has been found. In some cases there is a concise formula for the index; at worst it can usually be determined by a manageable calculation. Since the discovery of the index, which has become known as the Gittins index, its properties and its range of applicability have been worked out in some detail. This book gives an account of these developments and includes extensive tables of index values. The Gittins index has had a great influence on the analysis of cost-benefit trade-offs in a range of areas, from computer science and engineering to finance and marketing. This book re-introduces the topic and illustrates its relevance to modern statistical, economic and operations research projects.
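
To make the idea of a per-project priority index concrete, here is a minimal sketch, not taken from the book, that computes Gittins indices for one small finite-state project using the restart-in-state formulation listed in the contents. The function name `gittins_indices`, the inputs `P` (transition matrix), `r` (per-state rewards) and `beta` (discount factor), and the toy two-state project are all illustrative assumptions, and plain value iteration is used only because it is simple.

```python
def gittins_indices(P, r, beta, tol=1e-12, max_iter=100_000):
    """Gittins index of every state of one finite-state project (sketch).

    P[j][k] is the probability of moving from state j to state k,
    r[j] is the reward collected in state j, and 0 < beta < 1.
    For each state i we solve the "restart in i" problem by value
    iteration: in any state you may either continue from where you are
    or act as if the project were back in state i.
    The index of i is (1 - beta) times the optimal value at i.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing from each state j.
            cont = [
                r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                for j in range(n)
            ]
            # Restarting means behaving as though we were in state i.
            restart = cont[i]
            new_V = [max(c, restart) for c in cont]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:
                break
        indices.append((1.0 - beta) * V[i])
    return indices


# Hypothetical toy project: state 0 pays nothing and moves to state 1,
# which pays 1 per step forever.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 1.0]
indices = gittins_indices(P, r, beta=0.9)
print(indices)  # approximately [0.9, 1.0]
```

Under an index policy, each competing project gets its own call of this kind, and at every step the project whose current state has the largest index is the one worked on. No project's index depends on the other projects, which is the property the summary describes.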

Author Biography

John Gittins, Statistics Department, University of Oxford, UK

Kevin Glazebrook, Department of Management Science, Lancaster University, UK

Richard Weber, Statistical Laboratory, University of Cambridge, UK

Foreword
Foreword to the first edition
Preface
Preface to the first edition
Introduction or exploration
Exercises
Main ideas: Gittins index
Introduction
Decision processes
Simple families of alternative bandit processes
Dynamic programming
Gittins index theorem
Gittins index
Gittins index and the multi-armed bandit
Coins problem
Characterization of the optimal stopping time
The restart-in-state formulation
Dependence on discount factor
Myopic and forwards induction policies
Proof of the index theorem by interchanging bandit portions
Continuous-time bandit processes
Proof of the index theorem by induction and interchange argument
Calculation of Gittins indices
Monotonicity conditions
Monotone indices
Monotone jobs
History of the index theorem
Some decision process theory
Exercises
Necessary assumptions for indices
Introduction
Jobs
Continuous-time jobs
Definition
Policies for continuous-time jobs
The continuous-time index theorem for a SFABP of jobs
Necessary assumptions
Necessity of an infinite time horizon
Necessity of constant exponential discounting
Necessity of a single processor
Beyond the necessary assumptions
Bandit-dependent discount factors
Stochastic discounting
Undiscounted rewards
A discrete search problem
Multiple processors
Exercises
Superprocesses, precedence constraints and arrivals
Introduction
Bandit superprocesses
The index theorem for superprocesses
Stoppable bandit processes
Proof of the index theorem by freezing and promotion rules
Freezing rules
Promotion rules
The index theorem for jobs with precedence constraints
Precedence constraints forming an out-forest
Bandit processes with arrivals
Tax problems
Ongoing bandits and tax problems
Klimov's model
Minimum EWFT for the M/G/1 queue
Near optimality of nearly index policies
Exercises
The achievable region methodology
Introduction
A simple example
Proof of the index theorem by greedy algorithm
Generalized conservation laws and indexable systems
Performance bounds for policies for branching bandits
Job selection and scheduling problems
Multi-armed bandits on parallel machines
Exercises
Restless bandits and Lagrangian relaxation
Introduction
Restless bandits
Whittle indices for restless bandits
Asymptotic optimality
Monotone policies and simple proofs of indexability
Applications to multi-class queueing systems
Performance bounds for the Whittle index policy
Indices for more general resource configurations
Exercises
Multi-population random sampling (theory)
Introduction
Jobs and targets
Use of monotonicity properties
General methods of calculation: use of invariance properties
Random sampling times
Brownian reward processes
Asymptotically normal reward processes
Diffusion bandits
Exercises
Multi-population random sampling (calculations)
Introduction
Normal reward processes (known variance)
Normal reward processes (mean and variance both unknown)
Bernoulli reward processes
Exponential reward processes
Exponential target process
Bernoulli/exponential target process
Exercises
Further exploitation
Introduction
Website morphing
Economics
Value of information
More on job-scheduling problems
Military applications
References
Tables
Index
Table of Contents provided by Ingram. All Rights Reserved.