BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T154841EDT-4086vwGM6B@132.216.98.100
DTSTAMP:20260523T194841Z
DESCRIPTION:Abstract\n\nRestless bandits are a class of sequential resource
  allocation problems concerned with allocating one or more resources among
  several alternative processes where the evolution of the processes is Mar
 kovian and depends on the resources allocated to them. In 1988\, Whittle d
 eveloped an index heuristic for restless bandit problems which has emerged
  as a popular solution approach due to its simplicity and strong empirical
  performance. The Whittle index heuristic is applicable if the model satis
 fies a technical condition known as indexability.\n\nIn this thesis\, we f
 ocus on three general setups: fully-observable\, partially-observable\, an
 d learning models. For the fully observable setup\, we present two general
  sufficient conditions for indexability and identify simpler to verify ref
 inements of these conditions. Afterwards\, we revisit a previously propose
 d algorithm called adaptive greedy algorithm which is known to compute the
  Whittle index for a subclass of restless bandits. We show that a generali
 zation of the adaptive greedy algorithm computes the Whittle index for all
  indexable restless bandits. We present an efficient implementation of thi
 s algorithm which can compute the Whittle index of an arm with K states in
  O(K3) computations. Finally\, we present a detailed numerical study which
  affirms the strong performance of the Whittle index heuristic.\n\nRegardi
 ng the partially observable setup\, we consider a restart bandits with two
  observational models: first\, the state of each bandit is not observable 
 at all\, and second\, the state of each bandit is observable only if it is
  chosen. For both models\, we show that the system is indexable. For the f
 irst model\, we derive a closed-form expression for the Whittle index. For
  the second model\, we propose an efficient algorithm to compute the Whitt
 le index by exploiting the qualitative properties of the optimal policy. W
 e present detailed numerical experiments which indicates that the Whittle 
 index policy outperforms myopic policy and can be close to optimal in diff
 erent setups.\n\nFor the learning setup\, we assume the true system dynami
 cs are unknown and present a Thompson sampling reinforcement learning (RL)
  algorithm which has a regret which scales polynomially with the number of
  arms. This contrasts with most RL algorithms where the cumulative regret 
 scales exponentially with the number of arms. We present two characterizat
 ions of the regret of the proposed algorithm with respect to the Whittle i
 ndex policy. Depending on the reward model and technical assumptions\, we 
 show that for a restless bandit with n arms and at most m activations at e
 ach time\, the regret scales either as O(nmT1/2)\, O(n2T1/2) or O(n1.5T1/2
 ). We present numerical examples to illustrate the salient features of the
  algorithm.\n
DTSTART:20220714T140000Z
DTEND:20220714T160000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H
 3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Nima Akbarzadeh - Indexability\, Planning\, and Lear
 ning in Restless Bandits
URL:/ece/channels/event/phd-defence-nima-akbarzadeh-in
 dexability-planning-and-learning-restless-bandits-340256
END:VEVENT
END:VCALENDAR