BEGIN:VCALENDAR VERSION:2.0 PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4// BEGIN:VEVENT UID:20260523T154841EDT-4086vwGM6B@132.216.98.100 DTSTAMP:20260523T194841Z DESCRIPTION:Abstract\n\nRestless bandits are a class of sequential resource allocation problems concerned with allocating one or more resources among several alternative processes where the evolution of the processes is Mar kovian and depends on the resources allocated to them. In 1988\, Whittle d eveloped an index heuristic for restless bandit problems which has emerged as a popular solution approach due to its simplicity and strong empirical performance. The Whittle index heuristic is applicable if the model satis fies a technical condition known as indexability.\n\nIn this thesis\, we f ocus on three general setups: fully-observable\, partially-observable\, an d learning models. For the fully observable setup\, we present two general sufficient conditions for indexability and identify simpler to verify ref inements of these conditions. Afterwards\, we revisit a previously propose d algorithm called adaptive greedy algorithm which is known to compute the Whittle index for a subclass of restless bandits. We show that a generali zation of the adaptive greedy algorithm computes the Whittle index for all indexable restless bandits. We present an efficient implementation of thi s algorithm which can compute the Whittle index of an arm with K states in O(K3) computations. Finally\, we present a detailed numerical study which affirms the strong performance of the Whittle index heuristic.\n\nRegardi ng the partially observable setup\, we consider a restart bandits with two observational models: first\, the state of each bandit is not observable at all\, and second\, the state of each bandit is observable only if it is chosen. For both models\, we show that the system is indexable. For the f irst model\, we derive a closed-form expression for the Whittle index. For the second model\, we propose an efficient algorithm to compute the Whitt le index by exploiting the qualitative properties of the optimal policy. W e present detailed numerical experiments which indicates that the Whittle index policy outperforms myopic policy and can be close to optimal in diff erent setups.\n\nFor the learning setup\, we assume the true system dynami cs are unknown and present a Thompson sampling reinforcement learning (RL) algorithm which has a regret which scales polynomially with the number of arms. This contrasts with most RL algorithms where the cumulative regret scales exponentially with the number of arms. We present two characterizat ions of the regret of the proposed algorithm with respect to the Whittle i ndex policy. Depending on the reward model and technical assumptions\, we show that for a restless bandit with n arms and at most m activations at e ach time\, the regret scales either as O(nmT1/2)\, O(n2T1/2) or O(n1.5T1/2 ). We present numerical examples to illustrate the salient features of the algorithm.\n DTSTART:20220714T140000Z DTEND:20220714T160000Z LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H 3A 0E9\, 3480 rue University SUMMARY:PhD defence of Nima Akbarzadeh - Indexability\, Planning\, and Lear ning in Restless Bandits URL:/ece/channels/event/phd-defence-nima-akbarzadeh-in dexability-planning-and-learning-restless-bandits-340256 END:VEVENT END:VCALENDAR