BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260522T203356EDT-2642BUOaLK@132.216.98.100
DTSTAMP:20260523T003356Z
DESCRIPTION:Abstract\n\nThis thesis examines various aspects of integrating
  learning and control across different families of stochastic systems. Thi
 s work includes three key aspects: (i) learning the unknown system paramet
 ers from input-output data sequences (i.e.\, system identification)\, (ii)
  integrating learning within the control of dynamical systems (i.e.\, adap
 tive control)\, and (iii) providing probabilistic guarantees for the devel
 oped methodologies\, including regret upper bounds and concentration bound
 s. The analysis is conducted within three frameworks of stochastic systems
 : Markov jump linear systems\, finite-state and finite-action Markov Decis
 ion Processes (MDPs)\, and linear stochastic systems.\n\nIn the framework 
 of Markov jump linear systems\, two primary problems are addressed. First\
 , we focus on the full-state observation system identification problem\, i
 .e.\, learning system parameters from observed sequences of discrete and c
 ontinuous states. We propose a variant of the least squares algorithm cal
 led switched least squares. By leveraging classical regression theory\, w
 e esta
 blish the algorithm's strong consistency and derive its convergence rate. 
 Furthermore\, we integrate this algorithm into a certainty-equivalence fra
 mework tailored for controlling Markov jump linear systems. By leveraging 
 the convergence rate of switched least squares\, a novel regret decomposit
 ion\, and the concentration properties of martingale difference sequences\
 , we derive a sub-linear regret upper bound for the proposed algorithm.\n
 \nIn the second part of this thesis\, we investigate the concentration pro
 perties of cumulative rewards in Markov Decision Processes (MDPs)\, focusi
 ng on both asymptotic and non-asymptotic settings. We introduce a unified 
 approach to characterize reward concentration in MDPs\, covering both infi
 nite-horizon settings (i.e.\, average and discounted reward frameworks) an
 d the finite-horizon setting. The asymptotic results include the law of la
 rge numbers\, the central limit theorem\, and the law of the iterated loga
 rithm\, while the non-asymptotic results include Azuma-Hoeffding-type ineq
 ualities and a non-asymptotic version of the law of the iterated logarithm
 . Using thes
 e results\, we show that two alternative definitions of regret for learnin
 g policies in the literature are rate-equivalent. The proofs rely on a nov
 el martingale decomposition of the cumulative reward\, properties of the s
 olut
 ions of the policy-evaluation fixed-point equation\, and asymptotic and no
 n-asymptotic concentration of martingales.\n\nFinally\, the analysis is ex
 tended to the case of linear systems\, where we establish the asymptotic n
 ormality of the cumulative cost induced by the optimal policies in linear 
 quadratic regulators (LQRs). These results address some of the key theoret
 ical questions in integrating learning and control in stochastic systems.
 \n
DTSTART:20250219T150000Z
DTEND:20250219T170000Z
LOCATION:Room 603\, McConnell Engineering Building\, 3480 rue University\,
  Montreal\, QC\, H3A 0E9\, CA
SUMMARY:PhD defence of Borna Sayedana – Learning\, Control and Concentratio
 n of Cumulative Rewards in MDPs and Markov Jump Systems
URL:/ece/channels/event/phd-defence-borna-sayedana-lea
 rning-control-and-concentration-cumulative-rewards-mdps-and-markov-363501
END:VEVENT
END:VCALENDAR