BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260522T203356EDT-2642BUOaLK@132.216.98.100
DTSTAMP:20260523T003356Z
DESCRIPTION:Abstract\n\nThis thesis examines various aspects of integrating learning and control across different families of stochastic systems. This work includes three key aspects: (i) learning the unknown system parameters from input-output data sequences (i.e.\, system identification)\, (ii) integrating learning within the control of dynamical systems (i.e.\, adaptive control)\, and (iii) providing probabilistic guarantees for the developed methodologies\, including regret upper bounds and concentration bounds. The analysis is conducted within three frameworks of stochastic systems: Markov jump linear systems\, finite-state and finite-action Markov Decision Processes (MDPs)\, and linear stochastic systems.\n\nIn the framework of Markov jump linear systems\, two primary problems are addressed. First\, we focus on the full-state observation system identification problem\, i.e.\, learning system parameters from observed sequences of discrete and continuous states. We propose a variant of the least squares algorithm called switched least squares. By leveraging classical regression theory\, we establish the algorithm's strong consistency and derive its convergence rate. Furthermore\, we integrate this algorithm into a certainty-equivalence framework tailored for controlling Markov jump linear systems. By leveraging the convergence rate of switched least squares\, a novel regret decomposition\, and the concentration properties of martingale difference sequences\, we derive a sub-linear regret upper bound for the proposed algorithm.\n\nIn the second part of this thesis\, we investigate the concentration properties of cumulative rewards in Markov Decision Processes (MDPs)\, focusing on both asymptotic and non-asymptotic settings.
  We introduce a unified approach to characterize reward concentration in MDPs\, covering both infinite-horizon settings (i.e.\, average and discounted reward frameworks) and the finite-horizon setting. The asymptotic results include the law of large numbers\, the central limit theorem\, and the law of the iterated logarithm\, while the non-asymptotic results include Azuma-Hoeffding-type inequalities and a non-asymptotic version of the law of the iterated logarithm. Using these results\, we show that two alternative definitions of regret for learning policies in the literature are rate-equivalent. The proofs rely on a novel martingale decomposition of the cumulative reward\, properties of the solutions of the policy-evaluation fixed-point equation\, and asymptotic and non-asymptotic concentration of martingales.\n\nFinally\, the analysis is extended to the case of linear systems\, where we establish the asymptotic normality of the cumulative cost induced by the optimal policies in linear quadratic regulators (LQRs). These results address some of the key theoretical questions in integrating learning and control in stochastic systems.\n
DTSTART:20250219T150000Z
DTEND:20250219T170000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Borna Sayedana – Learning\, Control and Concentration of Cumulative Rewards in MDPs and Markov Jump Systems
URL:/ece/channels/event/phd-defence-borna-sayedana-learning-control-and-concentration-cumulative-rewards-mdps-and-markov-363501
END:VEVENT
END:VCALENDAR