BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260521T114123EDT-5292FgWzLd@132.216.98.100
DTSTAMP:20260521T154123Z
DESCRIPTION:Abstract\n\nSeveral real-world applications in autonomous syste
 ms (robotics\, autonomous driving\, energy grids\, etc.) involve a decisio
 n maker or an agent that selects actions based on limited available inform
 ation. Such applications\, where the decision maker does not observe the g
 lobal state of the system\, can be modeled as partially observable Markov 
 decision processes (POMDPs). Although it is possible to obtain a
  history-dependent policy by treating the entire history of observations
  as a state\, such an approach has complexity that increases
  exponentially with time. An alternative approach is to use belief
  states\, which are posterior distributions of the environment state
  given the history. Such belief states can be updated recursively and
  provide a dynamic programming decomposition
  whose complexity scales linearly with time.\n\nHowever\, it may be challe
 nging to consider the belief state for problems with large state spaces. I
 n practice\, it is often more convenient to work with a general representa
 tion\, which is often referred to as an agent state and is essentially the
  agent's internal representation of all the information available to the a
 gent for decision making. Since such an agent state representation may not
  be a sufficient statistic like a belief state\, it falls into a non-class
 ical information structure. As a result\, the standard dynamic programming
  techniques that are applied to POMDPs with belief states are not applicab
 le to POMDPs with agent states.\n\nWe first analyze the finite-horizon POM
 DP setting and consider the use of model information to develop a planning
 -based approach to optimize for agent-state based policies. We achieve thi
 s by introducing a policy search method that guarantees monotonic performa
 nce improvements at every step and also guarantees convergence. Based on t
 his policy search method\, we develop a simple planning-based policy searc
 h algorithm called partially observable conservative policy iteration (POC
 PI). Although such an algorithm only guarantees convergence to locally opt
 imal solutions\, we show empirically that it often converges to the global
 ly optimal solution.\n\nSecondly\, we analyze infinite-horizon POMDPs with
 out the use of model information to develop a learning-based approach to o
 ptimize for agent-state based policies. We consider the use of Q-learning 
 since it is a popular learning algorithm and has a strong theoretical basi
 s with provable guarantees. One of the noteworthy features of Q-learning i
 s that it gives us stationary and deterministic policy solutions. When con
 sidering belief states\, this is not an issue because the optimal value ca
 n be achieved by stationary deterministic policies. However\, for the case
  of agent-state policies\, the optimal value may be achieved by a non-stat
 ionary deterministic policy. But it is difficult in practice to have a rea
 lizable non-stationary deterministic policy for the infinite-horizon case
 \, so we propose using periodic policies instead. Periodic policies a
 re not only realizable in practice but also offer some degree of non-stati
 onarity in contrast to stationary policies.\n\nWe provide a learning-based
  algorithm called periodic agent-state based Q-learning (PASQL) which comb
 ines the standard Q-learning approach with the idea of periodicity. In add
 ition\, since Q-learning only gives us deterministic policies\, we investi
 gate the use of regularization with PASQL to obtain stochastic policies. W
 e rigorously prove the convergence of such periodic forms of Q-learning an
 d we precisely characterize the solutions quantitatively. We also show thr
 ough empirical studies that such periodic policies are capable of outperfo
 rming stationary policies.\n
DTSTART:20260224T160000Z
DTEND:20260224T180000Z
LOCATION:Room 603\, McConnell Engineering Building\, 3480 rue University\,
  Montreal\, QC\, H3A 0E9\, CA
SUMMARY:PhD defence of Amit Sinha – Planning and learning for agent-state b
 ased policies in POMDPs
URL:/ece/channels/event/phd-defence-amit-sinha-plannin
 g-and-learning-agent-state-based-policies-pomdps-371058
END:VEVENT
END:VCALENDAR