BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260521T114123EDT-5292FgWzLd@132.216.98.100
DTSTAMP:20260521T154123Z
DESCRIPTION:Abstract\n\nSeveral real-world applications in autonomous systems (robotics\, autonomous driving\, energy grids\, etc.) involve a decision maker\, or agent\, that selects actions based on limited available information. Such applications\, where the decision maker does not observe the global state of the system\, can be modeled as partially observable Markov decision processes (POMDPs). Although a history-dependent policy can be obtained by treating the entire history of observations as the state\, the complexity of such an approach grows exponentially with time. An alternative approach is to use belief states\, i.e.\, the posterior distribution of the environment state given the history. Belief states can be updated recursively and yield a dynamic programming decomposition whose complexity scales linearly with time.\n\nHowever\, computing the belief state may be challenging for problems with large state spaces. In practice\, it is often more convenient to work with a more general representation\, referred to as an agent state\, which is essentially the agent's internal representation of all the information available to it for decision making. Since an agent state need not be a sufficient statistic the way a belief state is\, the resulting problem has a non-classical information structure. As a result\, the standard dynamic programming techniques applied to POMDPs with belief states are not applicable to POMDPs with agent states.\n\nWe first analyze the finite-horizon POMDP setting and use model information to develop a planning-based approach for optimizing agent-state-based policies. We achieve this by introducing a policy search method that guarantees a monotonic performance improvement at every step and also guarantees convergence. Building on this policy search method\, we develop a simple planning-based policy search algorithm called partially observable conservative policy iteration (POCPI). Although this algorithm only guarantees convergence to a locally optimal solution\, we show empirically that it often converges to the globally optimal solution.\n\nSecond\, we analyze infinite-horizon POMDPs without access to model information and develop a learning-based approach for optimizing agent-state-based policies. We consider Q-learning because it is a popular learning algorithm with a strong theoretical basis and provable guarantees. A notable feature of Q-learning is that it yields stationary\, deterministic policies. With belief states\, this is not an issue because the optimal value can be achieved by a stationary deterministic policy. For agent-state-based policies\, however\, achieving the optimal value may require a non-stationary deterministic policy. Since non-stationary deterministic policies are difficult to realize in practice over an infinite horizon\, we propose using periodic policies instead. Periodic policies are not only realizable in practice but also offer a degree of non-stationarity that stationary policies lack.\n\nWe present a learning-based algorithm called periodic agent-state based Q-learning (PASQL)\, which combines the standard Q-learning approach with the idea of periodicity. In addition\, since Q-learning only yields deterministic policies\, we investigate the use of regularization with PASQL to obtain stochastic policies. We rigorously prove the convergence of these periodic forms of Q-learning and precisely characterize the solutions they converge to. We also show through empirical studies that such periodic policies can outperform stationary policies.\n
DTSTART:20260224T160000Z
DTEND:20260224T180000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Amit Sinha – Planning and learning for agent-state based policies in POMDPs
URL:/ece/channels/event/phd-defence-amit-sinha-planning-and-learning-agent-state-based-policies-pomdps-371058
END:VEVENT
END:VCALENDAR