BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T231113EDT-7422p4u5go@132.216.98.100
DTSTAMP:20260524T031113Z
DESCRIPTION:Abstract\n\nDeep Reinforcement Learning (DRL) has transformed decision-making in areas such as game playing\, robotics\, protein structure prediction\, and reasoning in large language models. However\, its practical use is often hindered by the issue of low sample efficiency. Unlike humans\, DRL agents typically require millions of interactions to learn effective policies\, making training costly and time-consuming. This thesis tackles the sample efficiency challenge in DRL through three novel approaches and demonstrates a practical application in time series forecasting.\n\nFirst\, we address the offline RL setting\, where policies are learned from fixed datasets without further online environment interaction. We show that existing model-free methods tend to produce overly conservative policies and propose a relaxed behavior regularization strategy to overcome this issue.\n\nNext\, we investigate the use of pre-trained Vision-Language Models (VLMs) to guide online RL in reward-sparse environments. While VLMs can provide useful task progress signals\, we identify a reward misalignment problem. To fix this\, we introduce FuRL\, a method that aligns VLM-derived rewards with task goals\, significantly improving learning efficiency.\n\nWe also explore Inverse Reinforcement Learning (IRL) from expert video demonstrations. Existing Optimal Transport-based methods often ignore temporal structure. To remedy this\, we propose a method that integrates context embeddings and a masking mechanism to capture temporal order\, enabling policy learning from just two action-free videos.\n\nFinally\, we apply DRL to ensemble learning for time series forecasting under non-stationary conditions. By treating model combination as a reinforcement learning task\, we design a system that dynamically adjusts model weights\, achieving strong performance even with limited training data.\n
DTSTART:20250829T170000Z
DTEND:20250829T190000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Yuwei Fu – Sample Efficient Reinforcement Learning: Methods and Applications
URL:/ece/channels/event/phd-defence-yuwei-fu-sample-efficient-reinforcement-learning-methods-and-applications-366417
END:VEVENT
END:VCALENDAR