BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T210009EDT-9330NVaJCF@132.216.98.100
DTSTAMP:20260524T010009Z
DESCRIPTION:Abstract\n\nIn machine learning\, attention is an effective method that mimics human cognitive attention. This approach aims to enhance the effect of some parts of the data and reduce that of other parts. Attention has shown promising potential in various fields for enhancing learning models by identifying salient portions of the input data. In this thesis\, hard attention finding is explored in two vision domains: human activity recognition and few-shot learning.\n\nFinding attention can benefit human activity recognition (HAR)\, which is a challenging research field. Current methods in skeleton-based activity recognition primarily develop deep learning architectures to identify key features from 2D or 3D coordinates of human body joints. These approaches typically treat all joints as equally important\, which may not be accurate\, as the relevance of joints varies throughout and between activities. Also\, not all video frames contribute equally to recognizing an activity. Our research introduces a method that simultaneously finds both temporal (key frames) and spatial (key joints) attention\, potentially enhancing baseline classifier performance and reducing computational load. We first propose a method consisting of two agents\, one temporal and one spatial\, which are trained by interacting with each other. The temporal agent finds the key frames\, and the spatial agent looks for key joints. After that\, since the benchmark datasets mainly have short video sequences\, we decided to remove the temporal agent to investigate and improve the performance of the spatial agent alone.
  Therefore\, we propose a spatial hard attention-finding method that aims to discard the irrelevant and misleading joints and preserve the most discriminative ones\, per frame. In the above approaches\, we formulate the frame selection and joint selection problems as Markov decision processes and use deep reinforcement learning to solve them. The proposed methods are general frameworks that can be applied to existing HAR models to improve their performance. We achieve very competitive results on the widely used human activity datasets in this field. We have published our results in the Pattern Recognition journal (Elsevier)\, IEEE Transactions on Systems\, Man\, and Cybernetics: Systems\, and the 2021 IEEE International Conference on Systems\, Man\, and Cybernetics (SMC). We also conducted a survey of reinforcement learning-based HAR techniques\, published in IEEE Transactions on Neural Networks and Learning Systems.\n\nAttention finding is particularly valuable in scenarios where only limited training samples are accessible\, which is the case most of the time\, due to challenges in data collection and labeling. Learning from only a few labeled samples is referred to as few-shot learning. Hence\, we further aimed to explore the idea of hard attention finding in this area. Attention mechanisms help the model focus on relevant parts of the data\, which is particularly valuable when training data is scarce. By attending to the most informative features or regions in the input\, the model can make better decisions and generalize more effectively from the few examples it has been exposed to. In situations with few training samples\, existing studies struggle to locate such informative regions due to the large number of trainable parameters that cannot be effectively learned from the limited samples available.
  In this work\, we introduce a novel framework for explainable hard attention finding\, specifically adapted to few-shot learning scenarios\, called FewXAT. Our approach employs deep reinforcement learning to implement the concept of hard attention\, acting directly on the raw input data and thus making the process interpretable to humans. Through extensive experimentation across benchmark datasets\, we demonstrate the efficacy of our proposed method. The results of this work have been submitted to the 2024 European Conference on Computer Vision (ECCV 2024).\n
DTSTART:20241104T193000Z
DTEND:20241104T213000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Bahar Nikpour – Hard Attention Finding using Reinforcement Learning
URL:/ece/channels/event/phd-defence-bahar-nikpour-hard-attention-finding-using-reinforcement-learning-360731
END:VEVENT
END:VCALENDAR