BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T210009EDT-9330NVaJCF@132.216.98.100
DTSTAMP:20260524T010009Z
DESCRIPTION:Abstract\n\nIn machine learning\, attention is an effective
  method that mimics human cognitive attention. This approach aims to
  enhance the effect of some parts of the data and reduce that of other
  parts. Attention has exhibited promising potential in enhancing learning
  models
  by identifying salient portions of input data in various fields. In this
  thesis\, hard attention finding is explored in two vision domains: human
  activity recognition and few-shot learning.\n\nFinding attention
  can benefit human activity recognition (HAR)\, which is a challenging res
 earch field. Current methods in skeleton-based activity recognition primar
 ily develop deep learning architectures to identify key features from 2D o
 r 3D coordinates of human body joints. These approaches typically treat al
 l joints as equally important\, which may not be accurate\, as the relevan
 ce of joints varies throughout and between activities. Also\, not all vide
 o frames contribute equally to recognizing an activity. Our research intro
 duces a method that simultaneously finds both temporal (key frames) and sp
 atial (key joints) attention\, potentially enhancing baseline classifier p
 erformance and reducing computational load. Hence\, we first propose a
  method consisting of two agents\, temporal and spatial\, which are
  trained by interacting with each other. The temporal agent finds the key
  frames\, and the spatial agent looks for key joints. After that\, since
  the benchmark datas
 ets mainly have short video sequences\, we decided to withdraw the tempora
 l agent to investigate and improve the performance of the spatial agent al
 one. Therefore\, we propose a spatial hard attention-finding method that a
 ims to discard the irrelevant and misleading joints and preserve the most 
 discriminative ones\, per frame. In the above approaches\, we formulate th
 e frame selection and joint selection problems as Markov decision
  processes and use deep reinforcement learning to solve them. The
  proposed methods ar
 e general frameworks that can be applied to the existing HAR models to imp
 rove their performance. We achieve very competitive results on the widely 
 used human activity datasets in this field. We have published our results
  in the Pattern Recognition journal (Elsevier)\, IEEE Transactions on
  Systems\, Man\, and Cybernetics: Systems\, and the 2021 IEEE
  International Conferenc
 e on Systems\, Man\, and Cybernetics (SMC). Also\, we conducted a survey o
 n reinforcement learning-based HAR techniques\, published in IEEE Transact
 ions on Neural Networks and Learning Systems.\n\nAttention finding is part
 icularly valuable in scenarios where limited training samples are accessib
 le\, which is the case most of the time\, due to challenges in data
  collection and labeling. Learning from a few labeled samples is
  referred to as few-shot learning. Hence\, we further aimed to explore
  the idea of hard attention finding in this area. Attention mechanisms
  help the model focus on relevant parts of the data\, which is
  particularly valuable when dealing with scarce training data. By
  attending to the most informative fe
 atures or regions in the input\, the model can make better decisions and g
 eneralize more effectively from the few examples it has been exposed to. I
 n situations with few training samples\, existing studies struggle to loca
 te such informative regions due to the large number of training parameters
  that cannot be effectively learned from the available limited samples. In
  this work\, we introduce a novel framework for achieving explainable hard
  attention finding\, specifically adapted to few-shot learning scenarios\,
  called FewXAT. Our approach employs deep reinforcement learning to implem
 ent the concept of hard attention\, directly impacting raw input data and 
 thus rendering the process interpretable for human understanding. Through 
 extensive experimentation across benchmark datasets\, we demonstrate the e
 fficacy of our proposed method. The results of this work have been
  submitted to the 2024 European Conference on Computer Vision
  (ECCV 2024).\n
DTSTART:20241104T193000Z
DTEND:20241104T213000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H
 3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Bahar Nikpour – Hard Attention Finding using Reinfor
 cement Learning
URL:/ece/channels/event/phd-defence-bahar-nikpour-hard
 -attention-finding-using-reinforcement-learning-360731
END:VEVENT
END:VCALENDAR