BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T064451EDT-23215m9TmN@132.216.98.100
DTSTAMP:20260523T104451Z
DESCRIPTION:Abstract\n\nProtein-protein interactions (PPIs) underpin critic
 al cellular functions\, from metabolism to DNA repair\, and their dysregul
 ation drives diseases like cancer and viral infections. Traditional experi
 mental methods for characterizing PPIs (e.g.\, yeast two-hybrid\, 
 affinity chromatography) are resource-intensive\, requiring specialized 
 equipment\,
  expertise\, and weeks of labour per experiment. Computational approaches 
 offer scalable alternatives\, enabling rapid inference of putative PPIs ac
 ross entire proteomes. However\, despite leveraging large PPI databases an
 d deep learning advances\, current models fail to generalize to out-of-dis
 tribution (OOD) proteins\, limiting real-world applicability in disease re
 search and biologic therapy design.\n\nThis thesis addresses persistent OO
 D generalization challenges through three interconnected studies. First\, 
 Chapter 3 introduces RAPPPID\, a regularization-optimized PPI inference mo
 del that achieves state-of-the-art accuracy by the judicious application o
 f regularization and SentencePiece tokenization\, while exposing critical 
 failures of contemporaneous models on unseen proteins. Second\, Chapter 4 
 describes INTREPPPID\, which leverages evolutionary orthology to enhance c
 ross-species generalization. By embedding an 'orthologous locality' loss t
 erm\, it reshapes protein latent spaces to cluster evolutionarily related 
 proteins\, outperforming prior methods when models trained on human data p
 redict PPIs in distant organisms (e.g.\, S. cerevisiae\, D. melanogaster).
  Third\, Chapter 5 identifies a pervasive\, previously uncharacterized dat
 a leakage source in PPI models incorporating pre-trained protein language 
 models (pLLMs). It establishes rigorous dataset curation protocols to elim
 inate leakage while maintaining performance and reveals fundamental genera
 lization barriers between pLLM-based and non-pLLM architectures.\n\nCollec
 tively\, these contributions advance robust PPI inference by directly conf
 ronting OOD limitations\, orthology-aware generalization\, and pervasive s
 ources of data leakage\, all critical for translating computational predic
 tions into biological insights and therapeutic innovations.\n
DTSTART:20251124T180000Z
DTEND:20251124T200000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H
 3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Joseph Szymborski – Addressing Pervasive Challenges 
 in Generalizable Machine Learning Models of Protein-Protein Interaction
URL:/ece/channels/event/phd-defence-joseph-szymborski-
 addressing-pervasive-challenges-generalizable-machine-learning-models-3686
 44
END:VEVENT
END:VCALENDAR