BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260523T064451EDT-23215m9TmN@132.216.98.100
DTSTAMP:20260523T104451Z
DESCRIPTION:Abstract\n\nProtein-protein interactions (PPIs) underpin critical cellular functions\, from metabolism to DNA repair\, and their dysregulation drives diseases like cancer and viral infections. Traditional experimental methods for characterizing PPIs (e.g.\, yeast two-hybrid\, affinity chromatography) are resource-intensive\, requiring specialized equipment\, expertise\, and weeks of labour per experiment. Computational approaches offer scalable alternatives\, enabling rapid inference of putative PPIs across entire proteomes. However\, despite leveraging large PPI databases and deep learning advances\, current models fail to generalize to out-of-distribution (OOD) proteins\, limiting real-world applicability in disease research and biologic therapy design.\n\nThis thesis addresses persistent OOD generalization challenges through three interconnected studies. First\, Chapter 3 introduces RAPPPID\, a regularization-optimized PPI inference model that achieves state-of-the-art accuracy by the judicious application of regularization and SentencePiece tokenization\, while exposing critical failures of contemporaneous models on unseen proteins. Second\, Chapter 4 describes INTREPPPID\, which leverages evolutionary orthology to enhance cross-species generalization. By embedding an 'orthologous locality' loss term\, it reshapes protein latent spaces to cluster evolutionarily related proteins\, outperforming prior methods when models trained on human data predict PPIs in distant organisms (e.g.\, S. cerevisiae\, D. melanogaster). Third\, Chapter 5 identifies a pervasive\, previously uncharacterized data leakage source in PPI models incorporating pre-trained protein language models (pLLMs). It establishes rigorous dataset curation protocols to eliminate leakage while maintaining performance and reveals fundamental generalization barriers between pLLM-based and non-pLLM architectures.\n\nCollectively\, these contributions advance robust PPI inference by directly confronting OOD limitations\, orthology-aware generalization\, and pervasive sources of data leakage\, all critical for translating computational predictions into biological insights and therapeutic innovations.\n
DTSTART:20251124T180000Z
DTEND:20251124T200000Z
LOCATION:Room 603\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:PhD defence of Joseph Szymborski – Addressing Pervasive Challenges in Generalizable Machine Learning Models of Protein-Protein Interaction
URL:/ece/channels/event/phd-defence-joseph-szymborski-addressing-pervasive-challenges-generalizable-machine-learning-models-368644
END:VEVENT
END:VCALENDAR