Research Scientist at Ai2 on the PRIOR team
Seattle, WA, USA
Hello! I am a Research Scientist at Ai2 on the PRIOR team, based in Seattle, WA.
My academic interests lie in computer vision, machine learning, and their applications to real-world problems. Specifically, I focus on multimodal representation learning, especially for high-level video understanding and reasoning. Prior to joining Ai2, I received my Ph.D. in Computer Science and Engineering from Seoul National University, where I was advised by Prof. Gunhee Kim.
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke*, Christopher Clark*, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Jen Dumas, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross Girshick, Ali Farhadi, Aniruddha Kembhavi (*: equal contribution)
preprint
[paper]
[demo]
[blog post]
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha*, Chaeyun Kim*, Donghwa Kim*, Junho Lee, Sangho Lee, and Joonseok Lee (*: equal contribution)
ECCV 2024
[paper]
[project page]
[code]
[poster]
[slides]
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu*, Christopher Clark*, Sangho Lee*, Zichen Zhang*, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi (*: equal contribution)
CVPR 2024
[paper]
[webpage]
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko, Sangho Lee and Gunhee Kim
EMNLP 2023
[paper]
[code&dataset]
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee*, Jiwan Chung*, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik and Yale Song (*: equal contribution)
ICCV 2021
CVPR 2021: The Third Workshop on Learning from Unlabeled Videos
[paper]
[code&dataset]
Unsupervised Representation Learning via Neural Activation Coding
Yookoon Park, Sangho Lee, Gunhee Kim and David Blei
ICML 2021 (long talk)
[paper]
[poster]
[slides]
[code]
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz and Yale Song
ICLR 2021
CVPR 2021: The Second International Workshop on Large Scale Holistic Video Understanding
[paper]
[poster]
[slides]
Self-Supervised Learning of Compressed Video Representations
Youngjae Yu*, Sangho Lee*, Gunhee Kim and Yale Song (*: equal contribution)
ICLR 2021
[paper]
[poster]
[slides]
A Memory Network Approach for Story-based Temporal Summarization of 360° Videos
Sangho Lee, Jinyoung Sung, Youngjae Yu and Gunhee Kim
CVPR 2018
ECCV 2018 Workshop on 360° Perception and Interaction
[paper]
[project page (poster/slides/bibtex)]
A Deep Ranking Model for Spatio-temporal Highlight Detection from a 360° Video
Youngjae Yu, Sangho Lee, Joonil Na, Jaeyoun Kang and Gunhee Kim
AAAI 2018 (spotlight)
[paper]
[poster]
[slides]
[bibtex]
A Read-Write Memory Network for Movie Story Understanding
Seil Na, Sangho Lee, Jisung Kim and Gunhee Kim
ICCV 2017
ICCV 2017: The Joint Video and Language Understanding Workshop
[paper]
[code]
[poster]
[bibtex]
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim and Gunhee Kim
CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
[paper]
[code]
[bibtex]