Xiao Wang
Google DeepMind
Pali: A jointly-scaled multilingual language-image model
X Chen, X Wang, S Changpinyo, AJ Piergiovanni, P Padlewski, D Salz, ...
ICLR 2023 (Oral), 2022
LiT: Zero-Shot Transfer with Locked-image Text Tuning
X Zhai, X Wang, B Mustafa, A Steiner, D Keysers, A Kolesnikov, L Beyer
CVPR 2022, 2021
Measuring compositional generalization: A comprehensive method on realistic data
D Keysers, N Schärli, N Scales, H Buisman, D Furrer, S Kashubin, ...
ICLR 2020, 2019
Scaling vision transformers to 22 billion parameters
M Dehghani, J Djolonga, B Mustafa, P Padlewski, J Heek, J Gilmer, ...
ICML 2023 (Oral), 2023
Simple Open-Vocabulary Object Detection with Vision Transformers
M Minderer, A Gritsenko, A Stone, M Neumann, D Weissenborn, ...
ECCV 2022, 2022
Pali-x: On scaling up a multilingual vision and language model
X Chen, J Djolonga, P Padlewski, B Mustafa, S Changpinyo, J Wu, ...
CVPR 2024, 2023
Pali-3 vision language models: Smaller, faster, stronger
X Chen, X Wang, L Beyer, A Kolesnikov, J Wu, P Voigtlaender, B Mustafa, ...
arXiv preprint arXiv:2310.09199, 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
J Kossen, M Collier, B Mustafa, X Wang, X Zhai, L Beyer, A Steiner, ...
NeurIPS 2023, 2023
A study of autoregressive decoders for multi-tasking in computer vision
L Beyer, B Wan, G Madan, F Pavetic, A Steiner, A Kolesnikov, AS Pinto, ...
arXiv preprint arXiv:2303.17376, 2023
LocCa: Visual Pretraining with Location-aware Captioners
B Wan, M Tschannen, Y Xian, F Pavetic, I Alabdulmohsin, X Wang, ...
arXiv preprint arXiv:2403.19596, 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
I Alabdulmohsin, X Wang, A Steiner, P Goyal, A D'Amour, X Zhai
ICLR 2024, 2024
Locked-Model Multimodal Contrastive Tuning
D Keysers, X Zhai, X Wang, L Beyer, B Mustafa, A Steiner, A Kolesnikov
US Patent App. 18/051,106, 2024