Siyuan Huang

Cited by

	All	Since 2019
Citations	580	580
h-index	8	8
i10-index	8	8

Citations per year: 2022: 9; 2023: 176; 2024: 394

Public access

1 article available, 0 articles not available (based on funding mandates)

Co-authors

Peng Gao - Shanghai AI Lab - Verified email at pjlab.org.cn
Yu Qiao - Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS - Verified email at siat.ac.cn
Hongsheng Li (李鸿升) - The Chinese University of Hong Kong - Verified email at ee.cuhk.edu.hk
Wenqi Shao - Researcher at Shanghai AI Laboratory - Verified email at pjlab.org.cn
Hao Dong - Assistant Professor at Peking University - Verified email at pku.edu.cn
Renrui Zhang - MMLab CUHK & Peking University - Verified email at pku.edu.cn
Ziyi Lin - The Chinese University of Hong Kong - Verified email at link.cuhk.edu.hk
Zhengkai Jiang - Tencent - Verified email at tencent.com
Jiaming Han - PhD Student, CUHK MMLab - Verified email at link.cuhk.edu.hk
Zhang Bo - Shanghai Artificial Intelligence Laboratory - Verified email at pjlab.org.cn
Aojun Zhou - CUHK - Verified email at link.cuhk.edu.hk
Shanghang Zhang - EECS, UC Berkeley - Verified email at eecs.berkeley.edu
Haonan Chang - Rutgers University, Robotics Ph.D. - Verified email at scarletmail.rutgers.edu

Siyuan Huang

Shanghai AI Lab && SJTU && MMLab CUHK

Verified email at sjtu.edu.cn

Robotics Pre-training


Title	Authors	Venue	Cited by	Year
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners	R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	113	2023
Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models	P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ...	arXiv preprint arXiv:2306.09265, 2023	107	2023
Sphinx: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models	Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, C Lin, W Shao, ...	arXiv preprint arXiv:2311.07575, 2023	103	2023
Multi-modal sensor fusion for auto driving perception: A survey	K Huang, B Shi, X Li, X Li, S Huang, Y Li	arXiv preprint arXiv:2202.02703, 2022	98	2022
Instruct2act: Mapping multi-modality instructions to robotic actions with large language model	S Huang, Z Jiang, H Dong, Y Qiao, P Gao, H Li	arXiv preprint arXiv:2305.11176, 2023	68	2023
Sphinx-x: Scaling data and parameters for a family of multi-modal large language models	P Gao, R Zhang, C Liu, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, ...	arXiv preprint arXiv:2402.05935, 2024	39	2024
Tiny lvlm-ehub: Early multimodal experiments with bard	W Shao, Y Hu, P Gao, M Lei, K Zhang, F Meng, P Xu, S Huang, H Li, ...	arXiv preprint arXiv:2308.03729, 2023	21	2023
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill	W Cai, S Huang, G Cheng, Y Long, P Gao, C Sun, H Dong	ICRA2024, 2023	10	2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification	S Huang, B Zhang, B Shi, H Li, Y Li, P Gao	Proceedings of the 31st ACM International Conference on Multimedia, 8644-8652, 2023	6	2023
Adas: A simple active-and-adaptive baseline for cross-domain 3d semantic segmentation	B Fei, S Huang, J Yuan, B Shi, B Zhang, T Chen, M Dou, Y Qiao	arXiv preprint arXiv:2212.10390, 2022	5	2022
Manipvqa: Injecting robotic affordance and physically grounded information into multi-modal large language models	S Huang, I Ponomarenko, Z Jiang, X Li, X Hu, P Gao, H Li, H Dong	IROS2024, 2024	4	2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models	X Lu, Q Liu, Y Xu, A Zhou, S Huang, B Zhang, J Yan, H Li	arXiv preprint arXiv:2402.14800, 2024	4	2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want	W Lin, X Wei, R An, P Gao, B Zou, Y Luo, S Huang, S Zhang, H Li	arXiv preprint arXiv:2403.20271, 2024	2	2024
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices	Q Lu, W Shao, Z Liu, F Meng, B Li, B Chen, S Huang, K Zhang, Y Qiao, ...	arXiv preprint arXiv:2406.08451, 2024		2024
A3VLM: Actionable Articulation-Aware Vision Language Model	S Huang, H Chang, Y Liu, Y Zhu, H Dong, P Gao, A Boularias, H Li	arXiv preprint arXiv:2406.07549, 2024		2024
