Shen Zhuoran (Zhuoran is the first name) is a Research Scientist at a stealth startup in Palo Alto, CA, working on coding large language models (coding LLMs) to push the frontier of artificial intelligence (AI) reasoning and general intelligence.

He has focused on non-local attention mechanisms and Transformers for computer vision since 2018, arguably making him one of the earliest researchers on the topic. During his time at SenseTime, Tencent, and the Google AI Residency, he worked on novel efficient formulations of attention and Transformers and on building a visual foundation model. His papers on efficient attention have accumulated more than 700 citations and direct follow-up works in multiple fields, especially medical imaging and remote sensing. After Google, he spent two years in self-driving, where he led the end-to-end deep learning transition of Pony.ai’s motion prediction system and worked on establishing data-driven continuous learning for behavior planning models at Cruise.

He holds a BEng in Computer Science from The University of Hong Kong with First-Class Honours and the highest GPA in his class.

Education

The University of Hong Kong, Hong Kong

Sep. 2015 - Jun. 2019.
Bachelor of Engineering in Computer Science.
GPA: 3.85/4.30. Standing: 1/111.

University of California, Davis, Davis, CA, United States

Sep. 2017 - Dec. 2017.
Bachelor’s Reciprocity Student in Computer Science.
GPA: 4.00/4.00.

Work Experience

Stealth Startup, San Francisco Bay Area, United States

Present.
Research Scientist, Research
Working on pre-/mid-training of large code models (code LLMs), including scaling law estimation and mixture-of-experts (MoE) support.

Cruise, San Francisco Bay Area, United States

Jan. 2023 - Mar. 2024.
Senior ML/Robotics Engineer, Behaviors Data, AI
Worked on the data-driven machine learning transition of the planning stack.
Established the continuous training mechanism for Cruise’s planning models.
Addressed several behavioral issues of the planning models around emergency vehicles (EMVs, e.g., police vehicles, ambulances, and fire trucks).

Pony.ai, San Francisco Bay Area, United States

Nov. 2021 - Oct. 2023.
Software Engineer, Prediction Department
Led the development of the next-generation, end-to-end, general-purpose trajectory prediction model for self-driving.

Google, Seattle, United States

Oct. 2019 - Aug. 2021.
AI Resident, Google Brain, Google Research
Proposed global self-attention networks (GSA-Nets), an early Transformer architecture for computer vision that predates the vision Transformer (ViT). Details in Research Experience.
Worked on zero-shot detection using image-text pretrained Transformers. Collaboratively proposed the vision Transformer for open-world localization (OWL-ViT). Details in Research Experience.

Tencent, Shenzhen, China

Jul. 2019 - Sep. 2019.
Research Intern, Applied Research Center, Platform and Content Group
Proposed a linear attentive memory mechanism for video understanding, the global context module. Details in Research Experience.

SenseTime, Hong Kong

Jun. 2017 - Jun. 2019.
Research Intern, Intelligent Perception and Services Team, Smart City Group
Proposed one of the first linear-complexity attention mechanisms, efficient attention. Details in Research Experience.

Memberships

Awards

Dean’s Honours List 2018-2019, Faculty of Engineering, The University of Hong Kong
Dean’s Honours List 2017-2018, Faculty of Engineering, The University of Hong Kong
Dean’s Honours List 2016-2017, Faculty of Engineering, The University of Hong Kong
Dean’s Honours List 2015-2016, Faculty of Engineering, The University of Hong Kong
Dean’s Honor List, Fall Quarter 2017, College of Letters and Science, University of California, Davis
YC Cheng Engineering Scholarship, 2017, Faculty of Engineering, The University of Hong Kong

Programming Contests

First Runner-up, ACM-HK Programming Contest 2017
Second Runner-up, ACM-ICPC Hong Kong PolyU International Invitational 2017
Second Runner-up, hackUST 2017 Radica Challenge
First Prize, National Olympiad of Informatics in Provinces (China) 2014

Research Experience

Vision Transformer for Open-World Localization, Google

Dec. 2020 - Aug. 2021.
Supervised by Dr. Mostafa Dehghani, Senior Research Scientist, Google Brain, Google Research, Google.
Worked on vision Transformer for open-world localization (OWL-ViT), a simple zero/few-shot detection framework that transfers from image-text pretraining.
Set a new state-of-the-art for one-shot detection by a wide margin.
Co-authored a paper published at ECCV 2022.

Global Self-Attention Networks, Google

Dec. 2019 - Oct. 2020.
Supervised by Dr. Raviteja Vemulapalli, Senior Research Scientist, and Dr. Jia Xuhui, Senior Software Engineer, Google Research, Google.
Proposed global self-attention networks (GSA-Nets), one of the first architectures to use efficient attention mechanisms to fully replace convolution for computer vision applications.
Demonstrated superior trade-offs for accuracy vs. parameters, computation, and latency over CNNs.
Shared a preprint on arXiv.

Global Context Module, Tencent

Jul. 2019 - Sep. 2019.
Supervised by Dr. Shan Ying, Director of Applied Research Center, Platform and Content Group, Tencent.
Proposed the global context module, which effectively and efficiently propagates information through an arbitrarily long video with constant complexity w.r.t. video length and linear complexity w.r.t. resolution.
Developed the first real-time video object segmenter that has state-of-the-art accuracy.
Presented a first-author paper at ECCV 2020.

Efficient Attention, SenseTime

Sep. 2018 - Jun. 2019.
Supervised by Dr. Yi Shuai, Research Director, SenseTime.
In collaboration with Dr. Li Hongsheng, Assistant Professor, Multimedia Laboratory, Chinese University of Hong Kong.
Proposed efficient attention, which reduces the memory and computational complexities of the attention mechanism from quadratic to linear.
Demonstrated significant improvements in performance-cost trade-offs on a variety of tasks, including object detection, instance segmentation, stereo depth estimation, and temporal action localization.
Presented a first-author paper at WACV 2021.

Visual Embedding of Chinese, Bachelor’s Final-Year Project

Sep. 2018 - Apr. 2019.
Supervised by Dr. Kwan-Yee Kenneth Wong, Associate Professor, Computer Vision Group, The University of Hong Kong.
Designed OceanText, a novel character embedding algorithm for Chinese that extracts a semantic embedding from the image of a Chinese character with a convolutional neural network.
Developed a PyTorch embedding library. Reduced single-GPU training time from 82 days to 28.1 hours compared to existing open-source implementations.
Significantly improved accuracy for word similarity estimation from character embeddings for Chinese.

Teaching Experience

Software Engineering, Teaching Assistant

Jan. 2019 - May 2019.
Assisted George Mitcheson, Guest Lecturer, Department of Computer Science, The University of Hong Kong.
Developed a Django server as the external HR server for student projects and deployed it to Heroku.
Answered questions from and held consultations with students on Git, the Unified Modeling Language, and software design and engineering principles.

Personal Projects

BeautyNet

May 2018 - Oct. 2019.
Personal open-source project.
Developed the second most popular PyTorch template on GitHub, with 190+ stars and an emphasis on code quality.

The Walled Planet

Sep. 2016 - Nov. 2016.
Course Project, Virtual Worlds, Real Bodies, The University of Hong Kong
Built a maze runner game for virtual reality. Used Unity as the game engine and SketchUp for 3D modeling.
The game is set in a dystopian future where the entire globe has become similar to the former Kowloon Walled City. The game aims to raise players’ awareness of the quality of life in a modern metropolis.

Rush to 1202!

Jan. 2016 - Mar. 2016.
Course Project, Introduction to Computer Science, The University of Hong Kong
Developed a Super Mario-like game. Used Scratch as the development platform.
The game is set on the day of the final exam of the course itself, making it highly immersive for classmates. It features extremely counter-intuitive traps, making it very fun to play.

Publications and Preprint

M. Minderer, A. Gritsenko, A. Stone, M. Neumann, D. Weissenborn, A. Dosovitskiy, A. Mahendran, A. Arnab, M. Dehghani, Shen Z., X. Wang, X. Zhai, T. Kipf, N. Houlsby. (2022). Simple Open-Vocabulary Object Detection with Vision Transformers. ECCV 2022.
Shen Z., Zhang M., Zhao H., Yi S., Li H. (2021). Efficient Attention: Attention with Linear Complexities. WACV 2021.
Shen Z., I. Bello, R. Vemulapalli, Jia X., Chen C.-H. (2020). Global Self-Attention Networks for Image Recognition. arXiv: 2010.03019.
Li Y.*, Shen Z.*, Shan Y. (2020). Fast Video Object Segmentation using the Global Context Module. ECCV 2020. *Equal contributions.

Patents

Shen Z., Wu Y. (2022). Processing Method for Vehicle Driving Data, Relevant Devices, Computer Equipment, and Storage Media. CN Patent CN115465290A. Beijing, China: China National Intellectual Property Administration.
Shen Z., Zhang M., Zhao H., Yi S., Yan J. (2021). Method for Obtaining Attention Features for Neural Networks, Relevant Devices, and Storage Media. CN Patent CN109635926B. Beijing, China: China National Intellectual Property Administration.
(Pending) Shen Z., I. Bello, Jia X., Chen C.-H., R. Vemulapalli. (2021). Modeling Dependencies with Global Self-Attention Neural Networks. US Patent WO2020257812A3. Alexandria, VA, United States: United States Patent and Trademark Office.

Non-Professional Experience

Urumqi Middle School Student StarCraft II League (UM3SL)

Jun. 2012 - Oct. 2012.
Organizer, commentator
Organized an online e-sports tournament for StarCraft II. Promoted the tournament on online forums. Set up broadcasting infrastructure on own3D.tv and ZhiboBox. Commentated on and broadcast the games.
Eight players from six schools joined. The tournament finished after a group stage and a top-four playoff.

Skills

Programming: Python, C++, Shell script, Markdown, LaTeX
Technologies: TensorFlow, Keras, PyTorch, NumPy, OpenCV, Horovod, Slurm, Git, Bazel, Django
Hobbies: e-sports, StarCraft II, karaoke
Languages: Mandarin Chinese (native), English (working proficiency, 116 in TOEFL)