Shen Zhuoran (Zhuoran is the first name) is a Research Scientist at Augment, working on large language models for code (code LLMs) to push the frontier of artificial intelligence (AI) reasoning and general intelligence.

He has worked on the non-local attention mechanism and Transformers for computer vision since 2018, arguably making him one of the first researchers to investigate this topic. During his time at SenseTime, Tencent, and the Google AI Residency, he developed several novel efficient formulations of attention and Transformers and worked on building a visual foundation model. His papers on efficient attention have accumulated more than 700 citations and inspired direct follow-up works in multiple fields, especially medical imaging and remote sensing. After Google, he spent two years in self-driving: he led the end-to-end deep learning transition for Pony.ai’s motion prediction system and worked on establishing data-driven continuous learning for behavior planning models at Cruise.

He holds a BEng in Computer Science from The University of Hong Kong, with First-Class Honours and the highest GPA in his class.

Education

The University of Hong Kong, Hong Kong

  • Sep. 2015 - Jun. 2019.
  • Bachelor of Engineering in Computer Science.
  • GPA: 3.85/4.30. Standing: 1/111.

University of California, Davis, Davis, CA, United States

  • Sep. 2017 - Dec. 2017.
  • Bachelor’s Reciprocity Student in Computer Science.
  • GPA: 4.00/4.00.

Work Experience

Augment, San Francisco Bay Area, United States

  • Present.
  • Research Scientist, Research
  • Working on training large language models for code (code LLMs).

Cruise, San Francisco Bay Area, United States

  • Jan. 2023 - Mar. 2024.
  • Senior ML/Robotics Engineer, Behaviors Data, AI
  • Worked on transitioning the planning stack to data-driven machine learning.
  • Established the continuous training mechanism for Cruise’s planning models.
  • Addressed several behavioral issues of the planning models around emergency vehicles (EMVs, e.g., police vehicles, ambulances, and fire trucks).

Pony.ai, San Francisco Bay Area, United States

  • Nov. 2021 - Oct. 2023.
  • Software Engineer, Prediction Department
  • Led the development of the next-generation, end-to-end, general-purpose trajectory prediction model for self-driving.

Google, Seattle, United States

  • Oct. 2019 - Aug. 2021.
  • AI Resident, Google Brain, Google Research
  • Proposed the global self-attention networks (GSA-Nets), an early Transformer architecture for computer vision that predates the vision Transformer (ViT). Details in Research Experience.
  • Worked on zero-shot detection using image-text pretrained Transformers. Co-proposed the vision Transformer for open-world localization (OWL-ViT). Details in Research Experience.

Tencent, Shenzhen, China

  • Jul. 2019 - Sep. 2019.
  • Research Intern, Applied Research Center, Platform and Content Group
  • Proposed a linear attentive memory mechanism for video understanding, the global context module. Details in Research Experience.

SenseTime, Hong Kong

  • Jun. 2017 - Jun. 2019.
  • Research Intern, Intelligent Perception and Services Team, Smart City Group
  • Proposed one of the first linear-complexity attention mechanisms, efficient attention. Details in Research Experience.

Awards

  • Dean’s Honours List 2018-2019, Faculty of Engineering, The University of Hong Kong
  • Dean’s Honours List 2017-2018, Faculty of Engineering, The University of Hong Kong
  • Dean’s Honours List 2016-2017, Faculty of Engineering, The University of Hong Kong
  • Dean’s Honours List 2015-2016, Faculty of Engineering, The University of Hong Kong
  • Dean’s Honor List, Fall Quarter 2017, College of Letters and Science, University of California, Davis
  • YC Cheng Engineering Scholarship, 2017, Faculty of Engineering, The University of Hong Kong

Programming Contests

  • First Runner-up, ACM-HK Programming Contest 2017
  • Second Runner-up, ACM-ICPC Hong Kong PolyU International Invitational 2017
  • Second Runner-up, hackUST 2017 Radica Challenge
  • First Prize, National Olympiad of Informatics in Provinces (China) 2014

Research Experience

Vision Transformer for Open-World Localization, Google

  • Dec. 2020 - Aug. 2021.
  • Supervised by Dr. Mostafa Dehghani, Senior Research Scientist, Google Brain, Google Research, Google.
  • Worked on the vision Transformer for open-world localization (OWL-ViT), a simple zero-shot and few-shot detection framework that transfers from image-text pretraining.
  • Set a new state of the art for one-shot detection by a wide margin.
  • Published a paper at ECCV 2022.

Global Self-Attention Networks, Google

  • Dec. 2019 - Oct. 2020.
  • Supervised by Dr. Raviteja Vemulapalli, Senior Research Scientist, and Dr. Jia Xuhui, Senior Software Engineer, Google Research, Google.
  • Proposed global self-attention networks (GSA-Nets), one of the first architectures to use efficient attention mechanisms to fully replace convolution in computer vision applications.
  • Demonstrated better trade-offs than CNNs between accuracy and parameter count, computation, and latency.
  • Shared a preprint on arXiv.

Global Context Module, Tencent

  • Jul. 2019 - Sep. 2019.
  • Supervised by Dr. Shan Ying, Director of Applied Research Center, Platform and Content Group, Tencent.
  • Proposed the global context module, which effectively and efficiently propagates information through an arbitrarily long video with constant complexity w.r.t. video length and linear complexity w.r.t. resolution.
  • Developed the first real-time video object segmenter that has state-of-the-art accuracy.
  • Presented a first-author paper at ECCV 2020.

Efficient Attention, SenseTime

  • Sep. 2018 - Jun. 2019.
  • Supervised by Dr. Yi Shuai, Research Director, SenseTime.
  • In collaboration with Dr. Li Hongsheng, Assistant Professor, Multimedia Laboratory, The Chinese University of Hong Kong.
  • Proposed efficient attention, which reduces the memory and computational complexities of the attention mechanism from quadratic to linear.
  • Demonstrated significant improvements in performance-cost trade-offs on a variety of tasks, including object detection, instance segmentation, stereo depth estimation, and temporal action localization.
  • Presented a first-author paper at WACV 2021.

Visual Embedding of Chinese, Bachelor’s Final-Year Project

  • Sep. 2018 - Apr. 2019.
  • Supervised by Dr. Kwan-Yee Kenneth Wong, Associate Professor, Computer Vision Group, The University of Hong Kong.
  • Designed OceanText, a novel character embedding algorithm for Chinese that extracts a semantic embedding from the image of a Chinese character with a convolutional neural network.
  • Developed a PyTorch embedding library that reduced single-GPU training time from 82 days to 28.1 hours compared with existing open-source implementations.
  • Significantly improved accuracy for word similarity estimation from character embeddings for Chinese.

Teaching Experience

Software Engineering, Teaching Assistant

  • Jan. 2019 - May 2019.
  • Assisted George Mitcheson, Guest Lecturer, Department of Computer Science, The University of Hong Kong.
  • Developed a Django server as the external HR server for student projects and deployed it to Heroku.
  • Answered questions from and held consultations with students on Git, the Unified Modeling Language, and software design and engineering principles.

Personal Projects

BeautyNet

  • May 2018 - Oct. 2019.
  • Personal open-source project.
  • Developed the second most popular PyTorch template on GitHub, with 190+ stars and high code quality.

The Walled Planet

  • Sep. 2016 - Nov. 2016.
  • Course Project, Virtual Worlds, Real Bodies, The University of Hong Kong
  • Built a maze runner game for virtual reality. Used Unity as the game engine and SketchUp for 3D modeling.
  • The game is set in a dystopian future where the entire globe has become similar to the former Kowloon Walled City. The game aims to raise players’ awareness of the quality of life in a modern metropolis.

Rush to 1202!

  • Jan. 2016 - Mar. 2016.
  • Course Project, Introduction to Computer Science, The University of Hong Kong
  • Developed a Super Mario-like game. Used Scratch as the development platform.
  • The game is set on the day of the final exam of the course itself, making it highly immersive for classmates. It features extremely counter-intuitive traps, which make it very fun to play.

Publications and Preprints

Patents

  • Shen Z., Wu Y. (2022). Processing Method for Vehicle Driving Data, Relevant Devices, Computer Equipment, and Storage Media. CN Patent CN115465290A. Beijing, China: China National Intellectual Property Administration.
  • Shen Z., Zhang M., Zhao H., Yi S., Yan J. (2021). Method for Obtaining Attention Features for Neural Networks, Relevant Devices, and Storage Media. CN Patent CN109635926B. Beijing, China: China National Intellectual Property Administration.
  • (Pending) Shen Z., Bello I., Jia X., Chen C.-H., Vemulapalli R. (2021). Modeling Dependencies with Global Self-Attention Neural Networks. US Patent WO2020257812A3. Alexandria, VA, United States: United States Patent and Trademark Office.

Non-Professional Experience

Urumqi Middle School Student StarCraft II League (UM3SL)

  • Jun. 2012 - Oct. 2012.
  • Organizer, commentator
  • Organized an online e-sports tournament for StarCraft II. Promoted the tournament on online forums. Set up broadcasting infrastructure on own3D.tv and ZhiboBox. Commentated and broadcast the games.
  • Eight players from six schools participated. The tournament concluded after a group stage and a top-four playoff.

Skills

  • Programming: Python, C++, Shell script, Markdown, LaTeX
  • Technologies: TensorFlow, Keras, PyTorch, NumPy, OpenCV, Horovod, Slurm, Git, Bazel, Django
  • Hobbies: e-sports, StarCraft II, Karaoke
  • Languages: Mandarin Chinese (native), English (working proficiency; TOEFL score: 116)