Harsh Agrawal

I am a Research Scientist in the Apple Machine Learning Research (MLR) group, working with Alex Toshev. I received my PhD from Georgia Tech in 2023, advised by Dhruv Batra. My research lies at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. My goal is to build multi-modal agents with 'open-world' cognition -- agents that can reason about novel scenarios and learn new concepts incrementally in our dynamic physical world by:

  • combining complementary knowledge from multiple modalities (vision, language) to reason about familiar and unfamiliar concepts,
  • using different forms of reasoning (common-sense reasoning, deductive reasoning) to learn progressively more difficult concepts,
  • being robust to unknown environments, noisy actuation and sensors, and unseen instructions.
Some representative projects towards this goal are:

During my PhD, I was fortunate to intern at Google Brain (2022) with Natasha Jaques, at Google (2021) with Peter Anderson, at NVIDIA (2020) with Gal Chechik, and at Facebook AI Research (2019) with Marcus Rohrbach. I also spent two wonderful semesters at UIUC working with Alex Schwing. My PhD was partially supported by the Snap Fellowship 2019. In my free time, I help maintain and manage an AI challenge hosting platform called EvalAI (part of the CloudCV project), which aims to make AI research more reproducible. EvalAI hosts 150+ challenges and has 300+ contributors, 2M+ annual pageviews, 1400+ forks, 4500+ solved issues and merged pull requests, 3000+ ‘stars’ on GitHub, and financial/equipment support from Google, NVIDIA, and Amazon. Before this, I spent a couple of years as a Research Engineer at Snap Research, where I built large-scale infrastructure for visual recognition and search and developed algorithms for low-shot instance detection.
You can contact me at h-dot-agrawal092-at-gmail.com



June 2023
Started as a Research Scientist at Apple!
May 2023
Successfully defended my PhD Dissertation!
Nov 2022
One paper accepted in AAAI 2023!
Jul 2022
One paper accepted in ECCV 2022!
Jun 2022
Interning at Google Brain with Natasha Jaques.
Sep 2021
One paper accepted in NeurIPS 2021!
Jul 2021
Two papers accepted in ICCV 2021!
May 2021
One paper accepted in UAI 2021!
Jul 2020
One paper accepted in ECCV 2020!
Jun 2020
We were runner-up in the TextVQA Challenge 2020.
Dec 2019
Gave a lecture "On what's possible today?" in Dr. Parikh's Computer Vision course.
Nov 2019
Gave a lecture on "Meta Learning" in Dr. Batra's Deep Learning course.
Jul 2019
Two papers accepted in ICCV 2019!
Jun 2019
We were runner-up in the TextVQA Challenge 2019.
Apr 2019
Was awarded the Rising Star Doctoral Student Research Award by Georgia Tech.
Feb 2019
CloudCV selected as a mentoring organization for Google Summer of Code for the 5th year in a row!


Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev
ICLR 2024
Simple and Effective Synthesis of Indoor 3D Scenes
Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
AAAI 2023
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot*, and Harsh Agrawal*
ECCV 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
Neural Information Processing Systems (NeurIPS) 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination
Harsh Agrawal, Eli A. Meirom, Yuval Atzmon, Shie Mannor, Gal Chechik
Uncertainty in Artificial Intelligence (UAI) 2021 (Long Talk)
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alex Schwing
International Conference on Computer Vision (ICCV) 2021
Contrast and Classify: Alternate Training for Robust VQA
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
International Conference on Computer Vision (ICCV) 2021
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
European Conference on Computer Vision (ECCV) 2020
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Jyoti Aneja*, Harsh Agrawal*, Dhruv Batra, Alexander Schwing
International Conference on Computer Vision (ICCV) 2019
nocaps: novel object captioning at scale
Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
International Conference on Computer Vision (ICCV) 2019
Sort Story: Sorting Jumbled Images and Captions into Stories
Harsh Agrawal*, Arjun Chandrasekaran*, Dhruv Batra, Devi Parikh, Mohit Bansal
Empirical Methods in Natural Language Processing (EMNLP) 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Abhishek Das*, Harsh Agrawal*, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
Computer Vision and Image Understanding (CVIU) 2017
Empirical Methods in Natural Language Processing (EMNLP) 2016
ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)
Object-Proposal Evaluation Protocol is 'Gameable'
Neelima Chavali*, Harsh Agrawal*, Aroma Mahendru*, Dhruv Batra
Conference on Computer Vision and Pattern Recognition (CVPR) 2016 (Spotlight)
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service
Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra
Book Chapter: Mobile Cloud Visual Media Computing, 265-290

EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra
AI Systems Workshop (SOSP 2019)
Fabrik: An Online Collaborative Neural Network Editor
Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra
AI Systems Workshop (SOSP 2019)


Known unknowns: Learning novel concepts using reasoning-by-elimination (UAI Oral Talk)