Harsh Agrawal

I am a fourth-year Ph.D. student at Georgia Tech advised by Dhruv Batra. I am also a Student Researcher at Google Brain, working with Natasha Jaques, and I collaborate closely with Devi Parikh and Alexander Schwing. My research lies at the intersection of computer vision and natural language processing, with a focus on visio-linguistic understanding for Embodied AI. During my Ph.D., I have been fortunate to intern at Facebook AI Research (2019) with Marcus Rohrbach, at NVIDIA (2020) with Gal Chechik, and at Google (2021) with Peter Anderson.

In my free time, I help maintain and manage EvalAI, an AI challenge hosting platform (part of the CloudCV project) that aims to make AI research more reproducible. EvalAI hosts 150+ challenges and has 300+ contributors, 2M+ annual pageviews, 1400+ forks, 4500+ solved issues and merged pull requests, 3000+ ‘stars’ on GitHub, and financial/equipment support from Google, NVIDIA, and Amazon. Before this, I spent a couple of years as a Research Engineer at Snap Research, where I built large-scale infrastructure for visual recognition and search and developed algorithms for low-shot instance detection.

You can reach me at hagrawal9 at gatech dot edu



Sep 2021
One paper accepted in NeurIPS 2021!
Jul 2021
Two papers accepted in ICCV 2021!
May 2021
One paper accepted in UAI 2021!
Jul 2020
One paper accepted in ECCV 2020!
Jun 2020
We were runner-up in the TextVQA Challenge 2020.
Dec 2019
Gave a lecture "On what's possible today?" in Dr. Parikh's Computer Vision course.
Nov 2019
Gave a lecture on "Meta Learning" in Dr. Batra's Deep Learning course.
Jul 2019
Two papers accepted in ICCV 2019!
Jun 2019
We were runner-up in the TextVQA Challenge 2019.
Apr 2019
Received the College of Computing CS7001 Research Award.
Feb 2019
CloudCV selected as a mentoring organization for Google Summer of Code for the 5th year in a row!
Jan 2019
Received the Snap Fellowship!


Housekeep: Tidying Virtual Households using Commonsense Reasoning
Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot*, and Harsh Agrawal*
Preprint 2022
  • Coming Soon
Simple and Effective Synthesis of Indoor 3D Scenes
Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
Preprint 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra
Neural Information Processing Systems (NeurIPS) 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination
Harsh Agrawal, Eli A. Meirom, Yuval Atzmon, Shie Mannor, Gal Chechik
Uncertainty in Artificial Intelligence (UAI) 2021 (Long Talk)
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation
Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alex Schwing
International Conference on Computer Vision (ICCV) 2021
Contrast and Classify: Alternate Training for Robust VQA
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
International Conference on Computer Vision (ICCV) 2021
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
European Conference on Computer Vision (ECCV) 2020
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
Jyoti Aneja*, Harsh Agrawal*, Dhruv Batra, Alexander Schwing
International Conference on Computer Vision (ICCV) 2019
nocaps: novel object captioning at scale
Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
International Conference on Computer Vision (ICCV) 2019
Sort Story: Sorting Jumbled Images and Captions into Stories
Harsh Agrawal*, Arjun Chandrasekaran*, Dhruv Batra, Devi Parikh, Mohit Bansal
Empirical Methods in Natural Language Processing (EMNLP) 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
Abhishek Das*, Harsh Agrawal*, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra
Computer Vision and Image Understanding (CVIU) 2017
Empirical Methods in Natural Language Processing (EMNLP) 2016
ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)
Object-Proposal Evaluation Protocol is 'Gameable'
Neelima Chavali*, Harsh Agrawal*, Aroma Mahendru*, Dhruv Batra
Conference on Computer Vision and Pattern Recognition (CVPR) 2016 (Spotlight)
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service
Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra
Book Chapter: Mobile Cloud Visual Media Computing, 265-290

EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra
AI Systems Workshop (SOSP 2019)
Fabrik: An Online Collaborative Neural Network Editor
Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra
AI Systems Workshop (SOSP 2019)


Known unknowns: Learning novel concepts using reasoning-by-elimination (UAI Oral Talk)