Harsh Agrawal

I am a Research Scientist in the Apple Machine Learning Research (MLR) group, working with Alex Toshev. I received my PhD from Georgia Tech in 2023, advised by Dhruv Batra. My research lies at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. My goal is to build multi-modal agents with 'open-world' cognition -- agents that can reason about novel scenarios and learn new concepts incrementally in our dynamic physical world by:

combining complementary knowledge from multiple modalities (vision, language) to reason about familiar and unfamiliar concepts,
using different forms of reasoning (common-sense reasoning, deductive reasoning) to learn progressively more difficult concepts,
being robust to unknown environments, noisy actuation and sensors, and unseen instructions.

Some representative projects towards this goal are:

Vision, Language, and Action:
Novel Object Captioning, Visual Question Answering, Language-guided Navigation Agents,
Different Forms of Reasoning:
Common-sense reasoning using Large Language Models, Reasoning via Elimination,
Interpretability and Robustness:
Human vs Machine Attention, Robust VQA Models, Robust Indoor Navigation Models

During my PhD, I was fortunate to intern at Google Brain (2022) with Natasha Jaques, at Google (2021) with Peter Anderson, at NVIDIA (2020) with Gal Chechik, and at Facebook AI Research (2019) with Marcus Rohrbach. I also spent two wonderful semesters at UIUC working with Alex Schwing. My PhD was partially supported by the Snap Fellowship 2019. In my free time, I help maintain and manage EvalAI, an AI challenge hosting platform (part of the CloudCV project) that aims to make AI research more reproducible. EvalAI hosts 150+ challenges and has 300+ contributors, 2M+ annual pageviews, 1400+ forks, 4500+ solved issues and merged pull requests, 3000+ stars on GitHub, and financial and equipment support from Google, NVIDIA, and Amazon. Before this, I spent a couple of years as a Research Engineer at Snap Research, where I built large-scale infrastructure for visual recognition and search and developed algorithms for low-shot instance detection.
You can contact me at h-dot-agrawal092-at-gmail.com

Affiliations

Apple
(Summer 2023 - Present)
Google
(Summer 2021 - Dec 2022)
NVIDIA
(Summer 2020)
Facebook AI Research
(Summer 2019)
Georgia Tech
(2018 - Present)
Snap Research
(2016 - 2018)
Virginia Tech
(2014 - 2016)
Delhi Technological University
(2010 - 2014)

News

June 2023

Started as a Research Scientist at Apple!

May 2023

Successfully defended my PhD Dissertation!

Nov 2022

One paper accepted in AAAI 2023!

Jul 2022

One paper accepted in ECCV 2022!

Jun 2022

Interning at Google Brain with Natasha Jaques.

Sep 2021

One paper accepted in NeurIPS 2021!

Jul 2021

Two papers accepted in ICCV 2021!

May 2021

One paper accepted in UAI 2021!

Jul 2020

One paper accepted in ECCV 2020!

Jun 2020

We were runner-up in the TextVQA Challenge 2020.

Dec 2019

Gave a lecture "On what's possible today?" in Dr. Parikh's Computer Vision course.

Nov 2019

Gave a lecture on "Meta Learning" in Dr. Batra's Deep Learning course.

Jul 2019

Two papers accepted in ICCV 2019!

Jun 2019

We were runner-up in the TextVQA Challenge 2019.

Apr 2019

Was awarded the Rising Star Doctoral Student Research Award by Georgia Tech.

Feb 2019

CloudCV was selected as a mentoring organization for Google Summer of Code for the 5th year in a row!

Publications

Large Language Models as Generalizable Policies for Embodied Tasks

Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

ICLR 2024

Simple and Effective Synthesis of Indoor 3D Scenes

Jing Yu Koh*, Harsh Agrawal*, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson

AAAI 2023

Housekeep: Tidying Virtual Households using Commonsense Reasoning

Yash Kant, Arun Ramachandran, Sriram Yenamandra, Igor Gilitschenski, Dhruv Batra, Andrew Szot*, and Harsh Agrawal*

ECCV 2022

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Neural Information Processing Systems (NeurIPS) 2021

Known unknowns: Learning novel concepts using reasoning-by-elimination

Harsh Agrawal, Eli A. Meirom, Yuval Atzmon, Shie Mannor, Gal Chechik

Uncertainty in Artificial Intelligence (UAI) 2021 (Long Talk)

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Alex Schwing

International Conference on Computer Vision (ICCV) 2021

Contrast and Classify: Alternate Training for Robust VQA

Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal

International Conference on Computer Vision (ICCV) 2021

Spatially Aware Multimodal Transformers for TextVQA

Yash Kant, Dhruv Batra, Peter Anderson, Alexander Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal

European Conference on Computer Vision (ECCV) 2020

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

Jyoti Aneja*, Harsh Agrawal*, Dhruv Batra, Alexander Schwing

International Conference on Computer Vision (ICCV) 2019

nocaps: novel object captioning at scale

Harsh Agrawal*, Karan Desai*, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

International Conference on Computer Vision (ICCV) 2019

Sort Story: Sorting Jumbled Images and Captions into Stories

Harsh Agrawal*, Arjun Chandrasekaran*, Dhruv Batra, Devi Parikh, Mohit Bansal

Empirical Methods in Natural Language Processing (EMNLP) 2016

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Abhishek Das*, Harsh Agrawal*, C. Lawrence Zitnick, Devi Parikh, Dhruv Batra

Computer Vision and Image Understanding (CVIU) 2017
Empirical Methods in Natural Language Processing (EMNLP) 2016
ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)

Object-Proposal Evaluation Protocol is 'Gameable'

Neelima Chavali*, Harsh Agrawal*, Aroma Mahendru*, Dhruv Batra

Conference on Computer Vision and Pattern Recognition (CVPR) 2016 (Spotlight)

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal, Neelima Chavali, Prakriti Banik, Akrit Mohapatra, Ahmed Osman, Dhruv Batra

Book Chapter: Mobile Cloud Visual Media Computing, 265-290

EvalAI: Towards Better Evaluation Systems for AI Agents

Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

AI Systems Workshop (SOSP 2019)

Fabrik: An Online Collaborative Neural Network Editor

Utsav Garg, Viraj Prabhu, Deshraj Yadav, Ram Ramrakhya, Harsh Agrawal, Dhruv Batra

AI Systems Workshop (SOSP 2019)

Talks

Known unknowns: Learning novel concepts using reasoning-by-elimination (UAI Oral Talk)

Projects

EvalAI

An open-source platform to create, collaborate on, and participate in AI challenges. By simplifying and standardizing the process of benchmarking AI, we want to circumvent many of the factors impeding the rate of progress in AI.

Georgia Tech

Fabrik

An online collaborative platform to build, visualize and train deep learning models via a simple drag-and-drop interface.

Georgia Tech

Origami

A tool to allow researchers to automagically convert their deep learning models into an online service in a few simple steps.

Georgia Tech

CloudCV

A collection of open-source platforms that aims to make research more reproducible.

Georgia Tech

Garuda

An Unmanned Aerial Vehicle capable of performing autonomous flight & surveillance.

UAS-DTU

Aarush X-1

An indigenously built UAV developed under the mentorship of Lockheed Martin.

UAS-DTU

AMT Chat Interface

Source for the two-person chat interface used to collect the VisDial dataset on Amazon Mechanical Turk.

Virginia Tech

Noteworthy

An AI-powered note-taking app developed while participating in MHacks 2014.

MHacks

Mobishare

A system designed to support opportunistic content search and sharing over a limited 2G connection.

IIIT-D

Trippr

A collaborative trip-planning app powered by an IBM Watson-based personal assistant.

AngelHacks 2015