Inventing the next generation of creative tools to “empower people to express themselves, live in the moment, learn about the world, and have fun together.”
Machine Learning and Perception Lab, Virginia Tech.
Advisor: Dr. Dhruv Batra
Worked on problems at the intersection of computer vision, natural language processing, and machine learning.
Google Summer of Code (GSOC), '15, '16
Mentored three Google Summer of Code students who contributed to CloudCV over the summer of 2015.
Worked on applying machine learning techniques to automate sales-lifecycle workflows in Microsoft Dynamics CRM.
Graduated from Delhi College of Engineering
Worked on applications of computer vision for an Unmanned Aerial Vehicle.
Machine Learning and Perception Lab, Virginia Tech
Advisor: Dr. Dhruv Batra
Worked on building the first prototype of CloudCV.
Mobile and Ubiquitous Computing, IIIT-Delhi
Developed cloud-enabled localization algorithms for Android smartphones based on the cell broadcast service. Also designed and implemented an algorithm that builds mobility profiles to predict encounters between mobile-phone users, allowing them to share content locally over Bluetooth.
Unmanned Aerial Systems - Delhi Technological University
Performed autonomous extraction and segmentation of objects from aerial imagery of natural scenes for the Student UAS Competition. Developed a robust, user-friendly Qt GUI to control camera properties and process the imagery feed acquired wirelessly.
Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication. We propose the task of sequencing -- given a jumbled set of aligned image-caption pairs that belong to a story, the task is to sort them such that the output sequence forms a coherent story. We present multiple approaches, via unary (position) and pairwise (order) predictions, and their ensemble-based combinations, achieving strong results on this task. As features, we use both text-based and image-based features, which provide complementary improvements. Using qualitative examples, we demonstrate that our models have learnt interesting aspects of temporal common sense.
We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple novel game-inspired attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question. Thus, we introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Overall, our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans.
Object proposals have quickly become the de facto pre-processing step in a number of vision pipelines (for object detection, object discovery, and other tasks). Their performance is usually evaluated on partially annotated datasets. In this paper, we argue that the choice of using a partially annotated dataset for evaluation of object proposals is problematic -- as we demonstrate via a thought experiment, the evaluation protocol is 'gameable', in the sense that progress under this protocol does not necessarily correspond to a 'better' category-independent object proposal algorithm.
We are witnessing a proliferation of massive visual data. Unfortunately, scaling existing computer vision algorithms to large datasets leaves researchers repeatedly solving the same algorithmic, logistical, and infrastructural problems. Our goal is to democratize computer vision; one should not have to be an expert in computer vision, big data, and distributed computing to have access to state-of-the-art distributed computer vision algorithms. We present CloudCV, a comprehensive system that provides access to state-of-the-art distributed computer vision algorithms as a cloud service through a web interface and APIs.