Task Description

Figure 1: An example of the agent's egocentric view during an episode. Each episode contains 2-5 YCB objects that need to be rearranged in an indoor environment. The blue circle indicates the crosshair for the magic pointer grab action.

In the rearrangement task, the agent is spawned randomly in a house and is asked to find a small set of objects scattered around the house and place them in their desired final position as efficiently as possible.

Data

Scenes: We use a manually-selected subset of 55 photorealistic scans of indoor environments from the Gibson dataset. These 55 scenes are uncluttered 'empty' apartments/houses, i.e. they contain little or no furniture as part of the scanned mesh. Scanned object meshes are programmatically inserted into these scenes to create episodes. This combination of empty houses and inserted objects allows for controlled generation of training and testing episodes.


Objects: We use object scans from the YCB Dataset. These objects are small enough that they can pass through doors and hallways within the house.


Episodes: As illustrated in Figure 1, each episode requires the agent to rearrange 2-5 objects. Each episode specifies the agent's spawn location and rotation, and, for each object, its type, its initial location and rotation in the environment, and the goal location and rotation of its centre of mass (COM). At the beginning of the episode, the agent is provided with each object's goal pose as point-coordinates (the 3D coordinate of the object's COM).


            
              {
                'episode_id': 0,
                'scene_id': 'data/scene_datasets/coda/coda.glb',
                'start_position': [-0.15, 0.18, 0.29],
                'start_rotation': [-0.0, -0.34, -0.0, 0.93],
                'objects': [
                  {
                    'object_id': 0,
                    'object_template': 'data/test_assets/objects/chair',
                    'position': [1.77, 0.67, -1.99],
                    'rotation': [0.0, 0.0, 0.0, 1.0]
                  }
                ],
                'goals': [
                  {
                    'position': [4.34, 0.67, -5.06],
                    'rotation': [0.0, 0.0, 0.0, 1.0]
                  }
                ]
              }
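As a sketch, an episode record like the one above can be parsed to compute how far each object must be moved, assuming the straight-line Euclidean distance between the start and goal centres of mass (the record below is a trimmed copy of the example, not the full schema):

```python
import math

# Trimmed episode record mirroring the structure shown above.
episode = {
    "episode_id": 0,
    "start_position": [-0.15, 0.18, 0.29],
    "objects": [
        {"object_id": 0, "position": [1.77, 0.67, -1.99]},
    ],
    "goals": [
        {"position": [4.34, 0.67, -5.06]},
    ],
}

def displacement(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Distance each object must be moved from its start to its goal pose
# (the i-th entry of 'goals' corresponds to the i-th entry of 'objects').
for obj, goal in zip(episode["objects"], episode["goals"]):
    d = displacement(obj["position"], goal["position"])
    print(f"object {obj['object_id']}: {d:.2f} m")  # → object 0: 4.00 m
```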
            
          

Agent Specification

Agent: The agent is a virtual LoCoBot. The simulated agent's base-radius is 0.175m and its height is 0.61m, matching the LoCoBot dimensions.


Sensors: The agent is equipped with an RGB-D camera placed at a height of 1.5m from the center of the agent's base, looking down 30 degrees. The sensor has a resolution of 256x256 pixels and a 90 degree field of view. To mimic the depth camera's limitations, we clip simulated depth sensing to 10m. The agent is also equipped with a GPS+Compass sensor, providing the agent's location (x, y, z) and heading (azimuth angle) in an episodic coordinate system defined by the agent's spawn location (origin) and heading (0).
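The episodic coordinate frame described above can be sketched as follows, assuming a y-up world with heading measured as a rotation about the vertical axis (the function name is illustrative, not part of the Habitat API):

```python
import math

def world_to_episodic(world_xyz, spawn_xyz, spawn_heading):
    """Express a world-frame position in the episodic frame whose origin
    is the agent's spawn location and whose zero heading is the agent's
    spawn heading (a rotation about the vertical y-axis)."""
    dx = world_xyz[0] - spawn_xyz[0]
    dy = world_xyz[1] - spawn_xyz[1]
    dz = world_xyz[2] - spawn_xyz[2]
    # Rotate the horizontal offset by -spawn_heading to undo the spawn rotation.
    c, s = math.cos(-spawn_heading), math.sin(-spawn_heading)
    return (c * dx + s * dz, dy, -s * dx + c * dz)

# At the spawn pose itself, the episodic GPS reading is the origin.
print(world_to_episodic((2.0, 0.5, 3.0), (2.0, 0.5, 3.0), 0.7))  # → (0.0, 0.0, 0.0)
```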


Figure 2: Grab Release Action uses magic pointer abstraction to pick nearby objects. Any object under a fixed crosshair in the agent’s viewport can be picked by the agent if it is within a certain distance threshold.

Actions: The action space for the rearrangement task consists of navigation and interactive actions.

  1. Move Forward: The agent moves forward by 0.25m
  2. Turn Right: The agent turns right by 10 degrees
  3. Turn Left: The agent turns left by 10 degrees
  4. Grab: Grabs an object using the magic pointer abstraction discussed earlier to pick nearby objects that are visible in the agent’s field of view.
  5. Release: Releases an object that the agent is holding.
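The magic-pointer grab can be sketched as a simple eligibility check on the object under the crosshair. The 1.5m threshold and the helper name below are illustrative assumptions, not the task's actual parameters:

```python
import math

GRAB_DISTANCE_THRESHOLD = 1.5  # assumed value; the task defines its own threshold

def can_grab(agent_pos, object_under_crosshair_pos):
    """An object under the fixed crosshair is grabbable only if it lies
    within the distance threshold of the agent."""
    if object_under_crosshair_pos is None:  # crosshair is not on any object
        return False
    return math.dist(agent_pos, object_under_crosshair_pos) <= GRAB_DISTANCE_THRESHOLD

print(can_grab((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # True: within reach
print(can_grab((0.0, 0.0, 0.0), (0.0, 0.0, 4.0)))  # False: too far
```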

Evaluation

In the object rearrangement scenario, task progress is measured by how close the object is placed with respect to the goal pose. We measure the following metrics:

  1. Average distance to target (\(d_{T}\)) measures the distance between the object's current location and the desired goal location.
  2. Episode Success (\(S\)) measures whether the episode is successful or not. An episode is considered successful (\(S=1\)) if all objects stop within 0.50 meters of their respective goal locations; otherwise the episode is marked as failed (\(S=0\)).
  3. Object Placement Success (\(PS\)) measures the percentage of objects placed successfully according to the episode's success criterion.
  4. Success weighted by Path Length (\(SPL\)) captures how closely the agent followed the optimal path while successfully completing the episode. Given the length of the optimal-path trajectory \(l\) and the length of the agent's path \(l_a\), SPL is defined as \(S \cdot \frac{l}{\max(l_a, l)}\), so it is non-zero only for successful episodes and approaches 1 as the agent's path approaches the optimal one.
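The four metrics above can be computed for a single episode as in the following sketch, where object positions and path lengths would come from the simulator (the function name and example values are illustrative):

```python
import math

SUCCESS_RADIUS = 0.50  # meters, per the episode success criterion

def rearrangement_metrics(object_positions, goal_positions, optimal_len, agent_len):
    """Compute d_T, S, PS, and SPL for one episode."""
    dists = [math.dist(p, g) for p, g in zip(object_positions, goal_positions)]
    d_T = sum(dists) / len(dists)           # average distance to target
    placed = [d <= SUCCESS_RADIUS for d in dists]
    S = 1.0 if all(placed) else 0.0         # episode success: all objects placed
    PS = 100.0 * sum(placed) / len(placed)  # percentage of objects placed
    SPL = S * optimal_len / max(agent_len, optimal_len)
    return {"d_T": d_T, "S": S, "PS": PS, "SPL": SPL}

# One of two objects placed within 0.50 m: the episode fails (S=0, SPL=0)
# even though half the objects are in place (PS=50).
m = rearrangement_metrics(
    object_positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    goal_positions=[(0.2, 0.0, 0.0), (1.0, 0.0, 2.0)],
    optimal_len=10.0,
    agent_len=12.5,
)
print(m)
```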

Code

Code in Habitat Lab.

Tutorial: You can watch how to build an interactive task in Habitat Lab by following the tutorial here:

YouTube Video, Colab Notebook