Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark

Juncheng Li1, David J. Cappelleri1,2

1Multi-Scale Robotics & Automation Lab, School of Mechanical Engineering, Purdue University, West Lafayette, IN USA

2The Weldon School of Biomedical Engineering (By Courtesy), Purdue University, West Lafayette, IN USA


Abstract

In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments. We introduce the Sim-Grasp-Dataset, which includes 1,550 objects across 500 scenarios with 7.9 million annotated labels, and develop Sim-GraspNet to generate grasp poses from point clouds. The Sim-Grasp-Policies achieve grasping success rates of 97.14% for single objects and 87.43% and 83.33% for mixed clutter scenarios of Levels 1-2 and Levels 3-4 objects, respectively. By incorporating language models for target identification through text and box prompts, Sim-Grasp enables both object-agnostic and targeted picking, pushing the boundaries of intelligent robotic systems.
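To make the pipeline concrete, the sketch below illustrates the grasp-selection step implied by the abstract: Sim-GraspNet scores 6-DOF grasp candidates generated from the scene point cloud, and the language-model prompts (text or box) restrict which candidates belong to the requested target. This is only an illustrative sketch; the function and variable names (select_grasp, target_mask, etc.) are hypothetical placeholders, not the released Sim-Grasp API.

        import numpy as np

        def select_grasp(grasp_poses, grasp_scores, target_mask=None):
            """Pick the highest-confidence grasp, optionally restricted to a prompted target.

            grasp_poses:  (N, 4, 4) array of candidate 6-DOF grasp poses in the camera frame.
            grasp_scores: (N,) array of per-grasp confidence scores from the grasp network.
            target_mask:  optional (N,) boolean array marking grasps that fall on the object
                          selected by a text or box prompt; None means object-agnostic picking.
            """
            scores = np.asarray(grasp_scores, dtype=float)
            if target_mask is not None:
                # Suppress grasps that lie outside the prompted target region.
                scores = np.where(np.asarray(target_mask, dtype=bool), scores, -np.inf)
            best = int(np.argmax(scores))
            if not np.isfinite(scores[best]):
                raise ValueError("No grasp candidate lies on the requested target.")
            return grasp_poses[best], float(scores[best])

        # Example with dummy candidates (identity poses): object-agnostic vs. targeted picking.
        poses = np.stack([np.eye(4)] * 3)
        pose, score = select_grasp(poses, [0.42, 0.91, 0.67])                       # picks index 1
        pose, score = select_grasp(poses, [0.42, 0.91, 0.67], [True, False, True])  # picks index 2

In the full system, the confidence scores come from Sim-GraspNet and the target mask from the language-grounded prompt; here both are supplied as plain arrays to keep the example self-contained.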

Figures and Tables

Figure 1
Overview of the Sim-Grasp system. Sim-Grasp is a deep-learning-based system that determines robust 6-DOF two-finger grasp poses in cluttered environments.
Figure 2
Example of a 6D Grasping Label Dataset. To better visualize the dataset, only a subset of the candidate grasps is displayed after passing collision checks. Green markers indicate successful grasps with a grasp score of 1, while red markers represent unsuccessful grasps with a grasp score of 0.
Figure 3
The Sim-Grasp 6D grasp pose policy. The green marker represents the 6D grasp pose for the object instance with the highest confidence score. The transparency of the blue markers indicates the confidence score, with higher transparency implying lower confidence and vice versa.
Figure 4
Sim-Grasp Architecture. The Sim-GraspNet network provides the backbone for the Sim-Grasp multi-modal grasping policies. The green marker represents the 6D grasp pose for the object instance with the highest confidence score. The transparency of the blue markers indicates the confidence score, with higher transparency implying lower confidence and vice versa.
Figure 5
The experimental setup with a Fetch robot equipped with an RGB-D camera. The robot picks up objects from the workspace and drops them into the collection bin. We select 64 household items: 13 objects in Level 1, 19 in Level 2, 21 in Level 3, and 11 in Level 4.
Figure 6
Example results of the Sim-Grasp-Policies in various scenarios: (a) Grasping a complex 3D-printed part. (b) Grasping an object from a partially occluded point cloud. (c) Grasping in a cluttered environment. (d) Targeted picking using a text prompt to pick up a green dinosaur. (e) Targeted picking using a box prompt to pick up the item within the selected region.
Figure 7
Top Row: Performance comparison of the three policies on a red bowl, highlighting challenges in grasping objects with curved surfaces. Middle Row: Performance comparison of the three policies on a joystick, illustrating difficulties in handling objects with irregular shapes. Bottom Row: Performance comparison of the three policies in a cluttered scenario.
Figure 8
Sim-Grasp-Policies failure cases. (a) Multi-grasp and collision scenario leading to an unsecured grasp. (b) Difficulty in grasping ball-shaped objects resulting in empty grasps. (c) Unstable grasp pose on an object with complex geometry causing the object to flip. These cases illustrate the challenges faced in cluttered environments and with objects of varied geometries.

Experimental Videos

Citation and arXiv Link

arXiv: 2405.00841

BibTeX:
        @misc{li2024simgrasp,
              title={Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark}, 
              author={Juncheng Li and David J. Cappelleri},
              year={2024},
              eprint={2405.00841},
              archivePrefix={arXiv},
              primaryClass={cs.RO}
        }
        

Acknowledgements

The authors would like to acknowledge the use of the facilities at the Indiana Next Generation Manufacturing Competitiveness Center (IN-MaC) for this paper. A portion of this work was supported by a Space Technology Research Institutes grant (# 80NSSC19K1076) from NASA’s Space Technology Research Grants Program.
