July 31, 2023

Combining Segment Anything and Human Expertise

Meta AI's Segment Anything Model (SAM) generates segmentation masks with impressive accuracy. Check out our breakdown of SAM.

In this article, we show you how to combine SAM and human validation to label home interior images.

Use Case: Home Interior Image Semantic Segmentation

Imagine you’re building an augmented reality filter for your living room that lets you visualize how a new white couch would fit into the space. To achieve this, you need a machine learning model that can accurately isolate and segment each object and surface in the room. The key to building such a model is semantic segmentation: the image is divided into segments, and each segment is assigned a class such as table or chair, so that every pixel ends up with a label.

However, labeling every pixel by hand is tedious. To speed it up, we turn to SAM for a semi-automated solution.

Let’s explore how we can apply this approach to segment the image below.

1. Segmentation Using SAM

We first use SAM to automatically segment the image. It generates the output below:

SAM generates the segmentation masks in under 3 minutes. Segmenting the same image manually can take up to 2 hours (120 minutes), roughly 40 times longer.

SAM segmented furniture such as sofas, tables, floor mats, and wall features well. However, it falls short in some cases, most notably through over-segmentation: for example, it splits the wooden floor and the wall art into multiple disjointed parts.

Over-segmentation stems from how SAM segments an image automatically: it places a 32 x 32 grid of point prompts over the image, and for each point it predicts a set of valid object masks. Because a single point can belong to an object, a part of that object, or a sub-part, these predictions can divide an object unnecessarily. For instance, if a point lies on the arm of a chair, SAM may treat the arm as an object separate from the chair itself.
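To make the grid concrete, here is a minimal sketch of how a 32 x 32 grid of point prompts can be laid out and how many of those prompts land on a small part such as a chair arm. It is our own pure-Python helper, not code from the `segment_anything` package, and it assumes points sit at the centres of grid cells in normalized coordinates, which matches our understanding of the package's default `points_per_side=32` setting for `SamAutomaticMaskGenerator`. The image size and chair-arm box are hypothetical, and the model itself is not run.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_grid(n_per_side: int = 32) -> List[Point]:
    """Return n_per_side**2 (x, y) points in normalized [0, 1] coordinates.

    Each point sits at the centre of a cell in an n x n grid, so the first
    coordinate is 0.5 / n and the last is 1 - 0.5 / n.
    """
    coords = [(i + 0.5) / n_per_side for i in range(n_per_side)]
    return [(x, y) for y in coords for x in coords]


def to_pixels(points: List[Point], width: int, height: int) -> List[Point]:
    """Scale normalized points to pixel coordinates for a width x height image."""
    return [(x * width, y * height) for x, y in points]


def points_in_box(points: List[Point], box: Tuple[float, float, float, float]) -> List[Point]:
    """Return the points inside the half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]


# Hypothetical 1024 x 768 living-room photo and a hypothetical chair-arm box.
grid = point_grid(32)
grid_px = to_pixels(grid, width=1024, height=768)
chair_arm = (100, 400, 180, 460)
prompts_on_arm = points_in_box(grid_px, chair_arm)

print(len(grid))            # 1024
print(len(prompts_on_arm))  # 6
```

Six separate prompts fall on the arm alone. Each prompt yields its own candidate masks, and any of them may cover just the arm rather than the whole chair, which is one way fragments like the ones described above can appear.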

2. Human Validation to Enhance Segmentation

To address these shortcomings, we add a layer of human validation to refine SAM's output. Human annotators play a crucial role in correcting over-segmented masks, such as the stray floor-tile segments in the image shown earlier.

Human validation also lets annotators refine the traced object boundaries, which makes the segmentation more precise.
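One simple way to represent an annotator's fix for over-segmentation is to merge the fragments back into one segment. The sketch below assumes SAM's output has been flattened into a per-pixel map of segment ids and that the annotator supplies groups of ids that belong to the same surface; `merge_segments` and `segment_ids` are hypothetical helpers for illustration, not part of SAM or any labeling tool.

```python
from typing import Dict, Iterable, List, Set

# Each cell holds the id of the SAM segment covering that pixel (0 = no segment).
LabelMap = List[List[int]]


def merge_segments(label_map: LabelMap, groups: Iterable[Iterable[int]]) -> LabelMap:
    """Relabel every segment in an annotator-chosen group to the group's smallest id.

    Groups are assumed to be disjoint; segments not in any group are unchanged.
    """
    remap: Dict[int, int] = {}
    for group in groups:
        ids = sorted(set(group))
        for seg_id in ids:
            remap[seg_id] = ids[0]
    return [[remap.get(v, v) for v in row] for row in label_map]


def segment_ids(label_map: LabelMap) -> Set[int]:
    """Return the distinct non-zero segment ids in a label map."""
    return {v for row in label_map for v in row if v != 0}


# Toy SAM output: wall art (1) and sofa (2) look fine, but the wooden floor
# was over-segmented into three pieces (3, 4, 5).
sam_output = [
    [1, 1, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 2],
    [3, 3, 4, 4, 5, 5],
    [3, 3, 4, 4, 5, 5],
]

# The annotator marks segments 3, 4 and 5 as the same floor surface.
validated = merge_segments(sam_output, groups=[[3, 4, 5]])
print(sorted(segment_ids(sam_output)))  # [1, 2, 3, 4, 5]
print(sorted(segment_ids(validated)))   # [1, 2, 3]
```

Keeping the smallest id in each group is an arbitrary but deterministic choice; any member id would work as long as groups do not overlap.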

3. Assigning Classes and Grouping Instances

Next, human annotators assign a class to each segment, such as chair, table, curtain, or another relevant category.

We then handle cases where occlusion splits a single object into multiple segments, such as the rug, which is partly hidden by the sofa, coffee table, and chairs. Grouping the separate rug segments lets us represent them as a single object.
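Class assignment and grouping can both be captured by one annotation per segment: a class name plus an instance id, where segments that share an instance id form one object. The minimal sketch below assumes that data layout (our own, hypothetical `build_label_maps` helper, not a specific tool's format) and produces a per-pixel class map and a per-pixel instance map for a toy scene in which the rug is visible only as three separate pieces.

```python
from typing import Dict, List, Tuple

SegmentMap = List[List[int]]  # per-pixel segment id after validation
Annotation = Tuple[str, int]  # (class name, instance id) chosen by an annotator


def build_label_maps(
    segments: SegmentMap, annotations: Dict[int, Annotation]
) -> Tuple[List[List[str]], List[List[int]]]:
    """Turn a segment-id map into a per-pixel class map and instance map.

    Segments that share an instance id (e.g. rug pieces separated by furniture)
    become one object. Pixels whose segment has no annotation get class
    "unlabeled" and instance 0.
    """
    default: Annotation = ("unlabeled", 0)
    semantic = [[annotations.get(s, default)[0] for s in row] for row in segments]
    instance = [[annotations.get(s, default)[1] for s in row] for row in segments]
    return semantic, instance


# Toy scene: the rug is visible only as three separate pieces (4, 5, 6)
# between the sofa (2) and the coffee table (3).
segments = [
    [1, 1, 1, 1, 1],
    [2, 2, 1, 3, 3],
    [4, 2, 5, 3, 6],
]
annotations = {
    1: ("wall", 1),
    2: ("sofa", 2),
    3: ("coffee table", 3),
    4: ("rug", 4),
    5: ("rug", 4),  # grouped with segment 4: same rug instance
    6: ("rug", 4),  # grouped with segment 4: same rug instance
}

semantic, instance = build_label_maps(segments, annotations)
print(semantic[2])  # ['rug', 'sofa', 'rug', 'coffee table', 'rug']
print(instance[2])  # [4, 2, 4, 3, 4]
print(len({i for row in instance for i in row}))  # 4
```

Six segments collapse into four objects, and all three rug pieces carry the same instance id, which corresponds to the shared rug color in the final output.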

The final output is shown below. Notice how the different parts of the rug now share the same color after grouping.

Conclusion

SAM has the potential to speed up annotation by roughly 40x, but it can struggle to segment certain objects correctly, for example by over-segmenting large surfaces. Adding a layer of human validation ensures that the output meets quality requirements.

The synergy between AI tools like SAM and human expertise makes data labeling faster and more accurate.