
FAIR1M Dataset Case Study: Enhancing Satellite Image Object Detection with Hirundo

By Michael (Misha) Leybovich
October 8, 2024
TL;DR

The FAIR1M dataset, part of the International Society for Photogrammetry and Remote Sensing (ISPRS) Benchmark on Object Detection in High-Resolution Satellite Images, was analyzed using Hirundo's proprietary Data Influence Engine.

We identified high-confidence mislabels in 510 objects across 293 frames (16.9% of the frames analyzed). By correcting these high-confidence mislabels, we improved the model's precision from 70.7% to 75.8%, demonstrating the significant impact of data quality on model performance in satellite image analysis.

About the FAIR1M Dataset

The FAIR1M (Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery) dataset is a comprehensive collection of high-resolution satellite images used for object detection tasks. While the full dataset is extensive, our analysis focused on the currently released portion:

  • Original Dataset Size: 15,000 images with 1 million object instances
  • Analyzed Portion: 1,732 images containing 82,939 objects
  • Image Specifications: 1024x1024 pixels, RGB, 0.3–0.8m resolution
  • Object Categories: 5 main categories, 37 sub-categories
  • Scene Types: Varied scenes containing ships, vehicles, airplanes, and structures
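To make the scale of the analyzed portion concrete, here is a minimal sketch of how one might tally images and objects per category, assuming the annotations have already been flattened into a simple CSV manifest (the file name and column names below are hypothetical, not part of the FAIR1M release):

```python
# Minimal sketch: summarize a flattened object-detection annotation manifest.
# Assumes one row per object with hypothetical columns:
#   image_id, category, xmin, ymin, xmax, ymax
import csv
from collections import Counter

def summarize(manifest_path: str) -> None:
    images = set()
    categories = Counter()
    with open(manifest_path, newline="") as f:
        for row in csv.DictReader(f):
            images.add(row["image_id"])
            categories[row["category"]] += 1

    total_objects = sum(categories.values())
    print(f"{len(images)} images, {total_objects} objects, "
          f"{len(categories)} categories")
    for name, count in categories.most_common(10):
        print(f"  {name}: {count}")

if __name__ == "__main__":
    summarize("fair1m_annotations.csv")  # hypothetical path
```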

The Importance of Reliable Remote Sensing Data for Various Industries

The FAIR1M dataset, adopted by ISPRS as a benchmark, is a game-changer for a range of industries. For defense organizations, it enables data scientists to develop AI models that enhance reconnaissance, threat detection, and decision-making through precise object recognition. The insurance industry also benefits from this dataset, using satellite imagery for assessing damage after natural disasters, improving risk assessment, and streamlining claims processes.

Beyond defense and insurance, FAIR1M's detailed satellite data is invaluable for disaster response, urban planning, environmental monitoring, and geospatial analysis. It allows for accurate mapping, infrastructure planning, and environmental assessments, helping organizations address complex challenges with advanced AI-driven insights.

Hirundo's Analysis: Mislabels in 17% of the Frames

Using our Data Influence Engine, we conducted a thorough analysis of the FAIR1M dataset:

  1. Suspect Identification: We identified 12,158 suspect labels, categorized as HIGH or MEDIUM confidence mislabels.
  2. High-Confidence Mislabels: 510 objects across 293 frames (16.9% of the total frames analyzed) were flagged as HIGH confidence mislabels.
  3. Relabeling Process: We focused on correcting the 510 HIGH confidence mislabels, using Hirundo's suggested labels; a simplified sketch of this step follows below.
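The relabeling step itself is deliberately narrow: only the HIGH-confidence suspects are touched, and every other annotation is left untouched. The sketch below illustrates the idea, assuming the suspect findings were exported as plain records with hypothetical field names; it is an illustration, not Hirundo's actual API.

```python
# Minimal sketch of the relabeling step: relabel only HIGH-confidence suspects,
# keep every other label intact. Field names are hypothetical.
from typing import Dict, List, Tuple

def apply_high_confidence_fixes(labels: Dict[Tuple[str, int], str],
                                suspects: List[dict]) -> Dict[Tuple[str, int], str]:
    """Return a copy of the labels with only HIGH-confidence suspects relabeled."""
    fixed = dict(labels)
    for s in suspects:
        if s["confidence"] == "HIGH":
            fixed[(s["frame"], s["object_id"])] = s["suggested_label"]
    return fixed

# Toy example (labels keyed by (frame, object index); category names are toy values):
labels = {("frame_001", 0): "small-car", ("frame_001", 1): "cargo-ship"}
suspects = [{"frame": "frame_001", "object_id": 1, "confidence": "HIGH",
             "suggested_label": "fishing-boat"}]
print(apply_high_confidence_fixes(labels, suspects))
# -> {('frame_001', 0): 'small-car', ('frame_001', 1): 'fishing-boat'}
```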

A sample of the mislabels we discovered and their corrections is available in our sample library. For a deeper dive, check it out or reach out to us with any questions.

Effects of an Improved Dataset on Model Precision

After applying our corrections to the high-confidence mislabels:

  • Original Model Precision: 70.7%
  • Improved Model Precision: 75.8%

This improvement in model precision was achieved by relabeling the top 510 suspect bounding boxes according to Hirundo's suggested labels, while keeping all other labels intact.
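A before/after comparison like this can be reproduced with any standard detection training pipeline: train one model on the original labels, one on the corrected labels, and evaluate both on the same validation split. The sketch below assumes an Ultralytics-style YOLO setup with hypothetical dataset YAML files; it is an illustration of the comparison, not the exact training configuration used in this study.

```python
# Minimal sketch: train on original vs. corrected labels and compare metrics.
# Assumes an Ultralytics-style YOLO pipeline; dataset YAML paths are hypothetical.
from ultralytics import YOLO

def train_and_eval(data_yaml: str) -> dict:
    model = YOLO("yolov8m.pt")                       # pretrained starting point
    model.train(data=data_yaml, epochs=100, imgsz=1024)
    metrics = model.val(data=data_yaml)              # evaluate on the val split
    return {"precision": metrics.box.mp,             # mean precision
            "recall": metrics.box.mr,                # mean recall
            "mAP50": metrics.box.map50,
            "mAP50-95": metrics.box.map}

before = train_and_eval("fair1m_original.yaml")      # hypothetical config paths
after = train_and_eval("fair1m_relabeled.yaml")
print("precision before/after:", before["precision"], after["precision"])
```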

Benchmarking a Model Trained on the Before and After Datasets

The following metrics were used to compare the two models:

  • Precision: Indicates how many of the model's positive predictions are actually correct. Higher precision means fewer false positives.
  • Recall: Shows how many of the actual positive cases the model detected. Higher recall means fewer false negatives.
  • mAP50: Measures the model's ability to correctly identify objects with at least 50% overlap (IoU ≥ 0.5) between the predicted and ground-truth bounding boxes.
  • mAP50-95: A comprehensive metric that averages precision across multiple IoU thresholds (from 50% to 95%), giving a fuller picture of the model's overall object detection performance.
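To make these definitions concrete, here is a small, self-contained sketch of IoU-based matching and precision/recall at a single IoU threshold; mAP50-95 averages this kind of evaluation over thresholds from 0.50 to 0.95. The greedy matching below is a simplification that ignores prediction confidence ordering.

```python
# Minimal sketch: IoU matching and precision/recall at one IoU threshold.
# Boxes are axis-aligned tuples (xmin, ymin, xmax, ymax).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= best_iou:
                best, best_iou = i, iou(p, g)
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(preds) - tp            # unmatched predictions -> false positives
    fn = len(gts) - tp              # missed ground truths  -> false negatives
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall

# Toy example: two predictions, two ground-truth boxes.
preds = [(10, 10, 50, 50), (200, 200, 240, 240)]
gts = [(12, 12, 52, 52), (300, 300, 340, 340)]
print(precision_recall(preds, gts))  # -> (0.5, 0.5)
```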

Impact: Why It Matters

  1. Data Quality Impact: This case study underscores the critical importance of data quality in satellite image analysis. Even a small percentage of mislabels can significantly affect model performance.
  2. Efficient Quality Assurance: Hirundo's approach allows for targeted correction of high-confidence mislabels, providing substantial improvements without the need for comprehensive relabeling.
  3. Scalability: Given that we analyzed only a portion of the full FAIR1M dataset, the potential for improvement across the entire dataset is considerable.
  4. Application in Remote Sensing: The results demonstrate the value of automated data quality assurance in remote sensing applications, where manual verification of large datasets is often impractical.

Conclusion

The FAIR1M case study showcases Hirundo's ability to significantly enhance the quality of complex datasets in the field of satellite image analysis and remote sensing.

By identifying and correcting a small percentage of high-confidence mislabels, we achieved a notable improvement in model precision. This approach offers a scalable and efficient method for improving data quality and model performance in remote sensing and other domains where large, complex datasets are common.

Michael (Misha) Leybovich
CTO, Hirundo
