AI · 6 min read · 22 April 2026

AD-Copilot: Vision-Language Model Trained for Factory Defect Detection

Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.

Source: arxiv/cs.AI · Xi Jiang, Yue Guo, Jian Li, Yong Liu, Bin-Bin Gao, Hanqiu Deng, Jun Liu, Heng Zhao, Chengjie Wang, Feng Zheng · open original ↗

AD-Copilot is a specialized vision-language model that detects manufacturing defects by comparing paired images, achieving 82.3% accuracy on industrial anomaly benchmarks.

  • General multimodal models fail at industrial defect detection because they lack domain-specific training on factory imagery.
  • AD-Copilot uses a Comparison Encoder that analyzes two images side-by-side via cross-attention, catching subtle visual differences.
  • Researchers curated Chat-AD, a large dataset of industrial images with precise labels for defect localization and visual question-answering.
  • Multi-stage training incorporates domain knowledge progressively, improving the model's ability to spot manufacturing anomalies.
  • On the MMAD-BBox benchmark, AD-Copilot achieves a 3.35× improvement over the baseline and surpasses human-expert performance on several tasks.
  • The model generalizes well to other specialized and general benchmarks, suggesting broad applicability beyond the training domain.
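The core of the Comparison Encoder idea above is cross-attention between the patch features of the inspected image and those of a reference image: each inspected patch attends to the reference, and whatever the reference cannot reconstruct stands out as a candidate defect. The paper's actual architecture is not detailed in this summary, so the following is a minimal single-head sketch under that assumption, with all names and the toy features hypothetical.

```python
import numpy as np

def cross_attention(query_feats, ref_feats):
    """Single-head cross-attention: each query patch token attends to
    the reference image's patch tokens and returns its best
    reference-based reconstruction."""
    d = query_feats.shape[-1]
    scores = query_feats @ ref_feats.T / np.sqrt(d)       # (Nq, Nr) similarity
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over reference tokens
    return weights @ ref_feats                            # (Nq, d) attended features

# Toy demo: 4 reference patch embeddings, 2 inspected patches.
ref = 4 * np.eye(4)                                   # defect-free reference patches
clean = ref[0]                                        # patch matching the reference
defect = 4 * (np.eye(4)[0] + np.eye(4)[1]) / np.sqrt(2)  # patch unlike any reference token
query = np.stack([clean, defect])

attended = cross_attention(query, ref)
residual = np.linalg.norm(query - attended, axis=-1)  # per-patch anomaly score
# The defective patch leaves a larger residual than the clean one.
```

In a real model the softmax residual would feed further transformer layers rather than serve directly as a score, but the mechanism is the same: attention lets the inspected image "look up" the reference, so fine-grained differences surface without encoding each image in isolation.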

Frequently asked

  • Why do general multimodal models fail at industrial inspection? They train on web images and encode each image independently, missing the subtle visual differences critical to defect detection. AD-Copilot addresses this by training on factory-specific data and comparing paired images side-by-side with cross-attention, which highlights fine-grained differences that humans and standard models would miss.
