AI · 6 min read · April 22, 2026
AD-Copilot: Vision-Language Model Trained for Factory Defect Detection
Researchers built a specialized multimodal AI that compares paired industrial images to spot subtle manufacturing flaws, outperforming general-purpose models and human inspectors on benchmark tasks.
Source: arxiv/cs.AI · Xi Jiang, Yue Guo, Jian Li, Yong Liu, Bin-Bin Gao, Hanqiu Deng, Jun Liu, Heng Zhao, Chengjie Wang, Feng Zheng
AD-Copilot is a specialized vision-language model that detects manufacturing defects by comparing paired images, achieving 82.3% accuracy on industrial anomaly benchmarks.
- General multimodal models fail at industrial defect detection because they lack domain-specific training on factory imagery.
- AD-Copilot uses a Comparison Encoder that analyzes two images side by side via cross-attention, catching subtle visual differences.
- Researchers curated Chat-AD, a large dataset of industrial images with precise labels for defect localization and visual question answering.
- Multi-stage training incorporates domain knowledge progressively, improving the model's ability to spot manufacturing anomalies.
- On the MMAD-BBox benchmark, AD-Copilot achieves a 3.35× improvement over the baseline and surpasses human expert performance on several tasks.
- The model generalizes well to other specialized and general benchmarks, suggesting broad applicability beyond the training domain.
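To make the Comparison Encoder bullet concrete, here is a minimal sketch of the side-by-side idea: patch features from a test image attend over patches of a defect-free reference, and a defective patch shows up as a large residual between the test feature and its attended reference context. This is an illustration only, not the paper's actual architecture; it uses a single head with no learned projections, and all shapes and values are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_residual(test_feats, ref_feats):
    """Compare test-image patches against a defect-free reference.

    test_feats: (Nq, d) patch features from the image under inspection
    ref_feats:  (Nr, d) patch features from the reference image
    Returns the residual between each test patch and its attended
    reference context; defects produce large residuals.
    """
    d = test_feats.shape[-1]
    scores = test_feats @ ref_feats.T / np.sqrt(d)   # scaled dot-product
    weights = softmax(scores, axis=-1)               # attention over reference patches
    attended = weights @ ref_feats                   # reference context per test patch
    return test_feats - attended                     # difference signal

# Toy data: the test image matches the reference except at patch 10.
rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 16))
test = ref.copy()
test[10] += 5.0  # simulated localized defect

residual = cross_attention_residual(test, ref)
anomaly_score = np.linalg.norm(residual, axis=-1)
# Patch 10 should receive the highest anomaly score.
print(int(anomaly_score.argmax()))
```

Normal patches find a near-identical match in the reference, so their residuals stay small; the perturbed patch cannot, which is the intuition behind comparing paired images rather than encoding each image independently.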
Frequently asked questions
- Why does AD-Copilot outperform general multimodal models? General multimodal models are trained on web images and encode each image independently, missing the subtle visual differences critical to industrial inspection. AD-Copilot addresses this by training on factory-specific data and comparing paired images side by side using cross-attention, which surfaces fine-grained differences that humans and standard models would miss.