Vision Agent
VLM for HSE Compliance
Tech Stack
Qwen2-VL • Transformers • Python • Gradio • Docker • ONNX Runtime
Overview
Vision AI for HSE (health, safety, and environment) compliance inspection, using the Qwen2-VL multimodal model to understand safety-relevant scenes.
Problem
Traditional object detection (YOLO, Faster R-CNN) misses behavioral context. A worker wearing a hardhat with an unsecured chin strap passes detection but fails compliance. Safety requires reasoning about actions and environment, not just classifying objects.
Solution
A vision-language model pipeline built on Qwen2-VL processes site images and generates structured safety assessments across 5 categories (PPE, Housekeeping, Fall Protection, Fire Safety, and Electrical Safety), each with a severity classification and corrective actions.
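One way to picture a "structured safety assessment" is as a small schema plus a parser for the model's reply. The sketch below is illustrative only: the five categories come from the description above, but the severity levels (low/medium/high), the JSON reply format, and the names Finding, build_prompt, and parse_findings are assumptions, not the project's actual code.

```python
import json
from dataclasses import dataclass
from enum import Enum

# The five categories are taken from the text above.
CATEGORIES = (
    "PPE",
    "Housekeeping",
    "Fall Protection",
    "Fire Safety",
    "Electrical Safety",
)


class Severity(Enum):
    # Assumed three-level scale; the source does not name its levels.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    category: str
    severity: Severity
    observation: str
    corrective_action: str


def build_prompt() -> str:
    """Build a hypothetical instruction asking the VLM for a JSON list of findings."""
    return (
        "Inspect the image for safety issues in these categories: "
        + ", ".join(CATEGORIES)
        + ". Reply with a JSON list of objects with the keys "
        "category, severity (low, medium or high), observation, corrective_action."
    )


def parse_findings(raw: str) -> list[Finding]:
    """Parse the model's JSON reply, skipping unknown categories or severities."""
    findings = []
    for item in json.loads(raw):
        if item.get("category") not in CATEGORIES:
            continue
        try:
            severity = Severity(str(item.get("severity", "")).lower())
        except ValueError:
            continue
        findings.append(
            Finding(
                category=item["category"],
                severity=severity,
                observation=item.get("observation", ""),
                corrective_action=item.get("corrective_action", ""),
            )
        )
    return findings
```

Validating the reply against a fixed category list and severity scale keeps free-form model output from leaking unexpected labels into the final report.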
Architecture
Site Camera → Image Preprocessing → Qwen2-VL Inference (ONNX Runtime) → Safety Reasoning → Severity Classification → HSE Report
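The flow above can be sketched as a chain of plain functions. This is a minimal illustration under stated assumptions, not the project's implementation: the model call is injected as a callable (here a stub named fake_infer) that stands in for both the Qwen2-VL inference and safety reasoning stages, preprocessing is a no-op placeholder, and the low/medium/high scale and "worst finding wins" rule are assumed.

```python
from typing import Callable

# Assumed severity scale, ordered from least to most serious.
SEVERITY_ORDER = ["low", "medium", "high"]


def preprocess(image: bytes) -> bytes:
    """Placeholder for resizing or normalisation; returns the image unchanged."""
    return image


def classify_severity(findings: list[dict]) -> str:
    """Overall severity is the most serious severity among the findings."""
    if not findings:
        return "none"
    return max((f["severity"] for f in findings), key=SEVERITY_ORDER.index)


def build_report(findings: list[dict], overall: str) -> str:
    """Render a plain-text HSE report, one line per finding."""
    lines = [f"Overall severity: {overall}"]
    for f in findings:
        lines.append(
            f"- [{f['severity']}] {f['category']}: {f['corrective_action']}"
        )
    return "\n".join(lines)


def run_pipeline(image: bytes, infer: Callable[[bytes], list[dict]]) -> str:
    """Image -> preprocessing -> inference and reasoning -> severity -> report."""
    prepared = preprocess(image)
    findings = infer(prepared)
    overall = classify_severity(findings)
    return build_report(findings, overall)


def fake_infer(image: bytes) -> list[dict]:
    """Stub standing in for the real model call; returns fixed findings."""
    return [
        {"category": "PPE", "severity": "high",
         "corrective_action": "Secure the chin strap"},
        {"category": "Housekeeping", "severity": "low",
         "corrective_action": "Clear the walkway"},
    ]


if __name__ == "__main__":
    print(run_pipeline(b"", fake_infer))
```

Passing the inference step in as a callable keeps the pipeline testable without loading a model; a real deployment would swap fake_infer for a function wrapping the actual model session.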