Vision Agent

VLM for HSE Compliance

VLM • Computer Vision • Safety

Tech Stack

Qwen2-VL • Transformers • Python • Gradio • Docker • ONNX Runtime

VLM Multimodal
HSE Compliance
Real-Time Analysis

Overview

Vision AI for HSE (health, safety, and environment) compliance inspection, using the Qwen2-VL multimodal model to understand safety-relevant scenes.

Problem

Traditional object detection (YOLO, Faster R-CNN) misses behavioral context. A worker wearing a hardhat with an unsecured chin strap passes detection but fails compliance. Safety requires reasoning about actions and environment, not just classifying objects.

Solution

A Vision Language Model pipeline built on Qwen2-VL processes site images and generates structured safety assessments across five categories: PPE, Housekeeping, Fall Protection, Fire Safety, and Electrical Safety. Each finding carries a severity classification and a corrective action.
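To make "structured safety assessment" concrete, here is a minimal sketch of how such output could be represented and validated in Python. The five categories come from the description above; everything else is an assumption for illustration, including the three-level severity scale, the field names, and the idea that the model is prompted to return a JSON list. The project's actual schema may differ.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Category(str, Enum):
    # The five assessment categories named in the project description.
    PPE = "PPE"
    HOUSEKEEPING = "Housekeeping"
    FALL_PROTECTION = "Fall Protection"
    FIRE_SAFETY = "Fire Safety"
    ELECTRICAL_SAFETY = "Electrical Safety"


class Severity(str, Enum):
    # Assumed three-level scale; the project does not state its levels.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    # Hypothetical field names, chosen for this sketch only.
    category: Category
    severity: Severity
    observation: str
    corrective_action: str


def parse_assessment(raw: str) -> list[Finding]:
    """Parse model output (assumed to be a JSON list) into validated findings."""
    findings = []
    for item in json.loads(raw):
        findings.append(
            Finding(
                # Enum lookup raises ValueError for an unknown category or severity.
                category=Category(item["category"]),
                severity=Severity(item["severity"].lower()),
                observation=item["observation"],
                corrective_action=item["corrective_action"],
            )
        )
    return findings


# Hand-written sample standing in for model output (not real inference results).
sample = json.dumps([
    {
        "category": "PPE",
        "severity": "High",
        "observation": "Hardhat chin strap is unsecured.",
        "corrective_action": "Fasten the chin strap before entering the work area.",
    }
])
findings = parse_assessment(sample)
```

Validating against enums means a hallucinated category or severity fails loudly at parse time instead of leaking into a report.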

Architecture

Site Camera → Image Preprocessing → Qwen2-VL Inference (ONNX Runtime) → Safety Reasoning → Severity Classification → HSE Report
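The stages above can be sketched as a small chain of functions. This is an illustrative outline under stated assumptions, not the project's implementation: the function names, the `Report` fields, and the severity ordering are hypothetical, and the Qwen2-VL call is replaced by an injected `infer` callable so the sketch runs without model weights. Safety reasoning is folded into that callable here.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed ordering, lowest to highest; the project does not state its scale.
SEVERITY_ORDER = ["none", "low", "medium", "high"]

# Hypothetical prompt text, for illustration only.
PROMPT = "List safety violations in this image with category, severity, and corrective action."


@dataclass
class Report:
    image_id: str
    findings: list[dict]
    max_severity: str


def preprocess(image: bytes) -> bytes:
    # Placeholder: a real pipeline would decode and resize the frame here.
    return image


def classify_severity(findings: list[dict]) -> str:
    # Overall severity is the highest severity among the findings, or "none".
    return max(
        (f["severity"] for f in findings),
        key=SEVERITY_ORDER.index,
        default="none",
    )


def run_pipeline(
    image_id: str,
    image: bytes,
    infer: Callable[[bytes, str], list[dict]],
) -> Report:
    # Preprocessing -> inference and reasoning -> severity classification -> report.
    prepared = preprocess(image)
    findings = infer(prepared, PROMPT)
    return Report(image_id, findings, classify_severity(findings))


def fake_infer(image: bytes, prompt: str) -> list[dict]:
    # Stub standing in for Qwen2-VL inference; returns hand-written findings.
    return [
        {"category": "PPE", "severity": "high", "action": "Secure the chin strap."},
        {"category": "Housekeeping", "severity": "low", "action": "Clear the walkway."},
    ]


report = run_pipeline("cam01-frame0001", b"\x00", fake_infer)
empty_report = run_pipeline("cam01-frame0002", b"\x00", lambda img, p: [])
```

Passing the model in as a callable keeps the stages testable with a stub and leaves room to swap the backend (for example, a Transformers model or an ONNX Runtime session) without touching the rest of the pipeline.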
