Driver Monitoring System

A browser-based platform that detects dangerous driving behaviors in real time. It combines two YOLO models (YOLOv8 and YOLO11) with letterbox preprocessing and an enhanced duplicate-removal step, and all inference runs entirely in your browser.

System Architecture

Complete pipeline from image upload to final detection results, using a dual-model approach.

[Figure: User Upload → Preprocessing → Chaitanya Model (YOLOv8, 5 classes, 8400 detections) in parallel with Soham Model (YOLO11, 8 classes, 8400 detections) → Parse & Filter → Unify Classes → Enhanced NMS (IoU + center check) → Final Results]
Complete detection pipeline from upload to final results

Dual Model Architecture

Two distinct YOLO architectures working in parallel for robust detection.

Model A

Chaitanya Model

YOLOv8 architecture optimized for core safety violations. Uses C2f blocks for efficient feature extraction.

Architecture: YOLOv8
Core Block: C2f (CSPDarknet)
Output: [1, 9, 8400]
Classes: 5

Detected Classes

  • 0: Cigarette
  • 1: Drinking
  • 2: Eating
  • 3: Phone
  • 4: Seatbelt

Training Distribution

  • Drinking: 771 samples
  • Cigarette: 365 samples
  • Seatbelt: 429 samples
  • Eating: 261 samples
  • Phone: 55 samples

Model B

Soham Model

YOLO11 next-gen architecture with PSA attention mechanisms and C3k2 blocks for detecting subtle behaviors.

Architecture: YOLO11
Core Block: C3k2 + PSA
Output: [1, 12, 8400]
Classes: 8

Detected Classes

  • 0: Distracted
  • 1: Drinking
  • 2: Drowsy
  • 3: Eating
  • 4: PhoneUse
  • 5: SafeDriving
  • 6: Seatbelt
  • 7: Smoking

Dataset Source

Both models were trained on datasets from Roboflow, a comprehensive platform for computer vision datasets and model training.


Class Name Unification

Standardizing class names across both models for consistent detection results.

Chaitanya Model    Soham Model    Unified Output
Cigarette          Smoking        Smoking
Phone              PhoneUse       Phone Usage
Drinking           Drinking       Drinking
Eating             Eating         Eating
Seatbelt           Seatbelt       Seatbelt
-                  Distracted     Distracted
-                  Drowsy         Drowsy
-                  SafeDriving    Filtered out
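
The unification table can be expressed as two lookup dictionaries, one per model. The following is a minimal Python sketch of that idea, not the project's browser code: the names `CHAITANYA_TO_UNIFIED`, `SOHAM_TO_UNIFIED`, and `unify`, and the dictionary shape of a detection, are assumptions made for illustration.

```python
# Raw class name -> unified label, taken from the table above.
# None marks a class that is filtered out of the final results.
CHAITANYA_TO_UNIFIED = {
    "Cigarette": "Smoking",
    "Drinking": "Drinking",
    "Eating": "Eating",
    "Phone": "Phone Usage",
    "Seatbelt": "Seatbelt",
}
SOHAM_TO_UNIFIED = {
    "Distracted": "Distracted",
    "Drinking": "Drinking",
    "Drowsy": "Drowsy",
    "Eating": "Eating",
    "PhoneUse": "Phone Usage",
    "SafeDriving": None,
    "Seatbelt": "Seatbelt",
    "Smoking": "Smoking",
}


def unify(detections, mapping):
    """Return copies of the detections with unified labels.

    Each detection is assumed to be a dict with a "label" key (hypothetical
    shape). Classes mapped to None, or missing from the mapping, are dropped.
    """
    unified = []
    for det in detections:
        label = mapping.get(det["label"])
        if label is None:
            continue
        unified.append({**det, "label": label})
    return unified
```

With this sketch, a `PhoneUse` detection from the Soham Model and a `Phone` detection from the Chaitanya Model both end up labeled `Phone Usage`, so the later duplicate-removal step can compare them as the same class.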

Image Preprocessing

Letterbox technique preserves aspect ratio while resizing to 640×640 pixels.

Images are resized and padded with gray bars so that their original proportions are preserved. The preprocessing pipeline then normalizes pixel values to [0, 1] and rearranges the interleaved RGB pixels into separate R, G, and B planes, producing a float32 tensor of shape [1, 3, 640, 640].

[Figure: Original (800×600) → Letterboxed (640×640) → Float32 tensor [1, 3, 640, 640]]
Preprocessing maintains aspect ratio using letterbox padding
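
The letterbox geometry reduces to one scale factor and two padding offsets. Below is a small Python sketch of that arithmetic. The function name `letterbox_params` is hypothetical, and centering the resized image (equal bars on both sides) is an assumption, since the page does not say where the padding is placed.

```python
def letterbox_params(src_w, src_h, target=640):
    """Compute the resize scale and padding for a target x target letterbox.

    Assumes the resized image is centered, so padding is split evenly.
    """
    # One scale for both axes keeps the aspect ratio intact.
    scale = min(target / src_w, target / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (target - new_w) / 2  # bar width on the left and on the right
    pad_y = (target - new_h) / 2  # bar height on the top and on the bottom
    return scale, new_w, new_h, pad_x, pad_y


# The 800x600 example: scaled by 0.8 to 640x480, with 80-pixel bars
# above and below to fill the 640x640 square.
print(letterbox_params(800, 600))  # (0.8, 640, 480, 0.0, 80.0)
```

The same `scale`, `pad_x`, and `pad_y` values are what a later step needs in order to map detected boxes back onto the original image.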

Enhanced NMS Algorithm

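The core post-processing step, which removes duplicate detections using two tests: IoU overlap and center containment. Standard NMS relies on IoU alone, so boxes around the same behavior that overlap only partially can all survive.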

[Figure: Problem: three boxes on the same person (59%, 34%, 32%) with IoU 0.35, where standard NMS fails. Enhanced NMS solution: a single clean box (Drinking, 59%) after the center check. Rules: 1. IoU > 0.45 → duplicate; 2. center inside box → duplicate]
Enhanced NMS uses both IoU and center containment to remove duplicates
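
The two duplicate rules can be sketched as a single predicate. This is an illustrative Python sketch rather than the project's own code. The function names are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, and checking whether the candidate's center lies inside the already-kept box (rather than the other way round) is an assumption about which direction the center test runs.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center_inside(inner, outer):
    """True if the center point of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def is_duplicate(kept, candidate, iou_threshold=0.45):
    """Rule 1: IoU above the threshold. Rule 2: candidate center inside kept box."""
    return iou(kept, candidate) > iou_threshold or center_inside(candidate, kept)


# A small box nested in a larger one: IoU is only 0.25, below the 0.45
# threshold, but the center check still flags it as a duplicate.
big = (100, 100, 300, 400)
small = (150, 150, 250, 300)
print(iou(big, small), is_duplicate(big, small))  # 0.25 True
```

The nested-box case is the kind of situation where an IoU-only test keeps both boxes, which is why the second rule is needed.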

Complete Detection Flow

Step-by-step process from image upload to final results.

1. Image Upload

The user uploads driver images through the browser interface. Files are validated and stored in memory as base64 data URLs.

2. Lazy Model Loading

On "Process Images" click, both ONNX models are downloaded and initialized with WebAssembly. The loaded models are cached in memory for reuse.

3. Preprocessing

Images are resized to 640×640 with letterbox padding, normalized to [0, 1], and converted to float32 tensors with separate R, G, and B channel planes.
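
The normalization and channel separation in this step can be sketched as follows. This is a pure-Python illustration, not the browser implementation. The function name `to_chw_float32` is hypothetical, and the input is assumed to be interleaved 8-bit RGBA bytes (the layout a browser canvas returns), which the page does not confirm.

```python
from array import array


def to_chw_float32(pixels, width, height, channels=4):
    """Convert interleaved 8-bit pixels to planar RGB float32 values in [0, 1].

    `pixels` is a flat sequence such as R, G, B, A, R, G, B, A, ...
    (assumed layout). The result holds all R values, then all G, then all B,
    which corresponds to a logical tensor shape of [1, 3, height, width].
    """
    plane = width * height
    out = array("f", [0.0]) * (3 * plane)  # "f" is a 32-bit float
    for i in range(plane):
        base = i * channels
        out[i] = pixels[base] / 255.0                  # R plane
        out[plane + i] = pixels[base + 1] / 255.0      # G plane
        out[2 * plane + i] = pixels[base + 2] / 255.0  # B plane (alpha ignored)
    return out


# Two pixels, one pure red and one pure green, in RGBA order.
print(list(to_chw_float32([255, 0, 0, 255, 0, 255, 0, 255], width=2, height=1)))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

For a 640×640 input the output holds 3 × 640 × 640 values, matching the [1, 3, 640, 640] tensor shape.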

4. Parallel Inference

Both models process the preprocessed tensor in parallel, each producing 8400 candidate detections with bounding boxes and per-class confidence scores.

5. Parse & Filter

Detections above the 0.25 confidence threshold are extracted, and their coordinates are converted from model space back to the original image dimensions.
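
A sketch of this step in Python is shown below. It assumes the common YOLOv8/YOLO11 export layout, where each of the 8400 candidates has four box values (center x, center y, width, height in 640×640 model space) followed by one score per class, stored channel by channel in a flat buffer. The function name `parse_output`, the detection dict shape, and the `scale`, `pad_x`, `pad_y` letterbox parameters are illustrative assumptions, not the project's actual identifiers.

```python
def parse_output(flat, num_classes, num_anchors, scale, pad_x, pad_y,
                 conf_threshold=0.25):
    """Decode a flat [1, 4 + num_classes, num_anchors] output buffer.

    The value of channel c for candidate i is assumed to sit at index
    c * num_anchors + i. Returned boxes are (x1, y1, x2, y2) in
    original-image pixels.
    """
    detections = []
    for i in range(num_anchors):
        # Pick the best-scoring class for this candidate.
        best_class, best_score = -1, 0.0
        for c in range(num_classes):
            score = flat[(4 + c) * num_anchors + i]
            if score > best_score:
                best_class, best_score = c, score
        if best_score <= conf_threshold:
            continue
        cx = flat[0 * num_anchors + i]
        cy = flat[1 * num_anchors + i]
        w = flat[2 * num_anchors + i]
        h = flat[3 * num_anchors + i]
        # Undo the letterbox: subtract the padding, then divide by the scale.
        box = (
            (cx - w / 2 - pad_x) / scale,
            (cy - h / 2 - pad_y) / scale,
            (cx + w / 2 - pad_x) / scale,
            (cy + h / 2 - pad_y) / scale,
        )
        detections.append({"class_id": best_class, "score": best_score, "box": box})
    return detections
```

For the Chaitanya Model this would be called with `num_classes=5` (output [1, 9, 8400]) and for the Soham Model with `num_classes=8` (output [1, 12, 8400]), in both cases with `num_anchors=8400`.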

6. Class Unification

Map different class names to standardized labels (Phone/PhoneUse → Phone Usage, Cigarette → Smoking).

7. Enhanced NMS

Group detections by class, sort by confidence, apply NMS to remove duplicates while preserving separate instances.
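
The grouping and greedy suppression described in this step can be sketched in Python as follows. The duplicate test is passed in as a function so the loop stays independent of the exact rule. The names `suppress` and `boxes_overlap` and the detection dict shape are hypothetical, and `boxes_overlap` is only a simplified stand-in for the IoU plus center-containment test.

```python
from collections import defaultdict


def suppress(detections, is_duplicate):
    """Greedy per-class suppression.

    Within each class, detections are visited from most to least confident,
    and one is kept only if it duplicates nothing already kept. Detections of
    different classes never suppress each other.
    """
    by_label = defaultdict(list)
    for det in detections:
        by_label[det["label"]].append(det)

    kept_all = []
    for group in by_label.values():
        group.sort(key=lambda d: d["score"], reverse=True)
        kept = []
        for det in group:
            if not any(is_duplicate(k["box"], det["box"]) for k in kept):
                kept.append(det)
        kept_all.extend(kept)
    return kept_all


def boxes_overlap(a, b):
    """Stand-in duplicate test for the demo: any overlap at all counts."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


detections = [
    {"label": "Drinking", "score": 0.34, "box": (10, 10, 60, 60)},
    {"label": "Drinking", "score": 0.59, "box": (0, 0, 50, 50)},
    {"label": "Drinking", "score": 0.32, "box": (200, 200, 250, 250)},
    {"label": "Seatbelt", "score": 0.40, "box": (5, 5, 55, 55)},
]
for det in suppress(detections, boxes_overlap):
    print(det["label"], det["score"])
```

In this example the 0.34 Drinking box is dropped as a duplicate of the 0.59 one, the distant 0.32 Drinking box survives as a separate instance, and the Seatbelt box is untouched because it belongs to a different class.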

8. Visualization

Draw color-coded bounding boxes on canvas (red for dangerous, green for safe). Generate and display safety instructions.

Team & Contributions

Developed under the Advanced Course on Green Skills and AI (Skills4Future Program).

Chaitanya Kulkarni

YOLOv8 Model Training

Soham Jadhav

YOLO11 Model Training

Divyanshu Mishra

CNN Model Training

Anurag Pawar

Backend Logic

Additional Models & Backend

Divyanshu Mishra trained a CNN-based model that achieves excellent accuracy but is not browser-compatible due to computational requirements. This model can be run locally for enhanced detection capabilities.

Anurag Pawar developed the backend logic and server infrastructure for advanced deployment scenarios.

For complete model access and backend implementation, visit our GitHub repository.

Program: Advanced Course on Green Skills and Artificial Intelligence
Organized by: Edunet Foundation, AICTE, Shell India Markets Pvt. Ltd.
Mentor: Professor Sarthak Narnor