Mapping models to use cases

Author: Stu Feeser

Understanding the shift from traditional models to the latest transformer-based models can significantly affect strategic technology investments. The use cases below apply to a Department of Transportation. For each expected task, this comparison pairs yesterday’s models with the cutting-edge transformer-based alternative, where one applies. The analysis identifies the models we need to evaluate more closely.

1. Traffic Flow Analysis

General Algorithm: Motion Analysis and Tracking, Object Detection
Yesterday’s Models: YOLOv4, Deep SORT
Advanced Transformer-Based Model: Vision Transformer (ViT) for enhanced object detection accuracy in complex traffic scenes.
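Why You Should Care: ViT’s self-attention relates every patch of the image to every other patch, so it can “see” the entire image, whereas the older convolutional models build up their view from small local regions. That global context helps ViT pick out the VW Bug partially hidden by a tractor trailer, and it helps with blurry images that tend to confuse the older tech. ViT also tends to keep improving as it is given more training data. Keep in mind that ViT on its own is an image backbone, so it needs a detection head to output bounding boxes. We want ViT.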
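
To make the “sees the entire image” point concrete, here is a minimal sketch in plain Python of the first step a ViT-style model takes: cutting the image into patches, all of which can then attend to one another. The function name split_into_patches and the 4x4 toy image are illustrative assumptions, not part of any library, and a real model would also embed each patch with learned weights.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block, row by row.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# Hypothetical 4x4 "image" whose pixel values are just 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

# Self-attention compares every patch with every patch (itself included),
# so the number of pairwise interactions grows with the square of the patch count.
num_pairs = len(patches) ** 2
```

The quadratic growth in num_pairs is the price of global context: more patches means a wider view, but also more compute.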

2. Road Condition Monitoring

General Algorithm: Object Detection, Image Classification
Yesterday’s Models: Mask R-CNN, EfficientDet
Advanced Transformer-Based Model: DETR (DEtection TRansformer) for identifying specific road damages with precise localization.
Why You Should Care: DETR processes the entire image at once, so it can use the surrounding scene to locate anomalies. This lets it detect a pothole but ignore the fire hydrant, since the hydrant is not on the road. Anyone who has suffered an ocular migraine knows not to walk around until the migraine passes. The old vision tech suffers from an ongoing ocular migraine.

3. License Plate Recognition (LPR)

General Algorithm: Optical Character Recognition (OCR)
Yesterday’s Models: CRNN, Tesseract (enhanced with deep learning methods)
Advanced Transformer-Based Model: Not necessarily applicable; CRNN and enhanced Tesseract remain effective for OCR tasks.
Why You Should Care: OCR works best when everything in the image except the plate is blocked out. The old tech only looks at a small patch at a time, which is desirable in this case.
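
Blocking out everything but the plate usually comes down to cropping before OCR. A minimal sketch, assuming the plate’s bounding box is already known (for example from an upstream detector); the crop helper, the toy image, and the box coordinates are all hypothetical:

```python
def crop(image, box):
    """Return the sub-image inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x6 grayscale image: 0 is background, 7 marks plate pixels.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 7, 7, 7, 0],
    [0, 0, 7, 7, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
plate_box = (1, 2, 3, 5)  # assumed output of a plate detector
plate = crop(image, plate_box)
```

Only the cropped plate region would then be handed to the OCR model, so the rest of the scene cannot distract it.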

4. Vehicle Classification

General Algorithm: Object Detection, Image Classification
Yesterday’s Models: YOLOv4, EfficientNet
Advanced Transformer-Based Model: Vision Transformer (ViT) offers a global understanding of the image for accurate vehicle classification.
Why You Should Care: ViT’s self-attention mechanism enables it to focus on fine details that are crucial for differentiating between closely related vehicle models or types. It can dynamically weigh the importance of different image regions, improving its ability to recognize distinctive features that define specific vehicle classes. The old tech can’t touch the capability of a transformer model.
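
The “dynamically weigh the importance of different image regions” idea can be sketched as scaled dot-product attention weights. This is a simplified illustration in plain Python: the function name attention_weights and the toy vectors are assumptions, and a real ViT first passes patches through learned query and key projections.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # softmax: non-negative, sums to 1


# Hypothetical 2-dimensional features for one query patch and three image regions.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
weights = attention_weights(query, keys)
```

Here the third region lines up most strongly with the query, so it receives the largest weight, while the unrelated second region receives the smallest.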

5. Pedestrian Detection and Counting

General Algorithm: Object Detection, Motion Analysis and Tracking
Yesterday’s Models: Faster R-CNN with FPN, Deep SORT
Advanced Transformer-Based Model: DETR, leveraging its ability to handle complex scenes for pedestrian detection without predefined anchor boxes.
Why You Should Care: Imagine you’re putting together a complex puzzle, but instead of finding and fitting each piece individually, you simply describe what the finished puzzle looks like and the pieces fall into place on their own. That’s roughly how DETR (DEtection TRansformer) identifies objects like pedestrians in an image in one pass. The old tech works the problem one puzzle piece at a time.
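
Part of what lets DETR drop anchor boxes is that, during training, it matches its whole set of predictions to the ground-truth objects one-to-one. Below is a toy sketch of that matching idea, not DETR’s actual implementation: it uses a brute-force search and a cost of 1 - IoU only, whereas DETR uses the Hungarian algorithm with a cost that also includes class scores. The function names and boxes are illustrative assumptions.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def match_predictions(predictions, targets):
    """Assign each target a distinct prediction, minimizing total (1 - IoU).

    Returns a list where entry i is the index of the prediction matched to
    target i. Brute force over permutations, so only suitable for toy sizes.
    """
    if len(predictions) < len(targets):
        raise ValueError("need at least as many predictions as targets")
    best_cost, best_assignment = float("inf"), ()
    for perm in permutations(range(len(predictions)), len(targets)):
        cost = sum(1 - iou(predictions[p], t) for p, t in zip(perm, targets))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return list(best_assignment)


# Three predicted pedestrian boxes, two real pedestrians (all values made up).
predictions = [(50, 50, 60, 60), (0, 0, 10, 10), (20, 20, 30, 30)]
targets = [(1, 1, 11, 11), (21, 19, 31, 29)]
assignment = match_predictions(predictions, targets)
```

Any prediction left unmatched (index 0 here) would be trained to say “no object”, which is how the model learns not to emit duplicate detections.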

6. Sign Detection and Recognition

General Algorithm: Object Detection, Optical Character Recognition (OCR)
Yesterday’s Models: RetinaNet for detection, Tesseract OCR
Advanced Transformer-Based Model: DETR for sign detection with improved accuracy and efficiency.
Why You Should Care: DETR can “see” all the signs in an image at once, then pass each sign to OCR to read its lettering. Old tech may not find all the signs.

7. Bridge Health Monitoring

General Algorithm: Object Detection, Image Classification
Yesterday’s Models: Mask R-CNN for detailed instance segmentation, EfficientNet
Advanced Transformer-Based Model: Not explicitly available; current models like Mask R-CNN still perform well for specific segmentation tasks in structural health monitoring.
Why You Should Care: HUGE RED FLAG here. Not enough work is being done with transformer-based models in this area. Anomaly detection may be a better path than this broadly scoped use case.

8. Anomaly Detection in Infrastructure

General Algorithm: Anomaly Detection
Yesterday’s Models: Autoencoders, GANs (Generative Adversarial Networks)
Advanced Transformer-Based Model: Transformers for Anomaly Detection (TransAD) that can model complex data distributions for detecting infrastructure anomalies more effectively.
Why You Should Care: TransAD can process the entire image and detect subtle anomalies more effectively than older tech, which struggles with the often subtle distinction between normal wear and actual damage.
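
Whatever model produces the anomaly score, separating normal wear from actual damage usually ends with a threshold. A minimal sketch, assuming each image already has a numeric score (such as a reconstruction error) and that a set of known-normal scores is available; the function names, the score values, and the choice of k = 3 standard deviations are all hypothetical:

```python
from statistics import mean, pstdev


def fit_threshold(normal_scores, k=3.0):
    """Threshold set k standard deviations above the mean of known-normal scores."""
    return mean(normal_scores) + k * pstdev(normal_scores)


def flag_anomalies(scores, threshold):
    """Return the indices of scores that exceed the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up scores from inspections known to show only normal wear.
normal_scores = [0.10, 0.12, 0.11, 0.09, 0.13]
threshold = fit_threshold(normal_scores)

# Made-up scores for three new inspections.
new_scores = [0.11, 0.14, 0.45]
flagged = flag_anomalies(new_scores, threshold)
```

The second new score is slightly elevated but still inside the normal band, so only the third inspection is flagged. Tuning k trades missed damage against false alarms.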

9. Construction Progress Monitoring

General Algorithm: Time-lapse Analysis
Yesterday’s Models: LSTM Networks, CNN-LSTM
Advanced Transformer-Based Model: Video Vision Transformer (ViViT) or TimeSformer for capturing both spatial and temporal dimensions in construction site analysis.
Why You Should Care: This one is easy. Imagine you could watch an entire movie from start to finish in an instant. You see every scene and how things change over time, yet you hold the entire movie vividly in your mind. That is how ViViT sees the world. Don’t do bad things ever, but especially not when ViViT is watching you.
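
Holding the entire movie in mind has a cost that is worth estimating up front. ViViT turns a clip into tokens by cutting it into small space-time blocks (“tubelets”). The sketch below only counts those tokens; the function name and the clip and tubelet sizes are illustrative assumptions, not recommended settings.

```python
def tubelet_token_count(frames, height, width, tubelet):
    """Number of non-overlapping tubelets of size (t, h, w) in a video clip."""
    t, h, w = tubelet
    if frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (frames // t) * (height // h) * (width // w)


# Hypothetical clip: 32 frames of 224x224 pixels, tubelets spanning 2 frames x 16 x 16 pixels.
tokens = tubelet_token_count(32, 224, 224, (2, 16, 16))

# Full self-attention relates every token to every token.
attention_pairs = tokens ** 2
```

For long time-lapse footage of a construction site, the token count climbs quickly, so frame sampling or a more efficient attention variant is likely to matter when evaluating these models.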

10. Environmental Impact Analysis

General Algorithm: Image Classification, Semantic Segmentation
Yesterday’s Models: U-Net for semantic segmentation, ResNet
Advanced Transformer-Based Model: SegFormer, integrating MLPs with transformers for efficient and scalable semantic segmentation in environmental analysis.
Why You Should Care: SegFormer’s network is optimized for the pixel-wise classification that segmentation requires. This focus allows SegFormer to excel in delineating various environmental features (e.g., water, trees, urban areas) with high precision.
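
Pixel-wise classification means the model outputs one score per class for every pixel, and the label map is the best-scoring class at each position. A minimal sketch of that last step, with made-up scores and a hypothetical three-class list; it does not call SegFormer itself:

```python
from collections import Counter

CLASSES = ["water", "trees", "urban"]  # hypothetical label set


def label_map(logits):
    """logits[r][c] is a list of per-class scores; return the winning class index per pixel."""
    return [[max(range(len(px)), key=px.__getitem__) for px in row] for row in logits]


def coverage(labels):
    """Fraction of pixels assigned to each class name."""
    counts = Counter(CLASSES[i] for row in labels for i in row)
    total = sum(counts.values())
    return {name: counts[name] / total for name in CLASSES}


# Made-up scores for a 2x2 image.
logits = [
    [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]],
    [[0.1, 0.2, 3.0], [0.4, 2.2, 0.9]],
]
labels = label_map(logits)
shares = coverage(labels)
```

The per-class shares are the kind of summary an environmental analysis would track over time, for example how much of a corridor is tree cover versus urban surface.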

11. Parking Space Availability Detection

General Algorithm: Object Detection, Image Classification
Yesterday’s Models: YOLOv4, CNNs with spatial transformer networks
Advanced Transformer-Based Model: DETR offers a streamlined approach for detecting available parking spaces, reducing the need for extensive preprocessing and postprocessing required by traditional CNN-based models.
Why You Should Care: DETR can “see” the entire image at once, so it can detect empty parking spaces in a single pass. This also makes it much easier to train DETR to recognize empty parking spots.

Conclusion

We need to evaluate the following tech more closely:

Transformers for Anomaly Detection (TransAD)
DETR (DEtection TRansformer)
Vision Transformer (ViT)

RED FLAG

Model selection can significantly impact data curation processes.