Summary of Computer Vision: 2nd lecture (regions of images and segmentation, feature detection)

This lecture covers intermediate-level topics in computer vision: image segmentation, clustering with K-means and its use in image compression, and feature detection and model fitting with RANSAC, the Hough transform, and the Harris corner detector. It builds on the previous lecture’s foundation of low-level vision tasks such as edge detection.

Main Ideas and Concepts

1. Review of Previous Lecture

Computer vision defined from an AI/machine learning perspective.
Differences between image processing, computer vision, and computer graphics.
Vision system design challenges, including human perception vs. computer perception.
Three levels of vision: low-level, intermediate-level, and high-level.
Digital image representation: grayscale and RGB images as matrices.
Noise and filtering using convolution (e.g., local average filter, Gaussian filter).
Edge detection basics, Sobel operator, and Canny edge detector.

2. Regions of Images and Segmentation

Segmentation Goal: Group pixels into coherent regions or objects based on similarity criteria.
Often a preprocessing step for higher-level tasks like object recognition.
Example: Segmenting cells (normal vs. cancerous) by binarization based on grayscale thresholding.
Conversion of RGB images to grayscale by averaging R, G, B channels.
Thresholding to create binary images for segmentation.
Challenges with lighting and noise affecting segmentation quality.
Segmentation in autonomous driving: separating cars, roads, sidewalks, etc.
Classification of segments is a separate step after segmentation.
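
The grayscale conversion and thresholding steps above can be sketched as follows. This is a minimal illustration that uses plain Python lists in place of an image library; the function names, the tiny sample image, and the threshold of 128 are assumptions made for the example, not values from the lecture.

```python
def to_grayscale(rgb_image):
    """Average the R, G, B channels of each pixel (the simple scheme described above)."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]


def binarize(gray_image, threshold):
    """Return a binary image: 1 where intensity exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray_image]


# Hypothetical 2x3 RGB image: dark background with two bright "cell" pixels.
rgb = [
    [(10, 10, 10), (200, 220, 210), (12, 8, 10)],
    [(15, 5, 10), (9, 9, 9), (240, 250, 230)],
]
gray = to_grayscale(rgb)
mask = binarize(gray, threshold=128)  # assumed threshold, chosen by hand
```

A fixed threshold like this only works when lighting is consistent across the image, which is the sensitivity noted above.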

Segmentation Approaches:

Perceptual Grouping: Inspired by human perception, Gestalt theory (laws of proximity, similarity, closure, etc.).
Merging Algorithms (Bottom-up): Start from pixels, merge neighboring pixels into larger segments.
Splitting Algorithms (Top-down): Start from whole image, split into smaller uniform regions.
Combination (Quadtree): Recursively split non-uniform regions into four quadrants, then merge adjacent regions that are similar.
Histogram-based Segmentation: Use grayscale histograms to identify thresholds separating segments.
Noise complicates histogram-based segmentation; filtering (median, Gaussian) may be needed before segmentation.
Lighting variations significantly affect segmentation quality; controlled lighting preferred.
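
One way to pick a threshold from a grayscale histogram is sketched below. It uses Otsu's method, which the summary does not mention and which is swapped in here only as a well-known example of histogram-based threshold selection; the function names and the sample image are assumptions.

```python
def histogram(gray_image, levels=256):
    """Count how many pixels fall on each integer gray level."""
    counts = [0] * levels
    for row in gray_image:
        for value in row:
            counts[int(value)] += 1
    return counts


def otsu_threshold(counts):
    """Return the level t that maximizes between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    """
    total = sum(counts)
    sum_all = sum(level * c for level, c in enumerate(counts))
    best_t, best_score = 0, -1.0
    weight0, sum0 = 0, 0
    for t, c in enumerate(counts):
        weight0 += c
        sum0 += t * c
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty, no valid split here
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        score = weight0 * weight1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


# Hypothetical bimodal image: a dark row and a bright row.
gray = [[10, 12, 11], [200, 210, 205]]
threshold = otsu_threshold(histogram(gray))
```

On a noisy image the two histogram peaks overlap, which is why filtering before this step is suggested above.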

3. Clustering and K-means Algorithm

K-means is an unsupervised clustering algorithm used to group data points into K clusters.
Works by iteratively assigning points to the nearest cluster mean and updating cluster means until convergence.
Random initialization makes results non-deterministic: different runs can converge to different local optima.
Limitations: must specify number of clusters, sensitive to outliers, assumes spherical clusters.
Compared with EM (Expectation Maximization) clustering: EM can model elliptical clusters and is more flexible, but it is also more complex.
Application of K-means in image compression:
- Original images may have tens of thousands of unique colors.
- K-means reduces colors to a smaller set (e.g., 64, 10, 5, or 2 colors).
- This reduces image size while maintaining visual quality better than random color selection.
- Implementation involves reshaping image data, sampling pixels for training, and predicting cluster assignments for all pixels.
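
The assign-and-update loop and the compression steps listed above can be sketched as follows. This is a dependency-free illustration, not the lecture's implementation, which is not shown in this summary; the function names, the fixed seed, the iteration cap, and the 2x2 sample image are all assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)        # fixed seed so this sketch is repeatable
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[i])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


def quantize(image, k, sample_size=1000, seed=0):
    """Replace every pixel with the nearest of k learned colors."""
    pixels = [px for row in image for px in row]  # flatten to a list of colors
    sample = random.Random(seed).sample(pixels, min(sample_size, len(pixels)))
    centers = kmeans(sample, k, seed=seed)        # train on the sample only
    return [[centers[nearest(px, centers)] for px in row] for row in image]


# Hypothetical 2x2 image: two reddish and two bluish pixels.
image = [
    [(250, 0, 0), (255, 5, 0)],
    [(0, 0, 250), (5, 0, 255)],
]
compressed = quantize(image, k=2)
```

Training on a sample of pixels and then assigning all pixels keeps the clustering cost low for large images, at the price of possibly missing rare colors.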

4. Feature Detection

Features are distinctive local parts of an image (edges, corners, lines).
Local features help in higher-level tasks like object recognition, image matching, and tracking.
Challenges for feature detection:
- Invariance to translation, rotation, scale.
- Robustness to lighting changes, noise, blur.
Types of features discussed:
- Edges: Already covered (Sobel, Canny).
- Straight Lines: Detected using Hough transform.
- Corners: Detected using Harris corner detector.

5. RANSAC Algorithm

A generic, iterative method to estimate model parameters from data with outliers.
Randomly selects subsets of data points, fits a model, counts inliers fitting the model.
Repeats for a set number of iterations and keeps the model with the most inliers.
Useful for line fitting and many other model-fitting tasks.
Limitations: becomes unreliable when outliers dominate the data, because a random sample is then unlikely to contain only inliers.
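
The sample, fit, and count-inliers loop can be sketched for line fitting as follows. This is a minimal illustration under assumed settings: the function names, the iteration count, the inlier tolerance, the fixed seed, and the sample data are not from the lecture.

```python
import random


def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # the two points coincide, no line defined
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    rng = random.Random(seed)  # fixed seed so this sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample: two points define a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # With a^2 + b^2 = 1, |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for (x, y) in points if abs(a * x + b * y + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus three outliers.
line_points = [(x, 2 * x + 1) for x in range(10)]
outliers = [(3, 20), (7, -5), (1, 12)]
model, inliers = ransac_line(line_points + outliers)
```

In practice the best model is often refit to all of its inliers (for example by least squares) as a final step; that refinement is omitted here.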

6. Hough Transform for Line Detection

Converts edge points from image space to parameter space (R, θ), where a line is written as R = x·cos θ + y·sin θ.
Each edge point (x, y) corresponds to a sinusoidal curve in (R, θ) space.
A point in (R, θ) space where many such curves intersect corresponds to a line in the original image.
Robust to noise and to broken or partially hidden lines.
Computationally expensive; various optimizations exist.
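
The voting procedure can be sketched as follows. This is a minimal illustration with assumed details: the function names, the 1-degree angular resolution, the 1-pixel distance resolution, and the synthetic edge points are choices made for the example. The lowercase `r` in the code plays the role of R above.

```python
import math


def hough_lines(edge_points, width, height, theta_steps=180):
    """Vote in an (r, theta) accumulator; return it with the offset used for r."""
    max_r = int(math.ceil(math.hypot(width, height)))
    # accumulator[theta_index][r + max_r] holds the votes; r may be negative.
    accumulator = [[0] * (2 * max_r + 1) for _ in range(theta_steps)]
    for x, y in edge_points:
        # Each edge point traces a sinusoid r = x*cos(theta) + y*sin(theta).
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            r = int(round(x * math.cos(theta) + y * math.sin(theta)))
            accumulator[t][r + max_r] += 1
    return accumulator, max_r


def strongest_line(accumulator, max_r, theta_steps=180):
    """Return (r, theta in degrees, votes) for the accumulator cell with most votes."""
    votes, t, r_index = max(
        (v, t, i) for t, row in enumerate(accumulator) for i, v in enumerate(row)
    )
    return r_index - max_r, math.degrees(math.pi * t / theta_steps), votes


# Hypothetical edge map: a horizontal line y = 5 across a 100x20 image.
edges = [(x, 5) for x in range(100)]
accumulator, max_r = hough_lines(edges, width=100, height=20)
r, theta_deg, votes = strongest_line(accumulator, max_r)
```

The nested loop over every edge point and every angle is where the computational cost mentioned above comes from.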

7. Harris Corner Detector

Detects corners by analyzing changes in image intensity when shifting a small window.
Uses Taylor expansion to approximate intensity changes.
Constructs a 2x2 matrix (M) from products of the image gradients, summed over the window.
Corner response R is computed using eigenvalues (λ1, λ2) of M:
- Large λ1 and λ2 → corner.
- One large and one small eigenvalue → edge.
- Both small → flat region.
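
The summary does not spell out the response formula. A commonly used form is R = det(M) - k·trace(M)^2 = λ1·λ2 - k·(λ1 + λ2)^2, where k is an empirical constant typically around 0.04 to 0.06; this avoids computing the eigenvalues explicitly. The sketch below applies it under simplifying assumptions: central-difference gradients, an unweighted 3x3 window instead of a Gaussian one, and a synthetic image. The function name and parameters are hypothetical.

```python
def harris_response(image, k=0.04, window=1):
    """Harris response R = det(M) - k * trace(M)^2 at each pixel away from the border."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2
    response = [[0.0] * w for _ in range(h)]
    for y in range(window, h - window):
        for x in range(window, w - window):
            # Entries of M: gradient products summed over the window.
            sxx = sxy = syy = 0.0
            for dy in range(-window, window + 1):
                for dx in range(-window, window + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


# Hypothetical 10x10 image: a bright square filling the lower-right quadrant.
image = [[1.0 if x >= 5 and y >= 5 else 0.0 for x in range(10)] for y in range(10)]
response = harris_response(image)
corner_r = response[5][5]  # at the corner of the square
edge_r = response[5][7]    # on the square's top edge
flat_r = response[2][2]    # in the uniform background
```

With this formula the three eigenvalue cases map to the sign of R: positive at the corner, negative along the edge, and near zero in the flat region.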