Summary of Computer Vision: 2nd lecture (regions of images and segmentation, feature detection)
This lecture covers intermediate-level topics in computer vision: image segmentation, clustering with K-means, image compression, and feature detection and model fitting with RANSAC, the Hough transform, and the Harris corner detector. It builds on the previous lecture’s foundation of low-level vision tasks such as edge detection.
Main Ideas and Concepts
1. Review of Previous Lecture
- Computer vision defined from an AI/machine learning perspective.
- Differences between image processing, computer vision, and computer graphics.
- Vision system design challenges, including human perception vs. computer perception.
- Three levels of vision: low-level, intermediate-level, and high-level.
- Digital image representation: grayscale and RGB images as matrices.
- Noise and filtering using convolution (e.g., local average filter, Gaussian filter).
- Edge detection basics, Sobel operator, and Canny edge detector.
2. Regions of Images and Segmentation
- Segmentation Goal: Group pixels into coherent regions or objects based on similarity criteria.
- Often a preprocessing step for higher-level tasks like object recognition.
- Example: Segmenting cells (normal vs. cancerous) by binarization based on grayscale thresholding.
- Conversion of RGB images to grayscale by averaging R, G, B channels.
- Thresholding to create binary images for segmentation.
- Challenges with lighting and noise affecting segmentation quality.
- Segmentation in autonomous driving: separating cars, roads, sidewalks, etc.
- Classification of segments is a separate step after segmentation.
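The grayscale conversion and thresholding steps above can be sketched in a few lines. This is a minimal illustration using plain Python lists rather than an image library; the function names, the tiny example image, and the threshold of 128 are assumptions chosen for the example, not values from the lecture.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to grayscale by averaging the channels."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]


def binarize(gray_image, threshold):
    """Return a binary mask: 1 where a pixel is brighter than the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray_image]


# Hypothetical 2x3 image: dark background with two bright "cell" pixels.
rgb = [
    [(10, 10, 10), (200, 220, 210), (20, 30, 25)],
    [(15, 5, 10), (30, 20, 10), (240, 250, 230)],
]
gray = to_grayscale(rgb)
mask = binarize(gray, threshold=128)  # threshold picked by hand for this example
print(mask)
```

A fixed threshold like this only works when lighting is stable, which is the limitation the bullets above point out.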
Segmentation Approaches:
- Perceptual Grouping: Inspired by human perception, Gestalt theory (laws of proximity, similarity, closure, etc.).
- Merging Algorithms (Bottom-up): Start from pixels, merge neighboring pixels into larger segments.
- Splitting Algorithms (Top-down): Start from whole image, split into smaller uniform regions.
- Combination (split-and-merge on a quadtree): Split non-uniform regions into quadrants, then merge neighboring regions that are similar, and repeat.
- Histogram-based Segmentation: Use grayscale histograms to identify thresholds separating segments.
- Noise complicates histogram-based segmentation; filtering (median, Gaussian) may be needed before segmentation.
- Lighting variations significantly affect segmentation quality; controlled lighting preferred.
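As one way to pick a threshold from a grayscale histogram automatically, the sketch below uses Otsu's method, which chooses the threshold that maximizes the variance between the two resulting classes. Otsu's method is not named in the lecture summary; it is used here only as a common, concrete example of histogram-based thresholding, and the function names and sample image are assumptions.

```python
def histogram(gray_image, levels=256):
    """Count how many pixels take each integer gray level."""
    counts = [0] * levels
    for row in gray_image:
        for value in row:
            counts[int(value)] += 1
    return counts


def otsu_threshold(counts):
    """Return the level t that maximizes between-class variance (class 0: <= t)."""
    total = sum(counts)
    sum_all = sum(level * c for level, c in enumerate(counts))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t, c in enumerate(counts):
        w0 += c
        sum0 += t * c
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break     # no pixels left for the bright class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Hypothetical image with a dark group (20, 30) and a bright group (200, 210).
gray = [
    [20, 30, 20, 200],
    [30, 210, 200, 20],
]
threshold = otsu_threshold(histogram(gray))
mask = [[1 if v > threshold else 0 for v in row] for row in gray]
print(threshold, mask)
```

On a noisy image the histogram peaks overlap, so smoothing the image first (for example with a median or Gaussian filter) would typically come before this step.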
3. Clustering and K-means Algorithm
- K-means is an unsupervised clustering algorithm used to group data points into K clusters.
- Works by iteratively assigning points to the nearest cluster mean and updating cluster means until convergence.
- Random initialization makes results non-deterministic: different runs can converge to different local optima.
- Limitations: the number of clusters must be specified in advance, it is sensitive to outliers, and it assumes roughly spherical clusters.
- Compared with EM (Expectation Maximization) clustering, which can model elliptical clusters and is more flexible but also more complex.
- Application of K-means in image compression (color quantization):
  - Original images may have tens of thousands of unique colors.
  - K-means reduces these to a small palette (e.g., 64, 10, 5, or 2 colors), with each pixel replaced by its nearest cluster center.
  - This reduces image size while preserving visual quality better than a randomly chosen palette of the same size.
  - Implementation involves reshaping the image into a list of pixels, sampling pixels for training, and predicting cluster assignments for all pixels.
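The assign-and-update loop and the color-reduction pipeline described above can be sketched as follows. This is a plain-Python illustration, not the lecture's implementation (which may well have used a library); the function names, the fixed seed, the iteration cap, and the six-pixel example image are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Basic K-means: random initial centers, then alternate assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


def quantize(image, k, sample_size=1000, seed=0):
    """Reduce an image (rows of RGB tuples) to at most k colors."""
    pixels = [p for row in image for p in row]  # "reshape" to a flat pixel list
    rng = random.Random(seed)
    sample = rng.sample(pixels, min(sample_size, len(pixels)))  # training subset
    centers = kmeans(sample, k, seed=seed)
    # Predict: replace every pixel by its nearest cluster center.
    return [[centers[nearest(p, centers)] for p in row] for row in image]


# Hypothetical image with three reddish and three bluish pixels.
image = [
    [(250, 10, 10), (240, 20, 15), (10, 10, 250)],
    [(245, 5, 20), (15, 20, 240), (5, 15, 245)],
]
quantized = quantize(image, k=2)
palette = {color for row in quantized for color in row}
print(palette)
```

Training on a sample of pixels rather than on all of them keeps the clustering step cheap, while the final prediction step still covers every pixel.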
4. Feature Detection
- Features are distinctive local parts of an image (edges, corners, lines).
- Local features help in higher-level tasks like object recognition, image matching, and tracking.
- Challenges for feature detection:
  - Invariance to translation, rotation, scale.
  - Robustness to lighting changes, noise, blur.
- Types of features discussed:
  - Edges: Already covered (Sobel, Canny).
  - Straight Lines: Detected using Hough transform.
  - Corners: Detected using Harris corner detector.
5. RANSAC Algorithm
- A generic, iterative method to estimate model parameters from data with outliers.
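- Randomly selects a small subset of data points, fits a model to it, and counts the inliers, i.e. the points that lie within a tolerance of that model.
- Repeats for a number of iterations and keeps the model with the most inliers.
- Useful for line fitting and many other model-fitting tasks.
- Limitations: the result depends on the inlier tolerance and the iteration count, and the method becomes unreliable when outliers dominate the data, because few random subsets are then outlier-free.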
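For line fitting, the sample-fit-count loop above might look like the sketch below. It is a minimal illustration in plain Python; the function names, the iteration count, the tolerance of 0.5, and the synthetic data (ten points on y = 2x + 1 plus four outliers) are assumptions made for the example.

```python
import random


def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # identical points do not define a line
    return a / norm, b / norm, -(a * x1 + b * y1) / norm


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Keep the two-point line that has the most points within the tolerance."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        line = fit_line(*rng.sample(points, 2))  # minimal subset for a line
        if line is None:
            continue
        a, b, c = line
        # Because (a, b) is normalized, |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers


# Hypothetical data: 10 points on y = 2x + 1 and 4 outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(1, 9), (3, -4), (7, 2), (8, 30)]
line, inliers = ransac_line(points)
a, b, c = line
print("slope:", -a / b, "intercept:", -c / b, "inliers:", len(inliers))
```

In practice the winning model is often refit to all of its inliers (for example by least squares) as a final step; that refinement is omitted here to keep the sketch short.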
6. Hough Transform for Line Detection
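- Converts edge points from image space to parameter space (R, θ), where a line is described by R = x cos θ + y sin θ.
- Each edge point (x, y) corresponds to a sinusoidal curve in (R, θ) space.
- A point where many such curves intersect indicates a line in the original image; in practice the curves vote in a discretized accumulator and the peaks are taken as lines.
- Robust to noise and to partial or broken lines, since each edge point votes independently.
- Computationally expensive, since every edge point votes once per θ bin; various optimizations exist.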
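The voting scheme can be sketched with a dictionary as the accumulator. This is an illustrative plain-Python version, not an optimized one; the function name, the bin sizes (1 degree for θ, 1 pixel for R), and the example points are assumptions.

```python
import math
from collections import Counter


def hough_lines(edge_points, theta_steps=180, r_step=1.0):
    """Vote in (R, theta) space; each key is (R bin, theta bin index)."""
    votes = Counter()
    for x, y in edge_points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps  # theta in [0, pi)
            r = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(r / r_step), k)] += 1
    return votes


# Hypothetical edge points: 20 samples of the horizontal line y = 5, plus 3 stray points.
points = [(x, 5) for x in range(0, 100, 5)] + [(3, 40), (60, 70), (88, 12)]
votes = hough_lines(points)
(r_bin, theta_bin), count = votes.most_common(1)[0]
print("R =", r_bin, "theta =", theta_bin, "degrees, votes =", count)
```

The strongest accumulator cell corresponds to θ = 90 degrees and R = 5, which is the line y = 5; the stray points spread their votes over many cells and do not disturb the peak.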
7. Harris Corner Detector
- Detects corners by analyzing changes in image intensity when shifting a small window.
- Uses Taylor expansion to approximate intensity changes.
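- Constructs a 2x2 matrix (M) from products of the image gradients, summed over the window.
- Corner response R is computed from the eigenvalues (λ1, λ2) of M, commonly as R = det(M) - k * trace(M)^2 = λ1 λ2 - k (λ1 + λ2)^2, where k is a small empirical constant:
  - Large λ1 and λ2 → corner.
  - One large and one small eigenvalue → edge.
  - Both small → flat region.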
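The response computation can be sketched directly from the gradient products. This is a simplified plain-Python illustration: it uses central differences for the gradients and an unweighted 3x3 window, whereas real implementations typically use Sobel gradients and Gaussian weighting. The function name, k = 0.05, and the synthetic test image are assumptions.

```python
def harris_response(image, k=0.05, radius=1):
    """Corner response det(M) - k * trace(M)^2 at every pixel of a 2D list image."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2
    response = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Entries of M: gradient products summed over the window.
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


# Hypothetical 8x8 image: a bright square whose top-left corner is at row 4, column 4.
size = 8
image = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(size)] for y in range(size)]
response = harris_response(image)
corner = max(
    ((y, x) for y in range(size) for x in range(size)),
    key=lambda p: response[p[0]][p[1]],
)
print("strongest corner at (row, col):", corner)
print("edge response:", response[6][4], "flat response:", response[1][1])
```

The response is positive at the corner, negative along the straight edge of the square, and zero in the flat region, matching the three eigenvalue cases listed above.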