Feature Engineering for Image Tasks

Anum Hassan
2 min read · Dec 18, 2023


Object recognition means working with labeled images, where each image contains a single object and the model's job is to classify the image into the category that names that object.

Different computer vision tasks

1. Feature construction

Pixels as features

We will take the mean pixel value (MPV) across the channels and then flatten the values into a one-dimensional vector rather than a two-dimensional grid.
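
As a rough illustration, here is a minimal NumPy/Pillow sketch of this idea; the function name, image size, and file path are illustrative rather than the book's exact code:

```python
import numpy as np
from PIL import Image

def image_to_mpv_vector(path, size=(64, 64)):
    """Average the RGB channels into one plane and flatten it to a 1-D vector."""
    img = Image.open(path).convert("RGB").resize(size)
    arr = np.asarray(img, dtype=np.float32)   # shape (H, W, 3)
    mpv = arr.mean(axis=2)                    # mean pixel value across channels -> (H, W)
    return mpv.ravel()                        # flatten to a vector of length H * W

# features = image_to_mpv_vector("some_image.png")  # hypothetical file
```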

2. Feature extraction

Histogram of oriented gradients

HOG is a feature extraction technique most commonly used for object recognition tasks. HOG focuses on the shape of the object in the image by quantifying both the gradient magnitude and the orientation (direction) of the object's edges.

We then collect the HOG descriptors and concatenate them into a final one-dimensional feature vector that represents the entire image.
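
A hedged sketch using scikit-image's hog function; the image size and HOG parameters here are common defaults, not necessarily the ones used in the original example:

```python
from skimage import color, io, transform
from skimage.feature import hog

def image_to_hog_vector(path, size=(128, 128)):
    """Compute a HOG descriptor and return it as a flat 1-D feature vector."""
    img = io.imread(path)
    gray = color.rgb2gray(img[..., :3]) if img.ndim == 3 else img
    gray = transform.resize(gray, size)
    # 9 orientation bins, 8x8-pixel cells, 2x2-cell blocks are typical defaults
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```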

We can then reduce the dimensionality of these feature vectors with principal component analysis (PCA).
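
For example, with scikit-learn (the number of components and the stand-in data below are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a matrix of stacked HOG vectors, one row per image
# (in practice, X_hog would come from the HOG step above).
rng = np.random.default_rng(0)
X_hog = rng.random((500, 8100))        # 500 images, 8100-dimensional descriptors

pca = PCA(n_components=100)            # illustrative number of components
X_reduced = pca.fit_transform(X_hog)

print(X_reduced.shape)                         # (500, 100)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained
```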

3. Feature learning

Deep Learning model: VGG-11

The VGG family of architectures has two main sections in its networks: a feature learning section and a classifier. The feature learning section uses convolutional and pooling layers to learn useful representations of images. The classifier section maps these representations to a set of labels to perform the classification.
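
You can see the two sections by inspecting the torchvision implementation; no pretrained weights are needed just to print the layout:

```python
from torchvision import models

vgg = models.vgg11(weights=None)   # architecture only
print(vgg.features)                # convolutional + pooling layers: the feature learning section
print(vgg.classifier)              # fully connected layers: the classifier section
```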

Unlike a traditional fully connected neural network, a ConvNet arranges its neurons in three dimensions (width, height, and depth). Each layer of a ConvNet maps a 3D volume of inputs to a 3D volume of output activations.

Using a pretrained VGG-11 as a feature extractor

Wait, feature extractor? I thought this was the feature learning section. It is: we take the feature learning section of a VGG-11 model trained on the ImageNet database and use it to map our images to pretrained 512-length feature vectors.
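
A possible sketch with torchvision (≥ 0.13 weights API) and PyTorch; the preprocessing values are the standard ImageNet statistics, and global average pooling of the final feature maps down to 512 values is an assumption about how the 512-length vector is produced:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Keep only the convolutional feature-learning section of a pretrained VGG-11.
vgg = models.vgg11(weights=models.VGG11_Weights.IMAGENET1K_V1)
vgg.eval()
feature_extractor = vgg.features

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(path):
    """Map an image to a 512-length vector from the pretrained conv layers."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        fmap = feature_extractor(img)            # shape (1, 512, 7, 7)
    return fmap.mean(dim=[2, 3]).squeeze(0)      # global average pool -> (512,)
```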

Fine-tuning VGG-11

Let’s set up the arguments for our training loop to fine-tune the VGG-11 model; a code sketch follows the list. We will define our:

  • Loss function as cross-entropy loss, the standard choice for multiclass classification.
  • Optimizer as stochastic gradient descent, a popular optimizer for deep learning problems.
  • Number of epochs as 15 to save some time; a pretrained model should not need many epochs to fine-tune.
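
Putting those choices together, a minimal PyTorch sketch might look like this; NUM_CLASSES, the learning rate, and the data loader are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10   # illustrative; set to the number of labels in your dataset

# Start from ImageNet weights and replace the final classifier layer.
model = models.vgg11(weights=models.VGG11_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

loss_fn = nn.CrossEntropyLoss()                                         # multiclass loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # SGD optimizer
num_epochs = 15                                                         # few epochs for fine-tuning

def fine_tune(model, loader):
    """loader: a torch DataLoader yielding (images, labels) batches."""
    model.train()
    for epoch in range(num_epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```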

Summary

  • Image vectorization and text vectorization are both ways of converting raw unstructured data into structured, fixed-length feature vectors, which are required for ML.
  • Feature construction and extraction techniques, like MPV and HOG, provide a fast, simple baseline for image vectorization but will generally fall short of slower, more complex deep learning techniques.

References

Sinan Ozdemir, Feature Engineering Bookcamp (Manning Publications). https://learning.oreilly.com/library/view/feature-engineering-bookcamp/9781617299797/
