Feature extraction, table of contents

- 8.4.1. Introduction
- 8.4.2. Quick example
- 8.4.3. Spectral pre-processing
  - 8.4.3.1. Computation of spectral indices
- 8.4.4. perClass band extractors
- 8.4.5. Examples
  - 8.4.5.1. Define bands by clustering
  - 8.4.5.2. Defining band extraction pipeline
  - 8.4.5.3. Display band information
  - 8.4.5.4. LDA spectral feature extractor
  - 8.4.5.5. Defining bands manually

# 8.4.1. Introduction

This section describes feature extraction from spectra. A single spectrum is a 1D signal measuring, for example, the reflectivity of an object at different wavelengths. From the pattern recognition point of view, spectral data contain rich information useful for building material classifiers.

Spectra typically have tens or hundreds of wavelengths. The key point is that data at neighboring wavelengths are strongly correlated. perClass offers tools for extracting a lower-dimensional feature representation from spectral data.

# 8.4.2. Quick example

We will use a data set with spectra of French fries:

**>> a**
8609 by 103 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041)

Each of the 8609 measurements is represented by 103 narrow spectral wavelengths. There are four classes: rotten, greening, peel (potato skin) and healthy flesh.

We will now reduce the data dimensionality by extracting spectra-specific band features.

**>> b=sdextract(a,'bands','mean','size',10,'step',10)**
8609 by 10 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041)

We used the `sdextract` command to compute bands with the 'mean' extractor. The bands are defined by a simple sliding window over the spectral domain, with size 10 and step 10. The output `b` is a new data set with 10 features, each being the mean of 10 neighboring wavelengths.

Spectral feature extraction lowers data dimensionality by leveraging our prior knowledge of the wavelength ordering. We may now train a classifier, such as a probabilistic model, on the 10D data instead of the original 103D space.
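The minimal NumPy sketch below shows what the mean extractor with a sliding window computes. It is an illustration, not perClass code. The function name `sliding_window_band_means` is hypothetical. It also assumes that only windows lying entirely inside the spectrum are used. That assumption is consistent with 103 wavelengths yielding 10 features (windows starting at wavelengths 1, 11, ..., 91).

```python
import numpy as np


def sliding_window_band_means(X, size, step):
    """Mean of each window of `size` neighboring wavelengths, moving by `step`.

    Hypothetical helper, not part of perClass. X has shape
    (n_samples, n_wavelengths). Assumption: only windows that fit entirely
    inside the spectrum are kept, so trailing wavelengths may be left out.
    """
    n_wavelengths = X.shape[1]
    starts = range(0, n_wavelengths - size + 1, step)
    return np.column_stack([X[:, s:s + size].mean(axis=1) for s in starts])


# Synthetic stand-in for the French fries data: 5 spectra with 103 wavelengths.
rng = np.random.default_rng(0)
X = rng.random((5, 103))
B = sliding_window_band_means(X, size=10, step=10)  # 10 band features per spectrum
```

Each output column is an ordinary average, so the extractor needs no labels and no training.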

# 8.4.3. Spectral pre-processing

perClass provides a number of spectral pre-processing methods via the `sdprep` command. For example, it is possible to subtract the mean of each spectrum, or to divide the values at all spectral bands (features) by the value at a specific band.

In this example, we create a pre-processing pipeline subtracting a mean from each input spectrum:

**>> tr**
37967 by 103 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814)
**>> pn=sdprep(tr,'submean')**
Sample mean subtraction pipeline 103x103
**>> tr2=tr*pn**
37967 by 103 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814)
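Per spectrum, these pre-processing steps are simple row-wise operations. The NumPy sketch below illustrates mean subtraction and division by the value at one band on a tiny synthetic array. The function names are hypothetical. This is not how `sdprep` is implemented internally.

```python
import numpy as np


def subtract_sample_mean(X):
    """Subtract from each spectrum (row of X) its own mean over all wavelengths."""
    return X - X.mean(axis=1, keepdims=True)


def divide_by_band(X, band):
    """Divide every value of each spectrum by that spectrum's value at `band`.

    Hypothetical helper; `band` is 1-based, mirroring Matlab indexing.
    """
    return X / X[:, [band - 1]]


X = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])
Xc = subtract_sample_mean(X)  # rows become [-1, 0, 1] and [-10, 0, 10]
Xb = divide_by_band(X, 2)     # both rows become [0.5, 1, 1.5]
```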

Possible pre-processing steps:

- `submean` - subtract the sample mean
- `divsum` - divide by the sum of each sample
- `divmean` - divide by the mean of each sample
- `divband`,B - divide all sample values by the value at the specified band B. This is useful if band B exhibits low variability.
- `smooth` - smooth the 1D spectrum with a Gaussian filter in a sliding window from `-3*sigma` to `+3*sigma`.
- `der` - 1st Gaussian derivative in a sliding window from `-3*sigma` to `+3*sigma`.
- `kernel`,K - custom 1D kernel K applied to each spectrum

For the `smooth` and `der` procedures, the `sigma` option may be used to define a custom window size.
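The NumPy sketch below roughly illustrates the `smooth` and `der` steps. It builds a Gaussian kernel and its first derivative on the window from `-3*sigma` to `+3*sigma`, then convolves each spectrum with it.

- The function names are hypothetical.
- The zero padding at the spectrum borders is an assumption; perClass may treat the borders differently.
- Passing any other 1D kernel to `filter_spectra` corresponds to the idea behind the `kernel` option.

```python
import numpy as np


def gaussian_kernels(sigma):
    """Gaussian and first-derivative-of-Gaussian kernels on [-3*sigma, +3*sigma].

    Hypothetical helper, not part of perClass.
    """
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                 # smoothing kernel sums to one
    dg = -x / sigma**2 * g       # derivative of the Gaussian on the same window
    return g, dg


def filter_spectra(X, kernel):
    """Convolve each spectrum (row of X) with a 1D kernel, keeping its length.

    Assumption: np.convolve pads with zeros, so values close to either end
    of the spectrum are affected by the border.
    """
    return np.vstack([np.convolve(row, kernel, mode="same") for row in X])


g, dg = gaussian_kernels(sigma=2.0)
X = np.tile(np.linspace(0.0, 1.0, 103), (3, 1))  # three identical linear ramps
smoothed = filter_spectra(X, g)      # interior of a ramp is left unchanged
derivative = filter_spectra(X, dg)   # interior approximates the slope, 1/102 per sample
```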

## 8.4.3.1. Computation of spectral indices

Spectral indices, such as NDVI, may be computed with the `sdprep` command by specifying the index type and the bands A, B (and C) it uses:

- 'a-b',A,B
- 'a/b',A,B
- '(a-b)/(a+b)',A,B
- '(a+b)/(a-b)',A,B
- 'a/(b*c)',A,B,C
- 'a/(b-c)',A,B,C
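The NumPy sketch below evaluates these formulas on the selected bands of each spectrum. The names `INDEX_FORMULAS` and `spectral_index` are hypothetical. Band indices are taken as 1-based, following the Matlab convention of the `sdprep` calls (e.g. bands 10 and 47).

```python
import numpy as np

# The index formulas listed above, applied to band value arrays a, b (and c).
# Hypothetical mapping, not part of perClass.
INDEX_FORMULAS = {
    "a-b": lambda a, b: a - b,
    "a/b": lambda a, b: a / b,
    "(a-b)/(a+b)": lambda a, b: (a - b) / (a + b),
    "(a+b)/(a-b)": lambda a, b: (a + b) / (a - b),
    "a/(b*c)": lambda a, b, c: a / (b * c),
    "a/(b-c)": lambda a, b, c: a / (b - c),
}


def spectral_index(X, formula, *bands):
    """One index value per spectrum, returned as an (n_samples, 1) column.

    `bands` are 1-based band indices, mirroring the Matlab convention.
    """
    values = [X[:, band - 1] for band in bands]
    return INDEX_FORMULAS[formula](*values)[:, None]


X = np.ones((2, 103))
X[:, 9] = 3.0                                    # value at band 10
idx = spectral_index(X, "(a-b)/(a+b)", 10, 47)   # (3 - 1) / (3 + 1) = 0.5
```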

Example:

**>> p=sdprep(tr,'(a-b)/(a+b)',10,47)**
Divide by band pipeline 103x1
**>> out=tr*p**
37967 by 1 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814)
**>> sdfeatplot(out)**

With the `add` option, the computed index is appended after all input features.

**>> p1=sdprep(tr,'(a-b)/(a+b)',10,47,'add')**
Divide by band pipeline 103x104

To join multiple indices in a single pipeline, use horizontal concatenation:

**>> p1=sdprep(tr,'(a-b)/(a+b)',10,47)**
Divide by band pipeline 103x1
**>> p2=sdprep(tr,'(a-b)/(a+b)',30,100)**
Divide by band pipeline 103x1
**>> P=[p1 p2]**
stack pipeline 103x2 2 classifiers in 103D space
**>> out=tr*P**
37967 by 2 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814)
**>> sdscatter(out)**
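In terms of the resulting data, both the `add` option and the horizontal concatenation `[p1 p2]` amount to stacking feature columns. The NumPy sketch below mimics the two cases on synthetic spectra. `normalized_difference` is a hypothetical helper, not a perClass function.

```python
import numpy as np


def normalized_difference(X, band_a, band_b):
    """(a-b)/(a+b) index for 1-based bands, as an (n_samples, 1) column.

    Hypothetical helper, not part of perClass.
    """
    a, b = X[:, band_a - 1], X[:, band_b - 1]
    return ((a - b) / (a + b))[:, None]


rng = np.random.default_rng(0)
X = rng.random((6, 103)) + 0.1   # synthetic positive spectra with 103 bands

# Like the 'add' option: keep all input features and append the index (103 -> 104).
X_add = np.hstack([X, normalized_difference(X, 10, 47)])

# Like [p1 p2]: two indices side by side (103 -> 2).
out = np.hstack([normalized_difference(X, 10, 47),
                 normalized_difference(X, 30, 100)])
```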

# 8.4.4. perClass band extractors

perClass performs spectral band extraction with the `sdextract` and `sdbands` commands. While `sdextract` returns a new data set with the extracted features, `sdbands` returns a pipeline object that can be applied to new data or exported for execution outside Matlab.

perClass spectral band extraction separates the step of band definition from feature extraction. In the quick example above, we defined bands by fixing the band size to 10 and the step to 10. The feature extractor was the mean of the wavelength values within each band.

Alternatively, bands may be defined in two other ways:

- With the 'cluster' option, the `sdextract` or `sdbands` command clusters the spectral domain into a user-defined number of clusters.
- With the 'bands' option, we may define bands manually by wavelength indices.

At present, perClass supports two band feature extractors, namely 'mean' and 'LDA'. The mean extractor is unsupervised, so it is applicable to any data set, even one where all samples belong to a single class. The LDA feature extractor, on the other hand, uses the class labels and trains a Fisher projection for each band. The output dimensionality of each projection is the number of classes minus one. Therefore, we obtain a single output feature per band in a two-class problem and, for example, 5 features per band in a six-class problem.
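The output dimensionality rule above can be written down directly. This small Python sketch uses a hypothetical helper name. It assumes every band contains at least as many wavelengths as the number of classes minus one; otherwise a Fisher projection cannot yield C-1 independent directions.

```python
def band_output_dim(n_bands, extractor, n_classes=None):
    """Number of output features of a band extractor (hypothetical helper).

    Assumes each band has at least n_classes - 1 wavelengths for 'LDA'.
    """
    if extractor == "mean":
        return n_bands                      # one mean value per band
    if extractor == "LDA":
        return n_bands * (n_classes - 1)    # C-1 Fisher directions per band
    raise ValueError(f"unknown extractor: {extractor!r}")


print(band_output_dim(5, "LDA", n_classes=4))  # 15, as for the 4-class fries data
```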

# 8.4.5. Examples

## 8.4.5.1. Define bands by clustering

Define bands by clustering the spectral domain into 5 clusters:

**>> rand('state',1);**
**>> b=sdextract(a,'bands','mean','cluster',5)**
clustering wavelengths:done
8609 by 5 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041)

The resulting data set contains five features. Note that we first fixed the state of the random number generator. This allows us to repeat the same clustering later with identical results.
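This section does not state which clustering algorithm perClass applies to the wavelengths, and its result depends on the random seed. Purely for illustration, the NumPy sketch below swaps in a deterministic technique: contiguity-constrained agglomerative merging. It repeatedly joins the two adjacent bands whose mean profiles are most correlated, exploiting the strong correlation of neighboring wavelengths. `cluster_wavelengths` is a hypothetical name, and the data are synthetic.

```python
import numpy as np


def cluster_wavelengths(X, n_bands):
    """Split the wavelength axis of X into `n_bands` contiguous bands.

    Hypothetical stand-in, not perClass's algorithm: start with one band per
    wavelength and keep merging the pair of adjacent bands whose mean
    profiles (over samples) are most strongly correlated.
    Returns a list of 0-based index lists.
    """
    bands = [[j] for j in range(X.shape[1])]
    while len(bands) > n_bands:
        profiles = [X[:, b].mean(axis=1) for b in bands]
        scores = [np.corrcoef(profiles[i], profiles[i + 1])[0, 1]
                  for i in range(len(bands) - 1)]
        i = int(np.argmax(scores))
        bands[i:i + 2] = [bands[i] + bands[i + 1]]
    return bands


# Synthetic data: three groups of 8 wavelengths, each driven by its own factor.
rng = np.random.default_rng(1)   # fixed seed so the synthetic data are reproducible
latent = rng.normal(size=(200, 3))
X = np.hstack([latent[:, [k]] + 0.05 * rng.normal(size=(200, 8)) for k in range(3)])
bands = cluster_wavelengths(X, 3)
features = np.column_stack([X[:, b].mean(axis=1) for b in bands])  # mean extractor
```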

## 8.4.5.2. Defining band extraction pipeline

To create a band extraction pipeline, use the `sdbands` command:

**>> rand('state',1); p=sdbands(a,'mean','cluster',5)**
clustering wavelengths:done
Band extraction pipeline 103x5 5 bands,mean extractor

The pipeline `p` may now be applied to any data set containing spectra with 103 wavelengths:

**>> c=a*p**
8609 by 5 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041)

Like any other pipeline object, `p` may be exported for execution outside Matlab with `sdexport`.
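Conceptually, the pipeline stores the band definitions found on the training data and reapplies them to any new data with the same number of wavelengths. The class below is a hypothetical NumPy stand-in for that idea, not the perClass pipeline object. The band boundaries are those that `sdbands(p)` reports for this pipeline in section 8.4.5.3, converted to 0-based Python indices.

```python
import numpy as np


class MeanBandPipeline:
    """Hypothetical stand-in for a trained band extraction pipeline (mean extractor).

    It stores fixed band definitions, so the same bands can later be applied
    to any new data with the same number of wavelengths.
    """

    def __init__(self, bands, n_wavelengths):
        self.bands = [np.asarray(b) for b in bands]   # 0-based wavelength indices
        self.n_wavelengths = n_wavelengths

    def apply(self, X):
        if X.shape[1] != self.n_wavelengths:
            raise ValueError(
                f"expected {self.n_wavelengths} wavelengths, got {X.shape[1]}")
        return np.column_stack([X[:, b].mean(axis=1) for b in self.bands])


# Bands 1-15, 16-58, 59-74, 75-84, 85-103 (1-based, inclusive) in 0-based form.
bands = [range(0, 15), range(15, 58), range(58, 74), range(74, 84), range(84, 103)]
p = MeanBandPipeline(bands, n_wavelengths=103)

new_spectra = np.random.default_rng(0).random((7, 103))
c = p.apply(new_spectra)   # one mean feature per band: shape (7, 5)
```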

## 8.4.5.3. Display band information

Detailed information on a band extraction pipeline may be displayed by calling `sdbands` with the pipeline as its only argument:

**>> sdbands(p)**
103 input wavelengths, 5 bands, 5 output features
Mean feature extractor

| Band ind | Band name | Orig. wavelength low | Orig. wavelength high | Output dim |
|---|---|---|---|---|
| 1 | Band 1 | 1 | 15 | 1 |
| 2 | Band 2 | 16 | 58 | 1 |
| 3 | Band 3 | 59 | 74 | 1 |
| 4 | Band 4 | 75 | 84 | 1 |
| 5 | Band 5 | 85 | 103 | 1 |
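The low and high columns are 1-based, inclusive wavelength indices, as the manual band definition `{16:58 75:84}` in section 8.4.5.5 also shows. The short Python sketch below produces the same rows from 0-based, contiguous index ranges, which helps when comparing band definitions across Matlab and Python. `band_table` is a hypothetical function name.

```python
def band_table(bands, dims):
    """Rows (ind, name, low, high, dim) for 0-based contiguous band index ranges.

    Hypothetical helper; low/high are reported 1-based and inclusive.
    """
    return [(k, f"Band {k}", min(band) + 1, max(band) + 1, dim)
            for k, (band, dim) in enumerate(zip(bands, dims), start=1)]


bands = [range(0, 15), range(15, 58), range(58, 74), range(74, 84), range(84, 103)]
for row in band_table(bands, dims=[1] * 5):
    print("{:>3} {:<7} {:>4} {:>4} {:>4}".format(*row))
```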

## 8.4.5.4. LDA spectral feature extractor

The supervised LDA feature extractor trains a Fisher projection for each band. It projects the input data onto the subspace that maximizes class separation.

**>> rand('state',1); p=sdbands(a,'LDA','cluster',5)**
clustering wavelengths:done
Band extraction pipeline 103x15 5 bands,LDA extractor
**>> sdbands(p)**
103 input wavelengths, 5 bands, 15 output features
LDA feature extractor

| Band ind | Band name | Orig. wavelength low | Orig. wavelength high | Output dim |
|---|---|---|---|---|
| 1 | Band 1 | 1 | 15 | 3 |
| 2 | Band 2 | 16 | 58 | 3 |
| 3 | Band 3 | 59 | 74 | 3 |
| 4 | Band 4 | 75 | 84 | 3 |
| 5 | Band 5 | 85 | 103 | 3 |

**>> d=a*p**
8609 by 15 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041)

In our example, each band (wavelength group) yields 3 output features because the data set `a` contains four classes (4 - 1 = 3).
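The NumPy sketch below illustrates per-band Fisher LDA. For each band it computes the within-class scatter `Sw` and the between-class scatter `Sb`, then keeps the C-1 leading solutions of `Sb w = lambda Sw w`. This is a textbook formulation, not perClass's implementation. The small regularization `reg` and the function names are assumptions, and the 4-class data are synthetic. As in the pipeline above, each band contributes 4 - 1 = 3 features.

```python
import numpy as np


def fisher_projection(X, y, reg=1e-6):
    """Fisher LDA directions for data X (n_samples, d) with labels y.

    Hypothetical helper. Returns a (d, k) matrix with k = min(d, n_classes - 1)
    columns maximizing between-class relative to within-class scatter.
    """
    classes = np.unique(y)
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with respect to Sw, then take the leading eigenvectors of the
    # symmetric whitened Sb; this solves the problem Sb w = lambda Sw w.
    s, U = np.linalg.eigh(Sw + reg * np.eye(d))
    Sw_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    evals, V = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    k = min(d, len(classes) - 1)
    top = np.argsort(evals)[::-1][:k]
    return Sw_inv_sqrt @ V[:, top]


def lda_band_features(X, y, bands):
    """Train one Fisher projection per band and concatenate the outputs."""
    return np.hstack([X[:, b] @ fisher_projection(X[:, b], y) for b in bands])


# Synthetic 4-class data with 20 wavelengths split into two bands.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 50)
X = rng.normal(size=(200, 20)) + y[:, None] * np.linspace(0.0, 1.0, 20)
bands = [np.arange(0, 8), np.arange(8, 20)]
d_out = lda_band_features(X, y, bands)   # 2 bands x (4 - 1) = 6 features
```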

## 8.4.5.5. Defining bands manually

We may specify the bands manually using the 'bands' option. We need to provide the wavelength indices for each band in a cell array.

**>> rand('state',1); p=sdbands(a,'LDA','bands',{16:58 75:84})**
Band extraction pipeline 103x6 2 bands,LDA extractor
**>> sdbands(p)**
103 input wavelengths, 2 bands, 6 output features
LDA feature extractor

| Band ind | Band name | Orig. wavelength low | Orig. wavelength high | Output dim |
|---|---|---|---|---|
| 1 | Band 1 | 16 | 58 | 3 |
| 2 | Band 2 | 75 | 84 | 3 |

We have complete freedom in defining bands manually: they may overlap, or the same band may be included several times.
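Since a band is just a list of wavelength indices, overlapping or repeated bands pose no problem. The NumPy sketch below converts Matlab-style ranges such as `16:58` (1-based, inclusive) into 0-based indices. It then extracts mean features from an overlapping and duplicated band list. `matlab_range` is a hypothetical helper, and the spectra are synthetic.

```python
import numpy as np


def matlab_range(first, last):
    """0-based indices equivalent to the Matlab range first:last (hypothetical helper)."""
    return np.arange(first - 1, last)


bands = [matlab_range(16, 58), matlab_range(75, 84)]   # same as {16:58 75:84}

# Overlapping bands and a duplicated band are equally valid definitions.
overlapping = [matlab_range(16, 58), matlab_range(40, 84), matlab_range(16, 58)]

X = np.random.default_rng(0).random((4, 103))
F = np.column_stack([X[:, b].mean(axis=1) for b in overlapping])  # 3 mean features
```

A duplicated band simply produces an identical feature column. With the mean extractor this adds no information, but it is allowed.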