Making color composite (“false color”) images

40. Making color composite (“false color”) images#

40.1. Definitions: “24 bit color”, “true color”, “false color”#

Up to now we have been showing images using “8 bit color”, where pixel values are scaled from 0 to 255 and mapped to a palette with 256 different color values. The finished images that we will create towards the end of this notebook will instead have three layers to drive the red, green and blue levels of a computer/phone display. Each layer is an 8 bit unsigned integer that can take on \(2^8\) = 256 different levels. The whole image therefore takes \(3 \times 8\) = 24 bits per pixel, or “24 bit color”. When the red, green and blue bands are mapped to red, green and blue levels, the image is called “true color”. When other bands are mapped to display red, green and blue levels the image is called “false color”

40.2. Introduction#

Color composite images are a useful tool in remote sensing. They map arbitrary image wavelengths to the primary colors (for example landsat 8 band 5 (near-ir, 0.86 microns) mapped to red, landsat 8 band 4 (red, 0.66 microns) mapped to blue, and landsat band 3 (green, 0.55 microns) mapped to blue). In such a “color ir” image, vegetation, which is very bright in the near-ir, will show up as bright red, while water, different types of soils, etc. will show up as other colors. Here are some examples of popular band combinations for the landsat 8 bands

Something to note: one point of confusion is that the red, green and blue bands for landsat 8 are label band 4, band 3 and band 2 respectively, so remember that an rgb landsat image is desiginated by the band combination 432.

In this notebook we will generate a “color ir” image from bands 5, 4, 3 using a clipped vancouver scene, reprocessing them into a false color composite. The final format needs to follow the “portable network graphics (png) truecolor” standard:

the pixels in each band have values between 0-255 (i.e. one 8 bit byte)
the bands are packed into 3 dimensional array with shape [3,nrows,ncols]
the band order needs to be red=index 0, blue=index 1 and green=index 2

We will also need to rewrite the images so that each band makes use of all the available 255 levels. This means we need to change the data values significantly, for the images below the reflectivities in bands 3 and 4 are below 0.2, which means if we just converted them to 0-255 we’d only be using 20% of the 255 levels. Fixing this problem by redistributing the data is called “stretching”. Below we’ll show how to do a “histogram stretch”. Since the underlying data is changed in the image, false color composites are a qualitative, rather than a quantitative tool.

40.2.1. Running this notebook#

Download the false_color.ipynb notebook from week7 in our gdrive folder
Download the folder downloads/vancouver/week7 containing the clipped tif files and place it in ~/repos/a301/satdata/landsat/vancouver

40.3. Read in the clipped image#

We use numpy.squeeze() to squeeze out the band dimension, so arrays with 3 dimensions (1,nrows,ncols) become 2 dimensional with shape (nrows,ncols)

import xarray
import rioxarray
from pathlib import Path
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns
from skimage import exposure, img_as_ubyte
from IPython.display import Image
from a301_lib import make_pal

data_dir = Path().home() / 'repos/a301/satdata/landsat/vancouver'
blue = list(data_dir.glob('**/*Blue*tif'))[0]
red = list(data_dir.glob('**/*Red*tif'))[0]
green = list(data_dir.glob('**/*Green*tif'))[0]
nir =  list(data_dir.glob('**/*NIR*tif'))[0]
band_dict = dict(zip(('red','green','nir','blue'),(red,green,nir,blue)))

xarray_dict={}
for key,filename in band_dict.items():
    print(key)
    xarray_dict[key] = rioxarray.open_rasterio(filename,mask_and_scale=True);
    xarray_dict[key] = xarray_dict[key].squeeze()

red
green
nir
blue

xarray_dict['nir'].plot.hist();

../../_images/ad627357fc48e94a64c06b6719336ebb104dd501bdf9507f9936e1ddcf97114d.png

vmin = 0.0
vmax = 0.5
pal_dict = make_pal(vmin=vmin,vmax=vmax)
fig, ax = plt.subplots(1,1)
var = 'nir'
xarray_dict[var].plot.imshow(ax=ax,
                    norm = pal_dict['norm'],
                    cmap = pal_dict['cmap'],
                    extend = "both")
ax.set(title=f"{xarray_dict[var].long_name}");

../../_images/b0a66c429a229434875178ffa309ac8724093d2919b8b145f52cf66b2a0e7696.png

40.3.1. Give the bands shorter names#

Makes it easier to type

b2, b3, b4, b5 = xarray_dict['blue'],xarray_dict['green'], xarray_dict['red'], xarray_dict['nir']

40.4. Plotting joint histograms#

Below we plot some joint histograms of using seaborn.jointplot. This is a very powerful way to look for correlations between datasets, showing the both the 2-dimensional histogram (darker colors mean more pixels in the bin) and the band histograms.

The plots show that bands 3 and 4 (green and red) have reflectivities below about 20%, while band 5 (near-ir) as expected is much brighter, with a few reflectivites up to about 50%. The images are much more interesting if every band has reflectivities ranging between 0 and 1. When we plot using a color palette and 0-255 color values (8 bit color) we use Normalize with vmin and vmax to stretch the colors between a minimum and maximum and ignore low and high values. This is called a “linear stretch”. There is a more sophisticated way to assign color levels by manipulating the probability density distributions of the pixels, as described in the next section.

fig = sns.jointplot(
    x=b3.data.ravel(),
    y=b4.data.ravel(),
    xlim=(0, 0.125),
    ylim=(0.0, 0.125),
    kind="hex",
    color="#4CB391",
    gridsize=100
)
fig.fig.suptitle("green (band 3) vs red (band 4)");
axes = fig.fig.get_axes()
axes[0].set(xlabel = "band 3 reflectivity",
            ylabel = "band 4 reflectivity");

../../_images/e6ac14213bbee5969b54d1df7cca938990ebddc826ea5ff44a4afc84361536ac.png

Note that the plot below looks funny because there are many water pixels with low reflectivity. We’ll mask these out in the next notebook

fig = sns.jointplot(
    x=b4.data.ravel(),
    y=b5.data.ravel(),
    kind="hex",
    xlim=(0, 0.125),
    ylim=(0.0, 0.5),
    color="#4CB391",
    gridsize=100
)
fig.fig.suptitle("red (band 4) vs near ir (band 5)");
axes = fig.fig.get_axes()
axes[0].set(xlabel = "band 4 reflectivity",
            ylabel = "band 5 reflectivity");

../../_images/e4ad69a235aadaebec52fcf8eb2dd829344da9ff320c4987fe64aafb2480c908.png

40.5. Histogram equalization#

Histogram equalization is a technique for changing the image data while preserving part of its character as represented in its histogram. It uses the cumulative distribution which is defined as the cumulative sum of the histogram of the image. That is, if the nuber of pixels with levels between \(l^\prime\) and \(l^\prime + dl^\prime\) is \(n(l^\prime)\,dl^\prime\), then the cumulative distribution \(N(l)\) is given by:

\[ N(l) = \int_0^l n(l^\prime) dl^\prime \]

In words, \(N(l)\) is the number of pixels with level less than \(l\)

The idea is to change (“stretch”) the histogram bins so that \(N(l) \approx\) constant for values of \(l\) that contain most of the pixels.

Here’s the cumulative distribution for the pdf for a gaussian distribution (bell curve):

x = np.random.randn(100000) # generate samples from normal distribution (discrete data)
fig, ax = plt.subplots(1,1)
#
# cumulative distribution N(l)
#
ax.hist(x,bins=200,density=True, cumulative=True,label="cumulative")
#
# probability density n(l)
#
ax.hist(x,bins=200,density=True,label = "pdf")
ax.set(title="gaussian pdf n(l) and cumulative distribution N(l)",
       xlabel="pixel level",
       ylabel = "n(l) (#pixels/(bin width)) and N(l) (#pixels with level < l)")
ax.legend(loc="best");

../../_images/8b35b9eafd57b0a645580b17b100112e64ba45a43c8f8a833660fcdaa7efb64d.png

For this fake image, almost all the values are between -2 and 2, but the whole data range goes between -4 - 4. If we use the blue curve to map xaxis values between -2 and 2 to y axis values between 0.1-0.99, we are “stretching” the data to fill a range that is proportial to how many of these values actually occur in the image. Values between -4 and -4 are still represented, but they only get levels between about 1% of the levels after the stetch, instead of 25%.

40.5.1. Here’s the cumulative distribution for band 5#

If we stretch the band 5 values to this cumulative distribution, 50% of the data range (0-125) is going to get essentially 100% of the levels between 0-1.

from skimage import exposure
#
# want 256 levels from 0-255
#
bins=256
img_cdf, bins = exposure.cumulative_distribution(b5.data.ravel(), bins)
plt.plot(img_cdf)
plt.grid(True);

../../_images/ab1649d24fbc24ef30682d65544cd8db0d51889f60c7a4abe629cdfa356de58f.png

40.5.2. Stretching step 1: stretch the data in each band and save to a dictionary with the band name as key#

To do the histogram stretch we’ll use the scikit-image equalization module. It uses our boolean mask to ignore np.nan values when it calculates the distribution.

stretched_dict = dict()
keys = ['b2','b3','b4','b5']
images = [b2, b3, b4, b5]
for key, image in zip(keys,images):
    stretched_dict[key] = exposure.equalize_hist(image.data)

40.5.3. Before and after stretching#

Pretty dramatic difference

fig, (ax1, ax2) = plt.subplots(1,2)
ax1.imshow(b3);
ax2.imshow(stretched_dict['b3']);

../../_images/7b4cf380cce058ba662e0fa67ccf5e62d552668e98d41eadb2ae61aca3c29a3c.png

40.5.4. Stretching step 2: check the stretched histograms#

Redo the histogram plots to show the stretched distributions. Note that they still contain information about the correlation between bands, but the distributions are much broader than their unstretched versions. Also note that in each of the joint plot there are clusters of pixels that share similar values in the two bands. These clusters correspond to similar surface types (water, grass fields, golf courses, concrete pavement etc.)

fig = sns.jointplot(
    x=stretched_dict['b3'].ravel(),
    y=stretched_dict['b4'].ravel(),
    xlim=(0, 1),
    ylim=(0.0, 1),
    kind="hex",
    color="#4CB391"
)
fig.fig.suptitle("stretched green (band 3) vs red (band 4)");
axes = fig.fig.get_axes()
axes[0].set(xlabel = "stretched band 3 reflectivity",
            ylabel = "stretched band 4 reflectivity");

../../_images/13d650a7c82c3bfdd8e8e22015f7bc0ae69193efdc4e415b56d36c2749899f63.png

fig = sns.jointplot(
    x=stretched_dict['b4'].ravel(),
    y=stretched_dict['b5'].ravel(),
    xlim=(0, 1),
    ylim=(0.0, 1),
    kind="hex",
    color="#4CB391"
)
fig.fig.suptitle("stretched red (band 4) vs near ir (band 5)");
axes = fig.fig.get_axes()
axes[0].set(xlabel = "stretched band 4 reflectivity",
            ylabel = "stretched band 5 reflectivity");

../../_images/adcad920a2dc9aa43630fe128f30b79f194b6e72d1c4c0c4bf16ff717e0dd025.png

40.7. Repeat with the red, green, blue#

Try a true color image, note that simple stretching doesn’t really work to make photo-realistic picture

#
# just repeat replacing nir b5 with blue b2
#
nrows, ncols = b3.shape
band_values = np.empty([3, nrows, ncols], dtype=np.uint8)
for index, key in enumerate(['b4', 'b3', 'b2']):
    stretched = stretched_dict[key]
    band_values[index, :, :] = img_as_ubyte(stretched)

true_color=xarray.DataArray(band_values,coords=coords,
                            dims=dims,
                            attrs=attr_dict)
true_color.rio.write_crs(b3.rio.crs, inplace=True)
true_color.rio.write_transform(b3.rio.transform(), inplace=True);

40.7.1. bands 4,3,2 – looking strange#

true_color.plot.imshow(figsize=(6,9));

../../_images/0b3ab29acd7417979d4e6485659679066c71d146b4ca4c122f33a70009c1100a.png

40.7.2. compare with the Nasa browse image#

Question: what’s the difference between DataArray.plot.imshow and IPython.display.Image

browse_jpg = list(data_dir.glob("**/*vancouver*jpg"))[0]
Image(browse_jpg,width=300)

../../_images/8b801052e1c22c73c4fe1e830be8c9fb2bc60f966bac54c9b0639a38cb8b5497.jpg

40.8. Summary#

False color images have advantages and disadvantages. They make it easy to see subtle differences between image regions, but the cost is that the stretched histograms now don’t connect directly to the original radiances.

40.9. Assignment 5 part 1: create a false color image for your scene#

Due next Friday

Write a function that takes 3 tif files and returns a rioxarray false color image. Use it to make a band 5,4,3 false color png file of your scene

Making color composite (“false color”) images

Contents

40. Making color composite (“false color”) images#

40.1. Definitions: “24 bit color”, “true color”, “false color”#

40.2. Introduction#

40.2.1. Running this notebook#

40.3. Read in the clipped image#

40.3.1. Give the bands shorter names#

40.4. Plotting joint histograms#

40.5. Histogram equalization#

40.5.1. Here’s the cumulative distribution for band 5#

40.5.2. Stretching step 1: stretch the data in each band and save to a dictionary with the band name as key#

40.5.3. Before and after stretching#

40.5.4. Stretching step 2: check the stretched histograms#

40.6. Write the false color image into 3-dimensional array called “band_values”#

40.6.1. Note that we’ve lost our missing values#

40.6.2. Stretched band 5 (nir)#

40.6.3. Create the dataArray#

40.6.4. Add attributes#

40.6.5. Create the falsecolor dataArray#

40.6.5.1. False color bands 5,4,3#

40.6.6. Save as a png file#

40.6.7. Read it back in to check#

40.7. Repeat with the red, green, blue#

40.7.1. bands 4,3,2 – looking strange#

40.7.2. compare with the Nasa browse image#

40.8. Summary#

40.9. Assignment 5 part 1: create a false color image for your scene#