This notebook introduces new users to working with EASI notebooks and the Open Data Cube.
It will demonstrate the following basic functionality:
A notebook consists of cells that contain either text descriptions or python code for performing operations on data.
Shift
+Enter
.# Formatting for basic plots
%matplotlib inline
%config InlineBackend.rc = {}
import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = [12, 8]
# Formatting pandas table output
import pandas
pandas.set_option("display.max_rows", None)
# Datacube
import datacube
from datacube.utils import masking # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/masking.py
from odc.algo import enum_to_bool # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
from datacube.utils.rio import configure_s3_access
# Notebook helper tools (in dea_tools or in this repo)
import sys
try:
from dea_tools.plotting import display_map, rgb
except ImportError:
# Local copy of selected dea_tools
if 'tools/' not in sys.path:
sys.path.append('tools/')
from datacube_utils import display_map
rgb = None # Not copied or adapted yet
The Datacube()
API provides search, load and information functions for data products indexed in the EASI ODC database. For further information see https://datacube-core.readthedocs.io/en/latest/.
dc = datacube.Datacube()
# Optional: Access AWS "requester-pays" buckets
# This is necessary for Landsat ("landsatN_c2l2_*") and Sentinel-2 ("s2_l2a") products
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True);
Get all available products in the ODC database and list them along with selected properties.
products = dc.list_products() # Pandas DataFrame
products
name | description | license | default_crs | default_resolution | |
---|---|---|---|---|---|
name | |||||
copernicus_dem_30 | copernicus_dem_30 | Copernicus 30m Digital Elevation Model (GLO-30) | None | None | None |
landsat5_c2l2_sr | landsat5_c2l2_sr | Landsat 5 Collection 2 Level-2 Surface Reflect... | None | None | None |
landsat5_c2l2_st | landsat5_c2l2_st | Landsat 5 Collection 2 Level-2 UTM Surface Tem... | CC-BY-4.0 | None | None |
landsat7_c2l2_sr | landsat7_c2l2_sr | Landsat 7 USGS Collection 2 Surface Reflectanc... | None | None | None |
landsat7_c2l2_st | landsat7_c2l2_st | Landsat 7 Collection 2 Level-2 UTM Surface Tem... | CC-BY-4.0 | None | None |
landsat8_c2l1 | landsat8_c2l1 | Landsat 8 Collection 2 Level-1 (top of atmosph... | CC-BY-4.0 | None | None |
landsat8_c2l2_sr | landsat8_c2l2_sr | Landsat 8 Collection 2 Surface Reflectance, pr... | None | None | None |
landsat8_c2l2_st | landsat8_c2l2_st | Landsat 8 Collection 2 Level-2 UTM Surface Tem... | CC-BY-4.0 | None | None |
landsat9_c2l1 | landsat9_c2l1 | Landsat 9 Collection 2 Level-1 (top of atmosph... | CC-BY-4.0 | None | None |
landsat9_c2l2_sr | landsat9_c2l2_sr | Landsat 9 Collection 2 Surface Reflectance, pr... | CC-BY-4.0 | None | None |
landsat9_c2l2_st | landsat9_c2l2_st | Landsat 9 Collection 2 Level-2 UTM Surface Tem... | CC-BY-4.0 | None | None |
lpdaac_nasadem | lpdaac_nasadem | NASADEM Merged DEM Global 1 arc second V001 | None | None | None |
nasa_aqua_l2_oc | nasa_aqua_l2_oc | NASA MODIS-Aqua L2 Ocean Color, regridded to W... | None | EPSG:4326 | (0.01, 0.01) |
nasa_aqua_l2_sst | nasa_aqua_l2_sst | NASA MODIS-Aqua L2 Sea Surface Temperature, re... | None | EPSG:4326 | (0.01, 0.01) |
s2_l2a | s2_l2a | Sentinel-2a and Sentinel-2b imagery, processed... | None | None | None |
The data arrays available for each product are called "measurements".
measurements = dc.list_measurements() # Pandas DataFrame, all products
measurements.loc['landsat8_c2l2_sr']
name | dtype | units | nodata | aliases | flags_definition | |
---|---|---|---|---|---|---|
measurement | ||||||
coastal | coastal | uint16 | reflectance | 0 | [SR_B1, band_1, B1, coastal_aerosol] | NaN |
blue | blue | uint16 | reflectance | 0 | [SR_B2, band_2, B2] | NaN |
green | green | uint16 | reflectance | 0 | [SR_B3, band_3, B3] | NaN |
red | red | uint16 | reflectance | 0 | [SR_B4, band_4, B4] | NaN |
nir08 | nir08 | uint16 | reflectance | 0 | [SR_B5, band_5, B5, nir] | NaN |
swir16 | swir16 | uint16 | reflectance | 0 | [SR_B6, band_6, B6, swir1] | NaN |
swir22 | swir22 | uint16 | reflectance | 0 | [SR_B7, band_7, B7, swir2] | NaN |
qa_pixel | qa_pixel | uint16 | bit_index | 1 | [pixel_quality, level2_qa, QA_PIXEL, pixel_qa] | {'snow': {'bits': 5, 'values': {'0': 'not_high... |
qa_aerosol | qa_aerosol | uint8 | bit_index | 1 | [sr_aerosol, SR_QA_AEROSOL, aerosol_qa] | {'water': {'bits': 2, 'values': {'0': 'not_wat... |
qa_radsat | qa_radsat | uint16 | bit_index | 1 | [saturation_qa, QA_RADSAT, radsat_qa] | {'qa_radsat': {'bits': [0, 1, 2, 3, 4, 5, 6, 7... |
See the available latitude
/longitude
and time
ranges in the ODC Explorer.
latitude
/longitude
or time
ranges of the query below. # hub.asia.easi-eo.solutions - Lake Tempe, Indonesia
latitude = (-4.2, -3.9)
longitude = (119.8, 120.1)
display_map(longitude, latitude)
Load Landsat-8 surface reflectance data ("landsat8_c2l2_sr") for a given latitude, longitude and time range.
This may take a few minutes while the data are loaded into JupyterLab (so choose a small area and time range!).
The datacube.load()
function returns an xarray.Dataset
object.
The output_crs
and resolution
parameters are dependent on the product
chosen.
Further information on xarray
: http://xarray.pydata.org/en/stable/user-guide/data-structures.html
# A standard datacube.load() call
# Any SQLAlchemy warnings can be ignored.
data = dc.load(
product = 'landsat8_c2l2_sr',
latitude = latitude,
longitude = longitude,
time=('2020-02-01', '2020-04-01'),
output_crs="EPSG:32655", # Target CRS
resolution=(30, -30), # Target resolution
group_by='solar_day', # Group by time method
)
display(data)
<xarray.Dataset> Dimensions: (time: 4, y: 1286, x: 1291) Coordinates: * time (time) datetime64[ns] 2020-02-10T02:10:17.788921 ... 2020-03... * y (y) float64 -5.222e+05 -5.221e+05 ... -4.836e+05 -4.836e+05 * x (x) float64 -2.601e+06 -2.601e+06 ... -2.64e+06 -2.64e+06 spatial_ref int32 32655 Data variables: coastal (time, y, x) uint16 8203 7860 8172 8332 ... 11387 11498 11471 blue (time, y, x) uint16 8579 8101 8434 8662 ... 11459 11543 11542 green (time, y, x) uint16 9795 9583 9859 10183 ... 12887 12939 12823 red (time, y, x) uint16 9550 9115 9580 9974 ... 12978 13125 13015 nir08 (time, y, x) uint16 21181 19938 19005 ... 20020 20151 20301 swir16 (time, y, x) uint16 15521 14502 15444 ... 11572 11694 11635 swir22 (time, y, x) uint16 11529 10801 11726 ... 10941 11000 10982 qa_pixel (time, y, x) uint16 21824 21824 21824 ... 22280 22280 22280 qa_aerosol (time, y, x) uint8 160 160 160 160 160 ... 224 224 224 224 224 qa_radsat (time, y, x) uint16 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 Attributes: crs: EPSG:32655 grid_mapping: spatial_ref
Plot the Short-wave Infrared (SWIR) ("swir22"
) band data at each timestep.
The datacube.load()
function does not apply missing values or scaling attributes. These are left to the user's discretion and requirements.
# Xarray operations
band = 'swir22'
data[band].plot(col="time", robust=True, col_wrap=4);
Many Earth observation products include a "quality" array that can be used to filter the measurement arrays. For example, most quality layers include a "cloud" confidence quality flag that can be use to remove pixels affected by clouds from further analysis. In addition:
nodata
value that defines the "null" or "fill" value used in the array, and;More information on these techniques is covered in other notebooks. In particular, see the datasets/*ipynb
notebooks for specific examples.
Landsat data requires a scale factor to be applied to convert the data to physical reflectance or temperature values. Once converted, Landsat surface reflectance values will have numbers ranging from 0 to 1 and surface temperature values will be in the units of degrees Kelvin. The scaling factor is different for different Landsat "Collections" and it is different for the Surface Reflectance and Surface Temperature products. See https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products.
In the cell below we:
nodata
valueNote the exclusion of cloud (if present) and the change in the colourbar range from the previous figure.
# Choose bands for further processing
bands = ['red', 'green', 'blue', 'nir08', 'swir22']
# Make a mask array for the nodata value
valid_mask = masking.valid_data_mask(data[bands])
# Define the scaling values (landsat8_c2l2_sr)
scale_factor = 0.0000275
add_offset = -0.2
# Make a scaled data array
scaled_data = data[bands] * scale_factor + add_offset
# Make a cloud mask (landsat8_c2l2_sr)
from datacube.utils import masking
# Optional - Show the flag_definition information
# See also http://explorer.asia.easi-eo.solutions/products/landsat8_c2l2_sr#definition-doc
# display( masking.describe_variable_flags(data.qa_pixel) )
# Multiple flags are combined as logical AND (bitwise)
cloud_mask = masking.make_mask(data['qa_pixel'],
clear='clear',
)
# Apply each of the masks
filtered_data = scaled_data.where(valid_mask & cloud_mask)
filtered_data['swir22'].plot(col="time", robust=True, col_wrap=4);
As a simple example, we calculate the Normalized Difference Vegetation Index (NDVI).
# Calculate the NDVI
band_diff = filtered_data.nir08 - filtered_data.red
band_sum = filtered_data.nir08 + filtered_data.red
ndvi = band_diff / band_sum
# Plot the masked NDVI
ndvi.plot(col="time", robust=True, col_wrap=4, vmin=0, vmax=0.7, cmap='RdYlGn');
After processing the data we can then save the output to a file that can then be imported into other applications for further analysis or publication if required.
The file will be saved to your home directory and appear on the File Browser panel to the left. You may need to select the folder
icon to go to the top level (/home/jovyan/
).
Download a file by 'right-click' Download
.
# Xarray can save the data to a netCDF file
ndvi.time.attrs.pop('units', None) # Xarray re-applies this
ndvi.to_netcdf("/home/jovyan/landsat8_sr_ndvi.nc")
# Or export to geotiff using rioxarray.
import rioxarray
ndvi.isel(time=0).rio.to_raster("/home/jovyan/landsat8_sr_ndvi.tif") # Single time slices
This notebook introduced the main steps for querying data (with Datacube
), and filtering, plotting, calculating and saving a "cube" of data (with Xarray
).
There is plenty of detail and options to explore so please work through the other notebooks to learn more and refer back to these notebooks when required. We encourage you to create or bring your own notebooks, and adapt notebooks from other open-license repositories.
if rgb:
rgb(filtered_data.isel(time=2), ['red', 'green', 'blue'])
Recommended level: Basic Python knowledge and familiarity with array manipulations, numpy and xarray. Familiarity with some plotting libraries (e.g., matplotlib) would also help.
There are many options for learning Python from online resources to in-house or facilitated training. Some examples are offered here with no suggestion that EASI endorses any of them.
Background for selected libraries:
Recommended level: Familiarity with notebooks.
The JupyterLab website has excellent documentation including video instructions. We recommend users take a few minutes to orientate themselves with the use and features of JupyterLab.
Recommended level: Basic understanding of concepts such as clone, add, commit and push would help.
Git is a document version control system. It retains a full history of changes to all files (including deleted ones) by tracking incremental changes and recording a history timeline of changes. Changes you make append to the history timeline. Git allows you to copy ("clone") a repository, make changes to files, and "commit" and "push" these changes back to the source repository.
The best way to learn Git is by practice and incrementally; start with simple, common actions and gain more knowledge when required. Some useful Git links are: