AI for QC: Landsat and roadmap towards Sentinel-2

VH-RODA Workshop 2021 | 20-23 April 2021 | Slide 1

VH-RODA 2021 online workshop

AI for QC: Landsat and roadmap towards Sentinel-2

Kevin Halsall



Contents

▪ Introduction

▪ Background

▪ Data and Tooling

▪ Machine Learning model development for Landsat

▪ Supervised

▪ Semi-supervised

▪ Future Development Roadmap – towards Sentinel-2



Introduction

Kevin Halsall

Project Manager & Ease QC Product Owner, Telespazio UK

Telespazio UK

15+ years' experience performing QC assessments on EO data

Prime contractor for the IDEAS-QA4EO service for ESRIN, assessing ESA’s EO data

Ease QC

Internal programme within Telespazio UK applying Machine Learning techniques to EO data QC



Background

▪ Traditional EO Data Quality Control activities consist of:

▪ Automated checks (applied to whole dataset)

▪ Detailed human observations (subset of data)

▪ EO data volumes increasing year on year

▪ More satellites

▪ More complex data

▪ Funding and resources cannot increase at the same pace

▪ Could ML techniques support QC assessments to keep up with increases and/or improve upon existing assessment activities?



Data & Tooling

▪ Development of an ML model can be idealised as a five-step process

▪ Data Preparation is often the most effort-intensive step

▪ If you want a good model you need a lot of good data

1 Get Data → 2 Clean, Prepare & Manipulate Data → 3 Train Model → 4 Test Data → 5 Improve



Data & Tooling

▪ Ease QC activities initially applied to Landsat data

▪ Driven by a need for lots of clearly labelled data

▪ QA4EO service undertaking assessment of Landsat 1-5 reprocessed data

▪ 600,000+ data products

▪ Allowed Ease QC team to source suitable, trustworthy, labelled data for training in-house

▪ Sourcing appropriate data can often be a major issue

▪ Much of the effort of this phase went to the development of a tool

▪ Support the labelling of data and definition of training datasets

▪ Integrate the activities of the QC engineers and ML developers



Data & Tooling

▪ “QCOLT” software application developed to support the project; enables:

▪ QC engineers to assess and label data

▪ Included features to make QC assessments more efficient

▪ ML developers to:

▪ Define and select suitable training datasets based on assessments

▪ View results of models once applied to the data

▪ QC engineers can assess flagged data

▪ ML models can be re-developed based on new assessments

▪ Tightly integrated process

▪ Does not itself do any ‘machine learning’

▪ Supports all elements of the cycle

“Quality Control Optical Learning Tool”



Data & Tooling

▪ Customised GUI permitting:

▪ visual inspection of the data

▪ inspection of metadata & automated QC check information

▪ anomaly assignment

“Quality Control Optical Learning Tool”



Machine Learning Model Development

▪ First proof of concept ML activity focused on ‘Supervised’ model development

▪ Model trained to detect a single anomaly type

▪ Anomaly criteria:

▪ #1: Visible in the product image

▪ #2: Deterministic detection unfeasible

▪ Convolutional Neural Network type used

▪ Training data consists of examples of the anomaly and those of ‘good’ data

▪ Only limited number of anomalous examples

▪ ‘Chipping’ technique employed to increase data sample size

▪ 25 positive products → 3,686 synthesised data products

Supervised models

“Scan start” anomaly
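The slides do not say how the ‘chipping’ was implemented. As a rough illustration only, the sketch below assumes chipping means cutting overlapping square tiles out of each product image with a sliding window, so that a handful of anomalous products yields many training samples. The function name `chip_image`, the chip size and the stride are all hypothetical.

```python
def chip_image(image, chip_size, stride):
    """Cut a 2-D image (a list of pixel rows) into overlapping square chips.

    Returns a list of ((top, left), chip) pairs so that each chip can be
    traced back to its position in the parent product.
    """
    rows, cols = len(image), len(image[0])
    chips = []
    for top in range(0, rows - chip_size + 1, stride):
        for left in range(0, cols - chip_size + 1, stride):
            chip = [row[left:left + chip_size]
                    for row in image[top:top + chip_size]]
            chips.append(((top, left), chip))
    return chips


# Toy 6x6 "image" whose pixel values encode their own position (row*10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(6)]

# Assumed values: 4x4 chips, moved 2 pixels at a time.
chips = chip_image(image, chip_size=4, stride=2)
print(len(chips), "chips from one image")
```

A smaller stride gives more (and more heavily overlapping) chips per product, which is one way a small set of positive examples could be expanded by two orders of magnitude.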




Supervised models results

▪ Completed model infers a ‘soft classifier’ for each product assessed

▪ % probability of a product having the anomaly (not True/False)

▪ Scan Start anomaly model was run over 39,000 Landsat-3 products

▪ The 5% and 95% probability levels were selected to define 3 confidence levels

▪ 0-5% - anomaly free

▪ 95-100% - anomaly detected

▪ (5%-95%) classified as ‘undecided’

▪ Active Learning iterations

▪ Data reassessed

▪ Model retrained & improved

Active Learning
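The three confidence levels can be sketched as a simple triage over the model's output probabilities. This is a minimal illustration, not the project's code: the function names and product identifiers are hypothetical, and it assumes the boundary values 5% and 95% fall into the ‘anomaly free’ and ‘anomaly detected’ bands respectively. The ‘undecided’ bucket is what an active-learning iteration would hand back to the QC engineers for reassessment.

```python
def confidence_level(probability, low=0.05, high=0.95):
    """Map a soft-classifier probability onto one of three confidence levels."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability <= low:
        return "anomaly free"
    if probability >= high:
        return "anomaly detected"
    return "undecided"


def triage(scores, low=0.05, high=0.95):
    """Group product identifiers by confidence level."""
    buckets = {"anomaly free": [], "anomaly detected": [], "undecided": []}
    for product_id, probability in scores.items():
        buckets[confidence_level(probability, low, high)].append(product_id)
    return buckets


# Hypothetical model outputs: product id -> probability of the anomaly.
scores = {"product_a": 0.01, "product_b": 0.50,
          "product_c": 0.99, "product_d": 0.05}
buckets = triage(scores)

# Products a QC engineer would reassess before the model is retrained.
print(buckets["undecided"])
```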




Semi-supervised “model”

▪ Model is trained to recognise ‘normal’ data & detect anomalous data

▪ Potentially able to detect multiple types of anomaly

▪ Anomalies detected by the model are not classified

▪ Require further identification by the QC engineer

▪ Much more complex than a supervised model

▪ Sub-divided into two separate workflows










▪ Workflow A: Complexity reduction

▪ Subdivide dataset into clusters

▪ Uses pre-existing ML models

▪ Resnet50

▪ K-means
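The slide names ResNet50 and K-means but not how they are combined. One plausible reading, assumed here, is that ResNet50 turns each image into a feature vector and K-means groups those vectors into clusters. The sketch below shows only the clustering half, in plain Python, on toy two-dimensional vectors that stand in for real image features; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means: returns (cluster label per vector, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialise from k distinct samples
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy stand-ins for image feature vectors: two well-separated groups.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Each resulting cluster could then be handled separately by a downstream anomaly detector, which matches the per-cluster approach described for Workflow B.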











▪ Workflow B: Binary Classifier

▪ Detects anomalies in each cluster separately

▪ Convolutional Neural Network Auto-Encoder

▪ Support Vector Machine





Semi-supervised model results

▪ Promising preliminary results

▪ Highly dependent upon the content of the training data sets

▪ Good performance achieved on anomalies highly represented in the training data

▪ Model demonstrated success at detecting ‘Scan start’ anomalies

▪ Supervised model enabled identification of 20k+ examples

▪ Less success with less well represented anomalies

▪ Improving the performance of the model will require:

▪ Obtaining more labelled data

▪ Redevelopment of the training network and model architecture



Development Roadmap

▪ Focus on Landsat driven by availability of accurately labelled training data

▪ Similarities between Landsat and Sentinel-2 to be explored

▪ More data for training

▪ One model applicable to both (+ other similar instruments)

▪ Updates required:

▪ Existing models/processes need to be modified for new data

▪ Need to access a sufficient amount of training data (in the cloud, e.g. DIAS)

▪ Training data needs to be appropriately labelled

▪ Incorporate external QC analyses into ML development process

▪ Other improvements

▪ Incorporate full band data

▪ Assessments so far performed on RGB data to reduce complexity



Summary

▪ The quality and quantity of the training data is extremely important

▪ Required in sufficient volumes

▪ Requires accurate labelling

▪ Effort expended in preparing the data cannot be overstated

▪ Techniques/tools to support this part of the process very important

▪ Supervised model success demonstrates feasibility of using ML for EO data QC

▪ Dedicated effort required for detection of single anomaly type

▪ Results can be very effective – particularly for bulk assessments

▪ Semi-supervised ‘model’ has potential to be more generally applicable

▪ Detects multiple anomalies, potentially across similar instrument types

▪ Highly complex model requiring more development and data



Thank you for your attention

Kevin Halsall

Telespazio UK

[email protected]
