Human Performance Metrics for ATM Validation
Brian Hilburn, NLR, Amsterdam, The Netherlands
Overview
• Why consider Human Performance?
• How / When is HumPerf considered in validation?
• Difficulties in studying HumPerf
• Lessons Learnt
• Toward a comprehensive perspective…
(example data)
Traffic Growth in Europe
[Chart: air traffic movements (millions), 1975 to 2010: actual traffic plus high, medium, and low forecasts]
Accident Factors
Unexpected human (ab)use of equipment etc.
New types of errors and failures
Costs of “real world” data are high
New technologies often include new & hidden risks
Operator error vs Designer error
Transition(s) and change(s) are demanding
Implementation (and failure) is very expensive!
Why consider HUMAN metrics?
Titanic
Three Mile Island
Space shuttle
Bhopal
Cali B-757
Paris A-320
FAA/IBM ATC
Famous Human Factors disasters
When human performance isn’t considered...
What is being done to cope? Near- and medium-term solutions
RVSM
BRNAV
FRAP
Civil Military airspace integration
Link 2000
Enhanced surveillance
ATC tools
ATM: The Building Blocks
Displays (eg CDTI)
Tools (eg CORA)
Procedures (eg FF-MAS Transition)
Operational concepts (eg Free Flight)
Controlled Flight → Free Flight
Monitoring in Free Flight: Ops Con drives the ATCo’s task!
NLR Free flight validation studies
Human factors design & measurements
Ops Con + displays + procedures + algorithms
Retrofit automation & displays
– TOPAZ: no safety impairment
– no pilot workload increase with 3 times present en-route traffic
– delay, fuel & emission savings
ATC controller impact(s)
– collaborative workload reduction
Info at NLR website
The aviation system test bed
[Diagram: test bed linking cockpit and ATC stations via data links and two-way radio; an Experiment Scenario Manager injects scenario 'events' while human data and system data are logged from each station]
[Screenshot: Conflict Risk Display and Plan View Display, with datalink Message IN / Message OUT panels, waypoints (POLUPTON, DOGGA/KIPPA, ANGEL, FAMBO, RISKIS), and flight-level change messages]
Evaluating ATCo Interaction with New Tools
Human Factors trials
ATCos + Pilots
Real time sim
Subjective data
Objective data also
Objective Measures
Heart Rate
Respiration
Scan pattern
Pupil diameter
Blink rate
Scan randomness
Integrated with subjective instruments...
HEART Analysis Toolkit
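"Scan randomness" has no single standard definition; one common operationalisation (an assumption here, not necessarily what the HEART toolkit computes) is the Shannon entropy of gaze transitions between areas of interest (AOIs):

```python
from collections import Counter
from math import log2

def scan_entropy(fixations):
    """Shannon entropy (bits) of transitions between areas of interest.

    fixations: sequence of AOI labels, in the order they were fixated.
    Higher entropy = a more random (less structured) scan pattern.
    """
    transitions = [(a, b) for a, b in zip(fixations, fixations[1:]) if a != b]
    if not transitions:
        return 0.0
    counts = Counter(transitions)
    n = len(transitions)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical fixation sequences: a rigid back-and-forth scan is highly
# predictable, while a haphazard scan yields higher transition entropy.
structured = ["radar", "strip", "radar", "strip", "radar", "strip"]
random_ish = ["radar", "strip", "comms", "radar", "comms", "strip"]
print(scan_entropy(structured) < scan_entropy(random_ish))  # True
```

The AOI labels above are illustrative; in practice the AOIs would correspond to display regions of the controller working position.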
Correlates of Pupil Diameter
Emotion; age; relaxation / alertness; habituation; binocular summation; incentive (for easy problems); testosterone level; political attitude; sexual interest; information processing load
Light reflex; dark reflex; lid closure reflex; volitional control; accommodation; stress; impulsiveness; taste; alcohol level
Pupil Diameter by Traffic Load
[Chart: pupil diameter over a scenario timeline, annotated with events (hand-off, datalink, traffic, pre-acceptance) and aircraft callsigns (IBE 326, AMC 282) near waypoint RIVER]
Automation: assistance or burden?
Conflict detection & resolution tools
Arrival management tool
Communication tool
[Figures: visual scan traces over 120 sec, under low vs high traffic]
Positive effect of automation on ‘heart rate variability’
[Chart: |Z| score by automation condition (Manual, Detection, Resolution), for low and high traffic]
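The slides report heart rate variability as a |Z| score but do not say which HRV index was used. As one illustrative possibility (an assumption, not necessarily NLR's method), a time-domain index such as SDNN (standard deviation of inter-beat R-R intervals) can be computed per task phase and expressed relative to a rest baseline:

```python
from statistics import mean, stdev

def sdnn(rr_intervals_ms):
    """SDNN: standard deviation of inter-beat (R-R) intervals, in ms.
    Suppressed variability under load is commonly read as higher effort."""
    return stdev(rr_intervals_ms)

def z_score(value, baseline_samples):
    """Express a task-phase value relative to a rest baseline."""
    return (value - mean(baseline_samples)) / stdev(baseline_samples)

# Hypothetical data: R-R intervals narrow (vary less) under high load.
rest = [850, 870, 820, 900, 860, 830, 880]
task = [840, 845, 838, 842, 847, 839, 843]
print(sdnn(task) < sdnn(rest))  # True
```

Real analyses would first clean ectopic beats and use far longer recordings; this only sketches the baseline-relative scoring idea.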
Positive effect of automation on ‘pupil size’
[Chart: pupil size index by automation condition (Manual, Detection, Resolution), for low and high traffic]
Better detection of ‘unconfirmed’ ATC data up-links
[Chart: detection time in seconds by automation condition (Manual, Detection, Resolution), for low and high traffic]
No (!) positive effect on subjective workload
[Chart: subjective workload rating by automation condition (Manual, Detection, Resolution), for low and high traffic]
Objective vs Subjective Measures
“Catch 22” of introducing automation:
I’ll use it if I trust it. But I cannot trust it until I use it!
[Chart: estimation error (%) under Manual vs Auto conditions, for low and high traffic]
Automation & Traffic Awareness
Converging data: The VINTHEC approach
Team Situation Awareness
EXPERIMENTAL: correlate behavioural markers with physiology
vs
ANALYTICAL: game-theory predictive model of teamwork
[Diagram: average fixation counts across numbered display regions; note: average fixation counts below 0.5 are not displayed]
Free Routing: Implications and challenges
Implications:
Airspace definition
Automation tools
Training
ATCo working methods
Ops procedures
Challenges:
Operational
Technical
Political
Human Factors
FRAP
Sim 1: Monitoring for FR Conflicts
ATS Routes
Direct Routing: airways plus direct routes
Free Routes: structure across sectors
[Diagram: airports connected under each airspace structure]
[Chart: response time (secs) by air route condition (ATS, Direct, FR), conflict vs no-conflict]
Sim 1: Conf Detection Response Time
Studying humans in ATM validation
Decision making biases -- ATC = skilled, routine, stereotyped
Reluctance -- organisational / personal (job threat)
Operational rigidity -- unrealistic scenarios
Transfer problems -- skills hinder interacting with system
Idiosyncratic performance -- system is strategy tolerant
Inability to verbalise skilled performance -- automaticity
Moving from CONSTRUCT to CRITERION:
Evidence from CTAS Automation Trials
[Chart: time-of-flight estimation error (percentage), by traffic load (low/high) and automation level (Manual, Detection, Resolution)]
Controller Resolution Assistant (CORA)
• EUROCONTROL Bretigny (F) POC: Mary Flynn
• Computer-based tools (e.g. MTCD, TP, etc.)
• Near-term operational
• Two phases:
– CORA 1: system identifies conflicts, controller solves
– CORA 2: system provides resolution advisories
CORA: The Challenges
Technical challenges…
Ops challenges…
HF challenges
• Situation Awareness
• Increased monitoring demands
• Cognitive overload
• Mis-calibrated trust
• Degraded manual skills
• New selection / training requirements
• Loss of job satisfaction
CORA: Experiment
• Controller preference for resolution order
• Context specificity
• Time benefits (Response Time) of CORA
Construct / Operationalised definition / Result:
• SA / |ATA - ETA| / Automation x Traffic
• Workload / PupDiam(task) - PupDiam(baseline) / datalink display reduces WL
• Decision making & strategies / response bias / intent info benefits strategies
• Vigilance / RT to alerts / FF = CF
• Attitude / survey responses / FF OK, but need intent info
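The workload operationalisation above (task-phase pupil diameter minus baseline, "PupDiamTX - PupDiam base") can be sketched as follows; the sampling details are assumptions for illustration, not NLR's actual procedure:

```python
from statistics import mean

def pupil_workload_index(task_samples_mm, baseline_samples_mm):
    """Baseline-corrected pupil diameter: mean task-phase pupil diameter
    minus mean resting baseline, in mm. Positive values indicate dilation,
    read as increased processing load, provided luminance is held constant
    (as the correlates list shows, pupil size also tracks light level)."""
    return mean(task_samples_mm) - mean(baseline_samples_mm)

# Hypothetical samples (mm): pupil dilates during a high-traffic phase.
baseline = [3.1, 3.0, 3.2, 3.1]
high_traffic = [3.6, 3.7, 3.5, 3.6]
print(round(pupil_workload_index(high_traffic, baseline), 2))  # 0.5
```

Baseline correction of this kind is what lets a raw physiological signal serve as a between-condition workload metric despite large individual differences in resting pupil size.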
Synthesis of results
Validation strategy
Full Mission Simulation – address human behaviour in the working context
Converging data sources (modelling, sim (FT,RT), etc)
Comprehensive data (objective and subjective)
Operationalise terms (SA, WL)
Assessment of strategies – unexpected behaviours, or covert decision-making strategies
Human Performance Metrics:Potential Difficulties
Participant reactivity
Cannot probe infrequent events
Better links sometimes needed to operational issues
Limits of some (eg physiological) measures
– intrusiveness
– non-monotonicity
– task dependence wrt reliability, sensitivity
– time-on-task, motor artefacts
Partial picture – motivational, social, organisational aspects
Using HumPerf Metrics
Choose correct population
Battery of measures for converging evidence
Adequate training / familiarisation
Recognise that behaviour is NOT inner process
More use of cog elicitation techniques
Operator (ie pilot / ATCo) preferences – Weak experimentally, but strong organisationally?
Validation metrics: Comprehensive and complementary
Subj measures easy, cheap, face valid
Subj measures can tap acceptance (wrt new tech)
Objective and subjective can dissociate
Do they tap different aspects (eg of workload)?
– Eg training needs identified
Both are necessary, neither sufficient
Operationalise HF validation criteria
HF world (SA, Workload) vs
Ops world (Nav accuracy, efficiency)
Limits dialogue between HF and Ops world
Moving from construct (SA) to criterion (traffic prediction accuracy)
Summing Up: Lessons Learnt
Perfect USER versus perfect TEST SUBJECT (experts?)
Objective vs Subjective Measures– both necessary, neither sufficient
Operationalise terms: pragmatic, bridge worlds
Part task testing in design; Full mission validation
Knowledge elicitation: STRATEGIES
Summing Up (2)...
• Why consider Human Performance?
» New ATM tools etc needed to handle demand
» Humans are essential link in system
• How / When is HumPerf considered in validation?
» Often too little, too late…
• Lessons Learnt
» Role of objective versus subjective measures
» Choosing the correct test population
» Realising the potential limitations of “experts”
• Toward a comprehensive perspective…
» Bridging the experimental and operational worlds
Thank You...
for further information:
Brian Hilburn
NLR Amsterdam
tel: +31 20 511 36 42
www.nlr.nl