Decision Support for Port of Entry Inspection, Fred S. Roberts, DyDAn Center, Rutgers University

1

Decision Support for Port of Entry Inspection

Fred S. Roberts
DyDAn Center
Rutgers University

2

Homeland Security: What Can Discrete

Science Do?

Fred Roberts
Professor of Mathematics, Rutgers University
Director, DHS Center of Excellence for Discrete Dynamical Systems (DyDAn)

3

What is Discrete Science?

•Deals with data and information

•Massive, heterogeneous, dynamic

•Seeks patterns

•Seeks departure from patterns

•Is built on a foundation of the mathematical sciences:

Math, Computer Science, Statistics

•Develops powerful computer algorithms

4

Methods of discrete science have become important tools for homeland security, especially when combined with powerful, modern computer methods for analysis and simulation.

5

Are you Serious?? What Can Mathematics Do For Us?

6

7


After Pearl Harbor: Mathematical methods and mathematical scientists played a vitally important role in the US World War II effort.

8

Critical War-Effort Contributions Included:

•Code breaking.

•Creation of the mathematical sciences-based field of Operations Research:

logistics

optimal scheduling

inventory

strategic planning

Enigma machine

9

But: Homeland Security is Different.

Can Mathematics Really Help?

5 + 2 = ?1, 2, 3, …

10

Port of Entry Inspection Algorithms

•Goal: Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system

•Goal: “inspect all containers arriving at ports”

•Even carefully inspecting 8% of containers in Port of NY/NJ might bring international trade to a halt (Larrabbee 2002)

11

Port of Entry Inspection Algorithms

•Aim: Develop decision support algorithms that will help us to “optimally” intercept illicit materials and weapons subject to limits on delays, manpower, and equipment

•Find inspection schemes that minimize total “cost” including “cost” of false alarms (“false positives”) and failed alarms (“false negatives”)

Mobile Vacis: truck-mounted gamma ray imaging system

12

Port of Entry Inspection Algorithms

•My work on port of entry inspection has taken me and my students to some remarkable places.

Me on a Coast Guard boat on a tour of the harbor in Philadelphia

Thanks to Capt. David Scott, Captain of the Port, for taking us on the tour

13


Me on a tour of the Port of Newark/Elizabeth

Thanks to Kathleen Haage of Customs and Border Protection for hosting and to Maher and Port Newark Container Terminals for their behind-the-scenes hospitality

14


Thanks to James Ely for hosting me on a tour of PNNL’s Radiation Portal Monitoring Project in connection with our new DNDO project on sensor management for nuclear detection.

15

Sequential Decision Making Problem

• Stream of containers arrives at a port
• The Decision Maker’s Problem:
– Which to inspect?
– Which inspections next based on previous results?
• Approach:
– “decision logics”
– combinatorial optimization methods
– Builds on ideas of Stroud and Saeger at Los Alamos National Laboratory
– Need for new models and methods

16

Sequential Diagnosis Problem

•Such sequential diagnosis problems arise in many areas:

–Communication networks (testing connectivity, paging cellular customers, sequencing tasks, …)
–Manufacturing (testing machines, fault diagnosis, routing customer service calls, …)
–Medicine (diagnosing patients, sequencing treatments, …)

17

Sequential Decision Making Problem

•Containers arriving are to be classified into categories.
•Simple case: 0 = “ok”, 1 = “suspicious”

•Inspection scheme: specifies which inspections are to be made based on previous observations

18

Sequential Decision Making Problem

•0’s and 1’s suggest binary digits (bits)
•Bit String: A sequence of bits: 0001, 1101, …
•Boolean Function: A function that assigns to each bit string a 0 or a 1.

Bit string x    B(x)
00              1
01              0
10              0
11              1

B(00) = 1, B(10) = 0

19

Sequential Decision Making Problem

•Bit strings are of course crucial carriers of information in modern computers.
•They are used to encode detailed instructions and are translated to a sequence of on-off instructions for switches in the computer.
•We will use them differently.

20

Counting Bit Strings and Boolean Functions

•To understand the difficulty of the container inspection problem, we will need to count bit strings and Boolean functions.
•Product Rule of Counting: If something can happen in n_1 ways, and no matter how the first thing happens, a second thing can happen in n_2 ways, then the two things together can happen in n_1 x n_2 ways.

21


•Product Rule of Counting (More Generally): If something can happen in n_1 ways, and no matter how the first thing happens, a second thing can happen in n_2 ways, and no matter how the first two things happen, a third thing can happen in n_3 ways, and …, then all the things together can happen in n_1 x n_2 x n_3 x … ways.

22


•How many bit strings are there of 3 bits?

•There are 2 choices for the first bit.
•No matter how it is chosen, there are 2 choices for the second bit.
•No matter how the first 2 bits are chosen, there are 2 choices for the third bit.

•Get 2 x 2 x 2 = 2^3 = 8

23


•How many bit strings are there of 3 bits?

000, 001, 010, 011, 100, 101, 110, 111

How many bit strings are there of 5 bits?

2^5 = 32

How many bit strings are there of n bits?

2^n
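The counting argument above can be checked by brute force. The following is a minimal sketch; the helper name `bit_strings` is chosen here for illustration and does not come from the slides.

```python
from itertools import product

def bit_strings(n):
    """Return all bit strings of length n in lexicographic order."""
    # product("01", repeat=n) makes one independent 2-way choice per position,
    # which is exactly the product rule of counting.
    return ["".join(bits) for bits in product("01", repeat=n)]

three_bit = bit_strings(3)
print(three_bit)
print(len(bit_strings(5)))  # number of 5-bit strings, 2**5
```

Each position contributes a factor of 2, so the list for n bits always has 2**n entries.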

24


•How many Boolean functions are there of 3 variables?

25



•There are 2^3 = 8 bit strings of 3 variables.
•For each such bit string, the Boolean function assigns it one of 2 different values (0 or 1).

•There are 2^8 = 2^{2^3} = 256 such Boolean functions.

27


•How many Boolean functions are there of n variables?

28



•There are 2^n bit strings of n variables.
•For each such bit string, the Boolean function assigns it one of 2 different values (0 or 1).

•There are 2^{2^n} such Boolean functions.

30


How big is 2^{2^n}?

When n = 4, this is 65,536.
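To get a feel for how fast this number grows, here is a small sketch that tabulates it and, for tiny n, actually enumerates the functions. The function names are illustrative choices, not taken from the slides.

```python
from itertools import product

def count_boolean_functions(n):
    """Number of Boolean functions of n variables: one 0/1 choice per bit string."""
    return 2 ** (2 ** n)

def boolean_functions(n):
    """Yield every Boolean function of n variables as a dict: bit string -> 0 or 1."""
    inputs = ["".join(bits) for bits in product("01", repeat=n)]
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs))

counts = {n: count_boolean_functions(n) for n in range(1, 6)}
for n, count in counts.items():
    print(n, count)
```

Explicit enumeration with `boolean_functions` is only sensible for very small n; the count alone already shows why.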

31


The problem of making a detailed design of a digital computer usually involves finding a practical circuit implementation of certain functional behavior. A computer device implements a Boolean function.

In the early days of computing, the goal was to create a catalogue listing a good circuit implementation of each Boolean function.

With n = 4, this would require 65,536 entries!

32


•Luckily, by symmetry, some of these functions are “equivalent”, for example when one is obtained from another by interchanging two variables.

•In the early days of computing (1951), a group from the Harvard Computation Laboratory painstakingly listed all 65,536 Boolean functions of 4 variables and determined which were equivalent.
•(There were 222 types.)

33

Sequential Decision Making Problem for Container Inspection

•Containers have attributes, each in a number of states

•Sample attributes:
–Levels of certain kinds of chemicals or biological materials
–Whether or not there are items of a certain kind in the cargo list
–Whether cargo was picked up in a certain port

34


•Currently used attributes:
–Does the ship’s manifest set off an “alarm”?
–What is the neutron or gamma emission count? Is it above threshold?
–Does a radiograph image come up positive?
–Does an induced fission test come up positive?

Gamma ray detector

35


•We can imagine many other attributes.
•Research at DyDAn is concerned with general algorithmic approaches.
•We seek a methodology not tied to today’s technology.
•Detectors are evolving quickly.

36


•Simplest Case: Attributes are in state 0 or 1 (absent or present)

•Then: Container is a bit string like 011001

•So: Classification is a decision function F that assigns each binary string to a category.

011001 F(011001)

If attributes 2, 3, and 6 are present, assign container to category F(011001).

37

Sequential Decision Making Problem

•If there are two categories, 0 and 1 (“safe” or “suspicious”), the decision function F is a Boolean function.

Example: F(000) = F(111) = 1, F(abc) = 0 otherwise

This classifies a container as positive iff it has none of the attributes or all of them.


38


•What if there are three categories, 0, ½, and 1?

Example: F(000) = 0, F(111) = 1, F(abc) = 1/2 otherwise

This classifies a container as positive if it has all of the attributes, negative if it has none of the attributes, and uncertain if it has some but not all of the attributes.

•I won’t discuss this case.
•Exercise: How many such decision functions are there if there are three categories 0, ½, 1?

39


•Given a container, test its attributes until we know enough to calculate the value of F.

•An inspection scheme tells us in which order to test the attributes to minimize cost.

•Even this simplified problem is hard computationally.

40

Sequential Decision Making Problem

•This assumes F is known.
•Simplifying assumption: Attributes are independent.
•At any point we stop inspecting and output the value of F based on outcomes of inspections so far.
•Complications: There may be precedence relations in the components (e.g., can’t test attribute a4 before testing a6).
•Or: cost may depend on attributes tested before.
•F may depend on variables that cannot be directly tested or for which tests are too costly.

41

Sequential Decision Making Problem

•Such problems are hard computationally.
•There are many possible Boolean functions F.
•Even if F is fixed, the problem of finding a good classification scheme (to be defined precisely below) is NP-complete: it is hard in a precise computer science sense.
•Several classes of Boolean functions F allow for efficient inspection schemes:
– k-out-of-n systems
– Certain series-parallel systems
– Read-once systems
– “regular” systems
– Horn systems

42

Sensors and Inspection Lanes

•n types of sensors measure presence or absence of the n attributes.
•Many copies of each sensor.
•Complication: different characteristics of sensors.
•Entities come for inspection.
•Which sensor of a given type to use?
•Think of inspection lanes and waiting in line for inspection.
•Besides efficient inspection schemes, could decrease costs by:
–Buying more sensors
–Changing the allocation of containers to sensor lanes.

43

Trees

•A tree for us is a directed graph.
•It has nodes (vertices).
•Directed edges or arcs head from one vertex to another.
•There are no “cycles” (you can’t double back on yourself).
•We will deal with rooted trees: One node is a root.
•All arcs point downwards in our diagrams, starting from the root.
•If each node has two or zero outgoing (downwards) arcs, we have a binary tree.
•Nodes with no outgoing arcs are called leaves.

44

Binary Decision Tree Approach

•Sensors measure presence/absence of attributes: so 0 or 1
•Use two categories: 0, 1 (safe or suspicious)

•Binary Decision Tree:
–Nodes are sensors or categories
–Two arcs exit from each sensor node, labeled left and right.
–Take the right arc when the sensor says the attribute is present, left arc otherwise

45


•Reach category 1 from the root only through the path a0 to a1 to 1.

•Container is classified in category 1 iff it has both attributes a0 and a1 .

•Corresponding Boolean function: F(11) = 1, F(10) = F(01) = F(00) = 0.

Figure 1

46


•Reach category 1 from the root only through the path a1 to a0 to 1.

•Container is classified in category 1 iff it has both attributes a0 and a1 .

•Corresponding Boolean function: F(11) = 1, F(10) = F(01) = F(00) = 0.
•Note: Different tree, same function

Figure 1

47


•Reach category 1 from the root only through the path a0 to 1 or a0 to a1 to 1.

•Container is classified in category 1 iff it has attribute a0 or attribute a1.

•Corresponding Boolean function: F(11) = 1, F(10) = F(01) = 1, F(00) = 0.

Figure 1

48


•Can we find another binary tree that calculates the same Boolean function?

•Sure, just interchange nodes a0 and a1

Figure 1

49

Binary Decision Tree Approach

•Reach category 1 from the root by:
a0 L to a1 R to a2 R to 1, or
a0 R to a2 R to 1

•Container is classified in category 1 iff it has a1 and a2 and not a0, or a0 and a2 and possibly a1.

•Corresponding Boolean function: F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.
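The correspondence between a binary decision tree and its Boolean function can be made concrete with a short sketch. The tuple encoding below (a leaf is 0 or 1, a sensor node is `(attribute index, left subtree, right subtree)`) is an assumption made for illustration. `figure2_like` follows the two paths listed above; `smaller` is one possible three-sensor tree for the same function and is not necessarily the tree the slides show next.

```python
from itertools import product

def classify(tree, bits):
    """Walk the tree: take the right arc when the attribute is present (1), else left."""
    while not isinstance(tree, int):
        attr, left, right = tree
        tree = right if bits[attr] == 1 else left
    return tree

def boolean_function(tree, n):
    """Tabulate the Boolean function computed by the tree on all n-bit inputs."""
    return {bits: classify(tree, bits) for bits in product((0, 1), repeat=n)}

def sensor_nodes(tree):
    """Count the sensor (non-leaf) nodes in the tree."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + sensor_nodes(left) + sensor_nodes(right)

test_a2 = (2, 0, 1)                           # a2 absent -> 0, present -> 1
figure2_like = (0, (1, 0, test_a2), test_a2)  # a0 L a1 R a2 R 1, or a0 R a2 R 1
smaller = (2, 0, (0, (1, 0, 1), 1))           # test a2 first, then a0, then a1

table = boolean_function(figure2_like, 3)
print(sorted(bits for bits, value in table.items() if value == 1))
print(sensor_nodes(figure2_like), sensor_nodes(smaller))
```

Comparing the two tables shows that different trees, with different numbers of sensor nodes, can compute the same function.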

Figure 2

50

Binary Decision Tree Approach

•This binary decision tree corresponds to the same Boolean function

F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

However, it has one fewer observation node ai. So, it is more efficient if all observations are equally costly and equally likely.

Figure 3

51

Binary Decision Tree Approach

•So we have seen that a given Boolean function may correspond to different binary decision trees.
•How do we find a low-cost or least-cost binary decision tree corresponding to a Boolean function?

52

Binary Decision Tree Approach

•Even if the Boolean function F is fixed, the problem of finding the “least cost” binary decision tree for it is very hard (NP-complete).

•For small n = number of attributes, can try to solve it by trying all possible binary decision trees corresponding to the Boolean function F.

•Even for n = 4, not practical. (n = 4 at Port of Long Beach-Los Angeles)

Port of Long Beach

53

Binary Decision Tree Approach

Promising Approaches:

•Heuristic algorithms, approximations to optimal.
•Special assumptions about the Boolean function F.
•For “monotone” Boolean functions, integer programming formulations give promising heuristics.
•Stroud and Saeger (Los Alamos National Lab) enumerate all “complete, monotone” Boolean functions and calculate the least expensive corresponding binary decision trees.
•Their method is practical for n up to 4, not n = 5.

54

Binary Decision Tree Approach

Monotone Boolean Functions:

•Given two bit strings x_1x_2…x_n, y_1y_2…y_n

•Suppose that x_i ≥ y_i for all i implies that F(x_1x_2…x_n) ≥ F(y_1y_2…y_n).
•Then we say that F is monotone.
•Then 11…1 has highest probability of being in category 1.

55

Binary Decision Tree Approach

Monotone Boolean Functions:

•Given two bit strings x_1x_2…x_n, y_1y_2…y_n

•Suppose that x_i ≥ y_i for all i implies that F(x_1x_2…x_n) ≥ F(y_1y_2…y_n).
•Then we say that F is monotone.
•Example:
•n = 4, F(x) = 1 iff x has at least two 1’s.
•F(1100) = F(0101) = F(1011) = 1, F(1000) = 0, etc.
•Is this monotone?
•Yes
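The definition can be tested mechanically on small examples. This is a minimal brute-force sketch; `is_monotone`, `at_least_two` and `exactly_one` are names introduced here for illustration, and `exactly_one` is an extra counterexample that is not in the slides.

```python
from itertools import product

def is_monotone(f, n):
    """Check: whenever x_i >= y_i for all i, we must have f(x) >= f(y)."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        for y in points:
            dominates = all(xi >= yi for xi, yi in zip(x, y))
            if dominates and f(x) < f(y):
                return False
    return True

def at_least_two(x):
    """The example from the text: 1 iff the bit string has at least two 1's."""
    return int(sum(x) >= 2)

def exactly_one(x):
    """A non-monotone function for contrast: 1 iff exactly one attribute is present."""
    return int(sum(x) == 1)

print(is_monotone(at_least_two, 4))
print(is_monotone(exactly_one, 4))
```

The check compares all pairs of inputs, so it is only practical for small n, which is all the examples here need.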

56


Incomplete Boolean Functions:

•Boolean function F is incomplete if F can be calculated by finding at most n-1 attributes and knowing the value of the input string on those attributes.
•Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) = F(001) = F(010) = F(011) = 0.
•F(abc) is determined without knowing b (or c).
•F is incomplete.

57


Complete, Monotone Boolean Functions:

•Stroud and Saeger: algorithm for enumerating binary decision trees implementing complete, monotone Boolean functions.
•Feasible to implement up to n = 4.
•Then you can find the least cost tree by enumerating all binary decision trees corresponding to a given complete, monotone Boolean function and repeating this for all complete, monotone Boolean functions.

58



•Stroud and Saeger: algorithm for enumerating binary decision trees implementing complete, monotone Boolean functions.
•n = 2:
–There are 6 monotone Boolean functions.
–Only 2 of them are complete, monotone.
–There are 4 binary decision trees for calculating these 2 complete, monotone Boolean functions.

59



•n = 3:
–9 complete, monotone Boolean functions.
–60 distinct binary trees for calculating them.
–Counting methods are more complicated than the simple ones we have described before.
–All counts here are from Stroud and Saeger.

60



•n = 4:
–114 complete, monotone Boolean functions.
–11,808 distinct binary decision trees for calculating them.
–(Compare 1,079,779,602 BDTs for all Boolean functions)
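The counts of complete, monotone Boolean functions quoted for n = 2, 3, 4 can be reproduced by exhaustive enumeration. The sketch below is not Stroud and Saeger's algorithm (which enumerates trees); it is a plain brute-force count over truth tables, with "complete" taken to mean that F depends on every one of its n variables, as in the definition of incompleteness above.

```python
def count_complete_monotone(n):
    """Return (number of monotone functions, number that are also complete)."""
    size = 2 ** n          # number of input bit strings, each encoded as an integer
    monotone_count = 0
    complete_count = 0
    for table in range(2 ** size):      # each integer encodes one truth table
        def value(x, table=table):
            return (table >> x) & 1
        # Monotone: switching any attribute from absent to present never lowers F.
        monotone = all(
            value(x) <= value(x | (1 << i))
            for x in range(size) for i in range(n)
        )
        if not monotone:
            continue
        monotone_count += 1
        # Complete: for every variable, flipping it changes F on some input.
        complete = all(
            any(value(x) != value(x ^ (1 << i)) for x in range(size))
            for i in range(n)
        )
        if complete:
            complete_count += 1
    return monotone_count, complete_count

for n in (2, 3, 4):
    print(n, count_complete_monotone(n))
```

For n = 4 this already loops over 65,536 truth tables; for n = 5 there are more than four billion, which is one way to see why plain enumeration stops being practical.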

61



•n = 5:
–6894 complete, monotone Boolean functions
–263,515,920 corresponding binary decision trees.

•Combinatorial explosion!
•Need alternative approaches; enumeration not feasible!
•(Even worse: compare 5 x 10^18 BDTs corresponding to all Boolean functions)

62

Cost Functions

•So far, we have figured one binary decision tree is cheaper than another if it has fewer nodes.
•This is oversimplified.
•There are more complex costs involved than the number of sensors in a tree.

63

Cost Functions

•Stroud-Saeger method applies to more sophisticated cost models, not just cost = number of sensors in the BDT.
•Using a sensor has a cost:
–Unit cost of inspecting one item with it
–Fixed cost of purchasing and deploying it
–Delay cost from queuing up at the sensor station

•Preliminary problem: disregard fixed and delay costs. Minimize unit costs.

64

Cost Functions: Delay Costs

•Tradeoff between fixed costs and delay costs: Adding more sensors cuts down on delays.
•More sophisticated models describe the process of containers arriving.
•There are differing delay times for inspections.
•Use “queuing theory” to find average delay times under different models.

65

Cost Functions

•Unit Cost Complication: How many nodes of the decision tree are actually visited during an average container’s inspection? Depends on the “distribution” of containers.
•Answer can also depend on probability of sensor errors and probability of a bomb in a container.

66

Cost Functions: Unit Costs

Tree Utilization

•In our early models, we assume we are given the probability of sensor errors and the probability of a bomb in a container.
•This allows us to calculate the “expected” cost of utilization of the tree, Cutil.

67

Cost Functions

OTHER COSTS:

•Cost of false positive: Cost of additional tests.
–If it means opening the container, it’s expensive.

•Cost of false negative:
–Complex issue.
–What is the cost of a bomb going off in Manhattan?

68

Cost Functions: Sensor Errors

•One Approach to False Positives/Negatives: Assume there can be Sensor Errors
•Simplest model: assume that all sensors checking for attribute ai have the same fixed probability of saying ai is 0 if in fact it is 1, and similarly of saying it is 1 if in fact it is 0.
•More sophisticated analysis later describes a model for determining probabilities of sensor errors.
•Notation:
X = state of nature (bomb or no bomb)
Y = outcome (of sensor or entire inspection process).

69

Probability of Error for the Entire Tree

State of nature is zero (X = 0): absence of a bomb
State of nature is one (X = 1): presence of a bomb

[Figure: decision tree with root sensor A. If A reads 0, output 0; if A reads 1, test B. If B reads 1, output 1; if B reads 0, test C, whose reading (0 or 1) is the output.]

Probability of false positive, P(Y=1|X=0), for this tree is given by

PFalsePositive = P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0) + P(YA=1|X=0) * P(YB=0|X=0) * P(YC=1|X=0)

Probability of false negative, P(Y=0|X=1), for this tree is given by

PFalseNegative = P(Y=0|X=1) = P(YA=0|X=1) + P(YA=1|X=1) * P(YB=0|X=1) * P(YC=0|X=1)

70

Cost Function used for Evaluating the Decision Trees.

CTot = CFalsePositive *PFalsePositive + CFalseNegative *PFalseNegative + Cutil

CFalsePositive is the cost of false positive (Type I error)
CFalseNegative is the cost of false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the expected cost of utilization of the tree.

71

Cost Function used for Evaluating the Decision Trees.

PFalsePositive and PFalseNegative are calculated from the tree.
Cutil is calculated from the tree, the probability of a bomb in a container, and the probability of sensor errors.
CFalsePositive and CFalseNegative are input: given information.

72

Stroud-Saeger Experiments

• Stroud and Saeger ranked all trees formed from 3 or 4 sensors A, B, C and D according to increasing tree costs.
• Used cost function defined above.
• Values used in their experiments:
– CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10
– CB = 10; P(YB=1|X=1) = .99; P(YB=1|X=0) = .01
– CC = 30; P(YC=1|X=1) = .999; P(YC=1|X=0) = .001
– CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05
– Here, Ci = unit cost of utilization of sensor i.
• Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
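As a worked illustration of the error formulas and the total cost function, the sketch below evaluates the tree that tests A, then B if A alarms, then C if B does not. It assumes that the detection probabilities .90, .99 and .999 belong to sensors A, B and C respectively; the dictionary and function names are illustrative, and the expected utilization cost Cutil is left as an input because its exact formula is not given here.

```python
# P(Y_i = 1 | X = 0): probability that sensor i alarms on a harmless container.
FALSE_ALARM = {"A": 0.10, "B": 0.01, "C": 0.001}
# P(Y_i = 1 | X = 1): probability that sensor i alarms on a bad container
# (assignment of .99 and .999 to B and C is an assumption of this sketch).
DETECTION = {"A": 0.90, "B": 0.99, "C": 0.999}

def p_tree_alarm(p_yes):
    """P(tree outputs 1): A and B both alarm, or A alarms, B does not, and C alarms."""
    return p_yes["A"] * p_yes["B"] + p_yes["A"] * (1 - p_yes["B"]) * p_yes["C"]

def p_tree_miss(p_yes):
    """P(tree outputs 0): A does not alarm, or A alarms and neither B nor C does."""
    return (1 - p_yes["A"]) + p_yes["A"] * (1 - p_yes["B"]) * (1 - p_yes["C"])

def total_cost(c_fp, c_fn, p_fp, p_fn, c_util):
    """CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cutil."""
    return c_fp * p_fp + c_fn * p_fn + c_util

p_false_positive = p_tree_alarm(FALSE_ALARM)
p_false_negative = p_tree_miss(DETECTION)
print(p_false_positive, p_false_negative)
```

With these inputs the tree's false positive rate is far smaller than that of sensor A alone, while its false negative rate is dominated by A failing to alarm, since a 0 reading from A ends the inspection.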

73

Sensitivity Analysis

• When parameters in a model are not known exactly, the results of a mathematical analysis can change depending on the values of the parameters.

• It is important to do a sensitivity analysis: let the parameter values vary and see if the results change.

• So, do the least cost trees change if we change values like probability of a bomb, cost of a false positive, etc?

74

Stroud Saeger Experiments: Our Sensitivity Analysis

• We have explored sensitivity of the Stroud-Saeger conclusions to variations in values of the three parameters:

CFalseNegative, CFalsePositive, P(X=1)

• We estimated high and low values for the parameters.

75

Stroud Saeger Experiments: Our Sensitivity Analysis

– CFalseNegative was varied between 25 million and 10 billion dollars
• Low and high estimates of direct and indirect costs incurred due to a false negative.

– CFalsePositive was varied between $180 and $720
• Cost incurred due to false positive: 4 men * (3 to 6 hrs) * (15 to 30 $/hr)

– P(X=1) was varied between 1/10,000,000 and 1/100,000

76

Stroud Saeger Experiments: Our Sensitivity Analysis

n = 3 (use sensors A, B, C)

• Varied the parameters CFalseNegative, CFalsePositive, P(X=1)
• We chose the value of one of these parameters from the interval of values
• Then explored the highest ranked tree as the other two parameters were chosen at random in the interval of values.
• 10,000 experiments for each fixed value.
• We looked for the variation in the top-ranked tree and how the top rank related to choice of parameter values.
• Very surprising results.

77

Frequency of Top-ranked Trees when CFalseNegative and CFalsePositive are Varied

• 10,000 randomized experiments (randomly selected values of CFalseNegative and CFalsePositive from the specified range of values) for the median value of P(X=1).

• The above graph has frequency counts of the number of experiments when a particular tree was ranked first or second or third and so on.

• Only three trees (7, 55 and 1) ever came first. 6 trees came second, 10 came third, 13 came fourth.

[Figure: histogram of frequency (0 to 7000) against tree number (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th and 5th]

78

Frequency of Top-ranked Trees when CFalseNegative and P(X=1) are Varied

• 10,000 randomized experiments for the median value of CFalsePositive.

• Only 2 trees (7 and 55) ever came first. 4 trees came second. 7 trees came third. 10 and 13 trees came 4th and 5th respectively.

[Figure: histogram of frequency (0 to 8000) against tree number (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th and 5th]

79

Frequency of Top-ranked Trees when P(X=1) and CFalsePositive are Varied

• 10,000 randomized experiments for the median value of CFalseNegative.
• Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10 trees came third. 13 and 16 trees came 4th and 5th respectively.

[Figure: histogram of frequency (0 to 7000) against tree number (0 to 60), with separate series for trees ranked 1st, 2nd, 3rd, 4th and 5th]

80

Most Frequent Tree Groups Attaining the Top Three Ranks.

• Trees 7, 9 and 10

[Figure: binary decision trees 7, 9 and 10, built from sensors A, B and C]

All three decision trees have been generated from the same Boolean function. Tree 9 and Tree 10 are ranked second and third more than 99% of the time when Tree 7 is ranked first.

81

Most Frequent Tree Groups Attaining the Top Three Ranks

• Trees 55, 57 and 58

[Figure: binary decision trees 55, 57 and 58, built from sensors A, B and C]

All three trees correspond to the same Boolean function. Tree 57 is ranked second 96% of the time and tree 58 is ranked third 79% of the time when tree 55 is ranked first.

82

Most Frequent Tree Groups Attaining the Top Three Ranks

• Trees 1, 3, and 2

[Figure: binary decision trees 1, 3 and 2, built from sensors A, B and C]

All three trees correspond to the same Boolean function. Tree 3 is ranked second 98% of the time and tree 2 is ranked third 80% of the time when tree 1 is ranked first.

83

Stroud Saeger Experiments: Sensitivity Analysis: 4 Sensors

• Second set of computer experiments: n = 4 (use sensors A, B, C, D).

• Same values as before.

• Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third through its interval of possible values.

• Experiment 2: Fix a value of one of CFalseNegative, CFalsePositive, P(X=1) and vary the other two.

• Do 10,000 experiments each time.

• Look for the variation in the highest ranked tree.

84

Stroud Saeger Experiments: Our Sensitivity Analysis: 4 Sensors

• Experiment 1: Fix values of two of CFalseNegative, CFalsePositive, P(X=1) and vary the third.

85

CTot vs CFalseNegative for Ranked 1 Trees (Trees 11485(9651) and 10129(349))

Only two trees ever were ranked first, and one, tree 11485, was ranked first in 9651 out of 10,000 runs.

86

CTot vs CFalsePositive for Ranked 1 Trees (Tree no. 11485 (10000))

One tree, number 11485, was ranked first every time.

87

CTot vs P(X=1) for Ranked 1 Trees (Tree no. 11485(8372), 10129(488), 11521(1056))

Three trees dominated first place. Trees 10201(60), 10225(17) and 10153(7) also achieved first rank but with relatively low frequency.

88

Tree Structure For Top Trees

[Figure: tree diagrams over sensors a, b, c and d for tree number 11485 and tree number 10129]

89


• Experiment 2: Fix the values of one of CFalseNegative, CFalsePositive, P(X=1) and vary the others.

90


• Experiment 2: Fix the values of one of CFalseNegative, CFalsePositive, P(X=1) and vary the others.

• Similar results

91

Conclusions from Sensitivity Analysis

• Considerable lack of sensitivity to modification in parameters for trees using 3 or 4 sensors.

• Very few optimal trees.

• Very few Boolean functions arise among optimal and near-optimal trees.

• Surprising results.

92

Modeling Sensor Errors

•One Approach to Sensor Errors: Modeling Sensor Operation

•Threshold Model:
–Sensors have different discriminating power
–Many use counts (e.g., gamma radiation counts)
–See if count exceeds threshold
–If so, say attribute is present.

93

Modeling Sensor Errors

Threshold Model:

•Sensor i has discriminating power Ki, threshold Ti
•Attribute present if counts exceed Ti
•Seek threshold values that minimize the overall cost function, including costs of inspection, false positive/negative
•Assume readings of category 0 containers follow a Gaussian distribution and similarly category 1 containers
•Simulation approach

94

Probability of Error for Individual Sensors

• For the ith sensor, the type 1 error P(Yi=1|X=0) and the type 2 error P(Yi=0|X=1) are modeled using Gaussian distributions.
– State of nature X=0 represents absence of a bomb.
– State of nature X=1 represents presence of a bomb.
– The outcome of sensor i is a count.
– Σi is the variance of the distributions
– PD = prob. of detection, PF = prob. of false pos.

[Figure: characteristics of a typical sensor i: the distributions of the sensor outcome given X=0 and given X=1, with Ki and the threshold Ti marked]

95

Modeling Sensor Errors

The probability of false positive for the ith sensor is computed as:

P(Yi=1|X=0) = 0.5 erfc[Ti/√2]

The probability of detection for the ith sensor is computed as:

P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]

erfc = complementary error function: erfc(x) = Γ(1/2, x^2)/√π

The following experiments have been done using sensors A, B, C and using:

KA = 4.37; ΣA = 1
KB = 2.9; ΣB = 1
KC = 4.6; ΣC = 1

We then varied the individual sensor thresholds TA, TB and TC from -4.0 to +4.0 in steps of 0.4. These values were chosen since they gave us an “ROC curve” for the individual sensors over a complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
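The two threshold formulas translate directly into code. The sketch below uses Python's standard `math.erfc` and the K and Σ values quoted above; the function names and the way the threshold grid is built are illustrative choices, not the authors' implementation.

```python
import math

K = {"A": 4.37, "B": 2.9, "C": 4.6}       # discriminating power K_i
SIGMA = {"A": 1.0, "B": 1.0, "C": 1.0}    # spread parameter for each sensor

def p_false_positive(threshold):
    """P(Y_i = 1 | X = 0) = 0.5 * erfc(T_i / sqrt(2))."""
    return 0.5 * math.erfc(threshold / math.sqrt(2))

def p_detection(threshold, k, sigma):
    """P(Y_i = 1 | X = 1) = 0.5 * erfc((T_i - K_i) / (sigma * sqrt(2)))."""
    return 0.5 * math.erfc((threshold - k) / (sigma * math.sqrt(2)))

# Thresholds from -4.0 to +4.0 in steps of 0.4, as described in the text.
thresholds = [round(-4.0 + 0.4 * step, 1) for step in range(21)]

# One (false positive, detection) pair per threshold traces the sensor's ROC curve.
roc_a = [(p_false_positive(t), p_detection(t, K["A"], SIGMA["A"])) for t in thresholds]
for t, (pf, pd) in zip(thresholds, roc_a):
    print(f"T={t:+.1f}  PF={pf:.4f}  PD={pd:.4f}")
```

Raising the threshold lowers both the false positive and the detection probability, which is the tradeoff the threshold search has to balance against the costs.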

96

Frequency of First Ranked Trees for Variations in Sensor Thresholds

• 68,921 experiments were conducted, as each Ti was varied through its entire range. (n = 3)
• The graph has frequency counts of the number of experiments when a particular tree was ranked first. There are 15 such trees. Tree 37 had the highest frequency of attaining rank one.

[Figure: histogram of frequency (0 to 18000) against tree number (0 to 60)]

97

Modeling Sensor Errors

•A number of trees ranking first in other experiments also ranked first here.

•Similar results in case of n = 4.
•4,194,481 experiments.
•244 different trees were ranked first in at least one experiment.
•Trees ranked first in other experiments also frequently appeared first here.

•Conclusion: considerable insensitivity to change of threshold.

98

New Approaches to Optimum Threshold Computation

• Extensive search over a range of thresholds has some practical drawbacks:
– Large number of threshold values for every sensor
– Large step size
– Grows exponentially with the number of sensors (computationally infeasible for n > 4)

• A non-linear optimization approach proves more satisfactory:
– A combination of Gradient Descent and modified Newton’s methods

99

Problems with Standard Approaches

• Gradient Descent Method:
– Too small a step size results in a large number of iterations to reach the minimum
– Too big a step size results in skipping the minimum
• Newton’s Method:
– The convergence depends largely on the starting point. This method occasionally drifts in the wrong direction and hence fails to converge.
• Solution: combination of gradient descent and Newton’s methods
• This works well.
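One common way to combine the two methods is to try a Newton step and fall back to a gradient step whenever the Newton step fails to lower the objective. The one-dimensional sketch below illustrates that idea only; it is an assumption about how such a hybrid can look, not the authors' algorithm, and the test function `cost` is a hypothetical stand-in for the tree cost as a function of a threshold.

```python
import math

def hybrid_minimize(f, grad, hess, x0, step=0.1, tol=1e-8, max_iter=1000):
    """Minimize f in one dimension, preferring Newton steps that reduce f."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        h = hess(x)
        # Newton step is only trusted where the curvature is positive.
        candidate = x - g / h if h > 0 else None
        if candidate is not None and f(candidate) < f(x):
            x = candidate
        else:
            # Fall back to gradient descent, halving the step until f decreases.
            s = step
            while f(x - s * g) >= f(x) and s > 1e-12:
                s /= 2
            x = x - s * g
    return x

# Hypothetical objective: plain Newton started at x = 2 jumps to x = -8 and keeps
# moving away from the minimum at 0, while the hybrid recovers and converges.
def cost(x):
    return math.sqrt(1 + x * x)

def cost_grad(x):
    return x / math.sqrt(1 + x * x)

def cost_hess(x):
    return (1 + x * x) ** -1.5

x_min = hybrid_minimize(cost, cost_grad, cost_hess, x0=2.0)
print(x_min)
```

Far from the minimum the gradient steps make steady progress; close to it the Newton steps take over and converge in a handful of iterations.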

100

Results: Threshold Optimization

• Costs of false positive CFalsePositive and false negative CFalseNegative and prior probability of occurrence of a bad container, P(X=1), were fixed as medians of the min and max values given by Stroud and Saeger (same as we used in earlier experiments)

• We were able to converge to a (hopefully-close-to-minimum) cost every time with a modest number of iterations changing thresholds.

101

Results: Threshold Optimization

• We were able to converge to a (hopefully-close-to-minimum) cost every time with a modest number of iterations changing thresholds. For example:
– For 3 sensors, it took an average of 0.081 seconds (as opposed to 0.387 seconds using extensive search) to converge to a cost for all 114 trees
– For 4 sensors, it took an average of 0.196 seconds (as opposed to more than 2 seconds using extensive search) to converge to a cost for all 66,936 trees

• In each case, the min cost attained with the new algorithm was lower, and often much lower, than that attained with extensive search.

102

Results: Threshold Optimization

Many times the minimum obtained using the optimization method was considerably less than the one from the extensive search technique.

[Figure: “Tree costs at optimum thresholds”: total cost (100 to 500) against tree number (0 to 100), with one series for Combined Optimization and one for Extensive search]

103

New Idea: Searching through a Generalized Tree Space

• Sometimes adding more possibilities results in being able to do more efficient searches.

• We expand the space of trees from those corresponding to Stroud and Saeger’s “Complete and Monotonic” Boolean Functions to “Complete and Monotonic” BDTs.

• Advantages:
– Unlike Boolean functions, BDTs may not have to consider all sensor inputs to give a final decision.
– Allow more potentially useful trees to participate in the analysis
– Help define an irreducible tree space for search operations

104

Revisiting Monotonicity

• Monotonic Decision Trees
– A binary decision tree will be called monotonic if all the left leaves are class “0” and all the right leaves are class “1”.

• Example:

a b c   F(abc)
0 0 0   0
0 0 1   0
0 1 0   1
0 1 1   1
1 0 0   0
1 0 1   1
1 1 0   0
1 1 1   1

[Figure: several binary decision trees over sensors a, b and c for this function]

All these trees correspond to the same monotonic Boolean function. Only one is a monotonic BDT.

105

Revisiting Completeness• Complete Decision Trees

– A binary decision tree will be called complete if every sensor occurs at least once in the tree and, at any non-leaf node in the tree, its left and right sub-trees are not identical.

• Example:

a  b  c  F(abc)
0  0  0  0
0  0  1  1
0  1  0  1
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1

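[Figure: four BDTs for F, written as sensor(left, right):
a(b(c(0, 1), 1), c(b(0, 1), 1))
a(c(b(0, 1), 1), b(c(0, 1), 1))
a(c(b(0, 1), 1), c(b(0, 1), 1))
a(b(c(0, 1), 1), b(c(0, 1), 1))
In the last two trees the left and right sub-trees of a are identical.]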
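The completeness definition translates directly into two checks: every sensor appears somewhere, and no internal node has identical left and right sub-trees. A minimal sketch follows, assuming trees are stored as a leaf (0 or 1) or a tuple (sensor, left, right); the function names and this representation are assumptions made for illustration.

```python
# Assumed representation: leaf = 0 or 1, internal node = (sensor, left, right).

def sensors_in(tree):
    """Set of sensors that label at least one internal node."""
    if isinstance(tree, int):
        return set()
    sensor, left, right = tree
    return {sensor} | sensors_in(left) | sensors_in(right)

def has_identical_siblings(tree):
    """True if some internal node has equal left and right sub-trees."""
    if isinstance(tree, int):
        return False
    _, left, right = tree
    return (left == right
            or has_identical_siblings(left)
            or has_identical_siblings(right))

def is_complete(tree, all_sensors):
    """Every sensor occurs, and no node has identical sub-trees."""
    return (set(all_sensors) <= sensors_in(tree)
            and not has_identical_siblings(tree))

b_then_c = ("b", ("c", 0, 1), 1)
c_then_b = ("c", ("b", 0, 1), 1)

mixed = ("a", b_then_c, c_then_b)      # sub-trees differ
repeated = ("a", c_then_b, c_then_b)   # sub-trees identical
```

With sensors "abc", `is_complete(mixed, "abc")` holds and `is_complete(repeated, "abc")` fails, because the test on a in `repeated` changes nothing about the outcome.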

106

The CM Tree Space

No. of attributes   Distinct BDTs    Trees from CM Boolean functions   Complete, monotonic BDTs
2                   74               4                                 4
3                   16,430           60                                114
4                   1,079,779,602    11,808                            66,936
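The "Distinct BDTs" column is consistent with a simple counting recurrence, assuming that no sensor is repeated along a branch: a tree is either one of the two leaves, or one of the n sensors at the root with an arbitrary tree on the remaining n - 1 sensors on each side. The sketch below checks that assumption against the table; the function name is hypothetical, and the recurrence is an observation about the numbers rather than something stated on the slide.

```python
def count_bdts(n):
    """Number of BDTs on n sensors with no sensor repeated on a branch."""
    if n == 0:
        return 2                      # only the leaves 0 and 1
    sub = count_bdts(n - 1)           # choices for each sub-tree
    return 2 + n * sub * sub          # a leaf, or a root sensor with two sub-trees

for n in (2, 3, 4):
    print(n, count_bdts(n))
```

The squared term is what makes the count explode, which is why restricting the search to the much smaller complete, monotonic space matters.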

107

Tree Neighborhood and Tree Space

• Define tree neighborhood by giving operations for moving from one tree in CM Tree Space to another.
• We have developed an algorithm for finding low-cost BDTs by searching through CM Tree Space from a tree to one of its neighbors.

108

Search Operations in Tree Space

• Split
Pick a leaf node and replace it with a sensor that is not already present in that branch, and then insert arcs from that sensor to 0 and to 1.

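[Figure: SPLIT example, with trees written as sensor(left, right).
Before: a(b(0, c(d(0, 1), 1)), c(d(0, 1), 1))
After: a(b(0, c(d(0, 1), 1)), c(d(b(0, 1), 1), 1))]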
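The Split operation can be sketched as a generator over all single-step results. This is a minimal illustration, assuming trees are stored as a leaf (0 or 1) or a tuple (sensor, left, right); the name `split_moves` is hypothetical. The sketch does not filter the results for completeness, which a full neighborhood function would presumably need to do.

```python
# Assumed representation: leaf = 0 or 1, internal node = (sensor, left, right).

def split_moves(tree, all_sensors, used=frozenset()):
    """Yield every tree obtained by splitting one leaf of `tree`."""
    if isinstance(tree, int):
        # Replace the leaf by any sensor not already on this branch,
        # with arcs to 0 (left) and 1 (right).
        for sensor in sorted(set(all_sensors) - used):
            yield (sensor, 0, 1)
        return
    sensor, left, right = tree
    on_branch = used | {sensor}
    for new_left in split_moves(left, all_sensors, on_branch):
        yield (sensor, new_left, right)
    for new_right in split_moves(right, all_sensors, on_branch):
        yield (sensor, left, new_right)

# Starting tree and result as read from the slide's example.
before = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("d", 0, 1), 1))
after = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("d", ("b", 0, 1), 1), 1))

moves = list(split_moves(before, "abcd"))
```

Because the new node always gets 0 on the left and 1 on the right, a split never breaks the monotonic leaf rule.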

109

Search Operations

• Swap

Pick a non-leaf node in the tree and swap it with its parent node such that the new tree is still monotonic and complete and no sensor occurs more than once in any branch.

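[Figure: SWAP example, with trees written as sensor(left, right).
Before: a(b(0, c(d(0, 1), 1)), c(d(0, 1), 1))
After: a(b(0, d(c(0, 1), 1)), c(d(0, 1), 1))]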
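The slide's example is consistent with exchanging the sensor labels of a node and its parent while leaving the tree shape unchanged. The sketch below adopts that reading as an assumption (the slides do not spell out how sub-trees are rearranged), stores trees as a leaf (0 or 1) or a tuple (sensor, left, right), and then discards results that repeat a sensor on a branch or create identical sub-trees. All function names are hypothetical.

```python
# Assumed representation: leaf = 0 or 1, internal node = (sensor, left, right).

def no_repeats(tree, seen=frozenset()):
    """True if no sensor occurs twice on any root-to-leaf branch."""
    if isinstance(tree, int):
        return True
    sensor, left, right = tree
    return (sensor not in seen
            and no_repeats(left, seen | {sensor})
            and no_repeats(right, seen | {sensor}))

def no_identical_siblings(tree):
    """True if no internal node has equal left and right sub-trees."""
    if isinstance(tree, int):
        return True
    _, left, right = tree
    return (left != right
            and no_identical_siblings(left)
            and no_identical_siblings(right))

def swap_moves(tree):
    """Yield trees with the labels of one node and its parent exchanged."""
    if isinstance(tree, int):
        return
    sensor, left, right = tree
    if not isinstance(left, int):
        yield (left[0], (sensor, left[1], left[2]), right)
    if not isinstance(right, int):
        yield (right[0], left, (sensor, right[1], right[2]))
    for new_left in swap_moves(left):
        yield (sensor, new_left, right)
    for new_right in swap_moves(right):
        yield (sensor, left, new_right)

def valid_swaps(tree):
    """Keep only swaps that stay inside the constrained tree space."""
    return [t for t in swap_moves(tree)
            if no_repeats(t) and no_identical_siblings(t)]

before = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("d", 0, 1), 1))
after = ("a", ("b", 0, ("d", ("c", 0, 1), 1)), ("c", ("d", 0, 1), 1))
```

A label exchange keeps the set of sensors and the leaf positions intact, so only the branch-repeat and identical-sub-tree conditions need to be re-checked.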

110

Search Operations

• Merge

Pick a parent node of two leaf nodes and make it a leaf node by collapsing the two leaf nodes below it, or pick a parent node with one leaf node, collapse both the parent node and its one leaf node, and shift the sub-tree up in the tree by one level.

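[Figure: MERGE example, with trees written as sensor(left, right).
Before: a(b(0, c(d(0, 1), 1)), c(d(0, 1), 1))
After (parent with one leaf collapsed, sub-tree shifted up): a(b(0, d(0, 1)), c(d(0, 1), 1))
After (parent of two leaves collapsed to a leaf): a(b(0, c(0, 1)), c(d(0, 1), 1))]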
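Both variants of Merge can be sketched in one generator. This is an illustrative sketch, assuming trees are stored as a leaf (0 or 1) or a tuple (sensor, left, right); the name `merge_moves` is hypothetical. A collapsed node becomes 0 if it was a left child and 1 if it was a right child, which keeps the monotonic leaf rule; the sketch does not check whether a merge removes the last occurrence of a sensor, so a completeness filter would still be needed.

```python
# Assumed representation: leaf = 0 or 1, internal node = (sensor, left, right).

def merge_moves(tree, side=None):
    """Yield every tree obtained by one Merge; `side` is 0 for a left child,
    1 for a right child, and None for the root."""
    if isinstance(tree, int):
        return
    sensor, left, right = tree
    left_is_leaf = isinstance(left, int)
    right_is_leaf = isinstance(right, int)
    if left_is_leaf and right_is_leaf:
        if side is not None:
            yield side                 # collapse the node into a single leaf
    elif left_is_leaf:
        yield right                    # drop node and its leaf, shift sub-tree up
    elif right_is_leaf:
        yield left
    for new_left in merge_moves(left, 0):
        yield (sensor, new_left, right)
    for new_right in merge_moves(right, 1):
        yield (sensor, left, new_right)

before = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("d", 0, 1), 1))
shifted_up = ("a", ("b", 0, ("d", 0, 1)), ("c", ("d", 0, 1), 1))
collapsed = ("a", ("b", 0, ("c", 0, 1)), ("c", ("d", 0, 1), 1))

moves = list(merge_moves(before))
```

Merge is the inverse direction of Split: it shrinks the tree, so the two together let the search move between trees of different sizes.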

111

Search Operations

• Replace

Pick a node with a sensor occurring more than once in the tree and replace it with any other sensor such that no sensor occurs more than once in any branch.

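[Figure: REPLACE example, with trees written as sensor(left, right).
Before: a(b(0, c(d(0, 1), 1)), c(d(0, 1), 1))
After: a(b(0, c(d(0, 1), 1)), c(b(0, 1), 1))]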
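Replace can be sketched by counting how often each sensor occurs and then relabelling any node whose sensor occurs more than once. The sketch assumes trees are stored as a leaf (0 or 1) or a tuple (sensor, left, right); `labels` and `replace_moves` are hypothetical names. A replacement sensor is allowed only if it appears neither above the node nor anywhere below it, which is one way to enforce the no-repeat condition on every branch through the node.

```python
from collections import Counter

# Assumed representation: leaf = 0 or 1, internal node = (sensor, left, right).

def labels(tree):
    """List the sensor at every internal node, with repeats."""
    if isinstance(tree, int):
        return []
    sensor, left, right = tree
    return [sensor] + labels(left) + labels(right)

def replace_moves(tree, all_sensors):
    """Yield every tree obtained by one Replace."""
    counts = Counter(labels(tree))

    def walk(node, above):
        if isinstance(node, int):
            return
        sensor, left, right = node
        if counts[sensor] > 1:
            # Sensors on the path above or anywhere below are not allowed.
            blocked = above | {sensor} | set(labels(left)) | set(labels(right))
            for new in sorted(set(all_sensors) - blocked):
                yield (new, left, right)
        for new_left in walk(left, above | {sensor}):
            yield (sensor, new_left, right)
        for new_right in walk(right, above | {sensor}):
            yield (sensor, left, new_right)

    yield from walk(tree, frozenset())

before = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("d", 0, 1), 1))
after = ("a", ("b", 0, ("c", ("d", 0, 1), 1)), ("c", ("b", 0, 1), 1))

moves = list(replace_moves(before, "abcd"))
```

Restricting Replace to sensors that occur more than once means the relabelled tree still contains every sensor, so completeness with respect to sensor coverage is preserved.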

112

113

Tree Neighborhood and Tree Space

• Define tree neighborhood by using these four operations for moving from one tree in CM Tree Space to another.

• Irreducibility
– Theorem: Any tree in the CM tree space can be reached from any other tree by applying these neighborhood operations repeatedly
– An irreducible CM tree space helps “search” for the cheapest trees using neighborhood operations

114

Tree Space Traversal

• Naïve Idea: Greedy Search
1. Randomly start at any tree in the CM tree space
2. Find its neighboring trees using the above operations
3. Move to the neighbor with the lowest cost
4. Iterate until we find a minimum

– Problem: The CM Tree space is highly multi-modal (more than one local minimum)!
– Therefore, we implement a stochastic search algorithm with simulated annealing to find the best tree
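The four greedy steps, and the reason they fail on a multi-modal space, can be illustrated with a small sketch. The search routine takes the neighborhood and cost functions as parameters; the integer landscape used to exercise it is a made-up stand-in for the tree space, chosen only so that it has two local minima.

```python
def greedy_search(start, neighbors, cost):
    """Move to the cheapest neighbor until no neighbor is cheaper."""
    current = start
    while True:
        candidates = list(neighbors(current))
        if not candidates:
            return current
        best = min(candidates, key=cost)
        if cost(best) >= cost(current):
            return current            # local minimum reached
        current = best

# Hypothetical toy landscape: positions 0..9 with a local minimum at 1
# and the global minimum at 5.
costs = [5, 3, 4, 6, 2, 1, 2, 7, 8, 9]

def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]

def toy_cost(i):
    return costs[i]

stuck = greedy_search(0, toy_neighbors, toy_cost)    # ends at position 1
lucky = greedy_search(3, toy_neighbors, toy_cost)    # ends at position 5
```

Starting at 0 the search stops at position 1 with cost 3, while starting at 3 it reaches the global minimum at position 5 with cost 1; the outcome depends entirely on the random start.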

115

Tree Space Traversal

• Stochastic Search
– Randomly start at any tree in CM space
– Find its neighboring trees, and evaluate each one for its total cost
– Select next move according to a probability distribution over the neighboring trees

• To deal with the multimodality of the tree space, we introduce Simulated Annealing:
– Make more random jumps initially, gradually decrease the randomness and finally converge at the overall minimum
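A stochastic search of this kind can be sketched as follows. The slides do not give the probability distribution or the cooling schedule, so the Boltzmann weights, the geometric cooling, and all parameter values below are assumptions chosen for illustration; the toy landscape is likewise hypothetical.

```python
import math
import random

def anneal(start, neighbors, cost, t0=5.0, cooling=0.95, steps=200, seed=0):
    """Stochastic neighbor moves with a temperature that decreases over time.
    Returns the cheapest state visited."""
    rng = random.Random(seed)
    current = best = start
    temperature = t0
    for _ in range(steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        lowest = min(cost(c) for c in candidates)
        # Cheaper neighbors get more weight; a high temperature flattens
        # the distribution, a low one makes the move nearly greedy.
        weights = [math.exp(-(cost(c) - lowest) / temperature)
                   for c in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        if cost(current) < cost(best):
            best = current
        temperature = max(temperature * cooling, 1e-3)
    return best

# Hypothetical toy landscape with a local minimum at 1 and the global one at 5.
costs = [5, 3, 4, 6, 2, 1, 2, 7, 8, 9]

def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]

def toy_cost(i):
    return costs[i]

result = anneal(0, toy_neighbors, toy_cost)
```

Because the routine remembers the cheapest state it has visited, the returned cost can never exceed the starting cost; whether the global minimum is found on a given run depends on the random choices and the schedule.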

116

Results: Searching CM Tree Space

• We were able to perform experiments successfully for 3, 4 and 5 sensors
• Results show improvement compared to the extensive search method. E.g., for 4 sensors (66,936 trees):
– 100 different experiments were performed
– Each experiment was started 10 times randomly at some tree, and chains were formed by making stochastic moves in the neighborhood until a local minimum was found
– Only 4890 trees were examined on average per experiment
– Global minimum was found 82 out of 100 times, while the second best tree was found 10 times
– The method found trees that were less costly than those found by earlier searches of BDTs corresponding to complete, monotonic Boolean functions.

117

Genetic Algorithms-based Approach

• Structure-based neighborhood moves allow very short moves only. Therefore,…

• Techniques like Genetic Algorithms and Evolutionary Techniques may suggest ways for getting more efficiently to better trees, given a population of good trees

118

Genetic Algorithms-based Approach

• Started implementing genetic algorithms-based techniques for tree space traversal
• Basically, we try to get “better” trees from the current population of “good” trees using the basic genetic operations on them:
– Selection
– Crossover
– Mutation

• Here, “better” decision trees correspond to lower cost decision trees than the ones in the current population (“good”).

• Only ~1600 trees had to be examined to obtain the 10 best trees for 4 sensors!
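One generation of selection, crossover and mutation on trees might look like the sketch below. The slides name the three operations but not how they act on BDTs, so the specific operators here are assumptions: crossover grafts an internal sub-tree of one parent in place of an internal sub-tree of another, mutation exchanges two sensor labels throughout the tree, and the cost function (tree size) is a hypothetical placeholder for the real inspection cost. Trees are stored as a leaf (0 or 1) or a tuple (sensor, left, right).

```python
import random

def internal_nodes(tree, path=()):
    """Yield (path, subtree) per internal node; path entries are 1 or 2."""
    if isinstance(tree, int):
        return
    yield path, tree
    yield from internal_nodes(tree[1], path + (1,))
    yield from internal_nodes(tree[2], path + (2,))

def graft(tree, path, new):
    """Copy of tree with the sub-tree at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = graft(tree[path[0]], path[1:], new)
    return tuple(node)

def is_valid(tree, all_sensors):
    """No repeated sensor on a branch, no identical siblings, all sensors used."""
    def ok(node, seen):
        if isinstance(node, int):
            return True
        s, left, right = node
        return (s not in seen and left != right
                and ok(left, seen | {s}) and ok(right, seen | {s}))
    used = {sub[0] for _, sub in internal_nodes(tree)}
    return ok(tree, frozenset()) and used == set(all_sensors)

def crossover(a, b, rng):
    path, _ = rng.choice(list(internal_nodes(a)))
    _, donor = rng.choice(list(internal_nodes(b)))
    return graft(a, path, donor)

def mutate(tree, all_sensors, rng):
    x, y = rng.sample(sorted(all_sensors), 2)
    swap = {x: y, y: x}
    def relabel(node):
        if isinstance(node, int):
            return node
        s, left, right = node
        return (swap.get(s, s), relabel(left), relabel(right))
    return relabel(tree)

def next_generation(population, cost, all_sensors, rng, keep=2, children=20):
    parents = sorted(population, key=cost)[:keep]            # selection
    pool = list(parents)
    for _ in range(children):
        child = crossover(rng.choice(parents), rng.choice(parents), rng)
        if rng.random() < 0.2:
            child = mutate(child, all_sensors, rng)           # mutation
        if is_valid(child, all_sensors) and child not in pool:
            pool.append(child)
    return sorted(pool, key=cost)[:len(population)]

def size(tree):
    """Hypothetical toy cost: number of internal nodes."""
    return len(list(internal_nodes(tree)))

population = [
    ("a", ("b", ("c", 0, 1), 1), ("c", ("b", 0, 1), 1)),
    ("a", ("b", 0, 1), ("c", 0, 1)),
    ("b", ("a", 0, ("c", 0, 1)), 1),
]
new_population = next_generation(population, size, "abc", random.Random(1))
```

Both operators keep leaves in their left or right positions, so monotonic parents yield monotonic children; the validity filter handles the remaining constraints. Keeping the parents in the pool means the best cost never gets worse from one generation to the next.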

119

Closing Comments

• Very few optimal trees; optimality insensitive to changes in parameters.

• Extensive search techniques become practically infeasible beyond a very small number of sensors

• Our new threshold optimization algorithms provide faster ways to arrive at a low tree cost; cost is lower and often much lower than in extensive search

• Studying an irreducible tree space helps us to “search” for the best trees rather than evaluating all the trees for their cost

• A new stochastic search algorithm allows us to search for optimum inspection schemes beyond 4 sensors successfully

120

Discussion and Future Work

•Future Work: Explain why conclusions are so insensitive to variation in parameter values.
•Future Work: Explore the structure of the optimal trees and compare the different optimal trees.
•Future Work: Develop methods for approximating the optimal tree.

[Image: Pallet VACIS]

121

Discussion and Future Work

•Future work: More than two values of an attribute
(present, absent, present with probability > 75%, absent with probability at least 75%)
(ok, not ok, ok with probability > 99%, ok with probability between 95% and 99%)

•Future work: In the Boolean function model: inferring the Boolean function from observations (partially defined Boolean functions)

122

Discussion and Future Work

•Future work: Need for more complicated cost models; bringing in costs of delays

123

Discussion and Future Work

• Future work: Because of the rapid growth in the number of trees in CM Tree Space as the number of sensors grows, it is necessary to try to reduce the number of trees we need to search through.

• A notion of tree equivalence could be incorporated when the number of sensors goes beyond 5 or 6

• We hope that incorporating this into our model will enable us to extend our model to a large number of sensors

124

Discussion and Future Work

Future Work: Develop decision support algorithms for the “risk scoring” that takes place before a container reaches port.

We are beginning a collaboration with the Canada Border Services Agency, which is very interested in our methods.

125

Research Team

• Saket Anand, Rutgers, ECE graduate student
• Endre Boros, Rutgers, Operations Research
• Elsayed Elsayed, Rutgers, Ind. & Systems Engineering
• Liliya Fedzhora, Rutgers, Operations Res. grad. student
• Paul Kantor, Rutgers, Schl. of Infor. & Library Studies
• Abdullah Karaman, Rutgers, Ind. & Syst. Eng. grad. student
• Alex Kogan, Rutgers, Business School
• Paul Lioy, Rutgers/UMDNJ, Environmental and Occupational Health Sciences Institute
• David Madigan, Rutgers, Statistics
• Richard Mammone, Rutgers, Center for Advanced Information Processing
• Sushil Mittal, Rutgers, ECE graduate student
• S. Muthukrishnan, Rutgers, Computer Science
• Saumitr Pathak, Rutgers, ECE graduate student
• Richard Picard, Los Alamos, Statistical Sciences Group
• Fred Roberts, Rutgers, DyDAn Center
• Kevin Saeger, Los Alamos, Homeland Security
• Phillip Stroud, Los Alamos, Systems Engineering and Integration Group
• Hao Zhang, Rutgers, Ind. & Systems Eng. graduate student

126

Collaborators on this Work:
• Saket Anand
• David Madigan
• Richard Mammone
• Sushil Mittal
• Saumitr Pathak

Research Support:
• Dept. of Homeland Security University Programs
• Domestic Nuclear Detection Office
• Office of Naval Research
• National Science Foundation

Los Alamos National Laboratory:
• Rick Picard
• Kevin Saeger
• Phil Stroud

127

We think that Discrete Science can help with homeland security!
