63
Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions SQL’s Handling of Nulls: Can It Be Fixed? Leonid Libkin (University of Edinburgh) SEBD, June 2015 sql, nulls, & certain answers 1/42

SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL’s Handling of Nulls: Can It Be Fixed?

Leonid Libkin (University of Edinburgh)

SEBD, June 2015 sql, nulls, & certain answers 1/42

Page 2: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Problematic SQL’s nulls

“. . . this topic cannot be described in a manner that issimultaneously both comprehensive and comprehensible”“Those SQL features are . . . fundamentally at odds with theway the world behaves”

C. Date & H. Darwen, ‘A Guide to SQL Standard’

“If you have any nulls in your database, you’re getting wronganswers to some of your queries. What’s more, you have noway of knowing, in general, just which queries you’re gettingwrong answers to; all results become suspect. You can nevertrust the answers you get from a database with nulls”

C. Date, ‘Database in Depth’

SEBD, June 2015 sql, nulls, & certain answers 2/42

Page 3: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

But nulls must be handled...

◮ Incomplete data is everywhere.

◮ The more data we accumulate, the more incomplete data weaccumulate.

◮ Sources:◮ Traditional (missing data, wrong entries, etc)◮ The Web◮ Integration/translation/exchange of data, etc

◮ The importance of it was recognized early◮ Codd, “Understanding relations (installment #7)”, 1975.

◮ And yet the state is very poor.

SEBD, June 2015 sql, nulls, & certain answers 3/42

Page 4: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 1

Orders Payments

order id title

ord1 ‘SQL Standard’

ord2 ‘Database Systems’

ord3 ‘Logic’

pay id order id amount

p1 ord1 –

p2 – $50

SEBD, June 2015 sql, nulls, & certain answers 4/42

Page 5: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 1

Orders Payments

order id title

ord1 ‘SQL Standard’

ord2 ‘Database Systems’

ord3 ‘Logic’

pay id order id amount

p1 ord1 –

p2 – $50

Query: all payment ids. Written as:

SELECT pay id FROM PaymentsWHERE amount ≥ 50 OR amount < 50

SEBD, June 2015 sql, nulls, & certain answers 4/42

Page 6: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 1

Orders Payments

order id title

ord1 ‘SQL Standard’

ord2 ‘Database Systems’

ord3 ‘Logic’

pay id order id amount

p1 ord1 –

p2 – $50

Query: all payment ids. Written as:

SELECT pay id FROM PaymentsWHERE amount ≥ 50 OR amount < 50

Answer: only p2!

SEBD, June 2015 sql, nulls, & certain answers 4/42

Page 7: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 2

Query: unpaid orders:

SELECT order id FROM OrdersWHERE order id NOT IN (SELECT order id FROM Payments)

Answer:

SEBD, June 2015 sql, nulls, & certain answers 5/42

Page 8: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 2

Query: unpaid orders:

SELECT order id FROM OrdersWHERE order id NOT IN (SELECT order id FROM Payments)

Answer: EMPTY!

SEBD, June 2015 sql, nulls, & certain answers 5/42

Page 9: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 2

Query: unpaid orders:

SELECT order id FROM OrdersWHERE order id NOT IN (SELECT order id FROM Payments)

Answer: EMPTY!

◮ This goes against our intuition: 3 orders, 2 payments.

◮ At least one must be unpaid!

SEBD, June 2015 sql, nulls, & certain answers 5/42

Page 10: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL: example 2

Query: unpaid orders:

SELECT order id FROM OrdersWHERE order id NOT IN (SELECT order id FROM Payments)

Answer: EMPTY!

◮ This goes against our intuition: 3 orders, 2 payments.

◮ At least one must be unpaid!

◮ SQL tells us that |X | > |Y | and X − Y = ∅ are compatible.

◮ This is cast in stone (SQL standard).

SEBD, June 2015 sql, nulls, & certain answers 5/42

Page 11: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Source of problems: 3-valued logic

SQL used 3-valued logic, or 3VL, for databases with nulls.

Comparisons involving nulls evaluate to unknown: for instance, 5 = nullresults in unk.

They are propagated using 3VL rules:

unk ∨ unk = unk unk ∨ true = true unk ∧ unk = unkunk ∧ false = false ¬ unk = unk etc

◮ Committee design from 30 years ago, leads to many problems,

◮ but is efficient and used everywhere

SEBD, June 2015 sql, nulls, & certain answers 6/42

Page 12: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What does theory have to offer?

The notion of correctness — certain answers.

◮ Answers independent of the interpretation of missing information.

◮ Typically defined as

certain(Q,D) =⋂

Q(D ′)

over all possible worlds D ′ described by D

◮ Standard approach, used in all applications: data integration andexchange, inconsistent data, querying with ontologies, data cleaning,etc.

Question: What is the relationship between SQL and certain answers?

SEBD, June 2015 sql, nulls, & certain answers 7/42

Page 13: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Can they be the same?

No!

SEBD, June 2015 sql, nulls, & certain answers 8/42

Page 14: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Can they be the same?

No!

Complexity argument:

◮ Finding certain answers for relational calculus queries in coNP-hard

◮ SQL is very efficient (DLOGSPACE)

SEBD, June 2015 sql, nulls, & certain answers 8/42

Page 15: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Can they be the same?

No!

Complexity argument:

◮ Finding certain answers for relational calculus queries in coNP-hard

◮ SQL is very efficient (DLOGSPACE)

Now we:

◮ Analyze the behavior of SQL

◮ See what it can do wrong (when right is certain answers)

◮ See what we can do to make it go wrong less often

SEBD, June 2015 sql, nulls, & certain answers 8/42

Page 16: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Wrong behaviors: false negatives and false positives

False negatives: missing some of the certain answers

False positives: giving answers which are not certain

Complexity tells us:

SQL query evaluation cannot avoid both!

SEBD, June 2015 sql, nulls, & certain answers 9/42

Page 17: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Wrong behaviors: false negatives and false positives

False negatives: missing some of the certain answers

False positives: giving answers which are not certain

Complexity tells us:

SQL query evaluation cannot avoid both!

SQL must generate at least one type of errors.

SEBD, June 2015 sql, nulls, & certain answers 9/42

Page 18: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL’s errors

False positives are worse: they tell you something blatantly false ratherthan hide part of the truth

But examples we’ve seen only have false negatives.

Perhaps SQL only generates one type of errors – and milder ones?

Since it is impossible to avoid errors altogether, this wouldn’t be so bad.

And complexity doesn’t rule this out.

SEBD, June 2015 sql, nulls, & certain answers 10/42

Page 19: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

But design by committee does...

Relations: R =A

1S =

A

null

Query:

SELECT R.A FROM RWHERE R.A NOT IN (SELECT R1.A FROM R R1

WHERE R1.A NOT IN (SELECT * FROM S))

Essentially R − (R − S)

Certain answer: ∅ SQL answer: 1

SEBD, June 2015 sql, nulls, & certain answers 11/42

Page 20: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What we do

Identify the source of the problem:

◮ SQL’s evaluation is based on 3-valued logic

◮ Some of the rules for handling true, false, and unknown are quitearbitrary.

SEBD, June 2015 sql, nulls, & certain answers 12/42

Page 21: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What we do

Identify the source of the problem:

◮ SQL’s evaluation is based on 3-valued logic

◮ Some of the rules for handling true, false, and unknown are quitearbitrary.

We show that a slight fix of the rules avoids false positives.

SEBD, June 2015 sql, nulls, & certain answers 12/42

Page 22: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What we do

Identify the source of the problem:

◮ SQL’s evaluation is based on 3-valued logic

◮ Some of the rules for handling true, false, and unknown are quitearbitrary.

We show that a slight fix of the rules avoids false positives.

Idea of the fix: be faithful to 3-valuedness and classify answers not into(certain, the rest) but rather:

certainly true — certainly false — unknown

SEBD, June 2015 sql, nulls, & certain answers 12/42

Page 23: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Evaluation procedures for first-order queries

Given a database D, a query Q(x), a tuple a

Eval(D,Q(a)) ∈ set of truth values

◮ 2-valued logic: truth values are 1 (true) and 0 (false)

◮ 3-valued logic: 1, 0, and 12 (unknown)

Meaning: if Eval(D,Q(a)) evaluates to

◮ 1, we know a ∈ Q(D)

◮ 0, we know a 6∈ Q(D)

◮12 , we don’t know whether a ∈ Q(D) or a 6∈ Q(D)

SEBD, June 2015 sql, nulls, & certain answers 13/42

Page 24: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Not reinventing the wheel...

All evaluation procedures are completely standard for ∨,∧,¬,∀,∃:

Eval(D,Q ∨ Q ′) = max(Eval(D,Q),Eval(D,Q ′))

Eval(Q ∧ Q ′,D) = min(Eval(D,Q),Eval(D,Q ′))

Eval(D,¬Q) = 1 − Eval(D,Q)

Eval(D,∃x Q(x , a)) = max{Eval(D,Q(a′, a)) | a′ ∈ adom(D)}

Eval(D,∀x Q(x , a)) = min{Eval(D,Q(a′, a)) | a′ ∈ adom(D)}

SEBD, June 2015 sql, nulls, & certain answers 14/42

Page 25: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

FO evaluation procedure

We only need to give rules for atomic formulae.

EvalFO(D,R(a)) =

{

1 if a ∈ R

0 if a 6∈ R

EvalFO(D, a = b) =

{

1 if a = b

0 if a 6= b

SEBD, June 2015 sql, nulls, & certain answers 15/42

Page 26: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

SQL evaluation procedure

All that changes is the rule for comparisons.

We write Null(a) if a is a null and NotNull(a) if it is not.

EvalSQL(D, a = b) =

1 if a = b and NotNull(a, b)

0 if a 6= b and NotNull(a, b)12 if Null(a) or Null(b)

SQL’s rule: if one attribute of a comparison is null, the result is unknown.

SEBD, June 2015 sql, nulls, & certain answers 16/42

Page 27: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What’s wrong with it?

We are too eager to say no or unknown.

If we say no to a result that ought to be unknown, when negationapplies, no becomes yes! And that’s how false positives creep in.

Consider R =A B

1 null

What about (null, null) ∈ R?

SQL says no but correct answer is unknown: what if null is really 1?

SEBD, June 2015 sql, nulls, & certain answers 17/42

Page 28: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

A word about the model

We go a bit beyond SQL in fact — marked nulls.

Semantics: missing values (aka closed world semantics)

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

SEBD, June 2015 sql, nulls, & certain answers 18/42

Page 29: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

A word about the model

We go a bit beyond SQL in fact — marked nulls.

Semantics: missing values (aka closed world semantics)

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

h(⊥1) = 4h(⊥2) = 3h(⊥3) = 5

=⇒

A B C

1 2 4

3 4 3

5 5 1

2 5 3

SEBD, June 2015 sql, nulls, & certain answers 18/42

Page 30: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

A word about the model

We go a bit beyond SQL in fact — marked nulls.

Semantics: missing values (aka closed world semantics)

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

h(⊥1) = 4h(⊥2) = 3h(⊥3) = 5

=⇒

A B C

1 2 4

3 4 3

5 5 1

2 5 3

SQL model: a special case when all nulls are distinct.

SEBD, June 2015 sql, nulls, & certain answers 18/42

Page 31: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Correctness: what do we know?

Eval(Q,D) = {a | Eval(D,Q(a)) = 1}

We want at least simple correctness guarantees

constant tuples in Eval(Q,D) ⊆ certain(Q,D)

where certain(Q,D) is the standard definition of certain answers:

certain(Q,D) =⋂

{Q(D ′) | D ′ = h(D) for a valuation h}

SEBD, June 2015 sql, nulls, & certain answers 19/42

Page 32: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Correctness: what do we know?

Eval(Q,D) = {a | Eval(D,Q(a)) = 1}

We want at least simple correctness guarantees

constant tuples in Eval(Q,D) ⊆ certain(Q,D)

where certain(Q,D) is the standard definition of certain answers:

certain(Q,D) =⋂

{Q(D ′) | D ′ = h(D) for a valuation h}

Sometimes we want/get even more:

constant tuples in Eval(Q,D) = certain(Q,D)

SEBD, June 2015 sql, nulls, & certain answers 19/42

Page 33: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Languages for correctness

UCQ: unions of conjunctive queries, or positive relational algebraπ, σ, ⋊⋉,∪.

FOcertain — UCQs extended with the division operator ÷

◮ but only Q ÷ R queries

◮ meaning: find tuples a that occur in Q(D) together with every tupleb in R

For FOcertain queries,

constant tuples in EvalFO(Q,D) = certain(Q,D)

For UCQs,

constant tuples in EvalSQL(Q,D) ⊆ certain(Q,D)

SEBD, June 2015 sql, nulls, & certain answers 20/42

Page 34: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Towards a good evaluation: unifying tuples

Two tuples t1 and t2 unify if there is a mapping h of nulls to constantssuch that h(t1) = h(t2).

( 1 ⊥ 1 3 )( ⊥′ 2 ⊥′ 3 )

=⇒ ( 1 2 1 3 )

but( 1 ⊥ 2 3 )( ⊥′ 2 ⊥′ 3 )

do not unify.

This can be checked in linear time.

SEBD, June 2015 sql, nulls, & certain answers 21/42

Page 35: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Proper 3-valued procedure

Eval3v(D,R(a)) =

1 if a ∈ R

0 if a does not unify with any tuple in R12 otherwise

Eval3v(D, a = b) =

1 if a = b

0 if a 6= b and NotNull(a, b)12 otherwise

SEBD, June 2015 sql, nulls, & certain answers 22/42

Page 36: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Simple correctness guarantees: no false positives

If a is a tuple without nulls, and Eval3v(D,Q(a)) = 1 thena ∈ certain(Q,D).

Simple correctness guarantees:

constant tuples in Eval3v(Q,D) ⊆ certain(Q,D)

Thus:

◮ Fast evaluation (checking Eval3v(D,Q(a)) = 1 in DLOGSPACE)

◮ Correctness guarantees: no false positives

SEBD, June 2015 sql, nulls, & certain answers 23/42

Page 37: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Strong correctness guarantees: involving nulls

How can we give correctness guarantees for tuples with nulls? By anatural extension of the standard definition.

A tuple without nulls a is a certain answer if

a ∈ Q(h(D)) for every valuation h of nulls.

SEBD, June 2015 sql, nulls, & certain answers 24/42

Page 38: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Strong correctness guarantees: involving nulls

How can we give correctness guarantees for tuples with nulls? By anatural extension of the standard definition.

A tuple without nulls a is a certain answer if

a ∈ Q(h(D)) for every valuation h of nulls.

An arbitrary tuple a is a certain answers with nulls if

h(a) ∈ Q(h(D)) for every valuation h of nulls.

Notation: certain⊥(Q,D)

SEBD, June 2015 sql, nulls, & certain answers 24/42

Page 39: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Certain answers with nulls: properties

certain(Q,D) ⊆ certain⊥(Q,D) ⊆ EvalFO(Q,D)

Moreover:

◮ certain(Q,D) is the set of null free tuples in certain⊥(Q,D)

◮ certain⊥(Q,D) = EvalFO(Q,D) for FOcertain queries

SEBD, June 2015 sql, nulls, & certain answers 25/42

Page 40: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Correctness with nulls: strong guarantees

◮ D – a database,

◮ Q(x) – a first-order query

◮ a – a tuple of elements from D.

Then:

◮ Eval3v(D,Q(a)) = 1 =⇒ a ∈ certain⊥(Q,D)

◮ Eval3v(D,Q(a)) = 0 =⇒ a ∈ certain⊥(¬Q,D)

3-valuedness extended to answers: certainly true, certainly false, don’tknow.

SEBD, June 2015 sql, nulls, & certain answers 26/42

Page 41: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

When things are easy: UCQs with inequalities

Follow the two-valued standard FO procedure, and add one rule:

EvalUCQ6=(D, a 6= b) =

{

1 if a 6= b and NotNull(a, b)

0 otherwise

Then, for UCQs with inequalities

Eval3v(D,Q(a)) = 1 ⇔ EvalUCQ6=(D,Q(a)) = 1

Corollary: correctness guarantees for EvalUCQ 6= over UCQs withinequalities.

SEBD, June 2015 sql, nulls, & certain answers 27/42

Page 42: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Open world assumption (OWA) semantics

Tuples can be added:

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

SEBD, June 2015 sql, nulls, & certain answers 28/42

Page 43: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Open world assumption (OWA) semantics

Tuples can be added:

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

h(⊥1) = 4h(⊥2) = 3h(⊥3) = 5

=⇒

A B C

1 2 4

3 4 3

5 5 1

2 5 3

7 8 9

17 18 19

SEBD, June 2015 sql, nulls, & certain answers 28/42

Page 44: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Open world assumption (OWA) semantics

Tuples can be added:

A B C

1 2 ⊥1

⊥2 ⊥1 3

⊥3 5 1

2 ⊥3 3

h(⊥1) = 4h(⊥2) = 3h(⊥3) = 5

=⇒

A B C

1 2 4

3 4 3

5 5 1

2 5 3

7 8 9

17 18 19

Observation: Eval3v doesn’t work under OWA

SEBD, June 2015 sql, nulls, & certain answers 28/42

Page 45: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

OWA: problems

◮ We can never be sure that a tuple is not in a relation

◮ We can never be sure a universal ∀ query holds

◮ We can never be sure an existential ∃ query does not hold.

Solution: make the result of evaluation 12 in the worst case, forget 0

SEBD, June 2015 sql, nulls, & certain answers 29/42

Page 46: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

OWA: solution

Evalowa3v (D,R(a)) =

{

1 if a ∈ R12 otherwise

Evalowa3v (D,∃xϕ(x , a)) = max

{

12 , max

a′∈adom{Evalowa

3v (D, ϕ(a′, a)}}

Evalowa3v (D,∀xϕ(x , a)) = min

{

12 , min

a′∈adom{Evalowa

3v (D, ϕ(a′, a)}}

Evalowa3v has correctness guarantees under OWA.

SEBD, June 2015 sql, nulls, & certain answers 30/42

Page 47: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Relational algebra queries

This is how it will be implemented after all.

Why not

Relational algebra Q =⇒ equivalent FO ϕ =⇒ Eval3v(D, ϕ)?

Because the algebra-to-calculus translation works in the 2-valued worldand doesn’t provide 3-valued guarantees.

Also we become dependent on a particular translation.

So we need to work directly on relational algebra queries.

SEBD, June 2015 sql, nulls, & certain answers 31/42

Page 48: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

3-valued implementation of RA with correctness guarantees

Classify tuples into:

◮ certainly true

◮ certainly false

◮ don’t know

To do this, define a translation

Q 7→ (Q+,Q−)

with certainty guarantees, i.e.

Q+(D) ⊆ certain⊥(Q,D) Q−(D) ⊆ certain⊥(Q,D)

SEBD, June 2015 sql, nulls, & certain answers 32/42

Page 49: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Relational algebra translations: basic rules

For a relation R :

◮ R+ = R

◮ R− = {t | t doesn’t unify with anything in R} ⊆ R

For union (intersection is dual)

◮ (Q1 ∪ Q2)+ = Q+

1 ∪ Q+2

◮ (Q1 ∪ Q2)− = Q−

1 ∩ Q−

2

For difference:

◮ (Q1 − Q2)+ = Q+

1 ∩ Q−

2

◮ (Q1 − Q2)− = Q−

1 ∪ Q+2

SEBD, June 2015 sql, nulls, & certain answers 33/42

Page 50: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Slightly trickier rules

Cartesian product:

◮ (Q1 × Q2)+ = Q+

1 × Q+2

◮ (Q1 × Q2)− = Q−

1 × adomarity(Q2) ∪ adomarity(Q1) × Q−

2

Projection:

◮ (πααα(Q))+ = πααα(Q+)

◮ (πααα(Q))− = πααα(Q−) − πααα(adomarity(Q) − Q−)

SEBD, June 2015 sql, nulls, & certain answers 34/42

Page 51: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

The last bit: selection

Translate conditions θ 7→ θ∗:

◮ (A = B)∗ = (A = B).

◮ (A = const)∗ = (A = const).

◮ (A 6= B)∗ = (A 6= B) ∧ NotNull(A,B).

◮ (A 6= const)∗ = (A 6= const) ∧ NotNull(A).

◮ (θ1 ∨ θ2)∗ = θ∗1 ∨ θ∗2.

◮ (θ1 ∧ θ2)∗ = θ∗1 ∧ θ∗2.

Translate selections:

◮ (σθ(Q))+ = σθ∗(Q+)

◮ (σθ(Q))− = Q− ∪ σ(¬θ)∗(adomarity(Q))

SEBD, June 2015 sql, nulls, & certain answers 35/42

Page 52: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Correctness and efficiency

Translations have correctness guarantees.

SEBD, June 2015 sql, nulls, & certain answers 36/42

Page 53: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Correctness and efficiency

Translations have correctness guarantees.

Problem Some of the rules are not very efficient:

(σθ(Q))− = Q− ∪ σ(¬θ)∗(adomarity(Q))

(Q1 × Q2)− = Q−

1 × adomarity(Q2) ∪ adomarity(Q1) × Q−

2

(πααα(Q))− = πααα(Q−) − πααα(adomarity(Q) − Q−)

SEBD, June 2015 sql, nulls, & certain answers 36/42

Page 54: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Efficiency: solution

We defined a family of translations.

Taking a smaller, with respect to containment, query preservescorrectness guarantees.

For example:

(σθ(Q))− = Q−

(Q1 × Q2)− = Q−

1 × Q−

2

(πααα(Q))− = ∅

SEBD, June 2015 sql, nulls, & certain answers 37/42

Page 55: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

Some extensions

◮ A dual approach: no false negatives◮ just take negation twice

◮ Other derived useful operations of relational algebra◮ especially for implementing correlated subqueries◮ semi-join, anti-join◮ explain why SQL’s handling of EXCEPT and nested queries for

implementing difference is not the same

SEBD, June 2015 sql, nulls, & certain answers 38/42

Page 56: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What is so special about 3VL?

Is it the only possibility?

Perhaps we can find a smarter procedure based on the usual two-valuedlogic that has correctness and efficiency guarantees?

If not, can there be other many-valued logics that give us correctness?And if so, what makes them such?

SEBD, June 2015 sql, nulls, & certain answers 39/42

Page 57: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

We need ≥ 3 values. But there are many choices.

(new results with Marco Console and Paolo Guagliardo)

SEBD, June 2015 sql, nulls, & certain answers 40/42

Page 58: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

We need ≥ 3 values. But there are many choices.

(new results with Marco Console and Paolo Guagliardo)

2 values do not work.

◮ every procedure based on Boolean 2VL gives both false positivesand false negatives.

SEBD, June 2015 sql, nulls, & certain answers 40/42

Page 59: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

We need ≥ 3 values. But there are many choices.

(new results with Marco Console and Paolo Guagliardo)

2 values do not work.

◮ every procedure based on Boolean 2VL gives both false positivesand false negatives.

Lots of many-valued logics will work.

◮ What makes it work: monotonicity of ∧,∨,¬

◮ Many-valued logics have the truth ordering 0 ≤ 12 ≤ 1 and the

knowledge ordering 12 � 0 and 1

2 � 1

◮ ∧,∨,¬ are monotone with respect to �

◮ Every such many-valued logic can be lifted to an efficient andcorrect procedure for all relational calculus queries.

SEBD, June 2015 sql, nulls, & certain answers 40/42

Page 60: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

We need ≥ 3 values. But there are many choices.

(new results with Marco Console and Paolo Guagliardo)

2 values do not work.

◮ every procedure based on Boolean 2VL gives both false positivesand false negatives.

Lots of many-valued logics will work.

◮ What makes it work: monotonicity of ∧,∨,¬

◮ Many-valued logics have the truth ordering 0 ≤ 12 ≤ 1 and the

knowledge ordering 12 � 0 and 1

2 � 1

◮ ∧,∨,¬ are monotone with respect to �

◮ Every such many-valued logic can be lifted to an efficient andcorrect procedure for all relational calculus queries.

3VL. The logic of choice for most – but not all – applications.

SEBD, June 2015 sql, nulls, & certain answers 40/42

Page 61: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

What’s next

Implement it.

◮ Theoretical complexity is good but too many cartesian products.

◮ Hence approximations.

◮ The price of correctness: it will be slower, but how much slower?

◮ Handle at the level of algebra? SQL?

SEBD, June 2015 sql, nulls, & certain answers 41/42

Page 62: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

References

◮ A. Gheerbrant, L. Libkin, C. Sirangelo. Naıve evaluation of queriesover incomplete databases. ACM TODS 39(4): (2014)

◮ (when EvalFO works as-is)

◮ L Libkin, Certain answers as objects and knowledge. KR 2014◮ (a framework for providing correctness guarantees)

◮ L. Libkin. SQL’s three-valued logic and certain answers. ICDT 2015:94-109.

◮ (the new 3VL procedure with correctness guarantees)

◮ M. Console, P. Guagliardo, L. Libkin. Computing certain answersefficiently with many-valued logic.

◮ (new results on many-valued logics)

SEBD, June 2015 sql, nulls, & certain answers 42/42

Page 63: SQL's Handling of Nulls: Can It Be Fixed?€¦ · Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions Problematic SQL’s nulls “... this topic cannot

Intro SQL/CA Evaluation Correctness Relational algebra Role of 3VL Conclusions

References

◮ A. Gheerbrant, L. Libkin, C. Sirangelo. Naıve evaluation of queriesover incomplete databases. ACM TODS 39(4): (2014)

◮ (when EvalFO works as-is)

◮ L Libkin, Certain answers as objects and knowledge. KR 2014◮ (a framework for providing correctness guarantees)

◮ L. Libkin. SQL’s three-valued logic and certain answers. ICDT 2015:94-109.

◮ (the new 3VL procedure with correctness guarantees)

◮ M. Console, P. Guagliardo, L. Libkin. Computing certain answersefficiently with many-valued logic.

◮ (new results on many-valued logics)

Questions?

SEBD, June 2015 sql, nulls, & certain answers 42/42