
Co-verification Experience

Juncao Li
System Verification Lab
Computer Science, PSU

01/05/2010

[email protected]

Agenda

• Overview
• Sealevel PIO-24 Digital I/O Card
• Intel 100M PCI Ethernet Adapter

System Verification Lab @CS, Portland State University 2


PIO-24 Digital I/O Card

• Driver code and device manual links
  – http://www.osronline.com/article.cfm?article=403
  – https://www.sealevel.com/uploads/manuals/8018S.pdf

• Driver size: 1724 lines of C code
• Device model size: 1232 lines of C code
• Issues found in the specification (device manual): 2
• Bugs found in the driver: 4
• Properties proved in the driver: 2


Intel 100M PCI Ethernet adapter

• Driver code and device manual links
  – http://msdn.microsoft.com/en-us/library/dd163298.aspx
    • The source code is available in WDK releases.
    • E.g., the SLAM SD one is in “WDK\src_6001\kmdf\pcidrv”
  – http://download.intel.com/design/network/manuals/8255X_OpenSDM.pdf

• Driver size: 14406 lines of C code
• Device model size: 3518 lines of C code
• Issues found in the specification (device manual): 6
• Bugs found in the driver: 3
• Properties proved in the driver: 5


Agenda

• Overview
• Sealevel PIO-24 Digital I/O Card
• Intel 100M PCI Ethernet adapter


Specification Issues

• Hardware/Software (HW/SW) interface specifications (manuals) usually have problems:
  – Incompleteness: information that should be clarified is missing
  – Inconsistency: statements in different places are not consistent with each other

• Spec issues are found when
  – We model hardware devices according to the specs
  – We verify drivers against the hardware models


Specification Issues

• Issue 1 (Incompleteness):
  – Location: Page 11, Section: Interrupt Read
  – Problem:
    • The default value of the register ISRQST1 is not specified. ISRQST1 indicates the interrupt-pending status and “should” be 0 when the device powers on.
  – This problem is related to the bug “ProperISR2”

• Issue 2 (Inconsistency):
  – Location: Page 11, Section: I/O Control Code
  – Problem:
    • The CW1D1 bit of the Control Word register is never used, but the bit is not defined as “Not Used” in Section “Register Description” on Page 10.


Verification Statistics


Bug1: InvalidRead

• Rule: the driver should never return invalid data (data not read from the device) to user applications
• About the bug:
  – The driver returns an invalid value to user applications without reading any hardware data register


Error trace (annotations from the trace view):

• Test harness entry: initialize the device model
• Predicates, i.e., HW/SW states:
  – HW state variable: no interrupt pending
  – HW state variable: interrupt enabled
  – SW state variables CurrentRequest and AwaitingInt: the inconsistency between these two variables directly causes the bug
• Enter the callback function EvtIoDeviceControl
  – Process the I/O control code
  – The driver interprets the “else” branch as: the interrupt is enabled in hardware
  – The HW runs after this statement, at which point CurrentRequest and AwaitingInt have become inconsistent
• The HW transaction function (in hardware) simulates the environment, i.e., the DIO device behaviors
  – In RunInterrupt: fire an interrupt and call the ISR
• In the ISR (in software):
  – Check whether it is this driver’s interrupt
  – Because the driver routine EvtIoDeviceControl was interrupted, AwaitingInt is not “TRUE” yet, so “data” is not read from the device
  – Schedule the DPC. This immediately violates the rule “CvIsrCallDpc” (see later slides)
• The ISR returns, EvtIoDeviceControl continues, and then EvtIoDeviceControl returns
• The DPC runs:
  – CurrentRequest is not NULL, so the DPC prepares the data for the user application!
  – The I/O request is completed with STATUS_SUCCESS, but the data was never read from the hardware register!
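The interleaving in this trace can be replayed in a few lines of plain C. The sketch below is a hypothetical, heavily simplified model written for illustration, not the actual PIO-24 driver or the SDV device model: `Request`, `DevContext`, `hwDataRegister`, `Dpc`, `ReplayInvalidReadTrace`, and the `interruptBetween` flag (which stands in for the hardware transaction function running between the two assignments) are all assumptions; only `EvtIoDeviceControl`, `CurrentRequest`, and `AwaitingInt` come from the trace.

```c
/* Hypothetical, simplified model of the InvalidRead interleaving;
 * NOT the real PIO-24 driver or the SDV device model. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool completed;   /* request was completed */
    bool success;     /* completed with STATUS_SUCCESS */
    bool dataFromHw;  /* data really came from the HW data register */
    int  data;
} Request;

typedef struct {
    Request *CurrentRequest;
    bool     AwaitingInt;
    bool     dataValid;  /* set only when the ISR reads the data register */
    int      data;
    bool     dpcQueued;
} DevContext;

static const int hwDataRegister = 0x5A;  /* assumed value held by the device */

/* ISR as in the trace: reads data only when AwaitingInt is TRUE, but queues
 * the DPC unconditionally (the CvIsrCallDpc violation). */
static void Isr(DevContext *ctx) {
    if (ctx->AwaitingInt) {
        ctx->data = hwDataRegister;
        ctx->dataValid = true;
    }
    ctx->dpcQueued = true;  /* stands in for WdfInterruptQueueDpcForIsr(...) */
}

/* Original statement order; interruptBetween models the hardware
 * transaction function running between the two stores. */
static void EvtIoDeviceControl(DevContext *ctx, Request *req, bool interruptBetween) {
    ctx->CurrentRequest = req;
    if (interruptBetween) {
        Isr(ctx);  /* CurrentRequest and AwaitingInt are inconsistent here */
    }
    ctx->AwaitingInt = true;
}

/* DPC: completes the pending request with whatever ctx->data holds. */
static void Dpc(DevContext *ctx) {
    if (!ctx->dpcQueued) {
        return;
    }
    ctx->dpcQueued = false;
    Request *req = ctx->CurrentRequest;
    if (req != NULL) {
        req->data = ctx->data;
        req->dataFromHw = ctx->dataValid;
        req->success = true;
        req->completed = true;
        ctx->CurrentRequest = NULL;
    }
}

/* Replays the trace: ISR inside EvtIoDeviceControl, then the DPC. */
static Request ReplayInvalidReadTrace(void) {
    DevContext ctx = {0};
    Request req = {0};
    EvtIoDeviceControl(&ctx, &req, true);
    Dpc(&ctx);
    assert(ctx.CurrentRequest == NULL);  /* the request was consumed */
    return req;
}
```

`ReplayInvalidReadTrace()` returns a request whose `completed` and `success` flags are set while `dataFromHw` is false, which is the InvalidRead symptom; the unconditional DPC queueing in `Isr` is the CvIsrCallDpc violation.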


Bug2: CvIsrCallDpc

• Rule: before the ISR schedules a DPC, it should read the hardware volatile registers first

• About the bug:
  – The driver does not read the hardware data register but still requests the DPC

• This bug is already illustrated in the error trace of the “InvalidRead” bug
  – The actual error trace of the CvIsrCallDpc bug is different, because we used another life-cycle harness and this bug can occur at various places.


However …

• InvalidRead and CvIsrCallDpc are different rules
  – Swapping these two lines (in DioEvtDeviceControl) fixes the “InvalidRead” bug:
      devContext->CurrentRequest = Request;
      devContext->AwaitingInt = TRUE;
  – To fix the “CvIsrCallDpc” bug, we need to move the WdfInterruptQueueDpcForIsr(...) call into the “if (devContext->AwaitingInt) { ... }” block in the ISR.
  – These fixes have been proved correct by SDV (CoVer).
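A companion sketch, under the same kind of simplifying assumptions (hypothetical types, an assumed `hwDataRegister`, and an `irqAt` parameter that picks where the hardware interrupt lands), applies both fixes described above and checks each interrupt position in the modeled handler. It is a small enumeration in the spirit of what SDV checks, not a substitute for the actual proof; `IsrFixed`, `RunFixed`, `FixedModelIsSafe`, and `Outcome` are invented names.

```c
/* Hypothetical model of the two fixes; NOT the real DIO driver code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool completed; bool dataFromHw; int data; } Request;

typedef struct {
    Request *CurrentRequest;
    bool     AwaitingInt;
    bool     dataValid;       /* set only when the data register was read */
    int      data;
    bool     dpcQueued;
    bool     dpcWithoutRead;  /* records a CvIsrCallDpc violation */
} DevContext;

static const int hwDataRegister = 0x5A;  /* assumed device data */

/* Fixed ISR: the DPC is queued only inside the AwaitingInt block, after the read. */
static void IsrFixed(DevContext *ctx) {
    if (ctx->AwaitingInt) {
        ctx->data = hwDataRegister;
        ctx->dataValid = true;
        ctx->dpcQueued = true;  /* stands in for WdfInterruptQueueDpcForIsr(...) */
    }
    if (ctx->dpcQueued && !ctx->dataValid) {
        ctx->dpcWithoutRead = true;  /* CvIsrCallDpc check */
    }
}

/* DPC: completes the pending request with the data the ISR read. */
static void Dpc(DevContext *ctx) {
    if (!ctx->dpcQueued) {
        return;
    }
    ctx->dpcQueued = false;
    if (ctx->CurrentRequest != NULL) {
        ctx->CurrentRequest->data = ctx->data;
        ctx->CurrentRequest->dataFromHw = ctx->dataValid;
        ctx->CurrentRequest->completed = true;
        ctx->CurrentRequest = NULL;
    }
}

typedef struct { bool completed; bool dataFromHw; bool dpcWithoutRead; } Outcome;

/* irqAt: 0 = interrupt before the request, 1 = between the two stores,
 * 2 = after both stores. The DPC runs after EvtIoDeviceControl returns. */
static Outcome RunFixed(int irqAt) {
    assert(irqAt >= 0 && irqAt <= 2);
    DevContext ctx = {0};
    Request req = {0};
    if (irqAt == 0) IsrFixed(&ctx);
    ctx.AwaitingInt = true;     /* swapped order: AwaitingInt first ... */
    if (irqAt == 1) IsrFixed(&ctx);
    ctx.CurrentRequest = &req;  /* ... then CurrentRequest */
    if (irqAt == 2) IsrFixed(&ctx);
    Dpc(&ctx);
    Outcome o = { req.completed, req.dataFromHw, ctx.dpcWithoutRead };
    return o;
}

/* True if no modeled interleaving completes a request with unread data
 * or queues the DPC without a read. */
static bool FixedModelIsSafe(void) {
    for (int irqAt = 0; irqAt <= 2; irqAt++) {
        Outcome o = RunFixed(irqAt);
        if ((o.completed && !o.dataFromHw) || o.dpcWithoutRead) {
            return false;
        }
    }
    return true;
}
```

`FixedModelIsSafe()` returns true: with the interrupt between the two stores (`irqAt == 1`) the request completes with data that was really read, and with `irqAt == 0` the ISR neither reads nor queues a DPC, so the request simply remains pending.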


Bug3: ProperISR1

• Rule: if the ISR returns TRUE (i.e., acknowledges the interrupt to the OS), the device interrupt status should not be active; otherwise, it may cause an interrupt storm.

• There are two scenarios that can cause this bug:
  – When the interrupt firing condition is configured as “high level” (resp. “low level”), the interrupt will be fired repeatedly as long as the input to Port A (least significant bit) stays high (resp. low)
  – When the condition is “rising edge” (resp. “falling edge”), the interrupt may be fired repeatedly, depending on the input frequency to Port A

• Both scenarios can cause an interrupt storm

Error trace (annotations from the trace view):

• In the ISR
  – HW state: interrupt fired
  – Read the interrupt status and clear the register at the same time
• In hardware, the interface event function clears the interrupt status register on read
• The HW transaction function runs after the interrupt status register has been cleared
  – The low-level input condition fires the interrupt again
• The SLIC rule


About This Bug

• We learned a solution from the Ethernet adapter driver:
  – If the device may fire interrupts frequently, disable the interrupt first in the ISR and re-enable it later in the DPC, after the current request has been serviced
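The effect of that change on the ProperISR1 rule can be sketched with a toy model of a level-triggered, clear-on-read interrupt. Everything here is an assumption made for illustration: the `Device` fields, the choice that the status only latches while the interrupt is enabled, and all helper names are hypothetical and do not come from the Sealevel manual or the driver.

```c
/* Hypothetical model of a level-triggered, clear-on-read DIO interrupt;
 * NOT the Sealevel device or its driver. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool portA0Low;   /* input level on Port A bit 0 ("low level" trigger) */
    bool intEnabled;  /* interrupt-enable state */
    bool intStatus;   /* interrupt status, cleared on read */
} Device;

/* HW transaction step: latches the status while the trigger condition holds
 * (model assumption: nothing latches while the interrupt is disabled). */
static void HwStep(Device *d) {
    if (d->intEnabled && d->portA0Low) {
        d->intStatus = true;
    }
}

static bool IntActive(const Device *d) {
    return d->intStatus && d->intEnabled;
}

/* Clear-on-read status register. */
static bool ReadAndClearStatus(Device *d) {
    bool s = d->intStatus;
    d->intStatus = false;
    return s;
}

/* Original ISR: acknowledge while the input is still low. */
static bool IsrOriginal(Device *d) {
    return ReadAndClearStatus(d);
}

/* ISR that disables the interrupt before acknowledging it. The DPC would
 * re-enable it after servicing the request; that step is not modeled. */
static bool IsrDisableFirst(Device *d) {
    bool ours = ReadAndClearStatus(d);
    if (ours) {
        d->intEnabled = false;
    }
    return ours;
}

/* True if the ISR returns TRUE but the interrupt is active again once the
 * hardware runs, i.e., the ProperISR1 property is violated. */
static bool AckLeavesInterruptActive(bool disableInIsr) {
    Device d = { .portA0Low = true, .intEnabled = true, .intStatus = false };
    HwStep(&d);  /* low level fires the interrupt */
    assert(IntActive(&d));
    bool acked = disableInIsr ? IsrDisableFirst(&d) : IsrOriginal(&d);
    HwStep(&d);  /* HW runs after the status register was cleared */
    return acked && IntActive(&d);
}
```

`AckLeavesInterruptActive(false)` returns true (the original ISR acknowledges an interrupt that is immediately active again, the storm pattern), while `AckLeavesInterruptActive(true)` returns false.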


Bug4: ProperISR2

• Rule: the ISR should not return TRUE if the device’s interrupt is not active, i.e., it is not this driver's interrupt.

• About the bug:
  – The ISR acknowledges an interrupt even when the device’s interrupt is disabled
  – Related to spec issue 1: ISRQST1 has no specified default value
  – A scenario for this bug:
    • ISRQST1 is initialized to 1 during device power-on
    • The OS registers the DIO driver’s ISR in the PCI bus’s interrupt vector table
    • Another PCI device fires an interrupt
    • The DIO driver may get an “opportunity” to acknowledge this interrupt


Bug4: ProperISR2

• The causes of this bug:
  – The default value of ISRQST1 is not clearly stated in the spec
  – The driver doesn’t reset the device during device power-on
  – The ISR checks only the ISRQST1 register to decide whether an interrupt is pending (this bug can be avoided if the interrupt-enabled register is also checked)

• I have to admit, the chance of this bug seems quite low
  – However, when it happens, no one will notice!
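The remedy mentioned among the causes (also checking the interrupt-enabled register) amounts to one extra condition in the ISR's "is this mine?" test. The sketch below is hypothetical: `DioRegs` collapses the registers into two booleans and every helper name is invented; only ISRQST1 is taken from the slides.

```c
/* Hypothetical model of the ProperISR2 scenario; only the register name
 * ISRQST1 comes from the slides, the rest is illustrative. */
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool isrqst1;     /* interrupt-pending status (default unspecified in the manual) */
    bool intEnabled;  /* interrupt-enable state of the device */
} DioRegs;

/* Original test: only ISRQST1 decides whether the interrupt is ours. */
static bool IsOurInterruptOriginal(const DioRegs *r) {
    return r->isrqst1;
}

/* Safer test: the interrupt must also be enabled on this device. */
static bool IsOurInterruptChecked(const DioRegs *r) {
    return r->isrqst1 && r->intEnabled;
}

/* Slide scenario: ISRQST1 powers on as 1, the driver never reset the device,
 * interrupts are still disabled, and another PCI device's interrupt invokes
 * our ISR. Returns what the ISR would return (true = claims the interrupt). */
static bool IsrClaimsForeignInterrupt(bool alsoCheckEnable) {
    DioRegs regs = { .isrqst1 = true, .intEnabled = false };
    assert(!regs.intEnabled);  /* this device cannot be the source */
    return alsoCheckEnable ? IsOurInterruptChecked(&regs)
                           : IsOurInterruptOriginal(&regs);
}
```

In the power-on scenario described above, `IsrClaimsForeignInterrupt(false)` returns true (the original check claims another device's interrupt) and `IsrClaimsForeignInterrupt(true)` returns false.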


Agenda

• Overview
• Sealevel PIO-24 Digital I/O Card
• Intel 100M PCI Ethernet adapter


Spec Issues – Some Examples

• Issue 1 (Inconsistency):
  – Location: Pages 38-39
  – Problem:
    • Table 15 is inconsistent with Table 14 regarding the types of CU commands.

• Issue 2 (Inconsistency):
  – Location: Page 136, Section: 8.2 Transmit Processing
  – Problem (annotation on the quoted manual text): the word “previous” is missing here; otherwise the logic is wrong


Verification Statistics


Bug1: DoubleCUC

• Rule: the driver should not issue a CU command while the Command Unit (CU) is busy (the SCB command word is not zero)
  – This is clearly stated in the device manual (specification)
  – Spec page 37, Section: 6.3.2.2 SCB Command Word

• About the bug:
  – The driver issues a command to the CU regardless of the result of the previous device operation (software reset)


Error trace (annotations from the trace view):

• In the EvtDeviceD0Entry callback function
  – HW states are nondeterministically initialized
  – Issue a software reset
• Inside the function HwSoftwareReset (in software)
  – Write to the “PORT” register to issue the command
• In the hardware model: start the software reset process
• Back in software: wait for the port reset to complete
  – Note: the device manual does not specify how long a port reset takes to complete, so a fixed wait is NOT enough!
• Out of the software reset:
  – Issue a command without waiting for the CU to become free, i.e., while the reset process is not finished
• The SLIC rule


About this bug

• I was not sure at first
• So I checked the Linux driver for the same device
  – http://sourceforge.net/projects/e1000/
  – The driver can also be found in (for example):
    • linux-2.6.32/drivers/net/e100.c

Let’s see how the Linux driver handles “software reset”.

It makes sure that the device is in the correct state before issuing any command.

The Linux driver always waits before issuing a new command.
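The "wait until the CU is free, and give up with an error if it never is" pattern can be sketched as bounded polling. This is a hypothetical model, not the Linux e100 code or the 8255x programming sequence: `Nic`, `AfterSoftwareReset`, `ReadScbCmd`, `WriteScbCmd`, `IssueCommandWithWait`, the polling budget `SCB_WAIT_POLLS`, and the convention that a non-zero command byte means "busy" are all assumptions, and `commandsIssuedWhileBusy` plays the role of the DoubleCUC rule check. The bound matters because, as noted in the trace, the manual gives no completion time for a port reset.

```c
/* Hypothetical model of "wait for the CU before issuing a command";
 * NOT the Linux e100 code or the 8255x manual's exact sequence. */
#include <assert.h>
#include <stdbool.h>

#define SCB_WAIT_POLLS 100  /* assumed polling budget */

typedef struct {
    unsigned char scbCmd;         /* non-zero = CU still busy (model assumption) */
    int busyPolls;                /* reads until the device clears scbCmd; 0 = never */
    int commandsIssuedWhileBusy;  /* DoubleCUC rule check */
} Nic;

/* State right after a software reset: the CU stays busy for a while. */
static Nic AfterSoftwareReset(int busyPolls) {
    Nic n = { .scbCmd = 0x01, .busyPolls = busyPolls, .commandsIssuedWhileBusy = 0 };
    return n;
}

/* Reading the command byte; the device clears it after busyPolls reads. */
static unsigned char ReadScbCmd(Nic *n) {
    if (n->busyPolls > 0 && --n->busyPolls == 0) {
        n->scbCmd = 0;
    }
    return n->scbCmd;
}

/* Writing a command; records a violation if the CU was still busy. */
static void WriteScbCmd(Nic *n, unsigned char cmd) {
    if (n->scbCmd != 0) {
        n->commandsIssuedWhileBusy++;
    }
    n->scbCmd = cmd;
}

/* Bounded wait: issue only once the CU is free, otherwise report failure. */
static bool IssueCommandWithWait(Nic *n, unsigned char cmd) {
    assert(cmd != 0);  /* 0 means "no command" in this model */
    for (int i = 0; i < SCB_WAIT_POLLS; i++) {
        if (ReadScbCmd(n) == 0) {
            WriteScbCmd(n, cmd);
            return true;
        }
    }
    return false;  /* CU never became free: do not issue, report an error */
}
```

Writing a command right after `AfterSoftwareReset(5)` trips the check once (the behavior seen in the trace); `IssueCommandWithWait` on the same state polls until the byte clears and issues cleanly, and on a device that never frees the CU it returns false instead of issuing.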


Bug2: DevD0Entry

• Rule: the callback function EvtDeviceD0Entry returns a status value (success or failure) that correctly represents the hardware state (initialized or failed)
  – This rule is clearly stated in MSDN

• About the bug:
  – The driver returns STATUS_SUCCESS even if operations on the device have failed
  – The error trace illustrates that the driver keeps attempting to initialize the device even after previous operations have failed


Error trace (annotations from the trace view):

• In the EvtDeviceD0Entry callback function
• In NICInitializeAdapter:
  – The command times out. The failure starts here
  – Return the failure status
• Do some work and return the failure status returned from NICInitializeAdapter()
• The return value of MPSetPowerD0Private() is ignored. The initialization process goes on as if nothing happened
• EvtDeviceD0Entry returns STATUS_SUCCESS while the device state is a mess
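The difference between ignoring and propagating the failure can be sketched in C. `NTSTATUS`, `STATUS_SUCCESS`, `STATUS_IO_TIMEOUT`, and `NT_SUCCESS` are defined locally to mirror their WDK definitions so the snippet compiles outside the WDK; `NICInitializeAdapter` and `MPSetPowerD0Private` reuse names from the trace but have stand-in bodies, and the two `D0Entry...` functions are hypothetical, not the pcidrv sample's code.

```c
/* Hypothetical sketch of status propagation in a D0-entry path;
 * NOT the pcidrv sample code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring the WDK definitions. */
typedef int32_t NTSTATUS;
#define STATUS_SUCCESS    ((NTSTATUS)0x00000000L)
#define STATUS_IO_TIMEOUT ((NTSTATUS)0xC00000B5L)
#define NT_SUCCESS(s)     (((NTSTATUS)(s)) >= 0)

/* Stand-in: the adapter command either completes or times out. */
static NTSTATUS NICInitializeAdapter(bool commandTimesOut) {
    return commandTimesOut ? STATUS_IO_TIMEOUT : STATUS_SUCCESS;
}

/* Stand-in: does some work and passes the failure back to its caller. */
static NTSTATUS MPSetPowerD0Private(bool commandTimesOut) {
    NTSTATUS status = NICInitializeAdapter(commandTimesOut);
    /* ... other work ... */
    return status;
}

/* Pattern from the trace: the result of MPSetPowerD0Private() is ignored. */
static NTSTATUS D0EntryIgnoringStatus(bool commandTimesOut) {
    (void)MPSetPowerD0Private(commandTimesOut);
    /* ... initialization continues as if nothing happened ... */
    return STATUS_SUCCESS;
}

/* Propagating pattern: stop and report the failure to the caller. */
static NTSTATUS D0EntryPropagating(bool commandTimesOut) {
    NTSTATUS status = MPSetPowerD0Private(commandTimesOut);
    if (!NT_SUCCESS(status)) {
        return status;
    }
    /* ... remaining initialization ... */
    return STATUS_SUCCESS;
}
```

With a timed-out command, `D0EntryIgnoringStatus(true)` still reports `STATUS_SUCCESS`, whereas `D0EntryPropagating(true)` returns `STATUS_IO_TIMEOUT`, so the failure becomes visible to the framework.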


• Once again, how about the Linux driver?


Corresponding to EvtDeviceD0Entry


Bug3: DevD0Exit

• Rule: the callback EvtDeviceD0Exit returns a status value (success or failure) that correctly represents the hardware state (i.e., whether the hardware has been properly stopped)
  – This rule is clearly stated in MSDN

• About the bug:
  – The driver returns STATUS_SUCCESS even if operations on the device have failed.


Reference

• Juncao Li, Fei Xie, Thomas Ball, Vladimir Levin, and Con McGarvey. An Automata-Theoretic Approach to Hardware/Software Co-verification. To appear in Proc. of the International Conference on Fundamental Approaches to Software Engineering (FASE)
  – Link: http://web.cecs.pdx.edu/~juncao/links/mypapers/cover2010.pdf


Questions?
