Chapter 4 Interrupts and Exceptions



Introduction

An interrupt is an event that alters the sequence of instructions executed by a processor. Such events correspond to electrical signals generated by HW circuits both inside and outside the CPU.

Interrupts: asynchronous interrupts
Generated by HW devices (e.g., interval timers and I/O devices) at arbitrary times

Exceptions: synchronous interrupts
Produced by the CPU control unit, and issued only after the execution of an instruction terminates
E.g., divide-by-0, page faults


Role of Interrupt/Exception Signals

When an interrupt/exception signal occurs, the CPU:
Saves the current process status (eip and cs) on the Kernel Mode stack
Places the address of the interrupt handler (IH) into the program counter

The code executed by an IH is not a process. It is a kernel control path that runs on behalf of the same process that was running when the interrupt occurred.


Interrupt/Exception Handler Requirements

As short as possible
Defer as much processing as possible. E.g., when a block of data arrives on a network line, the handler does only the urgent work and leaves the rest for later: top half vs. bottom half.

Nested interrupt handling
Should be allowed as much as possible, to keep I/O devices busy
Interrupt handlers in Linux need not be reentrant: while an IH is executing, the corresponding interrupt line is masked out on all processors, so the same IH is never invoked concurrently to service a nested interrupt

Maskable interrupts
Some critical regions must run with interrupts disabled, but such regions should be limited as much as possible


Interrupts and Exceptions


Interrupts Definition

Maskable interrupts: all IRQs issued by I/O devices. Each can be in one of 2 states: masked or unmasked.

Nonmaskable interrupts: critical events such as HW failures. They are always recognized by the CPU.


Exceptions Definition

Processor-detected exceptions: generated when the CPU detects an anomalous condition while executing an instruction
Faults: the saved eip is the address of the instruction that caused the fault, so that instruction is re-executed after the handler returns. Usage: e.g., the page fault handler
Traps: the saved eip is the address of the instruction after the one that caused the trap. Main usage: debugging (e.g., reaching a breakpoint)
Aborts: a serious error; the CPU may be unable to determine the exact instruction that caused it, so the affected process is terminated

Programmed exceptions: occur at the request of the programmer
Triggered by the int, int3, into, and bound instructions
Handled by the control unit as traps; often called SW interrupts
Usage: to implement system calls and to notify a debugger of a specific event
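The fault/trap distinction comes down to which eip the CPU saves. The following C sketch models that rule. The enum exc_class, the function saved_eip(), and the convention of returning 0 for aborts are hypothetical illustrations, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of processor-detected exceptions. */
enum exc_class { EXC_FAULT, EXC_TRAP, EXC_ABORT };

/*
 * Model of the eip value saved on the Kernel Mode stack.
 * insn_addr: address of the instruction that raised the exception
 * insn_len:  length of that instruction in bytes
 */
uint32_t saved_eip(enum exc_class cls, uint32_t insn_addr, uint32_t insn_len)
{
    switch (cls) {
    case EXC_FAULT:
        return insn_addr;            /* re-execute the same instruction */
    case EXC_TRAP:
        return insn_addr + insn_len; /* resume at the next instruction */
    default:
        return 0;                    /* abort: no reliable return address */
    }
}
```

For example, a page fault raised by an instruction at 0x1000 saves 0x1000. A one-byte int3 breakpoint at 0x1000 saves 0x1001.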


Interrupt or Exception Vector

Each interrupt or exception is identified by a number from 0 to 255. Such a number is called its vector.

The vectors of nonmaskable interrupts and exceptions are fixed.

The vectors of maskable interrupts can be altered by programming the Interrupt Controller.


IRQs

Each HW device controller capable of issuing interrupts has an output line called an IRQ (Interrupt ReQuest) line. All existing IRQ lines are connected to the input pins of the Interrupt Controller (IC), which does the following:

Monitors the IRQ lines, checking for raised signals. If a raised signal is detected on an IRQ line, it:
1. Converts the signal into a corresponding vector
2. Stores the vector in an IC I/O port, so the CPU can read it
3. Sends a signal to the CPU's INTR pin (i.e., issues an interrupt)
4. Waits until the CPU acknowledges the interrupt by writing into one of the Programmable Interrupt Controller (PIC) I/O ports
5. Clears the INTR line

Goes back to monitoring


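[Figure: I/O Interrupt Handling. HARDWARE: Device 1 and Device 2 on line IRQn feed the PIC, which raises INT; the CPU dispatches through IDT[32+n]. SOFTWARE (Interrupt Handler): IRQn_interrupt() calls do_IRQ(n), which runs Interrupt service routine 1 and Interrupt service routine 2.]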


IRQ Lines

The first IRQ line is IRQ0.

The number of available IRQ lines is limited to 15 for now. The Intel default vector for IRQn is n + 32.

The mapping between IRQs and vectors can be modified by issuing suitable I/O instructions to the IC ports.

The PIC can be told to stop issuing interrupts that refer to a given IRQ line. Disabled interrupts are not lost but delayed: they are issued once the line is re-enabled.

Selectively enabling/disabling IRQs is not the same as globally masking/unmasking interrupts. When the IF flag of the eflags register is clear, maskable interrupts are temporarily ignored by the CPU.
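The two independent gates described above, per-line disabling in the PIC and the global IF flag, can be modeled in a few lines of C. This is a sketch. irq_to_vector() uses the default n + 32 mapping, FIRST_EXTERNAL_VECTOR reuses the kernel constant's name, and EFLAGS_IF and irq_taken_now() are hypothetical names. IF is bit 9 of eflags.

```c
#include <assert.h>

#define FIRST_EXTERNAL_VECTOR 32u       /* default: IRQn -> vector n + 32 */
#define EFLAGS_IF             (1u << 9) /* IF is bit 9 of eflags */

/* Vector used for IRQ line n under the default mapping. */
unsigned irq_to_vector(unsigned irq)
{
    return irq + FIRST_EXTERNAL_VECTOR;
}

/*
 * Would the CPU take an interrupt on this IRQ line right now?
 * pic_disabled: bit n set if the PIC has disabled IRQ line n
 *               (the request is delayed, not lost)
 * eflags:       current eflags; with IF clear, all maskable
 *               interrupts are temporarily ignored
 */
int irq_taken_now(unsigned pic_disabled, unsigned irq, unsigned eflags)
{
    assert(irq < 16);
    if (pic_disabled & (1u << irq))
        return 0;
    return (eflags & EFLAGS_IF) != 0;
}
```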


Homework Practice

How do you find out the IRQ assignment on your Linux PC? Ans: read /proc/interrupts (e.g., cat /proc/interrupts)
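The same information can also be read programmatically. This is a minimal sketch that assumes the usual /proc/interrupts layout: a CPU header line, then numeric rows such as "  0:   42   IO-APIC   2-edge   timer", plus named rows like NMI or LOC. The exact columns vary by kernel version. parse_irq_line() and dump_proc_interrupts() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one line of /proc/interrupts. Returns 1 and fills irq and the
   CPU0 count for numeric IRQ rows; returns 0 for the header line and
   for named rows (NMI, LOC, ...). */
int parse_irq_line(const char *line, int *irq, unsigned long *cpu0_count)
{
    return sscanf(line, "%d: %lu", irq, cpu0_count) == 2;
}

/* Print the numeric rows; returns -1 if the file cannot be opened. */
int dump_proc_interrupts(void)
{
    char buf[512];
    FILE *f = fopen("/proc/interrupts", "r");
    if (!f)
        return -1;
    while (fgets(buf, sizeof buf, f)) {
        int irq;
        unsigned long n;
        if (parse_irq_line(buf, &irq, &n))
            printf("IRQ %d: %lu interrupts on CPU0\n", irq, n);
    }
    fclose(f);
    return 0;
}
```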


Exceptions

80x86 processors issue about 20 different exceptions. Each exception type is associated with a dedicated exception handler.

For some exceptions, the CPU also generates a HW error code and pushes it on the Kernel Mode stack before jumping to the exception handler.

An exception handler usually sends a Unix signal to the process that caused the exception.

Exceptions 20-31 are reserved by Intel.


Interrupt Vectors

IRQ vector assignment: vectors 32-238 are available for IRQs, except 128, which is reserved for the system call exception.

Vector range            Use
0-19 (0x00-0x13)        Nonmaskable interrupts and exceptions
20-31 (0x14-0x1f)       Intel-reserved
32-127 (0x20-0x7f)      External interrupts (IRQs)
128 (0x80)              System call exception
129-238 (0x81-0xee)     External interrupts (IRQs)
239 (0xef)              Local APIC timer interrupt
240-250 (0xf0-0xfa)     Reserved by Linux for future use
251-255 (0xfb-0xff)     Interprocessor interrupts
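The vector table above can be encoded as a lookup function. This is a sketch that follows the table row by row. The enum vector_use and classify_vector() are hypothetical names.

```c
#include <assert.h>

enum vector_use {
    V_EXCEPTION,      /* nonmaskable interrupts and exceptions */
    V_INTEL_RESERVED,
    V_IRQ,            /* external interrupts */
    V_SYSCALL,
    V_APIC_TIMER,
    V_LINUX_RESERVED,
    V_IPI,            /* interprocessor interrupts */
    V_INVALID
};

enum vector_use classify_vector(unsigned v)
{
    if (v <= 19)  return V_EXCEPTION;       /* 0x00-0x13 */
    if (v <= 31)  return V_INTEL_RESERVED;  /* 0x14-0x1f */
    if (v == 128) return V_SYSCALL;         /* 0x80: int $0x80 */
    if (v <= 238) return V_IRQ;             /* 0x20-0x7f, 0x81-0xee */
    if (v == 239) return V_APIC_TIMER;      /* 0xef */
    if (v <= 250) return V_LINUX_RESERVED;  /* 0xf0-0xfa */
    if (v <= 255) return V_IPI;             /* 0xfb-0xff */
    return V_INVALID;                       /* outside the 256-entry IDT */
}
```

Vector 128 is tested before the IRQ range, because it falls inside 32-238.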


#   Exception                     Handler                         Signal
0   Divide error                  divide_error()                  SIGFPE
1   Debug                         debug()                         SIGTRAP
2   NMI                           nmi()                           None
3   Breakpoint                    int3()                          SIGTRAP
4   Overflow                      overflow()                      SIGSEGV
5   Bounds check                  bounds()                        SIGSEGV
6   Invalid opcode                invalid_op()                    SIGILL
7   Device not available          device_not_available()          SIGSEGV
8   Double fault                  double_fault()                  SIGSEGV
9   Coprocessor segment overrun   coprocessor_segment_overrun()   SIGFPE


10  Invalid TSS                   invalid_tss()                   SIGSEGV
11  Segment not present           segment_not_present()           SIGBUS
12  Stack exception               stack_segment()                 SIGBUS
13  General protection            general_protection()            SIGSEGV
14  Page Fault                    page_fault()                    SIGSEGV
15  Intel reserved                None                            None
16  Floating-point error          coprocessor_error()             SIGFPE
17  Alignment check               alignment_check()               SIGBUS
18  Machine check                 machine_check()                 None
19  SIMD floating point           simd_coprocessor_error()        SIGFPE
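The vector-to-signal column of the table can be captured as a lookup array. This is a sketch that copies the table above, with 0 meaning "no signal". exception_signal() is a hypothetical helper. Vector 14 only ends in SIGSEGV when the page fault cannot be resolved; an ordinary demand-paging fault sends no signal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>

/* Signal for exception vectors 0-19, per the table above (0 = none). */
static const int exception_signal_table[20] = {
    [0]  = SIGFPE,  [1]  = SIGTRAP, [2]  = 0,       [3]  = SIGTRAP,
    [4]  = SIGSEGV, [5]  = SIGSEGV, [6]  = SIGILL,  [7]  = SIGSEGV,
    [8]  = SIGSEGV, [9]  = SIGFPE,  [10] = SIGSEGV, [11] = SIGBUS,
    [12] = SIGBUS,  [13] = SIGSEGV, [14] = SIGSEGV, [15] = 0,
    [16] = SIGFPE,  [17] = SIGBUS,  [18] = 0,       [19] = SIGFPE,
};

int exception_signal(unsigned vector)
{
    return vector < 20 ? exception_signal_table[vector] : 0;
}
```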


Review Slide

Interrupts? Exceptions?
Interrupt handler? Requirements?
Maskable vs. nonmaskable interrupts?
Processor-detected exceptions? Faults, traps, aborts?
Programmed exceptions? SW interrupts?
Interrupt vector? Range? Vector assignment?
Interrupt controller processing steps?


Review Slide

Intel default vector for IRQn? Disabled interrupts? Masked interrupts? Number of exceptions defined by Intel?

Homework #3: User-mode vs. kernel-mode stack. Required for new EOS students; optional for others. Not graded. 忠毅: please present your report next week.


Interrupt Descriptor Table


Interrupt Descriptor Table

The IDT associates each interrupt or exception vector with one interrupt handler. The IDT must be properly initialized before the kernel enables interrupts.

Each IDT entry is an 8-byte descriptor, so at most 256 x 8 = 2048 bytes are required to store the IDT. The idtr register stores the base address (and the limit) of the IDT. The P (present) bit indicates whether the segment is currently in memory.

Bits 40-43 of a descriptor encode its type. There are 3 types of descriptors in the IDT:
Task Gate: Linux uses it only for the double fault exception (vector 8)
Interrupt Gate: before jumping to the handler's segment, the CPU clears the IF flag, disabling maskable interrupts
Trap Gate: before jumping to the handler's segment, the CPU does not modify the IF flag


[Figure: IDT gate descriptor formats (64 bits each); bit 47 = P, bits 46-45 = DPL]

Task Gate Descriptor: bits 0-15 reserved; bits 16-31 TSS segment selector; bits 32-39 reserved; bits 44-40 = 00101; bits 46-45 DPL; bit 47 P; bits 48-63 reserved

Interrupt Gate Descriptor: bits 0-15 OFFSET(0-15); bits 16-31 segment selector; bits 32-36 reserved; bits 39-37 = 000; bits 44-40 = 01110; bits 46-45 DPL; bit 47 P; bits 48-63 OFFSET(16-31)

Trap Gate Descriptor: same as the interrupt gate, except bits 44-40 = 01111
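The gate layouts above can be packed into a 64-bit value. This is a sketch following those layouts. make_gate() and enum gate_type are hypothetical names; the kernel builds descriptors with its own macros. For a task gate, pass offset 0 and the TSS selector.

```c
#include <assert.h>
#include <stdint.h>

/* Values of bits 40-43 (bit 44 is always 0 for gates). */
enum gate_type { GATE_TASK = 0x5, GATE_INTERRUPT = 0xE, GATE_TRAP = 0xF };

uint64_t make_gate(uint32_t offset, uint16_t selector,
                   enum gate_type type, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);     /* bits 0-15: offset 0-15   */
    d |= (uint64_t)selector << 16;         /* bits 16-31: selector     */
    d |= (uint64_t)(type & 0xFu) << 40;    /* bits 40-43: gate type    */
    d |= (uint64_t)(dpl & 0x3u) << 45;     /* bits 45-46: DPL          */
    d |= (uint64_t)1 << 47;                /* bit 47: P (present)      */
    d |= (uint64_t)(offset >> 16) << 48;   /* bits 48-63: offset 16-31 */
    return d;
}
```

The byte at bits 40-47 comes out as 0x8E for an interrupt gate with DPL 0, and as 0xEF for a trap gate with DPL 3. Linux calls the latter a system gate.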


HW Handling of Interrupts (Exceptions)

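Between instructions, the control unit checks whether an interrupt or exception has occurred. If so, it:

1. Determines the vector i (0 <= i <= 255) associated with the interrupt or exception
2. Reads the i-th entry of the IDT
3. Obtains the IH address: the entry's segment selector indexes the GDT (located through gdtr), giving the segment base address, to which the entry's offset is added
4. Checks the privilege level by comparing the CPL (low bits of cs) with the DPL of the IH's segment
5. Uses the right stack: if the privilege level changes, it switches to the stack of the new level and saves the old ss and esp on it
6. If a fault has occurred, loads cs and eip with the address of the instruction that caused the fault, so it can be re-executed
7. Saves the contents of eflags, cs, and eip on the stack (plus a HW error code, if any)
8. Loads cs and eip with the address of the IH routine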
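Steps 5 and 7 determine what the handler finds on its stack. The following C sketch models the push order. struct hw_frame and hw_push_frame() are hypothetical. "Higher privilege" means a numerically smaller level, so a switch happens when the handler's DPL is lower than the interrupted CPL.

```c
#include <assert.h>
#include <stdint.h>

/* Words the control unit pushes on the handler's stack, in push order. */
struct hw_frame {
    uint32_t word[6];
    int count;
};

/*
 * cpl:         privilege level of the interrupted code (low 2 bits of cs)
 * handler_dpl: DPL of the handler's code segment
 * has_error:   nonzero for exceptions that push a HW error code
 */
void hw_push_frame(struct hw_frame *f, unsigned cpl, unsigned handler_dpl,
                   uint32_t ss, uint32_t esp, uint32_t eflags,
                   uint32_t cs, uint32_t eip, int has_error, uint32_t error)
{
    f->count = 0;
    if (handler_dpl < cpl) {          /* privilege change: new stack */
        f->word[f->count++] = ss;     /* old ss and esp go first */
        f->word[f->count++] = esp;
    }
    f->word[f->count++] = eflags;     /* step 7 */
    f->word[f->count++] = cs;
    f->word[f->count++] = eip;
    if (has_error)
        f->word[f->count++] = error;  /* must be popped before iret */
}
```

An interrupt taken in User Mode therefore leaves 5 or 6 words on the Kernel Mode stack. One taken while already in Kernel Mode leaves only 3 or 4.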


Interrupt Handler Return Path (iret)

1. Load the cs, eip, and eflags registers with the values saved on the stack. If a HW error code has been pushed on the stack on top of eip, it must be popped before executing iret.

2. Check whether the CPL of the handler's cs equals the CPL of the restored cs. If so, iret is done.

3. Otherwise, load ss and esp from the stack, returning to the stack associated with the old privilege level.

4. Handle the return-to-User-Mode case: examine ds, es, fs, and gs, and clear any that refer to segments more privileged than the restored CPL, so the process cannot use kernel segment selectors.


Nested Execution of IHs

Linux does not allow process switching during an interrupt handler. However, an interrupt handler may be interrupted by another one. The current process does not change during nested IHs.

The only exception the kernel itself may raise is the Page Fault exception. The other exceptions should be raised only in User Mode; if one is raised in Kernel Mode, it causes a kernel panic.

Page fault handlers may suspend the current process until the requested page is in memory, so a context switch is possible inside this handler.

Interrupts raised by I/O devices do not refer to data structures specific to the current process.


Nested Execution of IHs

Interrupt handlers must never cause a page fault. Hence no exception handler may preempt an interrupt handler, and no context switch takes place inside an interrupt handler.

Nested execution of IHs serves two purposes:
To improve the throughput of the PIC and device controllers: until the CPU acknowledges an interrupt, both the PIC and the device controller are blocked
To implement an interrupt model without priority levels: any interrupt handler can be preempted by another one


IDT Initialization

The base address of the IDT must be loaded into idtr before the kernel enables interrupts (arch/i386/kernel/head.S):

    lidt idt_descr
idt_descr:
    .word IDT_ENTRIES*8-1     # idt contains 256 entries
    .long idt_table

The int instruction allows a User Mode process to issue an interrupt signal with any vector between 0 and 255. To block illegal int instructions from a User Mode process, set the DPL of the gate descriptor to 0. When a User Mode process then executes int, its CPL (3) > DPL (0), so the CPU raises a "general protection" exception.

In a few cases, a User Mode process must be able to issue a programmed exception. For those vectors, set the DPL of the gate descriptor to 3.
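The idtr limit and the DPL check on int can be made concrete. This is a sketch. IDT_LIMIT mirrors the .word expression above. int_outcome() is a hypothetical model of the check the CPU performs for an int instruction, and TRAP_GP is a hypothetical constant naming the general protection vector (13).

```c
#include <assert.h>

#define IDT_ENTRIES 256
#define IDT_LIMIT   (IDT_ENTRIES * 8 - 1)  /* idtr limit: size in bytes - 1 */

enum { TRAP_GP = 13 };  /* "general protection" exception vector */

/* Vector the CPU actually raises when code running at `cpl` executes
   `int n` through a gate whose DPL is `gate_dpl`. */
unsigned int_outcome(unsigned n, unsigned cpl, unsigned gate_dpl)
{
    return cpl > gate_dpl ? TRAP_GP : n;
}
```

For example, a User Mode int $0x80 goes through a DPL 3 gate and reaches vector 128. A User Mode int $14 hits a DPL 0 gate and becomes a general protection fault.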


Interrupt, Trap, and System Gates

The Intel IDT provides 3 types of interrupt descriptors: task, interrupt, and trap gate descriptors. Linux classifies them as follows:

Interrupt gate (DPL = 0): an Intel interrupt gate that cannot be accessed by a User Mode process. All Linux interrupt handlers use this type.

System gate (DPL = 3): an Intel trap gate that can be accessed by a User Mode process. Used for vectors 3 (int3), 4 (into), 5 (bound), and 128 (int $0x80).

Trap gate (DPL = 0): an Intel trap gate that cannot be accessed by a User Mode process. Most Linux exception handlers use this type.


IDT Operations

set_intr_gate(n, addr): inserts an interrupt gate in the n-th IDT entry. Segment selector = kernel code segment selector; offset = addr; DPL = 0

set_system_gate(n, addr): inserts a trap gate in the n-th IDT entry. Segment selector = kernel code segment selector; offset = addr; DPL = 3

set_trap_gate(n, addr): inserts a trap gate in the n-th IDT entry. Segment selector = kernel code segment selector; offset = addr; DPL = 0

Code trace: trap_init()
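The three helpers differ only in gate type and DPL. The following user-space model borrows the kernel's function names to show this, but it is not the kernel implementation. struct gate is a hypothetical decoded view of an entry, not the packed 8-byte format. KERNEL_CS is an assumed selector value.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_CS 0x60u  /* assumed kernel code segment selector */
#define TYPE_INTR 0xEu   /* Intel interrupt gate */
#define TYPE_TRAP 0xFu   /* Intel trap gate */

/* Decoded view of one IDT entry (not the packed hardware layout). */
struct gate {
    uint32_t offset;
    uint16_t selector;
    uint8_t  type;
    uint8_t  dpl;
    uint8_t  present;
};

static struct gate idt[256];

static void set_gate(unsigned n, uint8_t type, uint8_t dpl, uint32_t addr)
{
    assert(n < 256);
    idt[n].offset   = addr;
    idt[n].selector = KERNEL_CS;
    idt[n].type     = type;
    idt[n].dpl      = dpl;
    idt[n].present  = 1;
}

void set_intr_gate(unsigned n, uint32_t addr)   { set_gate(n, TYPE_INTR, 0, addr); }
void set_system_gate(unsigned n, uint32_t addr) { set_gate(n, TYPE_TRAP, 3, addr); }
void set_trap_gate(unsigned n, uint32_t addr)   { set_gate(n, TYPE_TRAP, 0, addr); }
```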


IDT Preliminary Initialization

The IDT is first initialized and used by the BIOS. Once Linux takes over (in protected mode), it initializes the IDT again. idt_table has 256 entries.

During kernel initialization, setup_idt() (arch/i386/kernel/head.S) fills all entries in idt_table with ignore_int(), which:
saves registers on the stack
calls printk() to report the unexpected interrupt
restores registers from the stack
executes iret to resume

In a second initialization pass, trap_init() replaces some entries with the real interrupt and exception handlers.
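The two-pass initialization can be sketched in user space with a table of function pointers. This model reuses the slide's names (setup_idt, trap_init, ignore_int, divide_error), but every body, as well as raise_vector() and the counters, is hypothetical. The real ignore_int() is assembly that prints a message and returns with iret.

```c
#include <assert.h>

typedef void (*handler_t)(unsigned vector);

static handler_t idt_table[256];
static unsigned unknown_count;  /* vectors that reached ignore_int */
static unsigned divide_count;   /* vectors that reached divide_error */

/* Stand-ins for the real handlers: they only count invocations. */
static void ignore_int(unsigned vector)   { (void)vector; unknown_count++; }
static void divide_error(unsigned vector) { (void)vector; divide_count++; }

/* First pass: every vector points at the catch-all handler. */
void setup_idt(void)
{
    for (unsigned i = 0; i < 256; i++)
        idt_table[i] = ignore_int;
}

/* Second pass: real handlers replace some entries. */
void trap_init(void)
{
    idt_table[0] = divide_error;
}

/* Dispatch a vector through the table, as the CPU would. */
void raise_vector(unsigned vector)
{
    assert(vector < 256);
    idt_table[vector](vector);
}
```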


Review Slide

IDT? Number of entries in the IDT? Size of each entry? Base address of the IDT?
Types of descriptors in the IDT? The only exception the kernel may raise?
How to block illegal interrupts from a user-mode process? How to enable a user-mode process to issue a programmed exception?
Linux interrupt descriptor classification? Interrupt gate, system gate, trap gate?


Review Slide

set_intr_gate(), set_system_gate(), set_trap_gate()?


Exception Handling


Introduction

Linux interprets most exceptions issued by the CPU as error conditions. A signal is sent to the current process; if no signal handler is set for that signal, the process is aborted. Special case: the page fault exception.

An exception handler performs these steps:
Save registers on the Kernel Mode stack
Call a high-level C function to handle the exception
Exit from the handler by jumping to ret_from_exception

Code trace: page_fault exception (arch/i386/kernel/entry.S, arch/i386/kernel/traps.c)


Exception Handler Registration

void __init trap_init(void)
{
    …
    set_trap_gate(0, &divide_error);
    set_intr_gate(1, &debug);
    set_intr_gate(2, &nmi);
    set_system_gate(3, &int3);      /* int3-5 can be called from all */
    set_system_gate(4, &overflow);
    set_system_gate(5, &bounds);
    set_trap_gate(6, &invalid_op);
    set_trap_gate(7, &device_not_available);
    set_task_gate(8, GDT_ENTRY_DOUBLEFAULT_TSS);
    set_trap_gate(9, &coprocessor_segment_overrun);
    set_trap_gate(10, &invalid_TSS);
    set_trap_gate(11, &segment_not_present);
    set_trap_gate(12, &stack_segment);
    set_trap_gate(13, &general_protection);
    set_intr_gate(14, &page_fault);
    set_trap_gate(15, &spurious_interrupt_bug);
    set_trap_gate(16, &coprocessor_error);
    set_trap_gate(17, &alignment_check);
    /* vector 18 (machine check) is not registered in this excerpt */
    set_trap_gate(19, &simd_coprocessor_error);

    set_system_gate(SYSCALL_VECTOR, &system_call);

    set_call_gate(&default_ldt[0], lcall7);
    set_call_gate(&default_ldt[4], lcall27);

    cpu_init();
    trap_init_hook();
}


Entering/Leaving Exception Handler

A high-level C handler usually stores the hardware error code and the vector in the process descriptor, then sends a suitable signal to the current process:

    current->tss.error_code = error_code;
    current->tss.trap_no = vector;
    force_sig(sig_num, current);

Code trace: do_general_protection()

The current process takes care of the signal right after the exception handler terminates. The signal is processed by the process's own signal handler, if one is installed; otherwise the kernel handles it and kills the process.

When the C handler returns, the assembly entry code executes:

    addl $8, %esp              # discard the two arguments (error code, pt_regs pointer)
    jmp ret_from_exception
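The bookkeeping done by the C handler can be modeled with a POSIX signal set standing in for the process's pending signals. This is a sketch. struct task_model and trap_to_signal() are hypothetical. The real force_sig() also ensures that the signal cannot stay blocked or ignored, which this model does not attempt.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for the per-process fields the slide writes. */
struct task_model {
    unsigned long error_code;  /* like current->tss.error_code */
    unsigned long trap_no;     /* like current->tss.trap_no */
    sigset_t pending;          /* signals waiting for delivery */
};

/* Record the trap, then queue the signal for the process. */
void trap_to_signal(struct task_model *t, unsigned long vector,
                    unsigned long error_code, int sig)
{
    t->error_code = error_code;
    t->trap_no = vector;
    sigaddset(&t->pending, sig);
}
```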


Interrupt Handling


Introduction

No signal is sent to the process for interrupts. A signal is sent to the process only for exceptions.

The interrupt handler for a device is part of the device's driver.

Interrupt types:
I/O interrupts: to handle I/O devices
Timer interrupts: see Chapter 6 (self-reading material)
Interprocessor interrupts: to interrupt another CPU in a multiprocessor (MP) system


I/O Interrupt Handling

An I/O IH should be capable of servicing several devices at the same time, because several devices may share the same IRQ. Refer to Table 4.3.

IRQ sharing: one interrupt handler executes several ISRs. Each ISR is related to a single device sharing the IRQ line. Because the kernel cannot tell in advance which device raised the interrupt, every ISR is executed when an interrupt occurs, and each one checks whether its own device needs attention.

IRQ dynamic allocation: an IRQ line is associated with a device only when the device is accessed (e.g., the floppy disk device). The same IRQ vector may thus be used by several devices, but not at the same time.
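IRQ sharing can be sketched as a linked chain of ISRs that are all called for one interrupt. This user-space model is simplified. struct device_model, demo_isr(), and run_shared_isrs() are hypothetical. The irqaction fields mirror the kernel structure's handler, dev_id, and next, but the pt_regs argument of the 2.6 handler signature is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical device: `raised` says whether it asserted the line. */
struct device_model { int raised; int serviced; };

struct irqaction {
    irqreturn_t (*handler)(int irq, void *dev_id);
    void *dev_id;             /* cookie identifying the device */
    struct irqaction *next;   /* next ISR sharing this line */
};

/* A typical shared-line ISR: first check whether our device fired. */
irqreturn_t demo_isr(int irq, void *dev_id)
{
    struct device_model *dev = dev_id;
    (void)irq;
    if (!dev->raised)
        return IRQ_NONE;      /* not ours */
    dev->raised = 0;
    dev->serviced++;
    return IRQ_HANDLED;
}

/* Run every ISR registered on the line. */
int run_shared_isrs(int irq, struct irqaction *action)
{
    int retval = 0;
    while (action) {
        retval |= action->handler(irq, action->dev_id);
        action = action->next;
    }
    return retval;
}
```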



Sample: IRQ Assignment to I/O Devices

IRQ  INT  Device
0    32   Timer
1    33   Keyboard
2    34   PIC cascading
3    35   2nd serial port
4    36   1st serial port
6    38   Floppy disk
8    40   System clock
10   42   Network interface
11   43   USB, sound card
12   44   PS/2 mouse
13   45   Math coprocessor
14   46   EIDE disk controller, 1st chain
15   47   EIDE disk controller, 2nd chain


Interrupt Handler Structure

Linux divides the actions performed by an IH into 3 classes: critical, noncritical, and noncritical deferrable.

Critical: e.g., acknowledging an interrupt to the PIC so that it can issue another interrupt on the same IRQ line. Executed within the IH, with maskable interrupts disabled.

Noncritical: e.g., updating data structures accessed only by the processor. Should be finished quickly. Executed within the IH, with maskable interrupts enabled.

Noncritical deferrable: e.g., copying a buffer's contents into the address space of some process. Can be delayed for a long time. Executed outside the IH, in the so-called bottom half.


Interrupt Vectors

Some devices must be statically connected to specific IRQ lines: the interval timer to IRQ0, the slave 8259A PIC to IRQ2, and the external math coprocessor to IRQ13.

For IRQ-configurable devices, there are 3 ways to select a line: by setting HW jumpers; by running a utility program shipped with the device; or by a HW protocol executed at system startup.


Interrupt Handler Implementation


I/O Interrupt Handler Tasks

1. Save the IRQ value and the register contents on the Kernel Mode stack

2. Send an acknowledgment to the PIC that is servicing the IRQ line, allowing it to issue further interrupts

3. Execute the ISRs associated with all devices that share the IRQ

4. Terminate by jumping to ret_from_intr()


typedef struct irq_desc {
    unsigned int status;            /* IRQ line status, next slide */
    hw_irq_controller *handler;
    struct irqaction *action;       /* IRQ action ISR list */
    unsigned int depth;             /* nested irq disables */
    unsigned int irq_count;         /* For detecting broken interrupts */
    unsigned int irqs_unhandled;
    spinlock_t lock;
} ____cacheline_aligned irq_desc_t;

extern irq_desc_t irq_desc[NR_IRQS];    // global variable

typedef struct hw_interrupt_type hw_irq_controller;

struct hw_interrupt_type {
    const char *typename;
    unsigned int (*startup)(unsigned int irq);
    void (*shutdown)(unsigned int irq);
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
    void (*set_affinity)(unsigned int irq, cpumask_t dest);
};


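IRQ Descriptors

[Figure: the irq_desc array (indices 0 ... i ... 224) of irq_desc_t entries; each entry points to a hw_interrupt_type object and to a chain of irqaction descriptors.]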


IRQ Status Listing

/*
 * IRQ line status.
 */
#define IRQ_INPROGRESS    1   /* IRQ handler active - do not enter! */
#define IRQ_DISABLED      2   /* IRQ disabled - do not enter! */
#define IRQ_PENDING       4   /* IRQ pending - replay on enable */
#define IRQ_REPLAY        8   /* IRQ has been replayed but not acked yet */
#define IRQ_AUTODETECT   16   /* IRQ is being autodetected */
#define IRQ_WAITING      32   /* IRQ not yet seen - for autodetection */
#define IRQ_LEVEL        64   /* IRQ level triggered */
#define IRQ_MASKED      128   /* IRQ masked - shouldn't be seen again */
#define IRQ_PER_CPU     256   /* IRQ is per CPU */
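The flag comments imply a simple entry rule: an arriving IRQ is always recorded as pending, but handlers run only if the line is neither disabled nor already in progress. This is a sketch based only on those comments. irq_entry_status() is a hypothetical helper, not the kernel's code.

```c
#include <assert.h>

#define IRQ_INPROGRESS  1
#define IRQ_DISABLED    2
#define IRQ_PENDING     4
#define IRQ_REPLAY      8
#define IRQ_WAITING    32

/* Returns the new status word; sets *run to 1 if the ISRs should run now. */
unsigned irq_entry_status(unsigned status, int *run)
{
    status &= ~(unsigned)(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING;                   /* remember that it arrived */
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
        status &= ~(unsigned)IRQ_PENDING;    /* we handle it right now */
        status |= IRQ_INPROGRESS;
        *run = 1;
    } else {
        *run = 0;                            /* leave it pending for later */
    }
    return status;
}
```

A disabled line thus keeps IRQ_PENDING set, which matches the "replay on enable" comment. A line already in progress is handled again by the CPU that is running its handler.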


.data
ENTRY(interrupt)
.text

vector=0
ENTRY(irq_entries_start)
.rept NR_IRQS
    ALIGN
1:  pushl $vector-256
    jmp common_interrupt
.data
    .long 1b
.text
vector=vector+1
.endr

    ALIGN
common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

#define BUILD_INTERRUPT(name, nr) \
ENTRY(name)                       \
    pushl $nr-256;                \
    SAVE_ALL                      \
    call smp_/**/name;            \
    jmp ret_from_intr;

/* The include is where all of the SMP etc. interrupts come from */
#include "entry_arch.h"

ENTRY(divide_error)
    pushl $0                        # no error code
    pushl $do_divide_error
    ALIGN
error_code:
    pushl %ds
    pushl %eax
    xorl %eax, %eax
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %edx
    decl %eax                       # eax = -1
    pushl %ecx
    pushl %ebx
    cld
    movl %es, %ecx
    movl ORIG_EAX(%esp), %esi       # get the error code
    movl ES(%esp), %edi             # get the function address
    movl %eax, ORIG_EAX(%esp)
    movl %ecx, ES(%esp)
    movl %esp, %edx
    pushl %esi                      # push the error code
    pushl %edx                      # push the pt_regs pointer
    movl $(__USER_DS), %edx
    movl %edx, %ds
    movl %edx, %es
    call *%edi
    addl $8, %esp
    jmp ret_from_exception


irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned = {
    [0 ... NR_IRQS-1] = {
        .handler = &no_irq_type,
        .lock = SPIN_LOCK_UNLOCKED
    }
};

asmlinkage void __init start_kernel(void)
{
    …
    sort_main_extable();
    trap_init();
    rcu_init();
    init_IRQ();
    …
}

void __init init_IRQ(void)
{
    int i;

    pre_intr_init_hook();

    /* point each external vector at its IRQ entry stub, skipping int $0x80 */
    for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (i >= NR_IRQS)
            break;
        if (vector != SYSCALL_VECTOR)
            set_intr_gate(vector, interrupt[i]);
    }
    intr_init_hook();
    setup_timer();
    …
}

void __init pre_intr_init_hook(void)
{
    init_ISA_irqs();
}

void __init init_ISA_irqs(void)
{
    int i;

    init_8259A(0);
    for (i = 0; i < NR_IRQS; i++) {
        irq_desc[i].status = IRQ_DISABLED;
        irq_desc[i].action = 0;
        irq_desc[i].depth = 1;

        if (i < 16) {
            /* the 16 legacy ISA IRQs are served by the 8259A PICs */
            irq_desc[i].handler = &i8259A_irq_type;
        } else {
            irq_desc[i].handler = &no_irq_type;
        }
    }
}

static struct hw_interrupt_type i8259A_irq_type = {
    "XT-PIC",
    startup_8259A_irq,
    shutdown_8259A_irq,
    enable_8259A_irq,
    disable_8259A_irq,
    mask_and_ack_8259A,
    end_8259A_irq,
    NULL
};


asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    for (;;) {
        irqreturn_t action_ret;
        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        …
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq,
        struct pt_regs *regs, struct irqaction *action)
{
    int status = 1; /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();     // RA

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    local_irq_disable();        // RA

    return retval;
}


Registering Interrupt Service Routine

Drivers can register an ISR and enable a given interrupt line via:

    int request_irq(unsigned int irq,
                    irqreturn_t (*handler)(int, void *, struct pt_regs *),
                    unsigned long irqflags, const char *devname, void *dev_id);

irq: the interrupt line number to allocate. For legacy PC devices, this value is hard-coded; for most other devices, it is probed or determined dynamically.

handler: pointer to the actual ISR
irqflags: discussed in the next slide
devname: an ASCII text representation of the device, such as "keyboard"
dev_id: used as a unique cookie when the line is shared. A common practice is to pass the driver's device structure.


irqflags Options

irqflags may be either 0 or a bit mask of one or more of the following flags:

SA_INTERRUPT: the given IH is a fast IH; it runs with all interrupts disabled on the local processor. By default (without this flag), all interrupts are enabled except the interrupt lines of any running handlers.

SA_SAMPLE_RANDOM: interrupts generated by this device contribute to the kernel entropy pool. Used on devices whose interrupts arrive at non-deterministic intervals.

SA_SHIRQ: the interrupt line can be shared among multiple ISRs.


request_irq Usage

To request an interrupt line and install a handler:

    if (request_irq(irqn, my_interrupt, SA_SHIRQ, "my-device", dev)) {
        printk(KERN_ERR "my_device: cannot register IRQ %d\n", irqn);
        return -EIO;
    }

This call may block, so it cannot be called from interrupt context or from other situations where code cannot block. A return value of 0 means the handler was successfully installed.

To free an interrupt line, call:

    void free_irq(unsigned int irq, void *dev_id);

If the line is not shared, free_irq() removes the handler and disables the line. Otherwise, the line is disabled only when its last handler is removed. dev_id uniquely identifies which handler to remove. free_irq() must be called from process context.


int request_irq(unsigned int irq,
                irqreturn_t (*handler)(int, void *, struct pt_regs *),
                unsigned long irqflags, const char *devname, void *dev_id)
{
    int retval;
    struct irqaction *action;

    if (irq >= NR_IRQS)
        return -EINVAL;
    if (!handler)
        return -EINVAL;

    action = (struct irqaction *)kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
    if (!action)
        return -ENOMEM;

    action->handler = handler;
    action->flags = irqflags;
    action->mask = 0;
    action->name = devname;
    action->next = NULL;
    action->dev_id = dev_id;

    retval = setup_irq(irq, action);
    if (retval)
        kfree(action);
    return retval;
}

int setup_irq(unsigned int irq, struct irqaction *new)
{
    int shared = 0;
    unsigned long flags;
    struct irqaction *old, **p;
    irq_desc_t *desc = irq_desc + irq;

    if (desc->handler == &no_irq_type)
        return -ENOSYS;

    spin_lock_irqsave(&desc->lock, flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        /* both the old and the new ISR must allow sharing */
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock, flags);
            return -EBUSY;
        }
        /* append the new ISR at the end of the list */
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;
    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |
                          IRQ_WAITING | IRQ_INPROGRESS);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock, flags);

    register_irq_proc(irq);
    return 0;
}
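The sharing rule in setup_irq() can be isolated in a small user-space model. This is a sketch. struct irqaction_model, struct irq_desc_model, and add_action() are hypothetical, and the SA_SHIRQ value is an assumption made only for this model. Like the code above, the model checks the flags of the first ISR already on the line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SA_SHIRQ 0x04000000UL   /* flag value assumed for this model */

struct irqaction_model {
    unsigned long flags;
    struct irqaction_model *next;
};

struct irq_desc_model {
    struct irqaction_model *action;   /* list of ISRs on this line */
};

/* Append new_action unless the line is busy with a non-shareable ISR. */
int add_action(struct irq_desc_model *desc, struct irqaction_model *new_action)
{
    struct irqaction_model **p = &desc->action;
    struct irqaction_model *old = *p;

    if (old != NULL) {
        /* Both the existing and the new ISR must agree to share. */
        if (!(old->flags & new_action->flags & SA_SHIRQ))
            return -EBUSY;
        while (old != NULL) {          /* walk to the tail of the list */
            p = &old->next;
            old = *p;
        }
    }
    new_action->next = NULL;
    *p = new_action;
    return 0;
}
```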


Processing Steps in Detail

1. A device issues an interrupt by sending an electric signal to the interrupt controller.
2. If the interrupt line is enabled (lines can be disabled), the IC sends the interrupt to the processor.
3. If interrupts are not disabled in the processor, it immediately stops its current execution.
4. It disables the interrupt system. // RA: where does this take place?
5. It jumps to a predefined location in memory, selected by the vector, and executes the code there (the entry code).
6. The entry code saves the IRQ number and the current register values on the stack and calls do_IRQ().
7. do_IRQ() acknowledges receipt of the interrupt and disables interrupt delivery on this IRQ line.
8. do_IRQ() calls handle_IRQ_event() to execute the registered ISRs.
9. do_IRQ() returns to the entry code.
10. The entry code jumps to ret_from_intr().



ret_from_intr()

It is written in assembly code. It first checks whether a reschedule is pending (need_resched).

If need_resched is set and the kernel is returning to user space, schedule() is called.

If need_resched is set and the kernel is returning to kernel space, schedule() is called only if preempt_count == 0.
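The decision described above fits in a small predicate. This is a sketch with a hypothetical name, ret_from_intr_schedules(). It assumes a kernel built with kernel preemption, and it ignores other conditions the real assembly checks.

```c
#include <assert.h>

/* Would ret_from_intr call schedule() in this situation? */
int ret_from_intr_schedules(int need_resched, int returning_to_user,
                            int preempt_count)
{
    if (!need_resched)
        return 0;               /* nothing to do */
    if (returning_to_user)
        return 1;               /* always safe before returning to user space */
    return preempt_count == 0;  /* kernel: only if preemption is allowed */
}
```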


Review Slide

Which exception does not generate a signal to the process? Exception handler initialization? Processing steps?
Types of interrupts? I/O, timer, interprocessor?
IRQ sharing? IRQ dynamic allocation?
Linux classification of actions in an IH? Critical, noncritical, noncritical deferrable
3 ways to select an IRQ line for a configurable device? HW jumpers, utility program, HW protocol
Interrupt handler initialization? Processing steps?


Review Slide

How to register an ISR? request_irq() usage? Parameters: irq line, routine, flags, devname, dev_id?
Flags usage? SA_INTERRUPT, SA_SAMPLE_RANDOM, SA_SHIRQ
free_irq() usage?

RA: Study the usage of SA_SAMPLE_RANDOM and how it affects the random-number generator.

Homework #4: IDT Table Initialization. Required for everyone. Mail your report to the TA before the deadline.

Page 61: Chap4.Interrupts

8259A PIC

Page 62: Chap4.Interrupts

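8259A PIC History
The PIC used in the IBM PC and compatibles is the Intel 8259A chip. A single 8259A can accept at most 8 interrupt sources, but two or more 8259A chips can be cascaded. With up to 8 slaves cascaded on one master, as many as 64 interrupt sources can be connected.
The early IBM PC/XT had only one 8259A. Designers soon realized this was not enough, so the IBM PC/AT used 2: one is called the Master and the other the Slave, and the Slave is cascaded onto the Master. Most PCs today have 2 8259As, which can receive at most 15 interrupts.
The 8259A can mask each interrupt source individually.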
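These counts follow from simple arithmetic. Each slave occupies one of the master's inputs but adds 8 inputs of its own. A minimal sketch follows; usable_irq_lines() is a hypothetical helper.

```c
/* Hypothetical helper: usable interrupt inputs when `slaves` 8259A chips
 * (0..8) are cascaded onto one master. Each slave uses up one master
 * input and contributes 8 new ones. */
int usable_irq_lines(int slaves)
{
    return 8 - slaves + 8 * slaves;   /* = 8 + 7 * slaves */
}
```

With one slave this gives the familiar 15 lines of the PC/AT. With eight slaves it gives 64.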

Page 63: Chap4.Interrupts

8259A Architecture
An 8259A chip has the following internal registers:
Interrupt Mask Register (IMR): filters out masked interrupts.
Interrupt Request Register (IRR): temporarily holds interrupts that have not yet been processed further.
In Service Register (ISR): while an interrupt is being processed by the CPU, it is recorded in the ISR.

Page 64: Chap4.Interrupts

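More on 8259A PIC
The 8259A also has a unit called the Priority Resolver. When several interrupts occur at the same time, the Priority Resolver passes the one with the highest priority to the CPU first.
Pentium and later CPUs integrate an Advanced Programmable Interrupt Controller (APIC). For backward compatibility, however, even machines with an APIC still have an 8259A.
On current motherboards, the 8259A is provided by the southbridge chip.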

Page 65: Chap4.Interrupts

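Interrupt Control on SMP
When Intel considered how to build SMP systems on IA-32, the original 8259A interrupt controller proved inadequate.
On SMP, one must decide how interrupt signals from external devices are delivered to a suitable CPU. One must also handle IPIs (Inter-Processor Interrupts).
Since the Pentium, Intel has integrated an APIC into the CPU. On SMP systems, the motherboard also carries a global APIC. There is at least one, and some motherboards have several IO-APICs to distribute interrupt signals better.
This global APIC receives interrupt signals from peripherals and distributes them to the CPUs. It is called the IO-APIC.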

Page 66: Chap4.Interrupts

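8259A Processing Flow (1/2)
1. When an interrupt request arrives on one of the lines IR0 ~ IR7, the IMR first checks whether that IR line is masked. If it is, the request is discarded; otherwise, it is placed in the IRR.
2. The request stays in the IRR until it can be processed further. When the time comes, the Priority Resolver selects the highest-priority interrupt in the IRR and passes it to the CPU. The lower the IR number, the higher the priority (IR0, the timer, has the highest priority).
3. The 8259A sends an INTR (Interrupt Request) signal to notify the CPU that an interrupt has arrived. On receiving this signal, the CPU pauses before executing the next instruction and sends an INTA (Interrupt Acknowledge) signal to the 8259A.
4. On receiving INTA, the 8259A immediately sets the bit for this interrupt in the ISR and resets the corresponding bit in the IRR. This indicates that the interrupt is now being serviced by the CPU rather than waiting for it.
5. The CPU then sends a second INTA signal to the 8259A, asking for the interrupt vector of this request. The vector is a number from 0 to 255.
6. The 8259A computes the vector number by adding the request's IR number to the configured base vector, which is initialized through initialization command word ICW2. It then places the vector on the data bus.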

Page 67: Chap4.Interrupts

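8259A Processing Flow (2/2)
After the CPU reads the interrupt vector from the data bus, it looks up the corresponding interrupt service routine in the IDT.
If the 8259A's End of Interrupt (EOI) notification is set to manual mode, the interrupt service routine should send an EOI to the 8259A when it finishes.
When the 8259A receives the EOI, it resets the bit for this request in its In Service Register (ISR).
If EOI notification is set to automatic mode, the ISR bit for this request is reset as soon as the 8259A receives the second INTA signal.
Meanwhile, new interrupt requests may arrive and be placed in the IRR. A new request with higher priority than every interrupt currently recorded in the ISR is processed immediately in the same way. All other requests stay in the IRR until the higher-priority interrupts in the ISR finish, that is, until their ISR bits are reset.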
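The two slides above can be condensed into a small software model of one chip. It covers masking, IRR-to-ISR transfer on acknowledge, vector = ICW2 base + IR number, a non-specific EOI, and the rule that only a higher-priority request may interrupt one in service. This is a sketch that assumes fixed priority and manual EOI. The struct and functions are hypothetical, not driver code.

```c
#include <stdint.h>

/* Hypothetical software model of one 8259A (fixed priority, manual EOI). */
struct pic8259 {
    uint8_t imr, irr, isr;
    uint8_t base_vector;           /* programmed through ICW2 */
};

/* Step 1: masked requests are dropped, others latch into IRR. */
void pic_raise(struct pic8259 *p, int line)
{
    if (!(p->imr & (1u << line)))
        p->irr |= (uint8_t)(1u << line);
}

/* Highest-priority (lowest-numbered) line in a bit set, or 8 if empty. */
static int lowest_line(uint8_t bits)
{
    int line = 0;
    while (line < 8 && !(bits & (1u << line)))
        line++;
    return line;
}

/* Steps 2-6: if the best pending request outranks everything in service,
 * move it from IRR to ISR and return its vector; otherwise return -1. */
int pic_acknowledge(struct pic8259 *p)
{
    int req = lowest_line(p->irr);
    if (req == 8 || req >= lowest_line(p->isr))
        return -1;
    p->irr &= (uint8_t)~(1u << req);
    p->isr |= (uint8_t)(1u << req);
    return p->base_vector + req;   /* vector = ICW2 base + IR number */
}

/* Non-specific EOI: clear the highest-priority in-service bit. */
void pic_eoi(struct pic8259 *p)
{
    int line = lowest_line(p->isr);
    if (line < 8)
        p->isr &= (uint8_t)~(1u << line);
}
```

For example, with IR3 in service, a request on IR5 waits in the IRR while a request on IR0 is delivered at once. This mirrors the nesting behaviour described in the last paragraph.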

Page 68: Chap4.Interrupts

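IRQ2 / IRQ9 Redirection
Why redirect IRQ2 to IRQ9? The reason is compatibility. The IBM PC/AT added a second 8259A in cascade, which made 7 more IRQs available. The original 8259A is called the Master PIC and the new one the Slave PIC.
Because the CPU has only one interrupt line, the Slave PIC had to be cascaded onto the Master PIC, where it occupies IRQ2. As a result, devices that used IRQ2 on the IBM PC/XT could no longer use it.
To solve this, the designers picked IRQ9 on the Slave PIC and required software to redirect the original IRQ2 to IRQ9. That is, the ISR routine for IRQ9 must call the ISR routine for IRQ2.
Devices originally wired to IRQ2 are now wired to IRQ9, so software only needs an additional IRQ9 ISR to stay compatible with the existing system. At the time, this additional IRQ9 ISR was provided by the BIOS, which guaranteed compatibility at its root.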

Page 69: Chap4.Interrupts

I/O Port & Address
/* arch/i386/mach-generic/io_ports.h
 * Machine specific IO port address definition for generic. */

/* i8259A PIC registers */
#define PIC_MASTER_CMD      0x20
#define PIC_MASTER_IMR      0x21
#define PIC_MASTER_ISR      PIC_MASTER_CMD
#define PIC_MASTER_POLL     PIC_MASTER_ISR
#define PIC_MASTER_OCW3     PIC_MASTER_ISR
#define PIC_SLAVE_CMD       0xa0
#define PIC_SLAVE_IMR       0xa1

/* i8259A PIC related value */
#define PIC_CASCADE_IR      2
#define MASTER_ICW4_DEFAULT 0x01
#define SLAVE_ICW4_DEFAULT  0x01
#define PIC_ICW4_AEOI       2

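Each 8259A chip has 2 I/O ports through which it is controlled. The Master 8259A uses 0x20 and 0x21, and the Slave 8259A uses 0xA0 and 0xA1.
Two kinds of commands can be written to the 8259A:
Initialization Command Word (ICW): initializes the 8259A chip.
Operation Command Word (OCW): issues commands to the 8259A to control it.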
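As an illustration of ICWs, the sketch below follows the usual ICW1 to ICW4 order for a cascaded pair. ICW1 = 0x11 selects edge-triggered, cascaded mode with ICW4 expected. ICW2 sets the vector base, ICW3 describes the cascade on IR2 (PIC_CASCADE_IR), and ICW4 = 0x01 selects 8086 mode. The log array and fake_outb() are hypothetical stand-ins that make the sketch runnable in user space; kernel code would use outb() on the same ports.

```c
#include <stdint.h>

/* Port and value definitions copied from the io_ports.h excerpt above. */
#define PIC_MASTER_CMD      0x20
#define PIC_MASTER_IMR      0x21
#define PIC_SLAVE_CMD       0xa0
#define PIC_SLAVE_IMR       0xa1
#define PIC_CASCADE_IR      2
#define MASTER_ICW4_DEFAULT 0x01
#define SLAVE_ICW4_DEFAULT  0x01

/* Hypothetical stand-in for outb(): records writes instead of doing I/O. */
struct port_write { uint16_t port; uint8_t value; };
struct port_write port_log[16];
int port_log_len;

void fake_outb(uint8_t value, uint16_t port)
{
    port_log[port_log_len].port = port;
    port_log[port_log_len].value = value;
    port_log_len++;
}

/* ICW1..ICW4 for a master/slave pair whose vectors start at `base`. */
void init_8259A_pair(uint8_t base)
{
    fake_outb(0x11, PIC_MASTER_CMD);                 /* ICW1: edge, cascade, ICW4 follows */
    fake_outb(base, PIC_MASTER_IMR);                 /* ICW2: master vector base */
    fake_outb(1u << PIC_CASCADE_IR, PIC_MASTER_IMR); /* ICW3: a slave sits on IR2 */
    fake_outb(MASTER_ICW4_DEFAULT, PIC_MASTER_IMR);  /* ICW4: 8086 mode */

    fake_outb(0x11, PIC_SLAVE_CMD);                  /* ICW1 */
    fake_outb(base + 8, PIC_SLAVE_IMR);              /* ICW2: slave vector base */
    fake_outb(PIC_CASCADE_IR, PIC_SLAVE_IMR);        /* ICW3: slave identity = 2 */
    fake_outb(SLAVE_ICW4_DEFAULT, PIC_SLAVE_IMR);    /* ICW4 */
}
```

Note that ICW2 to ICW4 go to the second port of each chip (0x21/0xA1), the same port that holds the IMR after initialization.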

Page 70: Chap4.Interrupts

Linux 8259A Interrupt Handler

/* linux-2.6.14.1/arch/i386/kernel/i8259.c */
static struct hw_interrupt_type i8259A_irq_type = {
    .typename = "XT-PIC",
    .startup  = startup_8259A_irq,
    .shutdown = shutdown_8259A_irq,
    .enable   = enable_8259A_irq,
    .disable  = disable_8259A_irq,
    .ack      = mask_and_ack_8259A,
    .end      = end_8259A_irq,
};

/* This contains the irq mask for both 8259A irq controllers. */
unsigned int cached_irq_mask = 0xffff;

Page 71: Chap4.Interrupts

startup_8259A_irq and shutdown_8259A_irq (arch/i386/kernel/i8259.c)

unsigned int startup_8259A_irq(unsigned int irq)
{
    enable_8259A_irq(irq);
    return 0;
}

#define shutdown_8259A_irq disable_8259A_irq

Page 72: Chap4.Interrupts

enable_8259A_irq (arch/i386/kernel/i8259.c)

void enable_8259A_irq(unsigned int irq)
{
    unsigned int mask = ~(1 << irq);  // for irq = 12: mask = 11101111 11111111b
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask &= mask;          // 00110011 00111000b (original cached_irq_mask)
                                      // 11101111 11111111b (mask)
                                      // 00100011 00111000b (new cached_irq_mask)
    if (irq & 8)                      // irq >= 8?
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}

Page 73: Chap4.Interrupts

disable_8259A_irq (arch/i386/kernel/i8259.c)

void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    cached_irq_mask |= mask;
    if (irq & 8)
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}

Page 74: Chap4.Interrupts

include/asm-i386/mach-default/io_ports.h

/* i8259A PIC registers */
#define PIC_MASTER_CMD  0x20
#define PIC_MASTER_IMR  0x21
#define PIC_MASTER_ISR  PIC_MASTER_CMD
#define PIC_MASTER_POLL PIC_MASTER_ISR
#define PIC_MASTER_OCW3 PIC_MASTER_ISR
#define PIC_SLAVE_CMD   0xa0
#define PIC_SLAVE_IMR   0xa1

Page 75: Chap4.Interrupts

include/asm-i386/i8259.h

extern unsigned int cached_irq_mask;

#define __byte(x,y)        (((unsigned char *) &(y))[x])
#define cached_master_mask (__byte(0, cached_irq_mask))
#define cached_slave_mask  (__byte(1, cached_irq_mask))
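The __byte() macros rely on x86 being little-endian. Byte 0 of cached_irq_mask holds the master's 8 IRQ bits and byte 1 holds the slave's. The sketch below redoes the enable/disable arithmetic from the previous slides with shifts instead of byte aliasing, so it does not depend on byte order. All names here are hypothetical, and no ports are written.

```c
#include <stdint.h>

/* Hypothetical user-space copy of the 16-bit mask; 1 = line masked. */
uint16_t irq_mask = 0xffff;

uint8_t master_mask(void) { return (uint8_t)(irq_mask & 0xff); } /* IRQ 0-7  */
uint8_t slave_mask(void)  { return (uint8_t)(irq_mask >> 8); }   /* IRQ 8-15 */

/* Clear the bit (unmask), as enable_8259A_irq() does; return the byte
 * that would be written to the owning PIC's IMR. */
uint8_t model_enable(unsigned int irq)
{
    irq_mask &= (uint16_t)~(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}

/* Set the bit (mask), as disable_8259A_irq() does. */
uint8_t model_disable(unsigned int irq)
{
    irq_mask |= (uint16_t)(1u << irq);
    return (irq & 8) ? slave_mask() : master_mask();
}
```

Starting from the slide's example value 00110011 00111000b (0x3338), enabling IRQ 12 yields 0x2338 and writes 0x23 to the slave IMR.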

Page 76: Chap4.Interrupts

/*
 * Not all IRQs can be routed through the IO-APIC, eg. on certain (older)
 * boards the timer interrupt is not really connected to any IO-APIC pin,
 * it's fed to the master 8259A's IR0 line only.
 *
 * Any '1' bit in this mask means the IRQ is routed through the IO-APIC.
 * this 'mixed mode' IRQ handling costs nothing because it's only used
 * at IRQ setup time.
 */

void disable_8259A_irq(unsigned int irq)
{
    unsigned int mask = 1 << irq;
    unsigned long flags;

    // Make operations on the master and slave 8259A mutually exclusive
    // (needed on SMP systems?)
    spin_lock_irqsave(&i8259A_lock, flags);
    // Set the corresponding bit to 1 to disable this IRQ line
    cached_irq_mask |= mask;
    // Check whether irq >= 8
    if (irq & 8)
        // Store the slave IRQ mask
        outb(cached_slave_mask, PIC_SLAVE_IMR);
    else
        // Store the master IRQ mask
        outb(cached_master_mask, PIC_MASTER_IMR);
    spin_unlock_irqrestore(&i8259A_lock, flags);
}

Page 77: Chap4.Interrupts

static void mask_and_ack_8259A(unsigned int irq)  // Mask the line and send EOI to the PIC to end interrupt service
{
    unsigned int irqmask = 1 << irq;
    unsigned long flags;

    spin_lock_irqsave(&i8259A_lock, flags);
    // Is the specified IRQ line already masked? If its IMR bit is 1 and the
    // 8259A still raised this interrupt, it is a spurious interrupt.
    if (cached_irq_mask & irqmask)
        goto spurious_8259A_irq;
    cached_irq_mask |= irqmask;

handle_real_irq:
    if (irq & 8) {  // slave
        inb(PIC_SLAVE_IMR);  /* DUMMY - (do we need this?) */
        // Mask this IRQ line
        outb(cached_slave_mask, PIC_SLAVE_IMR);
        // Write 0x60+(irq&7): 'Specific EOI' for slave line (irq&7)
        outb(0x60+(irq&7), PIC_SLAVE_CMD);  /* 'Specific EOI' to slave */
        // Then write 0x60+PIC_CASCADE_IR: 'Specific EOI' for master IRQ2
        outb(0x60+PIC_CASCADE_IR, PIC_MASTER_CMD);  /* 'Specific EOI' to master-IRQ2 */
    } else {  // master
        inb(PIC_MASTER_IMR);  /* DUMMY - (do we need this?) */
        outb(cached_master_mask, PIC_MASTER_IMR);
        outb(0x60+irq, PIC_MASTER_CMD);  /* 'Specific EOI' to master */
    }
    spin_unlock_irqrestore(&i8259A_lock, flags);
    return;

Page 78: Chap4.Interrupts

spurious_8259A_irq:
    /*
     * this is the slow path - should happen rarely.
     */
    if (i8259A_irq_real(irq))
        /*
         * oops, the IRQ _is_ in service according to the
         * 8259A - not spurious, go handle it.
         */
        goto handle_real_irq;

    {
        static int spurious_irq_mask;
        /*
         * At this point we can be sure the IRQ is spurious,
         * lets ACK and report it. [once per IRQ]
         */
        if (!(spurious_irq_mask & irqmask)) {  // Has this spurious IRQ already been reported?
            printk(KERN_DEBUG "spurious 8259A interrupt: IRQ%d.\n", irq);
            spurious_irq_mask |= irqmask;
        }
        atomic_inc(&irq_err_count);  // Increment irq_err_count
        /*
         * Theoretically we do not have to handle this IRQ,
         * but in Linux this does not cause problems and is
         * simpler for us.
         */
        // In Linux, handling a spurious IRQ the same way as a real one causes no problems
        goto handle_real_irq;
    }
}
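The acknowledge path above always uses Specific EOI commands: 0x60 plus the line number, written to the chip's command port. For a slave IRQ it acknowledges both chips, first the slave line and then the master's cascade line IR2. A minimal sketch of just that command sequence follows. struct eoi_write and specific_eoi_sequence() are hypothetical, and the IMR updates are left out.

```c
#include <stdint.h>

#define PIC_MASTER_CMD 0x20
#define PIC_SLAVE_CMD  0xa0
#define PIC_CASCADE_IR 2

/* Hypothetical record of one (value, port) pair that outb() would write. */
struct eoi_write { uint8_t value; uint16_t port; };

/* Fill `out` with the Specific-EOI writes for `irq` (0-15); return the count. */
int specific_eoi_sequence(unsigned int irq, struct eoi_write out[2])
{
    if (irq & 8) {   /* slave line: ack the slave, then the cascade input */
        out[0] = (struct eoi_write){ (uint8_t)(0x60 + (irq & 7)), PIC_SLAVE_CMD };
        out[1] = (struct eoi_write){ (uint8_t)(0x60 + PIC_CASCADE_IR), PIC_MASTER_CMD };
        return 2;
    }
    out[0] = (struct eoi_write){ (uint8_t)(0x60 + irq), PIC_MASTER_CMD };
    return 1;
}
```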

Page 79: Chap4.Interrupts

/*
 * This function assumes to be called rarely. Switching between
 * 8259A registers is slow.
 * This has to be protected by the irq controller spinlock
 * before being called.
 */
static inline int i8259A_irq_real(unsigned int irq)
{
    int value;
    int irqmask = 1<<irq;

    if (irq < 8) {  // master
        // Reads return the IRR by default, so write OCW3 = 0x0B to switch to the ISR
        outb(0x0B,PIC_MASTER_CMD);  /* ISR register */
        // Is this interrupt really being serviced by the CPU?
        value = inb(PIC_MASTER_CMD) & irqmask;
        outb(0x0A,PIC_MASTER_CMD);  /* back to the IRR register */
        return value;
    }
    // slave
    outb(0x0B,PIC_SLAVE_CMD);  /* ISR register */
    value = inb(PIC_SLAVE_CMD) & (irqmask >> 8);
    outb(0x0A,PIC_SLAVE_CMD);  /* back to the IRR register */
    return value;
}

Page 80: Chap4.Interrupts

static void end_8259A_irq(unsigned int irq)
{
    // Re-enable the line only if the IRQ is neither disabled nor in progress
    // and a handler is installed
    if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) &&
        irq_desc[irq].action)
        enable_8259A_irq(irq);
}

Page 81: Chap4.Interrupts

Interrupt Control Interface

Page 82: Chap4.Interrupts

Control Interfaces
Purpose: to allow disabling the interrupt system for the current CPU, or masking out an interrupt line for the entire machine.
Disable/enable interrupts locally for the current processor:
local_irq_disable();
local_irq_enable();
local_irq_save(flags);     // save the current state, then disable
local_irq_restore(flags);  // restore the saved state (interrupts come back on only if they were on at save time)
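local_irq_save()/local_irq_restore() exist alongside the plain disable/enable pair because of nesting. Code that unconditionally calls local_irq_enable() at its end can turn interrupts back on inside a caller that had them off. The sketch below models this with a single boolean. Every name in it is hypothetical, and nothing touches real CPU flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag (EFLAGS.IF). */
bool irqs_on = true;

unsigned long model_irq_save(void)
{
    unsigned long flags = irqs_on;   /* remember the previous state */
    irqs_on = false;                 /* then disable */
    return flags;
}

void model_irq_restore(unsigned long flags)
{
    irqs_on = flags != 0;            /* put back exactly what was there */
}

/* A helper that may be called with interrupts either on or off. */
void helper(void)
{
    unsigned long flags = model_irq_save();
    /* ... critical section ... */
    model_irq_restore(flags);
}
```

Whatever state helper() is entered with, it leaves the same state behind. A disable/enable pair would always leave interrupts on.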

Page 83: Chap4.Interrupts

Control Interfaces (2)
Disable only a specific interrupt line for the entire system:
disable_irq(unsigned int irq); waits until any currently executing handler completes.
disable_irq_nosync(unsigned int irq); does not wait.
enable_irq(unsigned int irq); if disable_irq() was called twice, only the 2nd enable_irq() actually enables the interrupt line.
synchronize_irq(unsigned int irq); waits for a specific IH to exit, if it is executing, before returning.
Status checking:
irqs_disabled() returns nonzero if the interrupt system on the local CPU is disabled, or 0 otherwise.
in_interrupt() returns nonzero if the kernel is in interrupt context (including in an IH or BH), and zero if the kernel is in process context.
in_irq() returns nonzero if the kernel is executing an interrupt handler.

Page 84: Chap4.Interrupts

disable_irq_nosync (1/2)
<LINUX SRC>/kernel/irq/manage.c
void disable_irq_nosync(unsigned int irq)
{
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;

    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);

Page 85: Chap4.Interrupts

disable_irq_nosync (2/2)
    // disable IRQ
    if (!desc->depth++) {
        desc->status |= IRQ_DISABLED;
        desc->handler->disable(irq);
    }
    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}

Page 86: Chap4.Interrupts

disable_irq
<LINUX SRC>/kernel/irq/manage.c
void disable_irq(unsigned int irq)
{
    // get the IRQ descriptor we are going to disable
    irq_desc_t *desc = irq_desc + irq;

    disable_irq_nosync(irq);
    // let the current IRQ handler finish
    if (desc->action)
        synchronize_irq(irq);
}

Page 87: Chap4.Interrupts

synchronize_irq
<LINUX SRC>/kernel/irq/manage.c
#ifdef CONFIG_SMP
void synchronize_irq(unsigned int irq)
{
    struct irq_desc *desc = irq_desc + irq;

    while (desc->status & IRQ_INPROGRESS)
        cpu_relax();
}
#endif

#ifndef CONFIG_SMP
# define synchronize_irq(irq) barrier()
#endif

Page 88: Chap4.Interrupts

enable_irq (1/4)
<LINUX SRC>/kernel/irq/manage.c
void enable_irq(unsigned int irq)
{
    // get the IRQ descriptor we are going to enable
    irq_desc_t *desc = irq_desc + irq;
    unsigned long flags;

    // acquire lock
    spin_lock_irqsave(&desc->lock, flags);

Page 89: Chap4.Interrupts

enable_irq (2/4)
    switch (desc->depth) {
    // cannot enable an IRQ whose depth is already 0
    case 0:
        WARN_ON(1);
        break;

Page 90: Chap4.Interrupts

enable_irq (3/4)
    case 1: {
        // clear the IRQ_DISABLED bit in desc->status
        unsigned int status = desc->status & ~IRQ_DISABLED;

        desc->status = status;
        // replay an interrupt that arrived while the line was disabled
        if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
            desc->status = status | IRQ_REPLAY;
            hw_resend_irq(desc->handler,irq);
        }
        desc->handler->enable(irq);
        /* fall-through */
    }

Page 91: Chap4.Interrupts

enable_irq (4/4)
    default:
        desc->depth--;
    }
    // release lock
    spin_unlock_irqrestore(&desc->lock, flags);
}
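Taken together, disable_irq_nosync() and enable_irq() implement a nesting counter. Only the depth 0 to 1 transition masks the line at the controller, and only the 1 to 0 transition unmasks it. The sketch below isolates that counter. struct line_model and its functions are hypothetical, and locking and interrupt replay are left out.

```c
#include <stdbool.h>

/* Hypothetical reduction of irq_desc to what the depth logic needs. */
struct line_model {
    unsigned int depth;   /* number of outstanding disables */
    bool hw_enabled;      /* state of the line at the controller */
};

void model_disable_irq(struct line_model *d)
{
    if (!d->depth++)            /* first disable: mask at the controller */
        d->hw_enabled = false;
}

/* Returns false for an unbalanced enable (the WARN_ON case). */
bool model_enable_irq(struct line_model *d)
{
    switch (d->depth) {
    case 0:
        return false;           /* already enabled */
    case 1:
        d->hw_enabled = true;   /* last outstanding disable: unmask */
        /* fall through */
    default:
        d->depth--;
    }
    return true;
}
```

This is why, after two disable_irq() calls, the first enable_irq() leaves the line masked and the second one unmasks it.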

Page 92: Chap4.Interrupts

hw_resend_irq
#ifdef CONFIG_X86_IO_APIC
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i)
{
    if (IO_APIC_IRQ(i))
        // re-deliver this IRQ's vector to the current CPU as a self-IPI
        send_IPI_self(IO_APIC_VECTOR(i));
}
#endif

#ifndef CONFIG_X86_IO_APIC
static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i) {}
#endif

Page 93: Chap4.Interrupts

setup_irq (1/2)
int setup_irq(unsigned int irq, struct irqaction * new)
{
    struct irq_desc *desc = irq_desc + irq;
    struct irqaction *old, **p;
    int shared = 0;
    ...
    p = &desc->action;
    if ((old = *p) != NULL) {
        ...
        shared = 1;
    }

Page 94: Chap4.Interrupts

setup_irq (2/2)
    *p = new;
    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |
                          IRQ_WAITING | IRQ_INPROGRESS);
        if (desc->handler->startup)
            desc->handler->startup(irq);
        else
            desc->handler->enable(irq);
    }
    ...
    return 0;
}

Page 95: Chap4.Interrupts

Mask/Unmask IRQs
local_irq_disable():
#define local_irq_disable() __asm__ __volatile__("cli": : :"memory")
local_irq_enable():
#define local_irq_enable() __asm__ __volatile__("sti": : :"memory")
RA: __volatile__

Page 96: Chap4.Interrupts

Review Slide
IH return values? IRQ_NONE, IRQ_HANDLED.
When an IRQ line is shared, how does an IH acknowledge the requesting device?
Interrupt context? Sleep? Stack?
I/O IH processing steps?
local_irq_disable(), local_irq_enable()? disable_irq()? disable_irq_nosync()? irqs_disabled()? in_interrupt()?

Page 97: Chap4.Interrupts

Writing Interrupt Service Routine

Page 98: Chap4.Interrupts

Introduction
A typical declaration of an ISR:
static irqreturn_t intr_handler(int irq, void *dev_id, struct pt_regs *regs)
irq: the IRQ line it is servicing.
dev_id: a generic pointer, the same dev_id that was given to request_irq().
regs: the processor registers prior to servicing the interrupt.
Return value: IRQ_NONE if the ISR detects an interrupt that its device did not originate; IRQ_HANDLED otherwise.
At a minimum, most ISRs need to acknowledge to the device that they received the interrupt.
When a line is shared by multiple ISRs, the kernel invokes each registered handler in turn. A HW device should therefore have a status register that its ISR can check.
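The shared-line rule can be sketched in user space. Each handler checks its own device's status and reports "none" or "handled", and the dispatcher walks the whole chain, OR-ing the results the way handle_IRQ_event() does. The enum values, struct fake_dev, struct action and the functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdbool.h>

enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };  /* hypothetical values */

/* Hypothetical device whose status register says "I raised the interrupt". */
struct fake_dev { bool pending; int serviced; };

struct action {
    int (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct action *next;
};

int dev_handler(int irq, void *dev_id)
{
    struct fake_dev *dev = dev_id;
    (void)irq;
    if (!dev->pending)
        return MODEL_IRQ_NONE;   /* not our device */
    dev->pending = false;        /* acknowledge the device */
    dev->serviced++;
    return MODEL_IRQ_HANDLED;
}

/* Call every handler on the line, like the do/while in handle_IRQ_event(). */
int dispatch(int irq, struct action *action)
{
    int retval = 0;
    for (; action != NULL; action = action->next)
        retval |= action->handler(irq, action->dev_id);
    return retval;
}
```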

Page 99: Chap4.Interrupts

Example: RTC Interrupt Service Routine
When the RTC driver loads, rtc_init() is invoked to initialize the driver:
static int __init rtc_init(void)
{
    …
    if (request_irq(rtc_irq, rtc_interrupt, SA_INTERRUPT, "rtc", (void *)&rtc_port)) {
        printk(KERN_ERR "rtc: cannot register IRQ %d\n", rtc_irq);
        return -EIO;
    }
    …
}
Because SA_INTERRUPT is passed, rtc_interrupt runs with all interrupts disabled on the local processor. On a PC, rtc_irq is IRQ8.

Page 100: Chap4.Interrupts

irqreturn_t rtc_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /*
     * Can be an alarm interrupt, update complete interrupt,
     * or a periodic interrupt. We store the status in the
     * low byte and the number of interrupts received since
     * the last read in the remainder of rtc_irq_data.
     */
    spin_lock(&rtc_lock);
    rtc_irq_data += 0x100;
    rtc_irq_data &= ~0xff;
    if (is_hpet_enabled()) {
        rtc_irq_data |= (unsigned long)irq & 0xF0;
    } else {
        rtc_irq_data |= (CMOS_READ(RTC_INTR_FLAGS) & 0xF0);
    }

    if (rtc_status & RTC_TIMER_ON)
        mod_timer(&rtc_irq_timer, jiffies + HZ/rtc_freq + 2*HZ/100);

    spin_unlock(&rtc_lock);

    spin_lock(&rtc_task_lock);
    if (rtc_callback)
        rtc_callback->func(rtc_callback->private_data);
    spin_unlock(&rtc_task_lock);
    wake_up_interruptible(&rtc_wait);
    kill_fasync(&rtc_async_queue, SIGIO, POLL_IN);

    return IRQ_HANDLED;
}
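The packing of rtc_irq_data described in the comment can be checked in isolation. Each interrupt adds 0x100, which increments the count kept above the low byte. It then clears the low byte and ORs in the 0xF0 status bits. The helper names below are hypothetical, and the status bytes in the example are arbitrary values, not real RTC_INTR_FLAGS readings.

```c
/* Hypothetical model of the rtc_irq_data update in rtc_interrupt(). */
unsigned long record_rtc_irq(unsigned long data, unsigned char intr_flags)
{
    data += 0x100;              /* one more interrupt since the last read */
    data &= ~0xffUL;            /* drop the previous status byte */
    data |= intr_flags & 0xF0;  /* keep only the interrupt-cause bits */
    return data;
}

unsigned long rtc_irq_count(unsigned long data)  { return data >> 8; }
unsigned char rtc_irq_status(unsigned long data) { return (unsigned char)(data & 0xff); }
```

After three interrupts, the count reads 3 and the low byte holds only the most recent status.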

Page 101: Chap4.Interrupts

Interrupt Context
When executing an interrupt handler or bottom half, the kernel is in interrupt context.
Interrupt context cannot sleep; process context can. This limits the functions that can be called from an interrupt handler.
Interrupt context does not receive its own stack. It shares the kernel stack of the process it interrupts; if no process is running, it uses the idle task's stack.
Code trace: keyboard ISR (IRQ1). Code trace: mouse ISR (IRQ12).

Page 102: Chap4.Interrupts

Mouse & Keyboard Interrupt Handler (魏淳航)

Page 103: Chap4.Interrupts

/proc/interrupts

Page 104: Chap4.Interrupts

I8042

PS/2 mouse and keyboard controller

This microcontroller is hidden within the motherboard’s chipset, which integrates many microcontrollers in a single package.

Page 105: Chap4.Interrupts

Four 8-bit registers: the Status register (read), Control register (write), Input register (write), and Output register (read). They use I/O ports 0x60 and 0x64.
[Figure: 8042 chip with SR (0x64), IR (0x60), OR (0x60), CR (0x64)]

Page 106: Chap4.Interrupts

I8042 Architecture

Page 107: Chap4.Interrupts

Initial Steps
1. Init i8042 driver
2. Set interface: Serio
3. Init mouse driver
4. Connect mouse to interface
5. Call request_irq
6. Start mouse

Page 108: Chap4.Interrupts

drivers/input/serio/i8042.c
int __init i8042_init(void)
{
    …
    //…
    // initialize the controller
    i8042_aux_values.irq = I8042_AUX_IRQ;  // 12
    i8042_kbd_values.irq = I8042_KBD_IRQ;  // 1

    // check if AUX is available
    if (!i8042_noaux && !i8042_check_aux(&i8042_aux_values)) {
        // check if MUX is available
        if (!i8042_nomux && !i8042_check_mux(&i8042_aux_values)) {
            for (i = 0; i < 4; i++) {
                i8042_init_mux_values(i8042_mux_values + i, i8042_mux_port + i, i);
                i8042_port_register(i8042_mux_values + i, i8042_mux_port + i);
            }
        } else {
            i8042_port_register(&i8042_aux_values, &i8042_aux_port);
        }
    }

    i8042_port_register(&i8042_kbd_values, &i8042_kbd_port);
}

Page 109: Chap4.Interrupts

Structure of SERIO
static struct i8042_values i8042_aux_values = {
    .irqen   = I8042_CTR_AUXINT,  // 0x02
    .disable = I8042_CTR_AUXDIS,  // 0x20
    .name    = "AUX",
    .mux     = -1,
};

static struct serio i8042_aux_port = {
    .type   = SERIO_8042,
    .write  = i8042_aux_write,
    .open   = i8042_open,
    .close  = i8042_close,
    .driver = &i8042_aux_values,
    .name   = "i8042 Aux Port",
    .phys   = I8042_AUX_PHYS_DESC,
};  // other fields are NULL

struct serio {
    void *private;
    void *driver;
    char *name;
    char *phys;
    unsigned short idbus;
    unsigned short idvendor;
    unsigned short idproduct;
    unsigned short idversion;
    unsigned long type;
    unsigned long event;
    int (*write)(struct serio *, unsigned char);
    int (*open)(struct serio *);
    void (*close)(struct serio *);
    struct serio_dev *dev;
    struct list_head node;
};

Page 110: Chap4.Interrupts

static int __init i8042_port_register(struct i8042_values *values, struct serio *port)
{
    values->exists = 1;
    i8042_ctr &= ~values->disable;

    if (i8042_command(&i8042_ctr, I8042_CMD_CTL_WCTR)) {  // enable mouse or keyboard
        printk(KERN_WARNING "i8042.c: Can't write CTR while registering.\n");
        values->exists = 0;
        return -1;
    }

    printk(KERN_INFO "serio: i8042 %s port at %#lx,%#lx irq %d\n",
           values->name,
           (unsigned long) I8042_DATA_REG,
           (unsigned long) I8042_COMMAND_REG,
           values->irq);

    serio_register_port(port);
    return 0;
}

Page 111: Chap4.Interrupts

Add to serio_list
void __serio_register_port(struct serio *serio)
{
    list_add_tail(&serio->node, &serio_list);
    serio_find_dev(serio);
}

static void serio_find_dev(struct serio *serio)
{
    struct serio_dev *dev;

    list_for_each_entry(dev, &serio_dev_list, node) {
        if (serio->dev)
            break;
        if (dev->connect)
            dev->connect(serio, dev);
    }
}

Page 112: Chap4.Interrupts

Initial Steps
1. Init i8042 driver
2. Set interface: Serio
3. Init mouse driver
4. Connect mouse to interface
5. Call request_irq
6. Start mouse

Page 113: Chap4.Interrupts

drivers/input/mouse/psmouse-base.c
int __init psmouse_init(void)
{
    psmouse_parse_proto();
    serio_register_device(&psmouse_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;

    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}

static struct serio_dev psmouse_dev = {
    .interrupt  = psmouse_interrupt,
    .connect    = psmouse_connect,
    .reconnect  = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup    = psmouse_cleanup,
};
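Together, __serio_register_port() and serio_register_device() give two-way matching. A new port is offered to every known driver, and a new driver is offered every port that has no driver yet. The first driver whose connect() claims a port wins. The sketch below models this with plain arrays instead of list_head. All struct and function names are hypothetical, and a string type check stands in for the SERIO_8042 test in psmouse_connect().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical array-based model of serio port/driver matching. */
struct model_driver;

struct model_port {
    const char *type;            /* e.g. "8042" */
    struct model_driver *dev;    /* NULL until a driver claims the port */
};

struct model_driver {
    const char *wants_type;
    void (*connect)(struct model_port *port, struct model_driver *drv);
};

struct model_port *ports[8];     int nports;
struct model_driver *drivers[8]; int ndrivers;

/* Example connect(): claim the port if its type matches. */
void claim_if_matching(struct model_port *port, struct model_driver *drv)
{
    if (strcmp(port->type, drv->wants_type) == 0)
        port->dev = drv;         /* like serio_open() setting serio->dev */
}

void model_register_port(struct model_port *port)     /* ~ __serio_register_port() */
{
    ports[nports++] = port;
    for (int i = 0; i < ndrivers && !port->dev; i++)
        if (drivers[i]->connect)
            drivers[i]->connect(port, drivers[i]);
}

void model_register_driver(struct model_driver *drv)  /* ~ serio_register_device() */
{
    drivers[ndrivers++] = drv;
    for (int i = 0; i < nports; i++)
        if (!ports[i]->dev && drv->connect)
            drv->connect(ports[i], drv);
}
```

Registration order does not matter. Whether the i8042 port or the psmouse driver registers first, the port ends up connected.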

Page 114: Chap4.Interrupts

psmouse_connect()
static void psmouse_connect(struct serio *serio, struct serio_dev *dev)
{
    ...
    if (serio->type != SERIO_8042)     // check that the serio type is SERIO_8042
        return;

    if (serio_open(serio, dev)) {      // request irq
        kfree(psmouse);
        serio->private = NULL;
        return;
    }

    if (psmouse_probe(psmouse) < 0) {  // handshake: get ACK and device ID (0x00) from the mouse
        serio_close(serio);
        kfree(psmouse);
        serio->private = NULL;
        return;
    }

    psmouse->protocol_handler = psmouse_process_byte;  // mouse event handler
    psmouse_activate(psmouse);  // reset the mouse's counter and enable it
}

Page 115: Chap4.Interrupts

serio_open() - request irq
int serio_open(struct serio *serio, struct serio_dev *dev)
{
    serio->dev = dev;
    if (serio->open && serio->open(serio)) {
        serio->dev = NULL;
        return -1;
    }
    return 0;
}

static int i8042_open(struct serio *port)
{
    struct i8042_values *values = port->driver;

    if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ,
                    "i8042", i8042_request_irq_cookie)) {
        goto irq_fail;
    }
}

static struct serio i8042_aux_port = {
    .type   = SERIO_8042,
    .write  = i8042_aux_write,
    .open   = i8042_open,
    .close  = i8042_close,
    .driver = &i8042_aux_values,
    .name   = "i8042 Aux Port",
    .phys   = I8042_AUX_PHYS_DESC,
};  // other fields are NULL

Page 116: Chap4.Interrupts

Mouse Interrupt Handler
1. i8042_interrupt(): get data and flags from the 8042
2. psmouse_interrupt()
3. psmouse_process_byte(): handle the packets

Page 117: Chap4.Interrupts

i8042_interrupt (1/2)
static irqreturn_t i8042_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    unsigned int dfl;
    …
    spin_lock_irqsave(&i8042_lock, flags);
    str = i8042_read_status();
    if (str & I8042_STR_OBF)        // if the 8042 output buffer has data,
        data = i8042_read_data();   // read it and save it to data
    spin_unlock_irqrestore(&i8042_lock, flags);

    // set flags from the 8042 status
    dfl = ((str & I8042_STR_PARITY) ? SERIO_PARITY : 0) |
          ((str & I8042_STR_TIMEOUT) ? SERIO_TIMEOUT : 0);
    … (next page)

Page 118: Chap4.Interrupts

i8042_interrupt (2/2)
    // check the status register: if the data came from the AUX port, pass it to the mouse
    if (i8042_aux_values.exists && (str & I8042_STR_AUXDATA)) {
        serio_interrupt(&i8042_aux_port, data, dfl, regs);
        goto irq_ret;
    }

    // otherwise, pass it to the keyboard
    if (!i8042_kbd_values.exists)
        goto irq_ret;

    serio_interrupt(&i8042_kbd_port, data, dfl, regs);

irq_ret:
    ret = 1;
}
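The routing and flag logic of i8042_interrupt() depends only on the status byte, so it can be sketched as a pure function. The bit values below are assumptions mirroring the usual i8042 status layout (OBF 0x01, AUXDATA 0x20, TIMEOUT 0x40, PARITY 0x80) and the serio flag values; check them against i8042.h and serio.h for a given kernel. The enum and function names are hypothetical, and the exists checks on each port are left out.

```c
/* Assumed status-register bits (see lead-in). */
#define STR_OBF      0x01
#define STR_AUXDATA  0x20
#define STR_TIMEOUT  0x40
#define STR_PARITY   0x80

/* Assumed serio flag values. */
#define FLAG_TIMEOUT 1
#define FLAG_PARITY  2

enum target { NO_DATA, TO_KEYBOARD, TO_MOUSE };

/* Decide where a byte goes and which error flags accompany it
 * (ignores whether the AUX/KBD ports actually exist). */
enum target route_i8042_byte(unsigned char str, unsigned int *dfl)
{
    *dfl = ((str & STR_PARITY) ? FLAG_PARITY : 0) |
           ((str & STR_TIMEOUT) ? FLAG_TIMEOUT : 0);
    if (!(str & STR_OBF))
        return NO_DATA;               /* output buffer empty */
    return (str & STR_AUXDATA) ? TO_MOUSE : TO_KEYBOARD;
}
```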

Page 119: Chap4.Interrupts

serio_interrupt
irqreturn_t serio_interrupt(struct serio *serio, unsigned char data,
                            unsigned int flags, struct pt_regs *regs)
{
    …
    if (serio->dev && serio->dev->interrupt)
        ret = serio->dev->interrupt(serio, data, flags, regs);
    …
    return ret;
}

static struct serio_dev psmouse_dev = {
    .interrupt  = psmouse_interrupt,
    .connect    = psmouse_connect,
    .reconnect  = psmouse_reconnect,
    .disconnect = psmouse_disconnect,
    .cleanup    = psmouse_cleanup,
};

Page 120: Chap4.Interrupts

Mouse Data Packets
The standard PS/2 mouse sends movement (and button) information to the host using a 3-byte packet (4 bytes for extended protocols).
Byte 2 (byte 3) is the amount of X (Y) movement that has occurred since the last movement data packet was sent to the host.

Page 121: Chap4.Interrupts

psmouse_interrupt
static irqreturn_t psmouse_interrupt(struct serio *serio,
                                     unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    // check flags
    // check mouse state
    …
    if (psmouse->state == PSMOUSE_ACTIVATED &&
        psmouse->pktcnt && time_after(jiffies, psmouse->last + HZ/2)) {
        printk(KERN_WARNING "psmouse.c: %s at %s lost synchronization, throwing %d bytes away.\n",
               psmouse->name, psmouse->phys, psmouse->pktcnt);
        psmouse->pktcnt = 0;
    }

    psmouse->last = jiffies;
    psmouse->packet[psmouse->pktcnt++] = data;
    rc = psmouse->protocol_handler(psmouse, regs);
    …
    return IRQ_HANDLED;
}
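The resynchronization rule above drops a partial packet if more than HZ/2 jiffies pass between bytes. It can be modelled with a small byte accumulator. This sketch assumes 3-byte packets and uses the same wrap-safe comparison idea as time_after(). struct assembler, after() and feed_byte() are hypothetical.

```c
/* Hypothetical packet accumulator mirroring psmouse's pktcnt/last logic. */
struct assembler {
    unsigned char packet[3];
    int pktcnt;
    unsigned long last;          /* "jiffies" of the previous byte */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static int after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Feed one byte received at time `now`; returns 1 when a full packet is ready. */
int feed_byte(struct assembler *as, unsigned char data,
              unsigned long now, unsigned long hz)
{
    if (as->pktcnt && after(now, as->last + hz / 2))
        as->pktcnt = 0;          /* lost synchronization: drop partial bytes */
    as->last = now;
    as->packet[as->pktcnt++] = data;
    if (as->pktcnt < 3)
        return 0;                /* like PSMOUSE_GOOD_DATA: wait for more */
    as->pktcnt = 0;              /* full packet: start the next one */
    return 1;
}
```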

Page 122: Chap4.Interrupts

psmouse_process_byte()
static psmouse_ret_t psmouse_process_byte(struct psmouse *psmouse, struct pt_regs *regs)
{
    struct input_dev *dev = &psmouse->dev;
    unsigned char *packet = psmouse->packet;

    if (psmouse->pktcnt < 3 + (psmouse->type >= PSMOUSE_GENPS))
        return PSMOUSE_GOOD_DATA;

Page 123: Chap4.Interrupts

psmouse_process_byte()
    input_report_key(dev, BTN_LEFT,   packet[0] & 1);
    input_report_key(dev, BTN_MIDDLE, (packet[0] >> 2) & 1);
    input_report_key(dev, BTN_RIGHT,  (packet[0] >> 1) & 1);

    input_report_rel(dev, REL_X, packet[1] ? (int) packet[1] - (int) ((packet[0] << 4) & 0x100) : 0);
    input_report_rel(dev, REL_Y, packet[2] ? (int) ((packet[0] << 3) & 0x100) - (int) packet[2] : 0);

    return PSMOUSE_FULL_PACKET;
}
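The arithmetic above turns 9-bit two's-complement movement values into signed deltas. The sign bits are bits 4 and 5 of byte 1, and the low 8 bits are bytes 2 and 3. It also negates Y, so a positive REL_Y means downward motion. Below is a stand-alone sketch of the same expressions, with a hypothetical struct mouse_event in place of the input_report_*() calls.

```c
/* Hypothetical decoded form of one standard 3-byte PS/2 packet. */
struct mouse_event {
    int left, middle, right;
    int rel_x, rel_y;
};

struct mouse_event decode_ps2_packet(const unsigned char packet[3])
{
    struct mouse_event ev;

    ev.left   = packet[0] & 1;
    ev.middle = (packet[0] >> 2) & 1;
    ev.right  = (packet[0] >> 1) & 1;
    /* bit 4 of byte 1 is the X sign; moved to 0x100 it turns
     * packet[1] into packet[1] - 256 when the sign is set */
    ev.rel_x = packet[1] ? (int) packet[1] - (int) ((packet[0] << 4) & 0x100) : 0;
    /* bit 5 is the Y sign; the reversed subtraction negates Y */
    ev.rel_y = packet[2] ? (int) ((packet[0] << 3) & 0x100) - (int) packet[2] : 0;
    return ev;
}
```

For example, the packet {0x09, 0x05, 0x03} decodes to left button pressed, X = 5, Y = -3.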

Page 124: Chap4.Interrupts

static inline void input_report_key(struct input_dev *dev, unsigned int code, int value)
{
    input_event(dev, EV_KEY, code, !!value);
}

static inline void input_report_rel(struct input_dev *dev, unsigned int code, int value)
{
    input_event(dev, EV_REL, code, value);
}

input_event() chooses the handlers on dev->h_list to handle the event.

Page 125: Chap4.Interrupts

Linux kernel - 2.6.14
1. Init i8042 driver
2. Set interface: Serio
3. Init keyboard driver
4. Connect keyboard to interface
5. Call request_irq
6. Start keyboard interrupt

Page 126: Chap4.Interrupts

drivers/input/keyboard/atkbd.c
int __init atkbd_init(void)
{
    serio_register_device(&atkbd_dev);
    return 0;
}

void serio_register_device(struct serio_dev *dev)
{
    struct serio *serio;

    down(&serio_sem);
    list_add_tail(&dev->node, &serio_dev_list);
    list_for_each_entry(serio, &serio_list, node)
        if (!serio->dev && dev->connect)
            dev->connect(serio, dev);
    up(&serio_sem);
}

static struct serio_dev atkbd_dev = {
    .interrupt  = atkbd_interrupt,
    .connect    = atkbd_connect,
    .reconnect  = atkbd_reconnect,
    .disconnect = atkbd_disconnect,
    .cleanup    = atkbd_cleanup,
};

Page 127: Chap4.Interrupts

Start Keyboard Interrupt
static irqreturn_t atkbd_interrupt(struct serio *serio,
                                   unsigned char data, unsigned int flags, struct pt_regs *regs)
{
    // check flags
    // check keyboard state
    …
    unsigned int code = data;
    ……
    value = atkbd->release ? 0 :
            (1 + (!atkbd_softrepeat && test_bit(atkbd->keycode[code], atkbd->dev.key)));
    ……
    atkbd_report_key(&atkbd->dev, regs, atkbd->keycode[code], value);
}

Page 128: Chap4.Interrupts

static void atkbd_report_key(struct input_dev *dev, struct pt_regs *regs, int code, int value)
{
    …..
    input_event(dev, EV_KEY, code, value);
    ……
}

Page 129: Chap4.Interrupts

Bottom Half and Deferring Work

Page 130: Chap4.Interrupts

Why Bottom Half?
IHs (top halves) have the following properties (requirements):
IHs need to run as quickly as possible.
IHs run with some (or all) interrupt levels disabled.
IHs are often time-critical, and they deal with HW.
IHs do not run in process context and cannot block.
No hard-and-fast rules exist about what work to perform where; research work is needed.
Bottom halves defer work until later. "Later" often simply means "not now": bottom halves often run immediately after the interrupt returns, and they run with all interrupts enabled.

Page 131: Chap4.Interrupts

A World of Bottom Halves
Multiple mechanisms are available for implementing a bottom half: softirqs, tasklets, and work queues.
softirq (available since 2.3): a set of 32 statically defined bottom halves that can run simultaneously on any processor; even 2 of the same type can run concurrently. Used when performance is critical. Must be registered statically at compile time.
tasklet (available since 2.3): built on top of softirqs. Two different tasklets can run simultaneously on different processors, but 2 of the same type cannot. Used most of the time because of its ease and flexibility. Code can register tasklets dynamically.
work queues (available since 2.5): queue work to be performed later in process context.
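The rule that one tasklet never runs on two processors at once comes from a per-tasklet "running" bit. The bit is atomically test-and-set before the function runs and cleared afterwards, which the kernel does with tasklet_trylock()/tasklet_unlock(). Below is a minimal C11 sketch of that idea. All names are hypothetical, and the scheduling and re-queueing around it are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tasklet with only the "running" bit modelled. */
struct model_tasklet {
    atomic_flag running;
    void (*func)(unsigned long);
    unsigned long data;
};

bool model_tasklet_trylock(struct model_tasklet *t)
{
    return !atomic_flag_test_and_set(&t->running);   /* true = we own it now */
}

void model_tasklet_unlock(struct model_tasklet *t)
{
    atomic_flag_clear(&t->running);
}

/* One CPU's attempt to run the tasklet: if another CPU holds the running
 * bit, skip it (the kernel would re-queue it for later). */
bool model_try_run(struct model_tasklet *t)
{
    if (!model_tasklet_trylock(t))
        return false;
    t->func(t->data);
    model_tasklet_unlock(t);
    return true;
}

/* Example tasklet body used below. */
unsigned long calls;
void count_calls(unsigned long data) { calls += data; }
```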

Page 132: Chap4.Interrupts

Softirqs
Softirqs are rarely used; tasklets are used most of the time.
Softirqs are statically allocated at compile time.
Related code: kernel/softirq.c
struct softirq_action {
    void (*action)(struct softirq_action *);  // function to run
    void *data;                               // data to pass to the function
};
static struct softirq_action softirq_vec[32];

In the 2.6.7 kernel, only 6 softirqs are used:
enum {
    HI_SOFTIRQ = 0,
    TIMER_SOFTIRQ,      // [code trace]
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    SCSI_SOFTIRQ,
    TASKLET_SOFTIRQ
};

Page 133: Chap4.Interrupts

The Softirq Handler
The prototype of a softirq handler:
void softirq_handler(struct softirq_action *)
Example:
struct softirq_action *my_softirq = &softirq_vec[0];
my_softirq->action(my_softirq);
Passing the whole structure means that future additions to struct softirq_action do not require changes to every softirq handler.
A softirq never preempts another softirq; it can be preempted only by an interrupt handler. Another softirq (even one of the same type) can run simultaneously on another processor.

Executing Softirqs

A softirq must be raised before it is executed; at a suitable later time, pending softirqs run.
Pending softirqs are checked for and executed in the following places:
  After processing a HW interrupt
  By the ksoftirqd kernel thread
  By code that explicitly checks for and executes pending softirqs (e.g., the networking subsystem)
They all call do_softirq() to execute softirqs.
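To make the raise/execute cycle concrete, here is a single-CPU userspace model. It reuses the struct softirq_action layout from above, but sim_open_softirq(), sim_raise_softirq(), sim_do_softirq(), the softirq_pending variable, and the logging handler are hypothetical stand-ins, not kernel APIs. Raising only sets a bit in a pending mask; the dispatcher clears the mask and then calls each pending handler in index order, passing it the whole structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-CPU model; only struct softirq_action mirrors
 * kernel/softirq.c. The sim_* functions are not kernel APIs. */
#define NR_SOFTIRQS 32

struct softirq_action {
    void (*action)(struct softirq_action *);
    void *data;
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];
static uint32_t softirq_pending;               /* one bit per softirq index */

static void sim_open_softirq(int nr, void (*action)(struct softirq_action *),
                             void *data)
{
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;
}

static void sim_raise_softirq(int nr)
{
    softirq_pending |= UINT32_C(1) << nr;      /* "raise" = mark pending */
}

/* Run every pending softirq once, lowest index first. */
static void sim_do_softirq(void)
{
    uint32_t pending = softirq_pending;
    struct softirq_action *h = softirq_vec;

    softirq_pending = 0;                       /* clear before running */
    while (pending) {
        if (pending & 1)
            h->action(h);                      /* handler gets the whole struct */
        h++;
        pending >>= 1;
    }
}

/* Demo handler (hypothetical): records which softirq ran via a->data. */
static int run_log[NR_SOFTIRQS];
static int run_count;

static void log_handler(struct softirq_action *a)
{
    run_log[run_count++] = *(int *)a->data;
}
```

Because the mask is cleared before the handlers run, a handler that raises a softirq again is picked up by the next pass rather than the current one.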

Saving Registers for Exception Handler

struct pt_regs {
    long ebx;
    long ecx;
    long edx;
    long esi;
    long edi;
    long ebp;
    long eax;
    int  xds;
    int  xes;
    long orig_eax;
    long eip;
    int  xcs;
    long eflags;
    long esp;
    int  xss;
};

IRQn_interrupt:
    pushl $n-256
    jmp common_interrupt

common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr

SAVE_ALL expands to:
    cld
    push %es
    push %ds
    pushl %eax
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %edx
    pushl %ecx
    pushl %ebx
    movl $__KERNEL_DS, %edx
    movl %edx, %ds
    movl %edx, %es

Kernel Mode stack after SAVE_ALL (higher addresses at the top):
    xss
    esp
    eflags
    xcs
    eip
    orig_eax
    xes
    xds
    eax
    ebp
    edi
    esi
    edx
    ecx
    ebx   <- ESP
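The push order explains why struct pt_regs lists ebx first: the stack grows downward, so the last register pushed by SAVE_ALL lands at the lowest address, where ESP points. The sketch below replays the pushes on a toy array-based stack and reads the result back as a structure. It assumes the 32-bit layout (every field 4 bytes, modelled with uint32_t), a user-to-kernel transition (so the CPU also pushes xss and esp), and small placeholder register values; pt_regs32, toy_stack, and simulate_irq_entry() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit stand-in for struct pt_regs (all fields 4 bytes). */
struct pt_regs32 {
    uint32_t ebx, ecx, edx, esi, edi, ebp, eax, xds, xes;
    uint32_t orig_eax;
    uint32_t eip, xcs, eflags, esp, xss;
};

/* A toy downward-growing stack. */
struct toy_stack {
    uint32_t mem[64];
    size_t sp;              /* index of the current top of stack */
};

static void push(struct toy_stack *s, uint32_t v)
{
    s->mem[--s->sp] = v;    /* like pushl: decrement, then store */
}

/* Replay the entry path for IRQ n. Placeholder values: ebx=1 ... es=9,
 * eip=11, cs=12, eflags=13, esp=14, ss=15. */
static struct pt_regs32 simulate_irq_entry(unsigned n)
{
    struct toy_stack s = { .sp = 64 };
    struct pt_regs32 regs;

    /* Pushed by the CPU on a privilege-level change: ss, esp, eflags, cs, eip */
    push(&s, 15); push(&s, 14); push(&s, 13); push(&s, 12); push(&s, 11);
    /* IRQn_interrupt: pushl $n-256 becomes orig_eax */
    push(&s, (uint32_t)((int32_t)n - 256));
    /* SAVE_ALL: es, ds, eax, ebp, edi, esi, edx, ecx, ebx */
    push(&s, 9); push(&s, 8); push(&s, 7); push(&s, 6); push(&s, 5);
    push(&s, 4); push(&s, 3); push(&s, 2); push(&s, 1);

    memcpy(&regs, &s.mem[s.sp], sizeof regs);   /* view memory from ESP upward */
    return regs;
}
```

It also shows why do_IRQ() can recover the IRQ number with orig_eax & 0xff: pushing n-256 stores a negative value whose low byte is still n.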

asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction * action;
    unsigned int status;

    irq_enter();
    kstat_this_cpu.irqs[irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);
    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    for (;;) {
        irqreturn_t action_ret;

        spin_unlock(&desc->lock);
        …
        action_ret = handle_IRQ_event(irq, &regs, action);
        …
        spin_lock(&desc->lock);
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:
    desc->handler->end(irq);
    spin_unlock(&desc->lock);
    irq_exit();
    return 1;
}

asmlinkage int handle_IRQ_event(unsigned int irq,
        struct pt_regs *regs, struct irqaction *action)
{
    int status = 1; /* Force the "do bottom halves" bit */
    int retval = 0;

    if (!(action->flags & SA_INTERRUPT))
        local_irq_enable();

    do {
        status |= action->flags;
        retval |= action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);

    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    local_irq_disable();
    return retval;
}
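The do/while loop in handle_IRQ_event() is what makes shared interrupt lines work: every handler registered on the line is called, and each one checks whether its own device raised the interrupt. The userspace sketch below keeps only that loop. The sim_ types, the SIM_IRQ_NONE/SIM_IRQ_HANDLED values, and the demo device with an integer "status register" are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the action-chain walk in handle_IRQ_event(). */
enum { SIM_IRQ_NONE = 0, SIM_IRQ_HANDLED = 1 };

struct sim_irqaction {
    int (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct sim_irqaction *next;
};

/* Call every action on the line and OR the results, as the kernel loop does.
 * Like the kernel code, it assumes at least one action is registered. */
static int sim_handle_irq_event(int irq, struct sim_irqaction *action)
{
    int retval = 0;

    do {
        retval |= action->handler(irq, action->dev_id);
        action = action->next;
    } while (action);
    return retval;
}

/* Demo device (hypothetical): dev_id points at an int "status register";
 * nonzero means this device asserted the line. */
static int demo_handler(int irq, void *dev_id)
{
    int *status = dev_id;

    (void)irq;
    if (!*status)
        return SIM_IRQ_NONE;     /* not our interrupt */
    *status = 0;                 /* "acknowledge" the device */
    return SIM_IRQ_HANDLED;
}
```

A result of SIM_IRQ_NONE from the whole chain means no device on the line claimed the interrupt.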

#define irq_exit()                                               \
do {                                                             \
    preempt_count() -= IRQ_EXIT_OFFSET;                          \
    if (!in_interrupt() && softirq_pending(smp_processor_id()))  \
        do_softirq();                                            \
    preempt_enable_no_resched();                                 \
} while (0)

static inline int netif_rx_ni(struct sk_buff *skb)
{
    int err = netif_rx(skb);

    if (softirq_pending(smp_processor_id()))
        do_softirq();
    return err;
}

static int ksoftirqd(void * __bind_cpu)
{
    current->flags |= PF_NOFREEZE;
    set_current_state(TASK_INTERRUPTIBLE);
    ….
            do_softirq();
    }
    __set_current_state(TASK_RUNNING);
    return 0;
    …
}

asmlinkage void do_softirq(void)
{
    unsigned long flags;
    struct thread_info *curctx;
    union irq_ctx *irqctx;
    u32 *isp;

    if (in_interrupt())
        return;

    local_irq_save(flags);

    if (local_softirq_pending()) {
        curctx = current_thread_info();
        irqctx = softirq_ctx[smp_processor_id()];
        irqctx->tinfo.task = curctx->task;
        irqctx->tinfo.previous_esp = current_stack_pointer();

        /* build the stack frame on the softirq stack */
        isp = (u32*) ((char*)irqctx + sizeof(*irqctx));

        asm volatile(
            "       xchgl   %%ebx,%%esp     \n"
            "       call    __do_softirq    \n"
            "       movl    %%ebx,%%esp     \n"
            : "=b"(isp)
            : "0"(isp)
            : "memory", "cc", "edx", "ecx", "eax"
        );
    }

    local_irq_restore(flags);
}

do_softirq() (presented by 游家慶)

do_softirq()

Finishes the jobs that ISRs deferred to bottom halves:
1. Get the pending list from the current CPU's irq_stat[cpu].member
2. Invoke __do_softirq() if there are pending jobs
3. Restore the local IRQ state and leave do_softirq()

__do_softirq() (1/2)
Finishes the jobs that ISRs deferred to bottom halves:
1. Get the pending list from the current CPU's irq_stat[cpu].member
2. Disable bottom halves
3. Clear irq_stat[cpu].member
4. Enable IRQs
5. Carry out the pending jobs until all of them are done

__do_softirq() (2/2)

6. Disable IRQs
7. Get the pending list from the current CPU's irq_stat[cpu].member
(Steps 3 to 7 can be repeated as necessary, up to 10 times, as set by MAX_SOFTIRQ_RESTART.)
8. If the kernel thread should stop, defer the remaining pending jobs; otherwise, invoke another do_softirq().
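Steps 3 to 8 amount to a bounded retry loop. The sketch below models it for one CPU. One reading of step 8, and the one assumed here, is that leftover work is handed to the ksoftirqd thread described later (represented by a flag) once the MAX_SOFTIRQ_RESTART budget of 10 passes is used up. All sim_ names, the counters, and the self-re-raising handler are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model of the __do_softirq() restart loop.
 * MAX_SOFTIRQ_RESTART is the value given in the text. */
#define MAX_SOFTIRQ_RESTART 10

static uint32_t sim_pending;        /* per-CPU pending bitmask */
static bool ksoftirqd_woken;        /* stands in for waking ksoftirqd */
static int handler_runs;
static int reraise_budget;          /* how often the handler re-raises itself */

/* A handler that reactivates itself while budget remains. */
static void sim_handler(int nr)
{
    handler_runs++;
    if (reraise_budget > 0) {
        reraise_budget--;
        sim_pending |= UINT32_C(1) << nr;
    }
}

static void sim_do_softirq_restart(void)
{
    int max_restart = MAX_SOFTIRQ_RESTART;
    uint32_t pending = sim_pending;

    do {
        sim_pending = 0;                          /* step 3: clear the mask */
        for (int nr = 0; pending; nr++, pending >>= 1)
            if (pending & 1)
                sim_handler(nr);                  /* step 5: run handlers */
        pending = sim_pending;                    /* step 7: re-read the mask */
    } while (pending && --max_restart);

    if (pending)
        ksoftirqd_woken = true;                   /* step 8: defer the rest */
}
```

The bound is what keeps a softirq that keeps reactivating itself from monopolizing the CPU on the interrupt-return path.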

When is do_softirq() invoked?
  local_bh_enable() re-enables softirqs
  do_IRQ() finishes handling an I/O interrupt
  smp_apic_timer_interrupt() finishes handling a local timer interrupt
  One of the special ksoftirqd_CPUn kernel threads is awoken
  A packet is received on a network card

Using Softirqs
Currently, only the networking and SCSI subsystems use softirqs directly; kernel timers and tasklets are built on top of softirqs.
Index assignment:
  Before using a softirq, you must declare its index at compile time via an enum (such as the one listed earlier).
  Softirqs with a lower index (numerical priority) execute first.
Registering a handler:
  The softirq handler is registered at run time via open_softirq().

void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
{
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;
}

Using Softirqs (2/2)
Softirqs run with interrupts enabled and cannot sleep.
While a handler runs, softirqs on the current processor are disabled, but another CPU can execute softirqs, so softirq handlers need proper locking. As a result, most softirq handlers resort to per-processor data.
Raising a softirq: call raise_softirq(NET_TX_SOFTIRQ), for example.
Softirqs are often raised from within interrupt handlers; when it is done processing the interrupt, the kernel invokes do_softirq().

Review Slide
  Why bottom halves?
  Available BH mechanisms? (softirqs, tasklets, work queues)
  In 2.6.7, how many softirqs are used?
  When and where are pending softirqs checked and executed?
  do_softirq()? open_softirq()? raise_softirq()?
HW#5: Study the usage of preempt_count()
  Deadline: 03/27 (mail your report to the TA); no class on 03/27
  In-class presentation on 04/10 by 林凱立
  Sample solution

Tasklets Usage

Tasklet Implementation
Tasklets are implemented on top of two softirqs, HI_SOFTIRQ and TASKLET_SOFTIRQ; the former runs before the latter.

struct tasklet_struct
{
    struct tasklet_struct *next;  // next tasklet in the list
    unsigned long state;          // state of the tasklet
    atomic_t count;               // reference counter: 0 == enabled, !0 == disabled
    void (*func)(unsigned long);  // handler function
    unsigned long data;           // args to handler function
};

enum
{
    TASKLET_STATE_SCHED,  /* Tasklet is scheduled for execution */
    TASKLET_STATE_RUN     /* Tasklet is running (SMP only) */
};

#define DECLARE_TASKLET(name, func, data) \
    struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }

#define DECLARE_TASKLET_DISABLED(name, func, data) \
    struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }

Scheduling Tasklets
Scheduled tasklets (the equivalent of raised softirqs) are stored in two per-processor structures:
  tasklet_vec (regular tasklets)
  tasklet_hi_vec (high-priority tasklets)
Tasklets are scheduled via tasklet_schedule() and tasklet_hi_schedule().

static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

void fastcall __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);
    t->next = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = t;
    raise_softirq_irqoff(TASKLET_SOFTIRQ);
    local_irq_restore(flags);
}
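The test_and_set_bit() on TASKLET_STATE_SCHED guarantees that a tasklet scheduled twice before it runs is queued only once, and the list insertion shows that newly scheduled tasklets go to the head of the per-CPU list. Below is a single-threaded model of that logic. It ignores atomicity and the local_irq_save() protection, the sim_ names are hypothetical, and unlike the kernel function it returns whether it queued the tasklet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-CPU, single-threaded model of tasklet_schedule(). */
#define SIM_STATE_SCHED 0x1UL   /* TASKLET_STATE_SCHED is bit 0 */

struct sim_tasklet {
    struct sim_tasklet *next;
    unsigned long state;
    void (*func)(unsigned long);
    unsigned long data;
};

static struct sim_tasklet *tasklet_list;   /* stands in for tasklet_vec.list */

/* Returns 1 if the tasklet was queued, 0 if it was already scheduled. */
static int sim_tasklet_schedule(struct sim_tasklet *t)
{
    if (t->state & SIM_STATE_SCHED)
        return 0;                 /* already queued: it will run only once */
    t->state |= SIM_STATE_SCHED;
    t->next = tasklet_list;       /* push at the head of the list */
    tasklet_list = t;
    return 1;
}
```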

Execute Tasklets
void __init softirq_init(void)
{
    open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}

static void tasklet_action(struct softirq_action *a)
{
    struct tasklet_struct *list;

    local_irq_disable();
    list = __get_cpu_var(tasklet_vec).list;
    __get_cpu_var(tasklet_vec).list = NULL;
    local_irq_enable();

    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                    BUG();
                t->func(t->data);
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

        local_irq_disable();
        t->next = __get_cpu_var(tasklet_vec).list;
        __get_cpu_var(tasklet_vec).list = t;
        __raise_softirq_irqoff(TASKLET_SOFTIRQ);
        local_irq_enable();
    }
}
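The core of tasklet_action() can be modelled without threads: detach the whole per-CPU list, run each enabled tasklet after clearing its SCHED flag, and push disabled ones (count != 0) back onto the list while re-raising the softirq. Because this sketch runs on a single thread, it drops tasklet_trylock() and the interrupt disabling; the sim_ names and the demo handler are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of tasklet_action(). */
struct sim_tasklet {
    struct sim_tasklet *next;
    int scheduled;                 /* TASKLET_STATE_SCHED */
    int count;                     /* 0 == enabled, nonzero == disabled */
    void (*func)(unsigned long);
    unsigned long data;
};

static struct sim_tasklet *sim_list;   /* per-CPU tasklet_vec.list */
static int resoftirq;                  /* "raise TASKLET_SOFTIRQ again" */

static void sim_tasklet_action(void)
{
    struct sim_tasklet *list = sim_list;   /* take the whole list ... */
    sim_list = NULL;                       /* ... ("IRQs disabled" here) */

    while (list) {
        struct sim_tasklet *t = list;

        list = list->next;
        if (t->count == 0) {
            assert(t->scheduled);          /* the kernel calls BUG() otherwise */
            t->scheduled = 0;              /* clear SCHED before running */
            t->func(t->data);
            continue;
        }
        t->next = sim_list;                /* disabled: requeue for later */
        sim_list = t;
        resoftirq = 1;
    }
}

/* Demo handler (hypothetical): accumulates its data argument. */
static unsigned long total;
static void add_data(unsigned long d) { total += d; }
```

Clearing the SCHED flag before calling the handler is what allows a tasklet to reschedule itself from inside its own handler.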

Softirq & Tasklet Concurrency
Two instances of the same tasklet never run concurrently:

#ifdef CONFIG_SMP
static inline int tasklet_trylock(struct tasklet_struct *t)
{
    return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
}
#else
#define tasklet_trylock(t) 1
#endif
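In userspace, C11 atomics can play the role of test_and_set_bit() and clear_bit(). This sketch of the SMP tasklet_trylock()/tasklet_unlock() pair is an analogue, not the kernel implementation. The RUN mask 0x2 follows from TASKLET_STATE_RUN being bit 1 in the enum above, and the sim_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical C11 analogue of tasklet_trylock()/tasklet_unlock(). */
#define SIM_STATE_RUN 0x2UL   /* TASKLET_STATE_RUN is bit 1 */

struct sim_tasklet_state {
    atomic_ulong state;
};

/* Returns true if the caller now owns the RUN bit (test_and_set_bit). */
static bool sim_tasklet_trylock(struct sim_tasklet_state *t)
{
    unsigned long old = atomic_fetch_or(&t->state, SIM_STATE_RUN);
    return !(old & SIM_STATE_RUN);
}

/* Releases the RUN bit (clear_bit). */
static void sim_tasklet_unlock(struct sim_tasklet_state *t)
{
    atomic_fetch_and(&t->state, ~SIM_STATE_RUN);
}
```

A second CPU that fails the trylock does not spin; as tasklet_action() shows, it simply requeues the tasklet for a later pass.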

Using Tasklets
A tasklet can be declared statically or dynamically (tasklet_init()); the static forms are:
  DECLARE_TASKLET(name, func, data)
  DECLARE_TASKLET_DISABLED(name, func, data)
Writing a tasklet handler, for example: void tasklet_handler(unsigned long data)
  A tasklet handler cannot sleep.
  It runs with all interrupts enabled.
  Two instances of the same tasklet never run concurrently.
  If the same tasklet is scheduled again before it actually runs, it still runs only once.
Disabling / killing a tasklet:
  tasklet_disable(), tasklet_disable_nosync(), tasklet_kill()

ksoftirqd
Most commonly, the kernel processes softirqs on return from handling an interrupt, in interrupt context.
However, softirqs may be raised at very high rates, and sometimes they reactivate themselves; this may lead to starvation of user programs.
Kernel solution: when softirqs grow excessively, the kernel wakes up a family of kernel threads.
  They run at the lowest possible priority.
  There is one thread per processor, named ksoftirqd/n.
  static int ksoftirqd(void * __bind_cpu) (code shown earlier)

Work Queues

Introduction
Work queues defer work into a kernel thread:
  The deferred work runs in process context, so it is schedulable and can sleep.
  These threads are called worker threads.
The default worker threads are called events/n, where n is the processor number.
Most drivers defer work to the default worker thread unless they need to create their own.

struct workqueue_struct {
    struct cpu_workqueue_struct cpu_wq[NR_CPUS];
    const char *name;
    struct list_head list; /* Empty if single thread */
};

More Data Structures
struct cpu_workqueue_struct {
    spinlock_t lock;

    long remove_sequence; /* Least-recently added (next to run) */
    long insert_sequence; /* Next to add */

    struct list_head worklist;
    wait_queue_head_t more_work;
    wait_queue_head_t work_done;

    struct workqueue_struct *wq;
    task_t *thread;

    int run_depth; /* Detect run_workqueue() recursion depth */
} ____cacheline_aligned;

#define create_workqueue(name) __create_workqueue((name), 0)

struct workqueue_struct *__create_workqueue(const char *name,
                                            int singlethread)
{
    int cpu, destroy = 0;
    struct workqueue_struct *wq;
    struct task_struct *p;

    wq = kmalloc(sizeof(*wq), GFP_KERNEL);
    if (!wq)
        return NULL;
    memset(wq, 0, sizeof(*wq));

    wq->name = name;
    lock_cpu_hotplug();
    if (singlethread) {
        …
    } else {
        spin_lock(&workqueue_lock);
        list_add(&wq->list, &workqueues);
        spin_unlock(&workqueue_lock);
        for_each_online_cpu(cpu) {
            p = create_workqueue_thread(wq, cpu);
            ….
}

static struct task_struct *create_workqueue_thread(struct workqueue_struct *wq,
                                                   int cpu)
{
    struct cpu_workqueue_struct *cwq = wq->cpu_wq + cpu;
    struct task_struct *p;

    spin_lock_init(&cwq->lock);
    cwq->wq = wq;
    cwq->thread = NULL;
    cwq->insert_sequence = 0;
    cwq->remove_sequence = 0;
    INIT_LIST_HEAD(&cwq->worklist);
    init_waitqueue_head(&cwq->more_work);
    init_waitqueue_head(&cwq->work_done);

    if (is_single_threaded(wq))
        p = kthread_create(worker_thread, cwq, "%s", wq->name);
    else
        p = kthread_create(worker_thread, cwq, "%s/%d", wq->name, cpu);
    if (IS_ERR(p))
        return NULL;
    cwq->thread = p;
    return p;
}

static int worker_thread(void *__cwq)
{
    struct cpu_workqueue_struct *cwq = __cwq;
    DECLARE_WAITQUEUE(wait, current);
    struct k_sigaction sa;
    sigset_t blocked;

    current->flags |= PF_NOFREEZE;

    set_user_nice(current, -10);

    /* Block and flush all signals */
    sigfillset(&blocked);
    sigprocmask(SIG_BLOCK, &blocked, NULL);
    flush_signals(current);

    /* SIG_IGN makes children autoreap: see do_notify_parent(). */
    sa.sa.sa_handler = SIG_IGN;
    sa.sa.sa_flags = 0;
    siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
    do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

    set_current_state(TASK_INTERRUPTIBLE);
    while (!kthread_should_stop()) {
        add_wait_queue(&cwq->more_work, &wait);
        if (list_empty(&cwq->worklist))
            schedule();
        else
            __set_current_state(TASK_RUNNING);
        remove_wait_queue(&cwq->more_work, &wait);

        if (!list_empty(&cwq->worklist))
            run_workqueue(cwq);
        set_current_state(TASK_INTERRUPTIBLE);
    }
    __set_current_state(TASK_RUNNING);
    return 0;
}

Wait Queues
Wait queues have several uses in the kernel, especially for interrupt handling, process synchronization, and timing.
A process wishing to wait for a specific event places itself in the proper wait queue and relinquishes control.
Each wait queue is identified by a wait queue head (wait_queue_head_t).
Wait queues are modified by interrupt handlers as well as by major kernel functions, so they are protected by a spinlock.
Each element is of type wait_queue_t, and each entry represents a sleeping process:
  Exclusive processes are selectively woken up.
  Nonexclusive processes are always woken up.
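The difference between exclusive and nonexclusive waiters is easiest to see in the wake-up loop (__wake_up_common(), listed later in this chapter). The array-based sketch below assumes every wake-up succeeds and that nonexclusive waiters sit ahead of exclusive ones in the queue; the sim_ names are hypothetical. Every waiter reached is woken, and the walk stops once nr_exclusive exclusive waiters have been woken. Passing nr_exclusive == 0 never triggers the stop, so everyone is woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array-based model of __wake_up_common(). */
struct sim_waiter {
    bool exclusive;   /* WQ_FLAG_EXCLUSIVE set? */
    bool woken;
};

static void sim_wake_up_common(struct sim_waiter *q, size_t n, int nr_exclusive)
{
    for (size_t i = 0; i < n; i++) {
        q[i].woken = true;                   /* assume the wake-up succeeds */
        if (q[i].exclusive && !--nr_exclusive)
            break;                           /* enough exclusive waiters woken */
    }
}
```

Waking only one exclusive waiter avoids the "thundering herd" of many processes waking up to compete for a single resource.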

Data Structures
struct __wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    struct task_struct * task;
    wait_queue_func_t func;
    struct list_head task_list;
};

worker_thread()
  set_current_state(TASK_INTERRUPTIBLE);
      Mark the thread as sleeping.
  add_wait_queue(&cwq->more_work, &wait);
      Add this thread to a wait queue.
  if (list_empty(&cwq->worklist)) schedule();
      Do a context switch and sleep.
  else __set_current_state(TASK_RUNNING);
      The thread does not go to sleep.
  remove_wait_queue(&cwq->more_work, &wait);
      Dequeue itself from the wait queue.
  if (!list_empty(&cwq->worklist)) run_workqueue(cwq);
      Perform the deferred work.
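The same sleep/wake protocol maps onto POSIX threads: a condition variable plays the role of cwq->more_work, and the list is re-checked after every wakeup so that work queued between the check and the sleep is not missed. This is a userspace analogue under those assumptions, not the kernel's implementation; struct work, struct worker, and the helper functions are hypothetical, and the program needs to be linked with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pthread analogue of worker_thread(); not kernel code. */
struct work {
    struct work *next;
    void (*func)(void *);
    void *data;
};

struct worker {
    pthread_mutex_t lock;
    pthread_cond_t more_work;     /* plays the role of cwq->more_work */
    struct work *head, *tail;     /* FIFO worklist */
    bool should_stop;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Sleep only while there is nothing to do; re-check after waking. */
        while (w->head == NULL && !w->should_stop)
            pthread_cond_wait(&w->more_work, &w->lock);
        if (w->head == NULL)              /* stop requested, list drained */
            break;
        struct work *item = w->head;
        w->head = item->next;
        if (w->head == NULL)
            w->tail = NULL;
        pthread_mutex_unlock(&w->lock);
        item->func(item->data);           /* runs unlocked; may block */
        pthread_mutex_lock(&w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void queue_work_item(struct worker *w, struct work *item)
{
    pthread_mutex_lock(&w->lock);
    item->next = NULL;
    if (w->tail)
        w->tail->next = item;
    else
        w->head = item;
    w->tail = item;
    pthread_cond_signal(&w->more_work);   /* like wake_up(&cwq->more_work) */
    pthread_mutex_unlock(&w->lock);
}

static void stop_worker(struct worker *w)
{
    pthread_mutex_lock(&w->lock);
    w->should_stop = true;
    pthread_cond_signal(&w->more_work);
    pthread_mutex_unlock(&w->lock);
}

/* Demo handler (hypothetical). */
static void add_one(void *p) { ++*(int *)p; }
```

Unlike worker_thread(), which loops until kthread_should_stop(), this worker drains whatever is still queued before honouring the stop request.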

Work Item
struct work_struct {
    unsigned long pending;    // is this work pending?
    struct list_head entry;   // linked list of all work
    void (*func)(void *);     // handler function
    void *data;               // argument to handler
    void *wq_data;            // used internally
    struct timer_list timer;  // timer used by delayed work queues
};

run_workqueue()
  while (!list_empty(&cwq->worklist)) {
      Check whether the worklist is empty; if not:
  struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry);
      Obtain one work item.
  void (*f) (void *) = work->func;
      Obtain the handler function.
  void *data = work->data;
      Obtain the argument to this handler function.
  list_del_init(cwq->worklist.next);
      Remove the work item.
  f(data);
      Execute the handler function.
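The run_workqueue() steps rely on an intrusive list: the list_head is embedded in struct work_struct, and list_entry() recovers the enclosing structure from a pointer to that member. The sketch below uses simplified, portable stand-ins for the <linux/list.h> helpers (this container_of omits the kernel's typeof-based type check) and a trimmed-down work_struct; run_worklist() and the recording handler are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for <linux/list.h>; not the kernel headers. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    INIT_LIST_HEAD(e);
}

/* Trimmed-down work item (hypothetical subset of struct work_struct). */
struct work_struct {
    struct list_head entry;
    void (*func)(void *);
    void *data;
};

/* Mirrors the run_workqueue() steps: take the first item, fetch
 * func and data, unlink the item, then call the handler. */
static void run_worklist(struct list_head *worklist)
{
    while (!list_empty(worklist)) {
        struct work_struct *work =
            list_entry(worklist->next, struct work_struct, entry);
        void (*f)(void *) = work->func;
        void *data = work->data;

        list_del_init(worklist->next);
        f(data);
    }
}

/* Demo handler (hypothetical): records the int id it was given. */
static int order[8];
static int n_run;
static void record(void *p) { order[n_run++] = *(int *)p; }
```

Unlinking the item before calling f(data) lets the handler safely queue the same work item again.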

Using Work Queues
Create work to defer:
  DECLARE_WORK(xyz, void (*abc)(void *), void *def);
  This statically creates a work_struct structure named xyz, with handler abc and data def.
Write the work queue handler, for example:
  void work_handler(void *data)
  It runs in process context.
Schedule work on the default event queue:
  schedule_work(&work);
  schedule_delayed_work(&work, delay);

static struct workqueue_struct *keventd_wq;

int fastcall schedule_work(struct work_struct *work)
{
    return queue_work(keventd_wq, work);
}

int fastcall schedule_delayed_work(struct work_struct *work, unsigned long delay)
{
    return queue_delayed_work(keventd_wq, work, delay);
}

int fastcall queue_delayed_work(struct workqueue_struct *wq,
                                struct work_struct *work, unsigned long delay)
{
    int ret = 0;
    struct timer_list *timer = &work->timer;

    if (!test_and_set_bit(0, &work->pending)) {
        work->wq_data = wq;
        timer->expires = jiffies + delay;
        timer->data = (unsigned long)work;
        timer->function = delayed_work_timer_fn;
        add_timer(timer);
        ret = 1;
    }
    return ret;
}

/* We queue the work to the CPU it was submitted, but there is no
 * guarantee that it will be processed by that CPU. */
int fastcall queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
    int ret = 0, cpu = get_cpu();

    if (!test_and_set_bit(0, &work->pending)) {
        if (unlikely(is_single_threaded(wq)))
            cpu = 0;
        BUG_ON(!list_empty(&work->entry));
        __queue_work(wq->cpu_wq + cpu, work);
        ret = 1;
    }
    put_cpu();
    return ret;
}

void init_workqueues(void)
{
    hotcpu_notifier(workqueue_cpu_callback, 0);
    keventd_wq = create_workqueue("events");
    BUG_ON(!keventd_wq);
}
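Both queue_work() and queue_delayed_work() start with test_and_set_bit(0, &work->pending), so an item that is already queued is not queued a second time, and the caller learns this from the return value. The sketch models that bit with C11 atomics. It assumes the worker clears the bit before calling the handler (that step is not part of the excerpt above), and the sim_ names and the queued_times counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the work->pending bit. */
struct sim_work {
    atomic_ulong pending;
    int queued_times;      /* how often it actually went onto a list */
};

/* Returns 1 if newly queued, 0 if it was already pending (like queue_work). */
static int sim_queue_work(struct sim_work *w)
{
    if (atomic_fetch_or(&w->pending, 1UL) & 1UL)
        return 0;
    w->queued_times++;     /* stands in for __queue_work() */
    return 1;
}

/* Assumed worker behaviour: clear the bit, then (not shown) run the handler,
 * so the handler is free to re-queue the same item. */
static void sim_run_work(struct sim_work *w)
{
    atomic_fetch_and(&w->pending, ~1UL);
}
```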

int default_wake_function(wait_queue_t *curr, unsigned mode, int sync, void *key)
{
    task_t *p = curr->task;
    return try_to_wake_up(p, mode, sync);
}

#define wake_up(x) __wake_up(x, TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1, NULL)

void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, void *key)
{
    unsigned long flags;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0, key);
    spin_unlock_irqrestore(&q->lock, flags);
}

#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

RA: try_to_wake_up() [TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, 1 or 0 or nr]

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                             int nr_exclusive, int sync, void *key)
{
    struct list_head *tmp, *next;

    list_for_each_safe(tmp, next, &q->task_list) {
        wait_queue_t *curr;
        unsigned flags;

        curr = list_entry(tmp, wait_queue_t, task_list);
        flags = curr->flags;
        if (curr->func(curr, mode, sync, key) &&
            (flags & WQ_FLAG_EXCLUSIVE) &&
            !--nr_exclusive)
            break;
    }
}

#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

#define container_of(ptr, type, member) ({                 \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);   \
    (type *)( (char *)__mptr - offsetof(type,member) );})

Summary
Choices for bottom halves: softirqs, tasklets, work queues
  Softirqs provide the least serialization; use them only when scalability is a concern.
  Tasklets are used if the code is not finely threaded.
  Work queues process work items in process context; they are the easiest to use.

Disabling Bottom Halves

local_bh_disable(): disables all bottom halves (softirqs and tasklets)
local_bh_enable(): enables bottom halves; if calls are nested, only the last (outermost) call actually enables them

local_bh_disable()

local_bh_disable() disables all bottom halves except work queues on the local CPU.
It disables local bottom halves by incrementing preempt_count.
local_bh_enable() enables local bottom halves by decrementing preempt_count and checks whether any softirq is pending.

#define local_bh_disable() \
    do { preempt_count() += SOFTIRQ_OFFSET; \
         barrier(); } while (0)

local_bh_enable()

local_bh_enable() enables local bottom halves by decrementing preempt_count and, if any softirqs are pending and the CPU is not in interrupt context, runs them.

void local_bh_enable(void)
{
    __local_bh_enable();
    if (unlikely(!in_interrupt() && local_softirq_pending()))
        invoke_softirq();
}

Usage of preempt_count

The preemption markers preempt_disable() and preempt_enable() operate on an int, preempt_count, stored in each thread_info.
Bits 8-15 hold the softirq count (8 bits, i.e. 256 possible values):
  SOFTIRQ_OFFSET: 0x00000100
  SOFTIRQ_MASK:   0x0000ff00
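The nesting behaviour of local_bh_disable()/local_bh_enable() falls out of simple arithmetic on preempt_count. In the sketch below, only SOFTIRQ_OFFSET and SOFTIRQ_MASK come from the slide; the preemption and hardirq fields (bits 0-7 and 16-27) are assumptions modelled on 2.6-era i386 headers, and the sim_ names are hypothetical.

```c
#include <assert.h>

/* SOFTIRQ_OFFSET/SOFTIRQ_MASK are from the text; PREEMPT_MASK and the
 * HARDIRQ_* values are assumed for illustration. */
#define PREEMPT_MASK    0x000000ffU
#define SOFTIRQ_OFFSET  0x00000100U
#define SOFTIRQ_MASK    0x0000ff00U
#define HARDIRQ_OFFSET  0x00010000U
#define HARDIRQ_MASK    0x0fff0000U

static unsigned int sim_preempt_count;   /* one thread's counter */

static void sim_local_bh_disable(void) { sim_preempt_count += SOFTIRQ_OFFSET; }
static void sim_local_bh_enable(void)  { sim_preempt_count -= SOFTIRQ_OFFSET; }

static unsigned int softirq_count(void)
{
    return sim_preempt_count & SOFTIRQ_MASK;
}

/* Nonzero in hardirq context or while softirqs are disabled or running. */
static int sim_in_interrupt(void)
{
    return (sim_preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}
```

Each disable adds one SOFTIRQ_OFFSET, so bottom halves become enabled again only when the last (outermost) enable brings the softirq field back to zero.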

irq_exit()
#define irq_exit()                                               \
do {                                                             \
    preempt_count() -= IRQ_EXIT_OFFSET;                          \
    if (!in_interrupt() && softirq_pending(smp_processor_id()))  \
        do_softirq();                                            \
} while (0)

in_interrupt() examines preempt_count to check whether the CPU is in interrupt context (hardirq or softirq).
local_bh_disable() is mostly used in drivers.

asmlinkage void __do_softirq(void)
{
    pending = local_softirq_pending();

    local_bh_disable();
    …
    /* handle softirq MAX_SOFTIRQ_RESTART times */
    …
    __local_bh_enable();
}

Review Slide
  tasklet IRQ? DECLARE_TASKLET? DECLARE_TASKLET_DISABLED? tasklet_action()? ksoftirqd()?
  Work queue usage? workqueue_struct? cpu_workqueue_struct? work_struct?
  worker_thread()? run_workqueue()? schedule_work()?
MP1: Provide timer & keyboard ISRs for the eos_x86 operating system

Return from Interrupts and Exceptions (presented by 朱宗賢)

Introduction
The following things must be handled before terminating an interrupt or exception handler:
  Number of kernel control paths being concurrently executed: if there is just one, the CPU switches back to User Mode.
  Pending process switch requests: if TIF_NEED_RESCHED is set, call schedule().
  Pending signals: if a signal was sent to the current process, it must be handled.

Related Terminating Functions

  ret_from_exception(): terminates all exceptions except 0x80 ones
  ret_from_intr(): terminates interrupt handlers
  ret_from_sys_call(): terminates system calls (the 0x80 programmed exception)
  ret_from_fork(): terminates fork(), vfork(), and clone() system calls

[Figure: Return from Interrupts and Exceptions. Entry points: ret_from_exception, ret_from_intr, ret_from_sys_call, ret_from_fork. Decisions: Nested kernel control paths? Virtual v86 mode? System call tracing? Need reschedule? Pending signals? Actions: schedule_tail(), syscall_trace(), schedule(), do_signal(), save_v86_state(), restore hardware context. Labels: tracesys_exit, reschedule, signal_return, v86_signal_return, restore_all.]

Returning from Interrupt

The return path from an interrupt is much more complicated than the entry path.

It is a good place to do other tasks that are unrelated to the interrupt but need to be done fairly frequently.

These include checking for pending signals and whether a reschedule is needed.

General Implementation Issues
  Number of kernel control paths being concurrently executed
  Pending process switch requests
  Pending signals

Exiting from Interrupt Handling

Return from System Call

Disable interrupts first, so that the tests that follow are guaranteed to be atomic.

Check the pending work-to-be-done flags in the thread information:
  syscall trace active
  resumption notification requested
  signal pending
  rescheduling necessary

Returning from Exceptions and Interrupts

We have to determine whether the CPU was already running in Kernel Mode before the interrupt (kernel mode / user mode / vm86 mode).

If so, we are dealing with a nested interrupt and want to terminate its processing as quickly as possible.

// entry.S
ret_from_exception:
    preempt_stop
ret_from_intr:
    GET_THREAD_INFO(%ebp)
    movl EFLAGS(%esp), %eax     # mix EFLAGS and CS
    movb CS(%esp), %al
    testl $(VM_MASK | 3), %eax
    jz resume_kernel            # returning to
ENTRY(resume_userspace)
    cli                         # make sure we don't miss an interrupt
                                # setting need_resched or sigpending
                                # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx  # is there any work to be done on
                                # int/exception return?
    jne work_pending
    jmp restore_all

// entry.S
    # system call handler stub
ENTRY(system_call)
    …
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,EAX(%esp)             # store the return value
syscall_exit:
    cli                             # make sure we don't miss an interrupt
                                    # setting need_resched or sigpending
                                    # between sampling and the iret
    movl TI_flags(%ebp), %ecx
    testw $_TIF_ALLWORK_MASK, %cx   # current->work
    jne syscall_exit_work
restore_all:
    RESTORE_ALL
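The three instructions after GET_THREAD_INFO() in ret_from_intr decide where control returns with a single test. EFLAGS bit 1 is always set, so the low byte of the saved EFLAGS is first overwritten with the low byte of the saved CS; one test of VM_MASK | 3 then checks both the VM86 flag (EFLAGS bit 17) and the privilege bits of CS. A C rendering of that logic follows; the function name and the example selector values in the usage are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS.VM is bit 17. The function name is hypothetical. */
#define VM_MASK 0x00020000u

static bool returning_to_user_or_vm86(uint32_t saved_eflags, uint16_t saved_cs)
{
    uint32_t eax = saved_eflags;                 /* movl EFLAGS(%esp), %eax */
    eax = (eax & ~0xffu) | (saved_cs & 0xffu);   /* movb CS(%esp), %al */
    return (eax & (VM_MASK | 3u)) != 0;          /* testl $(VM_MASK | 3), %eax */
}
```

A false result corresponds to jz resume_kernel; anything else continues at resume_userspace.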

Dealing with Pending Signals
Check the VM_MASK bit in the saved flags register (kernel / VM86 mode), then call do_notify_resume().
There is an extra complication if a signal was found to be pending while the processor was running in virtual 8086 mode before the interrupt: the saved values are copied from the stack to the vm86_info field of the thread structure.

Reschedule Current Process

If there is any switch request, the kernel must perform process scheduling; otherwise, control is returned to the current process.

If the current process cannot continue after the interrupt, the code at the work_resched label is executed.

Return from Fork

ret_from_fork function is executed by the child process right after its creation through a fork(), vfork(), or clone() system call

schedule_tail(): It is relevant only in the SMP case. It tries to find a suitable CPU on which to run the process just switched out.

// entry.S
work_resched:
    call schedule
    cli
    movl TI_flags(%ebp), %ecx
    andl $_TIF_WORK_MASK, %ecx
    jz restore_all
    testb $_TIF_NEED_RESCHED, %cl
    jnz work_resched

work_notifysig:                 # deal with pending signals and
                                # notify-resume requests
    testl $VM_MASK, EFLAGS(%esp)
    movl %esp, %eax
    jne work_notifysig_v86      # returning to kernel-space or
                                # vm86-space
    xorl %edx, %edx
    call do_notify_resume
    jmp restore_all

// entry.S
ENTRY(ret_from_fork)
    pushl %eax
    call schedule_tail
    GET_THREAD_INFO(%ebp)
    popl %eax
    jmp syscall_exit
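The work_resched and work_notifysig labels form a loop: schedule() is called for as long as TIF_NEED_RESCHED stays set, signals and notify-resume requests are handled next, and control ends at restore_all. The sketch below expresses that control flow in C. The flag values, the counters, and the sim_ functions are hypothetical, and the cli that precedes each re-test is omitted.

```c
#include <assert.h>

/* Hypothetical flag values; only the control flow mirrors entry.S. */
#define SIM_TIF_SIGPENDING    0x1u
#define SIM_TIF_NEED_RESCHED  0x2u
#define SIM_TIF_WORK_MASK     (SIM_TIF_SIGPENDING | SIM_TIF_NEED_RESCHED)

static unsigned int ti_flags;        /* thread_info->flags */
static int schedule_calls, notify_calls;
static int resched_again;            /* times schedule() leaves the flag set */

static void sim_schedule(void)
{
    schedule_calls++;
    if (resched_again > 0)
        resched_again--;             /* another reschedule was requested */
    else
        ti_flags &= ~SIM_TIF_NEED_RESCHED;
}

static void sim_do_notify_resume(void)
{
    notify_calls++;
    ti_flags &= ~SIM_TIF_SIGPENDING;
}

/* Returns at the point corresponding to "jmp restore_all". */
static void sim_work_pending(void)
{
    while (ti_flags & SIM_TIF_WORK_MASK) {       /* jz restore_all */
        if (ti_flags & SIM_TIF_NEED_RESCHED) {
            sim_schedule();                      /* work_resched */
            continue;                            /* re-test the flags */
        }
        sim_do_notify_resume();                  /* work_notifysig */
        return;                                  /* jmp restore_all */
    }
}
```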