Boston Rs Meeting Jan09

Embed Size (px)

Citation preview

  • 7/30/2019 Boston Rs Meeting Jan09

    1/45

    D. J. Weidman January 2009

    1

    Daniel J. Weidman, Ph.D.

    Advanced Electron Beams, Wilmington, MA

    IEEE Boston-area Reliability Society MeetingWednesday, January 14, 2009

    Practical Reliability Engineering

    for Semiconductor Equipment

    [email protected]

    The pdf of this version of this presentation may be distributed freely.

    No proprietary information is included.

    If distributing any portion, please include my name as the author.

    Thank you, Dan Weidman

    Copyright 2009, D. J. Weidman

  • 7/30/2019 Boston Rs Meeting Jan09

    2/45

    D. J. Weidman January 2009

    2

    Practical Reliability Engineering for

    Semiconductor Equipment

    Abstract Reliability data can be utilized to allocate efforts for improvements and

    presentation to customers. This talk presents several practical techniques used to

    gather reliability data for these purposes. These techniques are based on basicreliability engineering concepts and are applied in simple ways. Data will be shownfor illustrative purposes without details about specific components or subsystems.This presentation will review definitions of several reliability engineering metrics.Examples will illustrate Pareto plots over various time intervals and availability withplanned and unplanned downtime. Important metrics such as Mean Time BetweenFailure (MTBF) and Mean Time Between Assists / Interrupts (MTBA/I) are used for

    quantifying failure rates.

    Data is collected and analyzed from various sources and tallied in a variety ofways. Repair data can be collected from service technician or customer fieldreports. Reliability data can be collected from in-house or customer-site machines.In-house inventory statistics can indicate which parts are being replaced mostfrequently, by part number or cost.

    Failure Analysis Reports should be communicated within the organization in a waythat is effective. Vendors often have to be engaged to improve reliability ofcomponents or subsystems. Information that will be presented may be applicableto several other industries.

  • 7/30/2019 Boston Rs Meeting Jan09

    3/45

    D. J. Weidman January 2009

    3

    Daniel J. Weidman, Ph.D.

    Dr. Daniel J. Weidman received his Bachelors degree in Physics from MIT in1985. He earned his Ph.D. in Electrical Engineering from the University ofMaryland, College Park. He has authored or co-authored more than 20

    journal articles and technical reports in publications and more than 60conference presentations. He started working with electron beams more than20 years ago, and has since returned to that industry. He brings a freshperspective to reliability engineering in the semiconductor industry, becausehe has no formal training in reliability engineering and he had less than twoyears of experience in the semiconductor industry when he took a position asthe Reliability Engineer at NEXX Systems.

    NEXX Systems is located in Billerica, Massachusetts, and designs and sellssemiconductor manufacturing equipment. Dr. Weidman was the ReliabilityEngineer there for almost five years. Dr. Weidman has resumed working inthe field of electron beams, at Advanced Electron Beams of Wilmington, MA.

    He is the Principal Process Engineer, and his responsibilities includereliability testing of the electron-beam emitters and high-voltage powersupplies.

  • 7/30/2019 Boston Rs Meeting Jan09

    4/45

    D. J. Weidman January 2009

    4

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal and scope of this talk

    Review basic reliability engineering concepts and showhow they can be used successfully

    Applicable to equipment in the semiconductor industry,and other industries

  • 7/30/2019 Boston Rs Meeting Jan09

    5/45

    D. J. Weidman January 2009

    5

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program Immediate issues

    Reactive reliability engineering

    Proactive reliability engineering

  • 7/30/2019 Boston Rs Meeting Jan09

    6/45

    D. J. Weidman January 2009

    6

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program Immediate issues

    Reactive reliability engineering

    Proactive reliability engineering

  • 7/30/2019 Boston Rs Meeting Jan09

    7/45

    D. J. Weidman January 2009

    7

    PVD Machine

    Physical Vapor Deposition of thin metal film Wafers carried on trays to minimize handling & time to

    change size

    Up to five metals in a small footprint

  • 7/30/2019 Boston Rs Meeting Jan09

    8/45

    D. J. Weidman January 2009

    8

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering

    Overall process

    Data gathering to record each issue

    Data tallying

    Reliability engineering metrics with examples

  • 7/30/2019 Boston Rs Meeting Jan09

    9/45

    D. J. Weidman January 2009

    9

    Reliability program

    Failure Reporting, Analysis, and Corrective Action System

    reports faultcustomer

    internet

    access

    Field ServiceEngineer receives

    customer report orobserves fault

    FAR

    parts or

    information

    or both

    Immediately: Serviceteam addresses

    customer need

    FSR

    FTA

    FMEA

    Longer term:Reliability

    teaminvestigates

    ECO

    TestTrack

    Implement /integrateidentified

    improvement

    Deliver

    on new machine

    upgrade existing machine

  • 7/30/2019 Boston Rs Meeting Jan09

    10/45

    D. J. Weidman January 2009

    10

    Machine faults from customers About 300 service reports per product line per year

    Copied from FSR database, pasted into Excel, and reviewed.

    9 entries are shownas an example.

  • 7/30/2019 Boston Rs Meeting Jan09

    11/45

    D. J. Weidman January 2009

    11

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering Overall process

    Reliability engineering metrics with examples Fault vs. failure

    Pareto plot

    Uptime and Availability

    MTBF, MTBA, MTBI

    MTTR, MTR

    Additional metrics specific to industry

    Additional metrics

  • 7/30/2019 Boston Rs Meeting Jan09

    12/45

    D. J. Weidman January 2009

    12

    Reliability definitions: faults

    Fault: anything that has gone wrong

    Failure: an equipment problem

    All failures are faults

    Examples: If a transport system stops due to particles that are normal to the process, then it is a failure (and a fault).

    a left wrench inside, then its a fault but not a failure.

    faults failures action

  • 7/30/2019 Boston Rs Meeting Jan09

    13/45

    D. J. Weidman January 2009

    13

    Pareto plots

    Location Function

    Locatio

    n1

    Locatio

    n2

    Locatio

    n5

    Locatio

    n4

    Locatio

    n6

    Locatio

    n7

    Locatio

    n8

    Locatio

    n3

    FunctionC

    FunctionE

    FunctionI

    FunctionH

    Functio

    nF

    FunctionD

    Functio

    nG

    FunctionB

    Functio

    nA

  • 7/30/2019 Boston Rs Meeting Jan09

    14/45

    D. J. Weidman January 2009

    14

    Machine cross-section

    Location 6 Location 6

    for new control system

    sn323 et seq.

    Location 4Location 3Location 2

    Location 5

    Location 8

    Chase: Location 7

    Front end: Location 1

  • 7/30/2019 Boston Rs Meeting Jan09

    15/45

    D. J. Weidman January 2009

    15

    Faults on all machines in one quarter

    Location FunctionLo

    catio

    n2

    Locatio

    n4

    Locatio

    n3

    Locatio

    n7

    Locatio

    n5

    Locatio

    n6

    Locatio

    n8

    Locatio

    n1

    FunctionI

    FunctionC

    FunctionE

    Functio

    nG

    FunctionF

    Functio

    nD

    FunctionH

    Functio

    nB

    FunctionA

  • 7/30/2019 Boston Rs Meeting Jan09

    16/45

    D. J. Weidman January 2009

    16

    Faults on all machines in one quarter

    Top faults shown bylocation and function

    Allows focusing on the biggesttypes of issues

    Few enough issues percategory per quarter toinvestigate each issue

    Note: A shorter interval, suchas monthly,

    Has the advantage of a fasterresponse if a problem arises

    Has the disadvantage ofnoise due to smallersampling (issues shift backand forth)

    location and function

    1C 2I 2C 4I 4C 1I

  • 7/30/2019 Boston Rs Meeting Jan09

    17/45

    D. J. Weidman January 2009

    17

    Faults on all machines in one quarter

    Location 1 & Function C, 7 new subsystem

    new subsystem new dll

    reboot controller

    reboot controller

    component ineffective issue with test wafers

    Most of these faults are notfailures: upgrades of

    subsystem on oldermachines or rebooting

    No predominant issue

    location and function

    1C 2I 2C 4I 4C 1I

  • 7/30/2019 Boston Rs Meeting Jan09

    18/45

    D. J. Weidman January 2009

    18

    Faults on all machines in one quarter

    Location 2 & Function I, 6

    5 of 6 faults: same component

    Validated a known issue andtwo ECOs to address it

    location and function

    1C 2I 2C 4I 4C 1I

  • 7/30/2019 Boston Rs Meeting Jan09

    19/45

    D. J. Weidman January 2009

    19

    Faults on all machines in one quarter

    Locatio

    n1

    Locatio

    n2

    Locatio

    n5

    Locatio

    n4

    Locatio

    n6

    Locatio

    n7

    Locatio

    n8

    Location Function

    Locatio

    n3

    FunctionC

    FunctionE

    FunctionI

    FunctionH

    Functio

    nF

    FunctionD

    Functio

    nG

    FunctionB

    Functio

    nA

  • 7/30/2019 Boston Rs Meeting Jan09

    20/45

    D. J. Weidman January 2009

    20

    Sample size & machine failures by month

    Locatio

    n1

    Locatio

    n2

    Locatio

    n5

    Locatio

    n4

    Locatio

    n6

    Locatio

    n7

    Locatio

    n8

    Locatio

    n3

    Function

    C

    Function

    E

    Fun

    ction

    I

    Function

    H

    Function

    F

    Function

    D

    Function

    G

    Function

    A

    May JuneApril

    May JuneApril

    Function

    B

    Locatio

    n4

    Locatio

    n5

    Locatio

    n1

    Locatio

    n8

    Locatio

    n3

    Locatio

    n6

    Locatio

    n7

    Locatio

    n2

    Locatio

    n4

    Locatio

    n2

    Locatio

    n5

    Locatio

    n7

    Locatio

    n3

    Locatio

    n1

    Locatio

    n6

    Locatio

    n8

    Function

    C

    Fun

    ction

    I

    Function

    D

    Function

    H

    Function

    E

    Function

    F

    Function

    A

    Function

    G

    Function

    B

    Function

    D

    Function

    C

    Fun

    ction

    I

    Function

    E

    Function

    H

    Function

    G

    Function

    B

    Function

    F

    Function

    A

  • 7/30/2019 Boston Rs Meeting Jan09

    21/45

    D. J. Weidman January 2009

    21

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering Overall process

    Reliability engineering metrics with examples Fault vs. failure Pareto plot

    Uptime and Availability

    MTBF, MTBA, MTBI

    MTTR, MTR

    Additional metrics specific to industry

    Additional metrics

  • 7/30/2019 Boston Rs Meeting Jan09

    22/45

    D. J. Weidman January 2009

    22

    Reliability definitions: uptime, etc.

    All time: either uptime or downtime

    Uptime: either operating or idle time

    Uptime (hours) availability (%)

    Downtime: either PM, or Unscheduled Maintenance (Repairs) MTTR (mean time to repair) applies to PM and to UM

    uptime downtime

    operating /productive

    idle /standby

    uptime

    availability

    PM /Scheduled

    UM / Repairs

    downtime

  • 7/30/2019 Boston Rs Meeting Jan09

    23/45

    D. J. Weidman January 2009

    23

    Reliability definitions: SEMI

    Above plot is from SEMI E10 We assume that Total Time is Operations Time

  • 7/30/2019 Boston Rs Meeting Jan09

    24/45

    D. J. Weidman January 2009

    24

    Machine availability, on average

    Machine Availability

    Specification: availability > 85% Typical performance: 90%

    Measured from Field Service Reports

    quarter

    portionofOperation

    stime

  • 7/30/2019 Boston Rs Meeting Jan09

    25/45

    D. J. Weidman January 2009

    25

    Machine availability, on average

    Machine Availability

    Specification: availability > 85% Typical performance: 90%

    Measured from Field Service Reports Machine PM time approximately 7%. Customers report

    At beta Customer, one machine: 94.3% avail. better than 6% PM

    At another customer: 6% PM reported

    portionofOperation

    stime

    quarter

  • 7/30/2019 Boston Rs Meeting Jan09

    26/45

    D. J. Weidman January 2009

    26

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering Overall process

    Reliability engineering metrics with examples Fault vs. failure Pareto plot

    Uptime and Availability

    MTBF, MTBA, MTBI

    MTTR, MTR Additional metrics specific to industry

    Additional metrics

  • 7/30/2019 Boston Rs Meeting Jan09

    27/45

    D. J. Weidman January 2009

    27

    SEMI E10 definitions

    Assist: an unplanned interruption where

    Externally resumed (human operator or host

    computer), and

    No replacement of parts, other than specifiedconsumables, and

    No further variation from specifications of equipmentoperation

    Failure: unplanned interruption that is not an

    assist # of interrupts = # of assists + # of failures

  • 7/30/2019 Boston Rs Meeting Jan09

    28/45

    D. J. Weidman January 2009

    28

    SEMI E10 definitions

    MTBF, MTBA, MTBI

    MTBF = Interval / (number of failures)

    MTBA = Interval / (number of assists)

    MTBI = Interval / (number of interrupts)

    # of interrupts = # of assists + # of failures

    Interval/MTBI = Interval/MTBA + Interval/MTBF

    1/MTBI = 1/MTBA + 1/MTBF

  • 7/30/2019 Boston Rs Meeting Jan09

    29/45

    D. J. Weidman January 2009

    29

    MTBF (Mean Time Between Failure) in hrsper machine each month

    250 hours is specified

    Based on Field Service Reports

    month

  • 7/30/2019 Boston Rs Meeting Jan09

    30/45

    D. J. Weidman January 2009

    30

    MTBF (Mean Time Between Failure) in hrsper machine each quarter

    250 hours is specified

    Field Service Reports indicate we exceed this

    Quarterly less noisy than monthly

    quarter

  • 7/30/2019 Boston Rs Meeting Jan09

    31/45

    D. J. Weidman January 2009

    31

    Customer-measured MTBF due to our

    improvements

    Per machine, averaged over two machines

    1: Adjusted one of the subsystems

    2: Installed an upgraded version of the subsystem in one machine

    4-week rolling average

    spec 250 hours

    week

  • 7/30/2019 Boston Rs Meeting Jan09

    32/45

    D. J. Weidman January 2009

    32

    MTTR (Mean Time To Repair) and MTR

    MTTR is Mean Time to Repair (SEMI E10definition): the average elapsed time (not person

    hours) to correct a failure and return theequipment to a condition where it can perform itsintended function, including equipment test time

    and process test time (but not maintenancedelay).

    MTR is Mean Time to Restore: includes

    maintenance delays.

  • 7/30/2019 Boston Rs Meeting Jan09

    33/45

    D. J. Weidman January 2009

    33

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering Overall process

    Reliability engineering metrics with examples Fault vs. failure: all failures are faults Pareto plot: location and function, sample size of several

    Uptime and Availability: time is up or down

    MTBF, MTBA, MTBI: I = F + A

    MTTR, MTR: working time vs. clock time Additional metrics specific to industry

    Additional metrics

  • 7/30/2019 Boston Rs Meeting Jan09

    34/45

    D. J. Weidman January 2009

    34

    Practical Reliability Engineering for

    Semiconductor Equipment

    Goal

    Reliability program plan Immediate issues

    Reactive reliability engineering Overall process

    Reliability engineering metrics with examples

    Fault vs. failure: all failures are faults

    Pareto plot: location and function, sample size of several

    Uptime and Availability: time is up or down

    MTBF, MTBA, MTBI: I = F + A

    MTTR, MTR: working time vs. clock time Additional metrics specific to industry

    Additional metrics

  • 7/30/2019 Boston Rs Meeting Jan09

    35/45

    D. J. Weidman January 2009

    35

    Broken wafers

    Goals Ideally zero

    In practice, need fewer than 1 in 10k (or 1 in 100k) Broken wafers reported on four different machines

    Qty 1, Dec, Year 1

    Qty 4, Feb, Year 2

    Qty 4, March, Year 2 Qty 1, May, Year 2

    Total broken wafers reported 1 in Year 1

    9 in Year 2 Q1 and Q2 >7,000k wafers/year on all machines within 1 in 300k

    Total reported on newer-style machines: zero

    F il ti

  • 7/30/2019 Boston Rs Meeting Jan09

    36/45

    D. J. Weidman January 2009

    36

    0

    2

    4

    6

    8

    10

    12

    numberoffa

    iluresper

    Nimbusre

    porting

    month

    #offailuresp

    ercustomer

    machiner

    eporting

    Failures vs. timeper customer machine each month

    First two new-series machines being used 24/7.

    First machine with two new major features shipped.

    2 more machines both arrived at customer

    dropped

    from >5

    to

  • 7/30/2019 Boston Rs Meeting Jan09

    37/45

    D. J. Weidman January 2009

    37

    Unplanned downtime

    by machine life

    Unplanned maintenance (UM) based on FSRs only Actual UM is higher

    Data scattered: 1 std dev values themselves

    All machines have reported in time shown (6 quarters)

    Not customer dependent

    PM > UM

    machine life (Q1 warranty start)

    unpla

    nnedmaintenance

  • 7/30/2019 Boston Rs Meeting Jan09

    38/45

    D. J. Weidman January 2009

    38

    Failures by component

    Component

    1

    Component

    2

    Component

    4

    Component

    5

    Component

    6

    Component

    7

    Component

    3

  • 7/30/2019 Boston Rs Meeting Jan09

    39/45

    D. J. Weidman January 2009

    39

    Component failure rate normalized

    Component 8 failure rateis 3 to 5 times the rate of

    other failures Component 1 failures

    addressed by ECOs

    Component 4 to be movedfrom baseline

    Component 9: Eng project

    Compo

    nent8

    Compo

    nent1

    Compo

    nent9

    Compo

    nent5

    Compo

    nent2

    Component10

    Component11

    Compo

    nent4

    Component12

  • 7/30/2019 Boston Rs Meeting Jan09

    40/45

  • 7/30/2019 Boston Rs Meeting Jan09

    41/45

    D. J. Weidman January 2009

    41

    Most expensive shipments

    Includes all shipments

    replacements upgrades

    5 quarters

    Item1

    3

    Item1

    4

    Item1

    6

    Item1

    7

    Item1

    8

    Item1

    9

    Item1

    5

    Item2

    0

    Item2

    1

    Item2

    2

    Item2

    3

  • 7/30/2019 Boston Rs Meeting Jan09

    42/45

    D. J. Weidman January 2009

    42

    All failures

    Two quarters

    123 reported failures, which fell into 65 categories

    One series and another series

    Categories are named by cause not by symptom

    Other faults were not included. If PM was required, then the fault was not counted as a failure.

    Failures occurring twice or more are plotted, which are in 23 categories.

    Failures occurring three times or more were analyzed13 categories. Next slides

    0123456789

    cause

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

  • 7/30/2019 Boston Rs Meeting Jan09

    43/45

    D. J. Weidman January 2009

    43

    Failures from previous slide: analysis

    Failures occurring three times or more

    1. 8

    2. 7

    3. 74. 6+2

    5. 5

    6. 5

    7. 3 (not 4)

    8. 4+1

    9. 4-1

    10. 4

    11. 3

    12. 313. 3

    Status of design

    improvement

    completed, 24in progress, 7

    not started, 33

    Note: status has not been updated.

    Practical Reliability Engineering for

  • 7/30/2019 Boston Rs Meeting Jan09

    44/45

    D. J. Weidman January 2009

    44

    Practical Reliability Engineering for

    Semiconductor Equipment

    Acknowledgements

    Thank you

  • 7/30/2019 Boston Rs Meeting Jan09

    45/45

    D. J. Weidman January 2009

    45

    End of presentation

    Thank you