Active Response Gravity Offload System (ARGOS)
Subject Drop Mishap
JSC 13-0006
IRIS S-2013-022-00002
Close Call Mishap
Date of Mishap: January 16, 2013
Date of Report: March 22, 2013
Final Report Findings and Recommendation 3/22/2013
Gail Chapline, Joe Anderson, Mary Cerimele, Mike Cooke, Mike Foreman, John Haas, Art Knell, Asher Lieberman, John Ruppert
NASA/JSC ACTIVE RESPONSE GRAVITY OFFLOAD SYSTEM (ARGOS) INVESTIGATION BOARD REPORT
Table of Contents
1 Executive Summary
2 Acknowledgments
3 Background
4 Investigation Board Objectives
5 ARGOS Description
6 Investigation
  6.1 Interviews
  6.2 Mechanical System
  6.3 Hardware Inspection
    6.3.1 Running Torque Measurements
    6.3.2 Incremental Disassembly and Sample Collection
    6.3.3 Detailed Inspection of Major Components
  6.4 Fault Tree
  6.5 Control System
  6.6 Electronic System
    6.6.1 E-Stop System
    6.6.2 Z-Axis Motor Controller
    6.6.3 CAN Bus Interface
    6.6.4 Computer Running Trick Simulation
  6.7 Software
    6.7.1 Background
    6.7.2 ARGOS Hoist Control System Background
    6.7.3 Software Validation
    6.7.4 ARGOS Software Configuration Management
    6.7.5 Software Regression Testing
    6.7.6 Fault Detection Software Logic
    6.7.7 Test Data Review
  6.8 Safety and Hazard Analysis
    6.8.1 Safety
    6.8.2 Hazard Analysis
  6.9 Engineering Processes, Roles and Responsibilities
7 Findings and Recommendations
  7.1 General Findings
  7.2 Proximate/Root Causes and Contributing Factors
    7.2.1 Proximate Cause
    7.2.2 Intermediate Cause
    7.2.3 Root Cause
    7.2.4 Contributing Factors
  7.3 Specific Findings
    7.3.1 Findings Specific to Mechanical System Design
    7.3.2 Findings Specific to the Z-axis Controller System
    7.3.3 Findings Specific to Software Design
    7.3.4 Findings Specific to Safety and Hazards
8 References
  8.1 Appointment Letter
  8.2 Materials Chemical Analysis Report
  8.3 Materials Metallurgical Analysis Report
  8.4 ARGOS Startup Checklist
List of Figures
Figure 1: Handrail Involved in the Incident
Figure 2: ARGOS System Picture
Figure 3: Inline Lifting Components
Figure 4: Yates Shock Absorber
Figure 5: Spring/Damper Festo Muscle
Figure 6: STI Load Cell
Figure 7: VNCHI Gimbal Assembly
Figure 8: Genie Assembly without Rope
Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview
Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View
Figure 11: Torque Measurement Site
Figure 12: Torque Measurements for Various Knob Positions
Figure 13: ARGOS Project Created Fault Tree
Figure 14: High Level Control Loop
Figure 15: Electronics System Block Diagram
Figure 16: ARGOS Control System Block Diagram
Figure 17: Slide from EMU TRR Software
Figure 18: ARGOS Startup Checklist from day of incident
Figure 19: ARGOS Startup Checklist Path Verification (step 54)
Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident
List of Tables
Table 1: ARGOS Evolution of Mechanical System
Table 2: Listing of Major Components within Focus Area
Table 3: Trick Simulation Source Code Files
Table 4: Fault Detection for the Hazard of Un-commanded Motion
Table 5: Test Parameters Recorded by the ARGOS Trick Simulation
1 Executive Summary
On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped approximately 12 to 18 inches (30.5 to 45.7 cm) in the vertical (-Z) direction. Although the test subject did not suffer significant injuries, the potential for a serious injury was present. Serious injury can occur at shorter distances as the test subject has less time to react. Based on the length of cable released, the unintended drop could have been as much as 4 to 5 feet (1.2 to 1.5 m). Slight damage to the test structure and a handrail mock-up was also sustained. The Software, Robotics and Simulation Division (ER) initiated an internal investigation, which included running two tests on the hardware with no test subject, interviews with test participants, and a preliminary report. A close call was filed to record the incident.
On January 21, 2013, an Engineering Investigation Board was convened by the Director of Engineering to investigate the close call and identify the causes, and any contributing factors relating to the close call. The team was also charged with developing recommendations to prevent a similar incident.
The Investigation Board found that the incident was most likely caused by partial gearbox binding/jamming causing an undesired motor controller response. The motor controller is COTS hardware, and the vendor provided little to no information on how it performs its function, making it essentially a black box in the system control loop. In combination with the hardware design issues found during the investigation, the controller commanded a high velocity downward motion, resulting in the test subject free falling onto a test mockup (handrail).
2 Acknowledgments
The Board would like to acknowledge the many discussions with Larry Dungan and Paul Valle regarding the ARGOS. Without their help, answers to endless questions, and thorough knowledge of the ARGOS, the Board could not have completed the investigation.
The Board would also like to thank several consultants to the Board: Monty Carroll and Ray Morales for their invaluable expertise on control systems; Duane Pierson and Linda Shackelford, from the Institutional Review Board (IRB), for their discussions regarding human safety and hazardous environments; and Irene Piatek and Charlene Curtis for discussions on Engineering Project management work instructions.
3 Background
On January 16, 2013, during a test in the ARGOS with a test participant in a pressurized Modified Advanced Crew Escape Suit (ACES), the test participant was unintentionally dropped approximately 12 to 18 inches (30.5 to 45.7 cm) in the vertical (-Z) direction. Although the test participant fell approximately 12 to 18 inches (30.5 to 45.7 cm), 4 to 5 feet (1.2 to 1.5 m) of wire rope was driven off the drum in the vertical (-Z) direction during this event. The test began normally, and the drop occurred approximately nine minutes into the test. At the time, the participant was translating along a handrail in a horizontal position, body parallel to the ground, simulating microgravity (see cover page image). The handrail was mounted on pallets, raising it approximately 24 to 36 inches (61 to 91 cm) above the floor. The participant landed on the handrail, permanently bending it (see Figure 1).
Figure 1: Handrail Involved in the Incident
The drop of the participant was followed by a slight roll to the right, and then the gimbal mechanism fell on top of the participant's back. The entire event took approximately 0.5 seconds. The participant had minor injuries (bruising) that required no medical attention, and the test facility incurred minor damage (the bent handrail). The incident was classified as a close call. The test was terminated, and the suit and ARGOS personnel assisted the participant out of the ACES suit.
The ARGOS team initiated a preliminary investigation. Two tests were conducted, both approved by the Test Safety Officer and Software, Robotics and Simulation Division Management. With no weight on the system, the ARGOS configuration GUI was launched and the system was enabled to check the motor controller. After enabling, the system drifted downward slowly about 2 inches (5 cm) during the first two seconds. Then it suddenly moved downward rapidly for 15 inches (38 cm) before a manual emergency stop was initiated by an operator. Next, the cable drum was turned by hand and was subjectively noted to be stiffer and more difficult to rotate than normal. The ARGOS was then manually jogged, unloaded, at a motor velocity of 30 rpm, both up and down, using the ARGOS Configuration GUI in the unsuited gear ratio. Again, the ARGOS engineers noted that during the jog commands the system exhibited abnormal behavior as follows: 1. Decreased ability to hold constant velocity, 2. Sluggish acceleration, and 3. Sluggish deceleration.
During the event, all of the enabled safety stops failed to stop the test subject from impacting the handrail. If the handrail had not stopped the drop, the test subject could have been dropped as much as 4 to 5 feet (1.2 to 1.5 m).
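The drop distances quoted above can be put in terms of reaction time with a simple free-fall estimate. The sketch below is illustrative only: it assumes an unresisted fall from rest (no cable tension, no suit or harness drag), which the report does not establish, and the function name `free_fall` is a hypothetical helper.

```python
import math

G = 9.80665         # standard gravity, m/s^2
INCH_TO_M = 0.0254  # exact inch-to-metre conversion


def free_fall(height_m):
    """Return (fall time in s, impact speed in m/s) for an unresisted drop from rest.

    Assumption: pure free fall, which is an idealization of the actual event.
    """
    fall_time = math.sqrt(2.0 * height_m / G)
    impact_speed = G * fall_time
    return fall_time, impact_speed


# Drop heights discussed in the report: 12 to 18 in actual, 4 to 5 ft potential
for inches in (12, 18, 48, 60):
    t, v = free_fall(inches * INCH_TO_M)
    print(f"{inches:>2} in: fall time {t:.2f} s, impact speed {v:.1f} m/s")
```

Under these assumptions a 12 to 18 inch drop lasts roughly a quarter to a third of a second, which is consistent with the statement that a short drop leaves the subject little time to react.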
4 Investigation Board Objectives
The Board's primary objective is to gather the facts, identify the cause(s) and contributing factors relating to the ARGOS incident, and recommend appropriate actions to prevent a similar incident from occurring again. The Board comprised members from the Engineering Directorate, Safety and Mission Assurance Directorate, Mission Operations Directorate, and Crew Office (see appointment letter EA-13-001).
5 ARGOS Description
The goal of the Active Response Gravity Offload System (ARGOS), shown in Figure 2, is to develop the technology for a facility to simulate reduced gravity environments found in low Earth orbit, in proximity of asteroids, and on lunar and Martian surfaces. ARGOS is used to evaluate unsuited and suited human performance of ambulation and exploration, EVA tasks at different offloads and with different interfaces, including the use of various gimbals and harnesses. The various tasks intended to characterize human performance on ARGOS include treadmill walking, incline walking and jogging; over ground walking; jumping; exploration type EVA tasks; and other dynamic movements of the human.
The project started with an X (horizontal translation) and Z-axis (vertical translation/offload) commercial off-the-shelf (COTS) mechanical system, which was expanded to a large X and Y (horizontal translation) and Z-axis (vertical translation/offload) system. A custom Z-axis mechanical system was then designed, prototyped, and tested, and then combined with the COTS X and Y-axis mechanical system.
The evolution of ARGOS mechanical systems is shown in Table 1. The NASA Standard for Lifting Devices and Equipment, Doc #: NASA-STD-8719.9, and other industry standards (ASME B30.2, Overhead and Gantry Cranes, and ASME B30.5, Mobile and Locomotive Cranes) have been used as guidelines, but there are no Voluntary Consensus Standards that specify the design or operation of a ground-based human-rated robotic system. Testing of the system started with simple activation of the motor, progressed to static weight testing, then to utilization of a Stewart platform, and finally to a human in the loop. Performance data collected from ARGOS Generation 1 led to development of an improved Generation 2 ARGOS.
Test Configuration                                                                                       Date
COTS X and Z Human Interaction Testing                                                                   8/26/2008
COTS XYZ Human Interaction Testing                                                                       2/13/2009
Generation 1 Custom Z Non-Human Interaction Testing (Stewart 6-DOF platform gait simulator)              4/9/2009
Generation 1 Custom Z Human Interaction Testing                                                          7/24/2009
Generation 1 Custom Z Human Attached Testing                                                             9/28/2009
COTS XY, Gen 1 Custom Z Human Interaction Testing                                                        1/19/2010
COTS XY, Gen 1 Custom Z Human Attached Testing                                                           4/8/2010
Generation 2 Custom XY Human Interaction Testing                                                         6/20/2011
Generation 2 Custom Z Human Interaction Testing                                                          9/30/2011
Generation 2 Custom XYZ Human Interaction Testing                                                        11/7/2011
Generation 2 Custom XYZ Human Attached Testing                                                           3/1/2012
Table 1: ARGOS Evolution of Mechanical System
The Generation 2 ARGOS system has two different gear ratios (Unsuited and Suited). The Unsuited gear ratio provides the capability to offload up to 300 lbf (1334 N) with high dynamic capabilities. The Suited gear ratio provides the capability to offload up to 750 lbf (3336 N) with low dynamic capabilities. The system works by providing a constant force offload through an overhead motion control system. The Generation 2 ARGOS system provides a wider range of capabilities for robotic, rover, and human space flight testing. The following sections provide descriptions of the major sub-systems.
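The two rated capacities above can be cross-checked against their metric equivalents with a short script. This is a sketch, not part of the ARGOS software: `ratios_rated_for` is a hypothetical helper that only lists which ratios are rated to carry a given load, and it makes no claim about how the ARGOS team actually chooses a ratio for a test.

```python
LBF_TO_N = 4.4482216152605  # newtons per pound-force

# Rated offload capacities quoted in the report, in lbf
RATED_CAPACITY_LBF = {"Unsuited": 300.0, "Suited": 750.0}


def lbf_to_newtons(force_lbf):
    """Convert pound-force to newtons."""
    return force_lbf * LBF_TO_N


def ratios_rated_for(load_lbf):
    """List the gear ratios whose rated capacity covers the load (hypothetical helper)."""
    if load_lbf < 0:
        raise ValueError("load must be non-negative")
    return [name for name, cap in RATED_CAPACITY_LBF.items() if load_lbf <= cap]


for name, cap in RATED_CAPACITY_LBF.items():
    print(f"{name}: {cap:.0f} lbf = {lbf_to_newtons(cap):.0f} N")
```

A load above 750 lbf returns an empty list, meaning neither ratio is rated for it.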
Figure 2: ARGOS System Picture
E-stop System
An emergency shutdown can be activated by the following:
- Manual activation of the e-stop by the test team.
- Automatic activation of the e-stop by the motor controller in the event of a system fault requiring an emergency stop.
- Automatic activation of the e-stop by the limit switch system. In each direction of travel the system is equipped with two limit switches, as required by the NASA crane standard. This e-stop can only occur if the first limit switch has failed.
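The three activation paths listed above amount to a logical OR of independent triggers. The following is a minimal sketch of that logic, not the ARGOS implementation: the class and field names are hypothetical, and it assumes that the first limit switch halts motion without an e-stop while the second (backup) switch is the one wired to the e-stop, which is an interpretation of the description above.

```python
from dataclasses import dataclass


@dataclass
class EStopInputs:
    """Hypothetical snapshot of the e-stop trigger sources."""
    manual_pressed: bool = False        # test team pressed the e-stop
    controller_fault: bool = False      # motor controller reports a fault requiring e-stop
    first_limit_tripped: bool = False   # first limit switch (assumed: normal stop only)
    second_limit_tripped: bool = False  # second limit switch (assumed: triggers e-stop)


def estop_required(inputs: EStopInputs) -> bool:
    """Any single trigger source is enough to demand an emergency shutdown."""
    return (
        inputs.manual_pressed
        or inputs.controller_fault
        or inputs.second_limit_tripped
    )


def first_limit_suspected_failed(inputs: EStopInputs) -> bool:
    """The backup switch tripping without the first one suggests the first failed."""
    return inputs.second_limit_tripped and not inputs.first_limit_tripped


print(estop_required(EStopInputs(first_limit_tripped=True)))   # first switch alone
print(estop_required(EStopInputs(second_limit_tripped=True)))  # backup switch reached
```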
Subject Force Input
Due to the dynamic movement capabilities of ARGOS, forces can be induced into the person whose weight is being offloaded in the event that an emergency shutdown is required. These forces will not be seen during normal operations. Analysis of the potential forces is very difficult, and a very conservative worst-case analysis indicates forces could reach approximately 2698 lbf (12 kN). The ARGOS team worked with human performance experts to determine the force levels of a world-class athlete jumping upward, with the system e-stop activation occurring at the worst time, just after leaving the ground. The probability of this is very small, and most people or systems could not achieve the required kinetic energy. However, this case was considered and the hazard controlled. Figure 3 illustrates the components in the lifting path with the exception of the gimbal assembly.
Figure 3: Inline Lifting Components
The OSHA limit for fall protection forces at the hook attachment point is 1800 lbf (8 kN) (OSHA 29 CFR Parts 1910 and 1926). To prevent these forces from transferring into the human, a Yates shock absorber (shown in Figure 4), a COTS product utilized in climbing fall protection, is installed in line with the lifting cable. The Yates part number is 602. The shock absorbers deploy when forces exceed 450 lbf (2 kN). The 450 lbf (2 kN) value is based on the manufacturer's design and data, which were confirmed with deployment tests. The forces into the human or robot would not exceed 450 lbf (2 kN), which is one-fourth of the force allowed by OSHA. Over the past four years of testing there has not been a deployment of these devices during human testing.
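The margin described above can be checked with a few lines of arithmetic. The sketch below treats the shock absorber as an ideal force limiter that caps the transmitted force at its deployment threshold; that idealization and the helper name `transmitted_force_lbf` are assumptions made here for illustration, not a model taken from the manufacturer or the report.

```python
LBF_TO_N = 4.4482216152605  # newtons per pound-force

OSHA_LIMIT_LBF = 1800.0   # fall protection limit quoted in the report
DEPLOY_LBF = 450.0        # shock absorber deployment threshold quoted in the report
WORST_CASE_LBF = 2698.0   # conservative worst-case force quoted in the report


def transmitted_force_lbf(applied_lbf, deploy_lbf=DEPLOY_LBF):
    """Idealized limiter: force passed to the subject never exceeds the threshold."""
    return min(applied_lbf, deploy_lbf)


fraction_of_limit = DEPLOY_LBF / OSHA_LIMIT_LBF
capped = transmitted_force_lbf(WORST_CASE_LBF)

print(f"Deployment threshold: {DEPLOY_LBF * LBF_TO_N / 1000:.1f} kN")
print(f"Fraction of OSHA limit: {fraction_of_limit:.2f}")
print(f"Worst case after limiting: {capped:.0f} lbf")
```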
Figure 4: Yates Shock Absorber
Series Elastic Actuator (SEA)
A series elastic actuator (SEA) provides spring and damping in the load path. SEAs add a spring with a known spring constant in series with manipulators to increase compliance and decrease natural frequency. This spongier manipulator results in better force control, allowing improved tuning of the system and increased stability. A COTS product from Festo Inc., shown in Figure 5, is utilized. This product is actually a pneumatic muscle being used in a constant pressure application. This device was evaluated and determined to not be a pressure system. The Festo muscle is used in the load path with a load rated choker in parallel. There are two Festo muscle lengths that can be utilized, and any combination may be placed above and below the load cell. ARGOS currently uses two Festo muscles in line with the load cell (one above and one below).
Figure 5: Spring/Damper Festo Muscle
Load Cell
An STI load cell, shown in Figure 6, with an amplified output provides the force measurement. The cable is double shielded and the electronics are housed in a metal box to decrease electromagnetic interference. A programmable anti-aliasing filter is utilized as a low pass filter to eliminate aliasing issues between the load cell and the A/D converter. The force measurement is sampled every millisecond for input to the control logic, which adjusts the output velocity of the motor as needed to maintain the desired off-load force throughout load disturbances.
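The force-to-velocity relationship described above can be illustrated with a very small feedback sketch. The report does not give the actual control law, so everything below other than the 1 ms sample period is an assumption: a proportional law, the gain `kp`, the velocity limit `v_max`, and the sign convention (positive velocity reels cable in) are all hypothetical.

```python
SAMPLE_PERIOD_S = 0.001  # force is sampled every millisecond (from the report)


def velocity_command(desired_force_lbf, measured_force_lbf, kp=0.01, v_max=1.0):
    """Proportional force feedback producing a clamped hoist velocity command.

    Assumed convention: positive output reels cable in (up), negative pays out (down).
    If cable tension is below the desired offload, reel in; if above, pay out.
    kp and v_max are illustrative values, not ARGOS parameters.
    """
    error = desired_force_lbf - measured_force_lbf
    command = kp * error
    return max(-v_max, min(v_max, command))


# One hypothetical control tick per force sample
samples = [200.0, 190.0, 150.0, 1000.0]
for step, measured in enumerate(samples):
    cmd = velocity_command(desired_force_lbf=200.0, measured_force_lbf=measured)
    print(f"t = {step * SAMPLE_PERIOD_S:.3f} s, measured {measured:.0f} lbf, command {cmd:+.2f}")
```

The clamp on the output reflects a common design choice for a loop of this kind: a bad force reading should not translate directly into an unbounded velocity demand.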
Figure 6: STI Load Cell
Gimbal
The Versatile Neutral Capability Horizontal Interface (VNCHI), shown in Figure 7, is attached to ARGOS via the Festo muscle and to the suited subject. Other gimbals and harness setups are available/utilized depending upon test objectives.
The VNCHI gimbal design is intended to connect a human test participant to the ARGOS in the horizontal position for microgravity simulation. The intent of the VNCHI gimbal assembly is to have a system that provides roll, pitch, and yaw rotations about the test participant's center of gravity (CG) while connected to ARGOS in the horizontal position. The gimbal attaches rigidly to the test participant's hang-gliding harness, in which the participant lies securely. There are adjustments to align the participant's CG with the lifting path, so the CG is always centered under the ARGOS cable. The gimbal consists of custom Aluminum 6061-T6 and 15-5 PH Stainless Steel parts with COTS bearings and fasteners.
Figure 7: VNCHI Gimbal Assembly
Emergency Egress
In the event of a power outage or system failure that prevents the function of ARGOS the test participant will be removed from the system by a rolling stair case ladder. If the treadmill is being used at this time, a small stair case ladder will be placed on the treadmill deck and the participant will walk down the ladder. For a power outage with a robotic system the load will be treated as a suspended load and removed after power has returned to the facility.
In the event the test participant becomes injured and is unable to walk down the stair case ladder, a Sky Genie variable descent device will be deployed to lower the person to the ground. The Sky Genie was used by the Space Shuttle program for crew member emergency egress from the orbiter. It is shown below in Figure 8. Prior to each use the Sky Genie hardware, rope, and cables are inspected for cuts, frays, broken strands, or other visible damage. The rope is changed out after two years of use and has a shelf life of 5 years. The attachment point onto the z-axis is rated for a 4945 lbf (22 kN) load as required by OSHA and the vendor documentation. The Sky Genie is attached to the z-axis and lifting path by locking carabiners. The Sky Genie is a controlled descent device and is not intended for use as a fall protection system.
Figure 8: Genie Assembly without Rope
For testing with individuals in space suits or other loads where the preference may be to remove the load with the man basket instead of the Sky Genie, a 4 x 8 feet (1.2 x 2.4 m) COTS man lift basket attached to the fork lift is used to lower the load to the ground. Personnel riding in the man basket are required to wear fall protection equipment. When required, this equipment and a certified operator must be present in the ARGOS area during the testing.
Controller
See Controller System, Section 6.5
Electronics
See Electronics System, Section 6.6
Software
See Software, Section 6.7
Mockups
In the ARGOS test area several floor mockups are used to simulate space station hand rails, bolt torquing, different rock surfaces and interactions. These mockups are moved in/out of the test area as needed. These mockups do include rocks and the hazards associated with handling rocks. The use of hand tools and battery powered drills are part of the tasks conducted with these mockups.
6 Investigation
6.1 Interviews
Limited interviews were conducted, as witness statements were taken by the ARGOS team immediately after the close call. Two interviews were conducted. The first interview was with the subject crew member in the close call. This test run was the test subject's first experience in the ARGOS, so he had no basis for comparison as to how the system behaved. He also stated that since he was in a modified ACES suit, with headphones on, he was insulated both physically and from external noises. He said that although he was dropped 12 to 18 inches (30.5 to 45.7 cm) onto the handrail, and the harness fell on top of him, he was not injured. He did experience a fairly hard impact from the face plate of the helmet, which hit his jaw. He also stated that it was difficult to tell whether resistance came from the suit (pressurized at 4.3 psi (29,650 N/m²)) or from ARGOS when translating. Just prior to the incident, he was translating along the handrail, using both pull and push, but was not commanding a downward motion.
The second interview was with the Safety and Test Operations Division subject matter expert on the lifting requirements in NASA Standard 8719.9. Most of the requirements in this document were believed to have been met by the ARGOS team, but there are some exceptions, specifically with the control system design and the limit switch configurations. The Board members and the subject matter expert did agree that ARGOS has unique performance requirements and that Chapter 4 describes the closest-fit lifting system in NASA Standard 8719.9 in terms of providing guidance to the design team.
6.2 Mechanical System
For the purposes of this investigation, only the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) will be discussed. See Figure 9.
The Heavy Lift Assembly is a NASA-designed electromechanical system whose basic function is to raise and lower a suspended object or human in response to commands issued from a force feedback control system. The object is suspended via a Hoist Cable wrapped around a spiral-cut Drum which can rotate and translate. The rotation of the Drum provides the change in object elevation, while the translation (synchronized to the spiral lead) maintains a constant cable exit point and prevents cable layering. The assembly contains redundant fail-safe brakes and an integral servomotor brake that will engage to prevent Drum rotation when power is removed.
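The relationship between Drum rotation, cable travel, and Drum translation described above can be sketched with basic geometry. The report does not state the drum diameter or the spiral lead, so the values used below are hypothetical placeholders, and the cable length per revolution is approximated as one circumference (the small helix angle of the groove is ignored).

```python
import math


def drum_motion(revolutions, drum_diameter_in, spiral_lead_in):
    """Return (cable length paid out, drum axial translation), both in inches.

    Assumptions: one wrap is approximated by the drum circumference, and the
    drum translates exactly one spiral lead per revolution so the cable exit
    point stays fixed.
    """
    cable_in = revolutions * math.pi * drum_diameter_in
    translation_in = revolutions * spiral_lead_in
    return cable_in, translation_in


def revolutions_for_cable(cable_in, drum_diameter_in):
    """Drum revolutions needed to pay out a given cable length."""
    return cable_in / (math.pi * drum_diameter_in)


# Hypothetical geometry, for illustration only
DIAMETER_IN = 8.0
LEAD_IN = 0.25

revs = revolutions_for_cable(48.0, DIAMETER_IN)  # 4 ft of cable
cable, shift = drum_motion(revs, DIAMETER_IN, LEAD_IN)
print(f"{revs:.2f} revolutions, {cable:.1f} in of cable, {shift:.2f} in of drum translation")
```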
Connected in series to the Drum, is a constantly meshed two-speed transmission. The transmission makes use of helical cut gear sets to reduce vibration and driveline noise so that disturbance inputs to the force feedback control system are minimized. The transmission contains two manually selectable gear ratios:
1. Unsuited Gear Ratio: This ratio is used for objects whose weight is less than 300 lbf (1334 N)
2. Suited Gear Ratio: This ratio is used for objects whose weight is less than 750 lbf (3336 N)
**The use of Suited/Unsuited does not describe the configuration of the test object.
The gear ratios have no synchronization and require complete offload before selection. The selection mechanism is comprised of a Shift Fork connected to a Clutch Plate with anti-friction nylon pads. The Shift Fork moves the Clutch Plate between the desired gear ratios by use of a spline drive, driven externally and manually by a Gear Selector Knob. Positive indication of transmission engagement is accomplished visually by locking the Gear Selector Knob into position and electronically by end of travel limit switches.
Connected to the transmission is an AC servomotor manufactured by Kollmorgen, driven by an off-the-shelf motor controller and commanded by a NASA-designed control system. The control system receives object position data from an absolute encoder geared off the Drum's rotation shaft, an integral AC motor encoder, and two Drum end-of-travel limit switches.
Figure 9: ARGOS Z-Axis Heavy Lift Assembly Overview
6.3 Hardware Inspection
On January 28, 2013 the ARGOS Heavy Lift Z-Axis Assembly (ARGOSZAE500) was removed from the Heavy Lift X-Axis Assembly (ARGOSSTE502) and placed on a disassembly bench in NASA JSC Building 9. The hardware inspection team was comprised of all Board members and representatives from the Software, Robotics, and Simulation Division (ER). The goal of the inspection was to evaluate the hardware for any signs of binding, seizing or jamming using the following approach:
- Prior to disassembly, measure the system's running torque under no load
- Perform a complete visual inspection of all accessible rotating parts
- Develop a focus area, comprised of major components most likely to cause mechanical failures
- Disassemble the items within the focus area incrementally to allow for visual inspection and photography
Figure 10 below shows an exploded view of the Heavy Lift Assembly and identifies the focus area and its major contributors. Each major component is identified using an item number for reference in subsequent discussions.
Figure 10: ARGOS Z-Axis Heavy Lift Assembly Exploded View
6.3.1 Running Torque Measurements
Initial inspection of the hardware using external torque measurements was performed with the unit intact. Using a calibrated dial-type torque wrench, both breakaway and running torque measurements were taken on the output shaft (Drum rotation axis) for various positions of the Gear Selector Knob (see Figure 11). Measurements were performed by Board member Joe Anderson, with care taken to minimize inertial loading onto the measurement device.
Figure 11: Torque Measurement Site
The position of the Gear Selector Knob was varied between different states to evaluate drag from the selector mechanism's rigging. Measurements for each knob position were repeated a minimum of three times and the averages are presented in Figure 12. The results show the lowest amount of torque was achieved in the neutral position (fewest rotating components) and that the Suited and Unsuited locked gear selections resisted with approximately 30 in-lbf (3.4 N-m). A noticeable change in torque measurement was seen when the Unsuited selection was toggled between locked and unlocked. The cause was a rigging method that allowed the internal Shift Fork (Item 11) to be preloaded against the rotating Clutch Plate (Item 5) such that frictional drag was introduced into the gear train. Other than the friction effect noted, no anomalies were discovered and the gear train rotated smoothly under no load.
Figure 12: Torque Measurements for Various Knob Positions
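The averaging of repeated readings and the conversion from inch-pounds-force to newton-metres used for the torque results can be sketched as follows. The function names are hypothetical, and any readings passed in are illustrative only; the individual measurements are not given in the text.

```python
from statistics import mean

# 1 in = 0.0254 m and 1 lbf = 4.4482216152605 N, both exact by definition.
IN_LBF_TO_N_M = 0.0254 * 4.4482216152605


def average_torque_in_lbf(readings):
    """Average repeated torque readings (in-lbf) for one knob position.

    The report states each position was measured at least three times,
    so fewer than three readings is rejected here.
    """
    if len(readings) < 3:
        raise ValueError("at least three readings are required")
    return mean(readings)


def in_lbf_to_n_m(torque_in_lbf: float) -> float:
    """Convert a torque from inch-pounds-force to newton-metres."""
    return torque_in_lbf * IN_LBF_TO_N_M
```

Converting 30 in-lbf with `in_lbf_to_n_m` gives roughly 3.39 N-m, which rounds to the 3.4 N-m quoted above.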
6.3.2 Incremental Disassembly and Sample Collection
Following the torque measurements, the Board tasked the ER design team (Paul Valle and Dian Poncia) to start disassembly. During the course of the disassembly, an incremental process of component removal followed by visual inspection and material sampling was used. The following collection of images (Sites 1 through 10) shows the areas of the gear box that were noted as critical inspection points and where specific material samples were taken. See Section 8.2 for a detailed chemical analysis of the collected samples. Refer to Figure 10 for item number references.
Inspection Site 1: This site contained excess lubricant and particulates on the Clutch Plate (Item 5). This area was of particular interest due to the increased running torque recorded during the pre-disassembly torque tests.
Inspection Site 2: This site was used to obtain a fresh grease sample for use in setting a baseline for subsequent materials evaluation.
Inspection Site 3: This site contained additional particulate debris on the Clutch Plate (Item 5). The Clutch Plate area is of particular interest as it is used to transmit motor loads to the two available gear sets. Due to the close proximity of rotating components and their inherent misalignments, the probability for mechanical interference and debris generation is increased in this area.
Inspection Site 4: This site contained grease and residue from the interaction of the Output Gear (Item 1) and the Unsuited Gear (Item 2).
Inspection Site 5: This site contained grease and metallic debris caused by unintended contact between the Suited Gear (Item 4) and the Snap Ring (Item 10).
Inspection Site 6: This site contained grease and particulates from the interaction of the Suited Gear (Item 4) and its adjacent Thrust Washer (Item 7).
Inspection Site 7: This site contained grease and metallic particles generated from dithering action between the Shifting Shaft (Item 6), its drive gear and a closeout snap ring. By design these items are keyed to permit torque transmission; however, excess clearance and hardness mismatch led to galling and wear. The design calls for a spiral retaining ring to be installed in the indicated position, but the actual hardware had an open snap ring.
Inspection Site 8: This site contained a small piece of plastic debris (Delrin) located on the RH Torque Spline (Item 12). This debris was most likely dislodged from the spline nut located on the Shift Fork (Item 11). Inspection of the nut shows signs of wear, but no significant failures.
Inspection Site 9: This site contained a piece of plastic debris (PVC and Kapton) located on the bottom of the gearbox housing. The debris generation site is unknown and the debris is not seen as an incident contributor.
Inspection Site 10: This site contained metallic debris generated by the interaction of the Hoist Cable (Item 13) and the spiral cut drum. Post inspection of the drum and cable showed no signs of detrimental wear or erosion.
6.3.3 Detailed Inspection of Major Components
After disassembly, the major components from the focus area were sent to the Structural Engineering Division (ES) for a closer examination:
Table 2: Listing of Major Components within Focus Area
As mentioned earlier, items not listed in the table above, such as the Drum, Linear Guides, Motor, and Radial Ball Bearings, were deemed non-contributors to any gearbox faults. The ER division was left in control of the non-listed items; however, it was asked not to perform any side investigations. Presented below is a summary of the major findings from the examination of the items listed above. See Section 8.3 for the complete listing of findings. Refer to Figure 10 for item number references.
Item 1, Output Shaft 36T Gear, ARGOSZAD471: The face and outer teeth edges of the Output Gear show significant signs of wearing and chipping due to unintentional contact with the Unsuited Gear's Dog Plate (Item 2).
Item 2, 18T Gear Assembly, ARGOSZAD448 (Unsuited Gear): The face and outer edges of the Unsuited Gear's Dog Plate showed signs of unintentional contact with the Output Gear (Item 1).
Item 3, Rush Gear 36T, ARGOSZAD455: The face and outer teeth edges of the Rush Gear 36T show significant signs of wearing and chipping due to unintentional contact with the Suited Gear's Dog Plate (Item 4). Furthermore, the gear's shaft had 0.03 in. (0.076 cm) of axial free play, further increasing the contact potential.
Item 4, 15T Gear Assembly, ARGOSZAD446 (Suited Gear): The face and outer edges of the Suited Gear's Dog Plate showed signs of unintentional contact with the Rush Gear 36T (Item 3).
Item 5, Clutch Plate, ARGOSZAD450: The Clutch Plate's annular sector shaped cutouts (6X) show signs of uneven loading. Load contact patterns generated by the Dog Plate Teeth are located on the radial face, the inner diameter surface and the outer diameter surface; ideally, all six radial surfaces would be equally loaded. Uneven loading causes overturning moment loading on both the Unsuited (Item 2) and Suited (Item 4) Gears. Unaccounted-for moment loading reduces needle bearing life and causes misalignments leading to the mechanical interferences seen on the Unsuited and Suited Gears (Items 2 & 4), the Output Gear (Item 1) and the Rush Gear 36T (Item 3).
Item 6, Shifting Shaft, ARGOSZAD442: The Shifting Shaft shows signs of the following:
- Uneven loading from the Unsuited (Item 2) and the Suited (Item 4) Gear Needle Bearings due to incompatible diameter sizing
- Surface Brinelling due to needle bearing edge loading
- Surface wear due to inadequate surface hardness
Items 7 and 8, Thrust Washers, 7421K26 & 7421K29: The Thrust Washers used to isolate the Unsuited (Item 2) and the Suited (Item 4) Gears from the Shifting Shaft (Item 6) experienced wear from exposed Dog Plate fasteners.
Item 9, Key, ARGOSZAD494: The Key is used to anti-rotate the Shifting Shaft (Item 6) with respect to its drive gear. The key was hand fit during assembly to a length that allowed it to become lodged under the Unsuited Gear (Item 2). The interference is not a contributor to the incident, since no relative motion occurs between the Shifting Shaft and Unsuited Gear during Unsuited Gear operations. The interference will only be problematic for Suited Gear operations.
Item 10, Snap Ring, VS-100: Excessive clearance between the Suited Gear (Item 4) needle bearings and the Shifting Shaft (Item 6) caused the snap ring to be side loaded with relative motion.
Item 11, Shift Fork, ARGOSZAD465: The Shift Fork contains anti-friction pads (nylon) to engage the Clutch Plate (Item 5). During pre-disassembly running torque measurements, it was noted that the Shift Fork was preloaded into the Clutch Plate during Unsuited operations. The effect of this preload is apparent when examining the nylon pad wear patterns.
Item 12, RH Torq Spline, ARGOSZAD467: The Torq Spline, which is used to drive the Shift Fork (Item 11) between the Unsuited (Item 2) and the Suited (Item 4) gear selections, showed no signs of failure or wear. Upon examination of the Spline mount design, an unintentional clamp up at Location A as well as an interference at Location B are possible.
Item 13, Hoist Cable, AI 4FZC: The Hoist Cable, which interfaces with the Drum and the test participant, showed no signs of failure or wear.
6.4 Fault Tree
The Board was not chartered to create an independent fault tree for this incident. However, the Board did review the fault tree that the ARGOS Project generated as shown in Figure 13.
Rapid Descent of Crewmember in ACES space suit while testing in micro-g
Crew member impacted hand rail
Caused by
Output drum could not rotate
Manual turning of the drive determined that higher forces than expected were required
Shift fork is misaligned
Other unknown or new failure mode
Stop: Visual inspection of gear box, damaged metal finish
Incorrect Software
Stop: System software was verified correct during startup and verified again after incident. (screen shot available)
Trick Software Failure
Stop: Software froze correctly, data indicates it continued commanding until frozen. After event software was restarted and operated nominally
ARGOS Came Out of Gear
Stop: After event shift knob was still locked in place. Microswitches did not indicate an out of gear condition
Out of gear check between encoders was not activated
CAN network communication failed
Stop: NODE Guard error checking did not detect an error. All data is correct
Z-axis electronics box failure
Stop: All data is correct. Load cell and encoder data was correct when checked after the event
Output Shaft Encoder Failure
Stop: All data is correct. Encoder data was correct when checked after the event
Load Cell Failure
Stop: All data is correct. Load cell data was correct when checked after the event
Power Outage or Sag
No F16 was received on motor controller and data indicate motor operation
Safety System Failure
ARGOS safety system performed as designed during the event
Shutdown of the motor controller output stage
Data indicates the motor controller was active during the free fall
Failure of the ARGOS Brakes
ARGOS brakes were not activated during the free fall. They did lock when commanded
Position data indicates no motion in the system
Motor controller increased current to the motor
Binding of Gear Box Came Free
Feedback from the motor encoder indicated no motion
Based on data
Could motor controller be tuned differently to prevent?
Motor controller operated as designed/programmed
Could trick software be modified
High Friction, binding, burr, or failure of shift fork pads on shift fork to clutch interface
Inspection of gear box required
Motor commanded downward by software outside of the control loop
Z-axis control loop went unstable
Stop: Software froze correctly, data indicates it continued commanding until frozen. After event software was restarted and operated nominally
Electronics failed due to EMI
Motor controller and motor failed due to EMI
Stop: Software froze correctly, data indicates it continued commanding until frozen. After event software was restarted and operated nominally
Electronics had been tested against known EMI sources in the building and a custom load cell was developed to prevent EMI interference. EMI filters are present on all power input lines.
Electronics had been tested against known EMI sources in the building
Faulty Cables
No motor controller faults were present indicating cable failure. No CAN bus failure was present. Data transmission was correct after event. Motor powered properly after event
Figure 13: ARGOS Project Created Fault Tree
The following is a list of general findings from the Project's fault tree:
- The fault tree correctly identified the fault, Rapid Descent of Crewmember in ACES space suit while testing micro-g
- There are 14 level-1 causes in the tree
- Of the 14 level-1 causes, the project decided to only work on 2 of the 14 paths which are related to binding
- Our interviews observed that the project is biased to binding being the causal path
- The project needs some expert facilitation with the development of a fault tree and associated root cause analysis
6.5 Control System
The Board was chartered to review the ARGOS z-axis control system and controller to determine if the incident resulted from a controller failure or non-modeled event. Findings and recommendations are to be reported; however, the Board is not chartered to resolve issues nor design a controller. To satisfy these goals the Board met with the ARGOS control engineers and researched pertinent documentation. The investigation results follow.
In summary, the Board concludes that there is insufficient data to state that the controller response was erroneous or that the controller was unstable. A major contributing factor is that the control logic for the motor controller/motor is proprietary; therefore, what was occurring in this unit during the incident could not be ascertained. Also, no control system simulation was developed, so off-nominal conditions such as binding and their effects could not be analyzed. Finally, the recorded test data was insufficient: it did not include the outputs required to characterize the control response. In the absence of sufficient testing, modeling and vendor information, the rationale for a rapid downward controller command is indeterminate, as discussed in this section.
Investigation of the ARGOS z-axis control system and controller is based on the criteria of meeting design, development, testing and evaluation (DDT&E) processes. The DDT&E process elements include: 1) a detailed block diagram of the control system; 2) simulations for time domain and stability analysis; 3) defined performance and stability requirements; 4) a test matrix to analyze stability and performance and to verify requirements; 5) documentation of all work.
After meeting with the ARGOS control engineers, it was determined that there is no dedicated control systems document, no detailed block diagram, and no simulation of the control system. A high level description of the control system is presented in document SRSD-11-016, Failure Modes and Effects Analysis (FMEA) Active Response Gravity Offload System Generation 2. The control system (Figure 14) consists of a proportional/derivative (PD) outer loop, an inner loop consisting of the motor controller and motor using proportional-integral-derivative (PID) logic, and a load cell registering the force on the cable. The inner loop is a black box: the PID controller is proprietary, so there is little insight into its makeup. Additional elements include the gearbox, cable drum, encoders, A/D converters, saturation limiters, hysteresis, latency, and converters.
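To make the outer-loop description concrete, the following is a minimal sketch of a PD force loop with a saturation limiter on its output. It is not the ARGOS implementation: the class name, gains, sign convention (positive command reels cable in) and limit are all assumptions, and the proprietary inner PID loop is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class PDForceLoop:
    """Hypothetical PD outer loop: force error in, velocity command out."""
    kp: float      # proportional gain (assumed value)
    kd: float      # derivative gain (assumed value)
    dt: float      # control period in seconds
    limit: float   # saturation limit on the velocity command
    prev_error: float = field(default=0.0, init=False)

    def step(self, target_force: float, measured_force: float) -> float:
        """Compute one saturated velocity command from the load cell reading."""
        error = target_force - measured_force
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        command = self.kp * error + self.kd * derivative
        # Saturation limiter: clamp the command to +/- limit.
        return max(-self.limit, min(self.limit, command))
```

For example, `PDForceLoop(kp=0.5, kd=0.0, dt=0.001, limit=1.0).step(100.0, 99.0)` returns 0.5, while a much larger force error is clamped to the limit. Once the command saturates, the loop no longer behaves linearly.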
Figure 14: High Level Control Loop
A detailed block diagram of the control system is a critical early design step and is required for analysis prior to human rating the hardware. This should be done with or without human rating. The block diagram defines all critical control loop parameters needed to complete the design of a system. Without the block diagram the control loop parameters cannot be determined correctly. Even with a black box in the loop it is possible to characterize the system (at least to prove stability) with testing. It can then be determined if the safety systems are fast enough to protect the test subject and if the bus speed is sufficient to provide communications between the blocks without causing delays. During discussions with the ARGOS control engineers it was learned that the control system was tuned by lowering the gain on the inner loop (black box) and adjusting the outer loop gains until the desired performance was attained. A concern is that, since the gain on the inner loop is low, the gain on the outer loop has to be higher to drive the controller during rapid changes in speed. This can lead to saturating the amplifier on the previous stage, resulting in a nonlinear response. To fully characterize the control system, knowledge of the inner loop PID is required or, at a minimum, construction of a transfer function. The ARGOS team stated they tried but could not develop a model due to non-linear response. It is the Board's recommendation that the ARGOS team contact control system engineers in other Engineering Divisions (Aeroscience and Flight Mechanics and Structural Engineering Divisions) in an effort to develop a model.
An ARGOS simulation of the control system is required to conduct performance analysis with off nominal conditions and failures, Monte Carlo runs, and stability and frequency response analysis. The simulation will support the ARGOS certification and provide insight into system response for off nominal conditions. If a simulation had been developed, a reenactment of the incident could have been run to observe the system behavior, pointing to the cause and a potential workaround. A simulation will require a detailed ARGOS block diagram with representative modeling of the elements. The ARGOS team decided to build the actual unit and test with it. There is a limit to what can be tested on the ARGOS unit, to the frequency of test, and to data gathering. It is the Board's recommendation that the ARGOS team develop a simulation that can characterize the system by performing frequency response analysis, stability analysis, constrained motion testing, analysis of the interaction between the horizontal and vertical controllers, and Monte Carlo runs.
It was difficult to find documentation of how the control system was developed and finally verified. In some cases detailed documentation did not exist. There was a test matrix for the ARGOS unit; however, since it was applied to the actual unit, there were limitations to what could be tested. It is uncertain whether all the control system performance and stability requirements were tested on the ARGOS unit. This again argues for a simulation. Detailed documentation for all levels of the DDT&E process should be completed. Without this documentation it is nearly impossible to reconstruct the control system and the expected performance.
Applying the above observations to the incident explains the inconclusive result. What can be backed out from the available data is that a command to move was being sent to the motor but no motion was seen (possible binding). It is assumed the inner loop PID controller (black box) continued to increase current to the motor until it broke loose. It is possible there was windup on the integrator term, so that once the binding was overcome it took the system time to respond. This is a guess since the PID block diagram is proprietary. It is also plausible that, with a binding condition, the controller could have been unstable; however, there is no way of determining this. Implementing a DDT&E process as outlined above will mitigate or reduce the possibility of this type of event occurring.
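The integrator windup hypothesis can be illustrated with a toy simulation. This is not a model of the ARGOS hardware or of the proprietary motor controller: it assumes a generic PI velocity loop driving a unit mass with viscous damping, with the load held at zero velocity until the controller output exceeds an assumed breakaway level. All parameter values are hypothetical.

```python
def simulate(breakaway: float,
             v_set: float = 0.1,
             kp: float = 2.0,
             ki: float = 20.0,
             mass: float = 1.0,
             damping: float = 1.0,
             dt: float = 0.001,
             duration: float = 6.0) -> float:
    """Return the peak velocity reached by a PI-controlled load.

    breakaway: controller output needed to free a bound load;
               0.0 means the load is never bound.
    """
    velocity = 0.0
    integral = 0.0
    bound = breakaway > 0.0
    peak = 0.0
    for _ in range(int(duration / dt)):
        error = v_set - velocity
        integral += error * dt            # integrator keeps growing while bound
        output = kp * error + ki * integral
        if bound and abs(output) >= breakaway:
            bound = False                 # binding is overcome once, then stays free
        if not bound:
            # Forward-Euler step of: mass * dv/dt = output - damping * velocity
            velocity += (output - damping * velocity) / mass * dt
        peak = max(peak, velocity)
    return peak
```

Comparing `simulate(5.0)` with `simulate(0.0)` shows a much larger velocity peak in the bound case, because the integrator has accumulated error during the stall and takes time to unwind after release.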
6.6 Electronic System
The ARGOS Z-Axis electronic system consists of 3 subsystems, as shown in Figure 15 below.
[Figure 15 block labels: Computer running Trick Simulation; Motor Controller; Motor; Gear Box; Interface to Load; Load Cell; Load Cell Amplifier, A/D Converter, and CAN Converter; A/D Converter and CAN Converter; Gear Selector Switches; Encoder; E-Stop System; Limit Switches; Brakes; Horizontal System (RS232); Power Distribution (5 VDC, 24 VDC, 3 Phase 208 VAC); CAN bus; Node Guard; Node Guard Heartbeat via CAN bus; WatchDog Timer; Fault Detection. Blocks are grouped as Control Loop, Safety System, and Data Collection.]
Figure 15: Electronics System Block Diagram
The following sections give a brief description of each of the 3 subsystems within the Z-axis system.
6.6.1 E-Stop System
This system is a dedicated safety system which monitors fault status from the Motor Controller and senses the upper and lower crane limit switches and position encoder and performs safety hazard controls (i.e. outputs to the lifting system brakes and disables the motor controller).
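The E-Stop behavior described above amounts to combinational logic: any monitored fault applies the brakes and removes the motor controller enable. The sketch below is an assumption-laden illustration of that logic only; the type and field names are hypothetical, and the real system is a dedicated hardware safety system rather than Python code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EStopInputs:
    controller_fault: bool      # discrete fault output from the motor controller
    upper_limit_tripped: bool   # upper crane limit switch
    lower_limit_tripped: bool   # lower crane limit switch
    encoder_out_of_range: bool  # position encoder outside its allowed range


@dataclass(frozen=True)
class EStopOutputs:
    apply_brakes: bool
    enable_motor_controller: bool


def evaluate_estop(inputs: EStopInputs) -> EStopOutputs:
    """Apply brakes and drop the enable line if any monitored input is tripped."""
    trip = any((
        inputs.controller_fault,
        inputs.upper_limit_tripped,
        inputs.lower_limit_tripped,
        inputs.encoder_out_of_range,
    ))
    return EStopOutputs(apply_brakes=trip, enable_motor_controller=not trip)
```

With all inputs clear, the brakes stay released and the controller stays enabled; tripping any single input reverses both outputs.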
6.6.2 Z-Axis Motor Controller
This system is a COTS system supplied by the vendor of the Z-axis motor. The system consists of 2 closed loop control systems implemented via complex electronics. One is a PID control loop and the other is a motor current control loop. The COTS system does not provide any electronic mechanism for time synchronizing the 2 internal control loops with the outside world.
The system provides major external status/control interfaces which the overall ARGOS electronic system uses as follows:
Discrete fault output and enable input
This interface is used by the E-Stop System to perform emergency stops via the brakes and also disables the motor controller via the enable input.
6.6.3 CAN Bus Interface
This interface is used by the Trick Simulation to send motor control commands and to receive available status from the motor controller.
This interface is also used to disable the motor controller when faults are detected.
NOTE: Even though the CAN Bus is a very deterministic interface (i.e. time synchronized), the motor control loops are not synchronized with the Trick simulation computer or software.
6.6.4 Computer Running Trick Simulation
The computer which runs the ARGOS software is a COTS computer. There are 5 interfaces which are used to perform the outermost ARGOS control system and data acquisition functions by the Trick Simulation software.
Z-axis Motor Controller Interface
This is the CAN Bus interface as described above in the Motor Controller section.
Z-axis Load Cell Interface
This interface is accomplished via a CAN Bus enabled A/D converter and a load cell amplifier.
Z-axis Gear Selection Switches Position Interface
This interface is accomplished via a CAN Bus enabled A/D converter to read the position of the gear selector.
Z-axis Drum Encoder Interface
The ARGOS system has a position encoder separate from the one internal to the Z-axis Motor Controller. This interface is also a CAN Bus.
X-axis and Y-axis Horizontal System Interface
This computer also interfaces to the X-axis and Y-axis control systems via asynchronous RS-232 digital interfaces.
The next section on software will address any computing resource limitations for this computer.
There were no electronic system design deficiencies found by the Board. However, the lack of time synchronization between the two internal control loops of the Z-axis Motor Controller and the overall Trick Simulation control loop does present a challenge to the overall control system modeling effort. Please see the Control System section for related control system modeling findings.
6.7 Software
6.7.1 Background
The case of a software fault causing the motor to unintentionally drive down at maximum velocity is investigated in this section of the report.
The ARGOS Software is under development by NASA using the Trick Simulation environment to provide force feedback control system functionality as well as certain system safety parameters required to operate the ARGOS. This software is used by the ARGOS console operator to perform system setup, operation and some of the emergency fault detection and response.
The scope of this analysis was to determine the following:
- Approved software was in use on ARGOS during the incident
- Approved software followed the ARGOS configuration management plan and all modifications were approved per the plan
- Regression testing performed on ARGOS software safety functions
- Software Fault Detection Logic
- Test data recorded during the incident was representative of the system parameters being measured
- Findings and recommendations to apply to the ARGOS software development process (see Section 8.2.3)
6.7.2 ARGOS Hoist Control System Background
The overall ARGOS Hoist control system works to maintain a target offload force in the lifting cable, which results in a reduced gravity (or microgravity) simulation for the test participant. The two key components of this control system are the Trick Simulation ARGOS controller and the Kollmorgen Servostar S620 motor controller, which work in conjunction with various sensors to form the overall ARGOS control system (Figure 16). All control system calculations outside of the motor controller make up the ARGOS Controller written in the Trick Simulation Environment.
Figure 16: ARGOS Control System Block Diagram
The ARGOS controller is implemented with the NASA Trick Simulation environment running on a Linux CentOS workstation. The control computer runs a one millisecond control cycle, commanding a Kollmorgen Servostar S620 motor controller over a CAN Bus network. The Trick Simulation provides most of the system integration to read the cable tension, output drum encoder, and gear selection switches and to communicate with the S620 motor controller. Figure 16 is to be viewed as a high level loop of the controlling components in ARGOS. Multiple controllers are embedded into the ARGOS controller and motor controller. These are discussed in the controls analysis of the investigation report.
6.7.3 Software Validation
The software validation evaluates the ARGOS Controller block described in Figure 16. This software is the NASA developed Trick Simulation performing the force feedback control logic and a number of fault detection scenarios. The ARGOS software falls under the requirements of Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The plan outlines requirements for use of version control software to manage released versions of production code. All software modifications on ARGOS are approved through a Test Readiness Review (TRR) prior to any human off-load testing. The software version being run is verified on a daily checklist performed prior to each day of operations on ARGOS.
The approach to verify that approved production software was utilized on the ARGOS during the incident included evaluation of the configuration management plan steps being followed, documentation that the approved software was running, and ensuring source code modifications were approved by a TRR.
The requirements per the ARGOS CM plan (SRSD-08-005.A) include the following:
- Software changes are approved through TRR
- Software release is given a version description and control number and is managed with a software version control application
- A change request is processed by the ARGOS Configuration Control Board
An ARGOS operations daily checklist (Reference 9) ensures that the production software executable is selected when running the ARGOS Control Software.
A common Linux application was utilized to perform a difference check between the ARGOS source code prior to and after the most recent TRR that approved software modifications.
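The report does not name the Linux application used for the difference check. As an illustration of the same kind of check, the sketch below uses Python's standard `difflib` module to list the lines that changed between two versions of a source file. The function name is hypothetical and any source text passed to it is supplied by the caller, not taken from the ARGOS code.

```python
import difflib


def changed_lines(before: str, after: str, name: str = "source"):
    """Return only the added and removed lines between two file versions."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (before TRR)",
        tofile=f"{name} (after TRR)",
        lineterm="",
    )
    # Keep '+' and '-' lines but skip the '+++' and '---' file headers.
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

Given two versions of a file that differ in one line, the function returns the old line prefixed with `-` and the new line prefixed with `+`; identical inputs return an empty list. A reviewer can then compare each reported change against the modifications approved at the TRR.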
6.7.4 ARGOS Software Configuration Management
Based on interviewing the ARGOS software developer, the most recent change to the ARGOS software was the addition of the output drum encoder velocity as a biasing term in the controller to maintain the current velocity. This modification was approved by the ARGOS EMU TRR conducted on 11/26/2012. The information provided in the TRR is shown in Figure 17 from the TRR slides. This provided a baseline for checking that the ARGOS daily checklist was updated to the current software version and that this version was operating on the ARGOS Control Computer. The daily checklist (Figure 18 and Figure 19) shows that the operator confirmed the verification step. The ARGOS computer screen capture taken after the incident also confirms that the software path was opened correctly (Figure 20).
NASA Johnson Space Center, Engineering Directorate
NAME: Larry K. Dungan
DATE: November 2012
PAGE: 22
SUBJECT: Software
- The z-axis software has been updated to improve the realism of the offload simulation
- New variable allows motor velocity to influence continued motion of the system, i.e. allows the load to coast until the equal and opposite force is received
- All safety systems and controls are unchanged
- Motor velocity graph was changed from motor RPM to linear velocity
- Software has been fully tested with load
- Software has been revised and released per the ARGOS CM plan
- Procedure has been updated for new steps and revision number
Figure 17: Slide from EMU TRR Software
Figure 18: ARGOS Startup Checklist from day of incident
Figure 19: ARGOS Startup Checklist Path Verification (step 54)
Figure 20: ARGOS Trick Simulation Control Screen capture from day of incident
The source code used to build the production software was reviewed to determine if the modifications to the software were consistent with the TRR approval. This required a review of all source files to identify changes that were implemented from the previous software version. The changes being evaluated were for the addition of a velocity based variable to be included into the ARGOS Control software and a change to how the motor velocity would be displayed to the operator in terms of load linear velocity rather than motor RPM.
The files reviewed were the following:
Filename                  Date Modified   Description
ARGOSApplication.java     11/15/2012      GUI for velocity control variable
ControlApplication.java   11/15/2012      GUI for velocity control variable
ATM60.hh                  11/13/2012      Header file for external encoder
ATM60.cpp                 11/14/2012      Configure external encoder to provide velocity data
S620.cpp                  11/14/2012      Motor controller
adaptive.tv               11/19/2012
data_record.dr            11/28/2012      Variables saved to log file
input.py                  12/13/2012      Setup parameters
S_define                  11/20/2012      Main TRICK program
Table 3: Trick Simulation Source Code Files
Each of the modified files was consistent with the modifications approved by the ARGOS EMU TRR. The file, ARGOSApplication.java, included changes to allow for showing the new version number on the screen, setting a velocity gain variable Kv, and limiting the minimum and maximum values for Kv. The file, ControlApplication.java, allowed for the screen layout to include the new variable. The files, ATM60.hh and ATM60.cpp, control what information is collected from the ARGOS output drum encoder. This sensor is able to provide position and velocity over the CAN bus network, and the files were modified to configure the encoder to add its velocity output to the CAN network data being used by the Trick Simulation ARGOS controller. The function readposition() used to read the encoder position was updated to also read the velocity variable. The file, S620.cpp, was updated to increase a variable synccount used in the MessageInfo() object from 10 to 50. This changed the rate at which the S620 motor amplifier would provide actual motor RPM to the Trick Simulation. This motor velocity variable is not used in the ARGOS control algorithm and is only used for troubleshooting. The velocity variable used in the controller comes from the output drum encoder (ATM60). The file, data_record.dr, controls what the Trick Simulation environment records to a data file. The modifications all reflect the new velocity controller variable. The velocity variable from the motor controller was removed and replaced with the velocity variable from the ATM60 output drum encoder. A calculated variable that converts this velocity to linear velocity is logged to the data file. A calculated variable that converts the motor RPM command to a motor linear velocity command is logged to the data file. The velocity gain variable is logged to the data file. The rotational velocity was converted to linear velocity because the data is then more intuitive to ARGOS customers and operators when operating the system and reviewing test data.
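The rotational-to-linear conversion mentioned above follows from the cable speed at the drum surface, v = 2π r N / 60 for a drum of radius r turning at N RPM. The sketch below is illustrative only: the function names are hypothetical, and the drum radius and gear ratio are caller-supplied assumptions because the report does not state them.

```python
import math


def drum_rpm_to_linear_velocity(drum_rpm: float, drum_radius_m: float) -> float:
    """Cable linear velocity in m/s for a drum turning at drum_rpm."""
    return drum_rpm * 2.0 * math.pi * drum_radius_m / 60.0


def motor_rpm_to_linear_velocity(motor_rpm: float,
                                 gear_ratio: float,
                                 drum_radius_m: float) -> float:
    """Cable linear velocity for a motor speed, through an assumed gear ratio.

    gear_ratio is motor revolutions per drum revolution (hypothetical value).
    """
    if gear_ratio <= 0:
        raise ValueError("gear_ratio must be positive")
    return drum_rpm_to_linear_velocity(motor_rpm / gear_ratio, drum_radius_m)
```

For instance, a drum of assumed radius 0.1 m turning at 60 RPM pays out cable at 2π × 0.1 m/s, about 0.63 m/s.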
The file, input.py, is a test parameter file. This file includes ARGOS system limit parameters that are configured based on the test configuration. These parameters are modified after the TRR; they define the operational motion limits for the system in the current configuration and set fault detection thresholds. Monitored virtual physical limits include a virtual soft stop motion limit and a virtual hard limit. The soft limit commands zero velocity to the motor controller and allows the ARGOS operator to back out of the limit position without a system fault. The virtual hard limit causes the ARGOS Trick Simulation to command a motor controller disable, initiating the emergency zero velocity ramp command to approach zero velocity and throw the external brakes. Parameters that control fault detection for the load cell measurement include the magnitude of unacceptable error between the target off-load force and the measured force, along with a duration. Also, a force value below the acceptable minimum or above the acceptable maximum will fault the system and result in the brakes locking the system. These parameters were modified due to changes in the lifting path components and changes in the ARGOS system height.
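The soft and hard limit behavior described above can be sketched as follows. This is a minimal illustration and not the ARGOS Trick code: the names MotionLimits and apply_virtual_limits, the limit values, and the sign convention (positive velocity increases position) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MotionLimits:
    """Hypothetical limit set; the real values are test-specific parameters."""
    soft_low: float
    soft_high: float
    hard_low: float
    hard_high: float


def apply_virtual_limits(position, commanded_velocity, limits):
    """Return (velocity_to_send, disable_motor) for one control cycle.

    Assumes position and velocity share one axis and that a positive
    velocity increases position.
    """
    # Virtual hard limit: command a motor controller disable.
    if position <= limits.hard_low or position >= limits.hard_high:
        return 0.0, True
    # Virtual soft limit: block motion further into the limit,
    # but let the operator back out without a fault.
    if position >= limits.soft_high and commanded_velocity > 0.0:
        return 0.0, False
    if position <= limits.soft_low and commanded_velocity < 0.0:
        return 0.0, False
    return commanded_velocity, False
```

In this sketch a command that moves away from a soft limit passes through unchanged, which mirrors the stated ability to back out of the limit position without a system fault.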
The file, S_define, is the main Trick Simulation control loop. This source code was modified to include the velocity component of the control system algorithm.
To conclude, the ARGOS software configuration management was followed as defined in the approved Configuration Management (CM) Plan for the Active Response Gravity Offload System (ARGOS) (SRSD-08-005.A). The software has existed as a component of the overall ARGOS development and has not been identified as a specific software project by the Engineering Directorate. Under current NASA process at JSC, the software developed within the Engineering Directorate would be required to follow the process described by EA-WI-035 Software Project Management and Development.
6.7.5 Software Regression Testing
Per the ARGOS CM plan, SRSD-08-005.A, software regression testing is based on the requirements determined by the ARGOS project lead and approved by the Test Readiness Review. The most recent change to the ARGOS software, which added a velocity gain parameter, did not document requirements for regression testing of the safety functions that perform fault detection logic in the system. Controller stability under nominal drive conditions was evaluated with test inputs that included high velocity and impulse inputs into the force feedback system. There was no evidence of previous testing being done to evaluate over-constrained operation of the software. This determination was made based on the logic that none of the fault detection logic or system interfaces were modified.
6.7.6 Fault Detection Software Logic
The software was reviewed in collaboration with the ARGOS Controls Engineer to establish whether the close call occurred due to a failure of the existing fault detection logic in the software. The fault detection logic is listed in Table 4 below. The hazard of un-commanded motion was evaluated against the causes shown in that table. The cause that most likely resulted in the close call was not recognized when developing the
software and is likely to be a case of over-constraining the physical system such that the motor controller integration term increased motor torque until breakaway occurred and the system ran away without having a chance to recover. The processing capacity of the Trick Simulation computer is greater than the demands of the control system algorithm and fault detection. Trick has a built-in capability to monitor the control cycles and log when a control frame is delayed beyond the cycle time of the simulation. The ARGOS simulation cycle time is one millisecond. The ARGOS team has stated that a designed rate of missed frames occurs because devices on the CAN bus network periodically respond in slightly more than one millisecond, which results in three out of 1000 frames being delayed. This is due to asynchronous hardware clocks rather than processing capacity. If the processing capacity of any device in the Trick Simulation causes more frames to be missed, the Trick Simulation will report these delays. There was no identified evidence of overloaded computer capacity in the close call incident.
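The frame-delay reasoning above can be illustrated with a small post-processing sketch. It does not use or represent Trick's own overrun logging. The function names, the list of per-frame durations, and the margin factor are hypothetical. Only the one millisecond cycle time and the three-in-1000 baseline come from the text.

```python
CYCLE_TIME_S = 0.001              # one millisecond control cycle (from the report)
EXPECTED_OVERRUN_RATE = 3 / 1000  # baseline rate stated by the ARGOS team


def overrun_rate(frame_durations_s, cycle_time_s=CYCLE_TIME_S):
    """Fraction of frames whose duration exceeded the cycle time."""
    if not frame_durations_s:
        return 0.0
    late = sum(1 for d in frame_durations_s if d > cycle_time_s)
    return late / len(frame_durations_s)


def exceeds_baseline(frame_durations_s, margin=2.0):
    """True if the overrun rate is above margin times the expected baseline.

    The margin is an arbitrary illustrative choice, not an ARGOS parameter.
    """
    return overrun_rate(frame_durations_s) > margin * EXPECTED_OVERRUN_RATE
```

A log showing a rate well above the baseline would point toward a processing capacity problem, which is the condition the Board found no evidence of.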
Hazard Cause: Gear Slippage/Out of Gear/Encoder Failure
Hazard Control: Drive input/output position detection
Description: Gear slip detection based on a position data mismatch between the output drum encoder and the motor encoder.
Criteria: If the motor encoder and output encoder differ by more than 1.2 revolutions (in terms of the motor).
Result: Shutdown command is sent to motor. Brakes commanded to engage. Trick Simulation enters freeze loop. Output gear slippage message to console.
Effect on Load: Depending on the rate of the gears, the minimum drop of the test subject is 1.2 rotations of the motor. This distance increases with the time it takes for position data to arrive from the motor controller. The output drum position is measured every 4 msec.

Hazard Cause: Comes Out of Gear
Hazard Control: Gear indication switch
Description: Gear ratio selector indication switches show which gear ratio is engaged by the shift fork.
Criteria: If both switches are depressed, neither switch is depressed, or the switch opposite the expected gear ratio (set when initially shifted) is indicated.
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output gear indication message to console.
Effect on Load: Trick Simulation monitors switch positions every 0.25 seconds. If the system goes out of gear completely and the load starts to drop, the gear-slippage logic is more likely to react first because of its higher sample rate, since this gear indication logic cycles only every 0.25 seconds. If the gear is partially engaged, the switch breaks contact before the system is completely out of gear, with the dog teeth remaining fully engaged, and the gear indication switch logic would command the system to stop and engage the brakes.

Hazard Cause: Drive moves past motion allowed
Hazard Control: Virtual Soft Limit
Description: The virtual soft limit is designed to prevent a test subject from reaching a hard limit.
Criteria: Absolute output encoder information indicates soft limit position has been reached.
Result: If output velocity calculation results in a commanded velocity further into the limit, the Trick Simulation sends a zero velocity to the motor controller instead. The software will output a soft limit message to the console.
Effect on Load: This position is initialized during the ARGOS daily checklist. Positions are set to allow full motion in the vertical direction, which will not prevent an impact to the floor. The logic is based on encoder data sampled every 4 msec and will output the appropriate velocity command on the next one millisecond control cycle once data is received.

Hazard Cause: Drive moves past motion allowed
Hazard Control: Virtual Hard Limit
Description: The virtual hard limit is the first hard limit (located before the physical hard limit switch).
Criteria: Absolute output encoder information indicates hard limit position has been reached.
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output hard limit message to console.
Effect on Load: Trick Simulation logic is based on encoder data sampled at 4 millisecond intervals. The simulation freeze and brakes are commanded to engage on the next one millisecond control cycle.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable load cell force.
Criteria: Trick Simulation will detect if the raw load cell force measurement is ever less than -100 lbf (-445 N) or greater than 1000 lbf (4448 N).
Result: The Trick Simulation will send a shutdown command to the motor controller on the next one millisecond control cycle. Brakes engage. Trick enters freeze loop. Software will output the load cell disconnect message to console.
Effect on Load: The Trick Simulation logic will enter a freeze loop and send a shutdown command to the motor controller on the next one millisecond control cycle. The motor controller will ramp to zero velocity from the current velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable delta between two consecutive data points.
Criteria: If the raw load cell force changes by 125 lbf (556 N) in one millisecond (between data points).
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console.
Effect on Load: The Trick Simulation logic will take two one-millisecond control cycles to detect this fault and will command shutdown on the next one millisecond control cycle. The Trick Simulation logic will enter a freeze loop and send a shutdown command to the motor controller on the next one millisecond control cycle. The motor controller will ramp to zero velocity from the current velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Load Cell Disconnection
Description: Check for a reasonable force error over time. The values in this loop were empirically developed during human testing with the ARGOS team.
Criteria: A fast check and a slow check: if the force error exceeds 100 lbf (445 N) and remains above 100 lbf (445 N) for 300 msec, or if the force error exceeds 35 lbf (156 N) and remains above 35 lbf (156 N) for 500 msec. This loop does not run if the participant is inside of a soft limit.
Result: Shutdown command sent to motor. Brakes engage. Trick enters freeze loop. Output relevant load cell disconnect message to console.
Effect on Load: When the load cell is disconnected, the filter and analog-to-digital converter output a noisy force between 0 and 20 lbf (89 N). The effect is a low cable tension sent to the control system, and the hoist will rapidly rise for 300 msec to 500 msec prior to entering the Trick Simulation freeze loop and commanding the motor controller to ramp to zero velocity and engage the brakes.

Hazard Cause: Bad input to control system
Hazard Control: Negative Force
Description: The filtered force feeds into the proportional term of the controller. A negative force can result in undesirable behavior.
Criteria: If the filtered force is less than zero.
Result: Set the filtered force to zero.
Effect on Load: This scenario occurs when the load cell is measuring impacts (e.g., foot impact while jumping) that result in impulse measurements. This Trick Simulation logic limits the control system response but is more of a stability control than fault detection, as it does not engage the brakes or stop the simulation.

Hazard Cause: Bad input to control system
Hazard Control: High force error
Description: To maintain stability during impacts (foot strikes, jumps, etc.), cap the maximum filtered force error (feeds into proportional term).
Criteria: If the filtered force error exceeds 20 lbf (89 N).
Result: Set the filtered force error to 20 lbf (89 N).
Effect on Load: Proportional term causes under-damped ringing. Limiting this error reduces the amplitude of persistent oscillation.

Hazard Cause: Control System Failure/Software Exception
Hazard Control: Node Guard
Description: Exceptions such as floating point exceptions, memory exceptions, etc. in Trick.
Criteria: Trick has exception handling.
Result: Shutdown command sent to motor. Brakes engage.
Effect on Load: Node guard: the S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Computer Failure
Hazard Control: Node Guard
Description: Computer shuts down or software fault causes abnormal exit without executing a normal Trick shutdown routine.
Criteria: S620 motor controller has node guarding and expects a heartbeat within 100 ms.
Result: Motor controller throws n04 warning and shuts down. Brakes engage.
Effect on Load: Node guard: the S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Communications check. Break/error in CAN network or failure of Trick software CAN
Hazard Control: Node Guard
Description: S620 motor controller receives velocity commands via CAN network.
Criteria: S620 has node guarding and expects a heartbeat within 100 ms.
Result: Motor controller throws n04 warning and shuts down. Brakes engage.
Effect on Load: Node guard: the S620 motor controller has node guarding and expects a heartbeat within 100 ms. If no heartbeat, instant shutdown.

Hazard Cause: Control System Failure/Motor controller velocity control loop gains incorrect
Hazard Control: Software input variable sanitization routine
Description: Trick software checks S620 motor controller gain settings at software start.
Criteria: Check if KP=0.2 (proportional gain) and Tn=140 (integral time constant).
Result: Software will not start if settings are not correct.
Effect on Load: This is pre-operational fault logic that will prevent the system from starting.
Table 4: Fault Detection for the Hazard of Un-commanded Motion
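The three load cell disconnection checks in Table 4 can be sketched as a single per-sample monitor. This is an illustrative reading of the table, not the ARGOS implementation. The class name, the fault labels, the use of the absolute value of the force error, the ">=" comparison on the delta check, and the choice to reset the timers while inside a soft limit are all assumptions. The thresholds and durations are taken from the table.

```python
RAW_MIN_LBF = -100.0   # range check, lower bound
RAW_MAX_LBF = 1000.0   # range check, upper bound
MAX_DELTA_LBF = 125.0  # allowed change between consecutive 1 ms samples
FAST_ERR_LBF, FAST_MS = 100.0, 300  # fast force-error check
SLOW_ERR_LBF, SLOW_MS = 35.0, 500   # slow force-error check


class LoadCellMonitor:
    """Hypothetical monitor called once per one millisecond control cycle."""

    def __init__(self):
        self.prev_raw = None
        self.fast_ms = 0
        self.slow_ms = 0

    def step(self, raw_lbf, force_error_lbf, in_soft_limit=False):
        """Return a fault label ('range', 'delta', 'force_error') or None."""
        # Check 1: raw force must be inside the plausible range.
        if raw_lbf < RAW_MIN_LBF or raw_lbf > RAW_MAX_LBF:
            return "range"
        # Check 2: change between two consecutive data points.
        if self.prev_raw is not None and abs(raw_lbf - self.prev_raw) >= MAX_DELTA_LBF:
            return "delta"
        self.prev_raw = raw_lbf
        # Check 3 does not run inside a soft limit (timer reset is assumed).
        if in_soft_limit:
            self.fast_ms = self.slow_ms = 0
            return None
        err = abs(force_error_lbf)  # sign handling is an assumption
        self.fast_ms = self.fast_ms + 1 if err > FAST_ERR_LBF else 0
        self.slow_ms = self.slow_ms + 1 if err > SLOW_ERR_LBF else 0
        if self.fast_ms >= FAST_MS or self.slow_ms >= SLOW_MS:
            return "force_error"
        return None
```

In this sketch any non-None return would correspond to the table's result column: shutdown command to the motor, brakes engaged, and the simulation entering its freeze loop.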
39
6.7.7 Test Data Review
Per the ARGOS overview given by the ARGOS test team there are a number of test parameters being logged in a data file every 10msec for analysis after the ARGOS operations are complete. The parameters include:
Variable | Units | Description
Simulation Time | seconds | Time since start of simulation
Raw Load Cell Force | lbf | Tension in lifting cable
Filtered Load Cell Force | lbf | Force after nonlinear filter - fed into the control loop
Target Offload | lbf | Force control loop tries to match
Output Encoder Position | counts | Position of ATM60 absolute encoder on cable drum
Output Encoder Velocity | RPM | Filtered velocity of absolute encoder on cable drum
Linear Velocity | in/s | Calculated linear velocity of cable
Commanded Linear Velocity | in/s | Control loop commanded velocity as linear cable motion
Kpv | -- | Velocity Gain Variable
Kpf | -- | Proportional Gain Variable
Kdf | -- | Derivative Gain Variable
Table 5: Test Parameters Recorded by the ARGOS Trick Simulation
Part of the software review was to develop confidence that the recorded parameters from the ARGOS software represented the system response and all values were within the expected range and capability of the sensing hardware and software data-types.
The value for Raw Load Cell Force (lbf) measures the cable tension between the ARGOS hoist and the load. The load cell is a 1000 lbf full-scale strain gauge sensor with a high-level output of 10 Vdc full scale, digitized through a 16-bit signed CAN bus signal conditioner. The force measurement was within sensor limits throughout the incident and never appears to saturate during the event. The Output Encoder Position is provided by a Sick ATM60 absolute encoder. The encoder position is reported as an unsigned 32-bit integer and has been verified to be well within the rotational limits of the device during the close call operations. The position is initialized during system start-up to zero the start position. The system position is a signed 32-bit integer computed as the unsigned current position minus the unsigned initial position. The output encoder velocity is a 32-bit signed integer provided in units of revolutions per minute. All of the encoder variables are properly typecast in Trick to prevent overflow of the variables. Both the rotational position and rotational velocity are converted to linear inch units for purposes of the data recording file.
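The data-type handling described above can be illustrated with a short sketch. It is not the Trick source. The function names are hypothetical, and the load cell scaling assumes that the maximum positive 16-bit count (32767) corresponds to the 1000 lbf full-scale reading, which the report does not state.

```python
def relative_position(current_u32, initial_u32):
    """Signed 32-bit difference of two unsigned 32-bit encoder counts.

    The subtraction is wrapped to 32 bits first, so a rollover of the
    unsigned counter between the two readings still yields a small
    signed result.
    """
    diff = (current_u32 - initial_u32) & 0xFFFFFFFF
    return diff - 0x100000000 if diff >= 0x80000000 else diff


def counts_to_lbf(counts_i16, full_scale_lbf=1000.0, full_scale_counts=32767):
    """Convert a signed 16-bit reading to lbf (scaling is an assumption)."""
    return counts_i16 * full_scale_lbf / full_scale_counts
```

For example, relative_position(3, 0xFFFFFFFE) evaluates to 5 in this sketch, showing that a counter rollover does not produce a large spurious position jump.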
The Kollmorgen Servostar S620 motor amplifier provides motor position as an unsigned integer to the Trick Simulation, but it is not recorded. Additionally, the S620 amplifier is commanded with an RPM command. For purposes of data presentation and data recording, this value is converted to linear inches per second to provide units familiar to the ARGOS operator.
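The RPM to linear velocity conversion mentioned above follows from the drum circumference. The sketch below shows the arithmetic only. The drum diameter and gear ratio are hypothetical parameters, since the report does not give the actual ARGOS values, and the function names are illustrative.

```python
import math


def drum_rpm_to_in_per_s(drum_rpm, drum_diameter_in):
    """Cable speed in in/s for a drum turning at drum_rpm.

    One revolution pays out one circumference (pi * diameter) of cable,
    and dividing by 60 converts per-minute to per-second.
    """
    return drum_rpm * math.pi * drum_diameter_in / 60.0


def motor_rpm_to_in_per_s(motor_rpm, gear_ratio, drum_diameter_in):
    """Cable speed for a motor RPM command, assuming gear_ratio motor
    revolutions per drum revolution (an assumed definition)."""
    return drum_rpm_to_in_per_s(motor_rpm / gear_ratio, drum_diameter_in)
```

With a purely illustrative 1 inch drum, 60 RPM at the drum corresponds to one circumference per second, or about 3.14 in/s.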
The ARGOS gear selection switches are triggered by the gear selection and send a discrete value to the Trick Simulation through the CAN bus.
All of the parameters identified for use in the control algorithm and being recorded in the data file were proven to correspond to appropriate programming data-types.
6.8 Safety and Hazard Analysis
6.8.1 Safety
The ARGOS Systems Requirements Document references neither JPR 1700.1, JSC Safety and Health Handbook, nor EA-WI-023, Project Management of Government Furnished Equipment Flight Projects. ARGOS was considered a development project that could operate with flexible adherence to requirements. This culture was accepted by Safety and Mission Assurance oversight. The primary NASA safety document applied to ARGOS was NASA Standard 8719.9, Standard for Lifting Devices and Equipment. This standard is heavily referenced, and many people viewed ARGOS as a high-tech Critical Lift crane. The standard may have been the best fit, but ARGOS was not just a Critical Lift; it was also used to lift people. Although this was known at the time, no standard exists for a complex human robotic system, and restricting adherence to the best-fit standard kept the team from looking for needed requirements and Hazard Controls. The velocity at which the system operates when approaching obstacles is well outside the realm of normal lifting operations.
6.8.2 Hazard Analysis
SRSD-12-007 Hazard Analysis for Gen 2 ARGOS Facility Testing documented the hazards of the ARGOS suspension system. It evolved from its initial use to document the standalone Gen 1 ARGOS system to the current Gen 2 configuration, which eventually included humans in the loop. The ARGOS team regularly updated that facility HA to reflect changes to the system that introduced new hazards. The status of the completed HA was presented at each Test Readiness Review (TRR).
The same is true of the hazards unique to wearing the harness and/or a pressure suit, which were documented separately by the Crew and Thermal Systems in SRSD-12-008 Hazard Analysis for ARGOS for Test Participant Providing an Input into Gen 2 ARGOS. This test subject HA was focused solely on the hazards of the human physiology of being restrained in the harness and/or pressure suit at various orientations. It did not address the ARGOS system performance beyond the harness.
6.9 Engineering Processes, Roles and Responsibilities
This section discusses how ARGOS evolved and the Engineering processes, roles, and responsibilities that were observed during the investigation. It is not a technical discussion but rather a set of observations of the environment, culture, roles, and expectations.
During the period that ARGOS was initiated, EA-WI-023 was written to cover GFE flight projects; in 2012 it was revised to be much more easily used by projects at all levels. Development, research, or low-TRL projects are typically not projectized, as they are viewed within Engineering as not needing the rigor because they are more risky and are undergoing constant change as the hardware is developed. This approach allows for rapid development: a build-a-little, test-a-little philosophy to obtain quick results at low cost. ARGOS specifically was an internal Engineering project that had no program or external customer; it was initiated on a small amount of internal funds. It was not categorized as a project or facility, and did not initially involve test subjects. Safety and Mission Assurance support was included from near the start of ARGOS, with buy-in to the engineering development approach. Throughout the development, the ARGOS team researched the design and selection of the components to a very detailed level, to the very best of their ability. Safety was requested to assist in identifying the right lifting requirements. However, beyond Safety, it was noted that there was limited involvement from outside the division and organizations, and that the Institutional Review Board (IRB) was considered the oversight and external review for both human safety and engineering. Each version, addition, or change to ARGOS over the 6 to 7 years was reviewed at a Test Readiness Review (TRR) Board; in fact, 44 TRRs were found for ARGOS. In addition, 19 ER CCBs were found from 2008 to the present, and one Engineering Leadership Council topic at the Engineering Directorate, during the same period. Within ER, all TRRs are chaired by one branch chief for the entire division. That branch chief also happens to lead the branch that developed ARGOS. No reviews beyond a TRR were held. A PDR was held at about 50%; at that time some external review was provided.
There was no CDR or other review beyond the one PDR. At some point ARGOS went from a development effort to using humans in the test. The test subjects were varied and diverse, from NASA engineers to retired NASA astronauts, and even outside visitors. It also grew in capability from simulating 1/6 gravity (Lunar environment) to microgravity, and
from upright human testing to horizontal microgravity testing. All of these changes added risk and should have triggered additional safety concerns.
On March 15, 2012, there was a similar close call event involving an unintended drop of a test subject, also of 12 to 18 inches (30.5 to 45.7 cm). The root cause of this event was attributed to developmental software that was incorrectly executed for the test in place of the baseline software. In this case the motor amplifier threw an F32 (Software Failure) fault and the brakes were fully engaged to stop the fall.
From the Board's perspective, there were signs that were missed. The start of human in the test, the lack of proper outside review, lack of p