B414 Restart
Module 14: System Restarts
After completing this module, you will be able to:
List three different ways to restart the Teradata database.
Use the RESTART command.
Describe the impact of …
– Disk(s) failure
– Disk array controller(s) failure
– BYNET(s) failure
– Node failure
– AWS failure
– VPROC failure
Explain the difference between a PDE dump and a UNIX panic dump.
Types of Restarts
Scheduled Restarts
• Changing system parameters (e.g., a DBS Control parameter is updated)
• Software upgrades
• Configuration changes (addition of new AMPs and/or PEs)
Unscheduled Restarts
• Power failure (e.g., the 8/14/2003 blackout in the northeastern U.S. and parts of Canada)
• Hardware failure
• Software failure
• Accidents
Restart Processes
1. Spool cylinders are returned to the free cylinder list (unused cylinder pool).
2. Before logons are enabled, uncommitted work is rolled back.
1st Tables are re-locked for background recovery.
2nd Logons are enabled in a cold start.
Scheduled Restarts
Restart Teradata with       Use this command          Options
Command line                tpareset <comment>        -f, -x, -y, -d, -l, -Q, -P
DB Console (Supervisor)     restart tpa <comment>     cold, coldwait
vprocmanager                restart                   cold, coldwait
MultiTool (Windows 2000)    reset (via GUI choices)   GUI menu choices
Example:
# tpareset -f Change of system parameters
To see when restarts occurred during the last week, with a brief explanation of how and why each one happened:
LOGON tdpid/systemfe,service;
EXEC ALLRESTARTS (DATE - 7,);
LOGOFF;
The “tpatrace” command may also be used to see information about restarts.
# tpatrace 3 (shows last 3 restarts)
Restarting Teradata from DB Window
RESTART TPA [ , NODUMP | , DUMP = YES | , DUMP = NO ] [ , COLD | , COLDWAIT ] comment
Restart using the “tpareset” Command
Example of using the tpareset command:
# tpareset -f Change of DBSControl parameters
You are about to restart the database on the system 'u4455'
Do you wish to continue (yes/no) [no]: yes
tpareset: TPA reset submitted.
Example of using the tpatrace command:
# tpatrace
TPA Initialization Trace for Node 001-01
02/16/2004 08:25:33 -------------------- PDE starting
02/16/2004 08:25:35.06 (346) ---- PDE starting.
02/16/2004 08:25:35.07 (346) State is NOTPA/START.
:
02/16/2004 08:25:36.38 (346) State is NOTPA/NETREADY.
:
02/16/2004 08:25:47.15 (346) State is TPA/START.
:
02/16/2004 08:25:48.05 (346) State is TPA/VPROCS.
:
02/16/2004 08:25:49.57 (346) State is TPA/READY.
:
02/16/2004 08:25:49.65 (346) State is TPA/DONE.
02/16/2004 08:25:49.66 (346) Crash ceiling/count = 3/0
02/16/2004 08:25:49.66 (346) PDE started in 15 seconds.
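A trace like this can be reduced to just its state transitions. The sketch below is a POSIX shell filter that assumes each trace entry sits on its own line and that state changes use the "State is <MAJOR>/<MINOR>." wording from the sample; the function name pde_state_transitions is hypothetical and is not a Teradata utility.

```shell
# Hypothetical helper (not part of Teradata): print the PDE state recorded in
# each "State is MAJOR/MINOR." trace entry, one per line, in trace order.
# Assumes one trace entry per line, as in the tpatrace sample.
pde_state_transitions() {
    # -n suppresses normal output; the p flag prints only lines where the
    # substitution matched, so non-state entries are dropped.
    sed -n 's|.*State is \([A-Z]*/[A-Z]*\)\..*|\1|p'
}

# Usage on an MP-RAS node:
#   tpatrace | pde_state_transitions
```

The sample trace elides some entries (the ":" lines), so a live trace may list more states than the six visible above.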
Restart Messages and Information
Recovery status information is logged to numerous locations:
• Software_Event_Log
• SMP console display
• /var/adm/streams (UNIX)
SMP Console output following a tpareset:
:
Event number 33-10198-00 (severity 40, category 10)
Force a TPA restart.
:
NOTICE: fsgsync.c: PDE: A primary fsg flush started.
xcmn_err: Message Date 02/16 - Time 08:25 (mm/dd hh:mm)
:
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:49 Running DBS Version: 05.01.00.00
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:49 Running PDE Version: 05.01.00.00
:
04/02/16 08:25:50 Initializing DBS Vprocs
:
04/02/16 08:25:56 Configuration is operational
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:25:56 Starting AMP partitions
:
04/02/16 08:25:59 Voting for transaction recovery
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:26:00 Recovery session 1 contains 43 rows on AMP 00000
Event number 34-02900-00 (severity 10, category 10)
04/02/16 08:26:11 Starting PE partitions
:
04/02/16 08:26:15 Logons are enabled
Feb 16 08:26:15 Teradata DBS Gateway: [455]: error logging started
PDE States
The pdestate command can be used to check the current state of the PDE and Teradata software for a specific node.
# /usr/ntos/bin/pdestate
PDE: Parallel Database Extension state is TPA.
PDE has three major operational states (NULL, NOTPA, and TPA), each with its own substates:
– NULL: NULL/START, NULL/STOPPED, NULL/RESET, NULL
– NOTPA: NOTPA/START, NOTPA/NETCONFIG, NOTPA/NETREADY, NOTPA/RECONCILE, NOTPA
– TPA: TPA/START, TPA/VPROCS, TPA/READY, TPA/DONE, TPA
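Because every substate is written as MAJOR/MINOR, the major state can be recovered by dropping everything from the first "/". The POSIX shell sketch below does that and also pulls the state token out of a pdestate line; both function names are hypothetical, and it assumes pdestate ends its line with "state is <STATE>." as in the sample above.

```shell
# Hypothetical helpers (not Teradata commands) for working with PDE states.

# Reduce a state such as TPA/READY or NOTPA/NETCONFIG to its major state.
pde_major_state() {
    major=${1%%/*}   # drop everything from the first "/" onward
    case $major in
        NULL|NOTPA|TPA) echo "$major" ;;
        *) echo "UNKNOWN"; return 1 ;;   # not one of the three major states
    esac
}

# Extract the state token from pdestate output such as
# "PDE: Parallel Database Extension state is TPA."
pdestate_token() {
    sed -n 's|.*state is \([A-Z/]*\)\..*|\1|p'
}

# Usage on an MP-RAS node:
#   state=$(/usr/ntos/bin/pdestate | pdestate_token)
#   echo "major state: $(pde_major_state "$state")"
```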
Unscheduled Restarts
Disk Drive Failures
Scenario 1
Failure: One disk in a drive group
Result: No TPA reset
Resolution: Replace the disk; the array controllers automatically rebuild it.
Scenario 2
Failure: Two disks in the same drive group
Result:
– TPA reset (1-5 minutes)
– AMP taken offline and marked as Fatal
– Fallback tables OK
– Non-fallback tables partially available
Resolution:
– Replace the two disks
– Reformat the LUNs or volumes in the drive group
– Perform a table rebuild
– Restore non-fallback tables
Scenario 3
Failure: Two disks in two different drive groups associated with AMPs in the same cluster (two AMPs fail in one cluster)
Result: Machine halts
Resolution: Restore user DBC and tables
Unscheduled Restarts (cont.)
BYNET Failures
Scenario 1
Failure: One BYNET fails
Result:
– No TPA reset
– All traffic auto-switched to the remaining BYNET
– Impact on system performance
Resolution: Repair BYNET
Scenario 2
Failure: Both BYNETs fail
Result: Teradata halts and is not available
Resolution: Repair BYNETs
Unscheduled Restarts (cont.)
Node Failure
Scenario
Failure: Node fails (e.g., O.S. hangs, two power supplies fail, memory fails)
Result:
– TPA restart (1-5 minutes) and vprocs migrate to other nodes in the clique
– Possible O.S. reboot (3-15 minutes)
Resolution:
– Repair the node and reboot the operating system
– Restart Teradata to allow the node to rejoin the Teradata configuration
Vproc Software Failure
Scenario
Failure: AMP or PE Vproc fails
Result: TPA restart (1 - 5 minutes) and vprocs may be marked offline
Resolution: If necessary, run Scandisk, Checktable, and Rebuild utilities
AWS Failure
Scenario
Failure: AWS fails
Result: No restart of Teradata; AWS is not available to monitor/manage system
Resolution: Reboot or recover AWS
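The failure scenarios above can be collected into one lookup, which is also a handy way to check answers to review question 3. The sketch is a POSIX shell case statement; the scenario keywords are hypothetical labels made up for this example, and the outcomes only restate what this module describes.

```shell
# Hypothetical lookup: expected Teradata outcome for each failure scenario
# described in this module. Keywords are invented labels, not Teradata terms.
restart_outcome() {
    case $1 in
        one-disk)             echo "no TPA reset" ;;  # array rebuilds the disk
        two-disks-one-group)  echo "TPA reset" ;;     # AMP marked Fatal
        two-amps-one-cluster) echo "halt" ;;          # machine halts
        one-bynet)            echo "no TPA reset" ;;  # traffic uses other BYNET
        both-bynets)          echo "halt" ;;          # Teradata unavailable
        node)                 echo "TPA reset" ;;     # vprocs migrate in clique
        vproc)                echo "TPA reset" ;;     # vprocs may go offline
        aws)                  echo "no TPA reset" ;;  # only monitoring is lost
        *) echo "unknown scenario: $1" >&2; return 1 ;;
    esac
}
```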
TPA Reset – Crashdumps
[Figure: AMP memory is written to the UNIX dump device /dev/pdedump (1); a collector task then copies it into a Crashdump table (2).]
1. Selective memory and swapped pages are written to “pdedump” space.
2. As part of Teradata restart, a background collector task reads “pdedump” and writes dump information to a Crashdump table in Crashdumps database.
• If the Crashdumps database is out of perm space, the collector task outputs a warning message and retries every 60 minutes to create a crashdump table.
UNIX MP-RAS commands to check for and clear dumps in “pdedump”:
# pdedumpcheck -v (lists /dev/pdedump dumps that are present)
# fdlcsp - mode clear (clears all dumps from /dev/pdedump)
Allocating Crashdumps Space
[Figure: users owned by DBC: Sys_Calendar, SysAdmin, SystemFE, Crashdumps, SYSDBA]
Allocate approximately 150 – 200 MB of permanent space per node per crashdump.
Example: a four-node system with space for three crashdumps, at 150 MB per node per dump:
(150 MB x 4 nodes) x 3 dumps = 1800 MB without fallback
((150 MB x 4 nodes) x 3 dumps) x 2 = 3600 MB with fallback
MODIFY USER Crashdumps AS PERM = 1800E6;
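The sizing rule is plain arithmetic: MB per node per dump, times nodes, times dumps, doubled when fallback is used. A POSIX shell sketch, assuming the 150 MB low-end figure unless another value is passed; crashdump_perm_mb is a hypothetical helper name.

```shell
# Hypothetical helper: permanent space (MB) to give user Crashdumps.
# Args: nodes, number of dumps, fallback (yes|no), and optionally the
# MB per node per dump (defaults to 150, the low end of 150-200 MB).
crashdump_perm_mb() {
    nodes=$1
    dumps=$2
    fallback=$3
    per_node_mb=${4:-150}
    mb=$(( per_node_mb * nodes * dumps ))
    if [ "$fallback" = yes ]; then
        mb=$(( mb * 2 ))   # fallback stores a second copy of every row
    fi
    echo "$mb"
}

# MODIFY USER takes PERM in bytes, so N MB is written as NE6:
#   echo "MODIFY USER Crashdumps AS PERM = $(crashdump_perm_mb 4 3 no)E6;"
```

With the four-node, three-dump example this prints the same MODIFY USER statement shown above; passing 200 as the fourth argument sizes for the high end of the guideline instead.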
Crashdump table names have the form Crash_<date>_<time>_<segment #>, e.g., Crash_20040213_012519_02.
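When deciding which dumps are old enough to delete, it helps to decode the name. The POSIX shell sketch below splits a name into date, time, and segment; parse_crashdump_name is a hypothetical helper, and reading the time field as HHMMSS is an assumption that fits the sample name rather than something the module states.

```shell
# Hypothetical helper: decode Crash_<YYYYMMDD>_<HHMMSS>_<segment>.
# Assumption: the time field is HHMMSS, as the sample name suggests.
parse_crashdump_name() {
    decoded=$(printf '%s\n' "$1" | sed -n \
        's/^Crash_\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)_\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)_\([0-9][0-9]*\)$/\1-\2-\3 \4:\5:\6 segment \7/p')
    [ -n "$decoded" ] || return 1   # name did not match the expected form
    echo "$decoded"
}
```

Applied to Crash_20040213_012519_02 this prints "2004-02-13 01:25:19 segment 02"; names that do not follow the pattern return a non-zero status.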
Help USER Crashdumps;
Table/View/Macro name Kind Comment
Crash_20040213_012519_02 T PDE:05.01.00.00,TDBMS:05.01.00.00,TGTW:05.01.00.00;
TPA Dump Maintenance
Is the crashdump needed? (Contact the support center if in doubt.)
No:
– DELETE it from Crashdumps
– Optionally, delete it from the pdedump device
Yes: make it available to the support center using one of these options:
– Allow access to the system via the network
– Archive to a file and ftp it to the support center
– Use DUL and archive to tape
UNIX MP-RAS Operating System Dumps
Complete dump of system memory, including:
• PDE
• Kernel
The crash utility may be used to interpret the dump.
Review Questions
1. What is the operating system command to restart Teradata? __________________
2. What is the DB Window supervisor command to restart Teradata? __________________
3. Which of the following choices will cause a Teradata restart? __________________
A. AWS hard drive failure
B. Single drive failure in RAID 1 drive group
C. Two drive failures in same RAID 1 drive group
D. Single SMP power supply failure
E. SMP CPU failure
F. One of BYNETs fails
G. LAN connection to SMP is lost
Module 14: Review Question Answers
1. What is the operating system command to restart Teradata? tpareset
2. What is the DB Window supervisor command to restart Teradata? restart tpa
3. Which of the following choices will cause a Teradata restart? C, E