  • DISTRIBUTED DATABASE MANAGEMENT SYSTEMS

    A Practical Approach

    SAEED K. RAHIMI
    University of St. Thomas

    FRANK S. HAUG
    University of St. Thomas

    A JOHN WILEY & SONS, INC., PUBLICATION

  • All marks are the properties of their respective owners.

    IBM and DB2 are registered trademarks of IBM.

    JBoss, Red Hat, and all JBoss-based trademarks are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries.

    Linux is a registered trademark of Linus Torvalds.

    Access, ActiveX, Excel, MS, Microsoft, MS-DOS, Microsoft Windows, SQL Server, Visual Basic, Visual C#, Visual C++, Visual Studio, Windows 2000, Windows NT, Windows Server, Windows Vista, Windows XP, Windows 7, and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

    Oracle is a registered trademark of Oracle Corporation.

    Sun, Java, and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

    Sybase is a registered trademark of Sybase, Inc.

    UNIX is a registered trademark of The Open Group.

    Copyright 2010 by IEEE Computer Society. All rights reserved.

    Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

    No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

    Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

    For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

    Library of Congress Cataloging-in-Publication Data is available.

    ISBN 978-0-470-40745-5

    Printed in the United States of America.

    10 9 8 7 6 5 4 3 2 1

  • To my mother, Behjat, and my father, Mohammad: though they were not given the opportunity to attend or finish school, they did everything they could to make sure that all of their seven children obtained college degrees.
    S. K. R.

    To my mother who taught me to love reading, learning, and books; my father who taught me to love mathematics, science, and computers; and the rest of my family who put up with me while we wrote this book: this would not have been possible without you.
    F. S. H.

  • CONTENTS

    Preface

    1 Introduction

    1.1 Database Concepts
      1.1.1 Data Models
      1.1.2 Database Operations
      1.1.3 Database Management
      1.1.4 DB Clients, Servers, and Environments
    1.2 DBE Architectural Concepts
      1.2.1 Services
      1.2.2 Components and Subsystems
      1.2.3 Sites
    1.3 Archetypical DBE Architectures
      1.3.1 Required Services
      1.3.2 Basic Services
      1.3.3 Expected Services
      1.3.4 Expected Subsystems
      1.3.5 Typical DBMS Services
      1.3.6 Summary Level Diagrams
    1.4 A New Taxonomy
      1.4.1 COS Distribution and Deployment
      1.4.2 COS Closedness or Openness
      1.4.3 Schema and Data Visibility
      1.4.4 Schema and Data Control
    1.5 An Example DDBE
    1.6 A Reference DDBE Architecture
      1.6.1 DDBE Information Architecture
      1.6.2 DDBE Software Architecture

        1.6.2.1 Components of the Application Processor
        1.6.2.2 Components of the Data Processor
    1.7 Transaction Management in Distributed Systems
    1.8 Summary
    1.9 Glossary
    References

    2 Data Distribution Alternatives

    2.1 Design Alternatives
      2.1.1 Localized Data
      2.1.2 Distributed Data
        2.1.2.1 Nonreplicated, Nonfragmented
        2.1.2.2 Fully Replicated
        2.1.2.3 Fragmented or Partitioned
        2.1.2.4 Partially Replicated
        2.1.2.5 Mixed Distribution
    2.2 Fragmentation
      2.2.1 Vertical Fragmentation
      2.2.2 Horizontal Fragmentation
        2.2.2.1 Primary Horizontal Fragmentation
        2.2.2.2 Derived Horizontal Fragmentation
      2.2.3 Hybrid Fragmentation
      2.2.4 Vertical Fragmentation Generation Guidelines
        2.2.4.1 Grouping
        2.2.4.2 Splitting
      2.2.5 Vertical Fragmentation Correctness Rules
      2.2.6 Horizontal Fragmentation Generation Guidelines
        2.2.6.1 Minimality and Completeness of Horizontal Fragmentation
      2.2.7 Horizontal Fragmentation Correctness Rules
      2.2.8 Replication
    2.3 Distribution Transparency
      2.3.1 Location Transparency
      2.3.2 Fragmentation Transparency
      2.3.3 Replication Transparency
      2.3.4 Location, Fragmentation, and Replication Transparencies
    2.4 Impact of Distribution on User Queries
      2.4.1 No GDD: No Transparency
      2.4.2 GDD Containing Location Information: Location Transparency
      2.4.3 Fragmentation, Location, and Replication Transparencies
    2.5 A More Complex Example
      2.5.1 Location, Fragmentation, and Replication Transparencies
      2.5.2 Location and Replication Transparencies

      2.5.3 No Transparencies
    2.6 Summary
    2.7 Glossary
    References
    Exercises

    3 Database Control

    3.1 Authentication
    3.2 Access Rights
    3.3 Semantic Integrity Control
      3.3.1 Semantic Integrity Constraints
        3.3.1.1 Relational Constraints
    3.4 Distributed Semantic Integrity Control
      3.4.1 Compile Time Validation
      3.4.2 Run Time Validation
      3.4.3 Postexecution Time Validation
    3.5 Cost of Semantic Integrity Enforcement
      3.5.1 Semantic Integrity Enforcement Cost in Distributed System
        3.5.1.1 Variables Used
        3.5.1.2 Compile Time Validation
        3.5.1.3 Run Time Validation
        3.5.1.4 Postexecution Time Validation
    3.6 Summary
    3.7 Glossary
    References
    Exercises

    4 Query Optimization

    4.1 Sample Database
    4.2 Relational Algebra
      4.2.1 Subset of Relational Algebra Commands
        4.2.1.1 Relational Algebra Basic Operators
        4.2.1.2 Relational Algebra Derived Operators
    4.3 Computing Relational Algebra Operators
      4.3.1 Computing Selection
        4.3.1.1 No Index on R
        4.3.1.2 B+ Tree Index on R
        4.3.1.3 Hash Index on R
      4.3.2 Computing Join
        4.3.2.1 Nested-Loop Joins
        4.3.2.2 Sort-Merge Join
        4.3.2.3 Hash-Join
    4.4 Query Processing in Centralized Systems
      4.4.1 Query Parsing and Translation
      4.4.2 Query Optimization
        4.4.2.1 Cost Estimation

        4.4.2.2 Plan Generation
        4.4.2.3 Dynamic Programming
        4.4.2.4 Reducing the Solution Space
      4.4.3 Code Generation
    4.5 Query Processing in Distributed Systems
      4.5.1 Mapping Global Query into Local Queries
      4.5.2 Distributed Query Optimization
        4.5.2.1 Utilization of Distributed Resources
        4.5.2.2 Dynamic Programming in Distributed Systems
        4.5.2.3 Query Trading in Distributed Systems
        4.5.2.4 Distributed Query Solution Space Reduction
      4.5.3 Heterogeneous Database Systems
        4.5.3.1 Heterogeneous Database Systems Architecture
        4.5.3.2 Optimization in Heterogeneous Databases
    4.6 Summary
    4.7 Glossary
    References
    Exercises

    5 Controlling Concurrency

    5.1 Terminology
      5.1.1 Database
        5.1.1.1 Database Consistency
      5.1.2 Transaction
        5.1.2.1 Transaction Redefined
    5.2 Multitransaction Processing Systems
      5.2.1 Schedule
        5.2.1.1 Serial Schedule
        5.2.1.2 Parallel Schedule
      5.2.2 Conflicts
        5.2.2.1 Unrepeatable Reads
        5.2.2.2 Reading Uncommitted Data
        5.2.2.3 Overwriting Uncommitted Data
      5.2.3 Equivalence
      5.2.4 Serializable Schedules
        5.2.4.1 Serializability in a Centralized System
        5.2.4.2 Serializability in a Distributed System
        5.2.4.3 Conflict Serializable Schedules
        5.2.4.4 View Serializable Schedules
        5.2.4.5 Recoverable Schedules
        5.2.4.6 Cascadeless Schedules
      5.2.5 Advanced Transaction Types
        5.2.5.1 Sagas
        5.2.5.2 ConTracts

  • CONTEN