
  • 8/8/2019 Kanishk Tewari

    1/21

DISTRIBUTED STORAGE SYSTEMS

A Seminar Report done in partial fulfilment of
the requirements for the award of the degree of
Bachelor of Technology in
Computer Science and Engineering

by

KANISHK TEWARI
B070351CS

Department of Computer Science & Engineering
National Institute of Technology Calicut
Kerala - 673601

Monsoon 2010


National Institute of Technology Calicut
Department of Computer Science & Engineering

Certified that this Seminar Report entitled

DISTRIBUTED STORAGE SYSTEMS

is a bonafide record of the Seminar presented by

KANISHK TEWARI
B070351CS

in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology in
Computer Science & Engineering

SREENU NAIK BHUKYA
(Seminar Co-ordinator)
Assistant Professor
Dept. of Computer Science & Engineering


    Acknowledgment

I would like to thank Shri. Sreenu Naik Bhukya, Assistant Professor, and his associates in the Department of Computer Science and Engineering, National Institute of Technology Calicut, and LaTeX, for providing me an opportunity to present this work. I would also like to thank God and my parents.


    List of Figures

1  DISKOS Overview [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2  The Classic Web 2.0 Model [2] . . . . . . . . . . . . . . . . . . . . . . . . 16
3  Storage (Information Upload) [2] . . . . . . . . . . . . . . . . . . . . . . . 17
4  Access (Data Linkage) [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5  Data Presentation [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19


    Abstract

During the past years, anywhere, anytime access to large and reliable storage has become increasingly important in both enterprise and home computing environments. Web applications, moreover, store vast quantities of information online, and the storage, protection and maintenance of that information becomes the sole responsibility of the web application provider. This is where distributed storage systems make their foray, as computing has evolved from single-host infrastructures to distributed environments, and the need to separate computing from storage has become increasingly evident. In this paper we therefore discuss various aspects of distributed storage systems: data replication schemes, integration of data into the Web 2.0 model, and data access methods.


is large and the storage capacity is limited. To overcome these limitations, an efficient genetic-algorithm-based heuristic provides good-quality solutions.

    1.3 Data Access in Distributed Computing

The increasing demand for massive data processing has driven the evolution from single-host to distributed environments. Distributed computing has become the dominant method for massive data processing, and many platforms have emerged. At the same time, distributed storage systems have been used to supply massive amounts of distributed data. These storage systems vary greatly in design and implementation, so it is desirable to construct computing platforms that can adapt to a variety of applications; data access thus becomes the key problem. Some research has addressed this. SRB (Storage Resource Broker)[3], for example, supplies a range of data access methods and protocols for distributed storage; it focuses on supplying data from the resource side and can be seen as middleware between computing and storage. Other research targets the access of particular data types, such as the HENP project[3] at Berkeley on physics data processing. Here, access to distributed data is designed as an interface invoked from the computing side. The aim is to improve support for various storage systems and the reliability and validity of data processing.

    1.4 Web 2.0 and Data Storage

The current breed of web applications, known as Web 2.0, relies on monolithic storage models which place the sole burden of data management on the web application provider. Having the application provider manage this data has resulted in problems with data ownership, data freshness and data duplication, as covered by our previous work. As the majority of data in Web 2.0 is user-generated, it is suggested that the responsibility for storing user-generated content should be given to the data owners themselves. As such, a distributed storage model is presented which allows web applications to offload the storage and management of user-generated content to storage systems managed by the data owner. The Distributed Data Service (DDS) API[2] allows web applications to seamlessly present user-generated content to a 3rd-party user. The 3rd party interacts directly with both the web application and the numerous DDS systems which host the web application's content. Data owners can manage their data elements by interacting with a DDS directly, while also exposing this data to web applications via a publish/subscribe model.


    2 DISKOS System Overview

The distributed storage systems designed so far are tailored to specific storage environments and are mostly implemented at the file system level of the I/O stack of an operating system. Moreover, distributed file systems are optimized for specific application characteristics, but disk access patterns change over time as applications evolve, a fact that ensures a continuous need for new file systems. DISKOS, on the other hand, is able to achieve high performance in a storage environment consisting of more than two storage hosts.

Overall, most distributed storage systems assume the use of expensive storage and network equipment, which in some cases is entirely proprietary. Other systems assume that system reliability is maintained through some type of replication mechanism implemented by external software or hardware. Cluster file systems, likewise, require powerful, dedicated metadata servers for conducting all file metadata operations, resulting in a hierarchical structure that scales well only in combination with high-speed, low-latency, symmetric network links. DISKOS, on the other hand, is a completely decentralized storage system that may be deployed even in poor and heterogeneous environments in terms of CPU, storage and network resources. DISKOS decouples file-system-related semantics, such as directory and file metadata storage and management, concurrent file access policies and file locking mechanisms, from the actual distribution of data, and provides a low-level operating system (OS) storage interface which can be directly integrated with existing or new OS-compliant file systems. As a result, DISKOS becomes a file-system-neutral storage infrastructure whose main component is implemented as a regular disk driver embedded in an operating system kernel. File system neutrality is achieved because the disk driver is agnostic to the content of I/O requests.

DISKOS must scale to thousands of users in both wide-area networks and cooperative storage environments, and it must offer self-management, complete decentralization, data replication, resilience to failures, and transparency with respect to the underlying hardware and the interfaces exported to the upper layers, across a multitude of storage environments. To achieve this, we chose to exploit the basic principles of Distributed Hash Tables (DHTs)[4]. Here, the DHT acts as storage provider for the block requests that the disk driver has to satisfy.


Figure 1: DISKOS Overview [4]

    2.1 System Overview

In this section we briefly discuss the components of the proposed system and their interaction with each other and with other parts of the operating system's I/O subsystem. An application running in the user space of the operating system cannot directly access any kernel component except through the system calls exported by the kernel. For instance, when an application has to open a file in a local file system for reading, it simply issues an open() system call along with the OS-specific arguments.

One of the main components of the proposed system, the Virtual Disk Driver (VDD), is built into the kernel and therefore supports direct integration with every file system that implements the interfaces exported by the kernel. The VDD is directly connected to the Virtual Device Controller (VDC); this connection is accomplished using an IPC mechanism supported by the operating system. The VDD receives I/O requests for sectors from the upper levels of the I/O system and asynchronously forwards them to the VDC. The VDC provides the VDD with higher-level I/O services, such as request buffering and sector prefetching. It is also the component which, on behalf of the VDD, accesses the DHT, which is implemented in the Storage Provider (SP). The SP implements a DHT peer, which asynchronously receives


and satisfies requests from the VDC. The VDC and the SP are user-level processes.

The reason for choosing a three-tier architecture is twofold. Firstly, the VDD is a kernel component and therefore has to be lightweight in terms of memory and CPU usage. Secondly, separating the VDC from the SP allows a host that needs access to a distributed storage space to connect to a remote SP rather than implement one itself. Using this scheme, the proposed system can also be applied to devices with limited CPU and storage resources, such as mobile phones and PDAs.
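The interaction between these components can be sketched as a minimal Python model. This is an illustration only: the class and method names are my own, not DISKOS APIs, and the DHT is replaced by a plain dictionary. The point is how the VDC maps each sector-level request from the driver to a key in the DHT key space and lets a DHT peer (the SP) store or retrieve the block.

```python
import hashlib

class StorageProvider:
    """Stands in for the SP: a DHT peer that stores blocks by key."""
    def __init__(self):
        self._store = {}          # key -> block bytes (a real SP is distributed)

    def put(self, key, block):
        self._store[key] = block

    def get(self, key):
        return self._store.get(key)

class VirtualDeviceController:
    """Stands in for the VDC: turns sector requests into DHT operations."""
    SECTOR_SIZE = 512

    def __init__(self, volume_id, sp):
        self.volume_id = volume_id
        self.sp = sp

    def _key(self, sector):
        # Hash (volume, sector) into the DHT key space.
        return hashlib.sha1(f"{self.volume_id}:{sector}".encode()).hexdigest()

    def write_sector(self, sector, data):
        assert len(data) == self.SECTOR_SIZE
        self.sp.put(self._key(sector), data)

    def read_sector(self, sector):
        # An unwritten sector reads back as zeroes, like a blank disk.
        return self.sp.get(self._key(sector)) or bytes(self.SECTOR_SIZE)

# The kernel-level VDD would forward I/O requests here over IPC:
vdc = VirtualDeviceController("vol0", StorageProvider())
vdc.write_sector(7, b"\x01" * 512)
```

Because the controller only sees opaque sector contents, the file system above it is irrelevant to it, which is exactly the file-system-neutrality argument made in the text.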


    3 Replication Algorithms

Our replication policy assumes the existence of one primary copy of each object in the network. Let SP_k be the site which holds the primary copy of O_k, i.e., the only copy in the network that cannot be deallocated; we refer to it as the primary site of the k-th object. Each primary site SP_k contains information about the whole replication scheme R_k of O_k. This can be done by maintaining a list of the sites where the k-th object is replicated, called from now on the replicators of O_k. Moreover, every site S^(i) stores a two-field record for each object: the first field is its primary site SP_k, and the second is the site SN_k^(i) nearest to S^(i) which holds a replica of O_k. In other words, SN_k^(i) is the site for which reads from S^(i) for O_k, if served there, would incur the minimum possible communication cost. It is possible that SN_k^(i) = S^(i), if S^(i) is a replicator or the primary site of O_k. Another possibility is that SN_k^(i) = SP_k, if the primary site is the closest one holding a replica of O_k.

When a site S^(i) reads an object, it does so by addressing the request to the corresponding SN_k^(i). For updates we assume that every site can update every object. Updates of an object O_k are performed by sending the updated version to its primary site SP_k, which afterwards broadcasts it to every site in its replication scheme R_k. The simplicity of this policy allows us to develop a general cost model that can be used, with minor changes, to formalize various replication and consistency strategies.
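As a minimal sketch of this read/update policy, the following Python model (class and variable names are my own, not from the cited work) serves reads from the nearest replicator SN and routes every update through the primary SP, which broadcasts the new version to the scheme R_k:

```python
class PrimaryCopy:
    """Primary-copy replication for one object O_k (illustrative names).

    Reads are served by the nearest replicator SN; updates go to the
    primary SP, which broadcasts the new version to the scheme R_k."""

    def __init__(self, primary_site, replicators, value=None):
        self.primary = primary_site
        self.scheme = set(replicators) | {primary_site}   # R_k
        self.copies = {s: value for s in self.scheme}     # one copy per site

    def nearest(self, site, cost):
        # SN_k^(i): the replica holder with minimum communication cost.
        return min(self.scheme, key=lambda s: 0 if s == site else cost[site][s])

    def read(self, site, cost):
        return self.copies[self.nearest(site, cost)]

    def update(self, new_value):
        # Any site may update: the primary applies the new version,
        # then broadcasts it to every site in the replication scheme.
        for s in self.scheme:
            self.copies[s] = new_value

# Hypothetical 3-site network: object replicated at sites 0 (primary) and 2.
cost = [[0, 4, 9], [4, 0, 3], [9, 3, 0]]   # C(i, j), illustrative values
obj = PrimaryCopy(primary_site=0, replicators=[2], value="v1")
```

Here site 1 holds no replica, so its reads go to site 2 (cost 3) rather than the primary (cost 4), which is the SN_k^(i) selection described above.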

We are primarily interested in minimizing the total network transfer cost (NTC) due to object movement, since the communication cost of control messages has minor impact on the overall performance of the system. Two components contribute to NTC. The first is the NTC created by read requests.

Let R_k^(i) denote the total NTC due to S^(i)'s read requests for object O_k, addressed to the nearest site SN_k^(i):

    R_k^(i) = r_k^(i) · o_k · C(i, SN_k^(i))    (1)

where
    C(i, j) = communication cost (per unit) between sites i and j,
    r_k^(i) = number of reads from site i for object k,
    o_k = size of object k.

The second component of NTC is the cost arising from writes. Let W_k^(i) be the total


NTC due to S^(i)'s write requests for object O_k, addressed to the primary site SP_k:

    W_k^(i) = w_k^(i) · o_k · ( C(i, SP_k) + Σ_{j ∈ R_k, j ≠ i} C(SP_k, j) )    (2)

where
    w_k^(i) = number of writes from site i for object k.

Let X_ik = 1 if S^(i) holds a replica of object O_k, and 0 otherwise. The X_ik then define an M×N replication matrix X with boolean elements. Sites which are not replicators of an object create NTC equal to the communication cost of their reads from the nearest replicator, plus that of sending their writes to the primary site of O_k. Sites belonging to the replication scheme of O_k are associated with the cost of sending/receiving all of its updated versions. Using the above formulation, the Data Replication Problem (DRP) can be defined as: find the assignment of 0/1 values in the matrix X that minimizes the total NTC D.

There is a greedy method which tries to create replicas of objects by introducing a new term, B_k^(i), called the replication benefit of each site S^(i) and object O_k. The larger the replication benefit value, the better the chance of a replica of the object being created at that site.

We define the replication benefit value B_k^(i) as:

    B_k^(i) = ( R_k^(i) − Σ_{x=1..M} w_k^(x) · o_k · C(i, SP_k) − W_k^(i) ) / o_k    (3)
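A small numeric sketch of this cost model follows. All matrices and sizes are hypothetical, and the benefit in Eq. (3) is interpreted here as the read cost saved by a new replica, net of the update traffic it would attract and of the site's own write cost, per unit of object size:

```python
def read_cost(i, k, r, o, C, SN):
    """Eq. (1): NTC of S^(i)'s reads for O_k, served by the nearest replica."""
    return r[i][k] * o[k] * C[i][SN[i][k]]

def update_cost(i, k, w, o, C, SP, R):
    """Eq. (2): NTC of S^(i)'s writes for O_k: ship the update to the
    primary SP_k, which broadcasts it to every other member of R_k."""
    broadcast = sum(C[SP[k]][j] for j in R[k] if j != i)
    return w[i][k] * o[k] * (C[i][SP[k]] + broadcast)

def replication_benefit(i, k, r, w, o, C, SP, R, SN):
    """Eq. (3): benefit (per storage unit) of placing a replica of O_k
    at S^(i): read savings minus the update traffic the replica attracts."""
    M = len(C)
    total_writes = sum(w[x][k] for x in range(M))
    return (read_cost(i, k, r, o, C, SN)
            - total_writes * o[k] * C[i][SP[k]]
            - update_cost(i, k, w, o, C, SP, R)) / o[k]

# Hypothetical 3-site network with one object, primary at site 0.
C  = [[0, 2, 5], [2, 0, 3], [5, 3, 0]]   # C(i, j), cost per unit
r  = [[10], [4], [6]]                    # reads per site and object
w  = [[1], [2], [1]]                     # writes per site and object
o  = [8]                                 # object sizes
SP = [0]                                 # primary site of each object
R  = [[0]]                               # current replication scheme R_k
SN = [[0], [0], [0]]                     # nearest replicator per site/object
```

With these numbers, site 2's heavy reads over the expensive link make it the natural greedy candidate: the greedy method would repeatedly pick the (site, object) pair with the largest positive benefit and set the corresponding X_ik to 1.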


    4 Data Access Configuration

Efficient data access across different storage systems was a primary consideration in the implementation of the computing platform in our work. The method proposed here shields the differences between the underlying storage systems and supplies the computing platform with universal data access, to achieve higher flexibility and efficiency.

The data I/O model[3] covers the differences between the underlying storage systems and mediates the interaction between the computing platform and the storage. This enables the computing platform to focus only on data access through the I/O layer; data storage and management do not impact the computing.

The details of the storage system are hidden behind the I/O layer. When the different storage systems are described by a simple configuration (also called the driver), the computing side can call a universal API to perform read and write operations on the storage. This decreases the coupling between storage and computing: the computing can be configured with different storage back ends, and when constructing a distributed application the storage and the computing can be implemented separately, making applications much more flexible.

The distributed computing platform interacts with the distributed storage through this I/O layer, and the framework of the computing platform supplies this interaction. The properties of the different storage systems can be configured in configuration files, and the computing platform will then switch to the corresponding storage system.

The design and implementation of the configuration files is essential to universal data access. The configuration file for the different distributed file systems is DFS-Mapping.xml. It configures the I/O operations of each storage system and related details.

For example:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <name>dfs.name</name>
  <value></value>
  <description>The name of this Distributed File System, such as HDFS or GFS</description>
  <name>dfs.io.input</name>
  <value></value>
  <description>The DataInputStream of the Distributed File System</description>
  <name>dfs.io.output</name>
  <value></value>
  <description>The DataOutputStream of the Distributed File System</description>
  ...
</configuration>

The framework handles the mapping between the distributed file systems and the data I/O. Users can supply this simple configuration, and the computing platform then achieves universal data access over the low-level file systems. The differences are hidden, so the interaction between computing and storage becomes clearer and easier.
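As an illustration, a computing platform could load a DFS-Mapping.xml-style file and switch drivers behind a universal API as follows. Only the file's role comes from the text; the driver classes, element layout and values here are hypothetical:

```python
import xml.etree.ElementTree as ET

class HDFSDriver:
    """Illustrative stand-in for an HDFS-specific I/O implementation."""
    def read(self, path):
        return f"hdfs-read:{path}"

class GFSDriver:
    """Illustrative stand-in for a GFS-specific I/O implementation."""
    def read(self, path):
        return f"gfs-read:{path}"

DRIVERS = {"HDFS": HDFSDriver, "GFS": GFSDriver}

def load_driver(xml_text):
    # Read dfs.name from the mapping file and switch to the corresponding
    # storage driver; callers only ever see the universal read/write API.
    root = ET.fromstring(xml_text)
    names = [n.text for n in root.iter("name")]
    values = [v.text for v in root.iter("value")]
    config = dict(zip(names, values))
    return DRIVERS[config["dfs.name"]]()

mapping = """<configuration>
  <name>dfs.name</name><value>HDFS</value>
</configuration>"""
driver = load_driver(mapping)
```

Swapping the value of dfs.name is then enough to retarget the same computing code at a different storage system, which is the decoupling the text argues for.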


in-house or outsourced to a specialised data storage provider. When a Web 2.0 site requests content from the data owner, he or she provides a link into the DSS in which the data is stored. Later, when the data is needed for display as part of a page generated by the site, the displaying browser instance uses the embedded link to retrieve the data directly from the owner's DSS[2]. To aid integration of the DDS into the data owner's web experience, a new module called the DDS browser extension is introduced. Communication between the browser extension and the storage service is achieved by inserting a DDSStorageService-AttachRequest into the HTTP response. This request is transparently inspected on-the-fly by the browser extension, which allows the extension to re-write parts of the response HTML dynamically. For the end user, the extension re-writes links into a remote DSS with the data returned from that specific DSS.

Figure 3: Storage (Information Upload) [2]

Phase two of the process involves the data owner publishing content to the storage service. Again, the implementation is not restricted by the API, as long as each piece of stored data is given a unique identifier that is global in that data owner's domain. The unique identifier comprises three components:

[name]:[path]@[system]

The name component is an identifier for each specific piece of data (for example, credit card). The path component supports a hierarchical storage structure, allowing a data owner to store various groupings of data (for example, a data owner may keep separate sets of personal and business data). The system component is a unique identifier for the specific


    distributed data service.
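Such identifiers can be split mechanically into their three components. This sketch assumes the name never contains ':' and the system identifier follows the last '@'; the example identifier is hypothetical:

```python
def parse_dds_id(identifier):
    """Split a DDS unique identifier of the form [name]:[path]@[system]."""
    name, rest = identifier.split(":", 1)   # name is everything before ':'
    path, system = rest.rsplit("@", 1)      # system id follows the last '@'
    return {"name": name, "path": path, "system": system}

# e.g. a "creditcard" element in the owner's personal/finance grouping,
# hosted on a hypothetical DDS instance:
ident = parse_dds_id("creditcard:personal/finance@dds.example.org")
```

A web application that is handed such an identifier can resolve the system component to a DDS endpoint and request the named element from it directly.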

Once the data owner has uploaded data into their DDS, the next stage is to allow web applications to Access[2] this data. During the registration process for a DDS-enabled web application, the web browser extension inspects the HTTP response traffic from the application and detects a DDS-Application-Subscribe-AuthRequest. This request, and the corresponding response generated by the browser extension, is the basis for establishing a trust relationship between the web application and the distributed data service. Once this relationship is established, the data owner establishes a link between their data element and the web application. This link is akin to the data owner uploading content to the web application in a standard Web 2.0 scenario, except that in the DDS design the data owner provides the unique identifier of the data rather than uploading the content itself. Once the link is established between a piece of data referenced by the web application and the storage location of that data in a DDS, the web application is free to request the data directly from the DDS.

Figure 4: Access (Data Linkage) [2]

The final stage in the design is the Presentation[2] stage. Here we define how the data is transparently presented to end users. Again the browser extension plays a key role; in this instance the extension operates on behalf of a 3rd party that does not have any direct relationship to the


DDS holding the rendered data. For example, an end user may access a web application and request to view an image collage built from images stored in multiple DDSs owned by multiple data owners. In this instance the web application, instead of returning raw data as in the classic Web 2.0 case, will return a DDS-Present-DataRequest containing security information exchanged between the web application and the DDS during the initial authentication request. This security information is protected using PKI, to ensure that it cannot be abused to falsify links between a DDS and unauthorised web applications or clients. The trust relationship enforced in this case is between the web application and the DDS; hence the DDS itself does not need to be aware of all the end users who may render the data linked to a specific web application. The DDS-Present-DataRequest message triggers a handoff of the user from the web application to the DDS, allowing the browser extension to request and render the data directly from the DDS, under the web application's instruction.

Figure 5: Data Presentation [2]
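The essence of the Presentation handoff, the DDS refusing data unless the request carries valid security information from the application-DDS trust relationship, can be sketched as follows. This is purely illustrative: a shared-secret HMAC stands in for the PKI protection described in the text, and all names and values are hypothetical:

```python
import hashlib
import hmac

SHARED_SECRET = b"app-dds-secret"   # stands in for the PKI trust material

def sign_present_request(data_id):
    # The web application returns a DDS-Present-DataRequest carrying
    # security information instead of the raw data itself.
    tag = hmac.new(SHARED_SECRET, data_id.encode(), hashlib.sha256).hexdigest()
    return {"type": "DDS-Present-DataRequest", "data": data_id, "auth": tag}

def dds_serve(request, store):
    # The DDS verifies the security information before handing the data
    # to the browser extension; forged or tampered links are rejected.
    expected = hmac.new(SHARED_SECRET, request["data"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, request["auth"]):
        return None
    return store.get(request["data"])

store = {"img:holiday/01@dds.example.org": b"<jpeg bytes>"}
req = sign_present_request("img:holiday/01@dds.example.org")
```

Because verification happens at the DDS, the DDS never needs a list of end users: any browser extension presenting a validly signed request on the application's behalf is served.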


    6 Conclusion

In this report, I have tried to outline some of the basic but important points regarding distributed storage systems, ranging from one of their infrastructures, DISKOS, to their application in Web 2.0. Data replication schemes and methods of accessing data have also been outlined.

DISKOS has shown that we can create a file-system-neutral distributed storage environment while reaching high performance for several write access patterns.

The data replication problem has also been addressed, and a cost model has been developed which is applicable to very large distributed systems such as the WWW and distributed databases.

Light has also been shed on some data access methods and on how they are used to separate storage from computing; access efficiency and availability remain the focus of further research.

We have also seen concerns resulting from the growing popularity of Web 2.0 applications and defined a new paradigm for the distributed storage of data on the Internet. As user take-up of Web 2.0 applications continues, it is sensible to adopt a distributed approach that parallels the way content is originally generated.
