Cloud Computing

MCAE25 Cloud Computing

Chethan Venkatesh, Department of MCA

M S Ramaiah Institute of Technology

April 8, 2023

UNIT-IV


Programming Support of Google App Engine

• Programming the Google App Engine :

• Web resources : http://code.google.com/appengine.

• Books and articles :

• www.byteonic.com/2009/overview-of-java-in-google-app-engine/



Programming Support of Google App Engine Cont..

• Key features of the GAE programming model, supported in two languages, Java and Python :-

• 1. A client environment with an Eclipse plug-in for Java to debug GAE applications on a local machine.

• 2. GWT (Google Web Toolkit) is available for Java web application developers.

• Developers can use Java or any other language with a JVM-based interpreter or compiler, such as JavaScript or Ruby.

• Python is often used with frameworks such as Django and CherryPy.

• Google also supplies a built-in webapp Python environment.



• Powerful constructs for storing and accessing data :-

• The data store is a NoSQL data management system.

• Schema-less properties.

• Java offers JDO (Java Data Objects) and JPA (Java Persistence API) interfaces implemented by the open source DataNucleus Access Platform.

• Python has an SQL-like query language called GQL.

• Data store is strongly consistent.

• Uses optimistic concurrency control.



• An update of an entity occurs in a transaction that is retried a fixed number of times if other processes are trying to update the same entity simultaneously.

• Data store implements transactions across its distributed network using “entity groups”.

• Entities of the same group are stored together.

• A transaction manipulates entities within a single group.

• GAE applications can assign entities to groups when they are created.

• Performance can be enhanced by in-memory caching using memcache.
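The optimistic concurrency control and fixed-retry transactions described above can be sketched in plain Python. This is a minimal in-memory illustration and not the GAE datastore API: `VersionedStore`, `ConflictError`, and `update_with_retry` are hypothetical names, and the retry count of 3 is an assumption.

```python
class ConflictError(Exception):
    """Raised when an entity changed between read and commit."""


class VersionedStore:
    """Toy in-memory store; each entity carries a version counter."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            # Someone else updated the entity since we read it.
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)


def update_with_retry(store, key, fn, max_retries=3):
    """Apply fn to the entity value, retrying a fixed number of times on conflict."""
    for _ in range(max_retries):
        version, value = store.read(key)
        try:
            store.commit(key, version, fn(value))
            return True
        except ConflictError:
            continue
    return False


store = VersionedStore()
update_with_retry(store, "counter", lambda v: (v or 0) + 1)
update_with_retry(store, "counter", lambda v: (v or 0) + 1)
```

No lock is held while `fn` runs; a conflicting writer is only detected at commit time, which is what makes the scheme optimistic.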


Programming Support of Google App Engine Cont..

• Blobstore :- stores large files.

• Google SDC (Secure Data Connector) :- tunnels through the internet and links your intranet to an external GAE application.

• URL Fetch operation :- provides the ability for applications to fetch resources and communicate with other hosts over the internet using HTTP and HTTPS requests.

• Applications can access web services, resources and other data on the internet.

• Specialized mail mechanism to send e-mail from your GAE application.

• Google’s corporate facilities include maps, sites, groups, calendar, docs, and YouTube.

• These support the Google Data API and can be used inside GAE.

Programming Support of Google App Engine Cont..

• Google accounts :- used by applications for user authentication.

• Handles account creation and sign-in.

• Easy for users who already have a Google account.

• Image service :- manipulate image data (resize, rotate, flip, crop).

• Cron jobs :- applications can perform tasks outside of responding to web requests.

• GAE is configured to consume resources up to certain limits or quotas.

• Free up to certain quotas.


Programming Support of Google App Engine Cont..

• Google File System (GFS) :

• Fundamental storage service for Google’s search engine.

• The volume of web data crawled and saved was huge.

• Need for a distributed file system to redundantly store massive amounts of data on cheap and unreliable computers.

• Assumptions :-

• The system is built from many inexpensive commodity components that often fail.

• It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.


Programming Support of Google App Engine Cont..

• The system stores a modest number of large files.

• We expect a few million files, each typically 100 MB or larger in size.

• Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.

• The workloads primarily consist of two kinds of reads: large streaming reads and small random reads.

• In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more.

• The workloads also have many large, sequential writes that append data to files.


Programming Support of Google App Engine Cont..

• The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.

• High sustained bandwidth is more important than low latency.


Programming Support of Google App Engine Cont..

• Design Decision :-

• Files stored as chunks

– Fixed size (64MB)

• Reliability through replication

– Each chunk replicated across 3+ chunkservers

• Single master to coordinate access, keep metadata

– Simple centralized management

• No data caching

– Little benefit due to large data sets, streaming reads

• Familiar interface, but customize the API

– Simplify the problem; focus on Google apps

– Add snapshot and record append operations
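Because chunks have a fixed size, a client can translate a byte offset in a file into a chunk index with simple arithmetic before asking the master where that chunk lives. The sketch below shows only that arithmetic; the function names are hypothetical and this is not GFS client code.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size from the design above


def locate(offset):
    """Return (chunk index, offset within that chunk) for a byte offset in a file."""
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset, length):
    """List the chunk indexes touched by reading `length` (> 0) bytes from `offset`."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

For example, a 10 MB read starting at the 60 MB mark crosses a chunk boundary and so touches chunks 0 and 1.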




Programming Support of Google App Engine Cont..

• Single master

• Multiple chunkservers

• Master

– Manages namespace/metadata

– Manages chunk creation, replication, placement

– Performs snapshot operation to create duplicate of file or directory tree

– Performs checkpointing and logging of changes to metadata

– Load balancing

– Unused storage reclaim

– Periodically communicates with chunkservers (HeartBeat message)


Programming Support of Google App Engine Cont..

• Chunkservers

– Stores chunk data and checksum for each block

– On startup/failure recovery, reports chunks to master

– Periodically reports sub-set of chunks to master (to detect no longer needed chunks)

– Chunkservers store chunks on local disk as Linux files


Programming Support of Google App Engine Cont..

• From distributed systems we know a single master is a:

– Single point of failure

– Scalability bottleneck

• GFS solutions:

– Shadow masters

– Minimize master involvement (data flows directly between client and chunk server)

• never move data through it, use only for metadata

– and cache metadata at clients

• large chunk size

• master delegates authority to primary replicas in data mutations (chunk leases)

• Simple, and good enough!


Programming Support of Google App Engine Cont..

• Data mutation (Write, Append operations) in GFS :-

• Data blocks must be created for all replicas.

• Goal: minimize involvement of the master.

Programming Support of Google App Engine Cont..

• Steps in mutation :-

1. The client asks the master which chunk server holds the current lease for the chunk and the locations of the other replicas. If no one has a lease, the master grants one to a replica it chooses (not shown).

2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client caches this data for future mutations. It needs to contact the master again only when the primary becomes unreachable or replies that it no longer holds a lease.

3. The client pushes the data to all the replicas. A client can do so in any order. Each chunk server will store the data in an internal LRU buffer cache until the data is used or aged out. By decoupling the data flow from the control flow, we can improve performance by scheduling the expensive data flow based on the network topology regardless of which chunk server is the primary.

4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The request identifies the data pushed earlier to all of the replicas. The primary assigns consecutive serial numbers to all the mutations it receives, possibly from multiple clients, which provides the necessary serialization. It applies the mutation to its own local state in serial number order.

5. The primary forwards the write request to all secondary replicas. Each secondary replica applies mutations in the same serial number order assigned by the primary.

6. The secondaries all reply to the primary indicating that they have completed the operation.

7. The primary replies to the client. Any errors encountered at any of the replicas are reported to the client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the secondary replicas. (If it had failed at the primary, it would not have been assigned a serial number and forwarded.) The client request is considered to have failed, and the modified region is left in an inconsistent state. Our client code handles such errors by retrying the failed mutation. It will make a few attempts at steps (3) through (7) before falling back to a retry from the beginning of the write.
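The separation of data flow (step 3) from control flow (steps 4 to 7) can be illustrated with a small single-process simulation. This is only a sketch under simplifying assumptions: `Replica` and `Primary` are hypothetical classes, there is no network, lease handling, or error path, and a plain dict stands in for the LRU buffer cache.

```python
class Replica:
    """Toy chunk replica: buffers pushed data, applies mutations in serial order."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data_id -> bytes (stand-in for the LRU buffer cache)
        self.chunk = []   # applied mutations, in serial-number order

    def push(self, data_id, data):
        # Step 3: data flow, independent of which replica is primary.
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        # Steps 4-5: control flow; mutations must arrive in serial order.
        assert serial == len(self.chunk), "mutation applied out of order"
        self.chunk.append(self.buffer.pop(data_id))


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        serial = self.next_serial       # step 4: assign a serial number
        self.next_serial += 1
        self.apply(serial, data_id)     # apply locally first
        for s in self.secondaries:      # step 5: forward to secondaries
            s.apply(serial, data_id)
        return serial                   # steps 6-7: reply to the client


secondaries = [Replica("s1"), Replica("s2")]
primary = Primary("p", secondaries)
for data_id, data in [("a", b"client-1"), ("b", b"client-2")]:
    for r in [primary] + secondaries:   # clients push data to all replicas
        r.push(data_id, data)
# Write requests may reach the primary in a different order than the pushes.
primary.write("b")
primary.write("a")
```

Every replica ends up with the same mutation order because only the primary chooses serial numbers, regardless of the order in which the data was pushed.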


Programming Support of Google App Engine Cont..

• BigTable, Google’s NoSQL System :

• Lots of (semi-)structured data at Google

– URLs:

• Contents, crawl metadata, links, anchors, pagerank, …

– Per-user Data:

• User preference settings, recent queries/search results, …

– Geographic locations:

• Physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …

• Scale is large

– Billions of URLs, many versions/page (~20K/version)

– Hundreds of millions of users, thousands of q/sec

– 100TB+ of satellite image data


Programming Support of Google App Engine Cont..

• Why not commercial database?

• Scale is too large for most commercial databases

• Even if it weren’t, cost would be very high

– Building internally means system can be applied across many projects for low incremental cost

• Low-level storage optimizations help performance significantly

– Much harder to do when running on top of a database layer


Programming Support of Google App Engine Cont..

• BigTable is :-

• Distributed multi-level map

– With an interesting data model

• Fault-tolerant, persistent

• Scalable

– Thousands of servers

– Terabytes of in-memory data

– Petabytes of disk-based data

– Millions of reads/writes per second, efficient scans

• Self-managing

– Servers can be added/removed dynamically

– Servers adjust to load imbalance


Programming Support of Google App Engine Cont..

• The BigTable system is built on top of an existing Google cloud infrastructure.

• Uses the following building blocks :-

1. GFS : stores persistent state

2. Scheduler : schedules jobs involved in BigTable serving

3. Lock service : master election, location bootstrapping

4. MapReduce : often used to read/write BigTable data


Programming Support of Google App Engine Cont..

• Tablets :-

• Large tables broken into tablets at row boundaries

– Tablet holds contiguous range of rows

• Clients can often choose row keys to achieve locality

– Aim for ~100MB to 200MB of data per tablet

• Serving machine responsible for ~100 tablets

– Fast recovery:

• 100 machines each pick up 1 tablet from failed machine

– Fine-grained load balancing:

• Migrate tablets away from overloaded machine

• Master makes load-balancing decisions


Programming Support of Google App Engine Cont..

• Tablet Location Hierarchy :

Programming Support of Google App Engine Cont..

• Since tablets move around from server to server, given a row, how do clients find the right machine?

– Need to find tablet whose row range covers the target row

• One approach: could use the BigTable master

– Central server almost certainly would be bottleneck in large system

• Instead: store special tables containing tablet location info in BigTable cell itself


Programming Support of Google App Engine Cont..

• Google’s approach: 3-level hierarchical lookup scheme for tablets

– Location is ip:port of relevant server

– 1st level: bootstrapped from lock service, points to owner of META0

– 2nd level: uses META0 data to find owner of appropriate META1 tablet

– 3rd level: META1 table holds locations of tablets of all other tables

• META table itself can be split into multiple tablets
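The three-level lookup can be sketched with sorted key ranges and binary search. This is a toy model, not BigTable code: the `Tablet` class, the `lock_service` dict, the row ranges, and the ip:port values are all made up for illustration, and each level is held in memory instead of on a separate server.

```python
import bisect


class Tablet:
    """Sorted map from the last row key of each range to that range's location."""

    def __init__(self, entries):
        # entries: list of (end_row_key, value), sorted by end_row_key
        self.end_keys = [k for k, _ in entries]
        self.values = [v for _, v in entries]

    def lookup(self, row_key):
        # First range whose end key is >= row_key covers the row.
        i = bisect.bisect_left(self.end_keys, row_key)
        if i == len(self.end_keys):
            raise KeyError(row_key)
        return self.values[i]


# 3rd level: META1 tablets map row ranges to ip:port of user tablet servers.
meta1_a = Tablet([("g", "10.0.0.1:6000"), ("m", "10.0.0.2:6000")])
meta1_b = Tablet([("t", "10.0.0.3:6000"), ("~", "10.0.0.4:6000")])
# 2nd level: META0 maps row ranges to the META1 tablet covering them.
meta0 = Tablet([("m", meta1_a), ("~", meta1_b)])
# 1st level: the lock service stores a pointer to META0.
lock_service = {"meta0": meta0}


def find_tablet_server(row_key):
    root = lock_service["meta0"]   # 1st level: bootstrap from lock service
    meta1 = root.lookup(row_key)   # 2nd level: find the right META1 tablet
    return meta1.lookup(row_key)   # 3rd level: find the tablet's server
```

The master is never consulted on this path, which is the point of storing location information in tables.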


Programming Support of Google App Engine Cont..

• Chubby, Google’s Distributed Lock Service :

• Provides coarse-grained locking service.

• Stores small files inside Chubby storage, which provides a namespace as a file system tree.

• Files stored are small compared to GFS.

• Paxos agreement protocol.

• Reliable.

• Two main components:

– server (Chubby cell)

– client library

– communicate via RPC

• Proxy

– optional

Programming on Amazon AWS and Microsoft AZURE cont..

• Programming on Amazon EC2 :

• First company to introduce VMs in application hosting.

• Rent VMs instead of physical machines.

• Can load any software on a VM.

• Elastic feature: customers can create, launch, and terminate server instances as needed.

• Pay by the hour for active servers.

• Provides preinstalled VMs.

• The preinstalled VM images are called Amazon Machine Images (AMIs).

• Preconfigured with operating systems based on Linux or Windows, and additional software.

• Table defines 3 types of AMI.




Image Type | Definition
Private | Images created by you, which are private by default. You can grant access to other users to launch your private images.
Public | Images created by users and released to the Amazon Web Services community, so anyone can launch instances based on them and use them any way they like. The Amazon Web Services Developer Connection Web site lists all public images.
Paid | You can create images providing specific functions that can be launched by anyone willing to pay you per each hour of usage on top of Amazon charges.

http://www.ibm.com/developerworks/opensource/library/ar-cloudaws3/index.html


• Execution environment of Amazon EC2



• AMIs are the templates for instances, which are running VMs.

• Workflow to create a VM is :-

• Create an AMI → Create Key Pair → Configure Firewall → Launch

• This sequence is supported by public, private, and paid AMIs.
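The last three steps of the workflow can be sketched as one function that takes an EC2 client object. The method and parameter names are intended to follow the boto3 EC2 client and should be checked against its documentation before use; the AMI is assumed to exist already, and `FakeEC2`, the instance type, the key and group names, and the open SSH rule are illustrative assumptions only.

```python
def launch_instance(ec2, ami_id, key_name, group_name):
    """Create key pair -> configure firewall -> launch, for an existing AMI."""
    key = ec2.create_key_pair(KeyName=key_name)
    group = ec2.create_security_group(
        GroupName=group_name, Description="allow SSH (example only)"
    )
    # Firewall rule: open port 22; a real setup should restrict the CIDR range.
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpProtocol="tcp", FromPort=22, ToPort=22, CidrIp="0.0.0.0/0",
    )
    reply = ec2.run_instances(
        ImageId=ami_id, InstanceType="m1.small",
        KeyName=key_name, SecurityGroupIds=[group["GroupId"]],
        MinCount=1, MaxCount=1,
    )
    return key["KeyMaterial"], reply["Instances"][0]["InstanceId"]


class FakeEC2:
    """Hypothetical stand-in client so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def create_key_pair(self, **kwargs):
        self.calls.append("create_key_pair")
        return {"KeyMaterial": "fake-private-key"}

    def create_security_group(self, **kwargs):
        self.calls.append("create_security_group")
        return {"GroupId": "sg-fake"}

    def authorize_security_group_ingress(self, **kwargs):
        self.calls.append("authorize_security_group_ingress")
        return {}

    def run_instances(self, **kwargs):
        self.calls.append("run_instances")
        return {"Instances": [{"InstanceId": "i-fake"}]}


fake = FakeEC2()
key_material, instance_id = launch_instance(fake, "ami-example", "demo-key", "demo-group")
```

Passing the client in as a parameter keeps the workflow order testable locally; with real credentials one would pass a client created by the AWS SDK instead of `FakeEC2`.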

• Table shows instance types available on Amazon EC2 (Oct 6, 2010)




Compute Instance | Memory GB | ECU or EC2 Units | Virtual Cores | Storage GB | 32/64 Bit
Standard: Small | 1.7 | 1 | 1 | 160 | 32
Standard: Large | 7.5 | 4 | 2 | 850 | 64
Standard: Extra Large | 15 | 8 | 4 | 1690 | 64
Micro | 0.613 | Up to 2 | | EBS only | 32 or 64
High-Memory | 17.1 | 6.5 | 2 | 420 | 64
High-Memory: Double | 34.2 | 13 | 4 | 850 | 64
High-Memory: Quadruple | 68.4 | 26 | 8 | 1690 | 64
High-CPU: Medium | 1.7 | 5 | 2 | 350 | 32
High-CPU: Extra Large | 7 | 20 | 8 | 1690 | 64
Cluster Compute | 23 | 33.5 | 8 | 1690 | 64




• Amazon Simple Storage Service (S3) :

• Provides simple web service interface.

• Used to store and retrieve any amount of data, anytime from anywhere on the web.

• Provides object-oriented storage service.

• Users can access their objects through Simple Object Access Protocol (SOAP) using browsers or other client programs which support SOAP.

• SQS :- responsible for ensuring a reliable message service between two processes, even if the receiver processes are not running.

• Figure shows S3 execution environment.




Object is the basic unit of data

Bucket for storing objects

Key for data object retrieval

Object is attributed to value, metadata, and access control


• Object-Based Storage.

• 1 B – 5 GB / object.

• Redundant through geographic dispersion.

• 99.99% Availability Goal.

• Authentication mechanisms.

• Objects can be Private or Public.

• Per-object URLs & ACLs.

• BitTorrent Support (default download protocol is HTTP).



• Pricing :-

• $0.15 per GB per month storage.

• First 1 GB per month input or output free and then $0.08 to $0.15 per GB for transfers outside S3 region.

• There is no data transfer charge for data transferred between EC2 and S3 within the same region or for data transferred between EC2 Northern Virginia and S3 U.S. Standard region (Oct 6, 2010)



• Amazon Elastic Block Store (EBS) and SimpleDB :

• EBS provides volume block interface for saving and restoring the virtual images of EC2 instances.

• A traditional EC2 instance is destroyed after use.

• With EBS, the state of an EC2 instance is saved to EBS after the machine is shut down.

• S3 is “Storage as a Service” with a messaging interface.

• EBS is similar to a distributed file system.

• Allows users to create volumes from 1GB to 1TB which can be mounted on EC2 instances.

• Multiple volumes can be mounted on the same instance.

• Storage volume behaves like raw unformatted block devices.



• You can create a file system on top of EBS volumes.

• Also use them as a hard disk.

• Snapshots for incremental data saving.

• Pricing :-

• $0.10 per GB/month.

• $0.10 per 1 million I/O requests made to the storage (Oct 6, 2010).

• Nimbus offers an open source equivalent to EBS.
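The EBS pricing quoted above (Oct 6, 2010 figures) combines a storage charge and an I/O charge. A small worked calculation, with a hypothetical function name and the prices from the slide as defaults:

```python
def ebs_monthly_cost(volume_gb, io_requests,
                     price_per_gb=0.10, price_per_million_io=0.10):
    """Monthly EBS cost in USD: storage charge plus I/O request charge."""
    if not 1 <= volume_gb <= 1024:
        raise ValueError("EBS volumes range from 1 GB to 1 TB")
    storage = volume_gb * price_per_gb
    io = io_requests / 1_000_000 * price_per_million_io
    return round(storage + io, 2)
```

For a 100 GB volume that serves 50 million I/O requests in a month, the storage part is 100 × $0.10 = $10.00 and the I/O part is 50 × $0.10 = $5.00, so `ebs_monthly_cost(100, 50_000_000)` gives 15.0.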



• Amazon SimpleDB Service :

• Also called “LittleTable” (metadata).

• Provides a simplified data model based on relational database data model.

• Domains used to organize structured data.

• Each domain can be considered as a table.

• Items are rows in the table.

• Cell is the value for a specific attribute (column name) of corresponding row.

• Key feature :- possible to assign multiple values to a single cell in the table (not permitted in traditional databases).



• Removes requirement to maintain database schemas.

• Faster store, access, and query operations.

• Pricing:-

• First 25 Amazon SimpleDB Machine Hours consumed per month free.

• $0.140 per Amazon SimpleDB Machine Hour consumed (Oct 6, 2010).
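The SimpleDB data model described above (domains as tables, items as rows, schema-less attributes, and multiple values per cell) can be mimicked with nested dictionaries and sets. This is a toy in-memory model, not the SimpleDB API; the `Domain` class, its methods, and the sample data are hypothetical.

```python
from collections import defaultdict


class Domain:
    """Toy SimpleDB-style domain: items (rows) hold multi-valued attributes."""

    def __init__(self):
        # item name -> attribute name -> set of values; no schema is declared
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item, attribute, value):
        # Adding a second value to the same cell keeps both values.
        self.items[item][attribute].add(value)

    def get(self, item, attribute):
        return sorted(self.items[item][attribute])

    def select(self, attribute, value):
        """Names of items whose attribute contains the given value."""
        return sorted(name for name, attrs in self.items.items()
                      if value in attrs.get(attribute, ()))


shirts = Domain()
shirts.put("item1", "color", "red")
shirts.put("item1", "color", "blue")   # second value in the same cell
shirts.put("item2", "color", "blue")
shirts.put("item2", "size", "M")       # attribute that item1 does not have
```

Here `item1` holds two colors in one cell and `item2` has an attribute that `item1` lacks, neither of which a fixed relational schema would allow directly.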


Programming on Amazon AWS and Microsoft AZURE cont..

• Microsoft Azure Programming Support :

• Components are shown in the figure; refer to textbook page 385.

• Underlying Azure fabric which consists of virtualized hardware together with sophisticated control environment.

• Implements dynamic assignment of resources, fault tolerance, DNS and monitoring capabilities.

• Automated service management allows service models to be defined by an XML template and multiple service copies to be instantiated on request.

• Azure storage :- stores event logs, trace/debug data, performance counters, IIS web server logs, crash dumps, and other log files.

• There is no debugging capability for running cloud applications.


Programming on Amazon AWS and Microsoft AZURE cont..

• Basic capabilities.

• Storage.

• Compute.

• Web role :- a customized VM (appliance) linked to the internet for Microsoft web hosting.

• Worker role :- schedules needed resources.

• Roles support HTTP(S) and TCP.

• Roles offer the following methods :-

• OnStart() method allows you to perform initialization tasks.

• OnStop() method is called when a role is to be shut down.

• Run() method contains the main logic.


Programming on Amazon AWS and Microsoft AZURE cont..

• SQL Azure :

• Offers SQL Server as a service.

• All the storage modalities are accessed with REST interfaces except drives.

• Drives :- recently introduced.

• Similar to Amazon EBS.

• Offers a file system interface as a durable NTFS volume.

• Drives are backed by blob storage.

• Storage is replicated three times for fault tolerance.


Programming on Amazon AWS and Microsoft AZURE cont..

• Basic storage system is built from blobs similar to S3.

• Blobs arranged in 3 level hierarchy

• Account → Containers → Page or Block Blobs.

• Containers similar to directories in traditional file systems.

• Account as root.

• Block blobs are used to stream data; each is a sequence of blocks of up to 4 MB.

• Each block has a 64 byte ID.

• Block blobs can be up to 200GB in size.

• Page blobs are for random read/write access.

• Array of pages with a maximum blob size of 1TB.

• Metadata can be associated with blobs as <name, value> pairs, up to 8 KB per blob.


Programming on Amazon AWS and Microsoft AZURE cont..

• Azure Tables :

• Azure Table and queue storage modes are for smaller data volumes.

• Queues provide reliable message delivery.

• Support work spooling between web and worker roles.

• 8KB limit on message size.

• A queue can consist of an unlimited number of messages.

• Azure supports PUT, GET, and DELETE message operations.

• Queues support CREATE and DELETE operations.

• Each account can have any number of Azure tables.

• A table consists of :-

• Rows, called entities.

• Columns, called properties.

Programming on Amazon AWS and Microsoft AZURE cont..

• All entities have up to 255 general properties.

• <name, type, value> triples.

• Two extra properties: PartitionKey and RowKey.

• RowKey gives a unique label to each entity.

• PartitionKey is designed to be shared; entities with the same PartitionKey are stored next to each other.

• Maximum storage for an entity is 1MB.

• For large values, store a link to a blob store in the Table property value.

• ADO.NET and LINQ support table queries.
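The role of PartitionKey and RowKey can be sketched with a dictionary of partitions. This is a toy in-memory model and not the Azure storage SDK: the `Table` class and its methods are hypothetical, and it assumes that a RowKey needs to be unique only within its partition.

```python
from collections import defaultdict


class Table:
    """Toy Azure-style table: entities grouped by PartitionKey, labelled by RowKey."""

    def __init__(self):
        # PartitionKey -> {RowKey: properties}; one partition is stored together
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, **properties):
        if len(properties) > 255:
            raise ValueError("an entity has at most 255 general properties")
        self.partitions[partition_key][row_key] = properties

    def get(self, partition_key, row_key):
        return self.partitions[partition_key][row_key]

    def query_partition(self, partition_key):
        """Return all entities of one partition, ordered by RowKey."""
        rows = self.partitions.get(partition_key, {})
        return [rows[k] for k in sorted(rows)]


orders = Table()
orders.insert("customer-1", "0001", item="disk", qty=2)
orders.insert("customer-1", "0002", item="cable", qty=5)
orders.insert("customer-2", "0001", item="rack", qty=1)
```

Choosing the customer as PartitionKey keeps all of one customer's orders next to each other, so reading them back touches a single partition.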


Emerging Cloud Software Environments

• Open Source Eucalyptus and Nimbus :

• A software platform developed by Eucalyptus Systems, Inc. (started 2008, stable release 2010)

• Written in Java and C, runs on Linux, can host Linux and Windows VMs

• Uses hypervisors (Xen, KVM and VMware) and is compatible with EC2 and S3 services

• Eucalyptus stands for “Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems”.

• For use in developing IaaS-style private cloud or hybrid cloud on computer cluster, working with AWS API

• License : Proprietary or GPLv3 for open-core enterprise edition and also an open-source edition available

• Web site: www.eucalyptus.com


Emerging Cloud Software Environments cont..

• Eucalyptus Architecture :

Emerging Cloud Software Environments cont..

• Open software environment.

• Supports cloud programmers in VM image management.

• Supports both compute cloud and storage cloud.

• VM Image Management :

• Takes many design cues from EC2.

• Similar image management system.

• Stores images in Walrus.

• Walrus :- block storage system similar to S3.

• Users can bundle their own root file system, upload and register the image, and link it with a particular kernel and ramdisk image.


Emerging Cloud Software Environments cont..

• Images are uploaded into a user-defined bucket within Walrus and can be retrieved anytime from any availability zone.

• Users need to create special virtual appliances and deploy them on Eucalyptus.

• http://en.wikipedia.org/wiki/Virtual_appliance

• Available in both commercial proprietary and open source versions.

Emerging Cloud Software Environments cont..

• Nimbus :

Emerging Cloud Software Environments cont..

• Set of open source tools that together provide an IaaS cloud computing solution.

• Allows client to lease remote resources by deploying VMs on those resources and configuring them.

• Offers special web interface known as Nimbus Web.

• Provides administrative and user functions in a friendly interface.

• Cumulus :- a storage cloud implementation tightly integrated with other central services.

• Compatible with Amazon S3 REST API.

• Additional feature: quota management.

• Nimbus cloud client uses the Java JetS3t library to interact with Cumulus.


Emerging Cloud Software Environments cont..

• Two resource management strategies supported by Nimbus :-

• Default “resource pool” mode.

• Service has direct control of a pool of VM manager nodes.

• “pilot” mode.

• Service makes request to a cluster’s Local Resource Management System (LRMS) to get a VM manager available to deploy VMs.

• Nimbus also provides an implementation of Amazon’s EC2 interface.


Emerging Cloud Software Environments cont..

• OpenNebula :

• Open source toolkit which allows users to transform existing infrastructure into an IaaS cloud with cloud-like interfaces.

Emerging Cloud Software Environments cont..

• Main features of OpenNebula :-

Feature | Function
Internal Interface | Unix-like CLI for full management of VM life-cycle and physical boxes; XML-RPC API and libvirt virtualization API
Scheduler | Requirement/rank matchmaker allowing the definition of workload and resource-aware allocation policies; support for advance reservation of capacity through Haizea
Virtualization Management | Xen, KVM, and VMware; generic libvirt connector (VirtualBox planned for 1.4.2)
Image Management | General mechanisms to transfer and clone VM images
Network Management | Definition of isolated virtual networks to interconnect VMs
Service Management and Contextualization | Support for multi-tier services consisting of groups of inter-connected VMs, and their auto-configuration at boot time
Security | Management of users by the infrastructure administrator
Fault Tolerance | Persistent database backend to store host and VM information
Scalability | Tested in the management of medium scale infrastructures with hundreds of servers and VMs (no scalability issues have been reported)
Installation | Installation on a UNIX cluster front-end without requiring new services; distributed in Ubuntu 9.04 (Jaunty Jackalope)
Flexibility and Extensibility | Open, flexible and extensible architecture, interfaces and components, allowing its integration with any product or tool

Emerging Cloud Software Environments cont..

• The Core :-

• Request manager: Provides an XML-RPC interface to manage and get information about ONE (OpenNebula) entities.

• SQL Pool: Database that holds the state of ONE entities.

• VM Manager (virtual machine): Takes care of the VM life cycle.

• Host Manager: Holds handling information about hosts.

• VN Manager (virtual network): This component is in charge of generating MAC and IP addresses.


Emerging Cloud Software Environments cont..

• The tools layer :-

• Scheduler:

– Searches for physical hosts to deploy newly defined VMs

• Command Line Interface:

– Commands to manage OpenNebula.

– onevm: Virtual Machines

• create, list, migrate…

– onehost: Hosts

• create, list, disable…

– onevnet: Virtual Networks

• create, list, delete


Emerging Cloud Software Environments cont..

• The drivers layer :-

• Transfer Driver: Takes care of the images.

– cloning, deleting, creating swap image…

• Virtual Machine Driver: Manager of the lifecycle of a virtual machine

– deploy, shutdown, poll, migrate…

• Information Driver: Executes scripts in physical hosts to gather information about them

– total memory, free memory, total cpus, cpu consumed…


Emerging Cloud Software Environments cont..

• When local resources are insufficient, OpenNebula can support a hybrid cloud model by using cloud drivers to interface with external clouds.

• Leads to high availability (HA).

• Currently includes an EC2 driver, which submits requests to Amazon EC2 and Eucalyptus.


Emerging Cloud Software Environments cont..

• Sector/Sphere :

• Software platform that supports very large distributed data storage and simplified distributed data processing over large clusters of commodity computers.

• Storage and processing can take place either within a single data center or across multiple data centers.

• Sector: Distributed File System

• Sphere: Simplified Parallel Data Processing Framework

• Goal: handling big data on commodity clusters

• Open source software, BSD license, written in C++.

• Started in 2006, current version 2.3

• http://sector.sf.net

• Architecture figure: refer to textbook page 391.

Emerging Cloud Software Environments cont..

• DFS designed to work on commodity hardware

– racks of computers with internal hard disks and high speed network connections.

• File system level fault tolerance via replication

• Support wide area networks

– Can be used for data collection and distribution.

• Security server :-

• User accounts, permission, IP access control lists.

• Use independent accounts, but connect to existing account database via a simple “driver”, e.g., Linux accounts, LDAP, etc.

• Single security server; the system continues to run when the security server is down, but new users cannot log in.


Emerging Cloud Software Environments cont..

• Master servers :-

• Maintain file system metadata

– Metadata is a customizable module; currently there are two implementations, one in-memory and one on disk.

• Authenticate users, slaves, and other masters (via security server).

• Maintain and manage file replication, data IO and data processing requests

– Topology aware.

• Multiple active masters can dynamically join and leave; load balancing between masters.


Emerging Cloud Software Environments cont..

• Slave nodes :-

• Store Sector files

– Sector file is not split into blocks.

– One Sector file is stored on the “native” file system (e.g., EXT, XFS, etc.) of one or more slave nodes.

• Process Sector data

– Data is processed on the same storage node, or nearest storage node possible.

– Input and output are Sector files.


Emerging Cloud Software Environments cont..

• Clients :-

• Sector file system client API

– Access Sector files in applications using the C++ API.

• Sector system tools

– File system access tools.

• FUSE

– Mount Sector file system as a local directory.

• Sphere programming API

– Develop parallel data processing applications to process Sector data with a set of simple APIs.

• The client communicates with slaves directly for data IO, via UDT.


Emerging Cloud Software Environments cont..

• UDT: UDP-based Data Transfer :-

• http://udt.sf.net

• Open source UDP based data transfer protocol

– With reliability control and congestion control.

• Fast, firewall friendly, easy to use.

• Already used in many commercial and research systems for large data transfer.


Emerging Cloud Software Environments cont..

• Files are not split into blocks

– Users are responsible to use proper sized files.

• Directory and File Family

– Sector will keep related files together during upload and replication.


Emerging Cloud Software Environments cont..

• Sphere :-

• Data parallel applications.

• Data is processed where it resides, or on the nearest possible node (locality).

• Same user defined functions (UDF) are applied on all elements (records, blocks, files, or directories).

• Processing output can be written to Sector files or sent back to the client.

• Transparent load balancing and fault tolerance.
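The Sphere processing model, in which one user-defined function is applied to every data segment on the node that stores it, can be sketched in a few lines. Sphere's real API is C++; the Python below is only a conceptual illustration with hypothetical names, and it runs the per-node loops sequentially where Sphere would run them in parallel on the slave nodes.

```python
def sphere_process(segments_by_node, udf):
    """Apply the same UDF to every segment, grouped by the node that stores it."""
    results = {}
    for node, segments in segments_by_node.items():
        # In Sphere this inner loop would execute on the slave node itself
        # (data locality); here it is simulated in a single process.
        results[node] = [udf(segment) for segment in segments]
    return results


def word_count(segment):
    """Example UDF: count the words in one text segment."""
    return len(segment.split())


output = sphere_process(
    {"slave1": ["a b c", "d"], "slave2": ["e f"]},
    word_count,
)
```

The caller only supplies the UDF and the input; where each segment is processed is decided by where the data already lives.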


OpenStack Community Today


Emerging Cloud Software Environments cont..

• OpenStack :

• Introduced by Rackspace and NASA in July 2010.

• An open source community spanning technologists, developers, researchers, and industry to share resources and technologies.

• Goal: creating a massively scalable and secure cloud infrastructure.

• Software is open source and limited to just open source APIs.

• Addresses compute and storage aspects.

• OpenStack Compute and OpenStack Storage solutions.


Emerging Cloud Software Environments cont..

• The pieces of OpenStack :-


OpenStack Compute (Nova)

OpenStack Object Storage (Swift)

OpenStack Image Service (Glance)

Emerging Cloud Software Environments cont..

• OpenStack Compute :

• Nova :- OpenStack is developing a cloud computing fabric controller, a component of an IaaS system.

• Architecture built on the concepts of shared-nothing and message-based information exchange.

• Communication in Nova facilitated by message queues.

• Shared-nothing :- the overall system state is kept in a distributed data system.

• Updates are made consistent through atomic transactions.

• Implemented using Python.

• Architecture figure: refer to textbook page 392.


Emerging Cloud Software Environments cont..

• Supports external libraries and components.

• Boto, Amazon API provided in Python, and Tornado, a fast HTTP server used to implement the S3 capabilities in OpenStack.

• Cloud controller :- maintains global state of the system.

• Ensures authorization while interacting with user manager via Lightweight Directory Access Protocol (LDAP).

• Interacts with S3 service.

• Manages nodes as well as storage workers through a queue.


Emerging Cloud Software Environments cont..

• Integrates networking components to manage private networks, public IP addressing, virtual private network (VPN) connectivity, firewall rules.

• Includes following types :-

• NetworkController :- manages address and virtual LAN (VLAN) allocations.

• RoutingNode :- governs the NAT (network address translation) conversion of public IPs to private IPs, and enforces firewall rules.

• AddressingNode :- runs Dynamic Host Configuration Protocol (DHCP) services for private networks.

• TunnelingNode :- provides VPN connectivity.


Emerging Cloud Software Environments cont..

• The network state (managed in the distributed object store) consists of the following:-

• VLAN assignment to a project.

• Private subnet assignment to a security group in VLAN.

• Private IP assignments to running instances.

• Public IP allocations to a project.

• Public IP associations to a private IP/running instance.
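The five items of network state listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration: `ProjectNetworkState` and its methods are hypothetical names, and the first-free-address allocation policy is chosen only for simplicity.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ProjectNetworkState:
    """Hypothetical per-project network state, one field per item above."""
    vlan_id: int                                      # VLAN assigned to the project
    subnets: dict = field(default_factory=dict)       # security group -> private subnet
    private_ips: dict = field(default_factory=dict)   # instance id -> private IP
    public_ips: set = field(default_factory=set)      # public IPs allocated to the project
    associations: dict = field(default_factory=dict)  # public IP -> instance id

    def assign_private_ip(self, group, instance_id):
        """Give the instance the first unused host address in the group's subnet."""
        subnet = ipaddress.ip_network(self.subnets[group])
        used = set(self.private_ips.values())
        for host in subnet.hosts():
            if str(host) not in used:
                self.private_ips[instance_id] = str(host)
                return str(host)
        raise RuntimeError("subnet exhausted")

    def associate_public_ip(self, public_ip, instance_id):
        """Bind an allocated public IP to a running instance."""
        if public_ip not in self.public_ips:
            raise ValueError("public IP not allocated to this project")
        if instance_id not in self.private_ips:
            raise ValueError("instance has no private IP")
        self.associations[public_ip] = instance_id
```

In Nova this state lives in the distributed object store rather than in a single Python object; the sketch only shows how the pieces relate to each other.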


Emerging Cloud Software Environments cont..

• OpenStack Storage :

• Built around a number of interacting components and concepts including a proxy server, a ring, an object server, a container server, an account server, replication, updaters, and auditors.

• Proxy server :- enables lookups to the accounts, containers, or objects in OpenStack storage rings and routes the requests.

• Ring :- represents a mapping between names of entities stored on disk and their physical location.

• Separate rings for accounts, containers, and objects exist.

• A ring includes the concepts of using zones, devices, partitions, and replicas.

• These concepts make handling failures easier, since replicas can be isolated from one another in separate zones.
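The ring idea (name to partition to devices, with replicas spread across zones) can be sketched as follows. This is a simplified illustration, not OpenStack's ring-building algorithm: the class `Ring`, the round-robin device selection, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class Ring:
    """Hypothetical ring: maps entity names to partitions, partitions to devices."""

    def __init__(self, devices, partition_count=16, replicas=3):
        # devices is a list of (device_name, zone) pairs.
        zones = {zone for _, zone in devices}
        if len(zones) < replicas:
            raise ValueError("need at least one zone per replica")
        self.partition_count = partition_count
        self.assignments = [
            self._pick(devices, p, replicas) for p in range(partition_count)
        ]

    @staticmethod
    def _pick(devices, partition, replicas):
        """Choose one device per replica, each from a different zone."""
        chosen, seen_zones = [], set()
        for offset in range(len(devices)):
            name, zone = devices[(partition + offset) % len(devices)]
            if zone not in seen_zones:
                chosen.append(name)
                seen_zones.add(zone)
            if len(chosen) == replicas:
                break
        return chosen

    def partition_for(self, entity_name):
        """Hash the entity name to a partition number."""
        digest = hashlib.sha256(entity_name.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.partition_count

    def devices_for(self, entity_name):
        """Return the devices that hold the replicas of this entity."""
        return self.assignments[self.partition_for(entity_name)]
```

Because each replica of a partition sits in a different zone, losing one zone (a drive, a server, or a rack, depending on how zones are defined) still leaves other replicas reachable.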


Emerging Cloud Software Environments cont..

• Manjrasoft Aneka Cloud and Appliances :

• What is Aneka?

• Cloud application platform developed by Manjrasoft, based in Melbourne, Australia.

• www.manjrasoft.com

• Designed to support rapid development and deployment of parallel and distributed applications on private or public clouds.

• Service Oriented Architecture (SOA).

• Provides a runtime environment and set of APIs.

• Choice for flexible, extensible .NET enterprise Cloud application and deployment.



Emerging Cloud Software Environments cont..

• Aneka Meaning : many, in many ways, many in one…


• Designed to be a configurable middleware with the aim of supporting an open-ended set of abstractions for distributed computing and deployment scenarios.

• This means :-

• Multiple programming/deployment models.

• Multiple scheduling strategies.

• Multiple authentication models.

• Multiple persistence backends.

• Multiple platforms and OSs.

Emerging Cloud Software Environments cont..

• Aneka acts as a workload distribution and management platform for accelerating applications in both Linux and Microsoft .NET framework environments.

• Advantages with respect to workload distribution :

• Support of multiple programming and application environments.

• Simultaneous support of multiple runtime environments.

• Rapid deployment tools and framework.

• Ability to harness multiple virtual and/or physical machines for accelerating application provisioning based on users’ Quality of Service/service-level agreement (QoS/SLA) requirements.

• Built on top of Microsoft .NET framework, with support for Linux environments.


Emerging Cloud Software Environments cont..

• Offers 3 types of capabilities :-

• Build, Accelerate, Manage.

• Build :- includes a new SDK that combines APIs and tools to enable users to rapidly develop applications.

• Allows users to build different runtime environments, such as an enterprise/private cloud.

• Achieved by harnessing compute resources in network or enterprise data centers.


Emerging Cloud Software Environments cont..

• Accelerate :- supports rapid development and deployment of applications in multiple runtime environments running different OSs such as Windows or Linux/UNIX.

• Uses physical machines to achieve maximum utilization in local environments.

• When local resources are insufficient to meet the QoS parameters set by users, supports dynamic leasing of extra capabilities from public clouds like EC2.
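The leasing decision can be illustrated with a small calculation. The sketch assumes equal-length tasks, one task per node at a time, and a deadline as the only QoS parameter; the function `nodes_to_lease` and its parameters are hypothetical and are not part of Aneka's API.

```python
import math


def nodes_to_lease(task_count, minutes_per_task, deadline_minutes, local_nodes):
    """Return how many extra nodes must be leased to finish before the deadline."""
    # Number of tasks a single node can complete before the deadline.
    tasks_per_node = deadline_minutes // minutes_per_task
    if tasks_per_node <= 0:
        raise ValueError("a single task cannot finish before the deadline")
    required_nodes = math.ceil(task_count / tasks_per_node)
    # Local machines are used first; only the shortfall is leased.
    return max(0, required_nodes - local_nodes)
```

With 100 tasks of 10 minutes each, a 60-minute deadline, and 10 local nodes, each node completes 6 tasks, so 17 nodes are required and 7 would be leased. With only 30 tasks, the local nodes suffice and nothing is leased.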


Emerging Cloud Software Environments cont..

• Manage :-

• Management tools include a GUI, and APIs to set up, monitor, manage, and maintain remote and global Aneka compute clouds.

• Accounting mechanism manages priorities and scalability based on SLA/QoS which enables dynamic provisioning.

• Important programming models supported by Aneka for both cloud and traditional parallel applications :

1. Thread programming model.

2. Task programming model.

3. MapReduce programming model.
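The three models can be contrasted with a short sketch in standard Python. Aneka's own APIs are .NET based, so the names below (`run_tasks`, `map_reduce`, `word_mapper`, `count_reducer`) are hypothetical and only illustrate the ideas: threads as the execution unit, tasks as independent units of work, and MapReduce as map, group by key, then reduce.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_tasks(func, inputs, workers=4):
    """Task model: run independent units of work on a pool of threads.

    Results are returned in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, inputs))


def map_reduce(documents, mapper, reducer):
    """MapReduce model: map each input to (key, value) pairs, group, reduce."""
    grouped = defaultdict(list)
    for pairs in run_tasks(mapper, documents):  # map phase runs as tasks
        for key, value in pairs:
            grouped[key].append(value)          # shuffle: group values by key
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(text):
    """Emit (word, 1) for every word in the text."""
    return [(word.lower(), 1) for word in text.split()]


def count_reducer(word, counts):
    """Sum the counts emitted for one word."""
    return sum(counts)
```

Calling `map_reduce(["a b a", "b c"], word_mapper, count_reducer)` counts the words across both inputs. In a cloud setting the tasks would be scheduled on remote nodes rather than on local threads.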




Emerging Cloud Software Environments cont..

• Aneka Architecture :

• Cloud platform features a homogeneous distributed environment for applications.

• Collection of physical and virtual nodes hosting the Aneka container.

• Interaction with hosting platform through PAL (Platform Abstraction Layer).

• PAL masks the heterogeneity of the different OSs.

• Supports all infrastructure related tasks.

• PAL and container together represent the hosting environment of services.
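The relationship between the container and the PAL can be sketched as an interface that the container depends on, so that OS-specific details stay behind it. The classes below (`PlatformAbstractionLayer`, `LocalPal`, `Container`) are hypothetical Python stand-ins; Aneka itself is built on .NET and its real interfaces are not shown in the source.

```python
import os
import platform
from abc import ABC, abstractmethod


class PlatformAbstractionLayer(ABC):
    """Hypothetical PAL: the only view of the host that the container sees."""

    @abstractmethod
    def os_name(self):
        """Return the name of the host operating system."""

    @abstractmethod
    def cpu_count(self):
        """Return the number of CPUs available on the host."""


class LocalPal(PlatformAbstractionLayer):
    """PAL backed by the machine this code runs on."""

    def os_name(self):
        return platform.system()

    def cpu_count(self):
        return os.cpu_count() or 1


class Container:
    """Hypothetical container: hosts services and queries the node via the PAL."""

    def __init__(self, pal):
        self.pal = pal

    def describe_node(self):
        """Collect node information without any OS-specific code."""
        return {"os": self.pal.os_name(), "cpus": self.pal.cpu_count()}
```

`Container(LocalPal()).describe_node()` returns the same shape of result on Windows and Linux, since the container never calls OS-specific functions directly.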


Emerging Cloud Software Environments cont..

• Categories of services are :-

• Fabric Services :-

• Implements fundamental operations of the infrastructure of the cloud.

• Services include high availability (HA) and failover for improved reliability, node membership and directory, resource provisioning, performance monitoring, and hardware profiling.

• Foundation Services :-

• Comprises core functionalities of Aneka middleware.

• Provides basic set of capabilities to enhance application execution in the cloud.

• Services include storage management, resource reservation, reporting, accounting, billing, services monitoring, and licensing.


Emerging Cloud Software Environments cont..

• Application Services :-

• Execution of applications.

• Provides appropriate runtime environment for each application model.

• Leverage foundation and fabric services for several tasks of application execution, such as elastic scalability, data transfer, performance monitoring, accounting, and billing.

• Virtual Appliances :-

• Refer to textbook page 398.

