8/17/2019 Week 12 Student(1)
1/111
WEEK 12
Dr. A. Brennan
NOTE:
Efficient Physical DB design produces technical
specifications to be used during the DB
implementation phase
For efficient physical DB design, certain information needs to be gathered:
• Normalised relations with estimates of table volume (number of rows in each table)
• Attribute (field) definitions and possible maximum length
• Descriptions of data usage (when and where data are entered, deleted, retrieved, updated, etc.)
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• What DBMS technology will be used to implement the database
• What DB architecture to use
Once this information is gathered, the designer has to decide on a range of issues:
• Suitable storage format (i.e. data types) for each attribute (in order to minimise storage space and maximise data integrity)
• Grouping attributes from the logical model into physical records (denormalisation)
• File organisation (arranging similarly structured records in secondary memory for the purpose of storage, fast and efficient retrieval and update, and protection of data and its recovery after errors are found)
• Query optimisation
Physical Design
What is it?
Translate the logical description of data into technical specifications for storing and retrieving data.
Why?
Good performance, database integrity, security and recoverability.
Input and Output for Physical Design

Inputs:
• Normalised relations with estimates of table volume (number of rows in each table)
• Attribute definitions (and possible maximum length)
• Descriptions of data usage (when and where data are entered, retrieved, deleted, updated)
• Response time expectations
• Data security needs
• Backup/recovery needs
• Integrity expectations
• Description of DBMS technology

Outputs (design decisions):
• Suitable storage format (data type) for each attribute in the logical data model, in order to minimise storage space and maximise data integrity
• Grouping attributes from the logical model into physical records
• File organisation
• Selection of indexes and database architectures for storing and connecting files to efficiently retrieve related data
• Query optimisation
Data Types
• CHAR – fixed-length character
• VARCHAR2 – variable-length character
• LONG – variable-length character data (up to 2 GB)
• NUMBER – positive/negative number
• DATE – actual date
• BLOB – binary large object (good for graphics, sound clips, etc.)
Goals
Data type
Goals = minimise storage space, represent all possible values, improve data integrity, support all data manipulations.
Data integrity controls
Default value, range control (constraints/validation rules), null value control (e.g. a PK cannot be null), referential integrity (an FK must match an existing PK value, or be null).
Integrity Controls
Default value – an assumed value if no explicit value is entered for an instance of the field (this reduces data entry time and helps prevent entry errors for the most common value).
Range control – imposes allowable value limitations (constraints or validation rules). This may be a numeric lower-to-upper bound, or a set of specific values. This approach should be used with caution, since the limits may change over time.
Null value control – allows or prohibits empty fields (e.g. each primary key must have an integrity control that prohibits a null value).
Referential integrity – a form of range control (and null value allowance) for foreign-key to primary-key match-ups. It guarantees that only an existing cross-referencing value is used.
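The four controls above can be sketched with SQLite from Python. The EMPLOYEE/DEPARTMENT tables, column names and salary bound here are illustrative assumptions, not from the slides:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE DEPARTMENT (
        DeptID   INTEGER PRIMARY KEY,   -- null value control: a PK cannot be null
        DeptName TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmpID  INTEGER PRIMARY KEY,
        Salary REAL DEFAULT 0 CHECK (Salary BETWEEN 0 AND 500000),  -- default + range control
        DeptID INTEGER REFERENCES DEPARTMENT(DeptID)                -- referential integrity
    )""")
conn.execute("INSERT INTO DEPARTMENT VALUES (1, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE (EmpID, DeptID) VALUES (10, 1)")  # Salary falls back to its default

# A row pointing at a non-existent DeptID is rejected by the FK constraint.
try:
    conn.execute("INSERT INTO EMPLOYEE (EmpID, DeptID) VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same CREATE TABLE clauses (DEFAULT, CHECK, NOT NULL/PRIMARY KEY, REFERENCES) exist in the major DBMSs.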
Physical Records
A physical record is a group of fields stored in adjacent secondary memory locations, retrieved and written together as a unit by a particular DBMS.
Scope:
• Efficient use of secondary storage (influenced by both the size of the record and the structure of the secondary storage)
• Data processing speed
Computer operating systems read data from secondary memory in units called pages. A page is the amount of data read or written by an operating system in one operation. The blocking factor is the number of physical records per page.
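The blocking-factor arithmetic can be shown directly; the 4 KB page size and 150-byte record length below are assumed figures for illustration:

```python
# Blocking factor sketch: how many physical records fit on one page.
page_size_bytes = 4096      # assumed OS page size
record_length_bytes = 150   # assumed fixed physical record length

blocking_factor = page_size_bytes // record_length_bytes  # records per page
wasted_bytes = page_size_bytes % record_length_bytes      # slack left over on each page
print(blocking_factor, wasted_bytes)
```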
Normalization
Normalization produces a logical database design that is structurally consistent and has minimal redundancy. Normalization forces us to understand completely each attribute that has to be represented in the database. This may be the most important factor contributing to the overall success of the system.
What is Denormalization?
Denormalization is a process of transforming normalised relations into unnormalised physical record specifications.
Denormalization can also refer to a process in which we combine two relations into one new relation, where the new relation is still normalized but contains more nulls than the original relations.
Denormalization
In addition, the following factors have to be
considered:
Application specific;
Denormalization may speed up retrievals but it
slows down updates
Size of tables
Coding
Answer
Efficient data processing (the second goal of physical record design, after efficient use of storage space) in most cases dominates the design process.
The speed of data processing depends on how close together the related data are.
Benefits and Possible Problems
Benefits:
• Can improve performance (speed), due to data duplication
Problems:
• Wasted storage space
• Data integrity/consistency threats
Denormalisation – How?
• Option one: Combine attributes from several logical relations together into one physical record in order to avoid doing joins (one-to-one, many-to-many, one-to-many)
• Option two: Partition a logical relation into several physical records (multiple tables)
• Option three: Data replication, or a combination of the two options above
Denormalisation – Option 1
1. Two entities with a one-to-one relationship
Mapping
Logical Model: Normalised Relations
SELECT * FROM EMPLOYEE, PARKING
WHERE EMPLOYEE.Employee-ID = PARKING.Employee-ID
Denormalisation – Option 1
1. Two entities with a one-to-one relationship
Try this!
[ER diagram: EMPLOYEE (EmployeePPS, Name, Address) — Manages — MANAGER (ManagerID, Expertise)]
EMPLOYEE(EmployeePPS, Name, Address, ManagerID)
MANAGER(ManagerID, Expertise)

SELECT * FROM EMPLOYEE, MANAGER
WHERE EMPLOYEE.ManagerID = MANAGER.ManagerID

Resulting record: ManagerID | Expertise | EmployeePPS | Name | Address
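A minimal sketch of this Option 1 denormalisation in Python with SQLite. The combined table name EMPLOYEE_MANAGER and the sample row are illustrative assumptions:

```python
import sqlite3

# Option 1 sketch: the two normalised relations are collapsed into one
# physical record, so the EMPLOYEE/MANAGER join disappears.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE_MANAGER (   -- hypothetical denormalised table
        EmployeePPS TEXT PRIMARY KEY,
        Name        TEXT,
        Address     TEXT,
        ManagerID   TEXT,
        Expertise   TEXT              -- copied from MANAGER; may be NULL
    )""")
conn.execute("INSERT INTO EMPLOYEE_MANAGER VALUES ('123A', 'Ann', 'Cork', 'M1', 'Databases')")

# One single-table lookup replaces the two-table join:
row = conn.execute(
    "SELECT Name, Expertise FROM EMPLOYEE_MANAGER WHERE EmployeePPS = '123A'").fetchone()
print(row)
```

The cost is the redundancy: each employee row now repeats the manager's Expertise.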
Denormalisation – Option 1
2. Many-to-many relationship (associative entity) with non-key attributes
Denormalisation – Option 1
Physical Model: Denormalised Relation
Denormalisation – Option 1
3. One-to-many relationship
Logical Model: Normalised Relations Resulting from One-to-Many (1:M) Relationship
Physical Model: Denormalised Relation
Denormalisation – Option 2
Option 2: Partitioning of a logical relation into multiple tables
Horizontal partitioning – places different rows of a table into several physical files, based on common column values.
Vertical partitioning – distributes the columns of a table into several separate files, repeating the primary key in each one of them.
Vertical partitioning
CUSTOMER (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, CreditLimit, SalesTaxRate, Fax)
is split by column, repeating the primary key, into:
CUSTOMERA (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, Fax)
CUSTOMERB (CustID, CreditLimit, SalesTaxRate)
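A sketch of the vertical split above with SQLite (only a few of the slide's columns, and made-up sample data). The repeated primary key is what lets a join reconstruct the full CUSTOMER row:

```python
import sqlite3

# Vertical partitioning sketch: CUSTOMER's columns are split into two
# tables, repeating the primary key CustID in each (names as on the slide).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERA (CustID INTEGER PRIMARY KEY, FirstName TEXT, Phone TEXT)")
conn.execute("CREATE TABLE CUSTOMERB (CustID INTEGER PRIMARY KEY, CreditLimit REAL, SalesTaxRate REAL)")
conn.execute("INSERT INTO CUSTOMERA VALUES (1, 'Ann', '555-0100')")
conn.execute("INSERT INTO CUSTOMERB VALUES (1, 5000.0, 0.23)")

# The full customer row is reconstructed by joining on the repeated key:
row = conn.execute("""
    SELECT a.FirstName, b.CreditLimit
    FROM CUSTOMERA a JOIN CUSTOMERB b ON a.CustID = b.CustID
""").fetchone()
print(row)
```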
Horizontal partitioning
CUSTOMER (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, CreditLimit, SalesTaxRate, Fax)
is split by row into two tables with the same attributes:
CUSTOMERA-M (customers with last names A–M)
CUSTOMERN-Z (customers with last names N–Z)
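The horizontal split can be sketched the same way: rows are routed to one of the two tables by last-name range, and the original table is their union. Table and sample names are illustrative:

```python
import sqlite3

# Horizontal partitioning sketch: rows go to CUSTOMERA_M or CUSTOMERN_Z
# by the first letter of LastName; the full table is their UNION ALL.
conn = sqlite3.connect(":memory:")
for t in ("CUSTOMERA_M", "CUSTOMERN_Z"):
    conn.execute(f"CREATE TABLE {t} (CustID INTEGER PRIMARY KEY, LastName TEXT)")

def insert_customer(cust_id, last_name):
    table = "CUSTOMERA_M" if last_name[0].upper() <= "M" else "CUSTOMERN_Z"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (cust_id, last_name))

insert_customer(1, "Brennan")
insert_customer(2, "O'Connell")
full = conn.execute(
    "SELECT * FROM CUSTOMERA_M UNION ALL SELECT * FROM CUSTOMERN_Z").fetchall()
print(len(full))
```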
Advantages and disadvantages of partitioning
Advantages:
• Efficiency
• Local optimisation
• Recovery
Disadvantages:
• Slow retrieval
• Complexity
• Extra space and time for updates
Denormalisation – Option 3
Option 3: Data replication, or a combination of the other two options
Data replication – the same data is purposely stored in multiple locations of the database.
Data replication improves performance by allowing multiple users to access the same data at the same time with minimum contention.
Denormalisation Disadvantages
• The potential for loss of integrity is considerable
• Additional time is required to maintain consistency automatically every time a record is inserted, updated, or deleted
• Increase in storage space resulting from the duplication
Whose responsibility?
DBMS
Database Designer
File Organisation
1. Sequential File Organisation
2. Indexed File Organisation
3. Hashed File Organisation
Sequential File Organisation
The records are stored in sequence according to a primary key value. To locate a particular record, a program must scan the file from its beginning until the desired record is located.
https://www.youtube.com/watch?v=zDzu6vka0rQ
Therefore Indexes are most useful
for…
Larger tables
Attributes which are referenced in ORDER BY or
GROUP BY clauses
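SQLite's EXPLAIN QUERY PLAN gives a quick way to see an index being chosen over a full scan. The table, column and index names here are illustrative; the same idea applies to attributes used in WHERE, ORDER BY or GROUP BY clauses:

```python
import sqlite3

# Index sketch: an equality search on an indexed attribute is answered
# via the index rather than a scan of the whole table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpID INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("CREATE INDEX idx_lastname ON EMPLOYEE(LastName)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT EmpID FROM EMPLOYEE WHERE LastName = 'Brennan'"
).fetchall()
print(plan)  # the plan detail mentions idx_lastname
```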
https://www.youtube.com/watch?v=h2d9b_nEzoA
Hashed File Organisation
The address of each record is determined using a hashing algorithm.
A hashing algorithm is a routine that converts a PK value into a record address.
A hash index table uses hashing to map a key into a location in an index, where there is a pointer to the data record matching the hash key.
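A toy illustration of the idea, not any real DBMS's hashing routine: the primary key is converted straight to a bucket address, so no scan or index walk is needed, and collisions chain within a bucket:

```python
# Hashed file organisation sketch with assumed sizes and a trivial hash.
N_BUCKETS = 8                      # assumed file size in buckets
buckets = [[] for _ in range(N_BUCKETS)]

def address(pk: int) -> int:
    return pk % N_BUCKETS          # the "hashing algorithm": PK -> record address

def store(pk, record):
    buckets[address(pk)].append((pk, record))   # collisions chain in the bucket

def fetch(pk):
    return next(r for k, r in buckets[address(pk)] if k == pk)

store(42, "record for key 42")
print(fetch(42))
```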
DB Architecture
Note
De-normalisation should only take place after a satisfactory level of normalisation has taken place.
Goal of Physical DB Design
The goal of physical DB design is to create technical specifications from the logical descriptions of data that will provide adequate data storage and performance and will ensure database integrity, security and recoverability.
DATA AND DATABASE
ADMINISTRATION
Data within the organisation
Data are a resource to be translated into
information
Data is constantly being produced and analysed to create even more data
Database use in the organisation
• Top management – strategic decision making, planning and policy
• Middle management – tactical decisions and planning
• Operational management – support company operations
[Diagram: MIS, DSS and TPS drawing on the shared database]
Management data
Two recognised roles
Data/database administration
Data administration:
• a planning and analysis function responsible for setting data policy and standards
• promoting the company's data as a competitive resource
• providing liaison support to systems analysts during application development
Database administration:
• operationally oriented
• responsible for day-to-day monitoring and management of the active database
• liaison and support during application development
Data administrator
• Data coordination – keep track of updates, responsibilities and interchange
• Data standards – e.g. naming standards
• Liaison with systems analysts and programmers, including design
• Training managers, users, developers
• Arbitration of disputes and usage authorization
• Documentation and internal publicity
• Promotion of data's competitive advantage
Database administrator
• Responsible for the day-to-day administration of the database
• Monitors performance to maximize efficiency
• Provides a central point for troubleshooting
• Monitors security and usage (audit log)
• Responsible for operational aspects of the data dictionary
• Carries out data and software maintenance
• Involved in database design
Database Administrator in DB Design
• Define conceptual schema – what data is to be held; what entities; what attributes
• Define internal schema – decide physical database design
• Liaise with users – ensure the data they need is available
• Define security needs
• Define backup and recovery
• Monitor performance – respond to changing requirements
A Summary of DBA Activities
• planning – end-user support
• organising – policies, procedures and standards
• testing – data security, privacy and integrity
• monitoring – data backup and recovery
• delivering – data distribution and use
(activities applied to the db service and db activity)
Tools for Database Administration
Information is kept about all corporate resources, including data. This "data about data" is termed metadata. The database which holds this metadata is the data dictionary.
Two types of data dictionary:
• stand-alone or passive
• integrated or active
Metadata in Access
Data Dictionary
Passive data dictionary:
• self-contained database
• all data about entities are entered into the dictionary
• requests for metadata information are run as reports and queries as necessary
Active data dictionary
Data dictionary: relationships
• Table construction – which attributes appear in which tables
• Security – which people have access to which databases or tables
• Impact of change – which programs might be affected by changes to which tables
• Physical residence – which tables or files are on which disks
• Program data requirements – which programs use which tables or files
• Responsibility – who is responsible for updating which databases or tables
Introducing a Database: Considerations
Three important aspects:
• technological: DBMS software and hardware
• managerial: administrative functions
• cultural: corporate resistance to change
Social impact of databases
Data collection is extensive
both voluntary and involuntary
Data is a commodity
DATABASE SECURITY
Security – types of threat
• Loss or corruption of data due to sabotage (external or internal)
• Loss or corruption of data due to error
• Disclosure of sensitive data
• Fraudulent manipulation of data
Threats to data security
Controlling unauthorised access
Physical access to building
Access to hardware
Monitor any unusual activity
Controlling unauthorised access
• Developing user profiles – care over decisions on what data and resources can be accessed (and type of access) for each end user
• User training and education
• Firewalls
• Encryption
• Plugging known security holes – using patches available for known problems
Developing user profiles
• Every user is given an identifier for authentication
• Users are given privileges to access data dependent on what is essential for their work (insert, update, delete)
• Most DBMSs provide an approach called Discretionary Access Control (DAC)
• The SQL standard supports DAC through the GRANT and REVOKE commands
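GRANT/REVOKE semantics can be sketched outside a DBMS with a simple privilege set; the user and object names below are made up, and a real DBMS does this internally via the SQL statements named above:

```python
# DAC sketch: privileges as (user, object, privilege) triples.
privileges = set()

def grant(user, obj, priv):
    privileges.add((user, obj, priv))       # mirrors SQL GRANT priv ON obj TO user

def revoke(user, obj, priv):
    privileges.discard((user, obj, priv))   # mirrors SQL REVOKE priv ON obj FROM user

def allowed(user, obj, priv):
    return (user, obj, priv) in privileges  # the access check at query time

grant("ann", "EMPLOYEE", "SELECT")
print(allowed("ann", "EMPLOYEE", "SELECT"))   # True
revoke("ann", "EMPLOYEE", "SELECT")
print(allowed("ann", "EMPLOYEE", "SELECT"))   # False
```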
DAC and MAC
• DAC has certain weaknesses, in that an unauthorized user can trick an authorized user into disclosing sensitive data
• An additional approach is required, called Mandatory Access Control (MAC)
• MAC is based on system-wide policies that cannot be changed by individual users
• Each database object is assigned a security class (e.g. secret)
• Each user is assigned a clearance for a security class
• Rules are imposed on reading and writing of database objects by users
• The SQL standard does not include support for MAC
Firewalls
A firewall controls network traffic.
Encryption
Encryption: encoding or scrambling data to make it unintelligible to those without the key.
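A deliberately toy XOR sketch, purely to illustrate "unintelligible without the key"; real systems use vetted ciphers such as AES, never this:

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the repeating key; applying the same key
    # again undoes the scrambling (XOR is its own inverse).
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"secret"
plaintext = b"salary=50000"
ciphertext = xor_cipher(plaintext, key)
print(ciphertext != plaintext)            # the stored form is scrambled
print(xor_cipher(ciphertext, key))        # the same key recovers the data
```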
Controlling loss of DP facilities
Redundancy
Virus protection
Disaster protection
Minimise error
Alert network managers to problems
Minor disruptions require on-going monitoring
Protect against error
• Educate all employees
• Reminders to save
• Should you overwrite existing files? Incorporate safety nets on deletion
• Include integrity checks on data: validation, cross checking, range checks, hash totals, check digits, batch totals
Software Invasion
Cruise virus
• attacks for profit
• exploits the network's weakest link – you
• attacks through the public domain
• waits to reach its target
• reports successful penetration
• delivers payload
Stealth virus
• encrypts and hides its tracks
Worm
• makes copies of itself
• transmits copies to other machines
• difficult to access to disable
Trojan horse
• looks like something else
• once launched, too late!
Trapdoor
• simulates regular entry, or bypasses normal security procedures
• difficult to detect that it has been run
Logic bomb
• event driven
Protecting against virus attacks
• Prepare a company policy on viruses
• Educate on the destructive power of viruses
• Control the source of software purchasing
• Ensure new or upgraded software is installed by the system administrator on a quarantined machine
• Control use of bulletin boards
• Install anti-virus software where necessary
• Make regular back-ups once software is opened – data and programs separately; store back-up copies off-site
• Be aware of software holes in systems software
How security can be compromised
Poor security management
Poor connections to the outside world
Shoddy system control
Human folly
Lack of security ethic
And the answer is:
Education
DISTRIBUTED DATABASE MANAGEMENT SYSTEMS
Distributed databases
Distributed database – a logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network.
Distributed DBMS (DDBMS) – the software system that permits the management of the distributed database and makes the distribution transparent to users:
• must perform all the functions of a centralized DBMS
• must handle all necessary functions imposed by the distribution of data and processing
Distributed processing/database
Distributed processing:
• shares data processing chores over sites using a communications network
• the database resides at one site only
Distributed database:
• each site has a data fragment, which might be replicated at other sites
• requires distributed processing
DDBMS
Advantages:
• Reflects organisational structure
• Faster data access and processing
• Improved communications in the organisation
• Reduced operating costs
• Improved share-ability and local autonomy
• Less danger of single-point failure
• Modular growth easier
Disadvantages:
• Complexity of management and control
• Security
• Integrity control more difficult
• Lack of standard comms. protocols for distributed databases
• Increased training costs
• Database design more complex
Characteristics of a DDBMS
• A collection of logically related shared data
• The data is split into a number of fragments
• Fragments may be replicated
• Fragments/replicas are allocated to sites
• Sites are linked by a communications network
• Data at each site is under the control of a DBMS
• The DBMS at each site can handle local applications autonomously
• Each DBMS participates in at least one global application
DDBMS features
• Application interface – to interact with end users or application programs and with other DBMSs
• Validation – to analyse data requests
• Transformation – to determine which data requests are distributed and which are local
• Query optimization – to find the best access strategy
• Mapping – to determine the location of fragments
• I/O interface
• Formatting – to prepare data for presentation
Distributed database design
Data fragmentation (divide):
• need to decide how to split the data into fragments
OR
Data replication (copy):
• a copy of a fragment (or all of it) may be held at several sites
THEN
Data allocation:
• need to decide where to locate those fragments and replicas; each fragment is stored at the site with "optimal distribution"
Data fragmentation
• Users work with views, so it is appropriate to work with subsets of data
• Cheaper to store data closest to where it is used
• May give reduced performance for global applications
• Integrity control may be difficult if data and functional dependencies are at different sites
• Data fragmentation must be done carefully
Data fragmentation
• Breaks a single object into two or more segments or fragments
• Each fragment can be stored at any site over a computer network
• Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor
Strategies for fragmentation
For successful fragmentation, we must ensure:
• completeness: each data item must appear in at least one fragment
• reconstruction: it should be possible to define a relational operation that will reconstruct the relation from its fragments
• disjointness: a data item appearing in one fragment should not appear in another
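The three rules can be checked mechanically over toy horizontal fragments of a relation; the customer rows below are made-up illustrations:

```python
# Fragmentation rules sketch: a relation split into two row fragments.
relation = {1: "Ann", 2: "Bob", 3: "Cara"}
fragment_a = {1: "Ann", 2: "Bob"}
fragment_b = {3: "Cara"}
fragments = [fragment_a, fragment_b]

# completeness: every item of the relation appears in some fragment
complete = all(any(k in f for f in fragments) for k in relation)
# reconstruction: a relational operation (here, union) rebuilds the relation
reconstructed = {**fragment_a, **fragment_b}
# disjointness: no item appears in more than one fragment
disjoint = not (fragment_a.keys() & fragment_b.keys())
print(complete, reconstructed == relation, disjoint)
```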
Strategies for data fragmentation
• Horizontal fragmentation – division of a relation into subsets (fragments) based on tuples (rows)
• Vertical fragmentation – division of a relation into attribute (column) subsets
• Mixed fragmentation – a combination of both
Data replication
• Storage of data copies at multiple sites served by a computer network
• Fragment copies can be stored at several sites to serve specific information requirements
• Can enhance data availability and response time
• Can help to reduce communication and total query costs
Data replication
Fully replicated database:
• stores multiple copies of each database fragment at multiple sites
• can be impractical due to the amount of overhead
Partially replicated database:
• stores multiple copies of some database fragments at multiple sites
• most DDBMSs are able to handle the partially replicated database well
Unreplicated database:
• stores each database fragment at a single site
• no duplicate database fragments
Data allocation
Data allocation is closely related to the way the database is fragmented; it leads to decisions on which data is stored where.
• Centralized – the entire database is stored at one site
• Partitioned/fragmented – the database is divided into several fragments and stored at several sites
• Replicated – copies of one or more database fragments (selective replication) are stored at several sites
Strategies for data allocation
BIG DATA, SMALL DATA
Big Data
Big data is the term for data sets so large and complex that it becomes difficult to process them using on-hand database management tools or traditional data processing applications.
We are collecting more data than ever:
• electronics enables us to do so (RFID)
• storage is cheap
We have streamlined our processes through normal channels:
• computing has enabled us to improve what we do, and businesses are looking for new ways to have a competitive edge
By looking at patterns in this data we can find out useful things.
From a McKinsey report …
$600 to buy a disk which can store all of the world's music
Internet of Things
• Ubiquitous broadband
• Reduction in connectivity costs
• RFID enables unique addressability
Increasingly, we are including sensors in everyday objects. These often have communicative capacity and link to their source through the internet.
Use of Big Data
We can gain additional information derivable from analysis of a single large set of related data (rather than a large number of small sets).
Correlations can be found which "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions".
The business case (McKinsey)
1. Big data can unlock significant value by making information transparent and usable at much higher frequency.
2. Organizations can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time.
The business case (McKinsey)
3. Big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
4. Sophisticated analytics can substantially improve decision-making.
5. Big data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
Big Data in Health
Big data is enabling a new understanding of the molecular biology of cancer. The focus has changed over the last 20 years from the location of the tumor in the body (e.g., breast, colon or blood) to the effect of the individual's genetics, especially the genetics of that individual's cancer cells, on her response to treatment and sensitivity to side effects. For example, researchers have to date identified four distinct cell genotypes of breast cancer; identifying the cancer genotype allows the oncologist to prescribe the most effective available drug first.
http://strata.oreilly.com/2013/08/cancer-and-clinical-trials-the-role-of-big-data-in-personalizing-the-health-experience.html
Big Data in banking
IBM's Watson can do analysis with "unstructured data" such as that found in e-mails, news reports, books and websites. Citigroup has hired Watson to help it decide what new products and services to offer its customers, and to try to cut down on fraud and look for signs of customers becoming less creditworthy. In most financial institutions the immediate use of big data is in containing fraud and complying with rules on money-laundering and sanctions.
• Big credit card companies are getting better at recognising patterns
• Solutions are getting cheaper – even for smaller banks
• Banks also use the data to sell products (e.g. insurance) by looking at the type of transactions customers make
http://www.economist.com/node/21554743
Some geospatial uses
The Climate Corporation, an insurance company, combines modern Big Data techniques, climatology and agronomics to analyse the weather's complex and multi-layered behaviour to help the world's farmers adapt to climate change.
McLaren's Formula One racing team uses Big Data to identify issues with its racing cars using predictive analytics, and takes corrective actions pro-actively. They spend 5% of their budget on telemetry. An F1 car is fitted with about 130 sensors. In addition to the engine sensors, video and GPS are used to work out the best line to take through each bend. The sensor data is helping in traffic smoothing, energy-optimising analysis and driver's direction determination. E.g. new Pirelli tyres this year meant teams had to watch for tyre wear, grip, and temperature under different weather conditions and tracks, relating all that to driver acceleration, braking and steering.
Some geospatial uses
Vestas Wind Systems is implementing a big data solution that is significantly reducing data processing time and helping to predict weather patterns at potential sites faster and more accurately, to increase turbine energy production. They currently store 2.8 petabytes in a wind library covering over 178 parameters, such as temperature, barometric pressure, humidity, precipitation, wind direction and wind velocity from the ground level up to 300 feet.
Nokia needed a technology solution to support the collection, storage and analysis of virtually unlimited data types and volumes. They leverage data processing and complex analyses in order to build maps with predictive traffic and layered elevation models, to source information about points of interest around the world, to understand the quality of phones and more. www.geospatialworld.net
More geospatial uses
US Xpress, a transportation solutions company, collects about a thousand data elements ranging from fuel usage to tyre condition to truck engine operations to GPS information, and uses this for optimal fleet management and to drive productivity, saving millions of dollars in operating costs. When an order is dispatched, it is tracked using an in-cab system installed on a DriverTech tablet with speech recognition capability. US Xpress constantly connects to the devices to monitor progress of the lorry. The video camera on the device could be used to check if the driver is nodding off. All the data collected is analysed in real time using geospatial data, integrated with driver data and truck telematics. They can minimise delays and ensure trucks are not left waiting when they arrive at a depot for maintenance. www.geospatialworld.net
Big data in the university
• Huddersfield University – linked library data to identify learning styles; now including lecture attendance records
• Purdue University, Indiana – when a student logs into a course website, they see a traffic light signal (and advice on how to move to green)
• University of Derby – VLE use, sports, car parking
• Loughborough University – analyses staff-student interaction
www.theguardian.com/education/2013/aug/05
Role of Cloud Computing
• Enables easier gathering, storage and processing of Big Data
• Cloud computing provides accessibility any time, any place
• Large-scale data gathering is possible from multiple locations
• Sharing of data is easier
• Large-scale storage
• Processing power is also available, with virtual machine provision to analyse data
• Can be utilised on an ad-hoc basis
Analysing Big Data
• Data mining – a blend of applied statistics and artificial intelligence: neural networks, cluster analysis, genetic algorithms, decision trees, support vector machines
• Analytics
• Machine learning
• Visualisation – interactive rather than static graphs help to understand patterns
• Shift of skills to digital analysis and visualisation techniques
Who interprets?
A new set of tools makes it easier to do a variety of data analysis tasks. Some require no programming, while other tools make it easier to combine code, visuals, and text in the same workflow. They enable users who aren't statisticians or data geeks to do data analysis. While most of the focus is on enabling the application of analytics to data sets, some tools also help users with the often tricky task of interpreting results. In the process users are able to discern patterns and evaluate the value of data sources by themselves, and only call upon expert data analysts when faced with non-routine problems.
http://strata.oreilly.com/2013/08/data-analysis-tools-target-non-experts.html
Issues
• Problems with algorithms can magnify misbehaviour (e.g. selection bias)
• Privacy and security – anonymity; profiling individuals
• Over-reliance on technology
• Need for skilled workers with "deep analytics" skills
www.internetofthings.eu
House Keeping
• Groups and group names
• Project distribution
• Weighting (65% exam : 35% CA)
• 35% CA = 28% project, 7% SQL CAs (approx.)
FINI