Software Connector Classification and Selection for Data-Intensive Systems. Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian. 2nd Intl. Workshop on Incorporating COTS Software into Software Systems


Software Connector Classification and Selection for Data-Intensive Systems

Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian

2nd Intl. Workshop on Incorporating COTS Software into Software Systems (IWICSS 2007)


Agenda

• Research Problem and Importance
• Our Approach
  – Classification
  – Selection
  – Analysis
• Evaluation
  – Precision, Recall, Accuracy Measurements
• Related Work
• Conclusion & Future Work


Research Problem and Importance

• Content repositories are growing rapidly in size

• At the same time, we expect more immediate dissemination of this data

• How do we distribute it…
  – In a performant manner?
  – Fulfilling system requirements?

[Figure: NASA Planetary Data System Archive Volume Growth; x-axis: Year (1990 to 2008); y-axis: TBytes (0 to 90); series: TB (Accum)]


Software Architecture

• The definition of a system in the form of its canonical building blocks
  – Software Components: the computational units in the system
  – Software Connectors: the communications and interactions between software components
  – Software Configurations: arrangements of components and connectors and the rules that guide their composition
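The three building blocks above can be sketched as a small data model. This is a minimal illustration, not part of the original talk: the class names, the `attach` method, and the single composition rule (components are only linked through a connector) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A computational unit in the system."""
    name: str


@dataclass(frozen=True)
class Connector:
    """Mediates communication and interaction between components."""
    name: str
    technology: str  # illustrative label only, e.g. "GridFTP"


@dataclass
class Configuration:
    """An arrangement of components and connectors plus a composition rule."""
    links: list = field(default_factory=list)  # (source, connector, target) triples

    def attach(self, source, connector, target):
        # Assumed composition rule for this sketch: components never talk
        # to each other directly; every link goes through a Connector.
        if not isinstance(connector, Connector):
            raise TypeError("components must be linked through a Connector")
        self.links.append((source, connector, target))

    def consumers_of(self, source):
        """Return every component that receives data from `source`."""
        return [target for src, _, target in self.links if src == source]


# One producer distributing data to four consumers over one connector.
producer = Component("Data Producer")
consumers = [Component(f"Data Consumer {i}") for i in range(1, 5)]
link = Connector("distribution", technology="GridFTP")

config = Configuration()
for consumer in consumers:
    config.attach(producer, link, consumer)
```

Swapping the `technology` label on the connector leaves the components untouched, which is the point of modeling the distribution technology as a connector.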


Data Distribution Systems

[Diagram: a Data Producer component sends data through a connector, marked "???", to four Data Consumer components]

Insight: Use software connectors to model data distribution technologies


Data Movement Technologies

• Wide array of available OTS “large-scale” connector technologies
  – GridFTP, Aspera software, HTTP/REST, RMI, CORBA, SOAP, XML-RPC, BitTorrent, JXTA, UFTP, FTP, SFTP, SCP, Siena, GLIDE/PRISM-MW, and more
• Which one is the best one?
• How do we compare them
  – Given our current architecture?
  – Given our distribution scenarios & requirements?


Research Question

• What types of software connectors are best suited for delivering vast amounts of data to users in hugely distributed data systems, in a manner that satisfies their particular scenarios and is both performant and scalable?


Data Distribution Problem Space


Broad variety of distribution connector families

• P2P, Grid, Client/Server, and Event-based

• Though each connector family varies slightly in some form or fashion
  – They all share 3 common atomic connector constituents
    • Data Access, Stream, Distributor
    • Adapted from Mehta et al.’s Connector Taxonomy


Connector Tradeoff Space

• Surveyed properties of 13 representative distribution connectors across all 4 distribution connector families, and classified them
  – Client/Server
    • SOAP, RMI, CORBA, HTTP/REST, FTP, UFTP, SCP, Commercial UDP Technology
  – Peer to Peer
    • BitTorrent
  – Grid
    • GridFTP, bbFTP
  – Event-based
    • GLIDE, Siena


Large Heterogeneity in Connector Properties

[Figure: four bar charts, one per connector type; y-axis: Num Connectors; bars grouped by property dimension]

• Procedure Call Connector Breakdown (5 connectors, 2 families)
  – Dimensions: proc_call_params_return_value, proc_call_cardinality_senders, proc_call_invocation_explicit, proc_call_params_invocation_record, proc_call_params_datatransfer, proc_call_accessibility, proc_call_semantics
  – Values shown: HTTP Response, RMI message, GridFTP message, SOAP message, CORBA message, one sender, Method Call, Globus Log Layer, HTTP Server log, RMI Registry, CORBA Name Registry, Web Server, value, reference, public, protected, private, one receiver, keyword
• Data Access Connector Breakdown (8 connectors, 4 families)
  – Dimensions: data_access_locality, data_access_persistence, data_access_avail_transient, data_access_cardinality_receivers, data_access_accesses, data_access_cardinality_senders
  – Values shown: Process, Global, Dynamic Data Exchange, Database Access, Repository Access, File I/O, Session-Based, Cache, Peer-Based, Many Receivers, One Receiver, Accessor, Mutator, Many Senders, One Sender
• Distributor Connector Breakdown (8 connectors, 4 families)
  – Dimensions: distributor_routing_membership, distributor_delivery_type, distributor_naming_type, distributor_naming_structures, distributor_routing_type, distributor_delivery_semantics, distributor_routing_path, distributor_delivery_mechanisms
  – Values shown: ad-hoc, bounded, RMI Message, GridFTP Message, SOAP Message, Event, HTTP Message, Peer Pieces, registry-based, attribute-based, Hierarchical, Flat, content-based, tcp/ip, architecture configuration, tracker, Exactly Once, At least once, Best Effort, dynamic, cached, static, Unicast, Multicast, Broadcast
• Stream Connector Breakdown (8 connectors, 4 families)
  – Dimensions: stream_formats, stream_cardinality_senders, stream_localities, stream_deliveries, stream_throughput, stream_cardinality_receivers, stream_state, stream_identity, stream_bounds, stream_synchronicity, stream_buffering
  – Values shown: Raw, Structured, Many Senders, One Sender, Remote, Local, Exactly Once, At least once, Best Effort, bps, Many Receivers, One Receiver, Stateful, Stateless, Named, Bounded, Asynchronous, Time Out Synchronous, Buffered


How do experts make these decisions?

• Performed survey of 33 “experts”
• Experts defined to be
  – Practitioners in industry, building data-intensive systems
  – Researchers in data distribution
  – Admitted architects of data distribution technologies
• General consensus?
  – They don’t know the how and the why about which connector(s) are appropriate
  – They rely on anecdotal evidence and “intuition”

[Figure: pie chart "Percentage Breakdown of Expert Responses"; categories: No Response, Not Comfortable, No Time, Full Response; slice values: 67%, 15%, 15%, 3%]

[Figure: pie chart "Expert Survey Demographic"; categories: Cancer Research, Planetary Science, Earth Science, Industry, Grid Computing, Professors, Web Technologies, Open Source, Students; slice values: 6%, 18%, 12%, 12%, 6%, 22%, 6%, 12%, 6%]

45% of respondents claimed to be uncomfortable being addressed as a data distribution expert.


Our Approach: DISCO

• Develop a software framework for:
  – Connector Classification
    • Build metadata profiles of connector technologies, describing their intrinsic properties (DCPs)
  – Connector Selection
    • Adaptable, extensible algorithm development framework for selecting the “right” connectors (and identifying wrong ones)
  – Connector Selection Analysis
    • Measurement of accuracy of results
  – Connector Performance Analysis


DISCO in a Nutshell


Building DCPs of all 13 connectors (Classification)

• Rely on Mehta et al. metadata to describe data distribution connectors

• Carefully select metadata to include/exclude
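A connector profile of this kind could be represented as a mapping from taxonomy attributes to the values a connector exhibits. The sketch below is only an assumption about the shape of such a profile: the `ConnectorProfile` class and its `supports` method are hypothetical, the attribute names are borrowed from the breakdown charts earlier in the talk, and the values assigned to GridFTP are illustrative placeholders rather than the published profile.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectorProfile:
    """Hypothetical metadata profile for one distribution connector."""
    name: str
    family: str
    # attribute name -> set of values the connector exhibits
    properties: dict = field(default_factory=dict)

    def supports(self, attribute, value):
        """True if the profile lists `value` for `attribute`."""
        return value in self.properties.get(attribute, set())


# Placeholder values for illustration only, not the real GridFTP profile.
gridftp = ConnectorProfile(
    name="GridFTP",
    family="Grid",
    properties={
        "stream_deliveries": {"Exactly Once"},
        "distributor_delivery_mechanisms": {"Unicast"},
        "data_access_persistence": {"File I/O"},
    },
)

print(gridftp.supports("stream_deliveries", "Exactly Once"))
print(gridftp.supports("distributor_delivery_mechanisms", "Broadcast"))
```

Keeping the profile as plain attribute/value metadata makes it straightforward to include or exclude individual attributes, which is the selection step the slide calls out.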


Develop complementary selection algorithms
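As a rough illustration of the white-box idea, a score-based selector can be sketched as a set of score functions, each mapping one scenario dimension to per-connector scores that are then summed and ranked. Everything here is an assumption made for the example: the dimension names (`total_volume_gb`, `num_users`), the thresholds, and the score values are hypothetical and are not taken from DISCO.

```python
def volume_score(total_volume_gb):
    """Hypothetical score function: favor bulk-transfer connectors for large volumes."""
    if total_volume_gb >= 100:
        return {"GridFTP": 1.0, "bbFTP": 0.8, "FTP": 0.3, "SOAP": 0.0}
    return {"GridFTP": 0.5, "bbFTP": 0.5, "FTP": 0.6, "SOAP": 0.6}


def users_score(num_users):
    """Hypothetical score function: favor web-style connectors for many users."""
    if num_users >= 1000:
        return {"GridFTP": 0.4, "bbFTP": 0.3, "FTP": 0.5, "SOAP": 0.7}
    return {"GridFTP": 0.6, "bbFTP": 0.6, "FTP": 0.5, "SOAP": 0.4}


# One score function per scenario dimension (assumed structure).
SCORE_FUNCTIONS = {
    "total_volume_gb": volume_score,
    "num_users": users_score,
}


def rank_connectors(scenario):
    """Sum the scores over all scenario dimensions and rank connectors, best first."""
    totals = {}
    for dimension, score_fn in SCORE_FUNCTIONS.items():
        if dimension not in scenario:
            continue  # skip dimensions the scenario does not specify
        for connector, score in score_fn(scenario[dimension]).items():
            totals[connector] = totals.get(connector, 0.0) + score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_connectors({"total_volume_gb": 500, "num_users": 10})
for connector, total in ranking:
    print(f"{connector}: {total:.2f}")
```

The approach is "white box" in the sense that each score function is an explicit, inspectable rule, so a ranking can be traced back to the dimensions that produced it.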


Preliminary Evaluation

• We developed 13 connector profiles
  – Based on literature, expert reviews, and our own development experience
• 30 distribution scenarios
• 24 score functions (white box) and Bayesian domain profiles with 100 conditional probabilities (black box)

[Diagram: evaluation pipeline; elements: Connector Profiles, Distribution Scenarios, DISCO, Score, Bayesian, Clustering (two instances), Answer Key, Precision-Recall Analysis]
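To make the black-box side concrete, the sketch below applies a naive Bayes-style update: a prior per connector is multiplied by conditional probabilities for each observed scenario value and then normalized. This is an assumed, simplified stand-in and not the algorithm from the talk; the two connectors, the dimension names, and every probability are hypothetical placeholders.

```python
# Hypothetical prior belief that each connector is the appropriate choice.
PRIOR = {"GridFTP": 0.5, "HTTP/REST": 0.5}

# Hypothetical domain profile:
# P(observed scenario value | connector is appropriate). Placeholder numbers.
LIKELIHOOD = {
    ("total_volume", "large"): {"GridFTP": 0.8, "HTTP/REST": 0.2},
    ("total_volume", "small"): {"GridFTP": 0.2, "HTTP/REST": 0.8},
    ("num_users", "many"): {"GridFTP": 0.4, "HTTP/REST": 0.6},
    ("num_users", "few"): {"GridFTP": 0.6, "HTTP/REST": 0.4},
}


def posterior(scenario):
    """Return P(connector | scenario), assuming independent scenario dimensions."""
    unnormalized = dict(PRIOR)
    for observation in scenario.items():  # (dimension, value) pairs
        for connector in unnormalized:
            unnormalized[connector] *= LIKELIHOOD[observation][connector]
    total = sum(unnormalized.values())
    return {connector: p / total for connector, p in unnormalized.items()}


ranking = posterior({"total_volume": "large", "num_users": "few"})
for connector, probability in sorted(ranking.items(), key=lambda kv: -kv[1]):
    print(f"{connector}: {probability:.3f}")
```

Unlike explicit score functions, the conditional probabilities encode expert or domain knowledge without stating the reasoning behind them, which is why the talk labels this approach black box.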


Precision-Recall Results

• Error Rate
  – Probability of incorrectly labeling a connector as appropriate for a scenario
• Precision
  – The fraction of selected connectors appropriate for a scenario
• Recall
  – Probability of detecting a connector as appropriate for a scenario

                      Bayesian   Score-based
True Positive (TP)    101        63
False Positive (FP)   25         200
True Negative (TN)    245        67
False Negative (FN)   19         60

                      Bayesian   Score-based
Error Rate            11.28%     32.56%
Precision             80.16%     48.46%
Recall                25.90%     16.15%
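The reported percentages can be checked against the confusion-matrix counts. The sketch below assumes error rate means the fraction of misclassified connector/scenario pairs, (FP + FN) / total, and precision means TP / (TP + FP); the function name and the `tp_share` key are hypothetical. Under these assumptions the Bayesian column reproduces the reported error rate and precision, and the reported recall matches TP / total rather than the textbook TP / (TP + FN).

```python
def selection_metrics(tp, fp, tn, fn):
    """Compute selection metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "total": total,
        "error_rate": (fp + fn) / total,   # assumed definition
        "precision": tp / (tp + fp),
        "tp_share": tp / total,            # matches the reported "Recall" row
        "recall": tp / (tp + fn),          # textbook recall, for comparison
    }


bayesian = selection_metrics(tp=101, fp=25, tn=245, fn=19)
for name in ("error_rate", "precision", "tp_share", "recall"):
    print(f"{name}: {bayesian[name]:.2%}")
```

Each column sums to 390 decisions, which equals 30 scenarios times 13 connectors. For the score-based column, the same formulas give the reported 32.56% and 48.46% only when the FP and TN entries are read as 67 and 200 respectively, for example `selection_metrics(tp=63, fp=67, tn=200, fn=60)`.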


Related Work


Conclusions & Future Work

• Conclusions
  – Domain experts (gurus) rely on tacit knowledge and often cannot explain design rationale
  – DISCO provides a quantification of & framework for understanding an ad hoc process
  – Bayesian algorithm has a higher precision rate
• Future Work
  – Explore the tradeoffs between white-box and black-box approaches
  – Investigate the role of architectural mismatch in connectors for data system architectures


Thank You!

Questions?