19
Software Engineering betrieblicher Informationssysteme (sebis) Fakultät für Informatik Technische Universität München wwwmatthes.in.tum.de Prateek Bagrecha, Garching, 12.02.2018, Advisor: Manoj Mahabaleshwar Implementation of an exploratory workbench for identifying similar design decisions 1

Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Software Engineering betrieblicher Informationssysteme (sebis)

Fakultät für Informatik

Technische Universität München

wwwmatthes.in.tum.de

Prateek Bagrecha, Garching, 12.02.2018, Advisor: Manoj Mahabaleshwar

Implementation of an exploratory workbench for identifying similar design decisions

1

Page 2: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Introduction

Research Questions

Requirements

System Design

Process Overview

Implementation Overview

Evaluation

Lessons Learned

Agenda

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 2

Page 3: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 3

Introduction: Design Decisions

In software architecture, Design Decisions are decisions that address

architecturally significant requirements. They are

Hard to make

Costly to Change

Often influence similar concerns Reuse ?

Could knowledge about past decisions be used to make new informed decisions ?

Page 4: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Example: Comparing Two Decisions

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 4

Issues SPARK-8321 SPARK-19625

Description Authorization Support(on all

operations not only DDL) in

Spark Sql

Authorization Support(on all

operations not only DDL) in Spark

Sql version 2.1.0

Concepts Apache, SQL, authentication Apache, SQL, authentication

Keywords Spark, operations, Support,

Authorization

Spark, operations, Support,

Authorization

Components SQL Spark Core, SQL

Issue Type Improvement Improvement

Created 12/Jun/15 03:34 16/Feb/17 09:36

Resolved 16/Jun/16 08:22 24/Mar/17 01:21

Page 5: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

What are the functional and non-functional requirements of a workbench that supports

identifying similar design decisions?

How to identify similar design decisions using context-aware similarity measures and

clustering analysis?

How can a workbench support end-users in identifying the contextual parameters that are

necessary for identifying similar design decisions ?

Research Questions

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 5

Page 6: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Workbench to provide both UI & Restful APIs

for creating & configuring experiments for clustering design decisions

to input new design concern and predict similar past design decisions

Workbench shall abstract all operations related to identifying similar design decisions

Automated import of data from SocioCortex & Amelie knowledge base

Import different data formats

Extension points for using multiple machine learning libraries

Workbench is extensible without significant impact to system design

Requirements

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 6

Page 7: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

System Design

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 7

Page 8: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Process Overview

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 8

Page 9: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Implementation Overview

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 9

Page 10: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

DEMO

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 10

Page 11: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Evaluation Steps

For each design decision from the training dataset

Mark related design decisions (related to, parent tasks, duplicates etc. )

Run Predict Pipeline

Check if the returned results contains related design decision

Datasets

2 Open Source Projects : Apache Solr & Hadoop Common

1 Component Based Cross Project Decisions

Evaluation Strategy

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 11

Page 12: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Evaluation Results

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 12

Project Doc Key Sample Results Cosine

Similarity

Jaccard

Similarity

Duplicate Related (Related

To/Part of/Depends

on)

Solr SOLR-236 SOLR-1311 99.83 10.18 No Yes

SOLR-237 99.42 26.51 No Yes

Solr SOLR-373 SOLR-7986 99.41 60.00 Yes No

SQL

Component

CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes

CARBONDATA-503 93.20 28.57 No Yes

Duplicates with relatively high cosine similarity and Jaccard similarity

Related issues (related to, sub-tasks, duplicated by, parent etc. )

Industrial Impacts

Connected Mobility Lab, Siemens

Do not maintain related issues

Digital Factory- Motion Control, Siemens

Expert Recommendation

Page 13: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

No two machine learning libraries are the same

Different representation of ml models

Different representation of results

Occurrence of distinct decisions in the same cluster Model Tuning & Retraining

Low number of related design decisions across projects

Inability to recognize some related words for example: upsert is related to update.

Lessons Learnt

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 13

Page 14: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018)

Thank you

14

Page 15: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Implement cross functional pipelines working with different libraries within a single

pipeline

Custom implementation of clustering algorithm that supports cosine similarity as distance

measure

Support soft clustering mechanism

Pipeline retraining & model tuning

Extended visualization of results corresponding to clustering algorithms

Further evaluation of the workbench

Future Works

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 15

Page 16: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Evaluation ResultsPerformance

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 16

Project Document

Size (KB)

Members Training

Time

# of

Clusters

Average

cluster size

SocioCortex 603 726 18.93s 20 37

Apache Solr 6411 6175 1.2mins 30 206

Hadoop Commons 4024 6262 46.55s 20 313

SQL Component 14107 10069 1.8 30 334

Page 17: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Motivation

By reusing knowledge from past decisions

Documentation - specifying constraints on similar design decisions

Communication - visual representation of related design decisions

Complexity - Inferring the complexity for addressing similar design decisions

Identifying similarities in design decisions (for an organization)

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 17

Page 18: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Evaluation

Implementation

Experimental Setup using tools

Rapid Miner WEKA

Literature Review

Modelling Design DecisionsClustering Analysis & Similarity Measures for

Textual Documents

Research Methodology

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 18

Page 19: Implementation of an exploratory workbench for · CARBONDATA-440 CARBONDATA-504 93.20 28.57 No Yes CARBONDATA-503 93.20 28.57 No Yes Duplicates with relatively high cosine similarity

Helpful if second reporter could have been informed about the similar design

decision made in past

Reduced time to analyse

Reduced time to resolution

Reduced turn-around time for expert feedback

Exmaple: End User Perspective

Implementation of an exploratory workbench for identifying similar design decisions, Prateek Bagrecha (© Florian Matthes, 2018) 19

Given an open design decision, search the knowledge base for similar earlier made design decisions.