A Distributed Self-adaption Cube Building Model Based on Query Log
Chapter · January 2018 · DOI: 10.1007/978-3-319-74521-3_41



A Distributed Self-adaption Cube Building Model Based on Query Log

Meina Song, Mingkun Li, Zhuohuan Li, Haihong E, and Zhonghong Ou

Beijing University of Posts and Telecommunications, China
{mnsong, dangshazi, lizhuohuan, ehaihong, zhonghong.ou}@bupt.edu.cn

Abstract. Among the diverse distributed query and analysis engines, Kylin has gained wide adoption thanks to its various strengths. With Kylin, users can interact with Hadoop data at sub-second latency. However, it still has some disadvantages. One representative disadvantage is the exponential growth of cuboids along with the growth of dimensions. In this paper, we optimize the cuboid materialization strategy of Kylin by reducing the number of cuboids, building on traditional OLAP optimization methods. We optimize the strategy from two aspects. First, we propose a Lazy-Building strategy to delay the construction of nonessential cuboids and shorten cuboid initialization time. Second, we adopt a Materialized View Self-Adjusting Algorithm to eliminate cuboids that have not been used for a long period. Experimental results demonstrate the efficacy of the proposed Distributed Self-Adaption Cube Building Model: compared with Kylin's cube building model, cube initialization speed increases by 28.5 percentage points and 65.8 percent of space is saved.

Keywords: Distributed OLAP, Distributed Query Processing System, Kylin, Query Log, Materialization Strategy

1 Introduction

In the era of big data, many modern companies produce huge amounts of data in their service lines. These data are used to conduct report analysis based on OLAP. To do so, companies need a system that can respond to the queries of thousands of data analysts at the same time, which requires high scalability, stability, accuracy and speed. In fact, there is no widely-accepted method in the distributed OLAP field. Many query engines can also conduct report analysis, such as Presto [4], Impala [2], Spark SQL [14] or Elasticsearch [10], but they place more emphasis on general data query and analysis. As a matter of fact, Kylin [7] is the specialized tool in the distributed OLAP field that is most often used.

Kylin was originally developed by eBay and is now a project of the Apache Software Foundation. It is designed to accelerate analysis on Hadoop and allow the use of SQL-compatible tools. It provides a SQL interface and supports multidimensional analysis on Hadoop for extremely large datasets. Kylin can deliver sub-second or even millisecond OLAP analysis at scale, so it is very frequently used in China's IT industry.

The idea of Kylin is not original: many of its technologies have been used to accelerate analysis over the past 30 years. These technologies involve storing pre-calculated results, generating each level's cuboids from all possible combinations of dimensions, and calculating all metrics at different levels. Essentially, Kylin extends the methods of the traditional OLAP field to the distributed setting, generating cubes on the Hadoop ecosystem.

When data becomes big, pre-calculation becomes impossible even with powerful hardware. However, with the benefit of Hadoop's distributed computing power, calculation jobs can leverage hundreds of thousands of nodes [9]. This allows Kylin to perform these calculations in parallel and merge the final result, thereby significantly reducing the processing time.

Data cube [5] construction is the core of Kylin. It has two characteristics: one is the exponential growth of cuboids [5] along with the growth of dimensions; the other is the large amount of I/O due to the increased number of cuboids. The cube is usually very sparse, and the increase of sparse data wastes a lot of computing time and memory space.

A full n-dimensional data cube contains 2^n cuboids [5]. However, most cuboids are never used, because most queries requested by data analysts follow a normal distribution. That is a waste of I/O and memory.
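To make the exponential growth concrete: every subset of the n dimensions defines one cuboid, so a 4-dimensional cube already has 2^4 = 16 cuboids. A minimal sketch (dimension names are illustrative):

```python
from itertools import combinations

def cuboids(dims):
    """Enumerate every cuboid of a full cube: one per subset of dimensions."""
    for k in range(len(dims), -1, -1):        # layer N down to the apex (layer 0)
        for combo in combinations(dims, k):
            yield combo

all_cuboids = list(cuboids(("A", "B", "C", "D")))
print(len(all_cuboids))                       # 2^4 = 16
```

Adding a ninth dimension would double the count again, which is why partial materialization matters.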

In this paper, we propose a self-adaption cube building model which lazily builds cuboids and abandons useless cuboids based on the query log. It can greatly reduce cube construction time and cube size, saving I/O and memory. The paper is structured as follows. In Section II, we present the background. In Section III, we introduce the design and implementation details of the self-adaption cube building model. In Section IV, we focus on the experimental evaluation. Finally, in Section V, we discuss the Self-Adaption Cube Building Model and give a summary of the paper.

2 Background

2.1 Cube Calculation Algorithm

There are several data cube materialization strategies [11] that reduce the cost of aggregation calculation and increase query processing efficiency, including the iceberg cube calculation algorithm [3], the condensed cube calculation algorithm [15], the shell fragment cube calculation algorithm [13], the approximate cube calculation algorithm [17], and the time-series data stream cube calculation algorithm [6]. They are all based on Partial Materialization [16], which means that a data sub-cube is selected and pre-calculated according to specific methods. Partial Materialization is a compromise between storage space, cost of maintenance and query processing efficiency.


In the process of iceberg cube calculation, only sub-cubes whose aggregate values are higher than a minimum threshold are aggregated and materialized. Beyer proposed the widely-accepted BUC algorithm [12] for iceberg cube calculation.

According to the order of cuboid calculation, aggregation calculation methodologies can be divided into two categories: top-down and bottom-up.

1. Top-Down: First, calculate the metric of the whole data cube, then perform a recursive search along each dimension. Second, check the iceberg conditions and prune branches that do not meet them. The most typical algorithm is the BUC algorithm, which performs best on sparse data cubes.

2. Bottom-Up: Starting from the base cuboids, compute high-level cuboids from low-level cuboids in the search grid according to the parent-child relationship. Typical algorithms are the PipeSort algorithm, the PipeHash algorithm, the Overlap algorithm and the Multiway aggregation algorithm [18].

However, Kylin does not follow the principle of partial materialization. To reduce unnecessary redundant calculation and shorten cube construction time, Kylin adopts a method called By Layer Cubing, a distributed version of the PipeSort algorithm, which is a bottom-up algorithm [1].

2.2 By Layer Cubing

As its name indicates, a full cube is calculated layer by layer: N dimensions, N-1 dimensions, N-2 dimensions, down to 0 dimensions. Each layer's calculation is based on its parent layer (except the first, which is based on the source data), so this algorithm needs N rounds of MapReduce running in sequence [8]. In each MapReduce round, the key is the composite of the dimensions and the value is the composite of the measures. When the mapper reads a key-value pair, it calculates its possible child cuboids: for each child cuboid, it removes one dimension from the key and outputs the new key and value to the reducer. The reducer gets the values grouped by key, aggregates the measures, and outputs them to HDFS, finishing one layer's MR job. When all layers are finished, the cube is calculated. Fig. 1 describes the workflow.

It has some disadvantages:

1. This algorithm causes too much shuffling in Hadoop. The mapper does not do aggregation; all the records that have the same dimension values in the next layer are emitted to Hadoop, and only then aggregated by the combiner and reducer.

2. Many reads/writes on HDFS: each layer's cubing needs to write its output to HDFS for the next layer's MR job to consume. In the end, Kylin needs another MR round to convert these output files to HBase HFiles for bulk loading. These jobs generate many intermediate files in HDFS.

All in all, the performance is not good, especially when the cube has many dimensions.
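To make the N-round workflow concrete, here is a toy, single-process stand-in for By Layer Cubing (the data, the SUM measure, and the canonical-parent rule used to avoid double counting are illustrative assumptions, not Kylin's actual code):

```python
from collections import defaultdict

ALL_DIMS = ("A", "B", "C", "D")

def canonical_parent(child_dims):
    """Each child cuboid is computed from exactly one parent: the cuboid
    obtained by adding back the first missing dimension."""
    extra = [d for d in ALL_DIMS if d not in child_dims][0]
    return tuple(sorted(child_dims + (extra,), key=ALL_DIMS.index))

def build_layer(parent_cells):
    """One MapReduce round of by-layer cubing (aggregate = SUM).
    Mapper: drop one dimension per cell; reducer: sum by new key."""
    child = defaultdict(int)
    for (dims, vals), measure in parent_cells.items():
        for i in range(len(dims)):
            cdims = dims[:i] + dims[i + 1:]
            if canonical_parent(cdims) != dims:   # skip non-canonical parents
                continue
            child[(cdims, vals[:i] + vals[i + 1:])] += measure
    return child

# Toy fact table: (A, B, C, D) dimension values -> measure
base = {
    (ALL_DIMS, ("a1", "b1", "c1", "d1")): 10,
    (ALL_DIMS, ("a1", "b1", "c1", "d2")): 5,
    (ALL_DIMS, ("a2", "b2", "c2", "d1")): 7,
}
cube, layer = dict(base), base
for _ in range(len(ALL_DIMS)):                    # N rounds, run serially
    layer = build_layer(layer)
    cube.update(layer)
print(cube[((), ())])                             # apex cuboid: 22
```

Each call to build_layer corresponds to one of the N serial MapReduce jobs, and each layer must be fully written out before the next can start, which is exactly the I/O cost criticized above.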


Fig. 1. By Layer Cubing. Each layer's computation is a MapReduce job, executed serially; an N-dimensional cube needs at least N MapReduce jobs.

2.3 By Segment Cubing

To address the shortcomings above, Kylin developed a new cube building algorithm called By Segment Cubing. The core idea is that each mapper calculates its block of input data into a small cube segment (with all cuboids) and outputs all key/values to the reducer; the reducer aggregates them into one big cube segment, finishing the cubing. Fig. 2 illustrates the flow.

Fig. 2. By Segment Cubing.


Compared with By Layer Cubing, By Segment Cubing has two main differences:

1. The mapper does pre-aggregation, which reduces the number of records the mapper outputs to Hadoop and also the number the reducer needs to aggregate.

2. One MR round can calculate all cuboids.

Based on the work mentioned above, we take advantage of both algorithms and optimize the cuboid materialization strategy.
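The same toy setup illustrates By Segment Cubing: each mapper pre-aggregates its data block into a complete small cube, and a single reduce merges the segments (data and SUM measure are illustrative, not Kylin's actual code):

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("A", "B", "C")

def mapper(block):
    """Pre-aggregate one data block into a small cube segment
    containing every cuboid (every subset of DIMS)."""
    segment = defaultdict(int)
    for row, measure in block:
        full = dict(zip(DIMS, row))
        for k in range(len(DIMS) + 1):
            for dims in combinations(DIMS, k):
                segment[(dims, tuple(full[d] for d in dims))] += measure
    return segment

def reducer(segments):
    """The single MR round's reduce: merge all segments into one cube."""
    cube = defaultdict(int)
    for seg in segments:
        for key, measure in seg.items():
            cube[key] += measure
    return cube

block1 = [(("a1", "b1", "c1"), 4), (("a1", "b2", "c1"), 6)]
block2 = [(("a2", "b1", "c1"), 3)]
cube = reducer([mapper(block1), mapper(block2)])
print(cube[(("A",), ("a1",))])   # 10
```

Because each mapper already aggregated locally, only one round of shuffling is needed, which is the efficiency gain described in point 1 above.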

3 Design and Implementation

In this section, we first introduce the architecture of the Self-adaption Cube Building Model (SCBM) and its overall workflow. Then we explain cuboid Lazy-Building and the cuboid spanning tree. Finally, we describe the implementation details of the Materialized View Self-Adjusting Algorithm.

3.1 Architecture of Self-Adaption Cube Building Model

The overall architecture of the self-adaption cube building model is illustrated in Fig. 3.

Fig. 3. Architecture of Self-Adaption Cube Building Model.

The self-adaption cube building model takes a fact table [5] as the input of the overall system; usually the fact table is managed by the distributed data warehouse Hive. We first set the parameters of the cube model, such as the fields for analysis, the base cuboid level and so on, and then build the base cuboids in a MapReduce job. After the construction of the base cuboids, the system can serve query requests: the query execution engine [7] resolves each query to find the required cuboids. If the cuboid has been generated, the query is executed; if the cuboid is missing, the Lazy-Building module is triggered to build the cuboid using the method in Section 3.2. When the query result returns, the system records the query log and waits for the adjustment of the cube launched by the Self-Adaption module according to the Materialized View Self-Adjusting Algorithm explained in Section 3.3. At the same time, the system maintains a dynamic cuboid spanning tree to store the metadata of cuboids.

3.2 Cuboid Spanning Tree and Lazy-Building

Cuboid Spanning Tree. In the original By Layer Cubing, Kylin calculates the cuboids in Breadth First Search (BFS) order, which wastes memory. In contrast, the cuboid spanning tree generates cuboids in Depth First Search (DFS) order to reduce the number of cuboids that need to be cached in memory. This avoids unnecessary disk and network I/O, and the resources Kylin occupies are greatly reduced.

With the DFS order, the output of a mapper is fully sorted (except in some special cases), as the row key of a cuboid is composed of the cuboid ID and the dimension values, like [Cuboid ID + dimension values], and inside a cuboid the rows are already sorted. Since the outputs of the mappers are already sorted, the shuffle sort is more efficient.

In addition, the DFS order is very suitable for lazy building of cuboids. The cuboid spanning tree also records the metadata of cuboids in every node of the tree, which provides the basis for the selection of ancestor cuboids.
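The DFS generation order can be sketched as follows; the index bound that restricts which dimension a child may drop guarantees every cuboid is visited exactly once (a sketch of the traversal only, not Kylin's implementation):

```python
def dfs(dims, start=0, out=None):
    """Visit cuboids depth-first; a child drops one dimension at position
    i >= start, so every subset is reached by exactly one path."""
    if out is None:
        out = []
    out.append(dims)
    for i in range(start, len(dims)):
        dfs(dims[:i] + dims[i + 1:], i, out)
    return out

order = dfs(("A", "B", "C"))
print(order)
# [('A','B','C'), ('B','C'), ('C',), (), ('B',), ('A','C'), ('A',), ('A','B')]
```

Only the current root-to-leaf path needs to stay in memory during this traversal, whereas BFS must cache an entire layer at a time.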

Lazy-Building. Lazy-Building is a basic concept of the model. To reduce the number of cuboids, we adopt a generate-on-demand strategy. At the same time, we persist all cuboids on the low layers of the By Layer Cubing algorithm for higher speed and lower computational complexity of Lazy-Building. For example, suppose a cube has 4 dimensions: A, B, C, D; each mapper has 1 million source records to process; and the column cardinalities in the mapper are Card(A), Card(B), Card(C) and Card(D). Lazy-Building is demonstrated in Fig. 4.

1. The user sets a base-layer parameter in the cube model info to control the scale of the base cuboid layer. If this parameter is not set, the default value log(dimensions) + 1 is used.

2. The base cuboid building module imports data from the fact table and builds the base cuboids with the Cube Build Engine in Kylin.

3. Update the cuboid spanning tree and save the metadata.

4. A client launches a query select avg(measure_i) from table group by C which hits the missing cuboid [C]. The lazy building module then receives a request to build cuboid [C].

5. The lazy building module finds a cuboid generation path according to Ancestor Cuboids Selection and builds the missing cuboid to respond to the query as soon as possible.

6. Record the path and decide whether to build all the cuboids on the path at low load, according to the Materialized View Self-Adjustment Algorithm.
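Steps 4 to 6 above amount to a cache-miss handler in front of the cube; a minimal sketch (the build_cuboid callback stands in for the ancestor-selection build and is hypothetical):

```python
def execute_query(cuboid, cube, build_cuboid, query_log):
    """Serve a query from an existing cuboid, lazily building it on a miss,
    and record the query for the self-adaption module."""
    if cuboid not in cube:
        cube[cuboid] = build_cuboid(cuboid)   # lazy-building module kicks in
    query_log.append(cuboid)
    return cube[cuboid]

cube = {("A", "B", "C", "D"): "base cuboid data"}   # only base cuboids at first
log = []
result = execute_query(("C",), cube, lambda c: f"built {c}", log)
print(result)          # built ('C',)
```

Subsequent queries on [C] then hit the stored cuboid directly, while the log feeds the self-adjustment of Section 3.3.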

Ancestor Cuboids Selection. When the needed cuboid is missing, we must select an ancestor cuboid and a cuboid generation path. The basic principle is to choose the ancestor cuboid whose measures require the least aggregation, which minimizes the computation and time needed to generate the missing cuboid. After that, we need to find a path P from the ancestor cuboid to the missing cuboid in compliance with the minimum cardinality principle.

For example, in Fig. 4, in order to generate the missing cuboid [C], we first find all the candidate cuboids [A B C], [A C D] and [B C D]. Then we compare the sizes of the three candidates. Assuming [B C D] is selected, we generate [C] by aggregating [B C D] over dimensions B and D. This cuboid is enough to answer the query. However, for the sake of cube maintenance according to By Layer Cubing, we need to find a path from [B C D] to [C].

Fig. 4. Lazy-Building and Ancestor Cuboids Selection.

When aggregating from a parent to a child cuboid, say from the base cuboid [B C D] to the 1-dimension cuboid [C], there are two paths: [B C D] → [B C] → [C] and [B C D] → [C D] → [C]. Assume Card(D) > Card(B) and that dimension A is independent of the other dimensions. After aggregation, the child cuboid's size will be about 1/Card(D) or 1/Card(B) of the size of the base cuboid, so the output of this step is reduced to 1/Card(D) or 1/Card(B) of the original. We therefore choose the first path, and the records written from mapper to reducer are reduced to 1/Card(D) of the original size. Less output to Hadoop means less I/O and computing, so the model attains better performance.
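The two-stage choice above, smallest ancestor first, then highest-cardinality dimension dropped first, can be sketched as follows (the cardinality values are hypothetical, chosen so that [B C D] wins as in the example; the product-of-cardinalities size estimate is also an assumption):

```python
from math import prod

CARD = {"A": 5000, "B": 10, "C": 50, "D": 100}   # hypothetical cardinalities

def size(cuboid):
    """Rough cuboid size estimate: product of its dimensions' cardinalities."""
    return prod(CARD[d] for d in cuboid)

def generation_path(target, existing):
    """Pick the smallest existing ancestor, then aggregate away the
    highest-cardinality extra dimension first (minimum cardinality
    principle), so each intermediate cuboid shrinks as fast as possible."""
    candidates = [c for c in existing if set(target) <= set(c)]
    ancestor = min(candidates, key=size)
    path, cur = [ancestor], ancestor
    for d in sorted(set(ancestor) - set(target), key=lambda d: -CARD[d]):
        cur = tuple(x for x in cur if x != d)
        path.append(cur)
    return path

existing = [("A", "B", "C"), ("A", "C", "D"), ("B", "C", "D")]
print(generation_path(("C",), existing))
# [('B', 'C', 'D'), ('B', 'C'), ('C',)]
```

With these numbers [B C D] is the smallest candidate, and since Card(D) > Card(B), D is aggregated away first, reproducing the path [B C D] → [B C] → [C] chosen in the text.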

3.3 Materialized View Self-Adjustment Algorithm

The Self-Adaption module adjusts the cube according to the Materialized View Self-Adjusting Algorithm. This chapter proposes a query statistics method that takes a fixed number of queries as a statistical period and updates the corresponding query statistics. The method adjusts the materialized view set according to elimination and generation thresholds, stabilizes query efficiency, and minimizes churn of the materialized views.

Query Statistics Method. A statistical method for queries, built on the following definitions.

Definition 1: Materialized view adjustment cycle. The materialized view adjustment cycle T can be customized to a fixed number of queries; for example, every 100 queries form one adjustment cycle.

Definition 2: Average query statistics. Since the actual queries may change over time, the query set should be adjusted accordingly. For example, a query that has not been executed for a couple of cycles should be removed from the query collection, and the corresponding materialized view deleted.

After many queries, the query log accumulates a certain amount of query records. Based on the query log, this paper presents the following query statistics method. Given a query set Q = {q1, q2, ..., qk} and the query log set L, scan the log file backwards from its end, determine whether query qi appears in the cycle Tj, and update the statistic Fi according to Equation 1:

    Fi(Tj) = α · fi(Tj) + (1 − α) · Fi(Tj−1)    (1)

In the formula, α is a weighted coefficient, a constant, and fi(Tj) is the frequency of query qi in the query set of cycle Tj. By this method, we can monitor changes in the query set Q, which greatly reduces churn of the materialized views.
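The statistic can be sketched as an exponential moving average of per-cycle relative frequencies (α = 0.5 and the relative-frequency definition are assumed values for illustration):

```python
ALPHA = 0.5   # weighted coefficient alpha (assumed value)

def update_statistics(stats, cycle_queries):
    """Update F_i for every known query: queries frequent in the current
    cycle rise toward 1, queries absent from recent cycles decay toward 0."""
    n = len(cycle_queries)
    freq = {}
    for q in cycle_queries:
        freq[q] = freq.get(q, 0) + 1 / n          # relative frequency f_i(T_j)
    for q in set(stats) | set(freq):
        stats[q] = ALPHA * freq.get(q, 0) + (1 - ALPHA) * stats.get(q, 0)
    return stats

stats = {}
update_statistics(stats, ["q1", "q1", "q1", "q2"])   # cycle T1
update_statistics(stats, ["q2", "q2", "q2", "q2"])   # cycle T2: q1 decays
print(stats)
```

After the second cycle q1's statistic has decayed (0.1875) while q2's has grown (0.5625), which is the smoothing behavior that keeps the materialized view set from oscillating.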

Materialized View Self-Adjustment Algorithm. The main steps of adjusting the materialized view set as queries change are as follows:

1. Prior to adjustment, initialize the materialized view set M = {base cuboids} and the corresponding query task set Q.

2. During querying, each query is written into the query log L, and the query counter is incremented.

3. Set the elimination threshold T and the generation threshold S, update the average query statistics each cycle, and determine whether to eliminate or materialize the corresponding views.


The pseudocode of the Materialized View Self-Adjustment Algorithm is shown in Algorithm 1.

Input : query log L; materialized view set M; query task set Q;
        materialized view adjustment cycle length n; threshold of
        elimination T; threshold of generation S; path set P from
        the ancestor cuboid to the missing cuboid
Output: materialized view set M after adjustment

1   get the current query count value count;
2   if count mod n = 0 then
3       for j = 1; j ≤ n; j++ do
4           scan the log file backwards from its end;
5           update query task set Q according to L and qj;
6       end
7   end
8   update Fi according to Equation 1;
9   foreach qi in Q do
10      if Fi ≥ S then
11          materialize the views m corresponding to qi;
12          M.add(m);
13      else if Fi ≤ T then
14          eliminate the views m corresponding to qi;
15          M.delete(m);
16      end
17  end
18  return M;

Algorithm 1: Materialized View Self-Adjustment Algorithm

In the above algorithm, lines 1 to 8 scan the query log of one statistical period and update the query task set Q during the scan. Lines 9 to 17 iterate over the queries in Q and determine whether to eliminate or materialize the corresponding views by comparing the thresholds with the statistic Fi calculated by Equation 1. Suppose the query task set Q contains k different queries; the time complexity of the algorithm is then O(n + k), where n is the cycle length.
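For concreteness, Algorithm 1 can be sketched in Python (the statistic update assumes the moving-average reading of Equation 1; α, the thresholds, and the view-naming convention are illustrative):

```python
ALPHA = 0.5   # assumed weighted coefficient

def self_adjust(query_log, views, stats, cycle_len, t_elim, s_gen):
    """One pass of Algorithm 1: every cycle_len queries, refresh the
    per-query statistic F_i, then materialize hot views and drop cold ones."""
    if len(query_log) % cycle_len != 0:          # not at a cycle boundary yet
        return views
    cycle = query_log[-cycle_len:]               # scan back from the log's end
    freq = {}
    for q in cycle:
        freq[q] = freq.get(q, 0) + 1 / cycle_len
    for q in set(stats) | set(freq):             # Equation 1 update
        stats[q] = ALPHA * freq.get(q, 0) + (1 - ALPHA) * stats.get(q, 0)
    for q, f in stats.items():
        if f >= s_gen:
            views.add(f"view_{q}")               # materialize views for q
        elif f <= t_elim:
            views.discard(f"view_{q}")           # eliminate views for q
    return views

views, stats = {"base"}, {}
log = ["q1"] * 8 + ["q2"] * 2
self_adjust(log, views, stats, cycle_len=10, t_elim=0.05, s_gen=0.3)
print(sorted(views))   # ['base', 'view_q1']
```

The scan over one cycle costs O(n) and the threshold pass over the k distinct queries costs O(k), matching the O(n + k) complexity stated above.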

4 Experimental Evaluation

4.1 Dataset

To test performance, we use the standard weather dataset from the China Meteorological Data network. The dataset contains 4,726,499 weather records from 2,170 distinct counties of China, covering January 1, 2011 to January 1, 2017. The original dataset is too complicated, so in order to better conduct the experiment we select eight dimensions (province, city, county, date, weather, wind direction, wind speed, air quality level) and two measures (maximum temperature, minimum temperature).

4.2 Evaluation Metrics

We use cube first construction time, average query time and cube size as the evaluation metrics of our proposed method.

Cube First Construction Time refers to the base cuboid building time of the self-adaption cube building model.

Average Query Time is defined as the average query time during the materialized view adjustment cycles T1 ~ T30. The detailed calculation is listed in Equation 2:

    AvgQueryTime(Ti) = (1/n) · Σ(j=1..n) tj    (2)

where n is the number of queries in a cycle and tj is the response time of the j-th query in cycle Ti.

Cube Size refers to the disk space that the whole cube takes up.

4.3 Experimental Results

We first compare the cube first construction time. Because the base-layer parameter has a great impact on this metric, we use the default value log(dimensions) + 1 to reflect the average condition. We run the test 5 times and average the results to avoid the impact of MapReduce failures. As Table 1 shows, the time consumption of the new model is reduced by 28.5%.

Table 1. Cube First Construction Time

Model                               Test 1   Test 2   Test 3   Test 4   Test 5   Average
Original Kylin cube building model  92 min   83 min   86 min   104 min  91 min   91.2 min
Self-adaption cube building model   64 min   61 min   83 min   57 min   61 min   65.2 min

Fig. 5. Average query time trends in T1 ∼ T30.

Page 12: A Distributed Self-adaption Cube Building Model Based on ...static.tongtianta.site/paper_pdf/99cdd5a8-b65a-11e9-9079-00163e08bb86.pdfmatter of fact, Kylin [7] is the specialized tool

11

For query time, we set the materialized view adjustment cycle to 50 queries and test 30 cycles T1 ~ T30. We observe that the cuboid hit rate increases and the query response time improves significantly as query requests accumulate. In the end, the query efficiency of the two models is almost on a par.

Fig. 6. Average cube size trends in T1 ∼ T30.

For cube size, Fig. 6 shows that the curve tends to be stable after some early oscillation. In the end, the space consumption of the proposed model is reduced by 65.83%.

5 Conclusion

We have presented a Distributed Self-Adaption Cube Building Model Based on Query Log and applied it to a weather dataset to test its performance. Our model adopts a special partial materialization strategy and can automatically adjust the cuboid set used by query requests according to the query log. Based on the experimental results, the proposed model can greatly reduce cube construction time and cube size at the expense of a tiny reduction in query efficiency. However, this model performs well only when the query distribution is relatively concentrated, so users can choose either of the two models according to their practical business query scenarios. Overall, the proposed model is of great practical significance for BI tools. In the next stage, we will optimize the base cuboid generation strategy to reduce query latency in the early phase.

6 Acknowledgement

This work is supported by the National Key Project of Scientific and Technical Supporting Programs of China (Grant No. 2015BAH07F01) and the Engineering Research Center of Information Networks, Ministry of Education.


7 References

1. Ying Chen, Frank Dehne, Todd Eavis, and Andrew Rau-Chaplin. Parallel ROLAP data cube construction on shared-nothing multiprocessors. Distributed and Parallel Databases, 15(3):219–236, 2004.
2. Impala. http://impala.apache.org/, 2017. [Online; accessed 13-April-2017].
3. Prasad M. Deshpande, Rajeev Gupta, and Ashu Gupta. Distributed iceberg cubing over ordered dimensions, March 16 2015. US Patent App. 14/658,542.
4. Presto. https://prestodb.io/, 2017. [Online; accessed 13-April-2017].
5. Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, and Hamid Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1(1):29–53, 1997.
6. Mateusz Kalisch, Marcin Michalak, Piotr Przystałka, Marek Sikora, and Łukasz Wróbel. Outlier detection and elimination in stream data – an experimental approach. In International Joint Conference on Rough Sets, pages 416–426. Springer, 2016.
7. Kylin. http://kylin.apache.org/, 2017. [Online; accessed 13-April-2017].
8. Suan Lee, Jinho Kim, Yang-Sae Moon, and Wookey Lee. Efficient distributed parallel top-down computation of ROLAP data cube using MapReduce. In International Conference on Data Warehousing and Knowledge Discovery, pages 168–179. Springer, 2012.
9. Feng Li, M. Tamer Özsu, Gang Chen, and Beng Chin Ooi. R-Store: a scalable distributed system for supporting real-time analytics. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 40–51. IEEE, 2014.
10. Elasticsearch. https://www.elastic.co/products/elasticsearch, 2017. [Online; accessed 13-April-2017].
11. Arnab Nandi, Cong Yu, Philip Bohannon, and Raghu Ramakrishnan. Distributed cube materialization on holistic measures. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 183–194. IEEE, 2011.
12. Yongge Shi and Yiqun Zhou. An improved Apriori algorithm. In Granular Computing (GrC), 2010 IEEE International Conference on, pages 759–762. IEEE, 2010.
13. Rodrigo Rocha Silva, Celso Massaki Hirata, and Joubert de Castro Lima. Computing big data cubes with hybrid memory. Journal of Convergence Information Technology, 11(1):13, 2016.
14. Spark SQL. http://spark.apache.org/sql/, 2017. [Online; accessed 13-April-2017].
15. Wei Wang, Jianlin Feng, Hongjun Lu, and Jeffrey Xu Yu. Condensed cube: An effective approach to reducing data cube size. In Data Engineering, 2002. Proceedings. 18th International Conference on, pages 155–165. IEEE, 2002.
16. Ying Xia, Ting Ting Luo, Xu Zhang, and Hae Young Bae. A parallel adaptive partial materialization method of data cube based on genetic algorithm. 2016.
17. Dan Yin, Hong Gao, Zhaonian Zou, Jianzhong Li, and Zhipeng Cai. Approximate iceberg cube on heterogeneous dimensions. In International Conference on Database Systems for Advanced Applications, pages 82–97. Springer, 2016.
18. Yihong Zhao, Prasad M. Deshpande, and Jeffrey F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In ACM SIGMOD Record, volume 26, pages 159–170. ACM, 1997.
