Analysis of OLAP queries execution for evaluating a fragmentation schema in data warehouse

 

 

 

 

Hacène Derrar

USTHB, Faculty of electronic and computer science, LSI Bp 32 El Alia 16111, bab Ezzouar/ Algiers/AlgeriaCette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser.';document.getElementById('cloak381b389f745e346e4bacf5bb48d07cc5').innerHTML += ''+addy_text381b389f745e346e4bacf5bb48d07cc5+'<\/a>';


Omar Boussaid

University of Lyon 2, ERIC Laboratory

5 avenue Pierre Mendes-France 69676 Bron Cedex-France

Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser.

 

 

Mohamed Ahmed-Nacer

USTHB, Faculty of electronic and computer science, LSI Bp 32 El Alia 16111, bab Ezzouar/ Algiers/Algeria

Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser.

 

 

 

Abstract—  Data  fragmentation  improves  query  performance and data administration. The fragmentation schema is designed from statistical analysis of data access collected from the execution of more frequent queries. The application of data fragmentation to data warehouse is faced with the evolution of data model and workload, due more particularly to the specific characteristics of OLAP queries. So, at the end of data fragmentation, there are no techniques to know that the implemented   schema   remain   the   more   optimal   and   the produced partitions are relevant. We propose in this paper an approach based on analysis of the OLAP queries execution to evaluate the effectiveness and the relevance of the data fragmentation schema.

 

Keywords:      Data   warehouse,   data   fragmentation,   OLAP

queries, optimization.

 

 

I.INTRODUCTION

Data warehousing applications [1,2,3] cope with enormous amounts of data ranging in Gigabytes and Terabytes. While transactional (OLTP, online transaction processing) DBMS like bank applications usually use simple query patterns to retrieve a very small part of a database (usually one record) by a primary key access, data processing in data warehousing (OLAP, online analytical processing) involves complex queries that usually access a large portion of the database [4,6].

 

The design for data warehouse (DW) is known as a star schema, which consists of a set of dimension tables and a fact table. Each dimension table is related to the fact table in a one- to-many fashion. The fact table contains a number of measures. A  measure is  usually of  numerical type,  which represents an important performance indicator such as sales, unit price, or number of units sold. The values for a measure are often called facts. Each fact value is associated with an array of dimension members, which relate the fact to its “context”. Data for dimensions are stored in dimension tables. Facts are stored in a  fact table whose primary key is the composite of foreign keys from all dimension tables.

 

Data processing in DW applications retrieves aggregated measures organized or classified according to several dimensions or hierarchies over the dimensions. For this reason multidimensional data models MD, multidimensional query languages or the OLAP approach have been developed by the


research   community   and   implemented   as   commercial products. Typical OLAP operations are drill-down, roll-up and slice-and-dice [3] and usually multiple dimensions are restricted at the same time. In general one can state that these operations in a MD model lead to range restrictions on the lowest hierarchy level of each dimension [5].

 

So,  clustering, indexing and  fragmentation (also  called partitioning) are common techniques to facilitate the administration, the management and the maintenance of data warehouse and more particularly to improve query performance.

 

Thus, traditional fragmentation techniques and more particularly horizontal and vertical fragmentation, developed in relational DBMS, were applied to the data warehouse.

 

Partitioning tables, indexes and materialized views in fragments  stored  and  accessed  separately  improve significantly data manageability, accessibility and query execution time. Horizontal partitioning consists in dividing a relation into partitions with the same schema. Each one preserves part of the tuples according to restriction criteria. Vertical partitioning, it consists in dividing a relation into partitions of different schema, by projection with duplicating the key. As for a mixed fragmentation technique proposed simultaneously applies horizontal and vertical fragmentation on a relation (or vice versa) [12].

 

These approaches are designed from a statistical analysis of more frequent queries based on both qualitative and quantitative information. So,  algorithms used  to  design an optimal  partitioning  schema  are  static  algorithms.    Their entries are bases on workload gathered from data exploitation. If a change occurs in the inputs of these algorithms, they must be rerun to determine a new optimal fragmentation schema. Moreover, these algorithms are based on the clustering principle which is considered as combinatorial problem and requires for its resolution to use heuristics methods [7]. So, in the case of models evolution and / or changes in workload these algorithms become very complicated, or unworkable.

 

In the context of relational and object oriented databases and in any environment (centralized, parallel, distributed) much of the literature has addressed this issue. Researchers concentrate their  work on data  redistribution or  fragments reallocation in the event of performance degradation. So, it was considered that the solution lies at the physical level by applying  load  balancing  strategies  of  treatment  and  data

 

between nodes. The logical aspect, namely the design of the fragmentation schema,  itself,  remains  adapted  because  the workload is almost stable.

 

Conversely, in  data  warehousing the  evolution of  data model and workload are dynamic. This is due more particularly to the specific characteristics of OLAP queries. So, an inappropriate and badly conceived fragmentation schema have a considerable influence on the   system’s performance and more particularly during the execution of the expensive operations such  as  the  joint  and  the  multi-joint which characterize the decisional queries.   In [21] authors have clearly demonstrated through theorems and lemma that the choice of partition keys and how to arrange the records in the  fact  table  have  a  great  impact  on  the  OLAP  queries

 

 

télécharger l'article