Olap4ld Query Execution

From www.b-kaempgen.de
Jump to: navigation, search

Design

Basic MDX query

SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
  {[Product].members} ON ROWS
FROM [Sales]
WHERE [Time].[1997].[Q2] 

MDX Parse Tree of Basic MDX query

  • MDX Parse Tree
    • MDX Object Model of olap4j [1]
    • Assumptions, see MDX documentation, e.g., [2]
      • Each axis contains a list of list of members (= position list,

Query Tree Basic MDX Parse Tree.png

Logical OLAP Operator Tree

We have devided to use a common algebra of OLAP operations for query execution. Such operations can directly be implemented as operations over the conceptual model of Data Cubes and as user-friendly interactions in user interfaces (see three step interface of Linked Data Cubes Explorer as an example).

Most analysis applications work on relations and thus use relational algebra. OLAP algebra works on multidimensional datasets but apart from that is similar to relational algebra. Also, operations on the data can also be implemented using relational algebra (see ROLAP).

The QB data model is built on the SDMX data model.

SDMX provides query capabilities over the Web, e.g., using "Query Messages" that allow to query for data, concepts, code lists, data and metadata structure definitions.

Also, RESTful access is possible. As an example query language over SDMX data published over the web using the RESTful principles, there is the SDMX-JSON API. The language allows to query for specific data from SDMX datasets. However, no Drill-Across, Slice and Roll-Up seem to be supported.

See the following list for the signature of OLAP operators supported by olap4ld:

Logical OLAP Operators

  • BaseCubeOp: Base-Cube is defined as Base−Cube : URI → DataCube allows to initialise a cube from the URI of an instance of

qb:DataSet.

  • ProjectionOp: Projection is defined as Projection : DataCube × 2Measure → DataCube and selects measures from the input cube; all non-selected measures are removed from the cube.
  • DiceOp: Dice is defined as Dice : DataCube × 2Hierarchy × 2V → DataCube and allows to filter for certain dimension members.
  • SliceOp: Slice is defined as Slice : DataCube × Dimension → DataCube and removes a dimension from the input cube, i.e., removes this dimension from all selection lists over which to aggregate. We evaluate Slice by setting the tuple element of the dimension to ALL.
  • RollupOp: Roll-Up is defined as Roll−Up : DataCube × 2Hierarchy × 2Level → DataCube and allows to create a cube that contains instance data on a higher aggregation level. Note, Drill − Down can be seen as an inverse operation to Roll −U p.
  • DrillAcrossOp: Drill-Across is defined as Drill−Across : DataCube × DataCube → DataCube allows to combine instance data from two cubes into a new cube.

The the following figure for an example nested set of OLAP operators (logical query plan):

Query Tree Basic Logical Olap Query Tree.png

Why use OLAP operators? Why not simply visualise a QB datasets (or SDMX dataset etc.) using Excel?

  • An import of the observations of a QB dataset is no problem. One could easily write an import script that takes in a qb:DataSet URI and returns a table with all observations in the rows and their dimension values in the columns (something similar, we have shown by using common OLAP engines for query processing over QB datasets). Applications such as Excel Pivot can use such table as a data source and allow pivot analysis capabilities and visualisations for users.
  • The problem is: Once you have integrated the data there, you loose the connection to Linked Data. If new data becomes available, a reload of all data will be necessary.
  • Also, you loose all benefits from having a data structure definition (from Etcheverry and Vaisman and others):
    • to assist in the integration process of different data sets <= How. By using resources for identification, with labels for humans.
    • to provide a self-contained description of the contents of the data set, which allows for example to implement client applications and operators
    • to enable the verification that the data set instances match the expected structure

Also, in this approach, one leaves out a lot of possible pre-processing, e.g., retrieving data from distributed sources, entity consolidation, validation/normalisation, updates, ...

Optimised OLAP Logical Operator Tree

XXX Figure

Physical OLAP Operator Tree

XXX Figure

Background info

  • See Database Systems - The Complete Book:
  • Why do we need a special query language?
    • By limiting what we can say or do in query language: 1) ease of programming 2) highly optimized code
  • What is an Algebra?
    • operators and atomic operands
    • e.g., algebra of arithmetic: atomic operands are variables and constants, operators are addition, subtraction...
    • Algebra allows to build expressions by applying operators to operands or other expressions
  • Relational algebra?
    • attomic operands: variables, constants
    • operations: set, removing, combining, renaming
  • OLAP algebra
    • attomic operands: cubes
    • operations: set, drill-across, slice, dice, roll-up, projection, base

Nodes in Tree


    ParseTreeNode is a node in a parse tree representing a parsed MDX statement.
    SelectNode represents a SELECT statement, including FROM and WHERE clauses if present.
    AxisNode represents an axis expression.
    CallNode represents a call to a function or operator.
    IdentifierNode represents an identifier, such as Sales or [Measures].[Unit Sales].
    LiteralNode represents a literal, such as 123 or "Hello, world!".
    MemberNode represents a use of a member name in an expression.
    LevelNode represents a use of a level name in an expression.
    HierarchyNode represents a use of a hierarchy name in an expression.
    DimensionNode represents a use of a dimension name in an expression.
    WithMemberNode represents a WITH MEMBER clause defining a calculated member.
    WithSetNode represents a WITH SET clause defining a calculated set.
    PropertyValueNode represents property = value pair as part of a the declaration of a calculated member or set.
    CubeNode

  • OLAP Logical Operator Tree
    • Related Work
      • Romero, O. (2007). On the Need of a Reference Algebra for OLAP. Data Warehousing and Knowledge Discovery, 99–110. Retrieved from http://www.springerlink.com/index/44w1137718585665.pdf
      • Kämpgen, B., Riain, S. O., & Harth, A. (n.d.). Interacting with Statistical Linked Data via OLAP Operations. ESWC 2012 Selected Workshop Papers. Springer Verlag, 2012 (in press). (Extended Paper published in Conf. Proceedings.).
        Selection (aka Dice, selection in Relational Algebra)
        Roll-up (level -> level, group-by and aggregation in RA)
        Projection (selects subset of measures, projection in RA)
        ChangeBase (map to other coordinate system)
        Drill-across (join and projection of second cube's columns in RA)
        Set Operations (set operations Union, Difference, Intersection in RA; not supported in MDX)
  • Optimised OLAP Logical Operator Tree
  • OLAP Physical Operator Tree
...

Related Work query optimizers

  • Farrago Query Optimizer
  • Query Flow [3]