Olap4ld Documentation

General information

Downloading Olap4ld

Building/Installation of Olap4ld

  • Requirements
    • nxparser-1.2.3.jar - already included in sources
    • saxon-9.2.jar - already included in sources
    • xercesImpl.jar - already included in sources
    • olap4j
    • Sesame Triple Store - for use of EmbeddedSesameEngine and SesameEngine

Configurations/Running Olap4ld

  • Olap4ld can be used either via an instance of the Olap4j driver or directly as an Olap4ld Linked Data Engine.

Olap4j driver

See Olap4j connections [1]

import java.sql.*;
import org.olap4j.*;

// Generic olap4j connection example from [1], here using the Mondrian driver;
// for olap4ld, substitute the olap4ld driver class and a jdbc:ld:// connect string.
Class.forName("mondrian.olap4j.MondrianOlap4jDriver");
Connection connection =
    DriverManager.getConnection(
        "jdbc:mondrian:Jdbc=jdbc:odbc:MondrianFoodMart;"
        + "Catalog=/WEB-INF/queries/FoodMart.xml;"
        + "Role='California manager'");
// Unwrap the JDBC connection to obtain an OlapConnection.
OlapWrapper wrapper = (OlapWrapper) connection;
OlapConnection olapConnection = wrapper.unwrap(OlapConnection.class);
OlapStatement statement = olapConnection.createStatement();

// Execute an MDX query and obtain the result as a CellSet.
CellSet cellSet =
    statement.executeOlapQuery(
        "SELECT {[Measures].[Unit Sales]} ON COLUMNS,\n"
        + "  {[Product].Members} ON ROWS\n"
        + "FROM [Sales]");
  • A connection with Olap4ld is configured and started using a connection string (see the sketch below).
  • For writing a simple program against olap4ld, see the olap4j example at [2].
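
Below is a minimal sketch of the same steps against olap4ld, using the jdbc:ld:// connect string format shown in the roadmap notes further down; the driver class name org.olap4j.driver.olap4ld.Olap4ldDriver is an assumption and should be checked against your olap4ld version:

import java.sql.Connection;
import java.sql.DriverManager;
import org.olap4j.OlapConnection;
import org.olap4j.OlapWrapper;

public class Olap4ldConnectExample {
    public static void main(String[] args) throws Exception {
        // Assumption: driver class name; check the olap4ld distribution.
        Class.forName("org.olap4j.driver.olap4ld.Olap4ldDriver");
        // Connect string as used elsewhere in this documentation
        // (EmbeddedSesameEngine; Datastructuredefinitions and Datasets left empty).
        Connection connection = DriverManager.getConnection(
            "jdbc:ld://olap4ld;Catalog=LdCatalog;"
            + "JdbcDrivers=com.mysql.jdbc.Driver;"
            + "Server=http://;Database=EMBEDDEDSESAME;"
            + "Datastructuredefinitions=;Datasets=;");
        OlapWrapper wrapper = (OlapWrapper) connection;
        OlapConnection olapConnection = wrapper.unwrap(OlapConnection.class);
        // ... use olapConnection as in the olap4j example above ...
        olapConnection.close();
    }
}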

Extending Olap4ld with a new Linked Data Cubes Engine

Publishing data for Olap4ld

Programming Olap4ld

API

  • olap4ld provides an API just like any olap4j driver.
  • How to use the API (see the sketch below).
  • Link to Java API
  • The API is tested via "SSBMetadataTest" and "SSBQueryTest".
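
Below is a minimal sketch of browsing cube metadata via the standard olap4j API, similar to what the metadata tests exercise; the olapConnection is obtained as in the connection examples above:

import org.olap4j.OlapConnection;
import org.olap4j.OlapException;
import org.olap4j.metadata.Cube;
import org.olap4j.metadata.Dimension;
import org.olap4j.metadata.Measure;

public class MetadataExample {
    // Print the cubes, dimensions and measures visible on the connection.
    static void printMetadata(OlapConnection olapConnection) throws OlapException {
        for (Cube cube : olapConnection.getOlapSchema().getCubes()) {
            System.out.println("Cube: " + cube.getUniqueName());
            for (Dimension dimension : cube.getDimensions()) {
                System.out.println("  Dimension: " + dimension.getUniqueName());
            }
            for (Measure measure : cube.getMeasures()) {
                System.out.println("  Measure: " + measure.getUniqueName());
            }
        }
    }
}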

Developer's Guide

Performance Tuning

  • ...

Roadmap

Add new normalization algorithm part from spec

  • The W3C Government Linked Data (GLD) Working Group is discussing additional normalization algorithms.

Improve performance by having more specific unique names with segments, as in Mondrian

  • olap4ld often has to search the entire cube when resolving members, e.g., measures.
  • Here, we can speed up olap4ld by mentioning the dimension in the unique name via segments (see the sketch below).
  • However, we need to make sure that this is consistent.
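
Below is a sketch of parsing a segmented unique name with olap4j's IdentifierNode; restricting member resolution to the dimension named in the first segment is the proposed speed-up, not current olap4ld behaviour:

import java.util.List;
import org.olap4j.mdx.IdentifierNode;
import org.olap4j.mdx.IdentifierSegment;

public class UniqueNameExample {
    public static void main(String[] args) {
        List<IdentifierSegment> segments =
            IdentifierNode.parseIdentifier("[Measures].[Unit Sales]").getSegmentList();
        // The first segment names the dimension, the last segment the member,
        // so resolution could be restricted to that dimension instead of
        // searching the entire cube.
        String dimensionName = segments.get(0).getName();                // "Measures"
        String memberName = segments.get(segments.size() - 1).getName(); // "Unit Sales"
        System.out.println(dimensionName + " / " + memberName);
    }
}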

Improving catalog and schema handling

  • How do the different platforms define the data sources?
    • Saiku: 1) By the connect string in saiku-datasources, e.g., jdbc:ld://olap4ld;Catalog=LdCatalog;JdbcDrivers=com.mysql.jdbc.Driver;Server=http://public.b-kaempgen.de:8890/sparql;Database=OPENVIRTUOSO;Datastructuredefinitions=http://fios:saiku,http://fios:saikumanually;Datasets=http://fios:saiku,http://fios:saikumanually 2) By asking metadata queries and OLAP queries.
    • Example_QB_Datasets_QueryTest.java: org.olap4j.RemoteXmlaTester.JdbcUrl=jdbc:ld://olap4ld;Catalog=LdCatalog;JdbcDrivers=com.mysql.jdbc.Driver;Server=http://;Database=EMBEDDEDSESAME;Datastructuredefinitions=;Datasets=;
    • LDCX (xml-server): OlapDriverConnectionString, e.g., jdbc:ld://olap4ld;Catalog=LdCatalog;JdbcDrivers=com.mysql.jdbc.Driver;Server=http://;Database=EMBEDDEDSESAME;Datastructuredefinitions=;Datasets=;

      • XMLA can restrict the discovery of data sources with DataSourceName.
    • Slicer (Linked Data Engine, directly):

How do we design the querying of Linked Data?

  • The connect string is supposed to define and set up the Linked Data Engine (e.g., server URL, datasets).
  • setDatabase(String), setCatalog(String), and setSchema(String) should all be doing the same thing: somehow define the data source from which to query the metadata and the OLAP data.

  • I would say we simply allow an arbitrarily long, comma-separated string of datasets to be defined for setDatabase(String), setCatalog(String), and setSchema(String) (see the sketch after this list).
  • The good thing about DeferredNamedList would then be that lists of cubes, dimensions, etc. are only queried on demand; after that, the metadata objects would be fixed (in olap4j) and no further queries would need to be issued.

  • The problem would be that we do not want the Linked Data Engine to load the entire set of datasets.
  • The current problem is simply that visit(IdentifierNode) looks for cubes and as such tries to load and identify nodes as cubes, which throws an error.

  • Currently, we always have the same catalog and schema.
  • We query for specific cubes by asking getCubes() from the schema and then asking the DeferredList for a specific cube via .get(cubename).

  • A better solution would be to take catalogs and schemas into account.
  • For that, Olap4ldConnection manages some attributes, schemaName, catalogName, databaseName:

/**
 * Name of the database to which the user wishes to bind this connection.
 * This value can be set through the JDBC URL or via
 * {@link Olap4ldConnection#setCatalog(String)}
 */
private String databaseName;

  • If we know that XMLA and other clients set these values not only via the connection string, we can change their values and, depending on them, execute getCubes() etc. (and also return OlapExceptions directly from getCubes() etc.). We would then not need to populate a deferred list on get(cubename).

  • However, for now it works.
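
Below is a sketch of the proposed usage, assuming the comma-separated dataset list is accepted by setCatalog(String); this illustrates the proposal above, not current olap4ld behaviour:

import java.sql.SQLException;
import org.olap4j.OlapConnection;
import org.olap4j.metadata.Cube;

public class CatalogProposalExample {
    // Proposed: define the data source as a comma-separated list of datasets
    // and only load cubes on demand via the deferred list.
    static Cube lookupCube(OlapConnection olapConnection, String cubeName)
            throws SQLException {
        olapConnection.setCatalog("http://fios:saiku,http://fios:saikumanually");
        // get(cubeName) would trigger loading of just that cube's metadata.
        return olapConnection.getOlapSchema().getCubes().get(cubeName);
    }
}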

Aggregation functions are not used by publishers

  • We recommend qb4olap for representing aggregation functions
  • However, aggregation functions are often not made explicit in datasets
  • Still, the measures need to be aggregated somehow in OLAP operations
  • This is relevant, for example, for SMART-DB, where MAX, MIN, SUM, AVG, etc. may be useful.
  • If there is an aggregation function given, we use that one.
  • If not, we derive measures (and measure members) for each measure property for each aggregation function (or for single ones according to heuristics)
  • We definitely need a way to define one or more aggregation functions per measure property
    • The name of a measure comprises the concatenated measure property and aggregation function
    • If the aggregation function is given, we use it; if not, we define new measures depending on the range (see the sketch after this list):
      • If no range is given: count only
      • If xsd:decimal or xsd:integer is given: avg, sum, min, max, count
      • If xsd:string is given: count
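
Below is a sketch of the range-based heuristic described above; the enum and method are illustrative helpers, not olap4ld API:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AggregationHeuristic {
    enum AggregationFunction { AVG, SUM, MIN, MAX, COUNT }

    // Derive aggregation functions for a measure property from its rdfs:range;
    // the derived measure name would concatenate the measure property and the
    // aggregation function. rangeUri may be null if no range is given.
    static List<AggregationFunction> deriveAggregationFunctions(String rangeUri) {
        if (rangeUri == null) {
            // No range given: count only.
            return Collections.singletonList(AggregationFunction.COUNT);
        }
        if (rangeUri.endsWith("#decimal") || rangeUri.endsWith("#integer")) {
            // Numeric range: avg, sum, min, max, count.
            return Arrays.asList(AggregationFunction.AVG, AggregationFunction.SUM,
                    AggregationFunction.MIN, AggregationFunction.MAX,
                    AggregationFunction.COUNT);
        }
        // xsd:string (or any other range, treated alike here): count only.
        return Collections.singletonList(AggregationFunction.COUNT);
    }
}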

xkos:ClassificationLevels are not used by publishers

  • We need to allow skos:narrower hierarchies
  • However, we have to assume that all concepts are still attached to the code list.

Better GUI for LD-Cubes Explorer

  • Have a more intuitive and useful GUI (possibly embedded in webpages and SMW)
  • Dice over measures
  • Dice over dimensions
  • Make errors in modeling more visible to users
  • Link from PlanetData Datasets directly to LD-Cubes Explorer, showing one default query.
  • Can I publish one static query in SMW or on a website?
  • How to test robustness of LD-Cubes Explorer?
    • Include tests with failed datasets.

Bugs

Retrieval of data sometimes not working

  • If a qb:DataSet links to a qb:DataStructureDefinition which in turn links to qb:DimensionProperties, we may not have sufficient information about those qb:DimensionProperties, since they have not been resolved yet.
  • Actually, we need to make sure that:
    • If we have the DS and query for Cubes, we load the DS and the DSD
    • If we have the DS and query for Measures, we load the Measures
    • If we have the DS and query for Dimensions, we load the Dimensions
    • If we have the DS and query for Hierarchies, we load the Hierarchies (qb:CodeLists)
    • If we have the DS and query for Levels, we load the Levels (xkos:ClassificationLevels)
    • If we have the DS and query for Members, we load the Members (skos:Concepts)
    • For now: if we have the DS and query for anything, we load the DS, DSD, Measures, Dimensions, and Codelists.

CellSet contains duplicates in FIOS demo

  • Check why Saiku returns duplicates in query:
  • Query SIC Code 7389 Mailing List: Business Services, Nec

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX edgar: <http://edgarwrap.ontologycentral.com/vocab/>
PREFIX gesis-dbpedia-stats2: <http://lod.gesis.org/dbpedia-stats/>
PREFIX smartdbwrap: <http://smartdbwrap.appspot.com/>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX refgovukyear: <http://reference.data.gov.uk/id/year/>
PREFIX refgovukmonth: <http://reference.data.gov.uk/id/month/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skosclass: <http://ddialliance.org/ontologies/skosclass#>
PREFIX dropedia: <http://agkwebserver2.agk.uni-karlsruhe.de/~dropedia/index.php/Special:URIResolver/>
PREFIX eus: <http://ontologycentral.com/2009/01/eurostat/ns#>
PREFIX smartlocation: <http://smartdbwrap.appspot.com/id/location/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX smartanalysisobject: <http://smartdbwrap.appspot.com/id/analysisobject/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfh: <http://lod2.eu/schemas/rdfh#>
PREFIX dropedialocal: <http://localhost/Dropedia/index.php/Special:URIResolver/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gesis-dbpedia-stats: <http://lod.gesis.org/dbpedia-stats/ns#>
PREFIX refgovukday: <http://reference.data.gov.uk/id/day/>
PREFIX rdfh-inst: <http://lod2.eu/schemas/rdfh-inst#>

select ?httppublicbkaempgende8080vocabissuer1 ?httppurlorgdctermsdate0 AVG(xsd:decimal(?sdmxZZZmeasureXXX3AobsValue))
from <http://fios:saiku> from <http://fios:saikumanually> from <http://fios:saiku> from <http://fios:saikumanually>
where {
?obs qb:dataSet ?ds. ?ds qb:structure <http://public.b-kaempgen.de:8080/fios#secyhofdsd>.
?obs <http://public.b-kaempgen.de:8080/vocab#issuer> ?httppublicbkaempgende8080vocabissuer0.
?httppublicbkaempgende8080vocabissuer0 skos:narrower ?httppublicbkaempgende8080vocabissuer1.
?httppublicbkaempgende8080vocabissuer1 skos:member <http://public.b-kaempgen.de:8080/fios#issuerCodeListSicLevel>.
?obs <http://purl.org/dc/terms/date> ?httppurlorgdctermsdate0.
?obs <http://public.b-kaempgen.de:8080/vocab#subject> ?httppublicbkaempgende8080vocabsubject0.
FILTER ( ?httppublicbkaempgende8080vocabsubject0 = <http://public.b-kaempgen.de:8080/vocab/us-gaap-2009-01-31#Assets> OR ?httppublicbkaempgende8080vocabsubject0 = <http://public.b-kaempgen.de:8080/vocab/us-gaap-2011-01-31#Assets> )
OPTIONAL { ?obs <http://purl.org/linked-data/sdmx/2009/measure#obsValue> ?sdmxZZZmeasureXXX3AobsValue. }
} group by ?httppublicbkaempgende8080vocabissuer1 ?httppurlorgdctermsdate0 order by ?httppublicbkaempgende8080vocabissuer1 ?httppurlorgdctermsdate0

  • => It seems it is not a problem of the query but rather of the display: the created pivot table contains every date twice. Apparently, the CellSet is not created correctly. This would need bugfixing and should be fixed in a new version of olap4ld.

Simple multi-level hierarchies

  • Often, publishers simply use skos:narrower paths to describe hierarchical relationships between members.
  • It is still open how to have multi-level hierarchies without using xkos:ClassificationLevel. Every member has to be skos:inScheme of the code list, but: the first level of members could be attached to the code list via skos:hasTopConcept (or not be linked from any other member via skos:narrower), the second level of members could be attached to members from the first level via skos:narrower, the third level of members could be attached to members from the second level via skos:narrower, etc. We would simply have a large query unioning over a certain number of levels and defining the level depth and level name (see the sketch below).
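
Below is a sketch of how such a union query could be assembled, one branch per level depth; the method and variable names are illustrative:

public class LevelQueryBuilder {
    // Build a SPARQL query deriving members and their level depth from
    // skos:hasTopConcept / skos:narrower paths, one UNION branch per depth.
    static String buildLevelQuery(String codeListUri, int maxDepth) {
        StringBuilder query = new StringBuilder(
                "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
                + "SELECT ?member ?depth WHERE {\n");
        for (int depth = 1; depth <= maxDepth; depth++) {
            if (depth > 1) {
                query.append("  UNION\n");
            }
            query.append("  { <").append(codeListUri)
                 .append("> skos:hasTopConcept ?m1 .\n");
            for (int i = 2; i <= depth; i++) {
                query.append("    ?m").append(i - 1)
                     .append(" skos:narrower ?m").append(i).append(" .\n");
            }
            query.append("    BIND(?m").append(depth).append(" AS ?member)\n")
                 .append("    BIND(").append(depth).append(" AS ?depth) }\n");
        }
        query.append("}");
        return query.toString();
    }
}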

Drill-Across queries

  • So far, only single cubes can be queried using olap4ld.

Handling multiple measures

  • See QB spec: http://www.w3.org/TR/vocab-data-cube/#dsd-mm
  • According to the spec, the multi-measure observations approach is commonly used in applications such as Business Intelligence and OLAP.
  • Problems: "Note that one limitation of the multi-measure approach is that it is not possible to attach an attribute to a single observed value. An attribute attached to the observation instance will apply to the whole observation (e.g. to indicate who made the observation). Attributes can also be attached directly to the qb:MeasureProperty itself (e.g. to indicate the unit of measure for that measure) but that attachment applies to the whole data set (indeed any data set using that measure property) and cannot vary for different observations. For applications where this limitation is a problem then use the measure dimension approach."
  • For now, we only support the "multi-measure approach".

FAQ of Olap4ld

  • ...

Publications

  • Benedikt Kämpgen, Andreas Harth. No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views. ESWC 2013, LNCS 7882, pages 290-304, Springer, Heidelberg, May 2013 (http://www.aifb.kit.edu/web/Inproceedings3335)
  • Benedikt Kämpgen, Sean O'Riain, Andreas Harth. Interacting with Statistical Linked Data via OLAP Operations. In C. Unger, P. Cimiano, V. Lopez, E. Motta, P. Buitelaar, R. Cyganiak (eds.), Proceedings of Interacting with Linked Data (ILD 2012), workshop co-located with the 9th Extended Semantic Web Conference, pages 36-49, CEUR-WS.org (http://ceur-ws.org/Vol-913), May 2012
  • Benedikt Kämpgen, Andreas Harth. Transforming Statistical Linked Data for Use in OLAP Systems. I-SEMANTICS 2011, 7th Int. Conf. on Semantic Systems, ACM, September 2011 (http://www.aifb.kit.edu/web/Inproceedings3211)

Help