OLAP4LD Demo at ESWC 2014
From www.b-kaempgen.de
On this page, we collect information about our demonstration at ESWC 2014.
In our demonstration, we will show how changes in modelling propagate to LDCX by live-modifying a published QB dataset. We will also show common modelling errors in existing QB datasets, such as dimensions missing an rdfs:range or qb:CodeList, and observations that do not adhere to their data structure definitions.
Demonstrating the three-step interface
What will the audience learn? How to explore one dataset.
- Select dataset -> Explore dataset...
- Select measures
- Select dimensions on rows and columns -> Update table...
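The three selections map onto the axes of an MDX query like the one shown in the FAQ below: the measures and the column dimensions are cross-joined on COLUMNS, the row dimensions on ROWS, and the dataset is the cube in the FROM clause. The following Python sketch assembles a query of that shape. build_mdx and its parameters are hypothetical names, short local names stand in for the escaped URIs, and OLAP4LD's actual query generation (for example the session comment and identifier escaping) may differ.

```python
from functools import reduce
from typing import List


def _cross_join(sets: List[str]) -> str:
    """Nest CrossJoin pairwise over MDX set expressions, left to right."""
    return reduce(lambda left, right: f"CrossJoin({left}, {right})", sets)


def build_mdx(cube: str, measures: List[str],
              column_dims: List[str], row_dims: List[str]) -> str:
    """Assemble a query of the same shape as the FAQ example (hypothetical helper)."""
    # Selected measures form one set on the column axis.
    columns = ["{" + ", ".join(f"[{m}]" for m in measures) + "}"]
    # Each selected dimension contributes all of its members.
    columns += [f"{{Members([{d}])}}" for d in column_dims]
    query = f"SELECT NON EMPTY {_cross_join(columns)} ON COLUMNS"
    if row_dims:
        rows = [f"{{Members([{d}])}}" for d in row_dims]
        query += f", NON EMPTY {_cross_join(rows)} ON ROWS"
    # The selected dataset is the cube queried in the FROM clause.
    return query + f" FROM [{cube}]"


print(build_mdx("ds", ["lo_quantity", "lo_revenue"], ["lo_suppkeyCodeList"],
                ["lo_custkeyCodeList", "lo_orderdateCodeList"]))
```

Drilling down by adding a dimension then amounts to appending one more name to column_dims or row_dims, which wraps that axis in a further CrossJoin.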
FAQ
- What does the query look like? It is an MDX query, since datasets are represented as data cubes. Why not use a SPARQL query directly? Because OLAP application designers typically do not know SPARQL, whereas MDX is specifically designed for analytical queries over multidimensional datasets (cubes, measures, dimensions).
  SELECT /* $session: 2e72789e-08d7-d14d-2450-c9f4004b04c1 */
    NON EMPTY CrossJoin(
      {[httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23lo_quantity],[httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23lo_revenue]},
      {Members([httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23lo_suppkeyCodeList])}) ON COLUMNS,
    NON EMPTY CrossJoin(
      {Members([httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23lo_custkeyCodeList])},
      {Members([httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23lo_orderdateCodeList])}) ON ROWS
  FROM [httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunkXXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23ds]
- How do I drill down? By adding more dimensions to the rows or columns.
- What is done in the backend? 1. Loading the data cubes: running the normalisation algorithm, checking integrity constraints, ... 2. Executing the MDX query over the data cubes.
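The long identifiers in the example query are URIs in an escaped form. Judging only from that query (an inference, not documented OLAP4LD behaviour), XXX followed by two hex digits stands for a percent-encoded byte, YYY for "." and ZZZ for "-". A minimal Python sketch that reverses this assumed scheme; decode_mdx_identifier is a hypothetical helper:

```python
import re
from urllib.parse import unquote

# Assumed placeholders, inferred from the example query only.
_PLACEHOLDERS = {"YYY": ".", "ZZZ": "-"}


def decode_mdx_identifier(name: str) -> str:
    """Turn an escaped MDX identifier back into the URI it presumably encodes."""
    # Drop the surrounding MDX brackets, if present.
    if name.startswith("[") and name.endswith("]"):
        name = name[1:-1]
    # XXX plus two hex digits becomes an ordinary percent escape, e.g. XXX2F -> %2F.
    name = re.sub(r"XXX([0-9A-Fa-f]{2})", r"%\1", name)
    for placeholder, char in _PLACEHOLDERS.items():
        name = name.replace(placeholder, char)
    # Resolve the percent escapes (%3A -> ":", %2F -> "/", %23 -> "#").
    return unquote(name)


print(decode_mdx_identifier(
    "[httpXXX3AXXX2FXXX2Folap4ldYYYgooglecodeYYYcomXXX2FgitXXX2FOLAP4LDZZZtrunk"
    "XXX2FtestsXXX2Fssb001XXX2FttlXXX2FexampleYYYttlXXX23ds]"))
```

Under this assumption, the cube in the FROM clause is http://olap4ld.googlecode.com/git/OLAP4LD-trunk/tests/ssb001/ttl/example.ttl#ds.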
Live modifying a published QB dataset
What will the audience learn?
- Pastebin Example Star Schema Dataset: http://pastebin.com/raw.php?i=839G2u72#ds
- Pastebin: http://pastebin.com/839G2u72
- Example modifications:
- Change the label of the dataset.
- Change the discount of the first observation.
- Add a new dimension so that an error is thrown ("Failed specification check: IC-4. Dimensions have range. Every dimension declared in a qb:DataStructureDefinition must have a declared rdfs:range.", "Failed specification check: IC-11. All dimensions required. Every qb:Observation has a value for each dimension declared in its associated qb:DataStructureDefinition.")
[ qb:dimension :lo_superkey ]
- Remove rdfs:range or qb:CodeList for skos:Concept dimensions.
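The two error messages quoted above correspond to integrity constraints IC-4 and IC-11 of the RDF Data Cube (QB) specification, which defines them as SPARQL ASK queries. The Python sketch below mirrors only their logic on a hypothetical in-memory model; the Cube class, the dimensions :lo_custkey and :lo_orderdate, their skos:Concept ranges and the observation :obs1 are illustrative assumptions, not taken from the pastebin dataset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cube:
    """Hypothetical, simplified view of a QB dataset (not OLAP4LD's data model)."""
    dimensions: list[str]                    # dimensions declared in the qb:DataStructureDefinition
    ranges: dict[str, str]                   # dimension -> declared rdfs:range
    observations: dict[str, dict[str, str]]  # observation -> {dimension: value}


def check_ic4(cube: Cube) -> list[str]:
    """IC-4: every declared dimension must have a declared rdfs:range."""
    return [f"IC-4 failed: {dim} has no rdfs:range"
            for dim in cube.dimensions if dim not in cube.ranges]


def check_ic11(cube: Cube) -> list[str]:
    """IC-11: every observation needs a value for each declared dimension."""
    return [f"IC-11 failed: {obs} has no value for {dim}"
            for obs, values in cube.observations.items()
            for dim in cube.dimensions if dim not in values]


# Mimics the demo modification: :lo_superkey is added to the DSD,
# but it has no rdfs:range and no observation gives it a value.
cube = Cube(
    dimensions=[":lo_custkey", ":lo_orderdate", ":lo_superkey"],
    ranges={":lo_custkey": "skos:Concept", ":lo_orderdate": "skos:Concept"},
    observations={":obs1": {":lo_custkey": ":cust1", ":lo_orderdate": ":date1"}},
)

for message in check_ic4(cube) + check_ic11(cube):
    print(message)
```

Adding the new dimension without a range and without observation values therefore trips both constraints at once, which is why the demo reports two failed checks.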
Drill-Across Queries
What will the audience learn?
- To query over several datasets at once, provide a comma-separated list of dataset URIs, for example:
- http://estatwrap.ontologycentral.com/id/tsdcc310#ds,http://estatwrap.ontologycentral.com/id/t2020_rd310#ds,http://estatwrap.ontologycentral.com/id/tsdec360#ds,http://estatwrap.ontologycentral.com/id/t2020_rd300#ds,http://estatwrap.ontologycentral.com/id/t2020_31#ds,http://estatwrap.ontologycentral.com/id/t2020_50#ds,http://estatwrap.ontologycentral.com/id/t2020_51#ds,http://estatwrap.ontologycentral.com/id/t2020_52#ds,http://estatwrap.ontologycentral.com/id/t2020_53#ds
Example datasets:
- Energy dependence: http://estatwrap.ontologycentral.com/id/tsdcc310 (2001-2012)
- Energy productivity: http://estatwrap.ontologycentral.com/id/t2020_rd310 (2000-2012)
- Energy intensity: http://estatwrap.ontologycentral.com/id/tsdec360 (2001-2012)
- Greenhouse gas emissions per capita: http://estatwrap.ontologycentral.com/id/t2020_rd300 (2000-2011)
- Share of renewable energy: http://estatwrap.ontologycentral.com/id/t2020_31 (2004-2012)
- People at risk of poverty or social exclusion: http://estatwrap.ontologycentral.com/id/t2020_50 (2004-2012)
- People living in households with very low work intensity: http://estatwrap.ontologycentral.com/id/t2020_51 (2004-2012)
- People at risk of poverty after social transfers: http://estatwrap.ontologycentral.com/id/t2020_52 (2003-2012)
- Severely materially deprived people: http://estatwrap.ontologycentral.com/id/t2020_53 (2003-2012)
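The comma-separated drill-across input can be assembled from the dataset IDs of these example datasets, and the stated coverage gives the period that all nine datasets span. A small Python sketch; the dictionary only restates the list, the helper names are hypothetical, and the overlap is computed from the stated year ranges, not from the actual observations:

```python
from typing import Dict, Iterable, Optional, Tuple

BASE = "http://estatwrap.ontologycentral.com/id/"

# Dataset ID -> (first year, last year), as stated in the list of example datasets.
DATASETS: Dict[str, Tuple[int, int]] = {
    "tsdcc310": (2001, 2012),     # Energy dependence
    "t2020_rd310": (2000, 2012),  # Energy productivity
    "tsdec360": (2001, 2012),     # Energy intensity
    "t2020_rd300": (2000, 2011),  # Greenhouse gas emissions per capita
    "t2020_31": (2004, 2012),     # Share of renewable energy
    "t2020_50": (2004, 2012),     # At risk of poverty or social exclusion
    "t2020_51": (2004, 2012),     # Very low work intensity
    "t2020_52": (2003, 2012),     # At risk of poverty after social transfers
    "t2020_53": (2003, 2012),     # Severely materially deprived people
}


def drill_across_input(ids: Iterable[str]) -> str:
    """Join dataset URIs (with the #ds fragment) into one comma-separated string."""
    return ",".join(f"{BASE}{i}#ds" for i in ids)


def common_years(ranges: Iterable[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Years covered by every range, or None if the ranges do not overlap."""
    ranges = list(ranges)
    start = max(first for first, _ in ranges)
    end = min(last for _, last in ranges)
    return (start, end) if start <= end else None


print(drill_across_input(DATASETS))
print(common_years(DATASETS.values()))
```

For these nine datasets, common_years returns (2004, 2011): the latest start year is 2004 and the earliest end year is 2011.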
Common modelling errors
What will the audience learn?
Missing range
- Transparency International Linked Data - Corruption Perceptions Index 2011: http://transparency.270a.info/dataset/CPI2011
- Missing rdfs:range for dimension "source" [1]: "Failed specification check: IC-4. Dimensions have range. Every dimension declared in a qb:DataStructureDefinition must have a declared rdfs:range."
No resolvable URIs
- COINS - 2006-2007 dataset: http://source.data.gov.uk/dataset/coins/coins_fact_table_2006_2007 and http://finance.data.gov.uk/coins/coins_fact_table_2006_2007
- For more information, see PlanetData and COINS.
No DataStructureDefinition
- Average annual producer price indices of industrial products, CA 1996 (previous year = 100): http://elpo.stat.gov.rs/lod2/RS-DATA/Prices/Annual_producer_price_indices_of_industrial_products_CA_1996/data_2011_12_06
- DSD: http://elpo.stat.gov.rs/lod2/RS-DATA/Prices/dsd#Annual_producer_price_indices_of_industrial_products_CA_1996
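A dataset without a usable DSD fails integrity constraint IC-2 ("Unique DSD") of the QB specification: every qb:DataSet must be linked via qb:structure to exactly one qb:DataStructureDefinition. The Python sketch below applies that check to (subject, predicate, object) triples written as prefixed names. The resources :data and :dsd are hypothetical stand-ins, and the sketch assumes a DSD only counts if its rdf:type triple is among the loaded data.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as prefixed names


def check_ic2(triples: List[Triple]) -> List[str]:
    """IC-2 (Unique DSD): each qb:DataSet has exactly one qb:structure to a DSD."""
    datasets = sorted({s for s, p, o in triples
                       if p == "rdf:type" and o == "qb:DataSet"})
    known_dsds = {s for s, p, o in triples
                  if p == "rdf:type" and o == "qb:DataStructureDefinition"}
    errors = []
    for ds in datasets:
        linked = {o for s, p, o in triples if s == ds and p == "qb:structure"}
        if not linked & known_dsds:
            errors.append(f"IC-2 failed: {ds} has no qb:structure to a qb:DataStructureDefinition")
        elif len(linked) > 1:
            errors.append(f"IC-2 failed: {ds} has more than one qb:structure")
    return errors


# The dataset points to :dsd, but nothing declares :dsd a qb:DataStructureDefinition.
triples = [
    (":data", "rdf:type", "qb:DataSet"),
    (":data", "qb:structure", ":dsd"),
]
print(check_ic2(triples))

# Once the DSD description is loaded as well, the check passes.
triples.append((":dsd", "rdf:type", "qb:DataStructureDefinition"))
print(check_ic2(triples))
```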
No properly modelled cube
- Reused the Eurostat Linked Data Wrapper (http://estatwrap.ontologycentral.com/) to RDFize Eurostat datasets (http://epp.eurostat.ec.europa.eu/): http://eurostat.linked-statistics.org/data/tgs00003
- The DSD refers to dcterms:date, but the observations use timePeriod; moreover, the range of timePeriod is a resource, not a Literal.
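This kind of mismatch can be spotted by comparing the components a DSD declares with the properties its observations actually use. The Python sketch below does that for the dcterms:date/timePeriod case. Treating each observation as a plain set of property names is a simplifying assumption, component_mismatch is a hypothetical helper, and ranges (the second problem noted above) are not checked.

```python
from typing import Dict, Iterable, List, Set


def component_mismatch(declared: Set[str],
                       observations: Iterable[Set[str]]) -> Dict[str, List[str]]:
    """Compare DSD components with the properties used by observations."""
    used: Set[str] = set()
    for properties in observations:
        used |= properties
    return {
        "declared_but_unused": sorted(declared - used),
        "used_but_undeclared": sorted(used - declared),
    }


report = component_mismatch(
    declared={"dcterms:date"},       # what the DSD refers to
    observations=[{"timePeriod"}],   # what the observations use
)
print(report)
```

In real data, structural properties such as qb:dataSet and rdf:type would have to be filtered out of the observation properties before comparing.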