v0.1.0
Jamie Clark
Dimensional Insight
business analytics and
integrated business intelligence solutions
Diver Platform
Supply Chain, Manufacturing,
Healthcare, Beverage Alcohol, ...
> spectre dive basic.cbase \
-q -d Direction -c Value
Direction Value
East 2,937.7687
North 2,520.4438
South 1,641.8041
West 2,924.6749
15+ years building business intelligence tools
author of DivePort
Spectre team lead
first poultry and egg conference
james.clark@dimins.com
☑ storage geometry:
row-oriented vs. column-oriented
☑ database types:
MOLAP vs. in-memory ROLAP
(usually) two-dimensional
but they must be stored
on one-dimensional media
so a decision must be made...
row-oriented
Title | Year | Gross |
---|---|---|
Bolt | 2008 | 114M |
Juno | 2007 | 143M |
Thor | 2011 | 181M |
Bolt 2008 114M Juno 2007 143M Thor 2011 181M
column-oriented
Title | Year | Gross |
---|---|---|
Bolt | 2008 | 114M |
Juno | 2007 | 143M |
Thor | 2011 | 181M |
Bolt Juno Thor 2008 2007 2011 114M 143M 181M
so what's the difference?
locality of reference
disk is very far away
data: Jonas Bonér
row-oriented
Bolt 2008 114M Juno 2007 143M Thor 2011 181M
column-oriented
Bolt Juno Thor 2008 2007 2011 114M 143M 181M
row-oriented
column-oriented
(OLAP: online analytical processing)
it's possible to do analysis
in a transactional database
using SQL
a business intelligence solution
makes it easier and faster
let's look at two structures
for busines analytics...
MOLAP and ROLAP
multidimensional OLAP
8 x 7 = ?
8 x 7 = 56
pre-computed query results
to answer a query: identify the right table, load it
this is very fast
(when the answer was pre-computed)
this works on weak hardware
this takes time to prepare
and cannot be in-memory
cannot change summary functions at run time
must limit number of dimensions
MOLAP is fast at run-time
but inflexible
relational OLAP
102 x 11 = ?
102 x 11 = 1,122
no pre-computed results
just the detail table
queries are run directly against detail table
this is very flexible
and takes much less space
but it requires strong hardware
even then, it can be slow
there's a balance between speed and flexibility
but it's complicated...
if ROLAP fits in memory
and we have multiple processors
and the questions are complex
and the data changes frequently
and the engine can optimize the query
and nobody is leaning against the server
and the moon is waxing
and your favorite show is on
...
ROLAP is very flexible,
and was considered slow at run-time
but with modern hardware
and enough memory
in-memory ROLAP is looking good
MOLAP
in-memory ROLAP
let's bring it back to storage geometry
MOLAP "data cubes" and ROLAP data tables
could be row- or column-oriented
for MOLAP it doesn't matter much
for ROLAP it matters (among many other factors)
transactional databases are usually row-oriented
"optimize" the data by moving it
to a column-oriented database
Dimensional Insight started in 1989
with row-major MOLAP
in 2014 we introduced Spectre,
column-oriented in-memory ROLAP
why?
ROLAP has a lot of advantages
and two disadvantages
(performance and hardware requirements)
the disadvantages were impossible to overcome
but things change
memory is very cheap
data: hblok.net
CPU cores are becoming plentiful
we've ridden Moore's law to a tipping point
where cheap processors and abundant memory
permit fast, dynamic, in-memory analysis
things change
buy lots of memory
know how your data is stored
modern hardware opens up possibilities
columns under Gaudí's Parc Güell, Barcelona