Analytics Blog

Dsg Subscribe Mail Web

Blogs (Analytics Blog) (Analytics Blog)

An Ode to the Date Dimension

thumbnail

Jose Hernandez
3 Months Ago

Ode to the Date Dimension,
Metrics travel through time,
Don’t need H.G. Wells or even a dime.

Ode to the Date Dimension,
A neat trick to have,
It cuts down date processing way more than half.

Ode to the Date Dimension
So much under used,
Don’t pay attention, and performance is bruised.

Sorry, could not resist :-)  But kidding aside, I do want to expound on the benefits of having and relying on a date dimension in your data warehouse.  Depending on who you ask, the response you receive (regarding Date Dimensions) will vary.  The responses tend to fall in one of these camps:

  • Date dimensions? … just use the date functions provided with your database.
  • Sure, my data warehouse has a date dimension; it helps reduce stress on the database server.
  • Date dimensions are cool (and so are fez hats*), they allow analysis of metrics in many revealing ways (from a date point of view)!

In this installment, I simply want to introduce the three camps.  In a future post, I will dive into details on useful date dimension uses.  Ultimately if you are not in the “Date dimensions are cool” camp, I want to get you there. 

If it is not evident, I fall into the “date dimensions are cool” camp.  But before we jump into how cool date dimensions are, let’s explore the three camps.

#1: “Just use the database functions” Camp

If you look at a typical date dimension table you will immediately notice that many of the attributes (columns) you will find pertain to variations and parts of a date, for example: day number, day, week number, month number, month name, quarter, year, etc.  Admittedly, all these components can be derived through a database function from the original date.  So, why create and maintain a date dimension table to house all these date variants?  The answer to this question is found in the other two camps.  It’s all about easing stress on the database servers and on the data consumers.

Let’s focus first on the database servers.  Given a date, say 2/14/2014, I can derive practically any calendar variant.  For example:

Function

Result

DATENAME(dw,2/14/2014)

Friday

DATEPART(mm, 2/14/2014)

2

DATEPART(quarter, 2/14/2014)

1

MONTH(2/14/2014)

2

YEAR(2/14/2014)

2014

DAY(2/14/2014)

14

 

The amount of information you can derive about dates from database functions is great.  Depending on the database server you are using there may be other date functions that would come in handy to derive parts from dates.  You certainly can derive all this date information right from the date itself.  So what’s the problem? Consider this question, “How much revenue was realize on Tuesday since going into business?”  To answer this question you would need to look at the date of every transaction in the database, derive the date name, and look for the ones that where on “Tuesday”, then add up those revenues.  That’s right, you must execute a function on every transaction to determine if it was on a Tuesday.  If there are millions of transactions, the function is executed on all of them to find Tuesdays.  Each and every time the question is posed.

And what about stress on the data consumer?  Each individual that would like to ask these types of questions must either learn the complexities of SQL date functions.  Sure, you could create views that would shield them from the complexities or use reporting tools to help, but then you still have to worry about the stress on the server.

#2: “It helps reduce stress on the database server” Camp

This camp is certainly aware of the stressed placed on data bases servers when you ask it to, on the fly, derive parts of dates.  And they certainly realize that a way to relive this stress (on the server) is to have a date dimension.  However, this camp is very happy with tactically improving the situation of the “Just use the database functions” camp.

So, what’s their secret?  It’s pretty simple, keep all the date variants in a table so that you can use joins between the date dimension and the transactions.  This does not require executing date functions on each transaction, it relies on the primary key and foreign key relationship between the two tables.  Consider that the date dimension table is relatively small (10 years of dates requires approximately 3,650 rows), your transaction are stored in a much larger table with millions of rows.  If, you were looking for transactions that occurred on Tuesday, then you simply look for the string “Tuesday” in the date dimension table (in the appropriate column).  First, you are traversing a much smaller table (3,650 rows) and you are not executing any date functions to find it.  Second, the related rows in the transaction table (with millions of rows) is now filtered down to the rows with a primary key, foreign key relation to the few rows in the date dimension.  The result, less processing effort than traversing all the transactions and executing date functions on every row.  This technique truly takes advantage of the power of relational databases.

#3: “Date dimensions are cool” Camp

Being in the “It helps reduce stress on the database server” camp is a good place to be, but, I want to get you to move to the “Date dimensions are cool” camp.  In this camp we get all the benefits of the “It helps reduce stress on the database server” camp, and then some.  In this camp we begin to realize that there are some very power things we can do with the date dimension.  Some which cannot be done with any date function!  Let me introduce some of these cool features here, and in the next blog entry, we will dive into the details of the techniques.  I will leave you with a few teasers:

The Metric

The Time Period

How much revenue?

During high-season?
During the peak vacation months?
During college spring break?

What was attendance?

Over the pick your religion holidays?
Over US Federal holidays?
Over the pick your country holidays?

What was the average temperature?

During the US winter months?
During Australia’s Fall?
During pick your region’s Spring?

What was the retention rate?

Over the last trimester?
Over the Fall 2010 semester?

I hope you get the picture by now.  There are many different pieces of meta-data around dates that are not found in the date functions in databases.  Some have geographical boundaries, some have corporate boundaries, and some have cultural boundaries.  Adding this type of information to your date dimension will allow you to perform analysis (easily and quickly) that could greatly benefit your bottom line.

Yes, the date dimension, when fully utilized is a very powerful tool.  It will greatly increase usability, performance and creativity.  Stay tuned for more on the date dimension!

Notes:
In my examples of SQL functions, I used MS SQL Server syntax.
*This is for my nephew Pat, he and I are Dr. Who fans.