Dan English's BI Blog

Welcome to my random thoughts in regards to Business Intelligence, databases, and other technologies

Archive for January 24th, 2011

Calculating Cumulative Values in Microsoft BI Tools

Posted by denglishbi on January 24, 2011

In working on a business intelligence project a common situation you will find yourself in is having to come up with a cumulative value (running value) calculation.  The challenge that you will find is that depending on what product you are working with the calculation that you need is going to be different.  In this post I will compare how this can be done with T-SQL, Excel, SSIS, SSRS, MDX, and DAX.  Along the way I will provide some additional reference links to other options, content, and I will point out some differences in how the totals are being calculated so you will know what to expect with the results.

In these examples I will be working with the Contoso Retail Data Warehouse data sample.  These examples below are not necessarily going to be the optimal approaches, just showing how you can go about getting at the desired result.

SQL Server T-SQL

So let’s start out taking a quick look how calculating the sales over each of the years directly against the data in the database through SQL Server Management Studio.

SELECT d.CalendarYear,

       SUM(f.SalesAmount) SalesAmount,

       SUM(f.SalesAmount) + COALESCE((SELECT SUM(SalesAmount)

                        FROM FactSales fs JOIN DimDate dd ON fs.DateKey = dd.DateKey 

                        WHERE dd.CalendarYear < d.CalendarYear),0) CumSalesAmount

FROM FactSales f

INNER JOIN DimDate d ON f.DateKey = d.DateKey

GROUP BY d.CalendarYear

ORDER BY d.CalendarYear

And the results for this are:

CalendarYear SalesAmount           CumSalesAmount
———— ——————— ———————
2007         4561940955.0212       4561940955.0212
2008         4111233534.6841       8673174489.7053
2009         3740483119.1823       12413657608.8876

 

Not too bad.  We have an inner query that is being used to return the pervious year(s) values and append that onto the current year (if they exist).  If no value is returned then a zero is used in its place.  For additional examples and options take a look at this posting by Garth Wells – Calculating Running Totals.

Microsoft Excel

So now lets take a look at how this would be done if we were using Excel. 

image

There area a few different options you have here, in this example I enter a formula in the C2 cell of =SUM($B$2:B2) and then copy that formula down into cells C3 and C4.

SQL Server Integration Services (SSIS)

Now we will take a quick look at how this could be done in the ETL process.  As we are pulling data from a source we can evaluate the values in the data flow and accumulate the values inside a script component.  In the source we will simply get the sales for each year and then append the values in the flow to each other to get at the same results displayed above.

image

The cumulative value is being performed in the script component with some code.  We have a variable setup that we are using to append the sales amount of each row in the data flow and then sending this value back out in a new output column called CumulativeSales.

image

Nothing too fancy going on here.  I don’t have a destination in the flow, just added a derived column to get the data viewer included so that I could run the data flow to show the results.

SQL Server Reporting Services (SSRS)

Now lets switch over to Reporting Services.  In this example we will not do the cumulative value in the dataset query, we will do this value calculation in the report. 

image

In this example we leverage the RunningValue aggregate function that is available in Reporting Services.  The expression that we use in the Cumulative Sales column is =RunningValue(Fields!SalesAmount.Value,Sum,”DataSet1″)

image

This function returns a running aggregate of the values in our specified dataset using the SUM function which is also specified in the expression.

SQL Server Analysis Services (SSAS) MDX

Now how about taking a look at Analysis Services.  How would we go about doing this calculation using MDX?

WITH MEMBER [Measures].[Cumulative Sales] AS

    SUM({null:[Date].[Calendar Year].CurrentMember},[Measures].[Sales Amount])

    

SELECT {[Measures].[Sales Amount],[Measures].[Cumulative Sales]} ON 0,

    NON EMPTY [Date].[Calendar Year].[Calendar Year].Members ON 1

FROM [Sales]

And the results of this are:

image

We can apply some filters into the MDX statement to remove the years that currently don’t have Sales Amount associated with them.

WITH MEMBER [Measures].[Cumulative Sales] AS

    SUM({null:[Date].[Calendar Year].CurrentMember},[Measures].[Sales Amount])

    

SELECT {[Measures].[Sales Amount],[Measures].[Cumulative Sales]} ON 0,

    NONEMPTY([Date].[Calendar Year].[Calendar Year].Members,[Measures].[Sales Amount]) ON 1

FROM [Sales]

This returns what we have seen in our other results:

image

There are additional ways in going about this calculation and for more options take a look at this recent posting here by Amish Manubhai Shah – Various way to calculate running total from All Period to Currentmember. Now I want to show how this solution differs a bit from the other ones we have seen.  What happens if I apply a filter to the query to only display 2008 and 2009?  What would you expect?

image

Might not be what you would want, but then again maybe it is.  The Cumulative Sales calculation is summing up the values from the beginning of time (the [All] level).  So the first value that we see in 2008 is including the Sales Amount from 2007 as well (if there was sales in 2006 it would include that as well and so on). 

You can make a few modifications to the calculation and setup a named set to reference the items and come up with something along these lines.

WITH DYNAMIC SET [SelectedYears] AS Existing [Date].[Calendar Year].[Calendar Year].Members

 

 MEMBER MEASURES.[Cumulative Sales] as 

    SUM({EXISTS([Date].[Calendar YQMD].[Calendar Year].Members,

    [SelectedYears]).Item(0):[Date].[Calendar YQMD].CurrentMember},

    [Measures].[Sales Amount])

 

SELECT {Measures.[Sales Amount], Measures.[Cumulative Sales]} ON 0,

    [Date].[Calendar YQMD].[Calendar Year].Members ON 1

FROM [Sales]

WHERE {[Date].[Calendar Year].&[2008],[Date].[Calendar Year].&[2009]}

With the following results:

image

Here a dynamic set is used to get the references of the members that are in the slicer (WHERE clause) and feed this information into the calculation for the cumulative sales to grab the first item to come up with the reference to the 2008 year.  This will be the starting point for the cumulative total instead of from the beginning of time.

You might just want to look at year-to-date cumulative totals and be able to drilldown into a user defined hierarchy if you have one setup like Year-Quarter-Month-Date.  If we modify our calculation a bit we might end up with something like the following:

WITH MEMBER [Measures].[Cumulative Sales] AS

    SUM(YTD([Date].[Calendar YQMD].CurrentMember),[Measures].[Sales Amount])

 

SELECT {[Measures].[Sales Amount],[Measures].[Cumulative Sales]} ON 0,

    Hierarchize({[Date].[Calendar YQMD].[Calendar Year].Members,

    [Date].[Calendar YQMD].[Calendar Quarter].Members}) ON 1

FROM [Sales]

WHERE {[Date].[Calendar Year].&[2008],

    [Date].[Calendar Year].&[2009]}

 

With the following results:

image

Here we can see that our cumulative total stops at the year level and we can see at the lower levels that the values are accumulating as expected, so the cumulative total at Q4 is same as the Yearly value.  What we have now is a YTD calculation.

PowerPivot for Excel 2010 DAX

We have one more product that I want to take a look at and that would be PowerPivot.  This is going to be fairly similar to the Analysis Services solution since it actually is Analysis Services behind the scenes, the only difference here is that we will be using DAX instead of MDX.

So for the first example we will setup a cumulative sales calculation just like we did with the first MDX example.  The calculation we use is as follows:

=CALCULATE(SUM(FactSales[SalesAmount]),

DATESBETWEEN(DimDate[Datekey],

FIRSTDATE(ALL(DimDate[Datekey])),

LASTDATE(DimDate[Datekey])))

So we end up with the following:

image

You could go ahead and add some additional logic to evaluate the Sales Amount to determine if a value exists, but you get the general idea here.

Once again this is calculating the cumulative total from the beginning of all time and that is done by using the ALL reference in the calculation for the FIRSTDATE.  If we filter the years and only display 2008 and 2009 we will see the similar results like we did with MDX where the cumulative sales amount for 2008 includes the 2007 sales.

image

The one nice thing about PowerPivot is that this is in Excel, so if you want to do anything a little more custom or make any references to cells you can do that and use Excel functions as well.  As far as setting up a cumulative sales amount value like the second MDX example I can’t determine an equivalent in DAX.  Maybe someone else has an idea if this can be done so that it only does this based on the Date values being evaluated.  If you do, please leave a comment.

You can do the YTD calculation in DAX and here is formula for that:

=TOTALYTD(SUM(FactSales[SalesAmount]),DimDate[Datekey])

And the results look like this if we break out the years by the quarters:

image

Once again we can see that the yearly total is equivalent to the Q4 for the same year.  For some additional information and explanation of the PowerPivot calculations take a look at a great post by Kasper de Jonge here Accumulate values (running value) over multiple years in PowerPivot using DAX.

Conclusion

Cumulative total (running value) calculations are all calculated differently within each of the Microsoft products.  We took a look at some examples of how this can be done in T-SQL, Excel, SSIS, SSRS, MDX, and DAX.  These are not the only options, so if you want to see more take a look at the additional posts that I included as reference points the different sections.  It is important to understand how the formulas need to be configured and also what the results are once the calculations and logic is put into place.  I think that one of the easiest ones, besides the Excel formula, would have to be the SSRS with the RunningValue aggregate function.  I like how this is setup and it evaluates everything based on the context that is being referenced.  So if we go back to the SSRS report and apply a filter to the tablix to remove 2007 from what is displayed we would end up with the following:

image

Very simple and easy to understand, but that is just my personal opinion.

Posted in Analysis Services, Integration Services, PowerPivot, Reporting Services, SQL Server | Tagged: , , , , | 10 Comments »