Dan English's BI Blog

Welcome to my random thoughts in regards to Business Intelligence, databases, and other technologies

Archive for February, 2011

SSIS Data Flow vs. Hand Coded ETL

Posted by denglishbi on February 27, 2011

“Why do I need an ETL tool?”

“Why would I want to use Integration Services (SSIS)?”

“Why use a source-to-target setup in ETL process?”

It is questions and statements like these that I hear often.  They can generate lots of discussion, and there are many different topics you could go into regarding the benefits of using an ETL tool versus purely coded stored procedures or T-SQL.  Some of the benefits you can get with ETL tools are:

  • control flow,
  • transformations,
  • logging,
  • visual representation of work,
  • more options than just source-to-target,
  • reusability,
  • audit details,
  • configurations,
  • memory management,
  • etc.

I could go on, but that is not the topic of this blog posting.  What I want to discuss is a scenario comparing a potential consequence of using stored procedure or T-SQL ETL against what I would consider more standard ETL using source and target components.  The other item I want to point out in this example is the use of ‘SELECT * FROM’ and its impact.  In all of these examples we will be leveraging a database view for our source; this simply adds an additional layer of abstraction and a reference point instead of using the underlying table directly.

Situation

Let me set the stage for this.  In the data warehouse we need to load Employee information to reference.  For simplicity we are not interested in tracking any sort of changes over time and just want a current view of the Employees.  All we want to do is take the information from one table and completely refresh a data warehouse table that we will use to deliver information to our end-users.

Development

Piece of cake.  Simply delete the data from our destination data warehouse table, select the data from the employee source table, and insert the source contents into the destination.  We have a planning meeting and discuss the different options we could go with to perform this operation – stored procedure, T-SQL, or SSIS.  Sure, there are other options we could consider, and for what it is worth, the first two are essentially the same; the choice just determines where you want the logic to sit.

So we decide to evaluate the different options.  In this example we are going to set all three up in a single SSIS package to compare them.

Option 1: Stored Procedure

The first one is to call a stored procedure where the logic for the load will reside.  We create the stored procedure:

CREATE PROCEDURE [dbo].[usp_DWRefresh] AS
BEGIN
    SET NOCOUNT ON;
    --clear out DW employee table
    TRUNCATE TABLE DWEmployee;
    --reload DW employee table
    INSERT INTO DWEmployee
    SELECT * FROM v_Employee;
END

And then in the SSIS package we utilize an Execute SQL Task to call the stored procedure:

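The SQLStatement for the task is simply the call to the procedure, something along these lines:

EXEC dbo.usp_DWRefresh;
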
Ok, that one is done, now on to the second option using T-SQL.

Option 2: T-SQL

In this example we are simply going to use an Execute SQL Task in SSIS and place the logic that was in the stored procedure directly in the task, so now the logic is embedded in the package.

Nothing fancy going on here: simply truncate the table and reload it with an insert into … select * from the view.
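
The statement embedded in the SQLStatement property is the same body that was in the stored procedure (you can see the start of it echoed in the error message later in this post):

--clear out DW employee table
TRUNCATE TABLE DWEmployee;
--reload DW employee table
INSERT INTO DWEmployee
SELECT * FROM v_Employee;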

Now we go on to the final option that we are going to look at, using a Data Flow Task.

Option 3: Data Flow Task (SSIS)

In this final option we are going to continue to use the ‘SELECT * FROM’ logic for the source, but this time we are going to include this logic in a Data Flow Task and use a source-to-target ETL setup leveraging components.

In this package we first use an Execute SQL Task to clear out the data in our destination table using the truncate table logic that was included in the T-SQL example.  Next we use a Data Flow Task to transfer the data from our source and load it into our target (destination) table that we will use in our data warehouse.

OLE DB Source setup: [screenshots]

OLE DB Destination setup: [screenshots]

One thing to point out here is that even though we are using the ‘SELECT * FROM’ setup for the source, the column names are still being set and mapped directly to the destination table.  This is important, and you will see why as we continue on in our evaluation.

Evaluation Process

Okay, the stage is set and we are ready to test everything out.  We run the package with each of the three options.

Everything looks great.

Gotcha

But wait, nothing ever stays this simple or goes unchanged, right?  Right.  Now we have a user that decides they would like a new column in v_Employee that provides the full name of the employee, combining the employee’s last name and first name together.  Not a problem, and it makes complete sense to set this up in the source to reference and centralize this logic.  The new column is set up in the view and saved, along the lines of the sketch below.

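Here is a sketch of what the revised view might look like; the base table and column names are illustrative assumptions, with only the FullName expression coming from the scenario:

ALTER VIEW [dbo].[v_Employee] AS
SELECT EmployeeID,   --illustrative base columns
       FirstName,
       LastName,
       --new derived column combining last name and first name
       LastName + ', ' + FirstName AS FullName
FROM dbo.Employee;
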
What does this do to the logic we set up?

Option 1: Stored Procedure

The process breaks: the view now returns one more column than the DWEmployee table has, so the INSERT INTO … SELECT * no longer lines up.

Error Message:

[Execute SQL Task] Error: Executing the query “usp_DWRefresh” failed with the following error: “Column name or number of supplied values does not match table definition.”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

Option 2: T-SQL

The process breaks in the same way, since the embedded T-SQL uses the identical INSERT INTO … SELECT * statement.

Error Message:

[Execute SQL Task] Error: Executing the query ”    TRUNCATE TABLE DWEmployee;
  
    INSERT INT…” failed with the following error: “Column name or number of supplied values does not match table definition.”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

Option 3: Data Flow Task (SSIS)

The process continues to work, with the following warning/informational message:

Warning Message:

[OLEDB_SRC_v_Employee [1]] Warning: The external columns for component “OLEDB_SRC_v_Employee” (1) are out of synchronization with the data source columns. The column “FullName” needs to be added to the external columns.

Interesting…  Ok, so why would I want to use an ETL tool, and even more than that, a standard source-to-target data process?  The data flow setup continues to work because during setup the individual columns get referenced in the flow and mapped in the destination.  If either the source or the target has additional columns added, that is okay.  We will be alerted to this, but it will not break our process.  Items that could potentially impact our process would be data types changing or column names being modified.
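
For what it is worth, if you do stay with a hand-coded approach, the usual way to get similar protection is to list the columns explicitly on both sides of the INSERT so that a new column in the view cannot break (or silently shift) the load.  A minimal sketch, again with illustrative column names:

--clear out DW employee table
TRUNCATE TABLE DWEmployee;
--reload using explicit column lists so a new column in
--v_Employee no longer breaks the INSERT
INSERT INTO DWEmployee (EmployeeID, FirstName, LastName)
SELECT EmployeeID, FirstName, LastName
FROM v_Employee;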

Conclusion

Maybe this is not being too fair, because who would ever go with a ‘SELECT * FROM’ setup to load a destination table?  I mean, our sources and targets never change, right?  Well, I have seen it, and when you are in a crunch you just might put something like this together without realizing the impact it might have down the road.  Maybe you set this logic up just to test something out, and we know what happens to test and proof of concept work, right…it sometimes gets moved to production.

So please be careful and really evaluate not only the logic you use when loading data, but also the option and tool you will use for the process.  Tools like Integration Services (SSIS) have tons of benefits that far outweigh the alternatives.

Posted in Integration Services | Tagged: | 4 Comments »

Free SQL Server Community March 2011 Training

Posted by denglishbi on February 25, 2011

Head over to the Pragmatic Works site and take a look at all of the training that is available in March.  This month we have a baker’s dozen with 13 webinars lined up covering a wide range of topics; just take a look at the lineup.  Looks like a little March BI Madness :)

  • March 1, Ben Evans – Intro to .NET for DBA 
  • March 2, HP and Devin Knight – Designing your Data Warehouse Platform 
  • March 3, Mark Stacey – Microsoft’s BI Stack: How to Choose a Technology 
  • March 8, Chris Webb – Caching and Cache-Warming in Analysis Services 
  • March 9, Tom Pizzato – Implementing the New Fast Track 3.0 Architecture 
  • March 10, Sherri McDonald – Rock Your Report 
  • March 15, Adam Jorgensen – 0 to Cube 
  • March 16, Roger Moore and HP – Scaling a Data Warehouse with EDW 
  • March 17, Brian Knight – Performance Tuning the Cube 
  • March 23, Mark Mortimore and Mark Stacey – HP BDA Appliance: Accelerating Decisions using SharePoint BI and PowerPivot 
  • March 24, Jose Chinchilla – BI Decaffeinated
  • March 29, Jose Chinchilla – Deploying SSIS packages
  • March 30, Brian Knight – Performance Tuning the BI Platform

If you miss a session, they record them, and you can check them out later on-demand in the webinar resource area.

It looks like they already have the lineup going for April as well.

So what are you waiting for?  Get registered for one of these spectacular webinars today – Webinar Registration.

Enjoy!

Posted in SQL Server, Training | Tagged: | Leave a Comment »

Minnesota Microsoft BI User Group 2011 Q1 Meeting – March 22

Posted by denglishbi on February 24, 2011

The next MN Microsoft Business Intelligence User Group meeting has been scheduled and the agenda is set.  The focus for the first quarter meeting of 2011 is all about Analysis Services: we will be looking at the traditional UDM Analysis Services, go over some SSAS tips and tricks, and then wrap it up with PowerPivot.  PowerPivot is a new in-memory version of Analysis Services that runs within Excel 2010 and can also be deployed and managed in SharePoint 2010 Enterprise.

MN Microsoft BI User Group 2011 Q1 Meeting

Date: Tuesday, March 22, 2011

Check In: 2:30 PM

Event Time: 3:00 PM – 7:00 PM

Location: Microsoft Event Center, 8300 Norman Center Drive, Suite 950, Bloomington, MN 55437

Please Register Now

Agenda:

  • 2:30-3:00 : Registration
  • 3:00-3:15 : Microsoft BI User Group Updates & Announcements
  • 3:15-4:30 : Analysis at the Speed of Thought with SQL Server Analysis Services (Mustafa Hussain) 
  • 4:30-4:45 : Break
  • 4:45-5:15 : SQL Server Analysis Services Tips and Tricks (Neelesh Raheja) 
  • 5:15-6:15 : Leveraging PowerPivot (Dan English) 
  • 6:15-7:00 : Social and Networking

Presentations:

Analysis at the Speed of Thought with SQL Server Analysis Services (by Mustafa Hussain, GNet Group) – Microsoft SQL Server Analysis Services is the semantic layer in the Microsoft BI technology platform that allows users to see and interact with their information in a faster and more dynamic way compared to static reports.  Analysis Services provides business users the ability to perform analysis and mine their data to understand and uncover business issues/drivers, leading to better decisions and new business opportunities.  This session covers: analysis vs. reporting; the end-user experience of performing analysis and creating reports using an Analysis Services cube; and building your first Analysis Services cube and using cubes to answer questions at the speed of thought.

SQL Server Analysis Services Tips and Tricks (by Neelesh Raheja, GNet Group) – This session covers best practices and cool tips and tricks to optimize SQL Server Analysis Services cube design to enable easy yet powerful analysis.

Leveraging PowerPivot (by Dan English, Superior Consulting Services) – No, this is not about ‘PowerPoint’, but rather ‘PowerPivot’.  During this presentation we will explore the new self-service BI capabilities provided by PowerPivot, compare this technology with Analysis Services (UDM), and dive into creating a PowerPivot solution.  By the end of the session you will have an understanding of the positioning of PowerPivot within the Microsoft BI toolset and will know the capabilities that it provides to end-users.

Please Register Now

Hope to see you there!

Posted in Business Intelligence, Training | Tagged: | Leave a Comment »

Minneapolis Tableau User Group March 9, 2011

Posted by denglishbi on February 23, 2011

The next Minneapolis Tableau user group meeting is coming up, hosted by Alan Wernke.  If you are interested in data visualization then this will be a user group you will want to check out.

Date: Mar. 9th – 3:00pm CST

Duration: 180 minutes

Location

Lund Food Holdings, Inc.

4100 West 50th Street

Edina, MN 55424

United States

Proposed Agenda:

  • Networking – 3:00 – 3:30 pm
  • Mike Keys, NumericalTruth.com – 3:30 – 4:00 pm
    Dashboards on Tableau Server — Tableau is an excellent platform for creating dashboards and Tableau Server is a good way to distribute and automatically update them. In this demo I’ll create a simple dashboard and demonstrate some tips and tricks I use to make effective interactive dashboards like highlighting actions and benchmark comparisons. I’ll then publish the dashboard to Tableau Server and show off basic functions available to server users like filtering and data export and basic server functions like extract scheduling and permissions. I’ll end with an overview of the Server licensing models.
  • Shawn Spott, RBC Wealth Management – 4:00 – 4:30 pm
    Niche to Nexus, Driving Business Intelligence Usage — Shawn will outline how RBC Wealth Management took a niche implementation of Tableau and grew it into a major firm resource in less than a year, covering the impact of Tableau Server in a self-serve model and some unconventional approaches to driving usage.
  • TBD – 4:30 – 5:30 pm
  • Networking – 5:30 – 6:00 pm

If you are interested in visualizing your data and dashboard solutions then you will want to check out this event – Register Here.

If you are not familiar with Tableau, you might want to check out a couple of areas on their site: Solutions and Visual Gallery.  Also worth a look is a white paper on their site by Derek Comingore, Microsoft SQL Server MVP, titled Self-Service BI Synergies with Microsoft PowerPivot.

Posted in Business Intelligence, Training | Tagged: , | Leave a Comment »

PerformancePoint Services (PPS) Wish List

Posted by denglishbi on February 11, 2011

I have been working with the PerformancePoint product since the early CTP days back in 2006.  I previously had experience using the ProClarity product that was acquired and partially converted into the PerformancePoint product.  With the release of PerformancePoint Services in SharePoint 2010 last year there have been some significant enhancements and additions to the product, but there are still some items that we would all like to see.  Over the years I have read and heard complaints about features that are missing.  Some of these are valid, but there are others where I can see why it was ‘by design’.  Below I have compiled my list of items that I would like to see included and enhanced in the product.

1. Include the ProClarity Performance Map (Heat Map) and Perspective reports – these are two ‘key’ and extremely informative options that have just been glossed over and would add so much value to the product offering.  I am baffled why they have not been included.  I find these two views more valuable than the Decomposition Tree that was finally added in the current release (although it still cannot be a starting point, it is available for exploring the multidimensional data).  The ability to analyze two metrics at the same time to determine impact and search out anomalies is a must.  Even though ProClarity views can still be referenced in the current product, it just is not the same and requires setup and configuration of an additional product.

[ProClarity Performance Map view and Perspective view screenshots]

2. Allow users to save and share analytical views (reports) – often when users are exploring existing analytical reports they come across a new and interesting view of the data.  It would be great if this view of the report could be saved to use as a starting point and shared with other users to help streamline report creation and sharing of information.  Another feature that was available in ProClarity.

3. Provide Analytical Map report – another nice visualization feature that was added in SSRS 2008 R2 and is available in other competing products.  Even though you can leverage SSRS reports in PerformancePoint, it would be nice to have this option available to add more value to the product offering.  Enhance the map experience with zooming and dragging functionality as well, just like users would expect.

4. KPI alerts – this was available in Business Scorecard Manager so that you could receive notifications based on criteria – Scorecard Alerts in BSM 2005.  This did leverage SQL Server Notification Services and I have heard rumors of a new alerting mechanism coming in the near future.  Currently there is a solution that you can purchase – Bidoma Alert.

5. Ability to control chart types for data series in Analytical Reports – the current version provides the option to include a bar (actually a column, but called a bar…) and line chart option if one of the measures is a percentage (it automatically generates a second y-axis).  ProClarity provided a way to select a data series and choose the options independently of each other, and they could be the same chart type – this allowed you to show, say, the current year sales measure in columns and the previous year in a line.

6. More Analytical Charting options – provide the ability to select or provide a color scheme (you can do this in the other Microsoft products, why not here as well?); include axis labels; add more chart types – bar (in addition to the column charts already included that are called bar), area, range, polar, funnel, scatter, gauges; include a 3D option (although I am not a fan of 3D, it does provide some eye candy); make data labels optional (would be great for pie charts, which are just another bad visualization, but people like them); add zoom support to select and zoom in/out on portions of the reports.

7. More linking (connection) options – the ability to link filters to each other to create cascading filter functionality, and the ability to link Analytical reports to other items (if I can link to them, why not the other way?).

8. Provide dashboard preview functionality in Dashboard Designer – why do I need to deploy the dashboard to view how it will look and function?  I can preview in Report Builder, why not in Dashboard Designer as well?

9. Enhance the Person Responsible property – tie this into Active Directory and provide the ability to quickly interact and view that person’s contact information and status.  Currently you have to use additional web parts for this functionality.  The ability to include a mailto hyperlink would be nice, but that doesn’t appear to be an option like it was in BSM 2005.  Being able to view status and availability would be extremely nice, and you can do this with SharePoint lists.

10. Provide printing and export functionality for dashboard and report items – the ability to export each item independently is there, but not the whole dashboard; plus, printing options to print and scale a dashboard or an individual report would be a great feature.  There is a printing option you can purchase though – PerformancePoint Print WebPart.

11. Add a slider filter option for Time filters – users continue to ask for a way to select a start and end date type of filter.  A slider filter option that lets the user select a starting and ending point for the range would be very nice.  This is available in other competing products.  For example, here is a slider control filter from Bing Travel that allows me to select a range of time:

[Bing Travel slider filter screenshot]

12. Bring in the Silverlight charts – end users like the eye candy, and adding in the Decomposition Tree only brings about the question of why not in the other Analytical Reports.  Enhance the visuals so that they look appealing to the consumers.  For me it is all about the data and information, but the consumers also want the experience and visuals (like the pie chart).  In the end it is the end users that will be using the tools and paying for the product; if we can enhance the experience and blow them away with neat visuals, then this will only increase user adoption rates.  Unleash the Silverlight toolkit and bring it into PerformancePoint analytical reports.  Russell Christopher has a couple of outstanding examples of the sliding bubble chart and heat map – bring these visualizations into the product to provide more value.  We have all been waiting for the Hockey Treemap Visualization to become a reality and part of the product offering; why has the heat map not been added yet?

[sliding bubble chart, heat map, and treemap visualization screenshots]

13. Ok, I need one more for the baker’s dozen: a mobile version for phones.  Okay, you can view the SharePoint dashboard pages on tablets and pad-like devices (sorry, I just can’t say the Apple words), and the functionality isn’t too bad.  What about the phones?  Being able to provide a version or an app that can be installed to display the Analytical reports that have been developed would be a nice feature.  Maybe this can already be done, and since I just use my phone as a phone and not for all of the enhanced online capabilities, it is just my lack of knowledge here.  We can definitely see that the push to mobile devices and access to information away from the desk/office is happening, so provide a solution leveraging the items that have already been created.

We definitely need a better way to communicate with the product team, like there is for SQL Server, so that bugs and suggestions can be easily shared and monitored.  Creating support cases and sending feedback into a black hole is not convenient.  Leaving comments on the product team’s blog is not a good long-term solution either; why can’t we use the Microsoft Connect site to communicate with the team like we do with SQL Server?  (Maybe we can and I am just not aware of it.)

I am not the only one that is requesting similar features and functionality.  For more items (and some of these have been added in PerformancePoint 2010) take a look at this thread in the PPS M&A forum – PerformancePoint Monitoring Feature Suggestions.

If you have more items you would like to see, please leave comments.  Who knows, maybe someone is listening and reading this and we can get these on the list for the next release of PerformancePoint, assuming that maintaining this product is still on the radar and that it will not be swapped out with something else like ‘Crescent’.  I know that ‘Crescent’ is supposed to be a reporting tool, but it sure does provide a lot of analytical capabilities, just like PowerPivot.

Posted in PerformancePoint | Tagged: , | 8 Comments »

PASSMN February 15, 2011 Meeting

Posted by denglishbi on February 7, 2011

The next Minnesota SQL Server User Group (PASSMN) meeting is coming up next week.  The topic will be about High Availability and Disaster Recovery options with SQL Server.

The sponsor for this meeting is Digineer:

High Availability & DR Options for SQL Server

Agenda:

  • 2:45-3:00 : Registration / hospitality / networking
  • 3:00-3:15 : Announcements etc.
  • 3:15-5:00 : Tim Plas: High Availability & DR Options for SQL Server

Presentation:

High Availability & DR Options for SQL Server (by Tim Plas, Virteva) – A comparison of SQL HA & DR options, by a practitioner who has implemented & managed all the SQL HA and DR approaches (& various combinations thereof). Tim is an operational DBA, charged with keeping SQL servers up & running & optimized, for managed-services customers. We will compare trade-offs between the various SQL HA & DR options: for complexity, usability, hardware requirements, licensing, failover speed, initial costs, ongoing support costs, staff skill requirements, etc.

Also, as you may be aware, Microsoft has announced a set of very powerful “AlwaysOn” features for the upcoming version of SQL (“Denali”), features popularly referred to as “HADRON”  (“High Availability Disaster Recovery always ON”).  We’ll provide a brief overview of those features now, and will have a full presentation on that later in the year.

Posted in Training | Tagged: , | Leave a Comment »

PASS DW/BI Virtual Chapter Feb/Mar 2011 Calendar

Posted by denglishbi on February 1, 2011

More free training brought to you by the community through the PASS Data Warehouse / Business Intelligence Virtual Chapter.

  • February 14, Patrick Gallucci – Master Data Services Development with Office 2010
  • February 23, Paul Turley – Multidimensional Reporting: MDX Essentials for Report Design
  • March 9, Jessica M. Moss – Introducing Microsoft’s Enterprise Data Information Management Products
  • March 14, Patrick Gallucci – Governance Program using Master Data Services
  • March 16, Peter Myers – Working with Cube Writeback in Analysis Services
  • March 23, Patrick LeBlanc – SSRS and Report Builder: Now that is self service

Head over to the site to check out the full details on each session and the LiveMeeting link – http://bi.sqlpass.org/.

If you missed a session you can check it out in the presentation archive area.

Keep checking back on their site because a few more sessions might be added as well.  Lots of great content and it is all free.  Can’t beat that.

Posted in SQL Server, Training | Tagged: , | Leave a Comment »