Dan English's BI Blog

Welcome to my random thoughts in regards to Business Intelligence, databases, and other technologies

Archive for June, 2010

PASSMN July 20, 2010 Monthly Meeting

Posted by denglishbi on June 28, 2010

The next Minnesota SQL Server User Group meeting is on Tuesday, July 20.  This month we have a special speaker, Itzik Ben-Gan, who will be presenting on Query Tuning Tips.  The meeting time has also been adjusted to accommodate Itzik’s schedule; we will be starting at the later time of 5 p.m.

PASSMN_Logo

In order to RSVP for the event you will need to login to the national PASS site and click on the RSVP link.  The sponsor for next month’s event is Digineer.

digineer_Logo

PASSMN July 2010 Meeting – Query Tuning Tips

Agenda:

5:00 –5:15 : Announcements
5:15 –5:25 : Sponsor
5:25 –6:45 : Query Tuning Tips (Itzik)
6:45 –7:00 : Survey Collection and Giveaways

Presentation:

Query Tuning Tips (presented by Itzik Ben-Gan) – Given a SQL Server querying problem, there’s much you can do to arrive at a well-performing solution. Tuning involves arranging an optimal physical environment, e.g., by creating supporting indexes, as well as writing the query in a way that gets an optimal execution plan. Many factors can affect the efficiency of a solution, including the availability of indexes, data distribution and density, and others. In different scenarios, a different solution could be the most efficient for the same querying problem. Query tuning could be considered an art. This session will provide various tips for efficient query tuning and demonstrate them through specific tuning examples.
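The abstract mentions creating supporting indexes as one tuning lever. As a small taste of that idea – a minimal sketch only, using a hypothetical Orders table rather than anything from the session materials – a covering index can turn a table scan into a seek:

    -- A query filtering and ordering on unindexed columns typically
    -- scans the table (dbo.Orders is hypothetical):
    SELECT orderid, orderdate, custid
    FROM dbo.Orders
    WHERE custid = 71
    ORDER BY orderdate DESC;

    -- A covering index lets the optimizer seek on custid and read the
    -- rows already ordered, with orderid included at the leaf level:
    CREATE NONCLUSTERED INDEX idx_custid_orderdate
    ON dbo.Orders (custid, orderdate DESC)
    INCLUDE (orderid);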

We will have some copies of Itzik’s books as giveaways as well.  So not only can you come and hear him talk in person, but you will have a chance to take him home as well.

Hope to see you there! 

Also, don’t forget: if you are planning on attending the national PASS 2010 Summit, there are some additional discounts that you can take advantage of that expire at the end of June – SQL PASS 2010 Summit – June Discounts.

Posted in Training | Tagged: | Leave a Comment »

SQL PASS 2010 Summit – June Discounts

Posted by denglishbi on June 25, 2010

Are you planning on going to the PASS 2010 Summit conference in November but haven’t registered yet? Well hurry up, because two major discounts are running out this month.  The current discounted price of $1,395 ends on June 30, along with an additional $150 discount code that was part of the 24 Hours of PASS – 24HRR2.

So what are you waiting for?  Head over and sign up now to get in on the deal – http://www.sqlpass.org/summit/na2010/.  They also have a satisfaction-guaranteed offer: once the session list is released next month, you can get your money back if you are not satisfied with the lineup.

If you need any assistance pitching this learning opportunity to your management, or have to fill out a justification form to get approval, check out this great list – ROI Justification/Justify your Attendance.

PASS Community Summit 2009

And who knows, maybe I will see you there.

Posted in Training | Leave a Comment »

Training Course: Presenting Data and Information by Edward Tufte

Posted by denglishbi on June 24, 2010

I found out last week from a fellow PASSMN member that Edward Tufte will be coming to Minneapolis (and Chicago) to teach a training course next month on July 19.

Topics to be covered in the course:

  • fundamental strategies of analytical design
  • evaluating evidence used in presentations
  • statistical data: tables, graphics, and semi-graphics
  • business, scientific, research, and financial presentations
  • complexity and clarity
  • effective presentations: on paper and in person
  • interface design
  • use of PowerPoint, video, overheads, and handouts
  • multi-media, internet, and websites
  • credibility of presentations
  • animation and scientific visualizations
  • many practical examples

This course will be held at the following location in Minneapolis:

Minneapolis Marriott City Center
30 South 7th Street
Minneapolis, MN 55402

Along with the training you also get copies of four of his books:

image

For more information about this course that is taught by Edward Tufte check out the following link – http://www.edwardtufte.com/tufte/courses.

Posted in Training | Leave a Comment »

Real-time Pizza Tracker by Domino’s

Posted by denglishbi on June 24, 2010

I will have to admit, I hadn’t had a Domino’s pizza since I was in college.  Last night we were looking to get some dinner and my wife was checking out the Domino’s website.  She was impressed with all of the different offerings now available, ranging from pizza, of course, to sandwiches, pasta, wings, bread bowls, etc.  We had both seen the new commercials about them changing the recipe and basically starting over from scratch – thank you.  That is the main reason I hadn’t had a Domino’s in such a long time: I just wasn’t impressed with their pizza.

The interesting thing about the new experience on the website was not only the pizza builder, but once you submit your order you can utilize their pizza tracker to watch the status of your order.  My wife clicked on the link to check it out and was really amazed.  We could both sit there and watch as our order moved through each stage of the process – order placed, prep, bake, quality check, and order has been received.

dominos_pizza_tracker

The other neat thing was that messages were displayed below the tracker so that we could see who was prepping our order and who was delivering it – along with when they left – and at the end (displayed in the picture above) ‘MMM, IT’S THERE – We hope you’re enjoying your meal!’.  Very cool and neat to watch.  We were amazed at how quick the turnaround was and couldn’t believe the service.

Not only was the real-time Domino’s Tracker tool impressive, but I will have to say I am now on board with the new recipe – great decision to dump the old one and start over.  The new pizza formula was fantastic, and so were the cheesy bread and Cinna Stix.  I would definitely give Domino’s five out of five stars for not only the food, but also the experience.  Very impressed, and I love the tracker feature.

Posted in Personal Comments | 3 Comments »

Sorting PowerPivot Labels Manually

Posted by denglishbi on June 23, 2010

One of the items that you are going to run into is the sorting order of items in PowerPivot.  For the most part you might get by with items being in ascending or descending order, but there are going to be times when a more manual approach is needed.  One particular case is the order of calendar month labels in a report.  Initially they are displayed in alphabetical order like the following:

image

So how can you get the months into the order you are used to seeing them displayed?  As you can see this is not how you want them, and it isn’t like Analysis Services, where you can define an additional column to sort the data by.  One option would be to add a sort order column that you concatenate with the month label, as in the posting by the PowerPivot Team – Sorting month by natural sort order (see the T-SQL sketch at the end of this post for one way to build such a column).  Another option is to use the manual sorting functionality that is available.  How does that work?  It isn’t very obvious, and after a bit of playing around I finally figured it out.  Here is how you can do it:

1. Right-click the label and select Sort and then More Sort Options…

   image

2.  In the Sort dialog box select the Manual (you can drag items to rearrange them) option and then click OK

   image

3.  Now in the PivotTable select an item and place the cursor towards the right-hand side of the cell until you get the cursor with arrows pointing in all four directions

   image

4.  Now click on that cell and drag it to the desired spot where you would like it to be displayed

   image

5.  Then drop the item – now you have moved it not only in this display, but everywhere it is displayed in the PivotTable

   image

6. Now simply repeat the process until everything is in the order that you want for reporting.  Note: this sorting applies to this PivotTable only.

   image

There you go.  Just another option that you have for sorting items in your PowerPivot reports.
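If you would rather go with the first option mentioned above – the concatenated sort order column – one way is to build it in the source query before the data ever reaches PowerPivot.  This is a minimal sketch assuming a DimDate-style table with a datetime Datekey column (as in the Contoso sample); adjust the names to your own schema:

    -- Produce labels such as '01 - January', '02 - February', ...
    -- which sort correctly even in plain alphabetical order.
    SELECT d.Datekey,
           RIGHT('0' + CAST(MONTH(d.Datekey) AS varchar(2)), 2)
             + ' - ' + DATENAME(MONTH, d.Datekey) AS CalendarMonthLabel
    FROM dbo.DimDate AS d;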

UPDATE (11/22/2010) – I have another blog posting here in regards to sorting that you might find interesting as well – Sorting PowerPivot Labels Continued

Posted in PowerPivot | Tagged: | 5 Comments »

PowerPivot Relationship Detection Experience

Posted by denglishbi on June 23, 2010

I just wanted to put together some notes and feedback on the relationship detection feature in PowerPivot.  I was working through the samples in the recently published Professional Microsoft PowerPivot for Excel and SharePoint book and had run into an issue loading the data (see PowerPivot Memory Error – Not Enough Storage).  This forced me to import the last fact table, FactSalesQuota, separately.  When I did this, none of the foreign key relationships were formed on import into PowerPivot, which I realized when I started the first PivotTable example and saw that the numbers for Sales Amount Quota were not being reflected properly:

image

Since Relationship Detection is turned on by default in the PowerPivot tab in the Ribbon:

image

I was prompted at the top of the Field List in the Excel worksheet that a ‘Relationship may be needed’ and could click on the Create button to have PowerPivot automatically try to detect and create relationships between the tables currently being used in the PivotTable analysis.

image

So I went ahead and ran this option

image

And unfortunately it was not successful

image

So I had to go back into the PowerPivot window and manually create the relationship

image image

And when I went back into the Excel worksheet with the PivotTable I got a new message above the Field List stating ‘PowerPivot data was modified’ along with a button to Refresh.

image

After clicking on that, the data in the PivotTable was updated and I got the correct view of the data (minus the proper sorting of the calendar months, but that is resolved in a separate post):

image

So why didn’t the auto detection wizard work? This seems like an extremely easy detection since the columns have the same name minus the casing – in the Date dimension the primary key is Datekey and in the Sales Quota fact table the foreign key is DateKey (see the Create Relationship screenshot above).  You will notice that in the Contoso Retail DW database not all of the columns have their casing set up the same.  I thought this might be the issue, but I tried the same scenario with the Channel dimension and the Sales fact table

image

 image

and that was successful – in the Channel dimension the primary key is ChannelKey and in the Sales fact table the foreign key is channelKey.  Since this worked properly, I would have to say the issue is with the data type.  I did find it really odd that the primary key for the Date dimension is a datetime data type

image

so as of now I would have to say that the auto relationship detection feature does not work with datetime fields.  If the fact table had imported into PowerPivot the first time without the storage errors, the relationships would have been created automatically from the relationships and constraints defined in the underlying database.  In my case, though, they were not, and I was forced to create them manually.

If we hop over to the ever famous Adventure Works DW database and do a similar detection with the Date dimension and one of the fact tables it does work

image

image

image

but in this case the big difference is that this database uses an int data type for the key versus the datetime that the new Contoso sample uses

image

I am assuming that they went with the datetime data type to simulate more real-world data that an end-user might receive in a data feed or source extraction.  As long as the end-user knows how to create manual relationships this shouldn’t be a show-stopping issue, just something to be aware of when working with different data types.
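If you want to confirm the key data types yourself before importing, a quick metadata query will show the datetime versus int difference.  This is a sketch using the table and column names as I encountered them in the samples; run it against each database:

    -- Compare the data types of the date/channel keys on the dimension
    -- and fact tables (name matching is case-insensitive by default).
    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME IN ('DimDate', 'FactSalesQuota', 'DimChannel', 'FactSales')
      AND COLUMN_NAME IN ('DateKey', 'ChannelKey')
    ORDER BY TABLE_NAME, COLUMN_NAME;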

The other thing to note was that once I created the relationship manually in the Contoso example and later deleted it to perform some additional relationship testing, I was unable to get the relationship detection wizard to reappear.  It would be nice if you could somehow force this to appear instead of having it magically display above the Field List.  It is just a wizard, so I would expect an option in the Detection portion of the PowerPivot Ribbon to manually invoke this feature.  Well, this is version 1, so maybe in the next release this data type issue will be resolved and a new option will be available for detection.

If you think this is an issue that should be worked on, please vote here (I posted it as a suggestion instead of a bug; I wasn’t sure what the proper classification for this was) – PowerPivot DateTime Relationship Detection.  Thanks for voting!

Posted in PowerPivot | Tagged: | 1 Comment »

Where did my PowerPivot Excel 2010 tab go?

Posted by denglishbi on June 21, 2010

The other day I launched Excel 2010 to do a quick calculation and changed my mind.  By changing my mind I don’t mean that I no longer wanted to perform the calculation, just that I decided I didn’t want to wait for Excel 2010 to load.  The reason it takes a while to load now is the additional add-ins that I have installed, which include PowerPivot.  As long as these are activated they need to be loaded each time.  Needless to say I am somewhat impatient and decided I would simply use another application to perform the calculation, so I cancelled Excel during the load process.

image

Cancelling is an option – not sure it is the right one to pick, but needless to say it is what I went with, and it caused some other message boxes to be displayed.  At this point I simply picked the option to close or cancel, because once again I can be a bit impatient sometimes.

image

image

So once that was done I opened up my handy calculator and did the original calculation that I wanted.  Later on I went ahead and opened up Excel 2010 because I wanted to actually do some PowerPivot work.  Well…where did my PowerPivot tab go?

image

What did I do?  Oh no, I lost my PowerPivot.  Did I corrupt it?  Do I need to re-install the add-in for Excel?  Help!  I decided to do a quick Bing search on the issue and came across a posting by the one and only Dave Wickert, PowerPivotGeek, PowerPivot for Excel add-in is not coming up in Excel.  This told me that maybe I needed to activate the COM Add-in.  The only thing it didn’t tell me was where to do that…minor detail.  First I needed to find where the add-ins are even displayed.

  1. Click File—>Options and then select the Add-Ins page
  2. Then on the Add-Ins page, from the Manage drop-down list, select COM Add-ins and click the Go… button

image

  3. Now check the box next to PowerPivot for Excel and click the OK button

image

Now the PowerPivot tab is back in the Excel 2010 Ribbon and you are back in business. 

image

I still don’t like having to wait for the PowerPivot add-in to load each time, because I am not actually using PowerPivot every single time I open Excel, and as I have stated, I can be a tad on the impatient side.  It would be nice to simply have a button that I could toggle to load this feature on an as-needed basis.  For now I will tend to shy away from using Excel just because of the load time and will look to leverage other applications, like the calculator and the Internet, to handle the basic calculations that I used to use Excel 2007 for.

Here are a couple of additional resources for troubleshooting PowerPivot issues (in addition to the one I mentioned above):

Posted in PowerPivot | Tagged: | 10 Comments »

PowerPivot Memory Error – Not Enough Storage

Posted by denglishbi on June 21, 2010

I just got the Professional Microsoft PowerPivot for Excel and SharePoint book and started working through the tutorials.  Well, right out of the gate I received an error message – Memory error: Allocation failure: Not enough storage is available to process this command. The operation has been cancelled.  My initial reaction was to check the hard drive space.  I had around 15GB of space available, so that wasn’t the issue.  I then checked my available memory (RAM), and I had just about nothing left.  I went ahead and shut down every service and application that was not needed, along with restarting my SQL Server database service, so that I just had Excel open.  At this point I had 1.6GB of RAM free and tried to reload the table that I was initially unable to load, FactSalesQuota from the new Contoso Retail DW database.  Still no luck –

image

I was baffled.  Initially I was able to load the required dimension tables along with FactInventory (over 8MM rows) and FactSales (over 3MM rows), but I couldn’t load the FactSalesQuota (over 7MM rows) data.  I found it odd that the sales quota data was double the sales fact table, but I believe this is because of the scenario dimension.  Anyway, I was trying to figure out what to do next.  I should rewind a bit and state that I am running the 32-bit (x86) version of Excel 2010 and the PowerPivot add-in.  The reason for this is that it was the easiest upgrade path for me from Office 2007, since I didn’t want to uninstall the 32-bit version of Office 2007 in order to install the 64-bit version of Office 2010.

I should also mention that when I initially loaded all of the fact tables required for the first tutorial, I had removed the ETLLoadID, LoadDate, and UpdateDate fields, since these would not be needed for analysis and I didn’t want them to consume memory and space.

image

So now that I was stuck between a rock and a hard place because of my 32-bit limitation, I reviewed the fact table to see if I could remove some unneeded data that fell outside the range of the related sales data, but it covered the same range of dates.  Then I checked the columns in the table and realized that there was an identity field set up on the table.  Well, I don’t need this column for analysis or relationships, so I went ahead and removed it as well during the import process to load the data into PowerPivot.  Plus this column is unique per record, so no compression would be performed on it during import (I would recommend removing the other fact tables’ identity columns as well).
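To give an idea of the shape of the trimmed-down import, instead of selecting the whole table you can feed PowerPivot a query that leaves out the identity and ETL audit columns.  A sketch only – I am illustrating the kinds of columns involved, and the exact FactSalesQuota column list may differ slightly from what I show here:

    -- Import only the key and measure columns; skip the identity column
    -- (unique per row, so it compresses poorly) and the
    -- ETLLoadID/LoadDate/UpdateDate audit columns.
    SELECT ChannelKey,
           StoreKey,
           ProductKey,
           DateKey,
           ScenarioKey,
           SalesAmountQuota
    FROM dbo.FactSalesQuota;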

image

And this time…success!  I was able to import an additional 7.5MM rows of data into PowerPivot.

image

After this import my Excel file was hovering around 700MB of RAM consumption.

image

Now I was off to the races in my 32-bit environment.  I would highly recommend going with Office 2010 64-bit, at least for Excel, if you are going to be doing anything serious with PowerPivot.  I will stay on 32-bit for now, though – at least this way I can experience what other users will go through and be able to help troubleshoot their issues.  I will have more posts coming out shortly related to the tutorials and my experiences and feedback in regards to PowerPivot, so stay tuned.  For more information in regards to PowerPivot memory consumption issues check out the following additional links:

Posted in PowerPivot | Tagged: | 4 Comments »

Microsoft North America TechEd & BI 2010 Conf Content Available

Posted by denglishbi on June 18, 2010

I have already tweeted about it and notified people, but thought I would do a quick post about it as well.  If you didn’t get a chance to attend the Microsoft North America TechEd or Business Intelligence 2010 conferences this year, you are in luck.  The majority of the content was recorded and has been made available, and believe it or not, it is all free!  Yes, I said it…free!  I was completely surprised myself.  The keynotes are typically made available to everyone, but in this case just about every session has been posted.  Not only can you watch the sessions online from the comfort of your favorite lazy boy or coffee shop, but you can also download the videos in three different formats – wmv, wmv high, or mp4 – along with the PowerPoint presentations.  You can also switch the view between thumbnails and list and filter the content from the menus at the top of the screen, so you can quickly see all of the Business Intelligence sessions (including the keynote on day 2 where I am mentioned…5 Seconds of Fame at Microsoft BI Conference).

MS TechEd Online

I went ahead and downloaded all of the content from the Business Intelligence sessions, wmv high and slides, and it came out to be 11.1GB of material.

MS BI Content Size

So what are you waiting for?  Head on over and check it out – Microsoft TechEd Online – and if you just want to take a look at the BI content, here is the link: Business Intelligence – All Content.

MS BI Conf 2010 Logo

Posted in Training | Tagged: | Leave a Comment »

Informatica PowerCenter vs. Integration Services (SSIS) Comparison

Posted by denglishbi on June 12, 2010

This past year I finally got an opportunity to take the great Informatica product for a test drive.  I guess it wasn’t really a test drive, since this was the ETL tool being used for the project, but it was my first time using the product (other than a brief hands-on with it during a vendor selection process when I worked at the casino).  As far as I know, the product hasn’t really changed since I previously saw it, so I was a bit surprised, especially given the rave reviews and its position as a leader according to Gartner – Magic Quadrant for Data Integration Tools November 2009.

Magic Quadrant for Data Integration Tools

Source: Gartner (November 2009)

The version of the product that I used was PowerCenter 8.6 (with a hotfix that provides the Mapping Analyst for Excel feature).  I know there is a newer version that was recently released, but that is not the version we were using, and I am almost certain the client will not be upgrading to the new version going forward (they will most likely be moving to Integration Services).

So let’s get back to my experience with the product.  Just a warning: this is by no means an extensive critique or in-depth review of the entire product, just my usage of it.  In the past I used DataStage and DTS, and most recently I have been using SSIS for the past four years.

What I liked:

  • Versioning – has to be enabled out of the gate, but once turned on you have check-in/check-out versioning capabilities, which is a must in any development environment. One thing to note is that the file and context menu options were not always available when I wanted them, so the user experience was a bit unpleasant.
  • Metadata repository – a huge plus, I will have to admit. Being able to search for items and access them from a central tool is very powerful and a nice feature – if you need this.
  • Reusable components – for example, items like sources and targets were centralized and reusable throughout the environment, so if you imported a new table definition for an existing table, the change would be reflected in any mappings where it was referenced.
  • Mapping Architect for Visio – a built-in feature, even with Standard Edition, that allows you to generate a Visio diagram of a mapping
  • Mapping Analyst for Excel – another built-in feature with Standard Edition (once the hotfix is applied) that allows you to create source-to-target Excel files for mappings; this is bidirectional, so the Excel files can be used to create mappings as well.
  • Built-in mapping documentation – has to be configured on the server along with security permissions, but a Data Analyzer report gets created with tabular report output of all the components in a mapping. Not extremely useful, but it is something – no visuals like with the DataStage documentation generator, though you can get that with the Visio generator component.
  • Traceability – you have the ability to select a column (port) in a mapping and trace it backward or forward to see where it was generated from or where it ends up in the mapping.

What I didn’t like:

  • Too clunky – not a good user experience or development environment. Menu options were too random or not where expected, and it seemed really buggy at times, with context menus not being where I would expect.
  • ODBC data source setup – had to create DSNs and utilize third-party providers from DataDirect for connectivity
  • UI not intuitive at all – even out of the gate, when you open the Designer tool, if you want to edit a mapping you first need to make sure you have opened the Mapping Designer tool (you can’t just double-click on the mapping). Even renaming items was buried at times and confusing – sometimes you could do it by editing the item through the context menu, other times you had to access it from the file menu.  Even the warning and error icons used were confusing and didn’t provide tooltips – messages were displayed in separate windows.
    • Here is an invalid mapping – informatica mapping invalid
    • Here is a warning message for a mapping – informatica mapping warning
    • Validation message in output window – validate window
  • Confusing layout – within a mapping you can create multiple flows, but then I wondered which source-to-target flow would occur first and in what order they would happen. Needless to say, the mapping execution sequence was buried in the file menu under ‘Mappings’ –> ‘Target Load Plan…’
  • Too many steps – in order to create an ETL mapping you need to perform at least the following steps across multiple separate components: create a repository folder, create an ODBC DSN (on the local workstation and eventually on the server for a data source), import a source definition (Source Analyzer), import a target definition (Target Designer), and create a mapping (Mapping Designer – source, source qualifier (required to define the source query and columns – called ports), transforms, router, update, destination). Then if you want to run the mapping you need to open up more components to create a session, then a workflow, and finally you can execute the workflow and monitor the progress.  Wow, I just wanted to do a simple dump from source to target and I ate up half a day (see the T-SQL sketch after this list for what that dump amounts to outside the tool).
  • Can’t add annotations – on the mapping design surface you don’t have the ability to simply add a textbox to provide comments or documentation about the flow. This would all have to be buried in each item’s description area in the designer. I find this feature in SSIS extremely useful and helpful for development and documentation purposes.
  • Single repository folder support – you can only be connected to a single repository folder at a time, so you are unable to work on or view two different mappings located in separate repository folders.
  • No ability to watch the data flow during runtime – you can’t watch the records and progress of a mapping while it is being executed. There is the ability to monitor stats, but you can’t see what is going on during each stage of the mapping visually like in DataStage or SSIS.  In the monitor stats window you can only see what is going on for the source and target, not each step of the mapping such as expressions, lookups, routers, etc.
  • Extensibility / Flexibility – it just wasn’t there for me. We had to implement script files in order to create a looping mechanism to re-run a single mapping in batch mode for historical load purposes – a basic operation that I could quickly set up in SSIS, but in Informatica I needed to place two script files on the server and create two workflows to set up a looping process.  With SSIS I can easily extend the functionality with custom components, script components (.NET code), and open source components (CodePlex.com or SQLIS.com).
  • Trust but Verify – as one of my colleagues always told me (I think he knows who he is), you always need to double-check everything. If you modify a session in a workflow it might not be reflected in the task editor. I had this happen to me a few times.  Make sure you modify in the task editor and it should be reflected properly in the workflow.
  • Viewing source and target data – now that I think about it, I always went into an external tool to view my data. I don’t believe this functionality is available in the tool, and if it is, I simply overlooked it or it wasn’t pointed out to me. You could validate your source qualifier queries, but that was about it.
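For contrast with the ‘too many steps’ bullet above, here is what that simple dump from source to target amounts to in raw T-SQL once the tables exist – a sketch with hypothetical staging and target table names:

    -- The entire 'simple dump from source to target' exercise, outside
    -- of any ETL tool (table names are hypothetical):
    INSERT INTO dbo.TargetCustomer (CustomerID, CustomerName, City)
    SELECT CustomerID, CustomerName, City
    FROM dbo.StagingCustomer;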

Hopping between the different tools within PowerCenter seemed really clunky to me.  With SSIS, at most I might have two applications open – Business Intelligence Development Studio and SQL Server Management Studio – but with PowerCenter you would have at least three, if not four if you need to access the Repository.  I also had a really hard time with the flexibility of the tool.  For the price of it I wanted to do some fairly basic things, like adding annotations, dynamically setting source file names, looping, passing variable values between mappings, watching a mapping during processing, adding some custom code, etc., and it just wasn’t there for me, or I just wasn’t aware of it.  Granted, I did not go through any formal training and learned the product basically on my own in two days, but I was really blown away by how even the DataStage product that I used four or five years ago was light years ahead of this product in my opinion, and SSIS is a dream.

Now with SSIS you don’t get a metadata repository or built-in documentation, but you can fill those gaps with external add-ins (most of which are free and open source – surprising that I bring up open source when mentioning Microsoft, but it is free and made available by the community).  Even the Metadata Repository Toolkit was upgraded for SQL 2008, and there are a ton of SSIS components available on CodePlex, along with numerous blog postings, online forum help, video tutorials, virtual labs, and much more, all free, with tons of samples to get you started.

The fact that you get the SQL Server database engine, Integration Services, Reporting Services, and Analysis Services (along with Data Mining) in the SQL Server 2008 product for a fraction of the price of just Informatica PowerCenter blows me away; I just don’t get the comparison or the logic as to why people would spend that kind of money to implement the product.  I can easily overlook some of the advantages of the Informatica product for the flexibility, extensibility, and additional features and functionality that I gain with SSIS any day of the week.  Clearly I was confused as far as the Magic Quadrant goes and couldn’t understand why SSIS was not higher up, pushing right behind Informatica and IBM.

Additional references to check out:

Posted in Integration Services | Tagged: | 22 Comments »

 