This past year I got an opportunity to finally take the great Informatica product for a test drive. I guess it wasn’t really a test drive, since this was the ETL tool being used for the project, but it was my first time using the product (other than a brief hands-on during a vendor selection process when I worked at the casino). As far as I know, the product hasn’t really changed since I previously saw it, so I was a bit surprised, especially given the rave reviews and its position as a leader in Gartner’s Magic Quadrant for Data Integration Tools (November 2009).
Magic Quadrant for Data Integration Tools
Source: Gartner (November 2009)
The version of the product that I used was PowerCenter 8.6 (with a hotfix that provides the Mapping Analyst for Excel feature). I know a new version has recently been released, but that is not the version we were using, and I am almost certain the client will not be upgrading going forward (they will most likely be moving to Integration Services).
So let’s get back to my experience with the product. Just a warning: this is by no means an extensive critique or in-depth review of the entire product, just my usage of it. In the past I used DataStage and DTS, but for the last four years I have been using SSIS.
What I liked:
- Versioning – has to be enabled out of the gate, but once turned on you get check-in/check-out versioning capabilities, which is a must in any development environment. One thing to note: the file and context menu options were not always available when I wanted them, so the user experience was a bit unpleasant.
- Metadata repository – a huge plus, I will have to admit. Being able to search for items and access them from a central tool is very powerful and a nice feature – if you need this.
- Reusable components – for example, items like sources and targets were centralized and reusable throughout the environment, so if you imported a new definition of an existing table, the change was reflected in any mappings that referenced it.
- Mapping Architect for Visio – a built-in feature even with Standard Edition that allows you to generate a Visio diagram of a mapping.
- Mapping Analyst for Excel – another built-in feature (once the hotfix is applied) with Standard Edition that allows you to create source-to-target Excel files for mappings; it is also bidirectional, so you can create mappings from Excel.
- Built-in mapping documentation – has to be configured on the server along with security permissions, but a Data Analyzer report gets created with tabular output of all the components in a mapping. Not extremely useful, but it is something – no visuals like the DataStage documentation generator produces, though you can get those with the Visio generator component.
- Traceability – you have the ability to select a column (port) in the mapping and trace it back or forward within the mapping to see where it was generated from or where it ends up in the mapping.
What I didn’t like:
- Too clunky – not a good user experience or development environment. Menu options felt random or were not where I expected them, and the product seemed really buggy at times, with context menus not appearing where I would expect.
- ODBC data source setup – had to create DSNs and utilize third-party providers from DataDirect for connectivity.
- UI not intuitive at all – even out of the gate, when you open the Designer tool and want to edit a mapping, you first need to make sure you have opened the Mapping Designer tool (you can’t just double-click on the mapping). Even renaming items was buried and confusing at times – sometimes you could do it by editing the item through the context menu, other times you had to go through the file menu. Even the warning and error icons were confusing and didn’t provide tooltips – messages were displayed in separate windows.
- Here is an invalid mapping –
- Here is a warning message for a mapping –
- Validation message in output window –
- Confusing layout – within a mapping you can create multiple flows, but then I wondered which source-to-target flow would occur first and in what order they would happen. Needless to say, this was buried in the menu under ‘Mappings’ –> ‘Target Load Plan…’, which is where you get at the mapping execution sequence.
- Too many steps – to create an ETL mapping you need at least the following minimum steps, spread across two separate components: create a repository folder; create an ODBC DSN (on the local workstation, and eventually on the server for the data source); import a source definition (Source Analyzer); import a target definition (Target Designer); create the mapping (Mapping Designer – source, source qualifier (required to define the source query and columns – called ports), transforms, router, update strategy, destination). Now if you want to run the mapping, you need to open up more components to create a session, then a workflow, and finally you can execute the workflow and monitor the progress. Wow, I just wanted to do a simple dump from source to target and I ate up half a day.
- Can’t add annotations – on the mapping design surface you don’t have the ability to simply add a textbox with comments or documentation about the flow. This would all have to be buried in each item’s description area in the designer. I find this feature in SSIS extremely useful and helpful for development and documentation purposes.
- Single repository folder support – you can only be connected to a single repository folder at a time, so you are unable to work on or view two different mappings located in separate repository folders.
- No ability to watch the data flow during runtime – you couldn’t watch the records and progress of a mapping while it was being executed. There is an ability to monitor stats, but you can’t see what is going on during each stage of the mapping visually like in DataStage or SSIS. In the Monitor stats window you can only see what is going on for the source and target, not each step of the mapping such as expressions, lookups, routers, etc.
- Extensibility / flexibility – just wasn’t there for me. We had to implement script files in order to create a looping mechanism to re-run a single mapping in batch mode for historical load purposes. This seemed like a basic operation that I could quickly set up in SSIS, but in Informatica I needed to place two script files on the server and create two workflows just to set up a loop. With SSIS I can easily extend the functionality with custom components, script components (.NET code), and open-source components (CodePlex.com or SQLIS.com).
- Trust but verify – as one of my colleagues always told me (I think he knows who he is), you always need to double-check everything. If you modify a session in a workflow, the change might not be reflected in the task editor. I had this happen to me a few times. Make the modification in the task editor instead and it should be reflected properly in the workflow.
- Viewing source and target data – now that I think about it, I always went into an external tool to view my data. I don’t believe this functionality is available in the tool, and if it is, I simply overlooked it or it wasn’t pointed out to me. You could validate your source qualifier queries, but that was about it.
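To give a flavor of the DSN setup mentioned above: on the server side the connection typically ends up as an entry in an odbc.ini file pointing at a DataDirect driver. This is only a hypothetical sketch – the DSN name, driver file path, host, and database are all placeholders, and the actual DataDirect driver file name varies by platform and PowerCenter version.

```ini
; Hypothetical odbc.ini entry for a PowerCenter source connection.
; Driver path and all connection details below are placeholders.
[ODBC Data Sources]
SALES_SRC=DataDirect SQL Server Wire Protocol

[SALES_SRC]
Driver=/opt/informatica/ODBC/lib/sqlserver_wire.so
HostName=dbserver01
PortNumber=1433
Database=SalesDW
```

The same DSN name then has to exist on the developer workstation (created through the Windows ODBC Administrator) so the Designer can import definitions through it.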
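To make the looping complaint above concrete, here is a minimal sketch of what the historical-load workaround boils down to when driven from a single script via PowerCenter’s pmcmd command-line tool, rather than two server-side scripts and two workflows. The service, domain, folder, workflow, and parameter-file names are made-up placeholders, and pmcmd’s exact flags can vary by version, so treat this as an illustration, not a recipe.

```python
# Sketch of re-running one workflow per day for a historical load by
# shelling out to pmcmd. All connection names below are placeholders.
import subprocess
from datetime import date, timedelta

def build_pmcmd_args(workflow, run_date):
    """Build one pmcmd startworkflow invocation for a single run date."""
    return [
        "pmcmd", "startworkflow",
        "-sv", "IntSvc",            # integration service (placeholder)
        "-d", "Domain_Dev",         # domain (placeholder)
        "-f", "HIST_LOAD",          # repository folder (placeholder)
        "-paramfile", f"/tmp/params_{run_date}.txt",  # per-day parameters
        "-wait",                    # block until the run finishes
        workflow,
    ]

def historical_load(workflow, start, days, runner=subprocess.run):
    """Loop over the date range, launching one workflow run per day -
    the batch re-run that took two scripts and two workflows natively."""
    for i in range(days):
        run_date = start + timedelta(days=i)
        runner(build_pmcmd_args(workflow, run_date.isoformat()), check=True)
```

The equivalent in SSIS would be a For Loop container around the data flow, with the date carried in a package variable, and no server-side scripting at all.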
Hopping between the different products within PowerCenter seemed really clunky to me. With SSIS I might have two tools open at most (Business Intelligence Development Studio and SQL Server Management Studio), but with PowerCenter you would have at least three, if not four if you need to access the Repository. I also had a really hard time with the flexibility of the tool. For the price, I wanted to do some fairly basic things – adding annotations, dynamically setting source file names, looping, passing variable values between mappings, watching a mapping during processing, adding some custom code, etc. – and it just wasn’t there for me, or I just wasn’t aware of it. Granted, I did not go through any formal training and learned the product basically on my own in two days, but I was really blown away by how even the DataStage product that I used four or five years ago was light years ahead of this product, in my opinion, and SSIS is a dream.
Now with SSIS you don’t get a metadata repository or built-in documentation, but you can get both with external add-ins (most of which are free and open source – surprising that I bring up open source when mentioning Microsoft, but it is free and made available by the community). Even the Metadata Repository Toolkit was upgraded for SQL 2008, and there is just a ton of SSIS components available on CodePlex, along with numerous blog postings, online forums, video tutorials, virtual labs, and much more, all free, with tons of samples to get you started.
The fact that you get the SQL Server database engine, Integration Services, Reporting Services, and Analysis Services (along with Data Mining) in the SQL Server 2008 product for a fraction of the price of Informatica PowerCenter alone just blows me away; I just don’t get the comparison or the logic as to why people would spend that kind of money to implement the product. I can easily overlook some of the advantages of the Informatica product for the flexibility, extensibility, and additional features and functionality that I gain with SSIS, any day of the week. Clearly I was confused as far as the Magic Quadrant goes and couldn’t understand why SSIS was not higher up, pushing right behind Informatica and IBM.
Additional references to check out:
A Comparison of Enterprise Class Data Integration Tools (by Jamie Thomson and Michael Horne)