I was out exploring the Microsoft Downloads area this morning to see if there was anything new to check out. And what do you know, I came across a technology preview, developed by Microsoft Research, of a new add-in for Excel 2010: the Fuzzy Lookup Add-In for Excel. The add-in lets users compare two sets of data, do some cleansing, and arrive at a single representative text value. This is needed when the data comes from two different sources, contains spelling mistakes, or is entered manually in different ways on the front end of your source system. For example, my name might be entered as Dan English, Daniel English, Mr. Dan English, English Dan, and so on.
Here is the introduction from the PDF file ‘Fuzzy Lookup Add-in for Excel’ that is included in this download.
Introduction
A challenging problem in data management is that the same entity may be represented in multiple ways throughout the dataset. For instance, customer “Andy Hill” might also be present as “Mr. Andrew Hill” or “Hill, Andrew R.”. Variations can result from merging independent data sources, spelling mistakes, inconsistent naming conventions and abbreviations, or records with additional/missing information.
Fuzzy Lookup technology, developed by Microsoft Research, allows you to quickly identify data records which are textually similar. You can identify fuzzy duplicates within a single table or perform a fuzzy join between two different tables. The default configuration works well for a wide variety of data, but the matching may also be customized for specific domains.
Pretty cool. We have taken two sets of data that contained different names for the same items (in this case the company names) and the add-in has determined the correct matches so that we could come up with the total value of our stock portfolio.
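To make the idea of a fuzzy join a bit more concrete, here is a minimal Python sketch. It is not the add-in's algorithm (this post does not describe how Microsoft Research scores matches); it swaps in the standard library's difflib.SequenceMatcher as a stand-in similarity measure. The company names, the fuzzy_join helper, and the 0.6 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, threshold=0.6):
    """Pair each left value with its best-scoring right value.

    A left value whose best score is below the threshold is paired with None.
    The threshold is an arbitrary assumption, not a setting from the add-in.
    """
    matches = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate), default=None)
        score = similarity(name, best) if best is not None else 0.0
        matches.append((name, best if score >= threshold else None, round(score, 2)))
    return matches


# Hypothetical sample data, not taken from the add-in's sample workbook.
portfolio = ["Microsoft Corp", "Intl Business Machines", "Coca Cola Co"]
reference = ["Microsoft Corporation", "International Business Machines", "The Coca-Cola Company"]

for row in fuzzy_join(portfolio, reference):
    print(row)
```

Each tuple holds the original name, the best match found in the second list (or None), and the score. Raising the threshold makes the join stricter, which is roughly the trade-off you tune when you adjust how similar two records have to be before they count as a match.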
There are also advanced settings in the Fuzzy Lookup pane that you can configure. The sample provides an additional worksheet called Customization where you can add logic to translate items such as ‘Inc’ to ‘Incorporated’ and ‘USA’ to ‘United States of America’. This information can then be referenced in the Configure portion of the Fuzzy Lookup pane, which is next to the ‘Go’ button.
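The translation idea can be sketched in a few lines of Python. This is only an illustration of the concept: the TRANSFORMATIONS table and the normalize function are hypothetical names, and the way the add-in actually applies its Customization worksheet may differ.

```python
import re

# Hypothetical transformation table in the spirit of the Customization worksheet.
TRANSFORMATIONS = {
    "inc": "incorporated",
    "usa": "united states of america",
}


def normalize(text, transformations=TRANSFORMATIONS):
    """Lowercase the text, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(transformations.get(token, token) for token in tokens)


print(normalize("Contoso, Inc."))  # contoso incorporated
print(normalize("Contoso USA") == normalize("Contoso United States of America"))  # True
```

Expanding abbreviations before comparing means that two spellings of the same company end up as identical text, so even a simple comparison treats them as a match.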
This functionality is also available in Integration Services (SSIS) in SQL Server Enterprise Edition (as well as the Datacenter and Developer Editions). If you are interested in the details, check out Fuzzy Lookup Transformation.
PDF Download Option for Walkthrough Postings
Posted by denglishbi on June 28, 2011
Last week I decided to make a new PDF download option available for some of my lengthier step-by-step postings. I also modified the postings to include only two columns in the table layout instead of three. At certain resolutions the table was being cut off, and I assume this will become a more common problem going forward with mobile devices. So I decided to put together an option that the reader can download and view offline if needed, and use with eReader-type devices as well.
So far I have done this for four of my more commonly read postings: Using PerformancePoint with Excel Services, Reporting Services, and PowerPivot, along with using a SharePoint List data source in SSRS as a parameter source. Going forward I will make sure that I include this option for this type of posting.
If you scroll to the bottom of these postings you will see the new option which includes a link to the PDF document that I have placed in my SkyDrive.
Example (just an image):
I have also enabled some more options on the posts so that you can tweet them, print them (not the greatest output though, go with the PDF option if available), and email them (this creates an intro to the post with a link to read the full article).
This Tweet option is made available with a plug-in for Live Writer for WordPress.com blogs – http://plugins.live.com/writer/detail/tweetmeme-for-wordpresscom-plugin. It is available from the home page, so you do not need to click on the blog title to get at it (it is only available on new blog posts or ones that I repost through Live Writer).
The print and email options are made available in Settings –> Sharing options in the Administration settings for the WordPress.com blog account. There are more options available, but I only picked these three, and there seemed to be an issue with the LinkedIn option. To see these options you have to click on a blog posting, and they will be available at the bottom of the posting.
The Email option will send you a snapshot of the posting like the following:
If there is a particular blog posting that you would like to see in PDF format, feel free to leave a comment and I can add this option. I just picked a few of the more recent and commonly viewed postings to start with. Going forward, though, I will make sure that I have this available.
Posted in Downloads, Personal Comments | Tagged: downloads