Dan English's BI Blog

Welcome to my random thoughts in regards to Business Intelligence, databases, and other technologies

Posts Tagged ‘fuzzy’

Microsoft Fuzzy Lookup Add-in for Excel 2010 Walkthrough

Posted by denglishbi on May 15, 2011

[tweetmeme source=”denglishbi” only_single=”false”]

I was just out exploring the Microsoft Downloads area this morning to see if there was anything new to check out.  And what do you know, I came across a technology preview developed by Microsoft Research of a new Add-in for Excel 2010 – Fuzzy Lookup Add-In for Excel.  The Add-in provides users to be able to compare two sets of data to do some cleansing and to get at a single representation text value.  The reason that this is needed is to compare sets of data that are possibly coming from two different sources or maybe because of spelling mistakes or because the data is being manually entered in different ways on the front-end into your source system.  Like say my name was entered as Dan English, Daniel English, Mr. Dan English, English Dan, and so on.

Here is the introduction information available in the PDF File ‘Fuzzy Lookup Add-in for Excel’ that is provided in this download.

Introduction

A challenging problem in data management is that the same entity may be represented in multiple ways throughout the dataset. For instance, customer “Andy Hill” might also be present as “Mr. Andrew Hill” or “Hill, Andrew R.”. Variations can result from merging independent data sources, spelling mistakes, inconsistent naming conventions and abbreviations, or records with additional/missing information.

Fuzzy Lookup technology, developed by Microsoft Research, allows you to quickly identify data records which are textually similar. You can identify fuzzy duplicates within a single table or perform a fuzzy join between two different tables. The default configuration works well for a wide variety of data, but the matching may also be customized for specific domains.

image image image
Files that are included in the download Installation wizard welcome page License agreement for the add-in v1 technology preview
image image image
Specify the installation location Confirm that you want to install the add-in Installation will begin and you will be prompted for the UAC prompt as well
image image image
Installation completion page Notice the add-in being loaded when you launch Excel 2010 (I am using x64 version) You will be prompted to complete the add-in install in Excel
image image image
A new tab will be in the Ribbon now for the Fuzzy Lookup This is the sample file that comes with the download to demo The first worksheet includes a stock portfolio along with number of shares
image image image
The second worksheet provides the stock price information along with tickers and company names By clicking on the Fuzzy Lookup button in the Ribbon you will open up the Fuzzy Lookup pane By clicking on the ‘Go’ button the options specified in the Fuzzy Lookup pane will be used and generate the results above.  Here I have also added a column to calculate out the Total Value of each line and get the total portfolio value of

$ 337,275,850.35

Pretty cool.  We have taken two sets of data that contained different names for the same items (in this case the company names) and the add-in has determined the correct matches so that we could come up with the total value of our stock portfolio.

Now there are additional advanced settings that are available in the Fuzzy Lookup pane which you can configure and the sample does provide an additional worksheet called Customization where you can go in and configure additional logic to translate items like say ‘Inc’ to ‘Incorporated’ and ‘USA’ to ‘United States of America’.  This information can then be referenced in the Configure portion of the Fuzzy Lookup pane which is next to the ‘Go’ button.

This functionality is also available in the SQL Server Enterprise Edition (also in Datacenter and Developer Editions) of Integration Services (SSIS).  If you are interested in checking out details on that here  is a link to check out Fuzzy Lookup Transformation.

Posted in Downloads | Tagged: , | 13 Comments »