Dan English's BI Blog

Welcome to my random thoughts in regards to Business Intelligence, databases, and other technologies

Microsoft Fuzzy Lookup Add-in for Excel 2010 Walkthrough

Posted by denglishbi on May 15, 2011

I was just out exploring the Microsoft Downloads area this morning to see if there was anything new to check out.  And what do you know, I came across a technology preview developed by Microsoft Research of a new Add-in for Excel 2010 – Fuzzy Lookup Add-In for Excel.  The Add-in provides users to be able to compare two sets of data to do some cleansing and to get at a single representation text value.  The reason that this is needed is to compare sets of data that are possibly coming from two different sources or maybe because of spelling mistakes or because the data is being manually entered in different ways on the front-end into your source system.  Like say my name was entered as Dan English, Daniel English, Mr. Dan English, English Dan, and so on.

Here is the introduction information available in the PDF File ‘Fuzzy Lookup Add-in for Excel’ that is provided in this download.

Introduction

A challenging problem in data management is that the same entity may be represented in multiple ways throughout the dataset. For instance, customer “Andy Hill” might also be present as “Mr. Andrew Hill” or “Hill, Andrew R.”. Variations can result from merging independent data sources, spelling mistakes, inconsistent naming conventions and abbreviations, or records with additional/missing information.

Fuzzy Lookup technology, developed by Microsoft Research, allows you to quickly identify data records which are textually similar. You can identify fuzzy duplicates within a single table or perform a fuzzy join between two different tables. The default configuration works well for a wide variety of data, but the matching may also be customized for specific domains.

image image image
Files that are included in the download Installation wizard welcome page License agreement for the add-in v1 technology preview
image image image
Specify the installation location Confirm that you want to install the add-in Installation will begin and you will be prompted for the UAC prompt as well
image image image
Installation completion page Notice the add-in being loaded when you launch Excel 2010 (I am using x64 version) You will be prompted to complete the add-in install in Excel
image image image
A new tab will be in the Ribbon now for the Fuzzy Lookup This is the sample file that comes with the download to demo The first worksheet includes a stock portfolio along with number of shares
image image image
The second worksheet provides the stock price information along with tickers and company names By clicking on the Fuzzy Lookup button in the Ribbon you will open up the Fuzzy Lookup pane By clicking on the ‘Go’ button the options specified in the Fuzzy Lookup pane will be used and generate the results above.  Here I have also added a column to calculate out the Total Value of each line and get the total portfolio value of

$ 337,275,850.35

Pretty cool.  We have taken two sets of data that contained different names for the same items (in this case the company names) and the add-in has determined the correct matches so that we could come up with the total value of our stock portfolio.

Now there are additional advanced settings that are available in the Fuzzy Lookup pane which you can configure and the sample does provide an additional worksheet called Customization where you can go in and configure additional logic to translate items like say ‘Inc’ to ‘Incorporated’ and ‘USA’ to ‘United States of America’.  This information can then be referenced in the Configure portion of the Fuzzy Lookup pane which is next to the ‘Go’ button.

This functionality is also available in the SQL Server Enterprise Edition (also in Datacenter and Developer Editions) of Integration Services (SSIS).  If you are interested in checking out details on that here  is a link to check out Fuzzy Lookup Transformation.

About these ads

13 Responses to “Microsoft Fuzzy Lookup Add-in for Excel 2010 Walkthrough”

  1. Thanks Dan. I just installed it. This will come in handy as we try to cleanup customer contact info to include it in our data warehouse for BI reports.

    • denglishbi said

      I was surprised when I came across this item yesterday. Pretty cool indeed. I have seen and used the functionality in SSIS, but now it is in Excel 2010:) Looking forward to checking out the new SSIS items coming out and should be announced this week at TechEd with Quality Data Services in SQL Server ‘Denali’.

  2. Hein Veenhof said

    Where can I find an MsExcel related manual that describes this Fuzzy matching tool in all details including all configuration details.

    • denglishbi said

      Included in the download zip from Microsoft is a some what brief paper that explains the add-in and there is also a readme file that goes over the installation.

  3. egui said

    i think there’s a bug in the new release, i tried the sample file provided with the zip file and the similarity column always have the same value 0.0001 despite the similarity XML shows the good value in the comparisonResult tag.

  4. Jani said

    Has anyone any idea about the performance of this add on? I have played around with it and it really does well in the matching (better than any of the other 15+ add ons/programs) i’ve used. My dilemma is that I have 261 000 thousand rows in both the tables (they are copies of each other). I am running it on a reasonably powerful desktop machine, but it seems like Excel eventually dies …. anyone that has seen documentation about performance, or possibly some hints on how to increase performance?

    Psst, is it easy to perform the same task with sql server? If I download the trial .. is the fuzzy lookup functionality included?

  5. The worksheet included does not match the one in the instructions

  6. Rod Beckett said

    I’m using and need to CLEAR my info in Fuzzy lookup – Can’t find a way to do it!
    I just wanted to include a column in a table. – I can’t add a column or remove a table!
    -Rod

  7. sam said

    I’m wondering if anyone has figured out how to use the advanced features – in particular, entering pre-defined string matches, e.g. “USA” = “united states of america”, “FC” = “football club”, etc…

  8. sam said

    Ok – after playing around some more, I figured out how to configure the ‘transformation rules’. But have not figured out how to save these custom configuration rules. Anyone have any idea? cheers..

  9. Does anyone know if the matching process looks at anything other than names. For example could i do name/address in one pass?

    Thanks.

  10. Super simple tool for Excel: Fuzzy Find and Replace

    http://officepowerups.com/2013/11/09/fuzzy-search-in-excel/

    It works like find/search and replace, but you control the amount of fuzzy you can tolerate.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 90 other followers

%d bloggers like this: