Better ways to find perfect matches and/or partial matches?

  • On one side we have a table of installed licenses. ( 2 millions+ rows)

    On the other side there is a table of licenses in inventory. ( 2 millions+ rows, a little larger than the installed licenses.)

    We have to compare the installed information with the inventory information and set the issue( text to indicate result of the compare) The compare involves 4 columns col1-col4 on both tables and its results reported in a new column called Issue.

    Possible scenarios:

    Perfect match: all 4 columns on the installed row equal 4 columns on the inventory row. Sample where clause:

    where installed.col1=inventory.col1 and

    installed.col2=inventory.col2 and

    installed.col3=inventory.col3 and

    installed.col4=inventory.col4

    ( in this scenario, we set Issue = None)

    Partial match: 1-3 columns on the installed row equal 1-3 columns on the inventory row.

    Sample where clause:

    where installed.col1=inventory.col1 and

    installed.col2=inventory.col2 and

    installed.col3=inventory.col3

    ( in this scenario, we set Issue = col4 unmatched)

    OR

    Sample where clause:

    where installed.col1=inventory.col1 and

    installed.col2=inventory.col2

    ( in this scenario, we set Issue = col3 and col 4 unmatched)

    OR

    Sample where clause:

    where installed.col1=inventory.col1

    ( in this scenario, we set Issue = col2-4 unmatched)

    Unmatched: No column on the installed row matches to the columns on the inventory row.

    ( in this scenario, we set Issue = col1-4 unmatched)

    Current query processL

    We begin with the "optimistic" approach (ie find the perfect matches first).

    From the installed license table let's call it Set1:

    - join query produces subset1.1 ( the 4-col perfect matches)

    - right join to find the 4-col mismatches subset 1.2

    Subset 1.2 now becomes the set to be screened for 3-col partial matches:

    - join query on 3 columns produces the 3-col matches) say subset 1.2.1

    - right join to find the unmatched rows on 3-cols ( subset 1.2.2)

    Subset 1.2.2 becomes the set to be screened for 2-col partial matches

    ....

    The above process is time consuming due to so many filtering iterations.

    I think it could use some improvements or even a redesign. I am open to any ideas that can improve it and even to a different approach to simplify it( indexed views : both tables can be prejoined and the resulting data set stored)?

    Thank you for spending the time to read a long post.

    Thank you very much in advance for any suggestions/ideas you can provide/share.

  • What about returning the four columns and running 4 queries looking for exact matches on one column each, then unioning them. You could list the results in some order and use the front end to determine if there are more than one match.

  • Thanks, Steve. I will give it a try.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply