Fuzzy logic matching routine problem

Ian Tullie

SSC Journeyman

Points: 90
July 19, 2004 at 4:44 pm

#108505

Hi all,
My first post on SQLServerCentral, and it's a good 'un!
What I'm trying to do is find a clever way of writing a matching routine which refines the match until it gets one and only one match.
Test on rule one: if that finds one and only one match, insert a record into output table for that id. If it finds more than one match, add the next rule. If it finds no match, return no match.
Test on rule two: if that finds one and only one match based on rule one and rule two together, insert a record into the output table. If it finds more than one match using the two rules, add the next rule. If it finds no match using these two rules, remove the second rule and add the third.
In other words, you can't simply keep adding criteria to the matching routine, as there is a need to omit a rule if it kills the matching routine - the idea is that the rules get added in the best combination (but only in a particular order) until it gets only one match.
Any thoughts, anyone? I'm presuming a cursor will be required (so this isn't going to be quick on a table of 160,000 records), but rather than code every possible combination of tests explicitly, I'm sure there's a clever way of doing this.
See, I told you it was a good one!
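
A minimal sketch of the refinement described above, for a single record and under stated assumptions: the tables #candidates and #rules, their columns, and the sample values are all hypothetical, and each rule is stored as a trusted predicate string. It keeps a rule when the combined criteria still match something, drops a rule that takes the count to zero, and stops at exactly one match.

```sql
-- Hypothetical sample data; table names, columns and values are illustrative only.
CREATE TABLE #candidates (id int, manufacturer varchar(2), doors int, fueltype varchar(1));
INSERT INTO #candidates VALUES (1, 'FO', 3, 'P');
INSERT INTO #candidates VALUES (2, 'FO', 5, 'P');
INSERT INTO #candidates VALUES (3, 'FO', 5, 'D');

-- Rules are tried in rowid order; each one is a predicate fragment.
CREATE TABLE #rules (rowid int IDENTITY(1,1), predicate nvarchar(100));
INSERT INTO #rules (predicate) VALUES (N'manufacturer = ''FO''');
INSERT INTO #rules (predicate) VALUES (N'doors = 4');
INSERT INTO #rules (predicate) VALUES (N'doors = 5');
INSERT INTO #rules (predicate) VALUES (N'fueltype = ''D''');

DECLARE @where nvarchar(4000), @trial nvarchar(4000), @sql nvarchar(4000);
DECLARE @rule int, @maxrule int, @n int, @matches int;

SET @where = N'';   -- criteria accepted so far
SET @rule = 1;
SELECT @maxrule = MAX(rowid) FROM #rules;

WHILE @rule <= @maxrule
BEGIN
    -- Candidate criteria: everything accepted so far plus the current rule.
    SELECT @trial = CASE WHEN @where = N'' THEN predicate
                         ELSE @where + N' AND ' + predicate END
    FROM #rules
    WHERE rowid = @rule;

    SET @sql = N'SELECT @n = COUNT(*) FROM #candidates WHERE ' + @trial;
    EXEC sp_executesql @sql, N'@n int OUTPUT', @n = @n OUTPUT;

    -- Rule one finding nothing means there is no match at all.
    IF @n = 0 AND @rule = 1
    BEGIN
        SET @matches = 0;
        BREAK;
    END

    -- Keep the rule only if it did not kill the match; otherwise skip it.
    IF @n > 0
    BEGIN
        SET @where = @trial;
        SET @matches = @n;
    END

    IF @matches = 1 BREAK;   -- one and only one match: stop refining

    SET @rule = @rule + 1;
END

-- @matches = 0: no match; 1: unique match; > 1: rules exhausted without a unique match.
SELECT @matches AS matches, @where AS criteria;

DROP TABLE #candidates;
DROP TABLE #rules;
```

With the sample rows, the second rule matches nothing and is skipped, and the loop stops once the fourth rule narrows the result to a single row. To process a whole table, the loop would need to run once per record with that record's values substituted into the predicates.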


Marco Mombelli SSC Veteran Points: 239 · Answer 1

Why use a cursor if it isn't necessary?

If I understand correctly, what you need is something like this (table, table2, rows, row1, row2, value1 and value2 are placeholders):

declare @count int
select @count = count(1) from [table] where row1 = value1

if @count = 0
begin
    print 'found nothing'
    return
end
else if @count = 1
    insert into table2 (rows)
    select rows
    from [table]
    where row1 = value1
else -- @count > 1: add the next rule and count again
    select @count = count(1)
    from [table]
    where row1 = value1
    and row2 = value2

and so on. You could probably use some kind of recursion in your SP.

David Burrows SSC Guru Points: 65142 · Answer 2

Created on the fly, no testing.

Probably poor performance.

Assumes the complete SQL (including rules/tests) is less than 4000 characters.

CREATE TABLE #tests (rowid int IDENTITY(1,1),test nvarchar(100))

INSERT INTO #tests (test) values ('where rule1')

INSERT INTO #tests (test) values ('or rule2')

INSERT INTO #tests (test) values ('or rule3')

INSERT INTO #tests (test) values ('or rule4')

DECLARE @startsql nvarchar(36), @sql nvarchar(4000), @test nvarchar(4000), @count int, @maxct int, @result int

-- <table> is a placeholder for the table being searched
SET @startsql = 'SELECT @result=COUNT(*) FROM <table>'
SET @count = 1
SET @test = ''

-- Start with the first test only
SELECT @test = @test + ' ' + test
FROM #tests
WHERE rowid = @count
ORDER BY rowid

SET @sql = @startsql + @test
EXEC sp_executesql @sql,N'@result int output',@result output

SELECT @maxct = MAX(rowid) FROM #tests

IF @result > 1
BEGIN
    WHILE (@count < @maxct) AND (@result <> 1)
    BEGIN
        -- Rebuild the where clause from every remaining test up to @count
        SET @test = ''
        SET @count = @count + 1
        SELECT @test = @test + ' ' + test
        FROM #tests
        WHERE rowid <= @count
        ORDER BY rowid

        SET @sql = @startsql + @test
        EXEC sp_executesql @sql,N'@result int output',@result output

        -- A test that kills the match is removed; 99 keeps the loop going
        IF @result = 0
        BEGIN
            DELETE FROM #tests WHERE rowid = @count
            SET @result = 99
        END
    END
END

-- @result = 0 No Match
-- @result = 1 One Match and @test = where clause
-- @result > 1 no test(s) resulted in a single match

Far away is close at hand in the images of elsewhere.
Anon.

Ian Tullie SSC Journeyman Points: 90 · Answer 3

Thanks for the replies. The solution I had in mind was more like David's interpretation than Marco's, but both were gratefully received. My initial post wasn't all that straightforward to understand, I know...

I had got this far with a (rather clumsy) attempt along the lines of the second reply before picking this thread up again (excuse the big code fragment at the end of the post). The complexity is that there are 11 tests, which I'm storing in a table so that a user can change the order in which they are applied. However, I'm getting bogged down in string handling and caught in my nested loops as I'm trying to avoid cursors.

If anyone can make anything of the following, I'd be very happy to discuss further:

-----------------------------------------------------------------------------------------------------------

declare @newdataid as int

declare @ruleid as int

declare @criteria as varchar (4000)

declare @sql as nvarchar (4000)

declare @updatesql as nvarchar(4000)

declare @result as int

declare @match as varchar(20)

declare @capcode as varchar(20)

declare @valuedate as varchar(20)

declare @currentcriteria as varchar(200)

declare @manufacturer as varchar(2)

declare @newprice as int

declare @fueltype as varchar(1)

declare @body as varchar(1)

declare @doors as int

declare @engine as int

declare @transmission as varchar(1)

declare @fueldelivery as varchar(1)

declare @trim as varchar(3)

set @newdataid = 1

set @ruleid = 2

set @criteria = 'valuedate = @valuedate and manufacturer = @manufacturer'

while @newdataid < (select max([id]) from newdata)+1

begin

set @capcode = (select capcode from newdata where [id] = @newdataid)

set @valuedate = (select valuedate from newdata where [id] = @newdataid)

set @manufacturer = (select manufacturer from newdata where [id] = @newdataid)

set @newprice = (select newprice from newdata where [id] = @newdataid)

set @fueltype = (select fueltype from newdata where [id]=@newdataid)

set @body = (select body from newdata where [id]=@newdataid)

set @doors = (select doors from newdata where [id]=@newdataid)

set @engine = (select engine from newdata where [id]=@newdataid)

set @transmission = (select transmission from newdata where [id]=@newdataid)

set @fueldelivery = (select fueldelivery from newdata where [id]=@newdataid)

set @trim = (select trim from newdata where [id]=@newdataid)

while @ruleid < (select max([id]) from rules)+1

begin

set @criteria = replace(@criteria,'@manufacturer',''''+@manufacturer+'''')

set @criteria = replace(@criteria,'@valuedate',''''+@valuedate+'''')

set @currentcriteria = (select criteria from rules where [id]=@ruleid)

set @currentcriteria = replace(@currentcriteria,'@manufacturer',''''+@manufacturer+'''')

set @currentcriteria = replace(@currentcriteria,'@valuedate',''''+@valuedate+'''')

set @currentcriteria = replace(@currentcriteria,'@newprice',@newprice)

set @currentcriteria = replace(@currentcriteria,'@fueltype',''''+@fueltype+'''')

set @currentcriteria = replace(@currentcriteria,'@body',''''+@body+'''')

set @currentcriteria = replace(@currentcriteria,'@doors',@doors)

set @currentcriteria = replace(@currentcriteria,'@engine',@engine)

set @currentcriteria = replace(@currentcriteria,'@transmission',''''+@transmission+'''')

set @currentcriteria = replace(@currentcriteria,'@fueldelivery',''''+@fueldelivery+'''')

set @currentcriteria = replace(@currentcriteria,'@trim',''''+@trim+'''')

set @criteria = @criteria + ' and ' + @currentcriteria

set @sql = N'insert into outputtable select count(*) from newdata where ' + @criteria

set @updatesql = N'insert into outputcode select capcode from newdata where ' + @criteria

select @sql

truncate table outputtable

exec sp_executesql @sql

set @result = (select [output] from outputtable)

select 'result is ',@result

if @result = 1

begin

--select 'inside the result 1 loop',@result

truncate table outputcode

exec sp_executesql @updatesql

set @match = (select [outputcode] from outputcode)

--select @match

update predecessorresults set match = @match where capcode = @capcode and valuedate = @valuedate

set @ruleid = 2

BREAK

end

if @result = 0

begin

--select 'inside the result 0 loop',@result

set @sql = substring(@sql,1,len(@sql)-(len(@currentcriteria)+5))

end

set @ruleid = @ruleid+1

--select 'onto id',@ruleid

end

set @ruleid = 2

set @newdataid = @newdataid + 1

end

----------------------------------------------------------------------------------------------------------

AndrewMurphy SSCertifiable Points: 5586 · Answer 4

I think you're going in the wrong direction. An easier solution would be to start at the other end: test for the presence of one record given all conditions applied and, if no records are found, remove one criterion and start again. Eventually you will arrive at the required matching record. If you end up matching a record because five conditions are true, even though the first three would have been sufficient to get a match, it won't influence the result; you will still find the same matching record.
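
A rough sketch of that reverse approach, under stated assumptions: the tables #candidates and #rules and the sample values are hypothetical, and "remove one criterion" is taken to mean dropping the last (lowest-priority) rule each time, which the post does not specify.

```sql
-- Hypothetical sample data; table names, columns and values are illustrative only.
CREATE TABLE #candidates (id int, manufacturer varchar(2), doors int, fueltype varchar(1));
INSERT INTO #candidates VALUES (1, 'FO', 3, 'P');
INSERT INTO #candidates VALUES (2, 'FO', 5, 'P');
INSERT INTO #candidates VALUES (3, 'FO', 5, 'D');

CREATE TABLE #rules (rowid int IDENTITY(1,1), predicate nvarchar(100));
INSERT INTO #rules (predicate) VALUES (N'manufacturer = ''FO''');
INSERT INTO #rules (predicate) VALUES (N'doors = 5');
INSERT INTO #rules (predicate) VALUES (N'fueltype = ''D''');
INSERT INTO #rules (predicate) VALUES (N'doors = 3');   -- contradicts rule 2

DECLARE @where nvarchar(4000), @sql nvarchar(4000);
DECLARE @keep int, @i int, @n int;

-- Start with every rule applied, then drop the last rule while nothing matches.
SELECT @keep = MAX(rowid) FROM #rules;
SET @n = 0;

WHILE @keep >= 1
BEGIN
    -- Build "rule1 AND rule2 AND ..." from the first @keep rules, one row at a time.
    SET @where = N'';
    SET @i = 1;
    WHILE @i <= @keep
    BEGIN
        SELECT @where = @where + CASE WHEN @i = 1 THEN N'' ELSE N' AND ' END + predicate
        FROM #rules
        WHERE rowid = @i;
        SET @i = @i + 1;
    END

    SET @sql = N'SELECT @n = COUNT(*) FROM #candidates WHERE ' + @where;
    EXEC sp_executesql @sql, N'@n int OUTPUT', @n = @n OUTPUT;

    IF @n > 0 BREAK;          -- something matches: stop removing rules
    SET @keep = @keep - 1;    -- nothing matched: drop the last rule and retry
END

-- @n = 0: no match even with rule one alone; @n > 1: still ambiguous.
SELECT @n AS matches, @keep AS rules_kept, @where AS criteria;

DROP TABLE #candidates;
DROP TABLE #rules;
```

Because this only ever drops rules from the end, it never tries skipping a rule in the middle while keeping later ones, so it may stop on more than one match where the forward approach would have kept refining.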

David Burrows SSC Guru Points: 65142 · Answer 5

Firstly I would remove the subqueries from the while statements and do them once at the beginning, ie

declare @maxdataid int, @maxruleid int

select @maxdataid = isnull(max([id]),0) from newdata

select @maxruleid = isnull(max([id]),0) from rules

then

while @newdataid <= @maxdataid

...

while @ruleid <= @maxruleid

...

Secondly replace the following

set @ruleid = 2

BREAK

with

set @ruleid = @maxruleid + 1

set @newdataid = @maxdataid + 1

to end the loops


Ian Tullie SSC Journeyman Points: 90 · Answer 6

Thanks for everyone's input. I got it working by my original method, posted here in case anyone's interested. It took a long time to run (6 hours on a decent server, against a large dataset) but got there in the end.

declare @newdataid as int

declare @ruleid as int

declare @criteria as varchar (4000)

declare @sql as nvarchar (4000)

declare @updatesql as nvarchar(4000)

declare @result as int

declare @match as varchar(20)

declare @capcode as varchar(20)

declare @valuedate as varchar(20)

declare @currentcriteria as varchar(200)

declare @manufacturer as varchar(2)

declare @newprice as int

declare @fueltype as varchar(1)

declare @body as varchar(1)

declare @doors as int

declare @engine as int

declare @transmission as varchar(1)

declare @fueldelivery as varchar(1)

declare @trim as varchar(3)

declare @rule12sql as nvarchar(4000)

declare @min as int

declare @max as int

set @newdataid = 1

set @ruleid = 2

set @criteria = 'valuedate = @valuedate and manufacturer = @manufacturer'

while @newdataid <(select max([id]) from newdatatomatch)+1

begin

set @capcode = (select capcode from newdatatomatch where [id] = @newdataid)

set @valuedate = (select valuedate from newdatatomatch where [id] = @newdataid)

set @manufacturer = (select manufacturer from newdatatomatch where [id] = @newdataid)

set @newprice = (select newprice from newdatatomatch where [id] = @newdataid)

set @fueltype = (select fueltype from newdatatomatch where [id]=@newdataid)

set @body = (select body from newdatatomatch where [id]=@newdataid)

set @doors = (select doors from newdatatomatch where [id]=@newdataid)

set @engine = (select engine from newdatatomatch where [id]=@newdataid)

set @transmission = (select transmission from newdatatomatch where [id]=@newdataid)

set @fueldelivery = (select fueldelivery from newdatatomatch where [id]=@newdataid)

set @trim = (select trim from newdatatomatch where [id]=@newdataid)