Find the intersection of multiple table

  • In SQL Server 2005, INTERSECT works for two tables. In SQL Server 2000 there is no intersection command, so you have to use INNER JOIN, but depending on the order of the INNER JOINs the query may not work correctly.  The question is, what’s the best way to find the intersection of three or more tables (on one column) using 2000 or 2005.

  • Hmm... why shouldn't it work correctly? If by intersection you mean that the value is present in all tables, then it shouldn't make any difference in which order you look at the tables. Of course, if you want to find those values that are - let's say - in at least 4 tables out of 10, then it is different. But is that intersection?

  • Right, we're trying to find the common elements.  In an intersection it's always possible the one or more tables won't match anything in the other tables.  If that's the case the inner join will fail.

  • For SQL Server 2005, here is an intersect with multiple tables:

    select a1 , a2 from a

    intersect

    select b1 , b2 from b

    intersect

    select c1 , c2 from c

    Here is the SQL for 2000 to get the same results - notice that distincts need to be included or there will be a different number of rows.

    select distinct a1 , a2 from a

    join (select distinct b1 , b2 from b ) as B

    on B.b1 = A.a1

    and b.b2 = A.a2

    join (select distinct b1 , b2 from b ) as C

    on C.b1 = A.a1

    and C.b2 = A.a2

    The alternative is to use exists instead of joins.

    Reproduction:

    create Table A

    (A1varchar(255) not null

    ,A2varchar(255) not null

    )

    go

    create Table B

    (B1varchar(255) not null

    ,B2varchar(255) not null

    )

    create Table c

    (c1varchar(255) not null

    ,c2varchar(255) not null

    )

    insert into A (a1, a2)

    select 'First','Match' union all

    select 'Second','Match' union all

    select 'Third','a'

    insert into B (b1, b2)

    select 'First','Match' union all

    select 'Second','Match' union all

    select 'Third','b'

    insert into c (c1, c2)

    select 'First','Match' union all

    select 'Second','Match' union all

    select 'Third','c'

    SQL = Scarcely Qualifies as a Language

  • If an inner join has no rows won't the entire query fail?

  • Hey thanks...your example below works great.

    select distinct a1 , a2 from a

    join (select distinct b1 , b2 from b ) as B

    on B.b1 = A.a1

    and b.b2 = A.a2

    join (select distinct b1 , b2 from b ) as C

    on C.b1 = A.a1

    and C.b2 = A.a2

  • Chris

    I came across this problem recently (the same interview perhaps) and came up with the following. Three tables, 1 2 & 3, which share a pk across some rows in each table. This code also identifies rows which are in only one table, or in two of the three i.e. covers all scenarios, but these could be eliminated easily. I've used a derived table to provide rowcounts and totals across a value column, for each permutation.

    Cheers

    ChrisM

    -- Create sample data: three tables with a common pk and some common rows.
    IF OBJECT_ID('tempdb..#Table1') IS NOT NULL
     DROP TABLE #Table1
    CREATE TABLE #Table1 (pk int, TranVal money)
    INSERT INTO #Table1
    SELECT  1, 1 UNION ALL -- 1
    SELECT  2, 2 UNION ALL -- 1
    SELECT  3, 3 UNION ALL -- 1
    SELECT  4, 4 UNION ALL -- 1
    SELECT  5, 5 UNION ALL -- 1,2
    SELECT  6, 6 UNION ALL -- 1,2
    SELECT  7, 7 UNION ALL -- 1,2,3
    SELECT  8, 8 UNION ALL -- 1,2,3
    SELECT  9, 9 UNION ALL -- 1,2,3
    SELECT 10, 10 UNION ALL -- 1,2,3
    SELECT 11, 11 UNION ALL -- 1,3
    SELECT 12, 12 UNION ALL -- 1,3
    SELECT 13, 13 UNION ALL -- 1,3
    SELECT 14, 14 UNION ALL -- 1,3
    SELECT 15, 15 UNION ALL -- 1,3
    SELECT 16, 16 UNION ALL -- 1,3
    SELECT 17, 17 UNION ALL -- 1,3
    SELECT 18, 18 UNION ALL -- 1
    SELECT 19, 19 UNION ALL -- 1
    SELECT 20, 20 UNION ALL -- 1
    SELECT 42, 42 UNION ALL -- 1,3
    SELECT 43, 43 UNION ALL -- 1,3
    SELECT 44, 44   -- 1,3
    IF OBJECT_ID('tempdb..#Table2') IS NOT NULL
     DROP TABLE #Table2
    CREATE TABLE #Table2 (pk int, TranVal money)
    INSERT INTO #Table2
    SELECT  5, 5 UNION ALL -- 1,2
    SELECT  6, 6 UNION ALL -- 1,2
    SELECT  7, 7 UNION ALL -- 1,2,3
    SELECT  8, 8 UNION ALL -- 1,2,3
    SELECT  9, 9 UNION ALL -- 1,2,3
    SELECT 10, 10 UNION ALL -- 1,2,3
    SELECT 31, 31 UNION ALL -- 2
    SELECT 32, 32 UNION ALL -- 2
    SELECT 33, 33 UNION ALL -- 2,3
    SELECT 34, 34 UNION ALL -- 2,3
    SELECT 35, 35 UNION ALL -- 2,3
    SELECT 36, 36 UNION ALL -- 2,3
    SELECT 37, 37 UNION ALL -- 2
    SELECT 38, 38 UNION ALL -- 2
    SELECT 39, 39    -- 2
    IF OBJECT_ID('tempdb..#Table3') IS NOT NULL
     DROP TABLE #Table3
    CREATE TABLE #Table3 (pk int, TranVal money)
    INSERT INTO #Table3
    SELECT  7, 7 UNION ALL -- 1,2,3
    SELECT  8, 8 UNION ALL -- 1,2,3
    SELECT  9, 9 UNION ALL -- 1,2,3
    SELECT 10, 10 UNION ALL -- 1,2,3
    SELECT 11, 11 UNION ALL -- 1,3
    SELECT 12, 12 UNION ALL -- 1,3
    SELECT 13, 13 UNION ALL -- 1,3
    SELECT 14, 14 UNION ALL -- 1,3
    SELECT 15, 15 UNION ALL -- 1,3
    SELECT 16, 16 UNION ALL -- 1,3
    SELECT 17, 17 UNION ALL -- 1,3
    SELECT 33, 33 UNION ALL -- 2,3
    SELECT 34, 34 UNION ALL -- 2,3
    SELECT 35, 35 UNION ALL -- 2,3
    SELECT 36, 36 UNION ALL -- 2,3
    SELECT 40, 40 UNION ALL -- 3
    SELECT 41, 41 UNION ALL -- 3
    SELECT 42, 42 UNION ALL -- 1,3
    SELECT 43, 43 UNION ALL -- 1,3
    SELECT 44, 44 UNION ALL -- 1,3
    SELECT 45, 45    -- 3
    -- run query
    SELECT t.Tables, COUNT(*) AS CommonRows, 
     SUM(TranVal1) AS SUMTranVal1,
     SUM(TranVal2) AS SUMTranVal2,
     SUM(TranVal3) AS SUMTranVal3
    FROM(
     SELECT CASE WHEN t1.pk IS NULL THEN '_' ELSE '1' END + 
      CASE WHEN t2.pk IS NULL THEN '_' ELSE '2' END +
      CASE WHEN t3.pk IS NULL THEN '_' ELSE '3' END AS Tables,
      t1.pk AS pk1, ISNULL(t1.TranVal, 0) as TranVal1, 
      t2.pk AS pk2, ISNULL(t2.TranVal, 0) AS TranVal2, 
      t3.pk AS pk3, ISNULL(t3.TranVal, 0) as TranVal3
     FROM #Table1 t1 
     FULL OUTER JOIN #Table2 t2 ON t2.pk = t1.pk 
     FULL OUTER JOIN #Table3 t3 ON t3.pk = t1.pk OR t3.pk = t2.pk
    ) t
    GROUP BY t.Tables
    ORDER BY t.Tables 
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris, thanks for your response.  I initial did a similar thing, then came up with this...

     

     declare @w0 varchar(32)

     declare @w1 varchar(32)

     declare @w2 varchar(32)

     declare @w3 varchar(32) 

     declare @w4 varchar(32)

     declare @w5 varchar(32)

     declare @w6 varchar(32)

     declare @w7 varchar(32)

     declare @w8 varchar(32) 

     declare @w9 varchar(32) 

     ----------------------------------------------------------------------------------------------------

     --  FINDPhraseV1

     ----------------------------------------------------------------------------------------------------

     SET NOCOUNT ON

     declare @wt varchar(32)

     declare @RetVal int

     declare @LoopCnt int

     declare @PhraseId int

     if @w0 is null set @w0 = 'I'

     if @w1 is null set @w1 = 'want'

     if @w2 is null set @w2 = 'to'

     if @w3 is null set @w3 = 'get'

     if @w4 is null set @w4 = 'screwed'

     if @w5 is null set @w5 = 'up'

     if @w6 is null set @w6 = ''

     if @w7 is null set @w7 = ''

     if @w8 is null set @w8 = ''

     if @w9 is null set @w9 = ''

     

     set @LoopCnt = 0

     while (@LoopCnt <= 9)

     begin

      --

      -- find the phrase that contains most or all of the words

      -- Notes

      -- setting @wn = '' insures that the inner join (join) returns rows ('%%' is always found)

      -- using a left join insures that the join returns row even when @wn is not found in the righthand table

      -- cycling thru the words (@wn) handles the case when @w0 has no matches

      --

      select distinct @PhraseId = a.Id from SysWordPhrase a

      join (select distinct b.Id from SysWordPhrase b where wordPhrase like '%'+@w1+'%' and UsageID = 2 and Weight > 0) as B

      on B.Id = A.Id

      join (select distinct c.Id from SysWordPhrase c where wordPhrase like '%'+@w2+'%' and UsageID = 2 and Weight > 0) as C

      on C.Id = A.Id

      join (select distinct d.Id from SysWordPhrase d where wordPhrase like '%'+@w3+'%' and UsageID = 2 and Weight > 0) as D

      on D.Id = A.Id

      left join (select distinct e.Id from SysWordPhrase e where wordPhrase like '%'+@w4+'%' and UsageID = 2 and Weight > 0) as E

      on E.Id = A.Id

      left join (select distinct f.Id from SysWordPhrase f where wordPhrase like '%'+@w5+'%' and UsageID = 2 and Weight > 0) as F

      on F.Id = A.Id

      left join (select distinct g.Id from SysWordPhrase g where wordPhrase like '%'+@w6+'%' and UsageID = 2 and Weight > 0) as G

      on G.Id = A.Id

      left join (select distinct h.Id from SysWordPhrase h where wordPhrase like '%'+@w7+'%' and UsageID = 2 and Weight > 0) as H

      on H.Id = A.Id

      left join (select distinct i.Id from SysWordPhrase i where wordPhrase like '%'+@w8+'%' and UsageID = 2 and Weight > 0) as I

      on I.Id = A.Id

      left join (select distinct j.Id from SysWordPhrase j where wordPhrase like '%'+@w9+'%' and UsageID = 2 and Weight > 0) as J

      on J.Id = A.Id

      where a.wordPhrase like '%'+@w0+'%' and a.UsageId = 2 and a.Weight > 0

      set @LoopCnt = @LoopCnt + 1

      --print 'PhraseId:' + convert(varchar(9), @PhraseId)

      if @PhraseId is null

      begin

       --print 'Loop:' + convert(varchar(9), @LoopCnt)

       set @wt = @w0

       set @w0 = @w1

       set @w1 = @w2

       set @w2 = @w3

       set @w3 = @w4

       set @w4 = @w5

       set @w5 = @w6

       set @w6 = @w7

       set @w7 = @w8

       set @w8 = @w9

       set @w9 = @wt

      end else

      begin

       set @LoopCnt = 99

      end 

     end

     select Id, WordPhrase, Weight, Predatory, Bullying from SysWordPhrase where Id = @PhraseId

     if @@RowCount = 1 set @RetVal = 0

     

     RETURN @retval

    END

    GO

    SET QUOTED_IDENTIFIER OFF

    GO

    SET ANSI_NULLS ON

    GO

    GRANT  EXECUTE  ON [dbo].[FINDPhraseV1] TO [public]

    GO

  • Chris, thanks for your response.  I initial did a similar thing, then came up with this...

     

     declare @w0 varchar(32)

     declare @w1 varchar(32)

     declare @w2 varchar(32)

     declare @w3 varchar(32) 

     declare @w4 varchar(32)

     declare @w5 varchar(32)

     declare @w6 varchar(32)

     declare @w7 varchar(32)

     declare @w8 varchar(32) 

     declare @w9 varchar(32) 

     ----------------------------------------------------------------------------------------------------

     --  FINDPhraseV1

     ----------------------------------------------------------------------------------------------------

     SET NOCOUNT ON

     declare @wt varchar(32)

     declare @RetVal int

     declare @LoopCnt int

     declare @PhraseId int

     if @w0 is null set @w0 = 'I'

     if @w1 is null set @w1 = 'want'

     if @w2 is null set @w2 = 'to'

     if @w3 is null set @w3 = 'get'

     if @w4 is null set @w4 = 'screwed'

     if @w5 is null set @w5 = 'up'

     if @w6 is null set @w6 = ''

     if @w7 is null set @w7 = ''

     if @w8 is null set @w8 = ''

     if @w9 is null set @w9 = ''

     

     set @LoopCnt = 0

     while (@LoopCnt <= 9)

     begin

      --

      -- find the phrase that contains most or all of the words

      -- Notes

      -- setting @wn = '' insures that the inner join (join) returns rows ('%%' is always found)

      -- using a left join insures that the join returns row even when @wn is not found in the righthand table

      -- cycling thru the words (@wn) handles the case when @w0 has no matches

      --

      select distinct @PhraseId = a.Id from SysWordPhrase a

      join (select distinct b.Id from SysWordPhrase b where wordPhrase like '%'+@w1+'%' and UsageID = 2 and Weight > 0) as B

      on B.Id = A.Id

      join (select distinct c.Id from SysWordPhrase c where wordPhrase like '%'+@w2+'%' and UsageID = 2 and Weight > 0) as C

      on C.Id = A.Id

      join (select distinct d.Id from SysWordPhrase d where wordPhrase like '%'+@w3+'%' and UsageID = 2 and Weight > 0) as D

      on D.Id = A.Id

      left join (select distinct e.Id from SysWordPhrase e where wordPhrase like '%'+@w4+'%' and UsageID = 2 and Weight > 0) as E

      on E.Id = A.Id

      left join (select distinct f.Id from SysWordPhrase f where wordPhrase like '%'+@w5+'%' and UsageID = 2 and Weight > 0) as F

      on F.Id = A.Id

      left join (select distinct g.Id from SysWordPhrase g where wordPhrase like '%'+@w6+'%' and UsageID = 2 and Weight > 0) as G

      on G.Id = A.Id

      left join (select distinct h.Id from SysWordPhrase h where wordPhrase like '%'+@w7+'%' and UsageID = 2 and Weight > 0) as H

      on H.Id = A.Id

      left join (select distinct i.Id from SysWordPhrase i where wordPhrase like '%'+@w8+'%' and UsageID = 2 and Weight > 0) as I

      on I.Id = A.Id

      left join (select distinct j.Id from SysWordPhrase j where wordPhrase like '%'+@w9+'%' and UsageID = 2 and Weight > 0) as J

      on J.Id = A.Id

      where a.wordPhrase like '%'+@w0+'%' and a.UsageId = 2 and a.Weight > 0

      set @LoopCnt = @LoopCnt + 1

      

      if @PhraseId is null

      begin

          set @wt = @w0

          set @w0 = @w1

          set @w1 = @w2

          set @w2 = @w3

          set @w3 = @w4

          set @w4 = @w5

          set @w5 = @w6

          set @w6 = @w7

          set @w7 = @w8

          set @w8 = @w9

          set @w9 = @wt

      end else

      begin

          set @LoopCnt = 99

      end 

     end

     select Id, WordPhrase, Weight, Predatory, Bullying from SysWordPhrase where Id = @PhraseId

  • That's interesting, and not at all what I was expecting. You're matching on one or more words in a varchar column in each table?

    I'm doing fuzzy deduping on company name using token matching, will take a look and evaluate if it may be of some use. Problem is it uses a triangular join:

    FROM vendors v1

    INNER JOIN vendors v2 ON dbo.GetFuzzyMatchScore(v1.VendorName, v2.VendorName) > 0.75

    - which isn't good for performance but doesn't matter with my small tables. It also uses some simple preprocessing (including word weighting). If this is beginning to sound useful to you, then let me know, I'll put some time into it and post the code.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM

    the problem simply stated is to find a phrase in the db and return it's weight. 

    If the phrase I'm looking for contains extra words, is missing words or both, it makes very hard to find the phrase directly.  My last post seems to be pretty fast compared to previous method and is pretty accurate.

    I'm still looking for the optimal solution.

    regards

    Chris 

  • Chris, can you post some representative sample data? Datatype of the column, Min / max length of contents and min/max word count would be good also.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris

    Try this. There's quite a few elements of this which can be adjusted to fit your data, for instance increasing the token length from 3 will add greater weight to the word order in the phrase (which of course is ignored if you simply look to match up words).

    Cheers

    ChrisM

    SET NOCOUNT ON
    -- Make some sample data
    IF OBJECT_ID('tempdb..#SysWordPhrase') IS NOT NULL
     DROP TABLE #SysWordPhrase
    CREATE TABLE #SysWordPhrase (ID int, WordPhrase varchar(32), UsageID int, Weight int, Predatory int, Bullying int)
    INSERT INTO #SysWordPhrase (ID, WordPhrase, UsageID, Weight, Predatory, Bullying)
    SELECT  1, 'I want to get messed up',   2, 1, 1, 1 UNION ALL
    SELECT  2, 'I want to get screwed up',   2, 1, 1, 1 UNION ALL
    SELECT  3, 'I want to get up screwed',   2, 1, 1, 1 UNION ALL
    SELECT  4, 'I want screwed up to get',   2, 1, 1, 1 UNION ALL
    SELECT  5, 'Screwed up I want to get',   2, 1, 1, 1 UNION ALL
    SELECT  6, 'I want to get screwed',   2, 1, 1, 1 UNION ALL
    SELECT  7, 'I get really angry with school',  2, 1, 1, 1 UNION ALL
    SELECT  8, 'I want to get drunk',   2, 1, 1, 1 UNION ALL
    SELECT  9, 'I want to get stoned',   2, 1, 1, 1 UNION ALL
    SELECT 10, 'I want a cappucino',   2, 1, 1, 1 UNION ALL
    SELECT 11, 'I want to get laid',   2, 1, 1, 1 UNION ALL
    SELECT 12, 'I want to be rich when I grow up',  2, 1, 1, 1 UNION ALL
    SELECT 13, 'I want a new bicycle',   2, 1, 1, 1 UNION ALL
    SELECT 14, 'Learn Ju Jitsu and fight bullies',  2, 1, 1, 1 
    -- Run the query
    DECLARE @Phrase varchar(32), @PhraseId int
    SET @Phrase = 'I want to get screwed up'
    SELECT TOP 1 dbo.GetFuzzyMatchScore(@Phrase, WordPhrase) AS Score, ID, WordPhrase, Weight, Predatory, Bullying
    FROM (
    -- Check the results of this inner query by eye
     SELECT dbo.GetFuzzyMatchScore(@Phrase, WordPhrase) AS Score, -- <-- Take this out after setting threshold score
      ID, WordPhrase, Weight, Predatory, Bullying   
     FROM #SysWordPhrase wp 
     WHERE dbo.GetFuzzyMatchScore(@Phrase, wp.WordPhrase) > 0.7 -- <-- adjust threshold score to best fit your data
      AND UsageID = 2 AND Weight > 0
    ) t
    ORDER BY 1 DESC
    --------------------------------------------------------
    CREATE FUNCTION dbo.GetFuzzyMatchScore (@TargetString NVARCHAR(60), @ReferenceString NVARCHAR(60))
    RETURNS DECIMAL (6,3)
    AS
      BEGIN
     DECLARE @iCounter INT, 
      @TokenCountTarget INT,
      @cString CHAR(3), 
      @SecondWordStart INT,
      @TargetStringLength int, 
      @ReferenceStringLength int, 
      @Result DECIMAL (6,3)
     SET @TargetString = UPPER(LTRIM(RTRIM(@TargetString)))
     SET @ReferenceString = UPPER(LTRIM(RTRIM(@ReferenceString)))
     IF @ReferenceString = @TargetString RETURN 1.000
     SET @TargetStringLength = LEN(@TargetString)
     SET @ReferenceStringLength = LEN(@ReferenceString)
     SET @TokenCountTarget = 0
     SET @iCounter = 1
     WHILE 1 = 1 
     BEGIN
      SET @cString =  SUBSTRING(@ReferenceString,  @iCounter, 3)
      IF CHARINDEX(@cString, @TargetString) > 0
      BEGIN
       SET @TokenCountTarget = @TokenCountTarget + 1
    /*
       -- See if the token exists anywhere else in the target string
       SET @SecondWordStart = CHARINDEX (@cString, @TargetString, @iCounter + 4) 
       IF @SecondWordStart > 0
        SET @TargetString = STUFF(@TargetString, @SecondWordStart, 3, '')
    */
      END
      SET @iCounter = @iCounter + 1
      IF @iCounter > @ReferenceStringLength BREAK
     END
     IF @TokenCountTarget > 0
     BEGIN
      SET @Result = CASE -- Account for special cases, may not apply here
       WHEN @TokenCountTarget > 0 
        THEN CAST(@TokenCountTarget AS DECIMAL (6,3)) / ((@TargetStringLength + @ReferenceStringLength)/2)
       ELSE 0.000
       END
     END
     ELSE 
      SET @Result = 0.000
     RETURN @Result
     END
    ----------------------------------------------------------------
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here's a more set-based method - thanks for providing me with the incentive to do this - which uses a "tally" or "numbers" table. It's quite fast, get's a match out of 996 rows in about a second on our dev. box.

     

    -- Make a tally table (acknowledgements to Jeff Moden)
    IF EXISTS (select * from dbo.sysobjects where id = object_id(N'[dbo].[Numbers]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
     DROP TABLE dbo.Numbers
    --===== Create and populate the Tally table 
     SELECT TOP 100 
            IDENTITY(int,1,1) AS number
       INTO dbo.Numbers
       FROM master.dbo.syscolumns sc1,
            master.dbo.syscolumns sc2
    --===== Add a Primary Key to maximize performance
      ALTER TABLE dbo.Numbers
            ADD CONSTRAINT PK_numbers_number PRIMARY KEY CLUSTERED (number)
    --===== Allow the general public to use it
      GRANT SELECT ON dbo.Numbers TO PUBLIC
    ---------------------------------------------------------
    -- Make some sample data
    IF OBJECT_ID('tempdb..#SysWordPhrase') IS NOT NULL
     DROP TABLE #SysWordPhrase
    CREATE TABLE #SysWordPhrase (ID int IDENTITY (1,1), WordPhrase varchar(32), UsageID int, Weight int, Predatory int, Bullying int)
    INSERT INTO #SysWordPhrase (WordPhrase, UsageID, Weight, Predatory, Bullying)
    SELECT  'I want to get messed up',   2, 1, 1, 1 UNION ALL
    SELECT  'I want to get screwed up',   2, 1, 1, 1 UNION ALL
    SELECT  'I want to get up screwed',   2, 1, 1, 1 UNION ALL
    SELECT  'I want screwed up to get',   2, 1, 1, 1 UNION ALL
    SELECT  'Screwed up I want to get',   2, 1, 1, 1 UNION ALL
    SELECT  'I want to get screwed',   2, 1, 1, 1 UNION ALL
    SELECT  'I get really angry with school',  2, 1, 1, 1 UNION ALL
    SELECT  'I want to get drunk',    2, 1, 1, 1 UNION ALL
    SELECT  'I want to get stoned',   2, 1, 1, 1 UNION ALL
    SELECT 'I want a cappucino',    2, 1, 1, 1 UNION ALL
    SELECT 'I want to get laid',    2, 1, 1, 1 UNION ALL
    SELECT 'I want to be rich when I grow up',  2, 1, 1, 1 UNION ALL
    SELECT 'I want a new bicycle',    2, 1, 1, 1 UNION ALL
    SELECT 'Learn Ju Jitsu and fight bullies',  2, 1, 1, 1 
    /* Uncomment this to increase the rowcount of the sample table to 9,996 rows
    INSERT INTO #SysWordPhrase (WordPhrase, UsageID, Weight, Predatory, Bullying)
    SELECT LEFT(WordPhrase, 28) + LEFT(NEWID(), 4), UsageID, Weight, Predatory, Bullying
    FROM #SysWordPhrase
    INNER JOIN dbo.Numbers n ON n.number < 714
    -- Sanity check
    --SELECT * FROM #SysWordPhrase
    */
    -- Run the query (0:00:01 with 9,996 rows)
    DECLARE @Phrase varchar(32)
    SET @Phrase = 'I want to get screwed up'
    SELECT TOP 1 -- <-- Comment this out to see all matches exceeding match score
     r.ID, r.WordPhrase, r.Weight, r.Predatory, r.Bullying, 
     CAST(CAST(COUNT(*) AS DECIMAL(6,3)) / LEN(@Phrase) AS DECIMAL(6,3)) AS Score
    FROM dbo.Numbers n
    INNER JOIN #SysWordPhrase r 
     ON LEN(SUBSTRING(RTRIM(r.WordPhrase), n.number, 3)) > 0
     AND CHARINDEX(SUBSTRING(UPPER(r.WordPhrase), n.number, 3), UPPER(@Phrase)) > 0
    WHERE n.number < LEN(@Phrase)+1 
     AND UsageID = 2 AND Weight > 0
    GROUP BY r.ID, r.WordPhrase, r.Weight, r.Predatory, r.Bullying
    HAVING CAST(CAST(COUNT(*) AS DECIMAL(6,3)) / LEN(@Phrase) AS DECIMAL(6,3)) > 0.700 -- adjust match score, eyeball data
    ORDER BY 6 DESC
    
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris

    Any feedback? What's your final solution?

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply