Grouping Issue

  • Hi,

    I am trying to select a records from a full text table by max(rank), then selecting the url the corresponds to the max record (which happens to be the key):

    For example, if I have the following data :-

    [Rank], [Url], [DomainName]

    92, ww.test.com/a

    91, ww.test.com/b

    87, ww.another.com/a

    88, ww.another.com/b

    The data I want to be returned is:

    92, ww.test.com/a, ww.test.com

    88, ww.another.com/b, ww.another.com

    However, when I use max(rank), followed by max(key) I get :

    92, ww.test.com/b, ww.test.com

    88, ww.another.com/b, ww.another.com

    Example failing query :-

    SELECT

    MAX(ss.[RANK]), MAX(ss.), docs.DomainName

    FROM ContainsTable(Machinery_Docs, *, 'Lathe') AS ss

    INNER JOIN Machinery_Docs docs ON ss. = docs.url

    GROUP BY docs.DomainName

    I understand why this is happening, but not sure how to solve it 🙂

    Any help would be appreciated.

    Thanks

    Andy

  • Try this

    SELECT b.[RANK], MAX(b.[Url]) AS [Url], b.[DomainName]

    FROM (SELECT MAX(ss.[RANK]) AS [RANK], docs.[DomainName]

        FROM ContainsTable(Machinery_Docs, *, 'Lathe') AS ss

        INNER JOIN Machinery_Docs docs ON ss. = docs.[Url]

        GROUP BY docs.[DomainName]) a

    INNER JOIN (SELECT ss2.[RANK], docs2.[Url], docs2.[DomainName]

                FROM ContainsTable(Machinery_Docs, *, 'Lathe')ss2

                INNER JOIN Machinery_Docs docs2 ON ss2. = docs2.[Url]) b

        ON b.[DomainName] = a.[DomainName] AND b.[RANK] = a.[RANK]

    GROUP BY b.[DomainName], b.[RANK]

    ORDER BY b.[RANK] DESC

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Plus it would be more efficient to put the results of the two tables combined into a temp table first and then using that table. This would avoid the multiple selects and subqueries

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I don't think I can make this simpler...

    -- Populate test data

    declare @t table (row int, path varchar(50))

    insert @t

    select 92, 'ww.test.com/a' union all

    select 91, 'ww.test.com/b' union all

    select 87, 'ww.another.com/a' union all

    select 88, 'ww.another.com/b'

    -- Do the work!

    SELECT  z.MaxRow,

      t.path,

      z.Domain

    FROM  (

       SELECT  SUBSTRING(Path, 1, CHARINDEX('/', Path) - 1) Domain,

         MAX(row) MaxRow

       FROM  @t

       GROUP BY SUBSTRING(Path, 1, CHARINDEX('/', Path) - 1)

      ) z

    INNER JOIN @t t ON t.row = z.MaxRow

    ORDER BY z.MaxRow


    N 56°04'39.16"
    E 12°55'05.25"

  • Thanks David

    I can see exactly where I was going wrong now 🙂

    Thanks

    Andy

  • Hi

    I have created a stored procedure based on the advice given earlier, I have change dthe sql to now return all the fields I require, however there are some performance issues and was wondering if anyone could suggest any improvments.

    David mentioned above about temporary tables, but I couldn't seem to get any improvment.

    Here is the stored procedure

    CREATE

    PROCEDURE spu_FullTextTest

    @Query

    nvarchar(600),

    @FirstTerm nvarchar(100),

    @OrigQuery nvarchar(600),

    @collection varchar(100),

    @applicationid int = 33

    AS

    SET

    NOCOUNT ON

    DECLARE

    @sql nvarchar(4000)

    SET

    NOCOUNT ON

    SET

    @sql = '

    select

    IsAdvertised = CASE WHEN NOT (ac.IsAdvertised IS NULL) THEN

    ac.IsAdvertised

    ELSE

    -1

    END,

    ac.AddressID,

    ac.CompanyName,

    docs.url, '''' AS summary, c.domainname as domain,

    docs.title, docs.filesize, docs.lastmodified, docs.meta AS metadescription,

    SUBSTRING(docs.description,PATINDEX(''%'

    + @FirstTerm + '%'',[description])-100, 200) AS [description]

    from machinery_docs docs inner join (

    SELECT b.[RANK], MAX(b.[Url]) AS [Url], b.[DomainName]

    FROM (SELECT MAX(ss.[RANK]) AS [RANK], docs.[DomainName]

    FROM ContainsTable(Machinery_Docs, *, ''' + @Query +''') AS ss

    INNER JOIN Machinery_Docs docs ON ss. = docs.[Url]

    GROUP BY docs.[DomainName]) a

    INNER JOIN (SELECT ss2.[RANK], docs2.[Url], docs2.[DomainName]

    FROM ContainsTable(Machinery_Docs, *, ''' + @Query +''')ss2

    INNER JOIN Machinery_Docs docs2 ON ss2. = docs2.[Url]) b

    ON b.[DomainName] = a.[DomainName] AND b.[RANK] = a.[RANK]

    GROUP BY b.[DomainName], b.[RANK]) c

    ON docs.url = c.url

    left join openquery(FHSQLPROD, ''SELECT *

    FROM MastersourceII..netsearch WHERE ApplicationID = ' + CAST(@applicationid as varchar(5)) + ''') AS ac

    ON c.domainname = ac.url COLLATE database_default

    order by isadvertised desc, rank desc'

    exec

    (@SQL)

    Any help is appreciated, at first I thought the delay might be due to using an openquery, but if I remove this, I gain no performance.

    Thanks

    Andy

  •  

    Getting the aggregates into table variables rather than using sub selects will probably improve things but cant see quite how to do it without knowing the format of the sql in the @query variable.

    It would make sense to  replace CASE WHEN NOT (ac.IsAdvertised IS NULL) THEN

    ac.IsAdvertised

    ELSE

    -1 end

    with

    coalesce(ac.IsAdvertised,-1) IsAdvertised

    Athough these variations probably execute identically..

     

    and

    2)replace exec (@SQL) with sp_executesql @sql

    this will allow reuse of execution plans and reduce recompiles thus improving performance.

    www.sql-library.com[/url]

  • The @Query just contains words that are used to search the full text engine, for example :-

    'Machining NEAR Centres'

    Also the @FirstTerm, would be the first word of @Query, so in the above example it would be 'Machining'

    I am just about to try your suggestions, thank you, I will let you know how I get on.

    Thanks again

    Andy

  • NOt sure if it's what you want, but try this:

    select

    IsAdvertised = coalesce(ac.IsAdvertised,-1) IsAdvertised,

    ac

    .AddressID, ac.CompanyName, docs.url, '' AS summary, docs.domainname as domain,

    docs

    .title, docs.filesize, docs.lastmodified, docs.meta AS metadescription,

    SUBSTRING

    (docs.description,PATINDEX('%' + @FirstTerm + '%',[description])-100, 200) AS [description]

    from

    machinery_docs docs

    join

    (select substring(max(str(b.[RANK],10) + str(b.KEY)),11,8000) [Url]

    from ContainsTable(Machinery_Docs, *, @Query) b
    group by b.domainname) V
    on docs.[Url] = V.[Url]

    left

    join FHSQLPROD.MastersourceII.dbo.netsearch ac

    ON docs.domainname = ac.url COLLATE database_default

    where

    ac.ApplicationID = @applicationid

     
    --are there performance problems with using 4-part name rather than openquery?
    --e.g. where clause not applied at source?
    --I presume the linked server's not 'collation compatible'.

    Tim Wilkinson

    "If it doesn't work in practice, you're using the wrong theory"
    - Immanuel Kant

  •  

    Thanks Tim

    I will have a look today at the SQL you provided. I couldn't get 4-Part Name to work, not sure if thats becuase the linked server is Oracle and unfortunately the linked server is not collation compatible.

    Thanks

    Andy

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply