Subqueries performance issue

  • I am trying to retrieve data from numerous tables and combine it to provide one result set. To do this

    using a number of sub queries that all access the same table (which is pretty huge) and a few other tables with correlated data. The code looks like:

    select Day, PV=(select sum(flex) from bigTable b where b.day=a.day)

    ,Looker=(select sum(flex) from bigTable c where c.day=a.day and ip in (select distinct ip from TT_IP_NON_BOOKER))

    ,Booker=(select sum(flex) from bigTable d where d.day=a.day and ip in (select distinct ip from TT_IP_BOOKER))

    ,Subnet=(select sum(flex) from bigTable e where e.day=a.day and subnet in (select distinct subnet from TT_SUBNET_IP_NON_BOOKER))

    from bigTable a

    where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'

    group by day

    order by day

    This works fine ( even though it takes forever) in the in the query analyser, but when I'm running it through Excel using VBA, I get a run time error because it simply takes to long. I've read that I should use JOIN statements to provide the information with less lag time, but couldn't work out the best way to do it. Attempting to resolve the lag problem I ended up with numerous joins and unions, and no idea of how to end up with the same result set :pinch:

    I would put up the attempted solution code, but it's pretty bad and messy :blush:

    Any help in optimizing the code to run faster would be greatly appreciated 😀

    Thanks in advanced

    Below is the tables definitions

    =======================

    CREATE TABLE [dbo].[bigTable](

    [Day] [datetime] NULL,

    [Carrier] [varchar](10) NULL,

    127.0.0.1 [varchar](100) NULL,

    [subnet] [varchar](100) NULL,

    [referrer] [varchar](50) NULL,

    [PV] [int] NULL,

    [FLEX] [int] NULL,

    [BOOK] [int] NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[TT_IP_BOOKER](

    127.0.0.1 [varchar](100) NULL,

    [book] [int] NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[TT_IP_NON_BOOKER](

    127.0.0.1 [varchar](100) NULL,

    [book] [int] NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[TT_SUBNET_IP_NON_BOOKER](

    [subnet] [varchar](100) NULL,

    [PV] [int] NULL,

    [FLEX] [int] NULL,

    [BOOK] [int] NULL

    ) ON [PRIMARY]

  • I think you should try to "flatten" a bit the query on BigTable

    Then only, I'd bother going onto the details of working out each sum.

    The advantage would be that you simply do one big scan on the BigTable (plus grouping...)

    Something along these lines

    SELECT day, IP, SubNet, SUM(Flex) AS SumFlex

    INTO #Smaller

    FROM BigTable

    WHERE day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'

    GROUP BY Day, IP, Subnet

    -- create any index(es) you may need (not day since you have only 6 days... Maybe IP?)

    SELECT PV = SUM(SumFlex)

    FROM #Smaller

    GROUP BY Day

    SELECT Day, Looker = Sum(SumFlex)

    FROM #Smaller

    WHERE IP IN (SELECT DISTINCT IP FROM TT_IP_NON_BOOKER)

    GROUP BY Day

    SELECT Day, Booker = Sum(SumFlex)

    FROM #Smaller

    WHERE IP IN (SELECT DISTINCT IP FROM TT_IP_BOOKER)

    GROUP BY Day

    SELECT Day, Looker = Sum(SumFlex)

    FROM #Smaller

    WHERE Subnet IN (SELECT DISTINCT SubNet FROM TT_SUBNET_IP_NON_BOOKER)

    GROUP BY Day

    In the first big select, your where clause may prevent the use of indexes so you may have to help by adding a hint on an index on Day if you have.

    (it also depends on whether the index covers the query, etc...)

    For #Smaller, you might want to add some index, depending on your data.

    You might also create your temp table before populating it (looks cleaner...)

    Finally, you could put the result of the last four query in temp tables again and then merge them all in another select (join...)

    Hope it makes sense...

    Cheers

    Eric

    PS: although the reading might be tough, I'd recommend reading "Inside SQL Server 2005: TSQL Querying". There is a 2008 version but I have not read it (yet?)

  • the lack of indexes, clustering indexes and foreign keys will indeed kill this kind of queries.

    Determine a primary key for every table and if you can, determine foreign keys that are supported by indexes.

    Avoid row level nested selects !

    Did you try this kind of query ?

    select a.Day

    , sum(a.flex) as PV

    , min(NB.sum_flex) as Looker

    , min(B.sum_flex) as Booker

    , min(S.sum_flex) as Subnet

    from bigTable a

    left join ( select day, sum(flex) as sum_flex

    from bigTable c

    where exists ( select *

    from TT_IP_NON_BOOKER NB1

    where NB1.ip = c.ip )

    group by day

    ) NB

    on NB.day = a.day

    left join ( select day, sum(flex) as sum_flex

    from bigTable d

    where exists ( select *

    from TT_IP_BOOKER B1

    where B1.ip = d.ip )

    group by day

    ) B

    on B.day = a.day

    left join ( select day, sum(flex) as sum_flex

    from bigTable e

    where exists ( select *

    from [TT_SUBNET_IP_NON_BOOKER] SNB

    where SNB.subnet = e.subnet )

    group by day

    ) S

    on S.day = a.day

    and S.subnet = a.subnet

    where a.day >= dateadd(dd, -6, '2010-04-12')

    and a.day <= '2010-04-12'

    group by a.day

    order by a.day

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hi, you asked about doing this with joins...

    select Day, PV=sum(flex)

    ,Looker=sum(CASE WHEN TINB.ip IS NULL THEN 0 ELSE a.flex END)

    ,Booker=sum(CASE WHEN TIB.ip IS NULL THEN 0 ELSE a.flex END)

    ,Subnet=sum(CASE WHEN TSINB.subnet IS NULL THEN 0 ELSE a.flex END)

    from bigTable a

    left outer join (select distinct ip from TT_IP_NON_BOOKER) TINB on TINB.ip = a.ip

    left outer join (select distinct ip from TT_IP_BOOKER) TIB on TIB.ip = a.ip

    left outer join (select distinct subnet from TT_SUBNET_IP_NON_BOOKER) TSINB on TSINB.subnet = a.subnet

    where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'

    group by day

    order by day

    This produces a plan with only one read through bigTable and one each for the TT_??? tables. Depending on what indexes you have, this could perform a lot better than the subquery version.

    Another variant you might try is the APPLY one...

    select Day, PV=sum(a.flex)

    ,Looker=sum(ISNULL(TINB.flex,0))

    ,Booker=sum(ISNULL(TIB.flex,0))

    ,Subnet=sum(ISNULL(TSINB.flex,0))

    from bigTable a

    outer apply (select top 1 a.flex from TT_IP_NON_BOOKER x where x.ip = a.ip) TINB

    outer apply (select top 1 a.flex from TT_IP_BOOKER x where x.ip = a.ip) TIB

    outer apply (select top 1 a.flex from TT_SUBNET_IP_NON_BOOKER x where x.subnet = a.subnet ) TSINB

    where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'

    group by day

    order by day

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I like that one indeed (the left outer join)!

    Cool...:Wow:

  • Ahhh, thanks for all the help guys, all great and clear input. 😀

    I look forward to testing/implementing your suggestions .

  • Viewing 6 posts - 1 through 5 (of 5 total)

    You must be logged in to reply to this topic. Login to reply