April 30, 2010 at 8:28 am
I am trying to retrieve data from numerous tables and combine it to provide one result set. To do this
using a number of sub queries that all access the same table (which is pretty huge) and a few other tables with correlated data. The code looks like:
select Day, PV=(select sum(flex) from bigTable b where b.day=a.day)
,Looker=(select sum(flex) from bigTable c where c.day=a.day and ip in (select distinct ip from TT_IP_NON_BOOKER))
,Booker=(select sum(flex) from bigTable d where d.day=a.day and ip in (select distinct ip from TT_IP_BOOKER))
,Subnet=(select sum(flex) from bigTable e where e.day=a.day and subnet in (select distinct subnet from TT_SUBNET_IP_NON_BOOKER))
from bigTable a
where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'
group by day
order by day
This works fine ( even though it takes forever) in the in the query analyser, but when I'm running it through Excel using VBA, I get a run time error because it simply takes to long. I've read that I should use JOIN statements to provide the information with less lag time, but couldn't work out the best way to do it. Attempting to resolve the lag problem I ended up with numerous joins and unions, and no idea of how to end up with the same result set :pinch:
I would put up the attempted solution code, but it's pretty bad and messy :blush:
Any help in optimizing the code to run faster would be greatly appreciated 😀
Thanks in advanced
Below is the tables definitions
=======================
CREATE TABLE [dbo].[bigTable](
[Day] [datetime] NULL,
[Carrier] [varchar](10) NULL,
127.0.0.1 [varchar](100) NULL,
[subnet] [varchar](100) NULL,
[referrer] [varchar](50) NULL,
[PV] [int] NULL,
[FLEX] [int] NULL,
[BOOK] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[TT_IP_BOOKER](
127.0.0.1 [varchar](100) NULL,
[book] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[TT_IP_NON_BOOKER](
127.0.0.1 [varchar](100) NULL,
[book] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[TT_SUBNET_IP_NON_BOOKER](
[subnet] [varchar](100) NULL,
[PV] [int] NULL,
[FLEX] [int] NULL,
[BOOK] [int] NULL
) ON [PRIMARY]
May 1, 2010 at 7:08 am
I think you should try to "flatten" a bit the query on BigTable
Then only, I'd bother going onto the details of working out each sum.
The advantage would be that you simply do one big scan on the BigTable (plus grouping...)
Something along these lines
SELECT day, IP, SubNet, SUM(Flex) AS SumFlex
INTO #Smaller
FROM BigTable
WHERE day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'
GROUP BY Day, IP, Subnet
-- create any index(es) you may need (not day since you have only 6 days... Maybe IP?)
SELECT PV = SUM(SumFlex)
FROM #Smaller
GROUP BY Day
SELECT Day, Looker = Sum(SumFlex)
FROM #Smaller
WHERE IP IN (SELECT DISTINCT IP FROM TT_IP_NON_BOOKER)
GROUP BY Day
SELECT Day, Booker = Sum(SumFlex)
FROM #Smaller
WHERE IP IN (SELECT DISTINCT IP FROM TT_IP_BOOKER)
GROUP BY Day
SELECT Day, Looker = Sum(SumFlex)
FROM #Smaller
WHERE Subnet IN (SELECT DISTINCT SubNet FROM TT_SUBNET_IP_NON_BOOKER)
GROUP BY Day
In the first big select, your where clause may prevent the use of indexes so you may have to help by adding a hint on an index on Day if you have.
(it also depends on whether the index covers the query, etc...)
For #Smaller, you might want to add some index, depending on your data.
You might also create your temp table before populating it (looks cleaner...)
Finally, you could put the result of the last four query in temp tables again and then merge them all in another select (join...)
Hope it makes sense...
Cheers
Eric
PS: although the reading might be tough, I'd recommend reading "Inside SQL Server 2005: TSQL Querying". There is a 2008 version but I have not read it (yet?)
May 1, 2010 at 2:08 pm
the lack of indexes, clustering indexes and foreign keys will indeed kill this kind of queries.
Determine a primary key for every table and if you can, determine foreign keys that are supported by indexes.
Avoid row level nested selects !
Did you try this kind of query ?
select a.Day
, sum(a.flex) as PV
, min(NB.sum_flex) as Looker
, min(B.sum_flex) as Booker
, min(S.sum_flex) as Subnet
from bigTable a
left join ( select day, sum(flex) as sum_flex
from bigTable c
where exists ( select *
from TT_IP_NON_BOOKER NB1
where NB1.ip = c.ip )
group by day
) NB
on NB.day = a.day
left join ( select day, sum(flex) as sum_flex
from bigTable d
where exists ( select *
from TT_IP_BOOKER B1
where B1.ip = d.ip )
group by day
) B
on B.day = a.day
left join ( select day, sum(flex) as sum_flex
from bigTable e
where exists ( select *
from [TT_SUBNET_IP_NON_BOOKER] SNB
where SNB.subnet = e.subnet )
group by day
) S
on S.day = a.day
and S.subnet = a.subnet
where a.day >= dateadd(dd, -6, '2010-04-12')
and a.day <= '2010-04-12'
group by a.day
order by a.day
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 1, 2010 at 5:32 pm
Hi, you asked about doing this with joins...
select Day, PV=sum(flex)
,Looker=sum(CASE WHEN TINB.ip IS NULL THEN 0 ELSE a.flex END)
,Booker=sum(CASE WHEN TIB.ip IS NULL THEN 0 ELSE a.flex END)
,Subnet=sum(CASE WHEN TSINB.subnet IS NULL THEN 0 ELSE a.flex END)
from bigTable a
left outer join (select distinct ip from TT_IP_NON_BOOKER) TINB on TINB.ip = a.ip
left outer join (select distinct ip from TT_IP_BOOKER) TIB on TIB.ip = a.ip
left outer join (select distinct subnet from TT_SUBNET_IP_NON_BOOKER) TSINB on TSINB.subnet = a.subnet
where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'
group by day
order by day
This produces a plan with only one read through bigTable and one each for the TT_??? tables. Depending on what indexes you have, this could perform a lot better than the subquery version.
Another variant you might try is the APPLY one...
select Day, PV=sum(a.flex)
,Looker=sum(ISNULL(TINB.flex,0))
,Booker=sum(ISNULL(TIB.flex,0))
,Subnet=sum(ISNULL(TSINB.flex,0))
from bigTable a
outer apply (select top 1 a.flex from TT_IP_NON_BOOKER x where x.ip = a.ip) TINB
outer apply (select top 1 a.flex from TT_IP_BOOKER x where x.ip = a.ip) TIB
outer apply (select top 1 a.flex from TT_SUBNET_IP_NON_BOOKER x where x.subnet = a.subnet ) TSINB
where day >= dateadd(day,-6, '2010-04-12') and day <= '2010-04-12'
group by day
order by day
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
May 2, 2010 at 5:11 am
I like that one indeed (the left outer join)!
Cool...:Wow:
May 2, 2010 at 6:02 am
Ahhh, thanks for all the help guys, all great and clear input. 😀
I look forward to testing/implementing your suggestions .
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply