January 31, 2014 at 2:19 am
I have been working on a project to establish how many people met certain conditions on any given day of a month to create a dashboard. The dashboard includes counts and averages of the numbers grouped by month and another condition. I'd written a proc to do this and my boss suggested that I use the proc to create a view because the data will be useful in other places later. I created the view by copying the code from my proc, removing any grouping and ordering, adding a couple of date columns to it and writing 'create view X as' on the top. I then made a copy of my dashboard proc, deleted the code from it and pointed it the view. In theory, the view should do exactly the same as the proc code, because it was the proc code. The problem is that the counts are coming back different and I've no idea how. When I run the view code it returns just over 180000 rows and when I run the proc without the grouping it returns exactly the same number of rows. The counts are all in the same general range but not the same. Has anybody got any suggestions as to why this could be happening or obvious things to check first.
I'll get some code posted as soon as I can, I'm just not sure how best to do it. I can't think how to post sample data because there's quite a few tables that need joining to create the view. While I know what correct data looks like, I'm not sure if 'correct data' will replicate the problem. I think it's either something really obvious or something very in depth that a relative newbie like myself wouldn't ever think of.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 2:29 am
Going to need to see the code. Can you post just the procedure and view initially, might need more later.
I created the view by copying the code from my proc, removing any grouping and ordering, adding a couple of date columns to it and writing 'create view X as' on the top. I then made a copy of my dashboard proc, deleted the code from it and pointed it the view. In theory, the view should do exactly the same as the proc code, because it was the proc code.
You said you removed the grouping and ordering? If you removed a group by clause, then the view code is no longer the same as the procedure code and could well return different results.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 31, 2014 at 3:29 am
I've attached the code now. I've had to do a bit of obfuscation, which I hope doesn't hide the logic. The organisation I work for is quite high profile and security conscious so I'm trying not to give too much away.
With regards to the liberal application of nolocks, what can I say. My boss is a firm believer in them and when don't put them in it's the first thing he mentions. I know they are potentially a bad thing and I'm looking for a cast iron case I can put to him about why we shouldn't rely on them. It would be nice if they were the cause of this little problem.
Edit Grammar
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 3:33 am
Could you rather put the code in your post, wrapped in the code tags (see left-hand panel)
Word docs from untrusted sources.... 🙁
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 31, 2014 at 3:41 am
Fair enough, although I'm a little hurt you don't trust me.... 🙂
The code to create the dashboard with the proc
set transaction isolation level read uncommitted
set dateformat ymd
select distinct
OrderType =
case pO.ordertypedesciptionid
when 'Bilge Rat' then 'BR'
when 'Bilge Rat - Long Tail' then 'BR - Long Tail'
when 'Scurvy Dog' then 'SD'
when 'Jolly Jack Tar' then 'JTT'
else pO.ordertypedesciptionid
end
,dd.YearMonth
,dd.MonthDateOfYear
,dd.DateYear
,ServiceDays = COUNT(1)
,AverageServiceDays = COUNT(1)/(select COUNT(1) from dbo.D_Date d2 where d2.YearMonth = dd.YearMonth)
,NumberOfDaysInCurrentMonth = (select COUNT(1) from dbo.D_Date d2 where d2.YearMonth = dd.YearMonth)
from
dbo.pirateorder po (nolock)
join dbo.vwSwordAllocation sa (nolock) on sa.OrderID = po.orderno_int
JOIN dbo.piratescontacts PC (nolock) on PC.contactid = po.subjectid
JOIN dbo.serviceatsea sas (nolock) on sas.pirateorderid_lk = po.pirateorderid
join dbo.D_Date dd (nolock) on dbo.timetoUTC(po.OrderStartDate) <= dd.fulldate
and dbo.timetoUTC(po.OrderEndDate_dt) >= dd.FullDate
cross apply
(
select
minH = (
select Min(cast(hff.HourStartDate as DATE))
from dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
)
,
maxH = (
select Max(cast(hff.HourEndDate as DATE))
from dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
)
) mH
where
pc.greenhorn = 0
and pc.lastname not like 'Test%'
and pc.firstname not like 'test%'
and pc.excludefromrations <> 1
and po.statusreason <> 3 --'void'
and dd.FullDate >= cast(sa.SwordIssued as date)
and dd.FullDate <= isnull(cast(sa.Swordreturned as date),sa.OrderEnd)
and dd.FullDate >= cast(sas.servicestartdate as DATE)
and dd.FullDate <= cast(sas.serviceenddate as date)
and dd.FullDate >= mh.minH
and dd.FullDate <= mh.maxH
and dd.FullDate <= GETDATE()
group by dd.YearMonth,pO.ordertypedesciptionid,dd.MonthDateOfYear,dd.DateYear
order by dd.YearMonth,dd.DateYear
The code to create the view.
Create view [dbo].[vw_DreadPirates] as
select distinct
Orderno = po.orderno_int
,SubjectID = pc.subjectno
,OrderType = pO.ordertypedesciptionid
,OrderStartDate = po.OrderStartDate
,OrderEndDate = po.OrderEndDate
,po.CreatedOn
,dd.FullDate
,Monitored = 1
,dd.DayOfWeek
,dd.DayNumberOfWeek
,dd.YearMonth
,dd.DateOfMonth
,dd.MonthOfYear
,dd.MonthDateOfYear
,dd.DateYear
from
dbo.pirateorder po (nolock)
join dbo.vwSwordAllocation sa (nolock) on sa.OrderID = po.orderno_int
JOIN dbo.piratescontacts pC (nolock) on pC.contactid = so.subjectid
JOIN dbo.serviceatsea sas (nolock) on sas.pirateorderid_lk = po.pirateorderid
join mibi.dbo.D_Date dd (nolock) on dbo.TimeToUTC(po.OrderStartDate) <= dd.fulldate
and dbo.TimetoUTC(po.OrderEndDate) >= dd.FullDate
cross apply
(
select
minH = (
select Min(cast(hff.HourStartDate as DATE))
from dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
)
,
maxH = (
select Max(cast(hff.HourEndDate as DATE))
from .dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
)
) mH
where
pc.greenhorn = 0
and pc.lastname not like 'Test%'
and pc.firstname not like 'test%'
and pc.excludefromrations <> 1
and po.statusreason <> 3 --'void'
and dd.FullDate >= cast(sa.swordissued as date)
and dd.FullDate <= isnull(cast(sa.Swordreturned as date),sa.OrderEnd)
and dd.FullDate >= cast(sas.servicestartdate as DATE)
and dd.FullDate <= cast(sas.vf_serviceenddate as date)
and dd.FullDate >= mh.minH
and dd.FullDate <= mh.maxH
and dd.FullDate <= GETDATE()
And last, the code that creates the dashboard using the view.
set transaction isolation level read uncommitted
set dateformat ymd
;
select distinct
OrderType =
case pO.ordertypedesciptionid
when 'Bilge Rat' then 'BR'
when 'Bilge Rat - Long Tail' then 'BR - Long Tail'
when 'Scurvy Dog' then 'SD'
when 'Jolly Jack Tar' then 'JTT'
else pO.ordertypedesciptionid
end
,mp.YearMonth
,mp.MonthDateOfYear
,mp.DateYear
,ServiceDays = COUNT(1)
,AverageServiceDays = COUNT(1)/(select COUNT(1) from dbo.D_Date d2 where d2.YearMonth = mp.YearMonth)
,NumberOfDaysInCurrentMonth = (select COUNT(1) from dbo.D_Date d2 where d2.YearMonth = mp.YearMonth)
from
dbo.vw_DreadPirates mp
group by mp.YearMonth,mp.OrderType,mp.MonthDateOfYear
,mp.DateYear
order by mp.YearMonth
,mp.DateYear
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 4:05 am
Can't see anything obvious (other than multiple unnecessary DISTINCTs)
Is the difference in rowcount consistent?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 31, 2014 at 4:25 am
GilaMonster (1/31/2014)
Can't see anything obvious (other than multiple unnecessary DISTINCTs)Is the difference in rowcount consistent?
If you mean 'is the difference in ServiceDays and AverageServiceDays the same every time' then yes. I'd expect some changes in the data for this month because it is altered pretty much constantly but the historic data should stay the same.
The distincts are to get round an problem where a pirate can have multiple sword allocations and multiple services starting on the same day but they only count as one pirate. The object of the view is one row per active pirate per day. Without the distinct I got one row per pirate per service per sword.
EDIT Changed some language
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 4:44 am
BWFC (1/31/2014)
GilaMonster (1/31/2014)
Can't see anything obvious (other than multiple unnecessary DISTINCTs)Is the difference in rowcount consistent?
If you mean 'is the difference in ServiceDays and AverageServiceDays the same every time' then yes.
No, I mean the difference in row counts between the procedure and view, the problem that we're trying to solve.
The distincts are to get round an problem where a pirate can have multiple sword allocations and multiple services starting on the same day but they only count as one pirate. The object of the view is one row per active pirate per day. Without the distinct I got one row per pirate per service per sword.
Distincts only remove complete duplicate rows, so if there are rows for the same pirate, the same sword and different services, distinct won't remove those. Group by, if you group by the pirate and sword will.
You don't need distinct at multiple levels, it's a waste, you're telling SQL again and again to remove duplicate rows, if there are really duplicates that need removing (which there won't be with a group by), you need a distinct once only, in the outer-most scope.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 31, 2014 at 6:11 am
No, I mean the difference in row counts between the procedure and view, the problem that we're trying to solve.
Of course you did, sorry, conference call brain fry.
There is no difference in row count between the proc and the view. The problem is, it's a little difficult to compare like with like. I've just run both versions of the dashboard, with the proc and with the view. I added the fulldate and the subjectno columns and then added those to the grouping. This brought back the same row count for both. When I ran the two as they should be, the counts and averages were different. I'm aware that changing the grouping will change the output, but that was the best way I could think of to make sure I was comparing the same things. I've still got a horrible feeling I'm doing something really daft.
Also, as a result of the conference call, I'm now to include pirates that are excluded from rations. At least that's just a matter of taking out a line in the proc and the view code.
Distincts only remove complete duplicate rows, so if there are rows for the same pirate, the same sword and different services, distinct won't remove those. Group by, if you group by the pirate and sword will.
That's what I thought distinct did. When I take out the distinct in the view code I gain about 14000 rows though and they're definitely duplicates. Is it safe to leave the distinct in the view code or am I better to remove it and exclude the duplicate rows another way?
The main reason that I've got them at multiple levels is that I forgot to remove them.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 6:48 am
You might get some performance lift by jigging around with your filters. If you're going down this route then do take on board Gail's comments about DISTINCT. You might benefit from rolling up one of those child tables in a derived table.
select distinct
OrderType =
case pO.ordertypedesciptionid
when 'Bilge Rat' then 'BR'
when 'Bilge Rat - Long Tail' then 'BR - Long Tail'
when 'Scurvy Dog' then 'SD'
when 'Jolly Jack Tar' then 'JTT'
else pO.ordertypedesciptionid
end
,dd.YearMonth
,dd.MonthDateOfYear
,dd.DateYear
,ServiceDays = COUNT(1)
,AverageServiceDays = COUNT(1)/dd.NumberOfDaysInCurrentMonth
,dd.NumberOfDaysInCurrentMonth
from dbo.pirateorder po (nolock)
cross apply (
select
minH = Min(cast(hff.HourStartDate as DATE)),
maxH = Max(cast(hff.HourEndDate as DATE))
from dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
) mH
CROSS APPLY (
select
NumberOfDaysInCurrentMonth = COUNT(*) OVER(PARTITION BY YearMonth),
FullDate,
YearMonth,
MonthDateOfYear,
DateYear
from dbo.D_Date
WHERE FullDate >= mh.minH
and FullDate <= mh.maxH
and FullDate <= GETDATE()
and dbo.timetoUTC(po.OrderStartDate) <= dd.fulldate
and dbo.timetoUTC(po.OrderEndDate_dt) >= dd.FullDate
) dd
join dbo.vwSwordAllocation sa (nolock)
on sa.OrderID = po.orderno_int
and dd.FullDate >= cast(sa.SwordIssued as date)
and dd.FullDate <= isnull(cast(sa.Swordreturned as date),sa.OrderEnd)
JOIN dbo.piratescontacts PC (nolock) on PC.contactid = po.subjectid
JOIN dbo.serviceatsea sas (nolock)
on sas.pirateorderid_lk = po.pirateorderid
and dd.FullDate >= cast(sas.servicestartdate as DATE)
and dd.FullDate <= cast(sas.serviceenddate as date)
where pc.greenhorn = 0
and pc.lastname not like 'Test%'
and pc.firstname not like 'test%'
and pc.excludefromrations <> 1
and po.statusreason <> 3 --'void'
group by dd.YearMonth,pO.ordertypedesciptionid,dd.MonthDateOfYear,dd.DateYear
order by dd.YearMonth,dd.DateYear
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 31, 2014 at 7:47 am
ChrisM@Work (1/31/2014)
You might get some performance lift by jigging around with your filters. If you're going down this route then do take on board Gail's comments about DISTINCT. You might benefit from rolling up one of those child tables in a derived table.
select distinct
OrderType =
case pO.ordertypedesciptionid
when 'Bilge Rat' then 'BR'
when 'Bilge Rat - Long Tail' then 'BR - Long Tail'
when 'Scurvy Dog' then 'SD'
when 'Jolly Jack Tar' then 'JTT'
else pO.ordertypedesciptionid
end
,dd.YearMonth
,dd.MonthDateOfYear
,dd.DateYear
,ServiceDays = COUNT(1)
,AverageServiceDays = COUNT(1)/dd.NumberOfDaysInCurrentMonth
,dd.NumberOfDaysInCurrentMonth
from dbo.pirateorder po (nolock)
cross apply (
select
minH = Min(cast(hff.HourStartDate as DATE)),
maxH = Max(cast(hff.HourEndDate as DATE))
from dbo.vwHoursbeforethemast hff (nolock)
where po.orderno_int = hff.orderid
) mH
CROSS APPLY (
select
NumberOfDaysInCurrentMonth = COUNT(*) OVER(PARTITION BY YearMonth),
FullDate,
YearMonth,
MonthDateOfYear,
DateYear
from dbo.D_Date
WHERE FullDate >= mh.minH
and FullDate <= mh.maxH
and FullDate <= GETDATE()
and dbo.timetoUTC(po.OrderStartDate) <= dd.fulldate
and dbo.timetoUTC(po.OrderEndDate_dt) >= dd.FullDate
) dd
join dbo.vwSwordAllocation sa (nolock)
on sa.OrderID = po.orderno_int
and dd.FullDate >= cast(sa.SwordIssued as date)
and dd.FullDate <= isnull(cast(sa.Swordreturned as date),sa.OrderEnd)
JOIN dbo.piratescontacts PC (nolock) on PC.contactid = po.subjectid
JOIN dbo.serviceatsea sas (nolock)
on sas.pirateorderid_lk = po.pirateorderid
and dd.FullDate >= cast(sas.servicestartdate as DATE)
and dd.FullDate <= cast(sas.serviceenddate as date)
where pc.greenhorn = 0
and pc.lastname not like 'Test%'
and pc.firstname not like 'test%'
and pc.excludefromrations <> 1
and po.statusreason <> 3 --'void'
group by dd.YearMonth,pO.ordertypedesciptionid,dd.MonthDateOfYear,dd.DateYear
order by dd.YearMonth,dd.DateYear
Thanks Chris, I gave that a try and unfortunately it almost doubles the running time. It also threw the grouping out when I used the calculated numberofdaysinthecurrentmonth column. That being said, I didn't know you could a)calculate two columns in the same cross apply select and b)use two cross applies in a query.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 9:35 am
That's okay, it was always going to be a long shot with no sample data blah blah.
I hope you can take away a few ideas from it though, and if you aren't already, then check the execution plan each time you change a query.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 31, 2014 at 10:43 am
ChrisM@Work (1/31/2014)
That's okay, it was always going to be a long shot with no sample data blah blah.I hope you can take away a few ideas from it though, and if you aren't already, then check the execution plan each time you change a query.
I'll definitely have a look at the execution plans, although they're not something I'm all that familiar with. I've used the multiple column cross apply thanks. I still get dodgy counts but I do at least get them a bit quicker!
I'll have a think about how to provide a data sample over the weekend. Like I said in my original post though, I know what the data should look like but it might be incorrect data causing the problem. I think I've narrowed it down to the counts now. I get the same number of rows when I run the proc and the view without the groups on them so I don't think there's anything either missing or duplicated.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 31, 2014 at 2:21 pm
Remove the Distinct in the view definition - that is going to eliminate rows from the grouping that are actually included in the procedure.
The distinct will only occur after the grouping has occurred - so in your procedure the 'duplicate' rows are counted and then your distinct is applied. Using the view, the distinct is applied to the rows before they are grouped in the procedure.
Also note: if you set transaction isolation level in the procedure you don't need to use nolock on every table.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
February 3, 2014 at 5:21 am
Jeffrey Williams 3188 (1/31/2014)
Remove the Distinct in the view definition - that is going to eliminate rows from the grouping that are actually included in the procedure.The distinct will only occur after the grouping has occurred - so in your procedure the 'duplicate' rows are counted and then your distinct is applied. Using the view, the distinct is applied to the rows before they are grouped in the procedure.
Also note: if you set transaction isolation level in the procedure you don't need to use nolock on every table.
Thanks Jeffrey, it was down to the distincts in the proc and the view. The distinct in the view means I get one row per pirate per day which is what I was aiming for. I was struggling to establish whether it was missing data in the view or duplicate data in the proc and we've worked out that it was duplicates in the proc (although it's more difficult to establish where they're coming from). I can go ahead now and use the view with confidence.
What transaction level should I be setting it to? I assume, to make sure it's clean data I should be using read committed. I know it's not great but we are quite heavily reliant on the 'silver bullet' nolock method so we don't tend to use the transaction levels. I'm very much the new boy so I'm open to all (good) advice.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply