June 13, 2014 at 12:14 pm
I got one that I just fired off a email about.
Double checking some reports that were previously built by the vendor. One of them kept crashing. Report Builder says next line read error. So I copy the main query into SSMS and try to run it and I get Msg 8114, Level 16, State 5, Line 3
Error converting data type Varchar to Numeric.
Hmm, I check and low and behold they have t1.VisitID=t2.VisitID on the JOIN. On one table its Numeric on the other its Varchar. Hmm that's weird wait a minute that Varchar one is not in a table its in a View. Dig,Dig,Dig 3 nested views later.
WTF, someone did a convert to Varchar on VisitID which happens to be a Primary Key on most of the tables in the db. Now I need to figure out how many things are dependent on this stupid dataset.
,VisitID = CASE WHEN c.VisitID is null THEN 'N/A' WHEN c.VisitId = 0 THEN 'N/A' ELSE convert(varchar,c.VisitID) END
***SQL born on date Spring 2013:-)
June 13, 2014 at 2:10 pm
thomashohner (6/13/2014)
I got one that I just fired off a email about.Double checking some reports that were previously built by the vendor. One of them kept crashing. Report Builder says next line read error. So I copy the main query into SSMS and try to run it and I get Msg 8114, Level 16, State 5, Line 3
Error converting data type Varchar to Numeric.
Hmm, I check and low and behold they have t1.VisitID=t2.VisitID on the JOIN. On one table its Numeric on the other its Varchar. Hmm that's weird wait a minute that Varchar one is not in a table its in a View. Dig,Dig,Dig 3 nested views later.
WTF, someone did a convert to Varchar on VisitID which happens to be a Primary Key on most of the tables in the db. Now I need to figure out how many things are dependent on this stupid dataset.
,VisitID = CASE WHEN c.VisitID is null THEN 'N/A' WHEN c.VisitId = 0 THEN 'N/A' ELSE convert(varchar,c.VisitID) END
Not very knowledgeable regarding SSRS, but couldn't the report handle changing nulls and zeros to 'N/A' instead of doing it in the SQL query?
June 13, 2014 at 2:25 pm
Sure Lynn, I believe that it can.
However, this issue goes beyond reports. This is the "Charges" dataset that this occurred on. Its not just SSRS that points to this, other things inside the application do as well. Including the in application reports.
This column that he converted is a NON-NULL, PK auto incrementing field. It is used as a PK/FK on many tables. I have no idea what the heck this guys thought process was.
I emailed a developer from the Vendor and he was as confused as I on what the heck that line of code is there for.
***SQL born on date Spring 2013:-)
June 13, 2014 at 4:35 pm
below86 (6/12/2014)
I'm just looking at our 'new' datawarehouse design, a former DBA helped start. All the table names start with 'tbl'. :w00t: This person also had the idea to store the dates as integers, like 20140612, and then join to a 'Date' table with this 'key' to get a real date field.
"All the table names start with 'tbl'" - there's a Motorhead song called "Die you Bastard". I have no idea why this suddenly started playing in my head <= LIE! LIE!
" store the dates as integers, like 20140612, and then join to a 'Date' table with this 'key' to get a real date field" - this still is a strong design smell, but if you're using a DateDim (or calendar table) in a DataWarehouse (or other) environment, ish. I think storing the integer format date, rather than an identity for the join is plain stupid. But a link into a DateDim / Calendar table is a lot less hassle and a bloody sight easier if you want to know working days, financial weeks, months, years. Do the complex calc once, then look it up. That's what database do. If the bugger's trying to do date calcs on that value, frankly, they're a buffoon and should be taken out and shot.
Slowly.
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
June 13, 2014 at 4:44 pm
Offhand, even in my warehouses, I never really saw a significant value in storing the date as a four byte int instead of the 8 byte datetime. Yes, I have used many a calendar table, and just keyed it off the datetime field. With a constraint to make sure the value equaled itself without time no 'bad' data would become included in either the fact or the dimension, and it allowed us to skip, completely, moving to the dimension table for date range based information that didn't require the subdata and allowed for more effective indexing against the fact.
I'm sorry, but I can't agree with the idea of surrogate keying a date. There's too much value to bringing the dimension in later in the execution plan to bother, especially if you're running ROLAP cubes.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
June 13, 2014 at 4:50 pm
Andrew Kernodle (6/12/2014)
Yep, we're in the same boat as far as compatibility and versions go. We're technically not supposed to be using 2008R2 by the vendor's contract (2008 is the latest version supported by them), but they mistakenly think that 2008 R2 is just a service pack level, so... :-P. No plans on the vendor's part to move beyond 2008, though.Also, their queries are fun, fun little things. The attached execution plan is one I pulled from some casual observation of the system. It may induce a feeling of overwhelming despair to see the first few operations :w00t:
Luckily, one of our vendors knows more about SQLServer than Paul Randall. They claimed a flaw in their crap was a database problem because one time, it fell over when the nightly DBCC CHECKDB job was running (and several other times during the day - randomly - when no maintenance was running).
"You have to stop using this DBCC thing as that's causing blocking"
"No that's not actually the case, that's not caused blocking since v7, IIRC - here's a link to a blog article which explains this <link to Paul's blog>"
"No - everyone knows it blocks, you have to stop running maintenance on your systems or we will not support them. I've seen it causes blocking"
"Err - the blog post, have you read it? This is Paul Randall, he wrote it. I'm not expecting you to believe me, but you know better than the guy who wrote it? Can we agree that this is not the problem and look into this in more detail"
"If you this process we will not support you, this ticket is closed"
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
June 13, 2014 at 5:02 pm
Koen Verbeeck (6/13/2014)
People come up with the nonsense because other people - like Kimball - tell them to.
Kimball advocates that every surrogate key should be a meaninglessinteger (which is a good thing). He is against using "smart surrogate keys", keys where there is some meaning baked into the surrogate key. However, he allows one exception: dates. He says it is OK to use something like 20140501 because it is still an integer, the meaning will never change (I think we will probably never change our gregorian calanders?) and it allows for easier debugging.
Hmm, I'd have to ask how, exactly - not of you Koen, I have the feeling your synapse are crying BS (not slaying Kimball in general - great work he's done) but in what way does this allow easier debugging?
You have a value it fails on. It's either in, or not in, the date dimension, Select from datedim where value is key - makes no difference when tracking the error ( in SQL Server, Informix, Oracle or Sybase anyway as far as I can think)
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
June 13, 2014 at 7:29 pm
Evil Kraig F (6/13/2014)
Offhand, even in my warehouses, I never really saw a significant value in storing the date as a four byte int instead of the 8 byte datetime. Yes, I have used many a calendar table, and just keyed it off the datetime field. With a constraint to make sure the value equaled itself without time no 'bad' data would become included in either the fact or the dimension, and it allowed us to skip, completely, moving to the dimension table for date range based information that didn't require the subdata and allowed for more effective indexing against the fact.I'm sorry, but I can't agree with the idea of surrogate keying a date. There's too much value to bringing the dimension in later in the execution plan to bother, especially if you're running ROLAP cubes.
Very good point - there's no end of abomination a poorly constructed DateDim can cause in a cube. I'd not thought about that one because I'd never consider implementing it. But that's not a design smell, well, stink - it's a design reek.
To be fair though, it's not quite as bad as storing it in a CHAR(8) - or that other good old fuckwit anti-pattern ... nah, we'll just store the address in a text / nvarchar(max) field ... what could possibly go wrong? (Is there an emoticon for "DIE, DIE, DIE!")
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
June 14, 2014 at 12:21 am
Quick thought on this, I think there is a slight mix-up in the conversation. Using integer in a YYYYMMDD format as a date dimension key in a datamart is not a bad idea. For sake of argument, lets say there are only 1x10^9 records in a typical "factSales". It would have TransactionDate, OrderDate, ShipDate, DeliveryDate and four indexes supporting the foreign key constraints.
Datetime = 8 x 8 x 10^9 = 64000000000 / 2^30 = 59.60 Gb
Integer = 4 x 8 x 10^9 = 32000000000 / 2^30 = 29.80 Gb
On the other hand, it's not brilliant in a transactional database (sales etc.).
😎
June 14, 2014 at 5:02 am
Sean Lange (6/13/2014)
The semantics here seem to be a little bit mixed. It seems that some of these integers are dates and others are keys. The keys one I have no issue with because they are keys.Keep in mind that the discussion between below86 and their coworker is that the coworker has some notion that storing dates as integers makes queries faster.
Well that's nonsense of course 🙂
Dates and integers are both 4 bytes.
The reason of my reply is that integer dates are fine in a data warehouse, as long as they are used as surrogate keys and nothing else.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 14, 2014 at 5:07 am
Jeff Moden (6/13/2014)
I really don't like integer dates. They're tough to do date math with and they have no built in checks for whether an entry is a legal date or not. For example, it's easy to add a date of 20140229 as an integer and no one would be the wiser. None of the DATE or DATETIME datatypes would allow it.
Well, if they are surrogate keys, they are not supposed to do math with.
Jeff Moden (6/13/2014)
As for surrogate keys necessarily being totally meaningless, I strongly disagree. What key would anyone use for a Customer table that would be more effective than a simple INTEGER IDENTITY or INTEGER SEQUENCE column?
Not sure what you are getting at? An IDENTITY or SEQUENCE is quite meaningless?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 14, 2014 at 5:12 am
jasona.work (6/13/2014)
WHY can't MS come up with a better way to store the duration of Agent jobs? Saving it as an INT? Really? I mean, I can see not using any of the date / date time, but it's a royal PITA if you're trying to average out how long a job has been taking...
That IS pretty annoying...
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 14, 2014 at 5:20 am
Michael Valentine Jones (6/13/2014)
Something no one has mentioned is DATETIME in the fact table for situations where it may be important. I have seen people break date and time into different columns. That just makes for horrible coding for queries when you need to select data for a DATETIME range. Example: from 10:00 am on the second day of the month through 6:00 pm yesterday. I believe Kimball called a dimensional column like this with no dimension table a "degenerate" dimension. Of course, you could also have Date and time of day dimensions in addition to the "degenerate" datetime dimension.
Typically you break a datetime into a date part and a time part and store it in two dimensions.
This allows you to solve analytical questions on the date part (with attributes stored in the date dimensions), such as influences of weekdays, holidays, seasonality and so on. In the time dimension, you can store attributes such as opening hours, part of the day (morning, midday, afternoon, evening, night) and so on. So the seperation is done for a reason 🙂
If you need the actual datetime for some reason (and you don't want to join two times), you can indeed store it along in the fact table as a degenerate dimension.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 14, 2014 at 5:26 am
andrew gothard (6/13/2014)
" store the dates as integers, like 20140612, and then join to a 'Date' table with this 'key' to get a real date field" - this still is a strong design smell, but if you're using a DateDim (or calendar table) in a DataWarehouse (or other) environment, ish. I think storing the integer format date, rather than an identity for the join is plain stupid. But a link into a DateDim / Calendar table is a lot less hassle and a bloody sight easier if you want to know working days, financial weeks, months, years.
What is the difference between having an integer key that has the value 12575 or the value 20140513?
The first one is just a sequence, while the second one is a sequence as well but with some gaps once in a while.
The second one helps debugging (no extra join needed when you quickly want to check something) and also helps with partitioning if necessary.
Why is the second one plain stupid? I'd like to be enlightened 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 15, 2014 at 12:45 pm
Select *
Define all varchar's as varchar(max) to be on the safe side.
Passing strings as date-parameters: where mydate='01/06/2005' -> the first of june or the sixth of january ?
Viewing 15 posts - 226 through 240 (of 271 total)
You must be logged in to reply to this topic. Login to reply