September 25, 2017 at 11:38 am
ScottPletcher - Monday, September 25, 2017 9:53 AM
I suppose it's a matter of philosophy too. In my view, queries should not check the validity of data. Instead, at INSERT time, a trigger or other follow-on process should do that. Otherwise you have to write data checking into every query that uses the data -- assuming you knew about it at the time! -- which is just not logical, really.
You and I don't differ in that philosophy at all. The table should actually have a CHECK constraint on the column that enforces, at the very least, the format and the range of the two individual parts.
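Something along these lines would do it. The table and column names here are made up, since the actual DDL was never posted, and I'm assuming a six-character yyyyMM varchar:

ALTER TABLE dbo.Foobar  -- hypothetical table; no DDL was posted
ADD CONSTRAINT CK_Foobar_PeriodCode
CHECK (PeriodCode LIKE '[12][0-9][0-9][0-9][01][0-9]'           -- yyyyMM shape
   AND SUBSTRING(PeriodCode, 5, 2) BETWEEN '01' AND '12'        -- valid month part
   AND SUBSTRING(PeriodCode, 1, 4) BETWEEN '1900' AND '2100');  -- sane year range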
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2017 at 11:42 am
Jason A. Long - Monday, September 25, 2017 10:42 AM
I've gotta go w/ Scott on this one... At some point, you have to trust that your data is what it's supposed to be... Then again, I'd never store a date as a varchar to begin with (if I only cared about year and month, I'd just set the values to the 1st of the month)... So there's that... In any case, I think it's safe to say definitively that the integer math method is substantially faster than the DATEPART(qq, CAST(col AS date)) method.
Agreed. The only reason I brought it up is that we do a shedload of ETL from supposedly intelligent and "carefully scrubbed" data that causes things to blow up because the providers of the data frequently don't meet the letter of the agreed-upon format. It's also why I never trust such external data (or even internal data from a source outside our group) and ALWAYS use a staging table for the initial load, then validate it all before it goes anywhere near a production table. I wanted to make sure that no one who used either Scott's or my methods was thinking that they would provide any type of data validation at all.
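Just so no one has to guess, here's roughly what the two methods being compared look like. I'm assuming a varchar(8) column holding yyyyMMdd strings, since the actual column definition never appeared in this thread:

-- Method 1: integer math, no date conversion at all
SELECT (CAST(some_col AS int) / 100 % 100 + 2) / 3 AS quarter_num
  FROM dbo.some_table;

-- Method 2: convert to date and use DATEPART
SELECT DATEPART(qq, CAST(some_col AS date)) AS quarter_num
  FROM dbo.some_table;

And to be clear, neither expression validates anything; a bad string just produces a conversion error or a wrong quarter.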
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2017 at 12:01 pm
Jeff Moden - Monday, September 25, 2017 11:42 AM
...It's also why I never trust such external data and ALWAYS use a staging table for the initial load, then validate it all before it goes anywhere near a production table.
Now you're preaching to the choir, Reverend... I'm about halfway convinced that neither health care providers nor insurance companies actually have databases... They just have some poor, miserable bastard manually key in their data files every time they're sent.
September 26, 2017 at 5:44 am
Jason A. Long - Monday, September 25, 2017 12:01 PM
...I'm about halfway convinced that neither health care providers nor insurance companies actually have databases...
Hey, I resemble that remark! I most certainly do have databases. And I most certainly do wrestle with the data that comes in to populate those databases.
In my experience, some of the worst data-format offenders are government and government-contracted organizations. Really, the State expects us to manually examine an 1,800-line spreadsheet to see if any of the data in it affects our members or providers? Because they sure don't send out consistent values; the format changes every month.
September 27, 2017 at 1:51 pm
I would recommend setting up a calendar table (or adding columns to your existing calendar table) for this. It consists of three columns: the starting date of a quarter, the final date of a quarter, and the name of the quarter. You already seem to have a format started, but I would go ahead and use something like 'yyyyQ[1-4]': you have the year, a separator (the Q is logical), and the number of the quarter within the year. I don't know what your needs are, of course, but I have found that going out 100 years is usually pretty safe, and it keeps the table very small. You can locate the quarter of a given date with a simple BETWEEN predicate. An ordinal quarter column, changes to the begin and end dates for special situations, and so forth are advantages of the basic lookup table. I also strongly recommend talking to the accounting department and asking them if there's anything you should know about fiscal years or other irregular things.
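A minimal sketch, with made-up names since no DDL was posted:

CREATE TABLE Quarter_Calendar
(quarter_start_date DATE NOT NULL PRIMARY KEY,
 quarter_end_date DATE NOT NULL,
 quarter_name CHAR(6) NOT NULL UNIQUE,   -- e.g. '2017Q3'
 CHECK (quarter_start_date < quarter_end_date));

INSERT INTO Quarter_Calendar
VALUES ('2017-07-01', '2017-09-30', '2017Q3'),
       ('2017-10-01', '2017-12-31', '2017Q4');

-- locate the quarter that holds a given date
SELECT quarter_name
  FROM Quarter_Calendar
 WHERE '2017-09-25' BETWEEN quarter_start_date AND quarter_end_date;

A century of quarters is only 400 rows.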
Please post DDL and follow ANSI/ISO standards when asking for help.
September 27, 2017 at 2:20 pm
Jeff Moden - Monday, September 25, 2017 11:42 AM
...I never trust such external data and ALWAYS use a staging table for the initial load, then validate it all before it goes anywhere near a production table.
I use a trigger to automatically validate such data, unless overridden by fiat (as, sadly, even a Sr. DBA often is). An error bit per column is all I put in the original table; any error detail goes in a separate table, because (1) there can be more than one error, even on a single column; (2) you may want metadata about the error itself, such as when it was detected, who or what process determined it was an error, and who resolved it; and (3) you keep a history of the errors even after they are resolved.
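A bare-bones sketch of that pattern; all table and column names here are hypothetical:

-- Base table gets an error bit per validated column (one shown here);
-- error details go in a separate table.
CREATE TABLE dbo.DataLoad
(DataLoadID int IDENTITY(1, 1) PRIMARY KEY,
 PeriodCode varchar(6) NOT NULL,           -- expected format: yyyyMM
 PeriodCodeError bit NOT NULL DEFAULT 0);

CREATE TABLE dbo.DataLoadError
(DataLoadErrorID int IDENTITY(1, 1) PRIMARY KEY,
 DataLoadID int NOT NULL REFERENCES dbo.DataLoad (DataLoadID),
 ColumnName sysname NOT NULL,
 ErrorDescription varchar(200) NOT NULL,
 DetectedAt datetime2 NOT NULL DEFAULT SYSDATETIME(),
 DetectedBy sysname NOT NULL DEFAULT SUSER_SNAME(),
 ResolvedAt datetime2 NULL);               -- the row stays even after resolution
GO

CREATE TRIGGER dbo.DataLoad_Validate
ON dbo.DataLoad
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One detail row per failed rule; the data itself still loads.
    INSERT INTO dbo.DataLoadError (DataLoadID, ColumnName, ErrorDescription)
    SELECT i.DataLoadID, 'PeriodCode', 'Not yyyyMM with month 01-12'
      FROM inserted AS i
     WHERE i.PeriodCode NOT LIKE '[12][0-9][0-9][0-9][01][0-9]'
        OR SUBSTRING(i.PeriodCode, 5, 2) NOT BETWEEN '01' AND '12';

    -- Flag the offending rows without rolling anything back.
    UPDATE dl
       SET PeriodCodeError = 1
      FROM dbo.DataLoad AS dl
      JOIN inserted AS i ON i.DataLoadID = dl.DataLoadID
     WHERE i.PeriodCode NOT LIKE '[12][0-9][0-9][0-9][01][0-9]'
        OR SUBSTRING(i.PeriodCode, 5, 2) NOT BETWEEN '01' AND '12';
END;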
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 28, 2017 at 12:40 pm
>> I use a trigger to automatically validate such data ... <<
I disagree with this approach. It assumes that your database will always be in one and only one particular product, so you can depend on its model of triggers. In the real world, even if your company is on one and only one database, in one and only one particular release of that database, your outside sources will be in other products (DB2, Oracle, PostgreSQL, MySQL, etc.).
If you can do the validation of the data with CHECK() constraints or REFERENCES clauses, then it is portable. Perhaps more importantly, it is understood by SQL programmers, as opposed to weirdo dialect programmers.
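For example, in perfectly portable SQL (made-up names again):

CREATE TABLE Valid_Quarters
(quarter_name CHAR(6) NOT NULL PRIMARY KEY);

CREATE TABLE Sales_Reports
(report_id INTEGER NOT NULL PRIMARY KEY,
 quarter_name CHAR(6) NOT NULL
     REFERENCES Valid_Quarters (quarter_name),  -- lookup-table validation
 sales_amt DECIMAL(12, 2) NOT NULL
     CHECK (sales_amt >= 0.00));                -- simple declarative rule

Both constructs run unchanged on any conforming SQL product.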
The reason we kept triggers in the ANSI/ISO standards was necessity. The original SQL engines were built on top of existing filesystems on mainframes. We had to use the procedural languages available on those filesystems, and we also had to embed them in the original early SQL standards. But as Dave McGovern said, "a committee never met a feature it did not like," so the full ANSI/ISO standard definition of triggers is monstrously, procedurally elaborate. It would probably make Dr. Codd sick. We never intended SQL to be a full procedural language; it was just supposed to be database only. But then we had to add SQL/PSM because a guy from DEC submitted a paper based on Ada. Andrew had never programmed in Ada, but I was with AIRMICS when the Army went over to that language. It's really a horrible language, which has since been discontinued as a DoD mandate. It is also the basis of the proprietary Oracle product (DB2 from IBM can now compile Oracle PL/SQL code).
My experience has been that people do too much in the database in the way of computations, display, control, and programming. But they don't do enough about data integrity. SQL is supposed to be database; its job is to maintain the integrity of the data inside its little fiefdom, and when an unknown but valid outsider asks for that data, it puts it in the standard package and "throws it over the wall" to the requester. After that point, we don't care about it; we've done our job.
Please post DDL and follow ANSI/ISO standards when asking for help.
September 28, 2017 at 1:15 pm
>> I use a trigger to automatically validate such data ... <<
>> I disagree with this approach. It assumes that your database will always be in one and only one particular product, so you can depend on its model of triggers. In the real world, even if your company is on one and only one database, in one and only one particular release of that database, your outside sources will be in other products (DB2, Oracle, PostgreSQL, MySQL, etc.). <<
Not really. Other relational DBMSs have triggers as well. And if you end up sending data to a store that doesn't have triggers, you could re-engineer it then. Companies won't switch products without a really big reason anyway. I've contracted to help businesses move from Oracle to SQL Server, since I know both, and, yes, triggers must be completely rewritten, but most are straightforward and can be done quickly.
Furthermore, triggers can provide defaults and do all sorts of other things that a simple CHECK constraint can't.
>> If you can do the validation of the data with CHECK() constraints or REFERENCES clauses, then it is portable. Perhaps more importantly, it is understood by SQL programmers, as opposed to weirdo dialect programmers. <<
But then any error prevents the entire transaction from occurring. That is one of the main points I tried to make about a trigger: non-critical errors can be immediately flagged, but the data still loads. This is often acceptable, as some data can be corrected later. One company I was at had huge international orders, 2,000+ items, and it just wasn't possible to reject an entire order over an error or two. And if your developers can't figure out how to maintain an already written trigger, you need better training for them.
>> The reason we kept triggers in the ANSI/ISO standards was necessity. The original SQL engines were built on top of existing filesystems on mainframes. We had to use the procedural languages available on those filesystems, and we also had to embed them in the original early SQL standards. But as Dave McGovern said, "a committee never met a feature it did not like," so the full ANSI/ISO standard definition of triggers is monstrously, procedurally elaborate. It would probably make Dr. Codd sick. We never intended SQL to be a full procedural language; it was just supposed to be database only. But then we had to add SQL/PSM because a guy from DEC submitted a paper based on Ada. Andrew had never programmed in Ada, but I was with AIRMICS when the Army went over to that language. It's really a horrible language, which has since been discontinued as a DoD mandate. It is also the basis of the proprietary Oracle product (DB2 from IBM can now compile Oracle PL/SQL code). <<
Good things often arise from accidents or unexpected twists and turns. Perhaps triggers join penicillin in that regard.
>> My experience has been that people do too much in the database in the way of computations, display, control, and programming. But they don't do enough about data integrity. SQL is supposed to be database; its job is to maintain the integrity of the data inside its little fiefdom, and when an unknown but valid outsider asks for that data, it puts it in the standard package and "throws it over the wall" to the requester. After that point, we don't care about it; we've done our job. <<
Unquestionably, yes, data integrity is almost never as well enforced as it should be. But we also have to be realistic. While we should strive to accomplish lofty goals, we shouldn't allow that to prevent us from doing the useful work we need to do now. I can't refuse to load data just because it's not perfect.
Similarly, given that no relational DBMS product even now meets all of Codd's requirements, I'm very glad that real-world practicality overrode theoretical thinking. Hail System R and all its descendants, despite their inherent horrors.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".