July 26, 2012 at 9:27 pm
Comments posted to this topic are about the item Summarizing Imported Data
July 27, 2012 at 2:01 pm
You've made a good start. I tend to query each column separately as I'm doing a conversion using templates, especially since value distributions are extremely helpful. Each data type has its quirks and custom aspects to look at. For example, for datetime data, I like to know if it is strictly dates or dates with times. And I like to know if the data fits in a smalldatetime. Knowing empty strings vs. nulls is important too.
Below is a template I use for imported datetime data interrogation
-- interrogate DateTime column
-- to see if it is a Time column, nullable, or an expected range
select
COUNT(*) as CountRows,
sum(case when <colname,,> is null then 1 else 0 end) as NullsCount,
sum(case when CONVERT(varchar(12), <colname,,>, 114) = '00:00:00:000' then 1 else 0 end) as DateNoTimeCount,
sum(case when CONVERT(varchar(12), <colname,,>, 114) > '00:00:00:000' then 1 else 0 end) as DateTimeCount,
sum(case when <colname,,> < '1/1/1900' then 1 else 0 end) as Before1900Count,
sum(case when <colname,,> between '1/1/1900' and '1/1/2000' then 1 else 0 end) as BetweenCount,
sum(case when <colname,,> >= '1/1/2000' then 1 else 0 end) as After2000Count
from
<tablename,,>
-- most frequent values
select top 100
<colname,,>,
COUNT(*) as frequencyCount
from
<tablename,,>
group by
<colname,,>
having
COUNT(*) > 1
order by
COUNT(*) desc
July 28, 2012 at 9:35 am
Very well written.
Regards,
Basit A. Farooq (MSC Computing, MCITP SQL Server 2005 & 2008, MCDBA SQL Server 2000)
http://basitaalishan.comAugust 7, 2012 at 8:28 am
Thanks, Bill. That's a great method for interrogating the datetime column. I'll bet you've amassed quite a collection of such in your work.
Do you primarily run these as needed and keep notes elsewhere on your conversion process, or run the interrogations once (each time a new import is done) and store results to tables? I used to always do the former, but lately have been finding benefit from keeping all in one place. Plus, it forces me to document my queries more rigorously, which is always a good thing. 🙂
Thanks again for the read, and for the comment!
August 7, 2012 at 12:20 pm
Storing your results in tables is a good idea; I just get once-and-done conversion tasks from new customers. The templates are sufficient for that. I really like using templates and have about 200 that I use regularly for database change requests from developers. It would be nice to find a shared library of templates somewhere.
Even better than templates though...
Custom procedures are big wasters of time for maintenance. Over 90% of procedures in my database projects are generated and maintained by 20 simple template procs which work together as a system. They use a template language for formatting and replaceable parameters. Just say "no" to codesmith, simplify and save a bundle.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply