May 10, 2018 at 8:45 am
Hi All. First time posting here and need some guidance.
I have a database that I am managing that was originally converted from an MS Access Data. When the database was migrated to MS SQL Server there was little or no rationalization of the data types used in the SQL tables. About 90+% of the columns are all set to nvarchar(255). The data source is a flat file that contains data from several sources. So going to a source system to identify the data types is not really an option. We have one table with 80 columns and over 3 million records. There are several others with more than 100K records and similar number of columns.
I want to perform a cleanup of this database and optimize the table structure and column sizing. Analyzing each table one column at a time would be major undertaking. I wanted to know if there T-SQL scripts or free tools I can use to analyze the existing data contained in an existing table and identify or suggest the best data types and sizes? I know that SSIS has the ability to analyze a data source and recommend a data types, but I am not aware of any way to do this on within an existing database. I was thinking of extracting a sub-set of data and run it through SSIS just to get an initial set of data type recommendations that I could then review and refine. However, this would be a fairly significant undertaking as well. I am thinking there must be a way to do this analysis using T-SQL directly, but am not a T-SQL Guru like many on this forum and website. Therefore, I came here looking for some guidance.
Any knowledgeable suggestions would be appreciated.
Andrew J
May 11, 2018 at 12:39 pm
For a first pass, you could try just ISDATE, IS NULL, ISNUMERIC and a few other key attributes:
CREATE TABLE dbo.column_profile ( ... );
INSERT INTO dbo.column_profile ( ... )
SELECT /* replace "column1" with the actual column name */
SUM(column1) AS column1_count,
MAX(LEN(column1)) AS column1_MAXLEN,
MIN(LEN(column1)) AS column1_MINLEN,
SUM(CASE WHEN column1 IS NULL THEN 1 ELSE 0 END) AS column1_NULL_count,
SUM(CASE WHEN ISDATE(column1) = 1 THEN 1 ELSE 0 END) AS column1_ISDATE_count,
SUM(CASE WHEN ISNUMERIC(column1) = 1 THEN 1 ELSE 0 END) AS column1_ISNUMERIC_Count
--,...column2...as above...
--,...column3...
FROM dbo.table_name
Those columns that were all NULL or ISDATE = 1 could be converted to a date.
Those columns that were all NULL or ISNUMERIC = 1 could be further analyzed -- TRY_CAST to int / bigint / decimal, for example -- or you just try converting them to int. Presumably most would work. The rest you could analyze further.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 11, 2018 at 1:23 pm
Thanks Scott. I will give that a try. I also have a query that for remaining text columns will find the longest record. That could help cut down on the size of the nvarchar(255) columns as well. Between the two I should be able to better optimize these tables.
Thanks again.
Andrew J
May 11, 2018 at 1:31 pm
Back in the day, I would do a pattern search to determine eligible data types:
SUM(CASE WHEN column NOT LIKE '%[^0-9]%' THEN 1 ELSE 0 END) AS column_has_only_digits,
etc.
but I think the TRY_CAST presumably would be more efficient.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply