November 9, 2017 at 8:02 am
I have been asked to help another team load some data into a new SQL database so they can analyze a large data set on short notice. They have provided me with a delimited flat file containing millions of rows. Is there a relatively easy way to analyze a flat file and then create a table with the appropriate data types, without spending a bunch of time manually analyzing each field in the file?
I know I could just identify the date columns and the numeric columns and then load everything else into VARCHAR(MAX) columns. However, I was curious whether there is a better way to derive the appropriate data types when all you have to work with is a flat file from an unfamiliar data source.
Thanks for the guidance.
Andrew J
November 9, 2017 at 8:09 am
andrew.jones 69458 - Thursday, November 9, 2017 8:02 AM
When the files are that large, you need to get this exercise right. Otherwise you are likely to err on the side of caution and use data types that are larger than you need (VARCHAR(200) instead of VARCHAR(50), BIGINT instead of INT, etc.), leading to decreased query performance and unnecessary use of disk space.
Presumably the data in the files was extracted from another database? If so, can you get the data types from whoever performed the extract? Then everything should just work.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 9, 2017 at 8:14 am
andrew.jones 69458 - Thursday, November 9, 2017 8:02 AM
If I had to do such a thing, I'd start with pretty much the approach you described, except I'd limit it to, say, the first 10K rows (which should load quite quickly). Once you have the column widths established, do the full import with error handling enabled to sequester any rows that don't fit in your target table, so you can analyze those for a more precise indication of what the width of each column should actually be.
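That two-pass approach might look something like this with BULK INSERT; the table name, file paths, and delimiters here are placeholders you would adjust to your own file. The ERRORFILE option writes the rejected rows out for later inspection:

```sql
-- Pass 1: sample only the first ~10K data rows into a wide staging table
-- (assumes dbo.Staging already exists with VARCHAR(MAX) columns).
BULK INSERT dbo.Staging
FROM 'C:\data\bigfile.txt'
WITH (
    FIELDTERMINATOR = ',',   -- adjust to your file's delimiter
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row, if there is one
    LASTROW         = 10001  -- sample the first 10K data rows
);

-- Pass 2: full load into the properly typed table, sequestering bad rows
-- instead of failing the whole import.
BULK INSERT dbo.FinalTable
FROM 'C:\data\bigfile.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,
    ERRORFILE       = 'C:\data\bigfile_errors.txt',  -- rejected rows land here
    MAXERRORS       = 1000
);
```

You can then examine the error file to see which columns were sized too tightly.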
--Jeff Moden
Change is inevitable... Change for the better is not.
November 9, 2017 at 9:47 am
Thanks Jeff and Phil for your helpful advice. Unfortunately, I don't have access to the system this data originally came from, and it would take a while to find the right person who could tell me the data type of each column, so I am pretty much resigned to figuring it out from the data file itself. One thought I had is that I could load a subset of the data into a table where every column is VARCHAR(MAX), then run some kind of T-SQL query to give me the maximum length of the data in each column. From there, I could create a second table with appropriately sized data types and drop the first table. Are you aware of a T-SQL query that would provide the maximum length of data in each column, or, even better, determine the optimal data type for each column in a table that I could use to create the "real" table?
Andrew J
November 9, 2017 at 11:22 am
andrew.jones 69458 - Thursday, November 9, 2017 9:47 AM
SELECT Col1MaxLength = MAX(LEN(Col1)),
       Col2MaxLength = MAX(LEN(Col2))  -- etc etc, one expression per column
FROM dbo.YourStagingTable;
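With millions of rows and potentially dozens of columns, that query can be generated rather than typed by hand. Here is a sketch that builds the column list from INFORMATION_SCHEMA.COLUMNS; the staging table name dbo.Staging is an assumption, so substitute your own:

```sql
-- Generate and run a MAX(LEN(...)) query covering every column in the staging table.
DECLARE @sql nvarchar(max);

SELECT @sql = N'SELECT ' + STUFF((
        SELECT N', ' + QUOTENAME(COLUMN_NAME + N'_MaxLen')
             + N' = MAX(LEN(' + QUOTENAME(COLUMN_NAME) + N'))'
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = N'dbo'
          AND TABLE_NAME   = N'Staging'
        ORDER BY ORDINAL_POSITION
        FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'')
    + N' FROM dbo.Staging;';

EXEC sys.sp_executesql @sql;
```

One caveat: LEN() ignores trailing spaces, so if trailing spaces matter in your data, use DATALENGTH() instead.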
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 9, 2017 at 12:04 pm
Use the Import and Export Wizard. Set it up to read from your flat file with the correct delimiter, then, on the Advanced tab, press the 'Suggest Types' button. This lets you tell it how many rows to consider when suggesting types; for a file that large, you can tell it 200K rows if you want. From there, you can review the suggested types column by column on the Advanced tab, use the wizard to do the actual import, or continue through to the destination setup and have it spit out the table DDL for you.
November 9, 2017 at 12:12 pm
Thanks John. I will give that a try.
Andrew J