Create Table with appropriate Data Types based on flat file

  • I have been asked to help another team load some data into a new SQL database so they can analyze a large data set on short notice. They have provided me a delimited flat file with millions of rows. Is there a relatively easy way to analyze a flat file and then create a table with the appropriate data types for it, without spending a bunch of time manually analyzing each field in the file?

    I know I could just identify the date columns and the numeric columns and then load everything else into VARCHAR(MAX) columns. However, I was curious whether there is a better way to derive the appropriate data types to use when all you have to work with is a flat file from an unfamiliar data source.

    Thanks for the guidance.

    Andrew J

  • andrew.jones 69458 - Thursday, November 9, 2017 8:02 AM


    When the files are that large, you need to get this exercise right. Otherwise, you are likely to err on the side of caution and use data types which are larger than what you need (VARCHAR(200) instead of VARCHAR(50), BIGINT instead of INT, and so on), leading to decreased query performance and unnecessary use of disk space.

    Presumably the data in the files was extracted from another database? If so, can you get the datatypes from whoever performed the extract? Then everything should just work.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • andrew.jones 69458 - Thursday, November 9, 2017 8:02 AM


    If I had to do such a thing, I'd start pretty much the way you said, except I'd limit it to, say, the first 10K rows (which should load quite quickly). Once you have the column widths established, do the full import with error handling enabled to sequester rows that don't fit in your target table, so you can analyze those for a more precise indication of what the width of a column should actually be.
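    For example, a rough sketch with BULK INSERT (the path, delimiter, and table name below are assumptions to replace with your own):

    BULK INSERT dbo.Staging  -- wide staging table with generous column sizes
    FROM 'C:\Data\bigfile.txt'
    WITH (
        FIRSTROW        = 2,       -- skip the header row, if the file has one
        LASTROW         = 10000,   -- sampling pass: first 10K rows only
        FIELDTERMINATOR = '|',     -- assumed delimiter
        ROWTERMINATOR   = '\n',
        ERRORFILE       = 'C:\Data\bigfile.err',  -- rejected rows land here
        MAXERRORS       = 100000   -- keep loading past the bad rows
    );

    Drop LASTROW for the full load; the rows that end up in the error file point at the columns that need to be wider.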

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff and Phil, for your helpful advice. Unfortunately, I don't have access to the system this data originally came from, and it would take a while to find the right person who could tell me the data type of each column, so I am pretty much resigned to figuring it out from the data file itself. One thought I had: I could load a subset of the data into a table whose columns are all VARCHAR(MAX), run some kind of T-SQL query that gives me the maximum length of the data in each column, create a second table with appropriately sized data types, and then drop the first table. Are you aware of a T-SQL query that would provide the maximum length of data in each column, or, better yet, determine the optimal data type for each column within a table, that I could use to create the "real" table?

    Andrew J

  • andrew.jones 69458 - Thursday, November 9, 2017 9:47 AM


    SELECT Col1MaxLength = MAX(LEN(Col1)),
           Col2MaxLength = MAX(LEN(Col2))  -- and so on for each column
    FROM dbo.Staging;  -- your all-VARCHAR(MAX) staging table (name assumed)
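    Taking that a step further (a sketch only; dbo.Staging and Col1 are assumed names), you could also test how cleanly each column converts to candidate types using TRY_CONVERT (SQL Server 2012 and later):

    SELECT MaxLen    = MAX(LEN(Col1)),
           NonBigint = SUM(CASE WHEN Col1 IS NOT NULL
                                AND TRY_CONVERT(bigint, Col1) IS NULL
                                THEN 1 ELSE 0 END),
           NonDate   = SUM(CASE WHEN Col1 IS NOT NULL
                                AND TRY_CONVERT(date, Col1) IS NULL
                                THEN 1 ELSE 0 END)
    FROM dbo.Staging;

    If NonBigint comes back as 0, the column is safe as BIGINT (check the maximum value to see whether INT would do); if NonDate is 0, it is safe as DATE; otherwise size a VARCHAR from MaxLen. Rather than typing this out per column, you can generate the query text from INFORMATION_SCHEMA.COLUMNS.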

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Use the Data Import Wizard. Set it up to read from your flat file with the correct delimiter, then, on the Advanced tab, press the 'Suggest Types' button. This lets you tell it how many rows to consider when suggesting types; for a file that large, you can tell it 200K rows if you want. From there, you can review the columns individually on the Advanced tab to see what the wizard came up with, continue through to the destination setup and have it spit out the table DDL for you, or just let the wizard do the actual import.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • Thanks John. I will give that a try.

    Andrew J
