November 22, 2002 at 10:35 am
RE: Import and transformations.
My database is fairly lightweight. I receive 2 monthly files @20MB and about 12 weekly files @350MB only. The monthly files are appended to existing tables, the weekly files overwrite the previous data except at the end of the year when a history is kept. All told it's gonna grow by under a GIG a year. There is no other data entry into this system. It's purely for reporting, data analysis and history.
That said, my instinct tells me that keeping the import/transformation atomic (all or none) should be fine as long as there is notification if (when) a failure occurs. I'm also proceeding on the assumption that the importing of data should be triggered from within SQL Server instead of from a separate app/script etc.
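Roughly what I have in mind is something like this (proc and table names are just placeholders); the Agent job that runs it would be set to notify on failure:

-- Sketch only: wrap the transformation in one transaction so it's all-or-none.
CREATE PROCEDURE usp_LoadWeeklyFile AS
BEGIN TRANSACTION

    INSERT INTO WeeklyData (Col1, Col2)
    SELECT Col1, Col2 FROM ImportStaging

    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION
        -- Raising an error fails the Agent job step; the job's notification does the rest.
        RAISERROR ('Weekly import failed; no rows were applied.', 16, 1)
        RETURN
    END

COMMIT TRANSACTION
GO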
please feel free to shoot me down or commend my intuition...
Greg.
November 22, 2002 at 10:39 am
The format info that I use is in the conversions. I store it in a table called ADMIN_Jobs which relates to another called Admin_JobsFields. I BCP the data into a single field, build the string to do the conversions from my tables, and execute that against the temp table.

I decided not to use format files, as a lot of my data has multiple types of row definitions within the same file. I have one file in particular that breaks out into 9 different row types and ultimately gets moved to 6 different tables. Format files would not work for situations like these. I also ran into issues with file changes when using format files; I kept having to go and modify them. With the definitions in tables, it's much easier to deal with on a field basis rather than modifying files, and it kept the whole process self-contained. We started out with simple files, and then the boss came down and said we need to allow for header records and trailer records with different formats. Just gotta LOVE scope-creep.
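Boiled way down, the shape of it is something like this (my real ADMIN_Jobs/Admin_JobsFields carry a lot more than this, and this ignores the multiple-rowtype handling, so treat the names and columns as illustration only):

-- Field definitions for a file live in a table instead of a format file.
CREATE TABLE Admin_JobsFields (
    JobID      int,
    FieldName  varchar(50),
    StartPos   int,
    FieldLen   int,
    FieldOrder int
)

-- The raw file is BCP'd into a one-column staging table (RawLine varchar(8000)),
-- then the conversion SELECT is built from the definitions and executed.
DECLARE @sql varchar(8000)
SELECT @sql = ''
SELECT @sql = @sql + ', SUBSTRING(RawLine, '
                   + CONVERT(varchar(10), StartPos) + ', '
                   + CONVERT(varchar(10), FieldLen) + ') AS ' + FieldName
FROM Admin_JobsFields
WHERE JobID = 1
ORDER BY FieldOrder

SELECT @sql = 'INSERT INTO WorkTable SELECT ' + SUBSTRING(@sql, 3, 8000)
            + ' FROM ImportStaging'
EXEC (@sql)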
November 22, 2002 at 1:09 pm
'one file in particular that breaks out into 9 different row types'...Now THAT sounds ugly!
Some questions/comments:
Was there a compelling reason to choose bcp over bulk insert? As I'm using them I get the feeling they're quite similar in action and performance.
I really like the idea of getting rid of Format files. Always seems messy having a db process rely on some little text file stored somewhere else...I'll have to make the switch.
And, re: Scope Creep...I suppose it's not a real database project until the scope creepeth!
November 22, 2002 at 10:47 pm
LMAO....Bulk Insert IS BCP functionality internal to SQL, same format files, same processes. I couldn't swear to it, but I believe Bulk Insert is a wrapper for BCP that allows it to work internal to SQL rather than needing its own thread externally. I could have just as easily gone with Bulk Insert, but in my mind, creating a thread external to the server to call the BCP functionality was more consistent for working with the import files. If you notice, in my import script, I scan for a file (external call), import the file (external call), rename and archive the file (external call) and quit. I couldn't see placing an internal call in the middle, as at one time I had decided to split it out of the database completely. The ability to keep everything in one scheduling area (SQL's) kept it internal and in one place. The conversion processes pick it up from there (again scheduled with conditions), and everything from that point is internal. I like a high degree of modularity, as changes are frequent and common, and the ability to modify any part of the functionality without it impacting downstream functions is quite necessary in my design.
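In skeleton form, the import step looks something like this (server, paths and file names changed, error handling mostly stripped out):

-- Every file operation is an external call so the pieces stay modular.
DECLARE @rc int

-- 1. Scan for the file (external call); dir returns non-zero when nothing is there.
EXEC @rc = master..xp_cmdshell 'dir C:\Imports\weekly*.txt', no_output
IF @rc <> 0 RETURN

-- 2. Import the file (external call to BCP).
EXEC master..xp_cmdshell
    'bcp MyDB..ImportStaging in C:\Imports\weekly_20021122.txt -c -T -S MYSERVER'

-- 3. Rename and archive the file (external call), then quit.
EXEC master..xp_cmdshell
    'move C:\Imports\weekly_20021122.txt C:\Imports\Archive\weekly_20021122.txt'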
If yours only deals with a limited number of files and this is controlled by you (i.e., they can't change it up on you), you may be better off without the control tables. My system had so many requirements that the data must go through, and so many files to deal with, that it made doing it this way much cheaper. One import process to deal with 58 files of varying nature. I wouldn't want to have to manage 58 separate jobs and the hundred-plus conversions and breakouts of different rowtypes that had to happen. Or 58 DTS packages, with the multiple variations to handle rowtypes, etc... But a single job for importing, hey, no problem... File changes? Correct the proper field definitions in the table and off you go, no fuss, no muss, and I can have it all done by scripts on a schedule. If I know ahead of time of a change, I create the changes in script, schedule it to happen that day, and forget it.
If you only have a couple files, with single rowtypes, you would probably be better off with the bulk insert method, and can go with a much less complicated design.
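For that simpler case the whole import can be about this much (table, path and terminators are obviously yours to swap in):

-- Single rowtype, clean delimited file: one BULK INSERT straight into a typed staging table.
BULK INSERT MyDB..WeeklyStaging
FROM 'C:\Imports\weekly.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- skip a header row if there is one
    TABLOCK
)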
November 25, 2002 at 1:33 am
Hello
If you don't mind sending me the code, I'd like to take a look at your import processing. I have tried different approaches. At the moment I'm using OPENQUERY (through ODBC) and rely on schema.ini. It requires a lot of tweaking to make it work, and the error messages usually don't tell you much. I also tried DTS, but there were other portability problems with the data driven query (the bound table stores the database name and there is no way to alter it dynamically).
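For context, I use OPENQUERY against a linked server set up with the text ODBC driver; the ad hoc OPENROWSET form of the same thing looks roughly like this (file name and columns are placeholders):

-- schema.ini sits in the same folder as the CSV:
--   [import.csv]
--   Format=CSVDelimited
--   ColNameHeader=True
--   Col1=CustomerID Long
--   Col2=PostalCode Text Width 10

SELECT *
FROM OPENROWSET('MSDASQL',
     'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\Imports;',
     'SELECT CustomerID, PostalCode FROM import.csv')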
Best Regards, Jani
November 25, 2002 at 3:18 am
Scorpion_66,
I'd like a copy of your code too if you don't mind. We're migrating 1.2TB from DB2 to SQL Server. We have 48 hours to do the initial dataload and we're using bcp. Any ideas on automation would be very useful.
Thanks
Andy
December 3, 2002 at 8:30 pm
This is a common task that many people face and it's too bad there's not better support for it from Mi¢ro$oft. What we typically do is this: create one "receiving" table that has a text column for each field in the input record. You can import anything into a text column. Then we "edit" each record and mark it "invalid" if it's not the right data type, is missing, etc. Generally, we make heavy use of user-defined functions to validate each column. Rows that pass all inspection are "promoted" to an import table where each column has the appropriate data type (INT, decimal, varchar, etc.). From there, the validated records are applied to the database. It's a fairly labor-intensive process to set up, but it runs well once it is.
We use BULK INSERT over BCP. The only caveat is you must be sa to use it.
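A stripped-down sketch of the pattern (table, column and check details here are invented for illustration; the real edits use our UDFs):

-- 1. Receiving table: everything is text, plus a flag for bad rows.
CREATE TABLE ReceivingCustomer (
    CustID     varchar(50),
    OrderDate  varchar(50),
    Amount     varchar(50),
    IsInvalid  bit DEFAULT 0
)

-- 2. "Edit" pass: mark rows that fail validation.
UPDATE ReceivingCustomer
SET IsInvalid = 1
WHERE ISDATE(OrderDate) = 0
   OR ISNUMERIC(Amount) = 0
   OR CustID IS NULL

-- 3. Promote the clean rows into the typed import table.
INSERT INTO ImportCustomer (CustID, OrderDate, Amount)
SELECT CustID, CONVERT(datetime, OrderDate), CONVERT(decimal(10,2), Amount)
FROM ReceivingCustomer
WHERE IsInvalid = 0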
December 3, 2002 at 10:11 pm
Thanks don1941,
I did end up using Bulk Insert for my data needs. I have delimited files which are quite clean. I can import them directly into a temp table with all of the field types defined. I also have fixed width files which I decided to import into single column temp tables, then use SUBSTRING to chop them up into their correct pieces. I still have a SQL 7 database so I don't think I can use user functions. (They're 2000 right?)
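For the fixed-width files, the chop-up step is basically this (widths and column names here are made up):

-- The whole record lands in one wide column (works as long as the data contains no tabs),
-- then SUBSTRING splits it into typed columns.
CREATE TABLE FixedWidthStaging (RawLine varchar(8000))

BULK INSERT FixedWidthStaging
FROM '\\FileServer\Imports\monthly.dat'
WITH (ROWTERMINATOR = '\n')

INSERT INTO MonthlyData (AccountNo, TranDate, Amount)
SELECT SUBSTRING(RawLine, 1, 10),
       CONVERT(datetime, SUBSTRING(RawLine, 11, 8)),       -- assumes yyyymmdd
       CONVERT(decimal(12,2), SUBSTRING(RawLine, 19, 12))
FROM FixedWidthStaging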
The data load isn't so quick now that the db is up to 8GB...I am dropping all indexes except the clustered one during the import, using code I found here on SQLServerCentral. As this is historical data I think I can have my final insert...select do an order by according to the clustered index. Maybe that will speed things up.
I did figure out how to send a UNC or directory path as an argument to my SP and have it get all files that it hasn't yet imported. Fun!
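The file pickup works roughly like this (simplified; the real proc takes the path as a parameter, and ImportedFilesLog is just a stand-in name for whatever table tracks what's already been loaded):

-- Get a bare directory listing into a table, then skip anything already imported.
CREATE TABLE #DirList (FileName varchar(255))

INSERT INTO #DirList
EXEC master..xp_cmdshell 'dir /b \\FileServer\Imports\*.csv'

-- xp_cmdshell returns blank lines as NULLs, so filter those out too.
SELECT FileName
FROM #DirList
WHERE FileName IS NOT NULL
  AND FileName NOT IN (SELECT FileName FROM ImportedFilesLog)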
August 22, 2003 at 11:07 am
Hello,
Scorpion_66, or anyone else who has them, can I take a look at your scripts? I need to:
1. Periodically import data from a CSV file into a temp table.
2. Clean the data (which mainly would involve identifying values not present in a lookup table and cleaning postal code info; see the sketch below).
3. Report any errors found so these can be fixed.
4. Append the clean data to the destination table.
What has been discussed sounds like something that I can use as a guideline.
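For step 2, the lookup check I have in mind is along these lines (table and column names are invented):

-- #ImportTemp carries an ErrorFlag column added for this pass.
UPDATE #ImportTemp
SET ErrorFlag = 1
WHERE PostalCode NOT IN (SELECT PostalCode FROM PostalCodeLookup)

-- Step 3: report the problem rows...
SELECT * FROM #ImportTemp WHERE ErrorFlag = 1

-- Step 4: ...and append the clean ones.
INSERT INTO DestinationTable (CustName, PostalCode)
SELECT CustName, PostalCode
FROM #ImportTemp
WHERE ErrorFlag = 0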
August 22, 2003 at 1:36 pm
Hi everyone,
Me too, I would like to view the scripts. I have a similar situation where I am connecting to a secure FTP, pulling down a CSV file, and I need to import the data into tables in my SQL db. Problem is, the files are inconsistent. Here is an example of the format files I've tried for BCP.
Not pretty:

1  SQLCHAR  0  1   ""     0  ""            ""
2  SQLCHAR  0  26  "\","  1  ID_CALL_EVNT  SQL_Latin1_General_CP1_CI_AS
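(For completeness, the columns in those rows are: host field number, host data type, prefix length, host data length, terminator, server column order, server column name and, in the 8.0 layout, collation; field 1 here is skipped by mapping it to server column 0. The full file also carries a version line and a field count on top, so the whole thing looks roughly like this:)

8.0
2
1  SQLCHAR  0  1   ""     0  ""            ""
2  SQLCHAR  0  26  "\","  1  ID_CALL_EVNT  SQL_Latin1_General_CP1_CI_AS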
I also have the issue I conversed about in another thread: a binary file that I need to read off an Oracle server, which will tell me (essentially) which records need to be loaded. I am unsure, however, how to get SQL to read the file. Any help on that would be greatly appreciated. In the meantime I would love to take a look at the scripts.
Thanks
Aurora
October 6, 2003 at 12:32 pm
Can anyone provide me with the scripts mentioned in this topic? Thanks.
October 14, 2003 at 1:44 am
Hi Experts,
It would be a great help if I could also get the scripts mentioned in this topic. Following is my requirement:
Loading a CSV file into SQL Server 2000 SP3a.
I want to run a DTS package through SQL Server Agent with the CSV file name as input. Multiple input files of the same structure are coming from different locations. I am planning to pass the file name into the same DTS package to load one particular location's file. It would be great if anybody can suggest alternative ways of doing the same.
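The direction I am thinking of is a CmdExec job step (or xp_cmdshell call) that runs DTSRun and passes the file name in as a global variable, something like this (package, variable and path names are just examples):

-- One DTS package; the file name comes in as a string (type 8) global variable at run time.
EXEC master..xp_cmdshell
    'DTSRun /S MYSERVER /E /N "LoadLocationCSV" /A "FileName":"8"="\\FileServer\Imports\location1.csv"'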
Thanks,
Sheilesh