July 2, 2012 at 2:23 am
Hi All
I have a CSV file which needs importing into SQL 2008, a row has 28 columns, text has quotes around and sometimes commas within the text, when importing the file sql imports the data up to column 8 where column 8 has the rest of the data in it ( doesn't seem to split on the columns.
This is the same problem I had it sql 2005, the only way I got around the problem was to script code that goes through row by row (RBAR) looking for a pattern matching and substituting the comma's within text for a |, when working with a file that over a million rows this is very time consuming, (sometimes SSIS crashes when the file size is large).
I have tried all the usual routes but to no avail
Excel 2008 has no problem importing the CSV file.
Any solutions would be much appreciated.:-D
July 2, 2012 at 6:18 am
Check this link: http://msdn.microsoft.com/en-us/library/ms174393.aspx for info about QUOTED_IDENTIFIER - I've not tried what you're describing before but I think that by setting it ON you can avoid the quotes problem. Not sure how you'd get around the comma issue though, maybe change your delimiter to | or TAB or something else?
---
Note to developers:Want to get the best help? Click here https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help (Jeff Moden)
My blog: http://uksqldba.blogspot.com
Visit http://www.DerekColley.co.uk to find out more about me.
July 3, 2012 at 3:58 am
If you're using SSIS to import the data via a flat file connection manager, then there is a setting in it's properties that may help. On the "general" section - under Format - change the "text qaulifier" field to a quote (e.g. change it to " instead of <none>).
This will force SSIS to ignore commas in each quote delimited column, or rather treat them as part of the text.
July 3, 2012 at 4:07 am
Right click the connection manager and select Edit and play with the settings. You can add more columns using the Advanced tab.
Tip: you don't have to stick to the delimiters in the dropdown, you can paste your own over it. We now use ยง as users kept mistyping | when they hit shift!
July 3, 2012 at 4:40 am
one thing i forgot to mentation is that rows in the data don't have the same number of delimiters(,), one row might have 20 delimiters and another row might have 4 delimiters, i don't this is helping the situation. Excel doesn't have problem dealing with the data.:-)
July 3, 2012 at 6:01 am
clucasi (7/3/2012)
one thing i forgot to mentation is that rows in the data don't have the same number of delimiters(,), one row might have 20 delimiters and another row might have 4 delimiters, i don't this is helping the situation. Excel doesn't have problem dealing with the data.:-)
When the rows are short, are they in the same positions as they are when the rows are long?
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2012 at 2:13 am
I have the tables below, I have not included scripts to insert the data as that is what the problem is, the data is provided in one text file and I have to use a SSIS conditional split to import it into the correct tables( prefix 11 imports into STREET_RECORD_TYPE11
prefix 15 imports into STREET_DESCRIPTOR_RECORD-TYPE15
prefix 21 imports into BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21
prefix 24 imports into LAND_AND_PROPERTY_IDENTIFIER_RECORD24 )
Hope tis is enough information:-)
11,"I",5,33449328,2,4230,2,2009-11-06,1,4,0,2009-11-06,,2009-11-06,2009-11-06,379982.00,401724.00,379994.00,401728.00,1
15,"I",74784,33440849,"CLEAVLEY STREET PSSGE GBLE 9/10 WORSLEY ROAD PSSGE R/O 173-173D","","ECCLES","SALFORD","ENG"
24,"I",285165,10070915241,"4205L000144961","ENG",1,2008-04-28,,2008-11-11,2008-04-28,,"",,"","",6,"",,"","",3400512,"1","","GROUND, FIRST AND SECOND FLOOR",""
21,"I",285151,10070745433,1,2,2009-01-26,10004690408,382013,399704,1,4230,2009-01-26,,2009-01-26,2009-01-26,"C","M6 6BY",0
/****** Object: Table [dbo].[LAND_AND_PROPERTY_IDENTIFIER_RECORD24] Script Date: 07/04/2012 08:43:54 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[LAND_AND_PROPERTY_IDENTIFIER_RECORD24](
[RECORD_IDENTIFIER] [int] NULL,
[CHANGE_TYPE] [nvarchar](1) NULL,
[PRO_ORDER] [bigint] NULL,
[UPRN] [bigint] NULL,
[LPI_KEY] [nvarchar](14) NOT NULL,
[LANGUAGE] [nvarchar](3) NULL,
[LOGICAL_STATUS] [smallint] NULL,
[START_DATE] [smalldatetime] NULL,
[END_DATE] [smalldatetime] NULL,
[ENTRY_DATE] [smalldatetime] NULL,
[LAST_UPDATE_DATE] [smalldatetime] NULL,
[SAO_START_NUMBER] [smallint] NULL,
[SAO_START_SUFFIX] [nvarchar](2) NULL,
[SAO_END_NUMBER] [smallint] NULL,
[SAO_END_SUFFIX] [nvarchar](2) NULL,
[SAO_TEXT] [nvarchar](90) NULL,
[PAO_START_NUMBER] [smallint] NULL,
[PAO_START_SUFFIX] [nvarchar](2) NULL,
[PAO_END_NUMBER] [smallint] NULL,
[PAO_END_SUFFIX] [nvarchar](2) NULL,
[PAO_TEXT] [nvarchar](90) NULL,
[USRN] [int] NULL,
[LEVEL] [nvarchar](30) NULL,
[POSTALLY_ADDRESSABLE] [nvarchar](1) NULL,
[POSTCODE] [nvarchar](8) NULL,
[POST_TOWN] [nvarchar](30) NULL,
[OFFICIAL_FLAG] [nvarchar](1) NULL,
[CUSTODIAN_ONE] [smallint] NULL,
[CUSTODIAN_TWO] [smallint] NULL,
[CAN_KEY] [nvarchar](14) NULL,
CONSTRAINT [PK_LAND_AND_PROPERTY_IDENTIFIER_RECORD24] PRIMARY KEY CLUSTERED
(
[LPI_KEY] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] Script Date: 07/04/2012 08:44:35 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21](
[RECORD_IDENTIFIER] [int] NULL,
[CHANGE_TYPE] [nvarchar](1) NULL,
[PRO_ORDER] [bigint] NULL,
[UPRN] [bigint] NOT NULL,
[LOGICAL_STATUS] [smallint] NULL,
[BLPU_STATE] [smallint] NULL,
[BLPU_STATE_DATE] [smalldatetime] NULL,
[BLPU_CLASS] [nvarchar](4) NULL,
[PARENT_UPRN] [bigint] NULL,
[X_COORDINATE] [decimal](18, 2) NULL,
[Y_COORDINATE] [decimal](18, 2) NULL,
[RPA] [smallint] NULL,
[LOCAL_CUSDODIAN_CODE] [smallint] NULL,
[START_DATE] [smalldatetime] NULL,
[END_DATE] [smalldatetime] NULL,
[LAST_UPDATE_DATE] [smalldatetime] NULL,
[ENTRY_DATE] [smalldatetime] NULL,
[ORGANISATION] [nvarchar](100) NULL,
[WARD_CODE] [nvarchar](10) NULL,
[PARISH_CODE] [nvarchar](10) NULL,
[CUSTODIAN_ONE] [smallint] NULL,
[CUSTODIAN_TWO] [smallint] NULL,
[CAN_KEY] [nvarchar](14) NULL,
[new] [geometry] NULL,
CONSTRAINT [PK_BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] PRIMARY KEY CLUSTERED
(
[UPRN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21] ADD CONSTRAINT [DF_BASIC_LAND_AND_PROPERTY_UNIT_RECORD_TYPE21_test_RECORD_IDENTIFIER] DEFAULT ((0)) FOR [RECORD_IDENTIFIER]
GO
/****** Object: Table [dbo].[STREET_RECORD_TYPE11] Script Date: 07/04/2012 08:45:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[STREET_RECORD_TYPE11](
[RECORD_IDENTIFIER] [int] NULL,
[CHANGE_TYPE] [nvarchar](1) NULL,
[PRO_ORDER] [bigint] NULL,
[USRN] [int] NOT NULL,
[RECORD_TYPE] [smallint] NULL,
[SWA_ORG_REF_NAMING] [smallint] NULL,
[STATE] [smallint] NULL,
[STATE_DATE] [smalldatetime] NULL,
[STREET_SURFACE] [smallint] NULL,
[STREET_CLASSISFICATION] [smallint] NULL,
[VERSION] [smallint] NULL,
[RECORD_ENTRY_DATE] [smalldatetime] NULL,
[LAST_UPDATE_DATE] [smalldatetime] NULL,
[STREET_START_DATE] [smalldatetime] NULL,
[STREET_END_DATE] [smalldatetime] NULL,
[STREET_START_X] [decimal](18, 2) NULL,
[STREET_START_Y] [decimal](18, 2) NULL,
[STREET_END_X] [decimal](18, 2) NULL,
[STREET_END_Y] [decimal](18, 2) NULL,
[STREET_TOLERANCE] [smallint] NULL,
CONSTRAINT [PK_STREET_RECORD-TYPE11] PRIMARY KEY CLUSTERED
(
[USRN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[STREET_DESCRIPTOR_RECORD_TYPE15] Script Date: 07/04/2012 08:45:39 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[STREET_DESCRIPTOR_RECORD_TYPE15](
[RECORD_IDENTIFIER] [smallint] NULL,
[CHANGE_TYPE] [nvarchar](1) NULL,
[PRO_ORDER] [bigint] NULL,
[USRN] [int] NOT NULL,
[STREET_DESCRIPTOR] [nvarchar](100) NULL,
[LOCALITY_NAME] [nvarchar](35) NULL,
[TOWN_NAME] [nvarchar](30) NULL,
[ADMINISTRATIVE_AREA] [nvarchar](30) NULL,
[LANGUAGE] [nvarchar](3) NULL,
CONSTRAINT [PK_STREET_DESCRIPTOR_RECORD-TYPE15] PRIMARY KEY CLUSTERED
(
[USRN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
July 4, 2012 at 4:18 am
Divide and conquer by extracting each set of records to its own file. Then import each file into the relevant table.
You can use a utility to extract each set of data out from the combined file. Shouldn't be too hard since your table type identifier is at the start of each line - e.g. from command line:
findstr /B "11" Input_Data.csv > Input_Data_11.csv
findstr /B "15" Input_Data.csv > Input_Data_15.csv
...
Would that work for you or do you have to use SSIS to split the records for some reason?
(Note: findstr does not support unicode, so if your text file is unicode then you'll need to use another util that does support it.)
July 4, 2012 at 4:42 am
thanks for your reply the only problem with that is in column3 is the process order, they have to be done in that order.
July 5, 2012 at 5:11 am
Hi clucasi
I do not know if you resolved your issue; your scenario can be handled by a flat file data source and one script component; It is written with BIDS 2008 using C#.
Your flat files are multi ragged records; your sample has four different record's types and each one with a different columns count.
This is the solution explanation:
DATA FLOW
It also shows the Row Count component with a data viewer just to test-validate the output.
IMPLEMENTATION
1. Select the flat file format as Ragged right as shown in the figure below
2. Now click at the Columns node and select the Record Length; I used 300 characters, so you should scroll the screen to the right to select that length.
3. As a verification click on the advance node and look at the Column 0 length; it should show 300 bytes.
NOTE It is up to you to select the record length, I am using 300 bytes because I am not familiar with your data ๐
4. Click on its Input Columns node and select the Column 0 assigning it the alias Column_SrcRec
NOTE The alias name Column_SrcRec is used later in the script (attached), so be sure you type the name as it appears in the image below:
5. Step 5
Now, click on the Script node, then on the Edit Script button; download the attached code ScriptCode.txt copy everything in this file and replace everything in the script code.
You can add a Row Count component with a data viewer, so you can monitor the results, anyway the next image shows it.
FINAL NOTES
I did not use your columns names, you can fine-tune this solution by using more meaningful columns names, if you do, be aware you will have to change the script as well.
You will notice by the last picture, that the columns appeared backwards, I mean, Column 1 with the first column of information in the flat file is the very last on the pipe, it should not be a problem.
Let us know if this solution works for you.
Cheers,
Hope this helps,
Rock from VbCity
July 5, 2012 at 9:43 am
Thanks for your solution, iam am getting the following error
:w00t:
July 5, 2012 at 10:11 am
It seems there is not script code in your script component double click on it and:
1. make sure you selected C# code, if you did not, or if you are using VB.Net, then the code attached to my previous reply will not work.
2. Click on the Edit Script button, then download the ScriptCode.txt file attached to my previous reply, copy its content, and replace everything inside the script code with the code from the ScriptCode.txt, in other words, copy and paste.
3. Click on the OK button, that should be it.
Would it be possible to confirm if you are using BIDS 2008, or BIDS 2008 R2?
Cheers,
Hope this helps,
Rock from VbCity
July 5, 2012 at 10:22 am
Cheers that seem to solve that problem,
now for the next error:-)
July 6, 2012 at 1:04 am
Its worked:-):hehe:
I found the error, the text file had a blank row at the end of the text file.
thanks
July 6, 2012 at 4:14 am
Just a quick question will the solution you supplied sort the problem of comma's inside text eg '21,2000,"5 Bomley,way",xxxxx
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply