August 28, 2018 at 6:11 pm
Can someone help me with loading the attached csv file database.
The issue here is the CSV file has data with Double quotes and comma seperated values but there are some instances where one of the column has Doule quotes in the data also.(Show with Bold)
---FYI.. This is a single record in my CSV File....
"33043000030000","24","MAGNOLIA PETROLEUM CO.","NORTH DAKOTA STATE A 1","NORTH DAKOTA STATE A","1","MAGNOLIA PETROLEUM CO.","NORTH DAKOTA STATE "A" #1","10/21/1950","5609","KIDDER","141 N","73 W","36","SENE","1980 FNL 810 FEL","WILDCAT","","","","VERTICAL","46.990158000000001","-99.864013","OG","DRY","","12/24/1950"
I read online and people are suggesting me to write VB code.
Can somone help me on this.
Thank you.
August 28, 2018 at 7:28 pm
kashyap4007 - Tuesday, August 28, 2018 6:11 PMCan someone help me with loading the attached csv file database.
The issue here is the CSV file has data with Double quotes and comma seperated values but there are some instances where one of the column has Doule quotes in the data also.(Show with Bold)---FYI.. This is a single record in my CSV File....
"33043000030000","24","MAGNOLIA PETROLEUM CO.","NORTH DAKOTA STATE A 1","NORTH DAKOTA STATE A","1","MAGNOLIA PETROLEUM CO.","NORTH DAKOTA STATE "A" #1","10/21/1950","5609","KIDDER","141 N","73 W","36","SENE","1980 FNL 810 FEL","WILDCAT","","","","VERTICAL","46.990158000000001","-99.864013","OG","DRY","","12/24/1950"I read online and people are suggesting me to write VB code.
Can somone help me on this.
Thank you.
Whoever provided the data didn't follow the rules for CSV and double up on the quotes that were embedded in a field. And people suggesting the VB route apparently don't actually have a clue either because VB will split the line as if it had 2 extra fields, as well.
Are all the lines screwed up like this? If they're consistent, then no need for VB or anything else crazy. But ALL the lines need to have the same very consistent mistake and then we can salvage the data pretty easily.
So is the mistake consistent on every line or not?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2018 at 7:52 pm
Thank you Guru for the quick reply.
I am attaching the sample data here.
To answer your question, this is not consistent.
August 28, 2018 at 8:22 pm
You've still not attached any data but... I'm not sure that you should. Is there any private or sensitive information in the file? And "guru" is just the level of my participation, not my name or handle. My name is Jeff Moden... Look above people's avatars for their names.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2018 at 8:37 pm
Jeff,
This is public data which i got from website so I think i can share.
Below is 2 more records.
3.31E+13 | 41 | AMERADA HESS CORPORATION | KERMIT HALVERSON 1 | KERMIT HALVERSON | 1 | AMERADA PETROLEUM CORP. | KERMIT HALVERSON #1 | 11/30/1951 | 9705 | MCKENZIE | 152 N | 96 W | 26 | NESE | 1830 FSL 660 FEL | CLEAR CREEK | VERTICAL | 47.953663 | -102.921996 | OG | DRY | 4/22/1952 | ||||
3.31E+13 | 42 | AMERADA HESS CORPORATION | BEAVER LODGE-MADISON UNIT V-26 | BEAVER LODGE-MADISON UNIT | V-26 | AMERADA PETROLEUM CORP. | N. DAKOTA STATE A" #1" | 12/11/1951 | 8595 | WILLIAMS | 156 N | 95 W | 16 | NESE | 1980 FSL 660 FEL | BEAVER LODGE | MADISON | 911269|459128|0 | 02/11/1952|521|18|641 | VERTICAL | 48.332938 | -102.895553 | OG | PA | 8/3/1984 |
August 28, 2018 at 8:44 pm
The file has been built incorrectly.
No parsing tool can fix it.
Must be corrected at the source.
"NORTH DAKOTA STATE A"",""1",
"NORTH DAKOTA STATE ""A"" #1",
and so on.
_____________
Code for TallyGenerator
August 29, 2018 at 6:43 am
kashyap4007 - Tuesday, August 28, 2018 8:37 PMJeff,
This is public data which i got from website so I think i can share.
Below is 2 more records.
3.31E+13 41 AMERADA HESS CORPORATION KERMIT HALVERSON 1 KERMIT HALVERSON 1 AMERADA PETROLEUM CORP. KERMIT HALVERSON #1 11/30/1951 9705 MCKENZIE 152 N 96 W 26 NESE 1830 FSL 660 FEL CLEAR CREEK VERTICAL 47.953663 -102.921996 OG DRY 4/22/1952 3.31E+13 42 AMERADA HESS CORPORATION BEAVER LODGE-MADISON UNIT V-26 BEAVER LODGE-MADISON UNIT V-26 AMERADA PETROLEUM CORP. N. DAKOTA STATE A" #1" 12/11/1951 8595 WILLIAMS 156 N 95 W 16 NESE 1980 FSL 660 FEL BEAVER LODGE MADISON 911269|459128|0 02/11/1952|521|18|641 VERTICAL 48.332938 -102.895553 OG PA 8/3/1984
You seem to have left the delimiters out.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 29, 2018 at 7:00 am
kashyap4007 - Tuesday, August 28, 2018 8:37 PMJeff,
This is public data which i got from website so I think i can share.
Below is 2 more records.
3.31E+13 41 AMERADA HESS CORPORATION KERMIT HALVERSON 1 KERMIT HALVERSON 1 AMERADA PETROLEUM CORP. KERMIT HALVERSON #1 11/30/1951 9705 MCKENZIE 152 N 96 W 26 NESE 1830 FSL 660 FEL CLEAR CREEK VERTICAL 47.953663 -102.921996 OG DRY 4/22/1952 3.31E+13 42 AMERADA HESS CORPORATION BEAVER LODGE-MADISON UNIT V-26 BEAVER LODGE-MADISON UNIT V-26 AMERADA PETROLEUM CORP. N. DAKOTA STATE A" #1" 12/11/1951 8595 WILLIAMS 156 N 95 W 16 NESE 1980 FSL 660 FEL BEAVER LODGE MADISON 911269|459128|0 02/11/1952|521|18|641 VERTICAL 48.332938 -102.895553 OG PA 8/3/1984
Whoever was saving the data in the CSV file made a rookie mistake of doing + '"' + strValue + '"'
instead of + QUOTENAME(strValue, '"')
or, if strValue can be possibly longer than 128 characters,+ '"' + REPLACE(strValue, '"', '""') + '"'
Put these 2 rows into Excel and save the file as CSV.
You'll see how it should be done correctly.
_____________
Code for TallyGenerator
August 29, 2018 at 7:17 am
Sergiy - Tuesday, August 28, 2018 8:44 PMThe file has been built incorrectly.
No parsing tool can fix it.
Must be corrected at the source."NORTH DAKOTA STATE A"",""1",
"NORTH DAKOTA STATE ""A"" #1",
and so on.
If the data is inconsistent in it's mistake, then I absolutely agree. This is a rookie mistake on the part of whomever assembles the data and it's easily fixed at the source.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 29, 2018 at 11:45 am
Is this a one off, or will this be regularly scheduled load?
Try specifying just the comma delimiter in SSIS, load into a staging table and write a stored procedure to clean it up and load to the target table.
August 29, 2018 at 11:53 am
Joe Torre - Wednesday, August 29, 2018 11:45 AMIs this a one off, or will this be regularly scheduled load?
Try specifying just the comma delimiter in SSIS, load into a staging table and write a stored procedure to clean it up and load to the target table.
Better yet, do a BULK INSERT and sequester the bad rows as they occur and fix them separately. There are "switches" in BULK INSERT that will allow that. I imagine that SSIS can also do similar. That would make it so only the "problem" rows would need any form of rework.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 29, 2018 at 12:58 pm
Thank you all for the help.
This is how I was able to resolve the issue.
I did set the TextQualified to False on the columns where the issue is happening and used a Derived Column Transformation to remove the surrounding """" data.
August 29, 2018 at 1:43 pm
kashyap4007 - Wednesday, August 29, 2018 12:58 PMThank you all for the help.
This is how I was able to resolve the issue.
I did set the TextQualified to False on the columns where the issue is happening and used a Derived Column Transformation to remove the surrounding """" data.
The original bad row that you posted had a comma embedded in a "field", did it not? If so, this doesn't fix that.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 29, 2018 at 1:59 pm
There is comma also in few fiends Jeff but it did not cause me any issues.
Below are some wierd data in some of the columns:
NORTH DAKOTA STATE A" #1"
DEVONIAN,MADISON,SILURIAN
324488|33755|0,180041|70622|0,81960|76|0
August 29, 2018 at 2:15 pm
kashyap4007 - Wednesday, August 29, 2018 1:59 PMThere is comma also in few fiends Jeff but it did not cause me any issues.
Below are some wierd data in some of the columns:
NORTH DAKOTA STATE A" #1"
DEVONIAN,MADISON,SILURIAN
324488|33755|0,180041|70622|0,81960|76|0
I guess I don't understand how those commas didn't cause any issues. I thought the data was comma delimited. Considering your recent posts, is the data actually tab delimited?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply