July 30, 2020 at 2:09 am
That particular problem is because there's a header row that you're not skipping. Add the following to the options in the BULK INSERT command...
FIRSTROW = 2,
I tried the same thing and a different error showed up complaining that the data in column 330 (the certification date column) was too long. I'm still trying to figure that out, but give the FIRSTROW = 2 thing a try.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 4:39 am
Ok... I got this working BUT...
The bastards lied on their record layout. The "HEALTHCARE_PROV_TAXONOMY_GROUP_xx" columns aren't 10 characters wide in the data... they're up to 43 characters wide. I changed those 15 columns in the table to VARCHAR(50) and it's working, but the data has another surprise there. It looks like they blended both the code and the description for the HEALTHCARE_PROV_TAXONOMY_GROUP's together. You folks might want to change things to separate the code from the description in your destination table (there's a sketch of one way just after the samples below).
Here's what those entries look like... the code on the left is 10 characters long and, after the space that follows it, the description looks to be up to 40 characters (to be on the safe side).
193200000X MULTI-SPECIALTY GROUP
193400000X MULTIPLE SINGLE SPECIALTY GROUP
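If you folks do split them, here's a minimal sketch of one way to do it. I'm assuming the column is named HEALTHCARE_PROV_TAXONOMY_GROUP_1 (per the "xx" naming above) and that the code is always the piece before the first space, like in the two samples... verify that assumption against the real data before you trust it.
--Minimal sketch... splits the taxonomy group code from the description on the
--first space. Table and column names are assumptions based on this thread.
SELECT  TaxonomyGroupCode = LEFT(tg.GroupValue, CHARINDEX(' ', tg.GroupValue + ' ') - 1)
       ,TaxonomyGroupDesc = LTRIM(STUFF(tg.GroupValue, 1, CHARINDEX(' ', tg.GroupValue + ' '), ''))
  FROM dbo.SBHN_NPI_DATA npi
 CROSS APPLY (VALUES (npi.HEALTHCARE_PROV_TAXONOMY_GROUP_1)) tg (GroupValue)
 WHERE tg.GroupValue > ''
;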
There's also an anomaly in the data somewhere (probably right at the start) that causes an error during the load. Very fortunately, we can fix it, but it takes a bit of extra time because we have to use PowerShell to make a repaired copy of the file.
Once fixed, it takes 00:03:34 to load all 6,448,360 rows on my laptop. The PowerShell fix for the file copy takes about 00:03:02 to make the file copy on my laptop.
Ironically, I found that fix by total luck. The file is way too big to look at in Word, Excel, Notepad, or even NotePad++, so I went searching for something that would copy just the first thousand rows and did all my import testing with that. When I switched back to the bigger file, it kept giving me that error.
So, with all that being said, here's the BULK INSERT I used (with some comments). I didn't use 3-part naming because I don't have the databases you do. Add the database name back in if you need to.
TRUNCATE TABLE dbo.SBHN_NPI_DATA
;
BULK INSERT dbo.SBHN_NPI_DATA
FROM 'C:\Temp\NPPES_Data_Dissemination_July_2020\npidata_pfile_20050523-20200712.csv'
WITH (
CODEPAGE = 'RAW' --For a little extra speed... like not having any collation
,DATAFILETYPE = 'char' --I always include this out of habit.
--,BATCHSIZE = 50000 --This just gives you something to watch so you know it's working. I don't use it because I want Minimal Logging.
,FIRSTROW = 2 --This essentially skips the header row, which they didn't tell us about. It actually does it by counting row delimiters.
,FORMAT = 'CSV' --New for 2017 and I'm liking it. It only took them 3 bloody decades to do it.
,FIELDQUOTE = '"' --Same here... new for 2017.
,FIELDTERMINATOR = ',' --CSV field delimiter
,ROWTERMINATOR = '\n' --"Use to shift the control to next row"... NOPE! It's just a delimiter. BULK INSERT doesn't actually know what a "row" is.
,TABLOCK
)
;
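Once that completes, a couple of quick sanity checks are worth the few seconds they take. The column name below is an assumption based on the record layout we've been talking about... adjust to suit your table.
--Post-load sanity checks (column name assumed from the layout above).
SELECT COUNT_BIG(*) AS RowsLoaded --expecting 6,448,360 for the July 2020 file
  FROM dbo.SBHN_NPI_DATA
;
SELECT MAX(LEN(HEALTHCARE_PROV_TAXONOMY_GROUP_1)) AS MaxGroupWidth --documented as 10, actually up to 43
  FROM dbo.SBHN_NPI_DATA
;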
I've attached the modified CREATE TABLE as a .txt file (still amazed that an SQL forum doesn't allow attaching an SQL file).
And here's the PowerShell code I used to fix the bad file...
powershell -command "& {get-content '%~1' -TotalCount 2000000000|set-content -path '%~dp1%~n1_FIXED%~x1'}"
The "-TotalCount 2000000000" thing is just to remind me that if I change the number on that, I can quickly create a partial copy of the file to work with like the first 1000 rows or whatever.
Let me know how it works out for you. This was fun... seriously. And, yes... Calling that fun means that I don't actually have a life. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 1:23 pm
Thanks. For the PowerShell part, do I need to modify it for my PC, or what do I need to do?
July 30, 2020 at 1:36 pm
The PowerShell code will work on any PC. Although it doesn't need to be, I installed it as a .bat file in the same directory as the CSV file. That makes it so you can execute it by providing the file name (just the file name with extension) as part of the call to the .bat file or, from the file explorer in Windows, you can literally click and drag the CSV file onto the .bat file and that will work as well.
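For anyone who hasn't set one of these up before, here's roughly what the whole .bat file looks like (the file name is just an example... the PowerShell line is the same one from my earlier post). Under the covers, get-content splits the file on line breaks and set-content writes it back out with standard CRLF line endings, which is likely what scrubs whatever was upsetting BULK INSERT.
@echo off
rem FixNpiFile.bat (example name)... call it with the CSV file name as the first
rem parameter, or just drag the CSV onto it in the file explorer. It writes
rem <name>_FIXED<ext> next to the original file.
powershell -command "& {get-content '%~1' -TotalCount 2000000000|set-content -path '%~dp1%~n1_FIXED%~x1'}"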
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 1:38 pm
p.s. Try it with my changes to the table and the BULK INSERT command first, without doing the PowerShell stuff. I don't know if I just had a bit of a bad download or if it's actually necessary.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 3:14 pm
BTW... I left a bit of a test in the table creation code and you should change it. The first column should be a BIGINT and not VARCHAR(11).
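If you've already built the table, the quick fix is an ALTER. I'm assuming the first column is named NPI like in the NPPES record layout... adjust if yours is named differently.
--Quick fix for the test I left in (assumes the column is named NPI and the
--table is empty... NOT NULL will fail if any loaded rows are missing an NPI).
ALTER TABLE dbo.SBHN_NPI_DATA ALTER COLUMN NPI BIGINT NOT NULL;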
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 3:16 pm
It still didn't work without the PowerShell. Is the attached what I need to do for the PowerShell? It doesn't seem to do anything.
July 30, 2020 at 3:45 pm
It seemed to work if I changed the ROWTERMINATOR to the below. Does the sample look right?
,ROWTERMINATOR = '0x0a'
July 30, 2020 at 6:32 pm
It worked fine for me as just '\n'. The 0x0a thing will also work. For those that are reading this and don't know, that is the same as CHAR(10), which is the "LineFeed" or "NewLine" character, which '\n' also represents.
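If memory serves, the reason '0x0a' works where '\n' sometimes doesn't is that BULK INSERT quietly treats a '\n' row terminator as the full '\r\n' (carriage return + line feed) pair, so a file with bare line feeds needs the explicit '0x0a'. Here's a throwaway illustration of the character values involved:
--CHAR(10) is the line feed (hex 0A) and CHAR(13) is the carriage return
--(hex 0D) that Windows-style line endings pair with it.
SELECT  ASCII(CHAR(10)) AS LineFeed       --returns 10
       ,ASCII(CHAR(13)) AS CarriageReturn --returns 13
;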
--Jeff Moden
Change is inevitable... Change for the better is not.
July 30, 2020 at 7:00 pm
Thanks again for all your help. Did the sample look ok?
August 5, 2020 at 2:59 am
smattiko83 wrote: Thanks again for all your help. Did the sample look ok?
A cursory glance says it looks fine. I didn't do a deep dive on it.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2020 at 9:12 am
smattiko83 wrote: Thanks again for all your help. Did the sample look ok?
A cursory glance says it looks fine. I didn't do a deep dive on it.
I'm shocked!! A CURSORy glance? From you? Surely a set-based glance? 😀
"Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
— Samuel Johnson
I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?
August 5, 2020 at 9:09 pm
Jeff Moden wrote: smattiko83 wrote: Thanks again for all your help. Did the sample look ok?
A cursory glance says it looks fine. I didn't do a deep dive on it.
I'm shocked!! A CURSORy glance? From you? Surely a set-based glance? 😀
😀
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2020 at 11:32 pm
Try adding the FIRSTROW option to start the data at row 2. Row 1 is headers, so it would blow the size of the columns you defined on that table.
BULK INSERT DE_IDENTIFIED.DBO.SBHN_NPI_DATA
FROM 'C:\SQL\NPI_DATA\NPI_DATA.csv'
WITH
(
FORMAT = 'CSV',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
August 6, 2020 at 6:48 pm
Try adding the FIRSTROW option to start the data at row 2. Row 1 is headers, so it would blow the size of the columns you defined on that table.
BULK INSERT DE_IDENTIFIED.DBO.SBHN_NPI_DATA
FROM 'C:\SQL\NPI_DATA\NPI_DATA.csv'
WITH
(
FORMAT = 'CSV',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
Agreed... I made that suggestion at the top of page 2 of this thread, but there was also a shedload of other issues, including stray characters in the original file and column sizes that don't match the documentation. I got all that worked out and posted the corrected target table and the code to fix the stray-character issue that was causing a most unusual error.
--Jeff Moden
Change is inevitable... Change for the better is not.