December 22, 2016 at 1:50 am
I have a few million records in a csv file.
This has only one field which is a driver md5 ( eg. 0x00001DE0033358C743199217832D895C0 )
I need to look up this in a table which has billions of records.
Currently this is what I do:
1. split this into batches of 10000
2. BULK INSERT in a temp table
3. Join that temp table with lookup table and get the result
Is there any way to make it faster ?
The columns in both tables are in binary format.
December 22, 2016 at 7:21 am
What you are doing seems to be the right way. Can you elaborate on the requirement, why you are doing this.
December 26, 2016 at 11:35 pm
Thanks.
This was to lookup for some information only. I got it done using a stored procedure and TVP. It's working fine. Thanks for your time in replying.
December 27, 2016 at 7:25 am
Why are you doing this in batches? It's usually faster to do a single load than multiple ones.
Are the columns indexed?
Are you really trying to return a million rows result set? What for?
December 27, 2016 at 7:38 am
I would import the records from the .csv file into a user table (not a temp table), and then create a primary key on DriverMD5 column. The other table, the one with billions of rows, should also have an index on DriverMD5. Once done, I would expect a medium powered server to complete a join of these tables, not quickly, but at least within a timeframe of several minutes.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
December 27, 2016 at 9:18 am
Eric M Russell (12/27/2016)
I would import the records from the .csv file into a user table (not a temp table), and then create a primary key on DriverMD5 column. The other table, the one with billions of rows, should also have an index on DriverMD5. Once done, I would expect a medium powered server to complete a join of these tables, not quickly, but at least within a timeframe of several minutes.
If the DB can be taken to BULK LOGGED recovery, build the table with the CI in place and using a minimally-logged BULK INSERT. If you turn on TF 610, the import doesn't even need to be sorted in the same order as the CI to get minimal logging.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2016 at 10:57 pm
Valid point. I have already removed batch processing.
I wanted result of all the hashes. But it can be in batches as well. Not necessarily in one shot.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply