How can I query it fast ?

Question

Post reply

How can I query it fast ?

SQL Kidu

Ten Centuries

Points: 1296
More actions
December 22, 2016 at 1:50 am

#401253

I have a few million records in a csv file.
This has only one field which is a driver md5 ( eg. 0x00001DE0033358C743199217832D895C0 )
I need to look up this in a table which has billions of records.
Currently this is what I do:
1. split this into batches of 10000
2. BULK INSERT in a temp table
3. Join that temp table with lookup table and get the result
Is there any way to make it faster ?
The columns in both tables are in binary format.

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

rajprabu11 Valued Member Points: 68 More actions · Answer 1

What you are doing seems to be the right way. Can you elaborate on the requirement, why you are doing this.

SQL Kidu Ten Centuries Points: 1296 More actions · Answer 2

Thanks.

This was to lookup for some information only. I got it done using a stored procedure and TVP. It's working fine. Thanks for your time in replying.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Why are you doing this in batches? It's usually faster to do a single load than multiple ones.

Are the columns indexed?

Are you really trying to return a million rows result set? What for?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Eric M Russell SSC Guru Points: 125623 More actions · Answer 4

I would import the records from the .csv file into a user table (not a temp table), and then create a primary key on DriverMD5 column. The other table, the one with billions of rows, should also have an index on DriverMD5. Once done, I would expect a medium powered server to complete a join of these tables, not quickly, but at least within a timeframe of several minutes.

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 5

Eric M Russell (12/27/2016)
I would import the records from the .csv file into a user table (not a temp table), and then create a primary key on DriverMD5 column. The other table, the one with billions of rows, should also have an index on DriverMD5. Once done, I would expect a medium powered server to complete a join of these tables, not quickly, but at least within a timeframe of several minutes.

If the DB can be taken to BULK LOGGED recovery, build the table with the CI in place and using a minimally-logged BULK INSERT. If you turn on TF 610, the import doesn't even need to be sorted in the same order as the CI to get minimal logging.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SQL Kidu Ten Centuries Points: 1296 More actions · Answer 6

Valid point. I have already removed batch processing.

I wanted result of all the hashes. But it can be in batches as well. Not necessarily in one shot.