Rank a large table

Question

Rank a large table

rash3554

SSCommitted

Points: 1984
More actions
October 31, 2017 at 9:24 am

#341005

Hi there,
I am trying to figure out the best way to rank a table that has 600 million rows.
This is data about Company financial info from different companies.
Eg: ID=1 (there is a reference table for ID to CompanyName) ; iDate (date when data was received) ; FinancialInfoName can be sales, Loans or Deposits; Value is value of the FinancialInfoName;
Create table dbo.tbl_needs_ranking(ID int not null, iDate Smalldatetime null,Value Decimal(22,2),FinancialInfoName varchar(100) null, Rnk int null)
This table gets data from 2 different work tables.
dbo.table1 and dbo.table2 each have about 300 million rows.
dbo.table1 has data from 2016 and dbo.table2 from 2015.
This is how I insert
TRUNCATE TABLE dbo.tbl_needs_ranking INSERT INTO dbo.tbl_needs_ranking(ID,iDate,Value,FinancialInfoName,Rnk) SELECT ID,iDate,Value,FinancialInfoName,NULL FROM dbo.table1
INSERT INTO dbo.tbl_needs_ranking(ID,iDate,Value,FinancialInfoName,Rnk) SELECT ID,iDate,Value,FinancialInfoName,NULL FROM dbo.table2
Rnk is a column that is rank by ID and Date.
;WITH CTE AS ( SELECT Rnk ,[ID] ,[iDate] ,DENSE_RANK() over (ORDER BY id, iDate) AS ranking FROM dbo.tbl_needs_ranking WITH (TABLOCK) ) UPDATE CTE SET Rnk=CTE.ranking
It's been 3 hours and not yet finished running.
What's the best way to rank this table?
Thank you
MR

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99133 More actions · Answer 1

600M records is a LOT of records. Even if your disc speed (both read and write) is great, upi have loads of RAM, etc, it is going to take a long time.

Doing an INSERT and then an UPDATE isn't going to help matter though. That means, firstly, all the data needs to be read and written (in the tables) for the initial INSERT(s), but then read again (in your new table) and then written to with the new Rank.

I would firstly ensure that you do the whole process in one go:
TRUNCATE TABLE dbo.tbl_needs_ranking;

--UNION the data first WITH CoreData AS ( SELECT ID, iDate, Value, FinancialInfoName, FROM dbo.table1 UNION ALL SELECT ID, iDate, Value, FinancialInfoName FROM dbo.table2), --Now calculate the rankings RankedData AS ( SELECT *, DENSE_RANK() OVER (ORDER BY id, idate) AS Ranking FROM CoreData) INSERT INTO dbo.tbl_needs_ranking (ID,iDate,Value,FinancialInfoName,Rnk) SELECT ID, iDate, Value, FinancialInfoName, Ranking FROM RankedData;

This, won't, necessarily, be that quick still; depending on what indexes you have on your tables. You're going to need as appropriate INDEX that the server can use to be able to ORDER by id and idate columns in both tables.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 2

What indexes do you have on table1, table2 and tbl_needs_ranking?

Have you considered something like this - doing the ranking and both inserts all in one operation? Your transaction log and/or tempdb may not thank you, but it's worth seeing how it compares on a test database with the same data.
WITH Combined AS ( SELECT ID,iDate,Value,FinancialInfoName FROM dbo.table1 UNION SELECT ID,iDate,Value,FinancialInfoName FROM dbo.table2 ) INSERT INTO dbo.tbl_needs_ranking(ID,iDate,Value,FinancialInfoName,Rnk) SELECT ID,iDate,Value,FinancialInfoName, DENSE_RANK() over (ORDER BY id, iDate) FROM Combined;

John

ScottPletcher SSC Guru Points: 101121 More actions · Answer 3

The best way to do this is to cluster the table on ( id, iDate ); that is, the clustered index on the table should be on those columns. And, as long as you look up by id and/or ( id, iDate ) at least half the time on that table, then that's what you should do. Since I/we don't see other queries, I/we don't know whether you need nonclus index(es) or not.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

rash3554 SSCommitted Points: 1984 More actions · Answer 4

I only have non-clustered index on dbo.table1 and dbo.table2.
And after the update (it never did finish) I have clustered index on dbo.tbl_needs_ranking

Thom A SSC Guru Points: 99133 More actions · Answer 5

Aclustered index on the base tables is going to be a huge improvement. I'd suggest following Scott's advice.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

ScottPletcher SSC Guru Points: 101121 More actions · Answer 6

If you know that all rows in table1 come before rows in table2, you can go ahead and assign the ranking as part of the load of the table. It's faster to sort 300M rows and 300M rows than to sort 600M rows.

TRUNCATE TABLE dbo.tbl_needs_ranking;

--drop any nonclus index on dbo.tbl_needs_ranking CREATE CLUSTERED INDEX tbl_needs_ranking__CL ON dbo.tbl_needs_ranking ( ID, iDate ) WITH ( FILLFACTOR = 99 ) ON [PRIMARY];

INSERT INTO dbo.tbl_needs_ranking(ID,iDate,Value,FinancialInfoName,Rnk) WITH (TABLOCK) SELECT ID,iDate,Value,FinancialInfoName,NULL FROM dbo.table1

INSERT INTO dbo.tbl_needs_ranking(ID,iDate,Value,FinancialInfoName,Rnk) WITH (TABLOCK) SELECT ID,iDate,Value,FinancialInfoName,NULL FROM dbo.table2

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".