How to generate new Int ID for multiple columns

Question

How to generate new Int ID for multiple columns

Phillip.Putzback

Ten Centuries

Points: 1251
More actions
April 29, 2016 at 2:04 pm

#307056

I am working on staging data for loading a tabular model and I have to merge transactions and the parent account number into my Fact table. But because the Transactions and accounts could have matching IDs in other source systems I want to renumber them. In addition to improving the performance of the model with a smaller Int key instead of a key made up of a mix of 30 some characters.
So if in my source table I have
TXID AccountID
3fsd3 12
fdgd 12
23da 12
34fa 13
324c 13
vcvdfg 44
How do I get it to be
TXID AccountID NewTXID NewAcctID
3fsd3 12 1 1
fdgd 12 2 1
23da 12 3 1
34fa 13 4 2
324c 13 5 2
vcvdfg 44 6 3
I have been tinkering with the Row_Number() Over (order By TransactionID) -T-SQL and it works fine for creating the TXID. But I can't seem to figure out the Partition part of the T-SQL to only create a NewACctID only when the account ID changes.
The only way I have been able to do that is to create two tables with a incremental ID column, load Transactions into one, accounts into the other, and then merge them to get the IDs. But I am hoping there is way to do it in the TQL.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 1

I believe you'd be looking for the DENSE_RANK or RANK functions to do what you're trying to do with the Account ID's, keep in mind that will break if you ever get files on different days that have different accounts in them.

Are you actually noticing performance issues that make you think switching to an int is necessary?

DonlSimpson SSCertifiable Points: 7043 More actions · Answer 2

You can try this:

declare @t table (TXID varchar(10), AccountID int)

insert @t

(TXID, AccountID)

values

('3fsd3', 12), ('fdgd', 12), ('23da', 12), ('34fa', 13), ('324c', 13), ('vcvdfg', 44)

select t.TXID,

t.AccountID,

row_number() over (order by t.AccountID, t.TXID),

dense_rank() over (order by t.AccountID)

from @t t

Don Simpson

I'm not sure about Heisenberg.

Phillip.Putzback Ten Centuries Points: 1251 More actions · Answer 3

I am new to tabular modeling and from what I have read is that queries shouldn't take longer than a few seconds. My transaction table is just short of a 200 million rows and the Account table is about 6 million. The keys are varchar and quite long and from what I have read in the Performance tuning guide, if I can use get the system to value encoding instead of hash, I'll get better performance. They should also take up less space.

Phillip.Putzback Ten Centuries Points: 1251 More actions · Answer 4

The Dense_rank is perfect. It allowed me to run it against all my columns. I even replaced my Row_number with it.

SELECT t.TransactionId, dense_rank() over (order by T.TransactionID) as TXKey,

t.FacilityAccountID, dense_rank() over (order by t.FacilityAccountID) as FAKey,

T.attendingProvider, dense_rank() over (order by t.attendingProvider) as AttPrvKey,

T.ReferringProvider, dense_rank() over (order by t.ReferringProvider) as RefPrvKey,

t.PostDate

from Transactions T

Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.

DonlSimpson SSCertifiable Points: 7043 More actions · Answer 5

Phillip.Putzback (4/29/2016)
Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.

When TXID is unique, they will do the same thing. I suspect that row_number probably has a little less overhead.

Don Simpson

I'm not sure about Heisenberg.

Alan Burstein SSC Guru Points: 61152 More actions · Answer 6

Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.

ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.

To better understand the difference between ROW_NUMBER, RANK and DENSE_RANK consider the following queries (note my comments):

-- Insert unique values into table

DECLARE @numbers TABLE(SomeNumber int);

INSERT @numbers VALUES (1), (2), (3), (4);

-- ROW_NUMBER, RANK and DENSE_RANK are all the same

SELECT

SomeNumber,

[ROW_NUMBER] = ROW_NUMBER() OVER (ORDER BY SomeNumber),

[RANK] = RANK() OVER (ORDER BY SomeNumber),

[DENSE_RANK] = DENSE_RANK() OVER (ORDER BY SomeNumber)

FROM @numbers;

-- Now add some duplicate values

INSERT @numbers VALUES (2), (2), (3), (3);

-- Now let's see how each function produces different values

SELECT

SomeNumber,

[ROW_NUMBER] = ROW_NUMBER() OVER (ORDER BY SomeNumber),

[RANK] = RANK() OVER (ORDER BY SomeNumber),

[DENSE_RANK] = DENSE_RANK() OVER (ORDER BY SomeNumber)

FROM @numbers;

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

drew.allen SSC Guru Points: 77014 More actions · Answer 7

Alan.B (5/4/2016)
ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.

I would phrase that differently. They all take ties into consideration: RANK and DENSE_RANK treat ties as being indistinguishable and assign them all the same value whereas ROW_NUMBER treats ties as distinct and orders them in a non-deterministic manner before assigning distinct values.

The way you phrased it could potentially be interpreted as ROW_NUMBER discards ties, which is most definitely NOT the case.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Alan Burstein SSC Guru Points: 61152 More actions · Answer 8

drew.allen (5/4/2016)
Alan.B (5/4/2016)
ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.
I would phrase that differently. They all take ties into consideration: RANK and DENSE_RANK treat ties as being indistinguishable and assign them all the same value whereas ROW_NUMBER treats ties as distinct and orders them in a non-deterministic manner before assigning distinct values.
The way you phrased it could potentially be interpreted as ROW_NUMBER discards ties, which is most definitely NOT the case.
Drew

True that.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Phillip.Putzback Ten Centuries Points: 1251 More actions · Answer 9

Thanks for all the tips. Since the tables are so wide that I am importing into a stage table I am having to break down columns that I want keys on into there own tables and creating the keys off of an Identity column since using dense_rank was killing the server when trying to run a table with a dozen columns that I want keys for.

Jez-448386 SSCarpal Tunnel Points: 4596 More actions · Answer 10

Did you consider sequence tables? I've not used them myself but would they have done what you wanted?

https://msdn.microsoft.com/en-us/library/ff878091.aspx

Jez

Phillip.Putzback Ten Centuries Points: 1251 More actions · Answer 11

I'll have to hunt down some other examples because I really don't get what the sequence object is used for in their examples.

Thanks.