April 29, 2016 at 2:04 pm
I am working on staging data for loading a tabular model and I have to merge transactions and the parent account number into my Fact table. But because the Transactions and accounts could have matching IDs in other source systems I want to renumber them. In addition to improving the performance of the model with a smaller Int key instead of a key made up of a mix of 30 some characters.
So if in my source table I have
TXID AccountID
3fsd3 12
fdgd 12
23da 12
34fa 13
324c 13
vcvdfg 44
How do I get it to be
TXID AccountID NewTXID NewAcctID
3fsd3 12 1 1
fdgd 12 2 1
23da 12 3 1
34fa 13 4 2
324c 13 5 2
vcvdfg 44 6 3
I have been tinkering with the Row_Number() Over (order By TransactionID) -T-SQL and it works fine for creating the TXID. But I can't seem to figure out the Partition part of the T-SQL to only create a NewACctID only when the account ID changes.
The only way I have been able to do that is to create two tables with a incremental ID column, load Transactions into one, accounts into the other, and then merge them to get the IDs. But I am hoping there is way to do it in the TQL.
April 29, 2016 at 2:42 pm
I believe you'd be looking for the DENSE_RANK or RANK functions to do what you're trying to do with the Account ID's, keep in mind that will break if you ever get files on different days that have different accounts in them.
Are you actually noticing performance issues that make you think switching to an int is necessary?
April 29, 2016 at 3:18 pm
You can try this:
declare @t table (TXID varchar(10), AccountID int)
insert @t
(TXID, AccountID)
values
('3fsd3', 12), ('fdgd', 12), ('23da', 12), ('34fa', 13), ('324c', 13), ('vcvdfg', 44)
select t.TXID,
t.AccountID,
row_number() over (order by t.AccountID, t.TXID),
dense_rank() over (order by t.AccountID)
from @t t
Don Simpson
April 29, 2016 at 7:11 pm
I am new to tabular modeling and from what I have read is that queries shouldn't take longer than a few seconds. My transaction table is just short of a 200 million rows and the Account table is about 6 million. The keys are varchar and quite long and from what I have read in the Performance tuning guide, if I can use get the system to value encoding instead of hash, I'll get better performance. They should also take up less space.
April 29, 2016 at 8:28 pm
The Dense_rank is perfect. It allowed me to run it against all my columns. I even replaced my Row_number with it.
SELECT t.TransactionId, dense_rank() over (order by T.TransactionID) as TXKey,
t.FacilityAccountID, dense_rank() over (order by t.FacilityAccountID) as FAKey,
T.attendingProvider, dense_rank() over (order by t.attendingProvider) as AttPrvKey,
T.ReferringProvider, dense_rank() over (order by t.ReferringProvider) as RefPrvKey,
t.PostDate
from Transactions T
Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.
May 4, 2016 at 11:13 am
Phillip.Putzback (4/29/2016)
Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.
When TXID is unique, they will do the same thing. I suspect that row_number probably has a little less overhead.
Don Simpson
May 4, 2016 at 12:56 pm
Is there a reason to use Row_number instead of dense_rank on the first column. It appears dense_rank did the same thing.
ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.
To better understand the difference between ROW_NUMBER, RANK and DENSE_RANK consider the following queries (note my comments):
-- Insert unique values into table
DECLARE @numbers TABLE(SomeNumber int);
INSERT @numbers VALUES (1), (2), (3), (4);
-- ROW_NUMBER, RANK and DENSE_RANK are all the same
SELECT
SomeNumber,
[ROW_NUMBER] = ROW_NUMBER() OVER (ORDER BY SomeNumber),
[RANK] = RANK() OVER (ORDER BY SomeNumber),
[DENSE_RANK] = DENSE_RANK() OVER (ORDER BY SomeNumber)
FROM @numbers;
-- Now add some duplicate values
INSERT @numbers VALUES (2), (2), (3), (3);
-- Now let's see how each function produces different values
SELECT
SomeNumber,
[ROW_NUMBER] = ROW_NUMBER() OVER (ORDER BY SomeNumber),
[RANK] = RANK() OVER (ORDER BY SomeNumber),
[DENSE_RANK] = DENSE_RANK() OVER (ORDER BY SomeNumber)
FROM @numbers;
-- Itzik Ben-Gan 2001
May 4, 2016 at 2:02 pm
Alan.B (5/4/2016)
ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.
I would phrase that differently. They all take ties into consideration: RANK and DENSE_RANK treat ties as being indistinguishable and assign them all the same value whereas ROW_NUMBER treats ties as distinct and orders them in a non-deterministic manner before assigning distinct values.
The way you phrased it could potentially be interpreted as ROW_NUMBER discards ties, which is most definitely NOT the case.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 4, 2016 at 2:08 pm
drew.allen (5/4/2016)
Alan.B (5/4/2016)
ROW_NUMBER does not take ties into consideration, RANK and DENSE_RANK do.I would phrase that differently. They all take ties into consideration: RANK and DENSE_RANK treat ties as being indistinguishable and assign them all the same value whereas ROW_NUMBER treats ties as distinct and orders them in a non-deterministic manner before assigning distinct values.
The way you phrased it could potentially be interpreted as ROW_NUMBER discards ties, which is most definitely NOT the case.
Drew
True that.
-- Itzik Ben-Gan 2001
May 4, 2016 at 4:29 pm
Thanks for all the tips. Since the tables are so wide that I am importing into a stage table I am having to break down columns that I want keys on into there own tables and creating the keys off of an Identity column since using dense_rank was killing the server when trying to run a table with a dozen columns that I want keys for.
May 5, 2016 at 6:30 am
Did you consider sequence tables? I've not used them myself but would they have done what you wanted?
https://msdn.microsoft.com/en-us/library/ff878091.aspx
Jez
May 5, 2016 at 8:34 pm
I'll have to hunt down some other examples because I really don't get what the sequence object is used for in their examples.
Thanks.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply