September 25, 2017 at 8:37 pm
Comments posted to this topic are about the item Compressing Integers
September 25, 2017 at 10:58 pm
Really interesting question, thanks Steve.
Learned something...
____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”
September 26, 2017 at 12:32 am
Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.
September 26, 2017 at 1:34 am
Thanks for today's QOTD Steve.
Tricky question - just 29% correct so far.
September 26, 2017 at 2:07 am
Here the script:
CREATE TABLE dbo.LotsofIntegers1 (ID INT) WITH (DATA_COMPRESSION = PAGE);
CREATE TABLE dbo.LotsofIntegers2 (ID BIGINT) WITH (DATA_COMPRESSION = ROW);
CREATE TABLE dbo.LotsofIntegers3 (ID BIGINT) WITH (DATA_COMPRESSION = PAGE);
CREATE TABLE dbo.LotsofIntegers4 (ID BIGINT, INDEX LotsofIntCDX CLUSTERED COLUMNSTORE) ;
go
insert into LotsofIntegers1 select o.object_id from sys.objects o, sys.columns
insert into LotsofIntegers2 select o.object_id from sys.objects o, sys.columns
insert into LotsofIntegers3 select o.object_id from sys.objects o, sys.columns
insert into LotsofIntegers4 select o.object_id from sys.objects o, sys.columns
go 1000
DBCC UPDATEUSAGE (0,LotsofIntegers1)
DBCC UPDATEUSAGE (0,LotsofIntegers2)
DBCC UPDATEUSAGE (0,LotsofIntegers3)
DBCC UPDATEUSAGE (0,LotsofIntegers4)
go
exec sp_spaceused 'LotsofIntegers1'
exec sp_spaceused 'LotsofIntegers2'
exec sp_spaceused 'LotsofIntegers3'
exec sp_spaceused 'LotsofIntegers4' -- the winner
September 26, 2017 at 3:44 am
Rune Bivrin - Tuesday, September 26, 2017 12:32 AMInteresting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.
In the referenced blog post, each row group (about 1M rows) gets the same value. So 1 million row groups go into the table to create the 1 trillion row table. However, since the entire row group has the same value, compression is in its most optimal state and of course the columnstore index won't take much space. If each row had a unique value, the dictionary of the columnstore index would probably explode and I'm not sure if the CCI would perform much better than row/page compression.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
September 26, 2017 at 7:43 am
Rune Bivrin - Tuesday, September 26, 2017 12:32 AMInteresting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.
Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.
September 26, 2017 at 7:50 am
Steve Jones - SSC Editor - Tuesday, September 26, 2017 7:43 AMRune Bivrin - Tuesday, September 26, 2017 12:32 AMInteresting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.
It would matter if you didn't really need the BIGINT; that would have made the table even smaller. Maybe even a TINYINT would have been sufficient.
September 26, 2017 at 7:51 am
Steve Jones - SSC Editor - Tuesday, September 26, 2017 7:43 AMRune Bivrin - Tuesday, September 26, 2017 12:32 AMInteresting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.
What is your definition of the smallest table?
I'd think the smallest table would occupy the least amount of disk size. And for columnstore indexes, it all depends on how well the data can be compressed.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
September 26, 2017 at 7:56 am
I don't get this:
One trillion integers won't fit in an INT column so a BIGINT type is needed.
Can someone explain?
Be still, and know that I am God - Psalm 46:10
September 26, 2017 at 7:59 am
david.gugg - Tuesday, September 26, 2017 7:56 AMI don't get this:One trillion integers won't fit in an INT column so a BIGINT type is needed.
Can someone explain?
The maximum value in an integer column is 2,147,483,647, which is considerably less than one trillion.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
September 26, 2017 at 8:01 am
david.gugg - Tuesday, September 26, 2017 7:56 AMI don't get this:One trillion integers won't fit in an INT column so a BIGINT type is needed.
Can someone explain?
Rephrasing that as "Integers up to the value of one trillion won't fit in an INT column, so a BIGINT type is needed" makes more sense. Steve will have to say if that was what was intended; otherwise it's just nonsensical.
September 26, 2017 at 9:00 am
I wasn't trying to overthink this too deeply. Just adding a large amount of data, with random values greater than 2^9, a 32bit int. Added some verbiage to clarify these are random values up to 2 ^ 10
September 26, 2017 at 10:55 am
Steve Jones - SSC Editor - Tuesday, September 26, 2017 9:00 AMI wasn't trying to overthink this too deeply. Just adding a large amount of data, with random values greater than 2^9, a 32bit int. Added some verbiage to clarify these are random values up to 2 ^ 10
2^10 = 1024. That will fit in a smallint, which covers from -(2^15) to 2^15-1.
Perhaps you meant 10^10 . . .
September 26, 2017 at 11:00 am
t.ovod-everett - Tuesday, September 26, 2017 10:55 AM2^10 = 1024. That will fit in a smallint, which covers from -(2^15) to 2^15-1.
Perhaps you meant 10^10 . . .
I had the same problem with the question, guess I'm too mathy :hehe:
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply