Compressing Integers

  • Comments posted to this topic are about the item Compressing Integers

  • Really interesting question, thanks Steve.
    Learned something...

    ____________________________________________
    Space, the final frontier? not any more...
    All limits henceforth are self-imposed.
    “libera tute vulgaris ex”

  • Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.


    Just because you're right doesn't mean everybody else is wrong.

  • Thanks for today's QOTD Steve.
    Tricky question - just 29% correct so far.

  • Here's the script:

    CREATE TABLE dbo.LotsofIntegers1 (ID INT) WITH (DATA_COMPRESSION = PAGE);
    CREATE TABLE dbo.LotsofIntegers2 (ID BIGINT) WITH (DATA_COMPRESSION = ROW);
    CREATE TABLE dbo.LotsofIntegers3 (ID BIGINT) WITH (DATA_COMPRESSION = PAGE);
    CREATE TABLE dbo.LotsofIntegers4 (ID BIGINT, INDEX LotsofIntCDX CLUSTERED COLUMNSTORE) ;
    go
    INSERT INTO dbo.LotsofIntegers1 SELECT o.object_id FROM sys.objects o CROSS JOIN sys.columns c;
    INSERT INTO dbo.LotsofIntegers2 SELECT o.object_id FROM sys.objects o CROSS JOIN sys.columns c;
    INSERT INTO dbo.LotsofIntegers3 SELECT o.object_id FROM sys.objects o CROSS JOIN sys.columns c;
    INSERT INTO dbo.LotsofIntegers4 SELECT o.object_id FROM sys.objects o CROSS JOIN sys.columns c;
    go 1000

    DBCC UPDATEUSAGE (0,LotsofIntegers1)
    DBCC UPDATEUSAGE (0,LotsofIntegers2)
    DBCC UPDATEUSAGE (0,LotsofIntegers3)
    DBCC UPDATEUSAGE (0,LotsofIntegers4)
    go
    exec sp_spaceused 'LotsofIntegers1'
    exec sp_spaceused 'LotsofIntegers2'
    exec sp_spaceused 'LotsofIntegers3'
    exec sp_spaceused 'LotsofIntegers4' -- the winner

  • Rune Bivrin - Tuesday, September 26, 2017 12:32 AM

    Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.

    In the referenced blog post, each row group (about 1M rows) gets the same value. So 1 million row groups go into the table to create the 1 trillion row table. However, since the entire row group has the same value, compression is in its most optimal state and of course the columnstore index won't take much space. If each row had a unique value, the dictionary of the columnstore index would probably explode and I'm not sure if the CCI would perform much better than row/page compression.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
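
  • The compression point above can be illustrated outside SQL Server. This is a minimal Python sketch using plain run-length encoding as a stand-in (columnstore segment compression is more sophisticated, but the principle is the same): a segment full of one repeated value collapses to almost nothing, while all-distinct values don't compress at all.

```python
# Run-length encoding as a stand-in for columnstore segment compression:
# repeated values collapse to (value, run_length) pairs; distinct values don't.

def rle(values):
    """Compress a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([v, 1])    # start a new run
    return runs

repeated = [42] * 1_000_000         # one row group, one value everywhere
distinct = list(range(1_000_000))   # one row group, all unique values

print(len(rle(repeated)))   # 1 run        -> near-total compression
print(len(rle(distinct)))   # 1000000 runs -> no compression at all
```

    This matches the intuition in the post: with one value per row group the dictionary and segments stay tiny, whereas a million distinct values per row group gives the encoder nothing to collapse.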

  • Rune Bivrin - Tuesday, September 26, 2017 12:32 AM

    Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.

    Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.

  • Steve Jones - SSC Editor - Tuesday, September 26, 2017 7:43 AM

    Rune Bivrin - Tuesday, September 26, 2017 12:32 AM

    Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.

    Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.

    It would matter if you didn't really need the BIGINT; that would have made the table even smaller. Maybe even a TINYINT would have been sufficient.


    Just because you're right doesn't mean everybody else is wrong.

  • Steve Jones - SSC Editor - Tuesday, September 26, 2017 7:43 AM

    Rune Bivrin - Tuesday, September 26, 2017 12:32 AM

    Interesting. However, the question doesn't actually say if those are one trillion different integers, or if there are repetitions. If there are one trillion different values, I doubt there will be that much of a difference. If there are repetitions, we can't really know from the question if a BIGINT is really needed.

    Does it matter? It's about creating the smallest table. Maybe the smallest table can't be compressed, but you couldn't know that. Even if there's not much of a difference, it's about finding the smallest table.

    What is your definition of the smallest table?
    I'd think the smallest table would occupy the least amount of disk size. And for columnstore indexes, it all depends on how well the data can be compressed.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • I don't get this: 

    One trillion integers won't fit in an INT column so a BIGINT type is needed.

    Can someone explain?

    Be still, and know that I am God - Psalm 46:10

  • david.gugg - Tuesday, September 26, 2017 7:56 AM

    I don't get this: 

    One trillion integers won't fit in an INT column so a BIGINT type is needed.

    Can someone explain?

    The maximum value in an INT column is 2,147,483,647, which is considerably less than one trillion.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
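
  • The arithmetic is easy to check. This quick Python sketch uses the signed-integer ranges from the SQL Server data type documentation (INT is a 4-byte signed integer, BIGINT an 8-byte one) to confirm that one trillion exceeds INT's range but fits comfortably in BIGINT.

```python
INT_MAX = 2**31 - 1      # SQL Server INT: 4-byte signed integer
BIGINT_MAX = 2**63 - 1   # SQL Server BIGINT: 8-byte signed integer
ONE_TRILLION = 10**12

print(INT_MAX)                     # 2147483647
print(ONE_TRILLION > INT_MAX)      # True  -> won't fit in INT
print(ONE_TRILLION <= BIGINT_MAX)  # True  -> fits in BIGINT
```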

  • david.gugg - Tuesday, September 26, 2017 7:56 AM

    I don't get this: 

    One trillion integers won't fit in an INT column so a BIGINT type is needed.

    Can someone explain?

    Rephrasing that as "Integers up to the value of one trillion won't fit in an INT column, so a BIGINT type is needed" makes more sense. Steve will have to say if that was what was intended; otherwise it's just nonsensical.


    Just because you're right doesn't mean everybody else is wrong.

  • I wasn't trying to overthink this too deeply. Just adding a large amount of data, with random values greater than 2^9, a 32bit int. Added some verbiage to clarify these are random values up to 2 ^ 10

  • Steve Jones - SSC Editor - Tuesday, September 26, 2017 9:00 AM

    I wasn't trying to overthink this too deeply. Just adding a large amount of data, with random values greater than 2^9, a 32bit int. Added some verbiage to clarify these are random values up to 2 ^ 10

    2^10 = 1024.  That will fit in a smallint, which covers from -(2^15) to 2^15-1.
    Perhaps you meant 10^10 . . .

  • t.ovod-everett - Tuesday, September 26, 2017 10:55 AM

    2^10 = 1024.  That will fit in a smallint, which covers from -(2^15) to 2^15-1.
    Perhaps you meant 10^10 . . .

    I had the same problem with the question, guess I'm too mathy :hehe:
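
  • For the "maybe even a TINYINT would have been sufficient" point earlier in the thread, here is a hypothetical helper (the name smallest_int_type is invented for illustration; the ranges are from the SQL Server data type documentation, TINYINT being unsigned 0..255) that picks the smallest SQL Server integer type able to hold a given value.

```python
# Hypothetical helper: smallest SQL Server integer type for a value.
# TINYINT is unsigned (0..255); the rest are signed.

def smallest_int_type(n):
    if 0 <= n <= 255:
        return "TINYINT"
    if -2**15 <= n < 2**15:
        return "SMALLINT"
    if -2**31 <= n < 2**31:
        return "INT"
    if -2**63 <= n < 2**63:
        return "BIGINT"
    raise ValueError("out of range for SQL Server integer types")

print(smallest_int_type(2**10))   # SMALLINT -- 1,024 fits well below 32,767
print(smallest_int_type(10**10))  # BIGINT   -- ten billion exceeds INT's range
```

    Which makes the distinction above concrete: values up to 2^10 need only a SMALLINT, while values up to 10^10 genuinely need a BIGINT.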

