How is data stored or indexed in Clustered and nonclustered Columnstore Index?

Question

Post reply

How is data stored or indexed in Clustered and nonclustered Columnstore Index?

freddyism00

SSC Veteran

Points: 225
More actions
September 12, 2014 at 3:36 am

#299176

hi,
I would like to know what the underlying data structure of the Nonclustered and clustered Columnstore Index (In-Memory) is.
I mean the column entries are somehow compressed and loaded into memory in both cases, but how is the data itself structure in the memory?
Are there indexes created first from columnvalues, or is there a hashtable being constructed from each of the column values ?
and from the columnvalues, how are the rows built together again? I mean are they numbered sequentielly in parallel for each column? so if I ask 20th entry from column x, then it looks automatically for the 20th column entry in y ?

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Sean Pearce SSCoach Points: 15999 More actions · Answer 1

Columnstore Indexes Described

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

freddyism00 SSC Veteran Points: 225 More actions · Answer 2

I already looked into Microsoft Documentation but that wasnt helpful -.-

the inner structure of the Columnstore Index and how data in memory is represented is not written there... hope someone can provide

more sources or informations ....

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Columnstores aren't in-memory structures. They're disk-based.

Have you read all the whitepapers on columnstore? If not, hit google, a search for whitepaper columnstore finds them.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 4

There's not much on the detailed internals of the in-memory indexes yet. The non-clustered index is still a b-tree, sort of. They also call it a bw-tree. The bw stands for buzz word because it's not really a b-tree the way we think of them. There's also the hash indexes. The basic structures there are well defined. Your key is hashed and put into the number of buckets that you define for that index. Define too many buckets, you have a very wide index that gets hurt on performance because it has to look through so much stuff to find your data. Define too few, you get a lot of scans within a bucket because of hash collisions, again, hurting performance.

But, the nitty-gritty, these are the compression algorithms we're using, kind of information, I haven't seen it.

I'm addressing just the in-memory stuff. Columnstore is different.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

freddyism00 SSC Veteran Points: 225 More actions · Answer 5

hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..

I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...

I didn't really get behind it yet... I will do some more research about it

brad.mason5 Hall of Fame Points: 3299 More actions · Answer 6

freddyism00 (9/12/2014)
hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..
I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...
I didn't really get behind it yet... I will do some more research about it

Yes I agree. I think all the "in memory" references really are not in memory only it is just a different way Microsoft stores the data on disk.

From article

http://blogs.msdn.com/b/mvpawardprogram/archive/2014/06/16/updateable-column-store-indexes-in-sql-server-2014.aspx

The column store indexes are part of Microsoft In-Memory Technologies because they use xVelocity engine for data compression optimization and its implementation is based on a columnar structure such as PowerPivot and SSAS Tabular. Data in column store indexes are organized by column, each memory page stores data from a single column, so each column can be accessed independently. This means that SQL Server Storage Engine will be able to fetch the only columns it needs. In addition, data is highly compressed, so more data will fit in memory and the I/O operations can greatly decrease.

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 7

freddyism00 (9/12/2014)
hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..
I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...
I didn't really get behind it yet... I will do some more research about it

Well, they're "in memory" as much as a standard index is there as well. But they're not a part of the in-memory tables and indexes introduced in SQL Server 2014 (formerly code named Hekaton).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 8

That's what you get when marketing gets to name features.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

freddyism00 SSC Veteran Points: 225 More actions · Answer 9

ok I think I kind of figured it out, it's sort of memory-optimized.... they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?

and I totally agree on the marketing part, this is very confusing, and I guess that's not the only thing on Microsoft technologies with confusing names... PowerPivot, PowerPivot, DTS, BIDS, SSDT, SSDT-BI...etc. for beginners this is really hard to go through.. often there is not even a clear documentary or specification about their technology.. like in the SSDT/-BI confusion:

http://www.jamesserra.com/archive/2012/04/ssdt-installation-confusion/

If Steve Jobs would have dont this to their customers.. they probably wouldn't exist anymore today

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 10

Quick note, Alberto Ferrari gave a very good talk on optimizing DAX queries at the last SQLBits where he explained in good details the Vertipaq/xVelocity compression, rowgroups, deltastores etc. Also Stephane Haby wrote a nice blog piece on the matter. Not certain that this what you are after but it's relative and informative.

😎

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 11

freddyism00 (9/13/2014)
ok I think I kind of figured it out, it's sort of memory-optimized....

Not really.

they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?

Normal indexes are read into memory to be processed, they have to be, a computer's CPU can't operate directly on data on disk, it has to be in memory to be processed at all. We don't call normal clustered and nonclustered indexes 'in memory' just because they're read into the buffer pool before processing.

Stop trying to classify column store as 'in memory' and you'll probably find it easier to understand how it works.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass