September 12, 2014 at 3:36 am
hi,
I would like to know what the underlying data structure of the Nonclustered and clustered Columnstore Index (In-Memory) is.
I mean the column entries are somehow compressed and loaded into memory in both cases, but how is the data itself structure in the memory?
Are there indexes created first from columnvalues, or is there a hashtable being constructed from each of the column values ?
and from the columnvalues, how are the rows built together again? I mean are they numbered sequentielly in parallel for each column? so if I ask 20th entry from column x, then it looks automatically for the 20th column entry in y ?
September 12, 2014 at 4:24 am
September 12, 2014 at 5:37 am
I already looked into Microsoft Documentation but that wasnt helpful -.-
the inner structure of the Columnstore Index and how data in memory is represented is not written there... hope someone can provide
more sources or informations ....
September 12, 2014 at 6:03 am
Columnstores aren't in-memory structures. They're disk-based.
Have you read all the whitepapers on columnstore? If not, hit google, a search for whitepaper columnstore finds them.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 12, 2014 at 8:28 am
There's not much on the detailed internals of the in-memory indexes yet. The non-clustered index is still a b-tree, sort of. They also call it a bw-tree. The bw stands for buzz word because it's not really a b-tree the way we think of them. There's also the hash indexes. The basic structures there are well defined. Your key is hashed and put into the number of buckets that you define for that index. Define too many buckets, you have a very wide index that gets hurt on performance because it has to look through so much stuff to find your data. Define too few, you get a lot of scans within a bucket because of hash collisions, again, hurting performance.
But, the nitty-gritty, these are the compression algorithms we're using, kind of information, I haven't seen it.
I'm addressing just the in-memory stuff. Columnstore is different.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 12, 2014 at 4:51 pm
hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..
I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...
I didn't really get behind it yet... I will do some more research about it
September 12, 2014 at 6:47 pm
freddyism00 (9/12/2014)
hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...
I didn't really get behind it yet... I will do some more research about it
Yes I agree. I think all the "in memory" references really are not in memory only it is just a different way Microsoft stores the data on disk.
From article
The column store indexes are part of Microsoft In-Memory Technologies because they use xVelocity engine for data compression optimization and its implementation is based on a columnar structure such as PowerPivot and SSAS Tabular. Data in column store indexes are organized by column, each memory page stores data from a single column, so each column can be accessed independently. This means that SQL Server Storage Engine will be able to fetch the only columns it needs. In addition, data is highly compressed, so more data will fit in memory and the I/O operations can greatly decrease.
September 13, 2014 at 6:11 am
freddyism00 (9/12/2014)
hm ok, but why is it then said that columnstore indexes are "in-memory" ??that confuses me..I read part of the whitepaper, it said the data columns are stored as BLOBs in a blob page on the disk... but somehow the index itself seems to be in memory? and there are also differences in the storage structure of nonclustered and clustered columnstore indexes...
I didn't really get behind it yet... I will do some more research about it
Well, they're "in memory" as much as a standard index is there as well. But they're not a part of the in-memory tables and indexes introduced in SQL Server 2014 (formerly code named Hekaton).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 13, 2014 at 11:12 am
That's what you get when marketing gets to name features.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 13, 2014 at 2:16 pm
ok I think I kind of figured it out, it's sort of memory-optimized.... they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?
and I totally agree on the marketing part, this is very confusing, and I guess that's not the only thing on Microsoft technologies with confusing names... PowerPivot, PowerPivot, DTS, BIDS, SSDT, SSDT-BI...etc. for beginners this is really hard to go through.. often there is not even a clear documentary or specification about their technology.. like in the SSDT/-BI confusion:
http://www.jamesserra.com/archive/2012/04/ssdt-installation-confusion/
If Steve Jobs would have dont this to their customers.. they probably wouldn't exist anymore today
September 13, 2014 at 3:05 pm
Quick note, Alberto Ferrari gave a very good talk on optimizing DAX queries at the last SQLBits where he explained in good details the Vertipaq/xVelocity compression, rowgroups, deltastores etc. Also Stephane Haby wrote a nice blog piece on the matter. Not certain that this what you are after but it's relative and informative.
😎
September 13, 2014 at 5:15 pm
freddyism00 (9/13/2014)
ok I think I kind of figured it out, it's sort of memory-optimized....
Not really.
they compress the columns into segments and store them on disk as BLOB's, the columnstore index itself is in-memory just as usual indexes, but the data itself is still on disk, but when needed in Querying or analysis services, the data (or rather the segment BLOB's to be more precise) can be leveraged partially as needed into memory for further processing, am I right?
Normal indexes are read into memory to be processed, they have to be, a computer's CPU can't operate directly on data on disk, it has to be in memory to be processed at all. We don't call normal clustered and nonclustered indexes 'in memory' just because they're read into the buffer pool before processing.
Stop trying to classify column store as 'in memory' and you'll probably find it easier to understand how it works.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply