May 14, 2015 at 12:07 am
Comments posted to this topic are about the item Column Store vs Row Store Indexes
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
May 14, 2015 at 2:24 am
An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:
'FacTransaction_ColumnStore' – needs an extra 't'
'Compression ration' – ratio
'foorprint' – footprint
'datawarhouse' – missing 'e'
'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain
'proccessing' – too many c's
Not trying to detract from the article, just trying to help.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 14, 2015 at 3:05 am
Phil Parkin (5/14/2015)
An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:'FacTransaction_ColumnStore' – needs an extra 't'
'Compression ration' – ratio
'foorprint' – footprint
'datawarhouse' – missing 'e'
'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain
'proccessing' – too many c's
Not trying to detract from the article, just trying to help.
Thanks Phil. I just submitted the new updates.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
May 14, 2015 at 7:20 am
Phil Parkin (5/14/2015)
An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:'FacTransaction_ColumnStore' – needs an extra 't'
'Compression ration' – ratio
'foorprint' – footprint
'datawarhouse' – missing 'e'
'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain
'proccessing' – too many c's
Not trying to detract from the article, just trying to help.
OK since we are going that direction, just pulling from the "Conclusion" section here is what I noticed:
datawarhouse -- should be datawarehouse or perhaps data warehouse
would need to be rebuild every time -- rebuilt
which may be prohibited if -- prohibitive
a good partition startegy -- strategy
But the article was really good.
May 14, 2015 at 7:53 am
One thing that is missing from your analysis is the type of Aggregation of the data in the data warehouse. Column Store indexes are mimicking the behavior of Key Value Pair (Casandra) or Column Store databases (InfiniDB/InfoBright). These are 5x faster than traditional RDBMS if and only if you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails.
Another consideration is that you need to compare the loading / Indexing time for inserting data. Yes, Column Store is useful because it is basically calculating the indexes all at once during creation using a bit mapped B-Tree. It knows how many of the distinct values will be in each extent and the ranges. Therefore, it can minimize the IO for retrieval. But the cost of building this up front can be high if you are working with streams or high number of new inserts in a short time. To get around this, the only way I have found to handle this is via Partitioning. It will drop the IO even more on retrieval.
Otherwise, it was a good analysis.
Regards, Rob
May 14, 2015 at 11:55 am
I just finished this article and learned a few things. Well done.
-- Itzik Ben-Gan 2001
May 15, 2015 at 1:34 am
Thanks to all for your comments.
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
May 15, 2015 at 3:21 am
Hi,
I can't find the scripts that you mentioned in the article?
Regards
Jari
May 15, 2015 at 3:28 am
lahteenj (5/15/2015)
Hi,I can't find the scripts that you mentioned in the article?
Regards
Jari
I uploaded them again! Once the moderator will refresh the article, they should be there. Sorry for this
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -
December 7, 2015 at 1:17 pm
Grasshopper said "f you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails."
>5 years ago we moved from SQL Server to Infobright. While there were several compromises that we had to take into consideration moving to a columnar database, calculations across rows was in no way one of them. I literally have hundreds of reports and most of them have at least 5 metrics. these reports typically run in < 4 seconds. Largest fact table approx 175 million rows.
Compromises that we had to take into consideration was that Insert Update and Delete were horribly long. We overcame this by utilizing drop and rebuild table or partitioning (In our BI Software not in the database).
One Huge advantage of the Columnar database was/is that we really do not have to be as selective of what we include in the table. if the table is 100 columns wide it is irrelevant because there is no IO overhead on columns not selected in the query. The other huge advantage of Columns is that I do 0 (Zero, Zilch, Nada) maintenance/performance tuning on the database. nice for a small shop!
We are, in fact, going to move back to SQL Server 2016 if the Column Store Indexes are as improved as the Marketing folks claim.
December 30, 2016 at 4:54 am
In SQL 2016 both clustered and non-clustered columnstore indexes are updateable, so there is no need to disable them prior to updates to the underlying table:
Columnstore Indexes Versioned Feature Summary
Thanks for the article, lots of interesting info in there!
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]
December 30, 2016 at 8:56 am
We have SQL 2016 and using column store indexes. You are spot on with seeks. We found that if a stored proc only wants a small range of values, a seek is preferred and much faster than scanning entire column store index. Aggregates work great for column store.
We recently implemented Niko's technique of clustered column store, but then add index views on top. This allows for seeks on b-tree if small range, but if we have query that needs to scan will hit column store. For this to work all the time, need to take a look at OPTION(RECOMPILE). But that hit for us is not bad so far. The other thing is once you add index views, updates/inserts/deletes will take longer on the table. If the table is constantly updated, will need to test to see if index views work for you.
http://www.nikoport.com/2016/10/10/columnstore-indexes-part-87-indexed-views/
January 3, 2017 at 7:24 am
Even though an old article, still a good read.
January 20, 2017 at 6:54 pm
I just got done reading Hugo Kornelis's Stairway to Columnstore Indexes (http://www.sqlservercentral.com/stairway/121631/ ) when the email nudge for this article (i.e. Column Store vs Row Store Indexes) showed up. It was a nice complement to Hugo's great overview (and his suggestions on optimizations).
Thanks, Brian!
May 22, 2020 at 6:16 pm
in the attached script of the article, we're creating a non-clustered column_store index on FactTransaction_RowStore table. Is this intended or typo? what scenario /test (out of 5) we see its application?
adding more:
is the script meant to be executed in one-go?
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply