January 24, 2015 at 11:41 am
Comments posted to this topic are about the item Replication Filters
January 26, 2015 at 1:17 am
Interesting question, thanks Steve.
Definitely learned something.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 26, 2015 at 2:11 am
Thank you for the post, Steve, very interesting one.
Row filters in transactional publications can add significant overhead because the article filter clause is evaluated for each log row written for a published table, to determine whether the row should be replicated. Row filters in transactional publications should be avoided if each replication node can support the full data load, and the overall data set is reasonably small.
Any takers on the bolded part? (small as in hundreds?) For example, replicating master data (from the MSDN example under static row filter): we add 100 products today and replicate them, then maybe add another 50 or 75 tomorrow and replicate again. But if Sales has millions of records added each day, then filtering on a specific region, like it says, "can add significant overhead". So, performance-wise, is a parameterized row filter better?
ww; Raghu
--
The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.
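For reference, a static row filter on a transactional article is supplied through the @filter_clause parameter of sp_addarticle. A minimal sketch, using hypothetical names (publication SalesPub, table dbo.Sales with a SalesRegion column), not taken from the question itself:

```sql
-- Hypothetical names: publication [SalesPub], table [dbo].[Sales].
-- The filter clause is evaluated for every logged change to the
-- published table, which is the overhead the quoted passage warns about.
EXEC sp_addarticle
    @publication   = N'SalesPub',
    @article       = N'Sales',
    @source_owner  = N'dbo',
    @source_object = N'Sales',
    @filter_clause = N'SalesRegion = ''US''';  -- only US rows are replicated
```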
January 26, 2015 at 6:33 am
Nice straightforward question.
Half the people who have replied so far think they can use a column filter to select rows, which is a bit startling; and more than four fifths of the answers so far are wrong, which is even more so.
Tom
January 26, 2015 at 6:47 am
TomThomson (1/26/2015)
Nice straightforward question.
Half the people who have replied so far think they can use a column filter to select rows, which is a bit startling; and more than four fifths of the answers so far are wrong, which is even more so.
I totally misinterpreted both the question and something in BOL (pre-coffee), and as soon as the answer came up incorrect I knew why!
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
January 26, 2015 at 6:48 am
The article mentioned seems to indicate that you could also use column filtering (although why you'd want to on such a small table is beyond me). But the question asks what options you have, not which one would be the most appropriate.
Welcome to test writing 101.
January 26, 2015 at 6:55 am
mhynds (1/26/2015)
The article mentioned seems to indicate that you could also use column filtering (although why you'd want to on such a small table is beyond me) but reading the question asks: what options do I have. Not which one would be the most appropriate.
Welcome to test writing 101.
And further, if you interpret the issue in the following way, a column filter is appropriate:
The replication store only contains the subset for the US sales region; at that point, maintaining the SalesRegion column becomes redundant, and the column should be filtered out.
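A column filter, by contrast, is set up by marking the article as vertically partitioned and then adding only the wanted columns with sp_articlecolumn. A sketch under hypothetical names (publication SalesPub, table dbo.Sales, column SalesID), omitting the now-redundant SalesRegion column:

```sql
-- Hypothetical names: publication [SalesPub], article [Sales].
EXEC sp_addarticle
    @publication        = N'SalesPub',
    @article            = N'Sales',
    @source_owner       = N'dbo',
    @source_object      = N'Sales',
    @vertical_partition = N'true';   -- start with a minimal column set

-- Add only the columns you want replicated; SalesRegion is simply never added.
EXEC sp_articlecolumn
    @publication = N'SalesPub',
    @article     = N'Sales',
    @column      = N'SalesID',
    @operation   = N'add';
```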
January 26, 2015 at 12:11 pm
Tricky Tricky 🙂
January 26, 2015 at 12:30 pm
Raghavendra Mudugal (1/26/2015)
Thank you for the post, Steve, very interesting one.
Row filters in transactional publications can add significant overhead because the article filter clause is evaluated for each log row written for a published table, to determine whether the row should be replicated. Row filters in transactional publications should be avoided if each replication node can support the full data load, and the overall data set is reasonably small.
any takers on the bold letters? (small as-in 100s?)... like replicating Master data (from the msdn example under static row filter) we add 100 products today and we replicate it and may be next day add another 50 or 75 in master and then replicate it...? if sales has millions of records added each day and then filtering on a specific region like it says.. "can add significant overhead". so performance wise is parameter row filter is better?
I'd say that the answer to your question is in the sentence before the bolded part: "if each replication node can support the full data load" (and though not explicitly mentioned, I would add the network to that).
If your network is saturated, or a subscriber does not have the storage space or processing power to handle the full load, then a row filter is a great way to reduce the load by filtering rows before they are published. But if your network and subscribers can handle the full load without issues, then the recommendation here is not to use a row filter, and instead to push all rows and then, optionally, delete rows or add indexes at the subscribers to keep the unneeded extra rows from impacting their performance.
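The "index the subscribers" option Hugo mentions could be done with a filtered index on the subscriber-side table, so queries there touch only the regional slice even though all rows were replicated. A sketch, assuming a hypothetical dbo.Sales table with OrderDate and Amount columns at the subscriber (filtered indexes are available in SQL Server 2008 and later):

```sql
-- Hypothetical subscriber-side table [dbo].[Sales].
-- All rows are replicated, but US-region queries can use this narrow index.
CREATE NONCLUSTERED INDEX IX_Sales_US
ON dbo.Sales (OrderDate)
INCLUDE (Amount)
WHERE SalesRegion = 'US';
```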
January 26, 2015 at 12:55 pm
I've never fully explored it, but it will let you use an IN in the WHERE clause. I think it gets evaluated for the snapshot OK, but I don't think it picks up changes made after that.
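An IN-based filter clause as Steve describes would look like the following sketch (same hypothetical SalesPub/dbo.Sales names as above; the region list is invented for illustration). Note his caveat: the clause appears to be honored for the initial snapshot, but behavior for subsequent changes is less clear.

```sql
-- Hypothetical names; the IN list is an illustrative assumption.
EXEC sp_addarticle
    @publication   = N'SalesPub',
    @article       = N'Sales',
    @source_owner  = N'dbo',
    @source_object = N'Sales',
    @filter_clause = N'SalesRegion IN (''US'', ''CA'')';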
January 27, 2015 at 1:16 am
Hugo Kornelis (1/26/2015)
Raghavendra Mudugal (1/26/2015)
Thank you for the post, Steve, very interesting one.
Row filters in transactional publications can add significant overhead because the article filter clause is evaluated for each log row written for a published table, to determine whether the row should be replicated. Row filters in transactional publications should be avoided if each replication node can support the full data load, and the overall data set is reasonably small.
any takers on the bold letters? (small as-in 100s?)... like replicating Master data (from the msdn example under static row filter) we add 100 products today and we replicate it and may be next day add another 50 or 75 in master and then replicate it...? if sales has millions of records added each day and then filtering on a specific region like it says.. "can add significant overhead". so performance wise is parameter row filter is better?
I'd say that the answer to your question is in the sentence before the bolded part: "if each replication node can support the full data load" (and though not explicitly mentioned, I would add the network to that).
If your network is saturated, or a subscriber does not have the storage space or processing power to handle the full load, then a row filter is a great way to reduce the load by filtering rows before they are published. But if your network and subscribers can handle the full load without issues, then the recommendation here is not to use a row filter, and instead to push all rows and then, optionally, delete rows or add indexes at the subscribers to keep the unneeded extra rows from impacting their performance.
I actually understood that; when there are publisher and subscriber roles, the network definitely comes into consideration (apart from a scenario where the publisher itself acts as a subscriber, which I can't think of). My question still remains unanswered.
"...can support the full data load" along with "and the overall data set is reasonably small" - given this, even though a subscriber has enough storage space and processing power*, CAN I take all the US-region data (say the subscriber is scheduled, or manually run, to pull data every 48 hours, and there are 300,000 new rows in Sales of which 125,000 belong to the US) by applying a static row filter on the Sales table? (Provided sufficient disk space and processing power, can I forget about the performance slowness caused by the filter overhead?)
* - processing power: cpu, ram, internet bandwidth, alternate power supply
ww; Raghu
--
The first and the hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.
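On the "is a parameterized row filter better?" part of the question above: parameterized row filters are a merge replication feature, where the filter clause references a function such as HOST_NAME() so each subscriber receives only its own partition. A hedged sketch, assuming a hypothetical merge publication SalesMergePub (the name and setup are illustrative, not from the question):

```sql
-- Hypothetical merge publication [SalesMergePub]; each subscriber's
-- HOST_NAME() value selects its own partition of [dbo].[Sales].
EXEC sp_addmergearticle
    @publication         = N'SalesMergePub',
    @article             = N'Sales',
    @source_owner        = N'dbo',
    @source_object       = N'Sales',
    @subset_filterclause = N'SalesRegion = HOST_NAME()';
```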
January 27, 2015 at 2:42 am
Thanks, learned something new today.
Thanks & Best Regards,
Hany Helmy
SQL Server Database Consultant
January 27, 2015 at 6:33 am
Thanks for the question.
Viewing 13 posts - 1 through 13 (of 13 total)