February 24, 2021 at 5:21 pm
Hi all,
I have a third-party database/application that I'm attempting to augment with functionality using Service Broker. The scenario is the following:
I am considering using Service Broker because I would like to make this whole process both asynchronous and real-time. Also, while I could probably use replication to accomplish this, I'd like to keep the footprint of this whole process as small as possible, and replication feels like a sledgehammer approach.
In order to have this function in real time, I'm considering using a trigger to handle the creation of the messages in the source Service Broker queue. Typically, I've avoided triggers like the plague. My hope is that by wrapping the code within the trigger in a TRY/CATCH block, I can prevent failures in the Service Broker message delivery from affecting the primary CRUD operation on the table. I also hope that the actual sending of the Service Broker message has a minimal impact on the system; I don't want the CRUD operations to start taking longer because of this process.
My basic plan is to have the CRUD operation perform an INSERT into audit table, and then send a message through service broker. At the receiving end, a similar process will happen - an INSERT will be performed into an audit table at the destination, and the message will be consumed. Finally, I'll have an audit report on a regular basis that will let me know whether there are any messages which were sent from the source and not consumed properly at the destination.
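Roughly, I expect the Service Broker plumbing for this to look something like the sketch below (all names are placeholders; the routes and dialog security needed for the cross-server hop are left out):

-- Hypothetical names throughout; a minimal sketch of the objects the plan above needs.
-- Source and destination pieces would normally live in different databases/servers,
-- with routes and dialog security configured between them (omitted here).
CREATE MESSAGE TYPE [//Audit/RowChange] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//Audit/Contract] ([//Audit/RowChange] SENT BY INITIATOR);

CREATE QUEUE dbo.AuditInitiatorQueue;   -- source side
CREATE QUEUE dbo.AuditTargetQueue;      -- destination side

CREATE SERVICE [//Audit/SourceService] ON QUEUE dbo.AuditInitiatorQueue ([//Audit/Contract]);
CREATE SERVICE [//Audit/DestService]   ON QUEUE dbo.AuditTargetQueue    ([//Audit/Contract]);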
Any insights are appreciated. Thanks!
February 24, 2021 at 9:13 pm
In my opinion, this is a good use of Service Broker. We have a VERY similar thing happening on our systems, but it is a two-way setup: server A sends messages to server B and B sends messages to A. This allows us to have near-real-time data from both systems, kept in sync for any calculations and logic, without the need for linked servers or cross-database communication.
Things to watch out for: conversation handles being open too long. One problem we ran into is that if a conversation handle is left open for too long, it can become slow. The handle is what ensures that messages arrive in order. We close the handles daily during company downtime and make new ones as soon as the old ones are closed. We also have logic built into the triggers to open a new conversation if none are currently open. We store open conversation handles in a table, and it results in a TINY table - 4 rows, as we have a few different processes that use Service Broker. Basically, you want one handle per set of ordered operations that need to be performed. A good example of "in order" operations would be if you were capturing your UPDATE as a DELETE and an INSERT. If you send across two messages for an UPDATE (one to delete and one to insert), you want to make sure the DELETE happens first.
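To make that concrete, the pattern looks roughly like this - table, service and contract names are made up for the example, and the real thing has a bit more error handling:

-- Hypothetical names; a sketch of the "store the open handle, reuse it" pattern described above.
CREATE TABLE dbo.OpenConversation
(
    ProcessName        sysname          NOT NULL PRIMARY KEY,
    ConversationHandle uniqueidentifier NOT NULL,
    OpenedAt           datetime2        NOT NULL DEFAULT SYSDATETIME()
);

-- Look up the current handle; open (and remember) a new one only if none exists.
DECLARE @handle uniqueidentifier;

SELECT @handle = ConversationHandle
FROM dbo.OpenConversation
WHERE ProcessName = N'AuditCopy';

IF @handle IS NULL
BEGIN
    BEGIN DIALOG CONVERSATION @handle
        FROM SERVICE [//Audit/SourceService]
        TO SERVICE   N'//Audit/DestService'
        ON CONTRACT  [//Audit/Contract]
        WITH ENCRYPTION = OFF;

    INSERT dbo.OpenConversation (ProcessName, ConversationHandle)
    VALUES (N'AuditCopy', @handle);
END;

-- Every message for this process is then sent on @handle, which keeps them in order.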
Some things to note about Service Broker: receiving and processing messages is by default a low-priority operation. What this means is that in the event of a deadlock, the Service Broker operation is going to be the victim. By default, poison message handling is turned on (and I recommend leaving it that way). What this does is, if a message repeatedly fails to be processed (five consecutive rollbacks by default), the queue will be disabled. This is to ensure that messages continue to be processed in the correct order.

There are two approaches you can take to deal with poison messages. One: let Service Broker kill the queue on bad messages and have some monitoring tool in place to alert you when a queue goes down - not an ideal solution, but a low-impact one (to the system; the impact on end users is high). Two: have the stored procedure that is handling the message toss the failed message into a table to be processed later/manually (or ignored) and keep working. It depends on what your workflow requires when a message fails to be processed. Where I work, having the queue go down is the preferred outcome when a message can't be processed, because we require that messages be processed in order to prevent accidental data loss, which would result in broken logic in the application. For something like an audit table, having the message stored to be processed later is probably the safer option, but at the same time, it should be very rare (if all it is doing is auditing) that a message fails to be processed.
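If you go the "park the bad message and keep working" route, the receiving procedure ends up looking something like this (very stripped down, hypothetical names, and transaction/END CONVERSATION handling left out):

-- Hypothetical objects; a sketch of the dead-letter approach instead of letting the queue be disabled.
CREATE TABLE dbo.AuditDeadLetter
(
    ReceivedAt   datetime2      NOT NULL DEFAULT SYSDATETIME(),
    MessageType  sysname        NULL,
    MessageBody  varbinary(max) NULL,
    ErrorMessage nvarchar(4000) NULL
);
GO

CREATE PROCEDURE dbo.ProcessAuditQueue
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle uniqueidentifier, @msgType sysname, @body varbinary(max);

    WHILE 1 = 1
    BEGIN
        SET @handle = NULL;

        WAITFOR
        (
            RECEIVE TOP (1)
                @handle  = conversation_handle,
                @msgType = message_type_name,
                @body    = message_body
            FROM dbo.AuditTargetQueue
        ), TIMEOUT 1000;

        IF @handle IS NULL BREAK;   -- queue drained for now

        BEGIN TRY
            IF @msgType = N'//Audit/RowChange'
                INSERT dbo.DestinationAudit (Payload)   -- hypothetical destination audit table
                VALUES (CAST(@body AS xml));
        END TRY
        BEGIN CATCH
            -- park the failure instead of rolling back, so the queue keeps processing
            INSERT dbo.AuditDeadLetter (MessageType, MessageBody, ErrorMessage)
            VALUES (@msgType, @body, ERROR_MESSAGE());
        END CATCH;
    END;
END;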
Another thing to note about Service Broker is that it is asynchronous and near-real-time. I say near real time because network congestion, waiting on CPU cycles, or processing on the destination can delay messages, but with my setup I am seeing less than a second of delay, and usually it is close enough that we consider it real-time. When we initially set it up, it worked great in the dev environment with maybe 10-15 messages per day, but when we hit the real world we saw performance issues and deadlocks galore. The problem: we were doing a new conversation per message. That is a BIG no-no with Service Broker and will tank your performance. Opening and closing a conversation is a slow operation. It may seem fast when you are only doing 1 or 2 messages, but when you start doing 70,000+ per day, you are going to hit performance issues. And yes, we transfer 70,000+ messages between the two systems per day, and after switching to the daily conversation cycle (close all existing conversations that are complete, then open new ones) we have had no notable performance issues on either system.
With the triggers that will be sending off the messages, make them as short and efficient as you can. If it seems like there is too much logic in the trigger, there probably is. The ones I built for Service Broker look at our conversation-handle table to get a handle to send the message on, then check whether that handle is still open. If it isn't, we open a new one and notify the DBA so they can look into it before performance issues pop up. Once we have an open conversation, we generate the XML for the message and hand it to Service Broker to send. On the receiving server, a stored procedure grabs the message and processes it. XML isn't the fastest thing to process, but if the message is short, it can be processed quickly. So the trigger is short and quick, and the stored procedure on the other side should be short and quick too.
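Stripped down, one of those triggers looks something like this (hypothetical table and column names; the TRY/CATCH is there for the reason you mentioned - a send failure shouldn't break the user's DML - though note that an error which dooms the transaction will still roll everything back):

-- Hypothetical names; a sketch of a short INSERT trigger that builds the XML and sends it.
CREATE TRIGGER trg_SourceTable_Insert_Send
ON dbo.SourceTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        DECLARE @handle uniqueidentifier, @xml xml;

        -- reuse the long-lived conversation from the handle table
        -- (the real trigger opens a new conversation here if @handle comes back NULL)
        SELECT @handle = ConversationHandle
        FROM dbo.OpenConversation
        WHERE ProcessName = N'AuditCopy';

        -- tiny per-statement message: just the inserted row(s) plus the action
        SET @xml = (SELECT 'INSERT' AS [@action], *
                    FROM inserted
                    FOR XML PATH('row'), ROOT('rows'), TYPE);

        SEND ON CONVERSATION @handle
            MESSAGE TYPE [//Audit/RowChange] (@xml);
    END TRY
    BEGIN CATCH
        -- log somewhere in the real version; the point is not to fail the original DML
        PRINT ERROR_MESSAGE();
    END CATCH;
END;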
The TL;DR version of this is: open and close conversations as infrequently as you can, as doing so is slow, and that is where your performance will tank and you will get calls that things are failing (deadlocks, for example). Also make sure to close conversations at regular intervals (daily on my system, but I've read of others who do it weekly... really, just whenever you have a downtime window). Closing and opening a conversation happens quickly, but it is not instant. And make sure to test everything in a test environment first so you are happy with how it works before going live.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
February 24, 2021 at 10:59 pm
That is some very useful info Brian - thanks! The setup I'm trying to create is indeed very similar to yours. Unfortunately since the system is a bit of a black box, I can't control exactly how stuff is being done, and the consequence of that is, like yours, we'll have a huge number of messages.
As an example - the system has a CSV import for data processing. A well-designed system would likely do something with a TVP or a bulk insert/update; this one, unfortunately, does a transaction per row in the CSV file. Since the file can have thousands of rows, a single import could result in thousands of messages being created in the queue.
Do you foresee that becoming a problem? I was thinking of maybe trying to do some sort of batch operation on processing of the message, but as you mentioned, I'd like to minimize the amount of logic being added to the trigger itself.
Also - this is a bit off-topic but the way the system handles INSERT/UPDATE is ... silly. It essentially does both an INSERT *and* an UPDATE - if the INSERT fails due to violating the primary key constraint, it follows up by doing an UPDATE. If the INSERT succeeds, it actually will then UPDATE the record and ... set it to itself, basically.
I'm considering having the trigger check if there is an INSERT message in the queue before running the UPDATE. But again - maybe I'm overthinking it. Having the INSERT and UPDATE both execute isn't the end of the world - even if the messages arrive out of order, the receiving end would simply have to have the proper logic (if the UPDATE shows up first, just ignore it, since the record doesn't exist. If the INSERT shows up first, then we do both the UPDATE and INSERT, which is inefficient, but not the end of the world).
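For illustration, the receive-side logic I have in mind would be something like this (hypothetical destination table; the action/id/payload would really be parsed out of the message):

-- Hypothetical names; a sketch of destination logic that copes with INSERT/UPDATE arriving in either order.
CREATE PROCEDURE dbo.ApplyChange
    @action  nvarchar(10),       -- 'INSERT' or 'UPDATE', parsed from the message
    @id      int,
    @payload nvarchar(max)
AS
BEGIN
    SET NOCOUNT ON;

    IF @action = N'UPDATE' AND NOT EXISTS (SELECT 1 FROM dbo.DestTable WHERE Id = @id)
        RETURN;                        -- UPDATE arrived before its INSERT: ignore it

    IF @action = N'INSERT' AND EXISTS (SELECT 1 FROM dbo.DestTable WHERE Id = @id)
        SET @action = N'UPDATE';       -- row already there: treat as an update instead

    IF @action = N'INSERT'
        INSERT dbo.DestTable (Id, Payload) VALUES (@id, @payload);
    ELSE
        UPDATE dbo.DestTable SET Payload = @payload WHERE Id = @id;
END;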
February 25, 2021 at 4:39 pm
I personally don't see it being a problem to have thousands of messages sent at once. I would still send them on a single conversation. The advantage of that is you don't need to worry about the order the messages are sent in (with the insert/update), as they will always arrive in order.
You could batch up the messages, but my approach has always been to send each message as soon as it is ready. The reason behind this is that small XML objects are a LOT easier and faster to process. I am not sure how much work you have done with XML in SQL before, but it can be a pain in the butt to get right. That is one of the big reasons I like short messages - I can run my query and verify the data easily. Now imagine you have a 1000+ line XML object to parse and a bug in your code that misses 1 or 2 lines. Eyeballing that to verify is painful. Drop that down to a 10-30 line XML document and it is MUCH easier.
With regard to your trigger, my approach would be to have three triggers - one for INSERT, one for UPDATE, and one for DELETE - and handle the logic in them separately, rather than building logic into one trigger to work out which operation it is. The main reason is that I like my triggers short and to the point. Alternately, you could go with the single trigger-to-handle-everything approach and just select everything from the INSERTED and DELETED tables into your XML message, then handle the processing of that on the destination side.
Basically, try to keep the logic and processing light on the source side and do all the heavy lifting on the destination side. If it takes 10 seconds to process a message on the destination, that is fine, as the messages will just queue up and get processed as the system can handle them. If it takes 10 seconds on the source side, you may end up with blocking or deadlocks and screw up your data import. Mind you, Service Broker shouldn't add 10 seconds to any processing unless your triggers are complicated, and it shouldn't take 10 seconds on the destination side unless you are doing a lot of lookups or calculations on the data before inserting it into your audit table. Service Broker should only add a small, near-zero fraction of a second to the processing (at least it did in my case).
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 1, 2021 at 8:52 am
Very helpful Brian, thanks. I have almost no experience with Service Broker, but we have 3rd party APPs/DBs that use it.
I have a 3rd party database whose data I want duplicated in my database (with some massaging and flattening). Both databases are on the same server (unlikely to move to a different server in future, though they might move to a different instance).
Like @kramaswamy, the 3rd party APP does some inefficient things. I'm not sure about RBAR on CSV import (it probably does that; I've never checked, but all its CSV imports are small), but all the CRUD on the main (many-column, many-child-table) tables does multiple updates, typically 5 or 6, on pretty much every INSERT and UPDATE (and I'm pretty sure they are not in a transaction 🙁 ). There is also a fair amount of NOLOCK in the 3rd party database.
Currently I have a process that runs every 5 minutes or so which queries the SourceDB for "changed rows" and upserts them to my database. This is a bit fraught because processing ParentTable first and then ChildTable may pick up Child rows for new Parent records that are not yet in my table ... The processing is sufficiently large that I have chosen not to do it as a single atomic set, in order not to block anything. Of course I will get the Parent "next time", but if I query on Modify Date I won't get the Child next time (if I don't take it this time, in violation of the FKey).
3rd party tables have ModifyDateTime on their rows, but they are not 100% reliable 🙁
My hope is that by wrapping the code within the trigger in a TRY/CATCH block, I can prevent failures in the Service Broker message delivery from affecting the primary CRUD operation on the table.
What happens when the Source fails to send a message? Is it not going to need a "refresh" of anything that has changed?
What happens until that is resolved? Presumably Foreign keys start to fail on the Destination?
Is there a reason to choose Service Broker over a batch Upsert?
I am not sure how much work you have done with XML in SQL before, but it can be a pain in the butt to get right.
Me too! The XML-hacking part is a negative for me (in terms of "DEV time to get it working smoothly")
How about an alternative of using a TRIGGER to just copy the row to a "Staging table" (in a database on the same server as the Source, so that it is fast and failsafe), and then have the Remote query the staging tables for anything newer? That would solve my problem that "Source database Modify Date is not 100% reliable" (i.e. I would be responsible for the Modify DateTime on my Staging table). I could choose to only store "Columns of interest" in the Staging Table, and only Upsert that (from the main table on Source) if there is actually a substantive difference (in the columns that I am interested in), to reduce "non-essential traffic".
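Something along these lines is what I'm picturing (made-up table and column names; the "substantive difference" check uses EXCEPT so NULLs compare sensibly):

-- Hypothetical names; a sketch of the staging-table trigger: columns of interest only,
-- my own timestamp, and rows copied only when something I care about actually changed.
CREATE TABLE Staging.SourceTable_Changes
(
    SourceId       int            NOT NULL,
    ColOfInterest1 nvarchar(100)  NULL,
    ColOfInterest2 decimal(18, 2) NULL,
    StagedAt       datetime2      NOT NULL DEFAULT SYSDATETIME()
);
GO

CREATE TRIGGER trg_SourceTable_Stage
ON dbo.SourceTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT Staging.SourceTable_Changes (SourceId, ColOfInterest1, ColOfInterest2)
    SELECT i.Id, i.ColOfInterest1, i.ColOfInterest2
    FROM inserted AS i
    LEFT JOIN deleted AS d ON d.Id = i.Id
    WHERE d.Id IS NULL                                        -- brand new row
       OR EXISTS (SELECT i.ColOfInterest1, i.ColOfInterest2
                  EXCEPT
                  SELECT d.ColOfInterest1, d.ColOfInterest2); -- changed in a column I care about
END;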
March 1, 2021 at 7:49 pm
Hello Kristen. To address your concerns: messages very rarely fail to send. Pretty much the only cases I can think of are that the destination is unreachable or the transaction is rolled back. And even with the destination being unreachable, the message will be sent as soon as the destination is back, assuming the conversation is still open. Service Broker is fairly resilient like that. That said, there may be cases where you want to re-send a message if it doesn't arrive. These can be a little trickier IF you don't keep an audit log. My system has a log of all messages sent and received that is kept for 1 week. If a message gets "lost" or processed incorrectly, end users are pretty quick to notify me, plus we have reports we can run to validate that the data is accurate between the two systems within a margin of error (messages in transit while running the report). In the event a message is processed incorrectly, I can validate the data that went across, and every time it has been down to a human-level process change that resulted in a bad message going across the pipe. Since these messages are saved, it is easy to validate that the messages are good (or not), and in the event a conversation gets closed prior to processing all of the messages, we have an audit log so we can re-send any unsent messages.
The advantage of Service Broker over a batch upsert is that it is done in near real time and done asynchronously. For us, the near-real-time part was the important one. Some logic in an in-house application relies on data inside the ERP system. Our options were: a linked server (the previous solution, but it was causing performance issues); a scheduled task to copy data over (this ran into problems with the data not being close enough to real time); CDC (which at the time would have required us to upgrade from Standard to Enterprise edition); drastic code changes on the C# side to pull data from both the ERP and the product database; or Service Broker to copy the relevant data from the ERP to the product database in a near-real-time manner. Of our options, Service Broker was the most cost-effective and lowest-risk approach.
XML is a pain in the butt to work with, but on the source side, you just grab all of the changed data and toss it into an XML object. THAT part is easy to do. It is on the destination side that things get tricky because now you need to strip apart that XML. Nice thing is, it is easy to build up some sample data to work with by having your triggers in place on the source side that are simply capturing the data and tossing it into a table without Service Broker sending anything. Do this for a short period of time with a trigger on a source table to capture what you are going to need to process, then write your query to handle it. Then as long as no metadata changes on the source database related to your tables that are in use by service broker, you will have no problems. With us, the biggest pain in the butt is when we upgrade the ERP, it drops all of the tables that have schema-level changes which means all of the triggers I made go poof. Easy to catch in the test system though during a test-upgrade, so no harm there and minimal problems after live is upgraded as my scripts to recreate the triggers are ready to rock.
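For what it's worth, once you have a sample message in hand, the destination-side shredding isn't too bad. Assuming a message shaped like the earlier trigger sketch (a rows/row structure with an action attribute, hypothetical columns), it's along these lines:

-- Assumed message shape; shredding it into columns on the destination side.
DECLARE @xml xml = N'<rows><row action="INSERT"><Id>42</Id><Name>Widget</Name></row></rows>';

SELECT  r.value('@action',   'nvarchar(10)')  AS RowAction,
        r.value('(Id)[1]',   'int')           AS Id,
        r.value('(Name)[1]', 'nvarchar(100)') AS Name
FROM    @xml.nodes('/rows/row') AS t(r);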
Using a trigger to capture data changes is another solution you could go with, but you run into the problem of how often are you going to query your source table to update the destination tables? Is this something that happens every hour or day or week or minute? How "real-time" do you need the data to be, and how frequently is it changing? If you only need hourly refreshes on the data (for example), then Service Broker is overkill. If you need near real-time, then Service Broker is helpful. Linked servers could work as well and just pull data from the source (presuming you can live with the potential performance hit). With Service Broker you can also pull across only the columns of interest.
The only problem I see with your trigger to just copy the row to a staging table is that that trigger is going to create blocking on the staging table which may affect the performance of the tool. Plus you are going to end up with duplication of data inside your database (the actual table and the staging table are going to hold some of the same data).
The approach I would take (if it is allowed... vendor tools don't always like you playing with their tables) would be to put an UPDATE trigger on the source table that updates the modify date of any rows that are updated. This would make your source database "modify date" more reliable. A column that is not reliable is (in my mind) useless so why bother keeping it as is?
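That trigger is only a few lines - something like this (hypothetical names; the guard stops it re-firing on its own UPDATE if recursive triggers are enabled):

-- Hypothetical names; a sketch of the modify-date-stamping trigger suggested above.
CREATE TRIGGER trg_SourceTable_StampModifyDate
ON dbo.SourceTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    IF TRIGGER_NESTLEVEL(@@PROCID) > 1
        RETURN;                      -- our own UPDATE below fired us again: bail out

    UPDATE t
    SET    t.ModifyDateTime = SYSDATETIME()
    FROM   dbo.SourceTable AS t
    JOIN   inserted AS i ON i.Id = t.Id;
END;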
And once you have a sample piece of XML to work with, it isn't too hard to make it bend to your will - especially with the process being described, as you are likely going to have a root tag, followed by the table, followed by the data, which is a pretty easy model to work with for XML. Our process is to have a conversation type for INSERT, DELETE, and UPDATE so we can process it that way. Alternately, you could have the operation encoded into the message itself.
To summarize - if I need near real time data and I can't use a linked server (performance issues, company policy, etc), then service broker is going to be the option I go with. I can pick exactly which tables and columns are handled by it and it will provide me with a near real time method of processing my data.
If near real time data isn't my goal, I am probably going to look at offloading the work to something like SSIS rather than triggers. Here, I can pull only the columns I care about and the only processing on the source database is going to be the initial data pull. Everything else I can process on the SSIS server or at the destination side. This way I have minimal impact on the source system.
Plus, since you already have Service Broker set up and working, you may be able to piggyback on the existing setup for sending messages across. You may need to make a few new objects depending on how things are set up, but a lot of the setup tasks are already completed.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 2, 2021 at 5:53 am
Thanks Brian ... but you are not selling it to me!
To do a proper job I would need the ability to resend a message, e.g. an audit log kept for a week, but then I need a means of detecting that something is missing, or rely on a user to spot it ... and I'd have the XML "untangling" to build too. I don't mean to be negative - your real-world description is very helpful to me in understanding what I would need to have in place in order to make a pukka implementation with Service Broker; for my use-case (personal skills, and no multi-server requirement), I think I'm better off with what I know best. That said, I will find a small project that would allow me to gain experience.
Using a trigger to capture data changes is another solution you could go with, but you run into the problem of how often are you going to query your source table to update the destination tables?
We started out querying the Source databases "live", but performance was poor. I don't know how the 3rd party APP gets good performance; their database layout doesn't suit me! For example, the APP deals with lots of regulatory stuff and has a child table for Dates related to a Record, some of which repeat (fair enough) but most of which can only have one value. Right PITA to have to JOIN to the Child table to get each individual date that is needed, rather than having a column in the parent table. So I want the data warehoused into a format more suitable for my purpose, and yes, you are right, "very near real time" would be better than "every 5 minutes" - particularly as at the 5-minute batch run I am likely to get some records that are half way through their multi-update change ...
trigger to just copy the row to a staging table is that that trigger is going to create blocking on the staging table which may affect the performance
Hadn't thought of that ... thanks.
you are going to end up with duplication of data inside your database (the actual table and the staging table are going to hold some of the same data).
Yup. However, that I am happy with. It's a trade-off, but whenever I have done it in the past it has been very robust ... however, maybe I should rethink that ...
put an UPDATE trigger on the source table that updates the modify date of any rows that are updated
Vendor is relaxed about interaction with their database (which is helpful), but I definitely wouldn't want to modify the data in one of their "admin" columns, nor add a column to their tables (too much hassle during an upgrade)
If Staging Table size became a problem perhaps I could have the staging table only contain Update Date and PKeys ... risk that I am just making a queue at which point (presumably) Service Broker would be the better weapon
A column that is not reliable is (in my mind) useless so why bother keeping it as is?
I reckon it's just a bug or two. It works most of the time, is probably only ever used for human auditing, and users are unlikely to have spotted that some updates fail to update the ModifyDate ...
We don't update our ModifyDates by Trigger for situations where we want to merge data from other copies and retain the source ModifyDate ... so I have some sympathy. (Depends on the definition of "Modified" of course ... )
when we upgrade the ERP, it drops all of the tables that have schema-level changes which means all of the triggers I made go poof
Do you have any tricks to help with that? My approach has been to use a naming convention that allows me to query objects (I think only Triggers, Indexes, Tables [normally put them in a different database for "safety"]) that are "ours" and then see if any go missing at any time.
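e.g. something like this (assuming everything "ours" carries a prefix - Kr_ in this made-up example), run before and after an upgrade so the two result sets can be compared:

-- A sketch of the naming-convention check: snapshot "our" objects so a before/after compare
-- shows anything a vendor upgrade silently dropped. The Kr_ prefix is just for illustration.
SELECT o.type_desc AS ObjectType, s.name AS SchemaName, o.name AS ObjectName
FROM   sys.objects AS o
JOIN   sys.schemas AS s ON s.schema_id = o.schema_id
WHERE  o.name LIKE N'Kr\_%' ESCAPE N'\'
UNION ALL
SELECT N'INDEX', s.name, i.name
FROM   sys.indexes AS i
JOIN   sys.objects AS o ON o.object_id = i.object_id
JOIN   sys.schemas AS s ON s.schema_id = o.schema_id
WHERE  i.name LIKE N'Kr\_%' ESCAPE N'\'
ORDER BY ObjectType, SchemaName, ObjectName;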
March 2, 2021 at 1:58 pm
Kristen,
In your scenario, I am not sure service broker is the route I would go either. The biggest part that would make me use a different solution is that there is no multi-server requirement. If it is all going to be contained in a single database/instance, I think that service broker is probably overkill. It can handle it for sure, but I think it may be more work than it is worth.
Something to note with Service Broker - as long as a message got onto the Service Broker queue and the transaction is not rolled back, it will try to send it to the destination. As long as the destination comes back online at some point, the message will be sent and processed. Our system of logging messages for a week and manually checking a report was more for proof that it worked initially; waiting on users to tell us it is broken is more for when they change processes on their end in a way that requires the message handling to be updated but nobody tells the DBA. We also have the Service Broker queues shut down on poison messages, and our SQL Monitor (tool from Redgate; I do not work for them, but I like their tools) checks how many Service Broker queues are online. It runs a check every 5 minutes and if the number drops below the accepted value (2 for most instances - one for sending, one for receiving - and 1 queue for instances that only send), then an alert goes out to the DBA team. We have had 0 issues with Service Broker in 2 years (implemented roughly 3 years ago). Initial implementation is where we learned about the slowness of opening and closing a conversation handle and had a boatload of deadlocks hitting our live ERP, causing our ERP support person extra work correcting things. Nothing was irrecoverable, it was just a pain in the butt.
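The check the monitoring tool runs boils down to something like the query below (the threshold it is compared against is per instance, as described above):

-- A sketch of the queue-count check: user queues still enabled for receive.
SELECT COUNT(*) AS OnlineQueues
FROM   sys.service_queues
WHERE  is_receive_enabled = 1
  AND  is_ms_shipped = 0;   -- ignore the built-in system queues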
In your scenario, where you are really just making a more accurate audit log inside an audit table, I think a trigger is going to be a better option for you than Service Broker. Or CDC (change data capture). CDC became available in Standard edition with SQL Server 2016 SP1. CDC may be overkill for what you are doing as well, but it can be nice if you are working with CDC-aware tools like SSIS and only need to move changes, or you only want to review changes. Alternately, temporal tables may be what you would like. Something to note - I have never set up or used CDC or temporal tables; I have just read about them, and they are interesting technology with a few good use cases. I am not sure if this is a good use case or if it is overkill. A trigger is probably going to be sufficient and the most beneficial in your scenario.
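Purely for illustration (again, not something I have run in production myself), the temporal-table flavour of "keep the history for me" looks roughly like this on a hypothetical table:

-- Hypothetical table; system-versioning keeps every prior row version in the history table
-- automatically, no trigger required.
CREATE TABLE dbo.AuditedThing
(
    Id        int           NOT NULL PRIMARY KEY,
    Payload   nvarchar(100) NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AuditedThing_History));

-- Point-in-time query against the live table plus its history:
SELECT *
FROM   dbo.AuditedThing
       FOR SYSTEM_TIME AS OF '2021-03-01T00:00:00';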
I suppose that a column like that, which works most of the time but has a few bad pieces of data, is not horrible. Having something that is "mostly right" is better than having it wrong all the time. And depending on database settings, you may run into trouble with the trigger approach too (if the option that lets triggers fire other triggers is turned on, for example, you may need to be careful about infinite loops). And merging data where you need to retain the modify date really does depend on the definition of "modified".
Tricks for updates that drop objects - I like to use DLM Dashboard (another Red Gate tool, but this one is free). Add the test database to DLM, do the upgrade on the test system, check DLM to see what changed. This will tell you if triggers, tables, indexes, etc. went away. From my experience, I have never seen a 3rd party tool's update drop a table or database that I have created, but objects that they create can change (I've seen columns added and removed, indexes added and removed, tables dropped and recreated, constraints added and removed... pretty much any object created by the 3rd party tool that you have customized MAY lose the customization during an upgrade). The other nice part about DLM Dashboard is that it will tell you exactly what changed. If a column datatype changed during an upgrade, you may not notice something like VARCHAR(31) changing to VARCHAR(50), but that may have some impact on your end users, who can now store an additional 19 characters in the field. Or you may see things that should alarm you, such as a column changing from a DECIMAL data type to MONEY or FLOAT for storing financial data. That should set off some alarm bells, and you may want to contact the software vendor to open a bug report so they can correct it.
The other thing about only worrying about your own created objects is that the software update MAY change some metadata that causes your triggers or stored procedures to perform poorly, as a trade-off for making their tool perform better. They may add or remove a default index because it doesn't help their tool, but it helped your custom stuff. In my opinion it is good to review what changed on the database side if you have time in your upgrade project. I say "if you have time" because the ERP upgrade we did made changes to over 10,000 objects, and trying to review those in DLM Dashboard caused it to have tons of page timeouts. In that scenario, SQL Compare was a better tool to investigate with, but it is a much more manual process and requires you to have two different versions of the database (before and after upgrade) to do the comparison (it can compare from a backup file). The nice thing about DLM Dashboard is that it is web based, so there is very little load on your local machine while the differences load. The nice thing about SQL Compare is that it is easier to filter out what you don't care about (unchanged objects, users/logins, etc). ApexSQL also has a comparable tool (ApexSQL Diff), and I am sure there are others out there.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
March 3, 2021 at 5:57 am
Thanks for taking the time to type that Brian, all very informative 🙂
Tricks for updates that drop objects - I like to use DLM Dashboard ...
For 3rd party database upgrades we restore a "Before" copy (nothing here that is massive) and have some queries that check schema differences (during evaluation/testing). For the tables we are warehousing I am definitely interested in any size/type changes 🙂 and new columns; for other tables which are covered by my SYNONYMS, I am interested in case they impact any existing code. And any new tables, in case I ought to be interested in them 🙂
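The core of those queries is just a couple of EXCEPTs against INFORMATION_SCHEMA (database names made up here); running it both ways round shows drops/changes and then additions:

-- A sketch of the before/after schema check: anything returned changed or disappeared in the upgrade.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM   SourceDB_Before.INFORMATION_SCHEMA.COLUMNS
EXCEPT
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM   SourceDB.INFORMATION_SCHEMA.COLUMNS;

-- Swap the two SELECTs to list columns that are new after the upgrade.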
I'll take a look at DLM
March 3, 2021 at 2:53 pm
In real-world enterprise implementations of SQL Server, data often ends up spread across multiple databases, and in many cases a single transaction will span more than one of them. Cross-database transactions like that bring performance and implementation issues. With SQL Service Broker, introduced in SQL Server 2005, you can implement that kind of cross-database work as asynchronous messaging instead.