There are a million and one uses for synonyms. There are at least that many uses for SSIS. The reality is, not all of those uses are good for you nor for your data nor for your database.
Recently I wrote an article about some good and some bad with synonyms. You can read that article from my work blog site here. Today, I just want to expand on that article a little bit. I glossed over some things pretty quick in that article that I though might be fun to explain with a little more detail.
The Bad – a little closer look
First, let’s take a look a little closer at the bad. More specifically, in the example I used, there was a vendor that created a synonym for everything. I really do mean everything. This is one case where using the word “literally” would be accurate. On the client database, I could run a quick query such as the following and see over 7300 synonyms.
[codesyntax lang="tsql"]
SELECT name AS SynonymName ,SCHEMA_NAME(schema_id) AS SchName ,create_date ,base_object_name FROM sys.synonyms;
[/codesyntax]
In the case of this client and this vendor, 7300+ synonyms is far too many. It just led to mass confusion. If you were to run that query, you might see something like the following image.
I added a “derived” column to show the total count of synonyms and the record name as it relates to that total. That is a crazy amount of synonyms. That just makes me think that somebody got into the krazy kool-aid one day, was bored and gave life to a synonym beast.
The Good – With more detail
On the flip side, in the aforementioned article, I talked about synonyms as a means to tweak performance in SSIS. Normally I would not tout a synonym as a performance tweak. So I want to caution that the performance gains are specific to SSIS and a handful of those millions of uses for SSIS.
Let’s just begin with a little bit of background. For that background, some pictures will be really handy. So here goes.
In the preceding image we see a very simple segment of a data flow.
The data source uses a sql command to fetch the data required for the data flow. In the beginning, it is very straight forward. You probably have some package lying around with something similar to this.
In the following image, we see what the SQL Command was for that data source circled in red in the previous image.
In the next image we see a slight tweak to the query. This time to include a reference to a table that is defined/obfuscated by a synonym.
At this point I can hear some of you saying, “Ok, I see what he is doing.” While many others are wondering why I just made the query more complex than the previous example.
Well as luck would have it, this change serves a couple of purposes. 1) The data has been staged in a separate database. That database has a different name in every environment (recall the aforementioned article). So the synonym minimizes code changes when deploying the package. 2) The synonym allows us to confirm that there is data in the stage table and that the data matches a ClientCode in the destination table. 3) Lastly, the synonym reduces my dataset which reduces memory requirements and also gets the data loaded faster (because it is smaller).
In addition to this minor tweak, I can also do something like the following.
In the preceding image, we see two datasources. Each datasource is subsequently sorted and then eventually joined. Much like the previous example, due to naming standards and an effort to try and minimize code changes during deployments, at least one datasource is pulling in too much data. The data is filtered down due to the Join transformation, but this is not very efficient.
Through the use of a synonym, the datasources can be reduced to a single datasource. This will eliminate the need for the Sort transformations and Join transformation. Removing those three transformations reduced memory requirements. And like the previous example, since we can trim down the number of records, the data flow will run a little bit faster too.
As You can see, the code is simple. It’s not a super TSQL trick or anything to add a synonym into an existing query. It just gets referenced like any other table. Once again, in this case, the synonym is pointing to a table in a staging database. That table has been loaded as a part of an ETL process and now needs to be manipulated a little bit through some other transformations and then inserted eventually into a “target” database.
Conclusion
As with tuning stored procedures or any TSQL, a similar technique was used here. Reducing the datasets to contain JUST the data that is needed for the flow. To facilitate that reduction in data to be just the essential data, I employed synonyms.
The reasons for using a synonym in this case were to: 1) restrict data to precisely what was needed, 2) ensure data being loaded was “constrained” by data in the destination table (e.g. only load for a specific client that does exist), and 3) minimize code changes during deployments.
When dealing with databases that serve the same purpose but follow some absurd naming standard that changes the name between environments, it can become cumbersome to maintain code during deployments. This is particularly true when dealing with cross database joins or lookups.