Since I designed the Enhanced Threading Framework (ETF) I am always looking to improve it. Status quo is not for me. I decided to take on the 'queue' part of the framework. This is the part that uses a SQL Server table to manage the list of work to be done. I have used the Framework in two ways. One way creates a SQL Server table out of a grouping of other tables, which are actually sql statements themselves. The second way I use the Framework is to hold a list of external files that I then import into a SQL Server table. To do this, I use a Script Component to scan a bunch of directories and push that list into a SQL Server table. It was this second use that got me thinking: "Why do I need to push this list into a SQL Server table? How else can I use this list as a queue?". It took me a few weeks of spare time, but I believe I have found a way: Named Pipes.
This post will show you the changes necessary to implement the new Named Pipe version of the ETF. If you are unfamiliar with the framework, please look here before continuing on.
I created a simple test to see if the process would actually work. I have a text file which contains the full path of 214,880 files. This file list is provided by the File Source Script Component and consumed by the engine Script Component.
Below is an overview of what my test looks like. I have created 4 engines and a File Source.
A standard For Loop is used with each engine. I use a variable, engine_0_isMore, for engine 0 which checks to see if the variable ==1. If it does, there is work to be done, cycle again. Each engine will have its own variable, just like in the ETF.
The File Source opens up a file (in this case it is hard coded, but can easily be replaced by a package variable) and then creates a pipe. The first thing you have to do to even include the pipe library is convert your script from the .Net 2.0 to .Net 3.5. After you open your script up, on the right side in the property explorer you will see a long and weird looking name, something like st_a925f581915d460e8b2e5c29095299a1. Right-click on that name and select properties. It will look like this:
This will enable you to include System.IO.Pipes. Here is the File Source listing.
The code will provide one line of text every time a process connects to the pipe. Only one process is allowed to connect to the pipe at a given time, so we have our Queue. This does differ from the SQL Server table in that only one connection at a time is allowed. The table based implementation allows for any number of concurrent connections. You might be thinking; "Why would I want to implement this seemingly limited version?". I am looking to trade round trip SQL Server calls for local in-memory calls. Is it faster? Not sure as of yet, because I haven't finished my timing tests. Never the less, on to the client portion!
The engine connects to the pipe which then sends a string. Remember the Engine must also be .Net 3.5. All of the Script Components must be .Net 3.5 to take advantage of the Named Pipes.
The string that is pulled from the pipe is then saved to a package variable, for engine 0 it is Engine_0_msg. I kind of cheat in the Data Flow Task by not providing a Data Source. If you look I have a Derived Column that takes, in the case of Engine 0, Engine_0_msg and passes that down the stream. The Derived Column object will yell at you saying there is no input columns, but I don't care, I am using the variable. I neat trick.. Since I am only testing how the queue works, I dump the stream. I have provided the source code for the project in the downloads section. The file name is SSIS_NamedPipe_Queue.zip