Data Archival - text, XML, Excel, DB output

  • Hi All,

    I would like to know your thoughts and ideas on a design I am working on. I have been asked to design and develop a replacement for the DTS packages we currently use; SSIS is not an option. Here are some details about the problem:

    Environment: SQL Server 2008

    Problem Description:

    1. There are about 50 DTS packages (and the number may grow) that were created in 2000 to move data from tables to either CSV or XML files, and occasionally to another database’s table.

    2. About 30 of these follow this pattern:

    a. Get parameters, e.g. date, status, directory name, file name.

    b. Pass the parameters from Step 2a to a stored procedure (SP), get some output parameters back, and pass them to the next statement.

    c. Select all records from a table that meet the above criteria, so the query looks like:

    SELECT * FROM TABLE_NAME WHERE Col1 <= @Date AND Col2 = @status

    d. Write the records from Step 2c to a CSV file using the DTS text file destination.

    e. Call an SP to remove the above records from the table, passing the parameters from Step 2a.

    Steps 2a through 2d are repeated for each table, writing a new CSV file per table. One DTS package may have just one table, another as many as 15. (A sketch of this pattern appears after this list.)

    3. Another 10 follow this pattern:

    a. Get parameters, e.g. date, directory name, file name.

    b. Pass the above parameters to an SP that returns XML.

    c. Using an ActiveX script, write this XML to an XML file.

    d. Call another SP to do some cleanup of data in the table.

    Again, Steps 3a through 3d are repeated for each table, writing a new XML file per table. Same thing here too… the number of tables referenced depends on the DTS package.

    4. Another 5 do this:

    a. Get parameters, e.g. date, directory name, file names.

    b. Pass the parameters from Step 4a to an SP, get some output parameters back, and pass them to the next statement.

    c. Execute a SQL statement with the above parameters.

    d. Write the output from Step 4c to a text file with # as the delimiter.

    e. Read the text file written in Step 4d.

    f. Connect to another database and dump that data to a table.

    g. Call a set of SPs in the new database.

    5. The last 5 are one-offs that do unique tasks.
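
    To make the common case concrete, here is a minimal T-SQL sketch of one table's cycle in pattern 2; usp_GetArchiveParams, ArchiveTable, and usp_PurgeArchived are made-up names, not our real objects:

    -- Hypothetical sketch of one table's archive cycle (pattern 2)
    DECLARE @Date   DATETIME,
            @Status INT;

    -- Step 2a/2b: get the run parameters back from an SP as output parameters
    EXEC dbo.usp_GetArchiveParams @Date = @Date OUTPUT, @Status = @Status OUTPUT;

    -- Step 2c: select the records to archive
    SELECT *
    FROM dbo.ArchiveTable
    WHERE Col1 <= @Date
      AND Col2 = @Status;
    -- Step 2d: the result set is written to a CSV file by the package

    -- Step 2e: remove the archived records
    EXEC dbo.usp_PurgeArchived @Date = @Date, @Status = @Status;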

    Solution:

    I have thought of a solution using SQLCLR and a few tables to support it. (A rough DDL sketch of these tables appears after the list below.)

    1. Tasks

    a. Id - Identity

    b. Name

    c. OutputType – I have a mapping table between tasks and output types because in a few cases one task can have both a CSV and an XML output.

    2. Steps

    a. Id - Identity

    b. TaskId

    c. OrderOfOperation

    d. CommandType (stored procedure or text) – used to set the command type in the C# code.

    e. SPorText – holds the name of the SP or the SQL text that needs to be executed.

    f. StepOutput – (datetime, string, bool, dataset) – the expected output of this step.

    3. Parameters

    a. Id - Identity

    b. Name – (@date, @fileName) – the exact name of the SP parameter that is passed for a step.

    c. Value – value of the parameter; can be null too.

    d. SQLDataType – data type of the parameter.

    e. Length – for varchar, char, etc.

    f. Direction – whether it is Input, Output, InputOutput, or ReturnValue.

    g. StepId – to link it to the step it belongs to.

    4. Output types

    a. Id

    b. Name – CSV, XML, EXCEL, DB

    5. Queue

    a. Id – Identity

    b. TaskId – the task that has to be executed.

    c. Status – whether it is queued, complete, or in error.
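
    A rough DDL sketch of the above; all types and sizes are first guesses, not a finished design:

    -- Rough sketch of the metadata tables described above; types are assumptions
    CREATE TABLE dbo.Tasks (
        Id   INT IDENTITY(1, 1) PRIMARY KEY,
        Name VARCHAR(128) NOT NULL
    );

    CREATE TABLE dbo.OutputTypes (
        Id   INT IDENTITY(1, 1) PRIMARY KEY,
        Name VARCHAR(16) NOT NULL            -- CSV, XML, EXCEL, DB
    );

    -- Mapping table, since one task can produce more than one output type
    CREATE TABLE dbo.TaskOutputTypes (
        TaskId       INT NOT NULL REFERENCES dbo.Tasks (Id),
        OutputTypeId INT NOT NULL REFERENCES dbo.OutputTypes (Id),
        PRIMARY KEY (TaskId, OutputTypeId)
    );

    CREATE TABLE dbo.Steps (
        Id               INT IDENTITY(1, 1) PRIMARY KEY,
        TaskId           INT NOT NULL REFERENCES dbo.Tasks (Id),
        OrderOfOperation INT NOT NULL,
        CommandType      VARCHAR(20) NOT NULL,   -- 'StoredProcedure' or 'Text'
        SPorText         NVARCHAR(MAX) NOT NULL, -- SP name or SQL text
        StepOutput       VARCHAR(20) NULL        -- datetime, string, bool, dataset
    );

    CREATE TABLE dbo.Parameters (
        Id          INT IDENTITY(1, 1) PRIMARY KEY,
        StepId      INT NOT NULL REFERENCES dbo.Steps (Id),
        Name        VARCHAR(128) NOT NULL,       -- e.g. @date, @fileName
        Value       NVARCHAR(4000) NULL,
        SQLDataType VARCHAR(30) NOT NULL,
        Length      INT NULL,                    -- for varchar, char, etc.
        Direction   VARCHAR(12) NOT NULL         -- Input, Output, InputOutput, ReturnValue
    );

    CREATE TABLE dbo.Queue (
        Id     INT IDENTITY(1, 1) PRIMARY KEY,
        TaskId INT NOT NULL REFERENCES dbo.Tasks (Id),
        Status VARCHAR(10) NOT NULL DEFAULT 'Queued'  -- Queued, Complete, Error
    );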

    Let’s call the main SP MoveData. When MoveData is called, it:

    • Looks in the Queue table and gets a Task.

    • Gets the Steps for that TaskId from the Steps table.

    • For each Step, gets the parameters from the Parameters table.

    • Executes each Step.

    • Writes the file.

    • Calls any additional steps that are to run after the file is written.

    • Cleans up by closing all connections, etc., and exits. (The metadata queries behind this loop are sketched below.)
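
    Something like this; the looping and the file writing would live in the C# SQLCLR code:

    -- Sketch of the metadata queries behind MoveData, against the tables above
    DECLARE @QueueId INT, @TaskId INT, @StepId INT;

    -- Pick up the next queued task
    SELECT TOP (1) @QueueId = Id, @TaskId = TaskId
    FROM dbo.Queue
    WHERE Status = 'Queued'
    ORDER BY Id;

    -- Steps for the task, in execution order
    SELECT Id, CommandType, SPorText, StepOutput
    FROM dbo.Steps
    WHERE TaskId = @TaskId
    ORDER BY OrderOfOperation;

    -- Parameters for one step; the C# code binds these to SqlParameters
    -- (@StepId is set inside the loop over the Steps result set)
    SELECT Name, Value, SQLDataType, Length, Direction
    FROM dbo.Parameters
    WHERE StepId = @StepId;

    -- Mark the task done ('Error' in the catch block instead)
    UPDATE dbo.Queue SET Status = 'Complete' WHERE Id = @QueueId;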

    Suggestions? I am not happy with this solution, but it is what I could come up with in a week.

    I am comfortable working with C# and T-SQL.

  • I wouldn't bother with the current design; start from scratch: what are the input tables and values, and what is the expected result?

    Don't worry about the current DTS design since SS2K8 has a lot more tools available to do the task (e.g. I don't think you'll need an ActiveX script to write an XML file).

    Try to design a set-based solution that covers most of your steps with as little effort as possible. Most probably, dynamic SQL is the way to go here.
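
    For example, something along these lines, with placeholder names (only the object name is spliced into the string; the values stay parameterized):

    -- Hypothetical example of the dynamic SQL approach; names are placeholders
    DECLARE @TableName SYSNAME = N'dbo.SomeArchiveTable',
            @Date DATETIME = '20100101',
            @Status INT = 1,
            @sql NVARCHAR(MAX);

    SET @sql = N'SELECT * FROM ' + QUOTENAME(PARSENAME(@TableName, 2))
             + N'.' + QUOTENAME(PARSENAME(@TableName, 1))
             + N' WHERE Col1 <= @Date AND Col2 = @Status';

    EXEC sys.sp_executesql @sql,
         N'@Date DATETIME, @Status INT',
         @Date = @Date, @Status = @Status;

    -- And SS2K8 can emit the XML itself, no ActiveX needed:
    -- SELECT * FROM dbo.SomeArchiveTable FOR XML PATH('row'), ROOT('rows');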

    Personal note: from my point of view the question is a little beyond the forum scope. If you're unsure about the best design concept you might want to get a consultant in for a day or two.



    Lutz
    A pessimist is an optimist with experience.


  • I think I should have been clearer about what I am trying to do.

    Overall this is what happens:

    1. Execute an SP and get a result set back - this has all the parameters.

    2. Do a SELECT * FROM table_name using the above parameters.

    3. Write the output of Step 2 to a file as either CSV or XML.

    4. Call another SP that deletes data from table_name.

    It is a straightforward thing, but there are some variables (e.g. Step 1 can be split into many steps, each of which calls some SP, sets some data somewhere, and returns a result that is used in the query in Step 2).

    I don't think any set-based operation is required. All I really need to do is:

    select * from table_name where col1 = something and col2 = somethingElse

    Write the result to an output file in a given location.

  • ASDL (5/16/2010)


    I don't think any set-based operation is required. All I really need to do is:

    select * from table_name where col1 = something and col2 = somethingElse

    Write the result to an output file in a given location.

    Looks like you have it all figured out, so what's the problem? Where is the question? 🙂

    _____________________________________
    Pablo (Paul) Berzukov

  • Yes, I got it.

    I think I over-engineered it.

    I am going to scrap all that and create one SQLCLR procedure which accepts a set of parameters, queries the table, and hands the output to a function (within the SQLCLR assembly) that writes it to a file.

    That is going to be my solution for now, but what I would like to be able to write is a SQLCLR procedure like

    int WriteCsvFile(DataTable table), which could be called from any T-SQL SP that needs to write a CSV file.

    SQLCLR does not allow table-valued parameters, so I will need to work around that. Moreover, T-SQL SPs that accept TVPs only allow a specific table type, which has to be defined before it can be used in the SP.

    I have a new problem now: what is the best way to pass a table to a SQLCLR procedure?
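
    One workaround I am considering is to pass the query text instead of the table itself. Roughly, the call shape would be this (dbo.WriteCsvFile is hypothetical; I still have to write it):

    -- Hypothetical call shape; dbo.WriteCsvFile is the CLR proc to be written.
    -- Instead of a TVP, pass the query text and let the CLR proc run it on the
    -- context connection and stream the rows out to the file.
    EXEC dbo.WriteCsvFile
         @query = N'SELECT * FROM dbo.ArchiveTable WHERE Col1 <= ''20100101'' AND Col2 = 1',
         @path = N'\\server\archive\ArchiveTable.csv';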

  • OK, take two. Let me try again!

    I wanted to get your opinion on whether this methodology will work for huge databases. I may have 12 million records in the table that I am reading from to create a CSV file, and may have to write 10 million rows to a CSV file at a time. (The previous message has more implementation details.)

    1. Will SQLCLR be able to handle that much sequential data, given that each row has to be written to the CSV file one at a time? I plan on using a SqlDataReader.

    2. Do I need to worry about procedure timeouts if it is SQLCLR? Could timeouts and crashes bring down the whole SQL Server?

    3. Is using Service Broker a better solution? Would it not run outside the SQL Server context and require round trips to the database to get the data rows? Also, would a crash then not affect SQL Server?

    Thanks,

    ASDL

  • Have you considered using the bcp utility?

  • Yes, I did. I will not be able to use it, as the data has to be written in a specific CSV format.

    Some fields will be in quotes, some not - and this is configurable.

  • I see...

    I'm guessing you have a config table with ColID, IsQualified, and Qualifier columns, and within the CLR you'll be doing a lookup by ColID?

  • brucla (8/26/2010)


    I see...

    I'm guessing you have a config table with ColID, IsQualified, and Qualifier columns, and within the CLR you'll be doing a lookup by ColID?

    I don't understand.

    Are you referring to how I am passing the table parameter to the SQLCLR procedure?

    I am not passing a TVP (I think there is no way to do that); I write the contents to a table and pass the ID to the SQLCLR procedure, which in turn builds a query using those parameters and writes the data to a file.

    For adding quotes to column values, I have a static lookup table which tells me which columns get quotes and other formatting.
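
    To illustrate the idea (a simplified sketch, not our actual DDL):

    -- Simplified sketch of the two pieces, not the actual DDL.
    -- Parameters for one export run; the run Id is what gets passed to the CLR proc.
    CREATE TABLE dbo.ExportRunParameters (
        RunId      INT NOT NULL,
        ParamName  VARCHAR(128) NOT NULL,
        ParamValue NVARCHAR(4000) NULL,
        PRIMARY KEY (RunId, ParamName)
    );

    -- Static per-column formatting rules for the CSV output
    CREATE TABLE dbo.ColumnFormat (
        TableName   SYSNAME NOT NULL,
        ColumnName  SYSNAME NOT NULL,
        IsQualified BIT NOT NULL,      -- wrap the value in the qualifier?
        Qualifier   NCHAR(1) NULL,     -- e.g. a double quote
        PRIMARY KEY (TableName, ColumnName)
    );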

  • If you have a static table for qualifiers, you can create a script that selects the fields with the delimiter and qualifiers:

    SELECT dbo.fn_GetFieldWithDelimiter(Column1, 'Column1') + dbo.fn_GetFieldWithDelimiter(Column2, 'Column2') + ...

    FROM DataTable

    The function fn_GetFieldWithDelimiter() takes a column value and a column name, then does a lookup on the static qualifier config table. It returns the column value wrapped in its qualifier, followed by the delimiter.

    e.g. SELECT @delimiter = Delimiter, @qualifier = Qualifier FROM dbo.QualifierConfigTable WHERE ColumnName = @columnName

    RETURN @qualifier + @columnValue + @qualifier + @delimiter

    There are a few things, like dealing with NULL values, that should be added.

    And you would still have to do a little cleanup, like removing the last delimiter.

    Your SELECT statement returns something like this...

    "John","Smith",1234

    "Jim","Johnson",9876

    ...

    Then just write the result to a file using bcp. This way you can configure the qualifier and delimiter and write to the file as a set.

    I did this with fixed width and it works great (my function obviously is written to handle fixed width).
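
    For reference, a fuller sketch of such a function; I am assuming a QualifierConfigTable with ColumnName, Qualifier, and Delimiter columns, so adjust to your actual config table:

    -- Sketch only; QualifierConfigTable's shape is an assumption
    CREATE FUNCTION dbo.fn_GetFieldWithDelimiter
    (
        @columnValue NVARCHAR(4000),
        @columnName  SYSNAME
    )
    RETURNS NVARCHAR(4000)
    AS
    BEGIN
        DECLARE @delimiter NVARCHAR(5),
                @qualifier NVARCHAR(5);

        SELECT @delimiter = Delimiter,
               @qualifier = Qualifier
        FROM dbo.QualifierConfigTable
        WHERE ColumnName = @columnName;

        -- NULLs become empty fields instead of nulling out the whole row
        RETURN ISNULL(@qualifier, N'')
             + ISNULL(@columnValue, N'')
             + ISNULL(@qualifier, N'')
             + ISNULL(@delimiter, N'');
    END;

    The result of the SELECT can then be pushed to a file with bcp in queryout mode, e.g. bcp "SELECT ..." queryout out.csv -c -T -S yourserver (the file name and server are placeholders).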
