Schema comparision

  • Hi

    I m integrated the mvc application with SSIS package. I m sending the excel file from mvc application to SSIS package. I need to transform excel file data to database table. I need to compare the excel file schema with database table schema. If both does not matches then I need to provide validations that mismatch of columns(say "Name" column in excel doesnot exists in database table). How can I do this. Please suggest.

  • where is the comparison going to occur? in a script task , likecomparing rows in a datatable DataTable to DataTable, for example?

    on the SQL server so you can use the EXCEPT or INERSECT operators?

    wherever it occurs, you need to shape the excel data to a common format to what you pull from SQL so that you can compare them.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • gajayku (7/2/2014)


    Hi

    I m integrated the mvc application with SSIS package. I m sending the excel file from mvc application to SSIS package. I need to transform excel file data to database table. I need to compare the excel file schema with database table schema. If both does not matches then I need to provide validations that mismatch of columns(say "Name" column in excel doesnot exists in database table). How can I do this. Please suggest.

    Your SSIS package will error out unless the Excel file is in the exact format it is expecting. Unless

    a) You check the format before invoking the package.

    b) You set 'delay validation' to true in the package (against the Excel data source) and perform the validation before the dataflow is invoked, probably in a Script Task.

    Either way should work, though the second option is possibly better - you can do all of your logging and error handling in one place.

    By the way, 'schema' means more than just 'column name' - are you intending to check data types too?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi

    Thanks for quick response. Let assume that I have schema of both source and destination in the following formats.

    In excel File, I have columns (Id, Name, Salary) and in database I have columns (Id,FullName,Sal). If this is the schema then I want to throw error message to user that "Name,Salary columns mismatched. Please change the Source and retry" instead of saving the data.

  • gajayku (7/2/2014)


    Hi

    Thanks for quick response. Let assume that I have schema of both source and destination in the following formats.

    In excel File, I have columns (Id, Name, Salary) and in database I have columns (Id,FullName,Sal). If this is the schema then I want to throw error message to user that "Name,Salary columns mismatched. Please change the Source and retry" instead of saving the data.

    Option a) sounds better, in that case. SSIS is not designed to be an interactive solution.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • gajayku (7/2/2014)


    Hi

    Thanks for quick response. Let assume that I have schema of both source and destination in the following formats.

    In excel File, I have columns (Id, Name, Salary) and in database I have columns (Id,FullName,Sal). If this is the schema then I want to throw error message to user that "Name,Salary columns mismatched. Please change the Source and retry" instead of saving the data.

    there's more to a schema than just names; data types, sizes, nullability should all be in there too.

    you also need to check for columns that exist int he source, but not in the Excel(Id, Name, Salary) is not Equal to SQL(Id, Name, Salary,DateOfHire)

    if you just have a list of column names, you can just query SQL directly and make a decision based on whether they match or not, i guess.

    it also depends on if you Excel has the column definitions across the top. or as rows of data. that affects how you are going to compare the data.

    if you were just querying SQL, something like this might get you started:

    ;WITH MyCTE([ExcelColName])

    AS

    (

    SELECT 'Id' UNION ALL

    SELECT 'Name' UNION ALL

    SELECT 'Salary'

    )

    SELECT [ExcelColName] FROM MyCTE

    EXCEPT

    SELECT name from sys.columns WHERE object_name(object_id) = 'SQLTable'

    UNION ALL

    SELECT name from sys.columns WHERE object_name(object_id) = 'SQLTable'

    EXCEPT

    SELECT [ExcelColName] FROM MyCTE

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Excel file depends on user input. I cannot say what type of columns exists in the excel file. I can get the columns at run time.

  • gajayku (7/2/2014)


    Excel file depends on user input. I cannot say what type of columns exists in the excel file. I can get the columns at run time.

    SSIS depends on static formats. It is starting to sound like you need to consider whether it is the right tool for this job.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hello,

    My question may have confused you. I want to make it more clear. here is the flow:

    I m creating a Excel source with static columns and database with static columns (say Id,Name,Salary) in SSIS package(when I run the package it is working fine). Now, I created an MVC application which provides an option to user to upload an excel file. Then user can upload any excel file( that can be of any format). When user uploads the file I m sending this excel file to SSIS package which I created previously. As this excel(which is uploaded by user), may or may not match with the schema of the excel file of SSIS package. If it does not match then I need to throw custom exception to user that these specific columns does not match.

  • If the users can upload any Excel file then I don't see how you can have a static Excel Source.

    It is possible to create a dynamic Excel destination through the use of OPENQUERY and accessing the INFORMATION_SCHEMA views, but I don't know how to achieve what you are trying to do nor whether it's actually possible.

    Regards

    Lempster

  • Lempster (7/3/2014)


    If the users can upload any Excel file then I don't see how you can have a static Excel Source.

    It is possible to create a dynamic Excel destination through the use of OPENQUERY and accessing the INFORMATION_SCHEMA views, but I don't know how to achieve what you are trying to do nor whether it's actually possible.

    Regards

    Lempster

    I was thinking that this would be the process:

    1) User uploads Excel file

    2) Format of Excel file is validated

    a) If fail, error feedback to user and return to step 1 / exit process

    3) Copy Excel file to location expected by package

    4) Invoke package

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (7/3/2014)


    I was thinking that this would be the process:

    1) User uploads Excel file

    2) Format of Excel file is validated

    a) If fail, error feedback to user and return to step 1 / exit process

    3) Copy Excel file to location expected by package

    4) Invoke package

    Using a Script Task (for Step 2) you mean? If Delay Validation is set to False it would error out, but would not be able to provide a nice message to the user informing them what was wrong.

    If the name of the Excel file is known, it might be possible to dump out all the column names into a table by using OPENQUERY to connect to the Excel file as if it were a table. The comparison could then be made between the two tables (source and destination) using only TSQL which would be a lot easier. That may only work if the Excel file has only one sheet containing data though.

    Regards

    Lempster

  • Using a Script Task (for Step 2) you mean? If Delay Validation is set to False it would error out, but would not be able to provide a nice message to the user informing them what was wrong.

    If the name of the Excel file is known, it might be possible to dump out all the column names into a table by using OPENQUERY to connect to the Excel file as if it were a table. The comparison could then be made between the two tables (source and destination) using only TSQL which would be a lot easier. That may only work if the Excel file has only one sheet containing data though.

    No - I was assuming that Step 2 would be accomplished before calling the package, via some nifty MVC code 🙂

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply