Taking a single 80 character piece of Text and converting to columns in a table.

  • roger.plowman wrote:

    Andrew Goffin-302397 wrote:

    Yes, I totally agree unfortunately that is not possible to have the format changed ...

    Sometimes we have to work with what we are given ....

    🙂

    Actually, you can cheat a little. 🙂

    Nothing says you can't write a little program in a .NET language to translate the fixed format into a more SQL friendly intermediate file format, and then import the intermediate file instead of the original. You can even automate it: feed the original file into a file folder, have a scheduler run the intermediate file translator to create the intermediate file(s) in a different folder, and then have a scheduleded T/SQL task that imports the intermediate files.

    My company takes orders in whatever format the customer wants to give it to us in and we then translate it to an intermediate format, kind of like how .NET translates source code into its own intermediate format.

    Works VERY well, especially since from a database POV you only need staging files for your own format!

    Just asking nicely and not wanting to undermine your hard work , but why do that when we have BCP, Bulk insert, SSIS . so many times I have tried to re-invent the wheel and made it hard for myself. Just curious

    MVDBA

  • roger.plowman wrote:

    Andrew Goffin-302397 wrote:

    Yes, I totally agree unfortunately that is not possible to have the format changed ...

    Sometimes we have to work with what we are given ....

    🙂

    Actually, you can cheat a little. 🙂

    Nothing says you can't write a little program in a .NET language to translate the fixed format into a more SQL friendly intermediate file format, and then import the intermediate file instead of the original. You can even automate it: feed the original file into a file folder, have a scheduler run the intermediate file translator to create the intermediate file(s) in a different folder, and then have a scheduleded T/SQL task that imports the intermediate files.

    My company takes orders in whatever format the customer wants to give it to us in and we then translate it to an intermediate format, kind of like how .NET translates source code into its own intermediate format.

    Works VERY well, especially since from a database POV you only need staging files for your own format!

    This is a tidy approach, but I'd have concerns, from an audit point of view, about data having been manipulated before it even hits the database. Potentially makes that SOX audit even more painful.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin wrote:

    This is a tidy approach, but I'd have concerns, from an audit point of view, about data having been manipulated before it even hits the database. Potentially makes that SOX audit even more painful.

    Amen to that... especially since it's just not needed in this case.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • MVDBA (Mike Vessey) wrote:

    roger.plowman wrote:

    Andrew Goffin-302397 wrote:

    Yes, I totally agree unfortunately that is not possible to have the format changed ...

    Sometimes we have to work with what we are given ....

    🙂

    Actually, you can cheat a little. 🙂

    Nothing says you can't write a little program in a .NET language to translate the fixed format into a more SQL friendly intermediate file format, and then import the intermediate file instead of the original. You can even automate it: feed the original file into a file folder, have a scheduler run the intermediate file translator to create the intermediate file(s) in a different folder, and then have a scheduleded T/SQL task that imports the intermediate files.

    My company takes orders in whatever format the customer wants to give it to us in and we then translate it to an intermediate format, kind of like how .NET translates source code into its own intermediate format.

    Works VERY well, especially since from a database POV you only need staging files for your own format!

    Just asking nicely and not wanting to undermine your hard work , but why do that when we have BCP, Bulk insert, SSIS . so many times I have tried to re-invent the wheel and made it hard for myself. Just curious

    Because frankly the orders are SO VERY DIFFERENT. 🙂

    They range the gamut from CSV, SDF, XML, and Excel (both .xls and .xlsx) that it's just simpler to convert them to a common format before looking at them.

    The conversion program is also a great place to do sanity checking, it's amazing how a customer can screw up an agreed upon format. If a file doesn't match the expected format we can kick it out and send them a nasty gram. 🙂

    Once in the common format we suck it into staging tables and from there do validation checks and the rest. Doing it this way lets us leverage all the advantages of .NET code as well as T/SQL. Reinventing the wheel does often suck, but then again flexibility is often lost with canned solutions.

    YMMV

  • amen to customers screwing up CSV files, adding columns etc.  🙂

    and I get the  difference in import format files (csv, XML etc) and that you have to create new packages for each import if you use SSIS or BCP or bulk insert, how does C# help surely you must have a config file or table that is pretty much the same as an SSIS package?

    or is there way too much string manipulation in the files? - just trying to figure out if i'm missing a trick here 🙂

    MVDBA

  • getting the data into the database isn't an issue its breaking out the the differing record types ...

    However I have manged to re-code my procedure it will now load the data into it's different data tables  quicker.

     

    I have reduced the full load time from 1 hour to around 30mins ...  which on 8.4 million records I think is pretty good ...

  • well.. it depends..

    Today I've loaded 16 Million records in 34 mins - 650 bytes per row - table already had 50 Million rows

    and this is loading onto a staging table (as varchar(xxx) and then doing an insert onto final table with correct data types.

    (SSIS load to staging, then stored proc to load to final table)

    on a VM with 4 vCPU and 12 GB ram.

    what process you using for it?

     

  • MVDBA (Mike Vessey) wrote:

    amen to customers screwing up CSV files, adding columns etc.  🙂

    and I get the  difference in import format files (csv, XML etc) and that you have to create new packages for each import if you use SSIS or BCP or bulk insert, how does C# help surely you must have a config file or table that is pretty much the same as an SSIS package?

    or is there way too much string manipulation in the files? - just trying to figure out if i'm missing a trick here 🙂

    VB.Net, actually. 🙂

    And yes, the string manipulation can be intense, because sometimes you have to translate codes (customers do love to use their own codes, sigh) or do other things requiring lots of string manipulation like unpacking "intelligent" serial numbers and the like, which T/SQL usually sucks at. The real dividend is in the common format. Turns out flexibility is key, and nothing beats customization for flexibility.

     

  • Yes, string manipulation is intensive and once I have optimised for a physical machine, possibly improve the record dissection this will end up on sql database in the cloud, so hence to optimise to Nth level ....

  • Jeff Moden wrote:

    Thom A wrote:

    How to you get 26 different columns from the string 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D P'? I see 6 at most. Where are the other 20 coming from?

    Fixed field length, Thom... oldest format in the world.  Look at the table the OP designed and realize all the columns are fixed width.  This is truly a "Unit Record Definition".

    This wasn't particularly easy for us to discern though. Had the OP simply mentioned that, we would have had an easier time spotting it and actually understanding their goals. 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    Jeff Moden wrote:

    Thom A wrote:

    How to you get 26 different columns from the string 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D P'? I see 6 at most. Where are the other 20 coming from?

    Fixed field length, Thom... oldest format in the world.  Look at the table the OP designed and realize all the columns are fixed width.  This is truly a "Unit Record Definition".

    This wasn't particularly easy for us to discern though. Had the OP simply mentioned that, we would have had an easier time spotting it and actually understanding their goals. 🙂

    Actually, he did.  This was in his second post:

    BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D P
    and for that to be translated to this table structure
    [recordidentity] [nchar](2) NOT NULL,
    [transactiontype] [nchar](1) NULL,
    [trainuid] [nvarchar](6) NULL,
    [daterunsfrom] [nvarchar](6) NULL,
    [daterunsto] [nvarchar](6) NULL,
    [daysrun] [nvarchar](7) NULL,
    [bankholiday running] [nchar](1) NULL,
    [trainstatus] [nchar](1) NULL,
    [traincategory] [nchar](2) NULL,
    [trainidentity] [nvarchar](4) NULL,
    [headcode] [nvarchar](4) NULL,
    [courseindicator] [nchar](1) NULL,
    [trainservicecode] [nvarchar](8) NULL,
    [portionid] [nchar](1) NULL,
    [powertype] [nvarchar](3) NULL,
    [timingload] [nvarchar](4) NULL,
    [speed] [nvarchar](3) NULL,
    [operatingcharacteristics] [nvarchar](6) NULL,
    [seatingclass] [nchar](1) NULL,
    [sleepers] [nchar](1) NULL,
    [reservations] [nchar](1) NULL,
    [connectionindicator] [nchar](1) NULL,
    [cateringcode] [nvarchar](4) NULL,
    [servicebranding] [nvarchar](4) NULL,
    [spare] [nchar](1) NULL,
    [stpindicator] [nchar](1) NULL
  • I don't really feel like I should be defending myself here, but 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D P' is 57 characters long, the length of all those columns is 80 characters. The OP never actually said (at least prior to my reply Jeff quoted)) that it was fixed width, and supplying a 57 character length string, and saying that they have an 80 character length one didn't help that.

    This information was drip fed, and had to be worked out through intuition/guess work, and it certainly wasn't obvious when I made the reply that Jeff quoted; which was my first reply on the thread.

    The OP does, in a later reply, say that the value is actually 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D                  P', however, that string is also only 74 characters long; though helps identify that we are likely more on the path of fixed width.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Andrew Goffin-302397 wrote:

    getting the data into the database isn't an issue its breaking out the the differing record types ...

    However I have manged to re-code my procedure it will now load the data into it's different data tables  quicker.

    I have reduced the full load time from 1 hour to around 30mins ...  which on 8.4 million records I think is pretty good ...

    Would you like to share your solution?

  • Thom A wrote:

    I don't really feel like I should be defending myself here, but 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D P' is 57 characters long, the length of all those columns is 80 characters. The OP never actually said (at least prior to my reply Jeff quoted)) that it was fixed width, and supplying a 57 character length string, and saying that they have an 80 character length one didn't help that.

    This information was drip fed, and had to be worked out through intuition/guess work, and it certainly wasn't obvious when I made the reply that Jeff quoted; which was my first reply on the thread.

    The OP does, in a later reply, say that the value is actually 'BSNC001481905191912080000001 PEE5P01 124650005 EMU 090D                  P', however, that string is also only 74 characters long; though helps identify that we are likely more on the path of fixed width.

    Sorry, Thom... my fault for this confusion.  You're correct that it wasn't stated as fixed field at the time of the posting.  I had the advantage of viewing the thread after several other posts were made and I didn't consider the order of revelation before posting.  I'll also say that the OPs second post was the giveaway for me simply because I've done so much similar.  I could have easily been mistaken.

    And for sure, my post wasn't meant as a chastisement.  It was meant to help explain what I thought the OP was doing so others could help because it seems the OP had some difficulty explaining that it was fixed field.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Sorry, Thom... my fault for this confusion.  You're correct that it wasn't obvious at the time of the posting.  I had the advantage of viewing the thread after several other posts were made and I didn't consider the order of revelation before posting.

    No harm no foul, Jeff. 🙂

    I haven't actually got fully up to speed on this one since Thursday (had a 4 day weekend, but ended up being ill for 3 of those days! >_<), but looks like the OP has found a solution, so will be good to see what it is when they post it.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 15 posts - 46 through 60 (of 73 total)

You must be logged in to reply to this topic. Login to reply