Insert batches of 50-100 rows using UNION ALL - Huge hit on CPU

  • Hello SQL Gurus,

    We are working on a project where we have 6 flat files, each with a header row, 10,000 to 3.5 million data rows, and a footer row. The developer tasked with getting this data into the SQL Server database has written a Perl script to insert the data into each table. This will happen on a nightly basis and will insert into empty tables with no indexes or constraints. His script runs as 4 separate processes to speed up the inserts. I do not have access to this script, nor do I understand 100% how it functions. At this point in the project, only minor changes can be made to the script.

    What I do know is that each process runs its inserts in batches of 50-100 rows, using the following form for each statement:

    INSERT INTO tableA (column1, column2, column3, column4)

    SELECT 'wwww001','xxxx001', 'yyyy001', 'zzzz001'

    UNION ALL

    ...

    UNION ALL

    SELECT 'wwww100', 'xxxx100', 'yyyy100', 'zzzz100'

    The problem is that this is hitting the CPU hard. For about 30-40 minutes, the CPU usage on the server is between 85-95%.

    What can I do or suggest to improve the CPU usage without moving away from the Perl approach or asking to use BCP? I.e., can I change anything on the database side without changing the script? Should we remove the UNION ALL and insert each row independently? Again, BCP and SSIS are not options at this point.

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • BCP will eat 1 M rows like it's nothing. If you really are having issues with the import (like users can't work), then I'd redo it.

    Thanks for the response. Unfortunately, BCP is not doable in the short term. The only issue is the CPU usage being so high. This is a staging server and is not being accessed by users or applications. My concern right now is: can the server tolerate being at such high CPU usage for so long? I don't know the answer to this, as I'm more knowledgeable about SQL Server than about server performance.

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Perl has its place...but not for the use-case you described...unless it's calling bcp or dtexec 🙂

    Can you use Perl to strip off the header and footer and then use BULK INSERT from within a stored procedure?
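
    A minimal sketch of what that BULK INSERT might look like, assuming the Perl step has already stripped the header and footer rows; the file path, table name, delimiters, and batch size below are placeholders, not details from the original post:

    -- Hypothetical example: load a pre-cleaned flat file into the empty staging table.
    BULK INSERT dbo.tableA
    FROM 'D:\staging\fileA_clean.txt'
    WITH (
        FIELDTERMINATOR = ',',    -- match the file's actual field delimiter
        ROWTERMINATOR   = '\n',
        TABLOCK,                  -- helps toward a minimally logged load into an empty heap
        BATCHSIZE       = 50000
    );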

    You can try reducing the number of processes the Perl script runs down to 1, which will lessen the impact on your server but will almost certainly make the whole load run longer.

    You could try switching from UNION ALL to individual inserts...I won't venture a guess as to what that would do to performance. If you try it, please post back the results.

    I'll go with the idea that you're on SQL2005 since that's the forum we're in; otherwise I would have recommended using row constructors instead of UNION ALL (sketched below, after the example).

    INSERT INTO tableA (column1, column2, column3, column4)

    SELECT 'wwww001','xxxx001', 'yyyy001', 'zzzz001'

    UNION ALL

    ...

    UNION ALL

    SELECT 'wwww100', 'xxxx100', 'yyyy100', 'zzzz100'
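
    For reference, on SQL Server 2008 or later the row-constructor form would look roughly like this (same placeholder values as above; it is not available on 2005):

    -- Row constructors (SQL Server 2008+): one INSERT, multiple value rows, no UNION ALL.
    INSERT INTO tableA (column1, column2, column3, column4)
    VALUES ('wwww001', 'xxxx001', 'yyyy001', 'zzzz001'),
           ...
           ('wwww100', 'xxxx100', 'yyyy100', 'zzzz100');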

    That said, if you do try individual inserts you may be able to save some network bandwidth by constructing your inserts with no column-list (see caveat below). Something like this:

    INSERT tableA VALUES ('wwww001','xxxx001', 'yyyy001', 'zzzz001');

    ...

    INSERT tableA VALUES ('wwww100', 'xxxx100', 'yyyy100', 'zzzz100');

    The caveat is that if you omit the column list, the VALUES are evaluated ordinally: SQL Server assigns them starting with the first column in the table, then the second, and so on through the table's column list. The hope is that you save some transfer time between Perl and SQL Server by omitting the column list.

    You could also try submitting batches bigger than 50-100 rows; several thousand may still perform well. If you're stuck with this design then you only have a few variables to adjust, and none are jumping out at me as big hitters that will save the day. Are you sure you can't get to a place where BCP or BULK INSERT can be used?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I'm not 100% sure I understand your concern then. Importing millions of rows is going to take processing power, no matter the tool used for the job.

    If you don't have any problems right now or in the foreseeable future, then I'd leave it alone (don't fix it if it ain't broken).

    The problem with using BCP right now is that the Perl script validates the data based on which errors occur during insertion. These rejected rows are written to a file and sent back to the provider. Using BCP will not allow us to do this, as far as I know.

    I will try some of your suggestions. This is only a short term solution to a larger project and unfortunately time constraints have forced me down a path where reworking the Perl script too much is not a viable solution for now.

    You are correct that there are not really any problems right now that are visible. My concern is the potential unseen effect of having the high CPU usage. Hopefully some of your suggestions help even a little bit. 🙂

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • The only "problem" is that the cpu can't process another high load at the same time. On the very long run, maybe premature death of the processors.

    Other than that don't waste time on something that's working!

  • jared-709193 (4/28/2011)


    The problem with using BCP right now is that the Perl script validates the data based on which errors occur during insertion. These rejected rows are written to a file and sent back to the provider. Using BCP will not allow us to do this, as far as I know.

    I think it can. Look up the -m and -e options: http://msdn.microsoft.com/en-us/library/ms162802.aspx

    If you set -m and use -e then all bad rows can be "skipped" and written to the error file while bcp keeps trucking along.
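
    As a rough sketch, a call using those options might look something like this; the server, database, table, and file names are all made up, and the thresholds are just examples:

    rem Hypothetical bcp invocation; -m 1000 tolerates up to 1000 bad rows and -e writes them to the reject file.
    rem -F 2 skips the header row; the footer would still need stripping (or an -L last-row value).
    bcp StagingDB.dbo.tableA in D:\staging\fileA.txt -S MyServer -T -c -t, -m 1000 -e D:\staging\fileA_rejects.txt -F 2 -b 50000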

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (4/28/2011)


    jared-709193 (4/28/2011)


    The problem with using BCP right now is that the Perl script validates the data based on which errors occur during insertion. These rejected rows are written to a file and sent back to the provider. Using BCP will not allow us to do this, as far as I know.

    I think it can. Look up the -m and -e options: http://msdn.microsoft.com/en-us/library/ms162802.aspx

    If you set -m and use -e then all bad rows can be "skipped" and written to the error file while bcp keeps trucking along.

    Thanks for the info... I never had to master that tool so I don't know all the ins and outs.

    It certainly looks like you have a viable option. At the very least I'd do a test import just to see how many resources are required. It shouldn't take too long to set up!

  • Ninja's_RGR'us (4/28/2011)


    opc.three (4/28/2011)


    jared-709193 (4/28/2011)


    The problem with using BCP right now is that the Perl script validates the data based on which errors occur during insertion. These rejected rows are written to a file and sent back to the provider. Using BCP will not allow us to do this, as far as I know.

    I think it can. Look up the -m and -e options: http://msdn.microsoft.com/en-us/library/ms162802.aspx

    If you set -m and use -e then all bad rows can be "skipped" and written to the error file while bcp keeps trucking along.

    Thanks for the info... I never had to master that tool so I don't know all the ins and outs.

    It certainly looks like you have a viable option. At the very least I'd do a test import just to see how many resources are required. It shouldn't take too long to set up!

    Thanks to both of you! I think in the long term our new solution will use BCP with the -m and -e options, once we can recode the script. Until then... we will not touch the script, but will experiment with reducing the number of processes.

    Thanks!!!

    Jared

    Jared
    CE - Microsoft

    I don't see any need for a rewrite at the moment. You'd have a lot to learn and it would be very time-consuming.

    Bottom line is this: try doing 1 bcp import with your biggest file (or all of them). Time it and monitor the CPU. That will tell you whether you really have a problem with Perl at the moment, performance-wise.

    From there you have an easy decision moving forward whether to rewrite or leave it alone. BCP will definitely give you more options to change the scripts as need be, and it will solve that issue as well.
