March 11, 2015 at 1:30 am
Hi,
I need suggestions on how to process more than 100,000 XML files in SSIS. Each XML file is larger than 250 KB.
In our current solution we process one file at a time. Each file takes about 5 seconds, so processing all 100,000 XML files takes roughly 5 × 100,000 = 500,000 seconds, or approximately 139 hours.
We are planning to set up parallel execution to process these files in chunks, but we have a constraint: we have only one server with 8 GB of RAM, which hosts our SQL Server database, SSIS jobs, and some SSRS reports. Our client will not agree to give us more servers, so we need to do it on that server, yet the client still expects good performance.
Could anybody give me any suggestions to improve the performance of processing this large number of files?
Also, how much memory does the server need to process this many files? Is 8 GB sufficient? If not, how much would be needed for better performance?
Thanks in advance for any suggestions.
March 11, 2015 at 1:45 am
8GB is low, very low. The OS needs RAM, SQL Server needs RAM, so there is very little left for SSIS, which is a memory-intensive application. My laptop has more RAM. That's not a server, but a big smartphone.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 11, 2015 at 2:14 am
Thanks Koen for your response.
But could you please suggest what amount of memory we should propose to the client for smooth and better performance?
And could you also give me some suggestions for processing a large number of files (100,000 XML files) with better performance?
Thanks
March 11, 2015 at 2:23 am
Which edition of SQL Server do you have?
Regarding the XML files: how complex are they? What kind of processing do you do?
Parallelism seems the most obvious performance improvement.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 11, 2015 at 2:52 am
1. Which edition of SQL Server do you have?
2. Regarding the XML files: how complex are they? What kind of processing do you do?
Answers to your questions:
1. We are using SQL Server 2012 Enterprise Edition.
2. The XML files are not that complex. We are just reading the files and loading them into a target table.
How should we proceed with parallelism? We have only one target table into which the data from all the files must be loaded.
One idea we have is to create dummy (staging) tables and load chunks of files into them in parallel, but is that a valid approach and a proper design for loading the data?
Thanks
March 11, 2015 at 3:08 am
I think the dummy tables are a good idea to reduce locking and contention on the destination table.
At the end, you can write a UNION ALL query on all the dummy tables to store the result in the final destination table.
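For example, a minimal sketch of that consolidation step, assuming four staging tables dbo.Stage_XmlData_1 through dbo.Stage_XmlData_4 with the same columns as the target (all table and column names here are placeholders, not your actual schema):

-- Run once all parallel loads have finished.
INSERT INTO dbo.TargetTable (Col1, Col2, Col3)
SELECT Col1, Col2, Col3 FROM dbo.Stage_XmlData_1
UNION ALL
SELECT Col1, Col2, Col3 FROM dbo.Stage_XmlData_2
UNION ALL
SELECT Col1, Col2, Col3 FROM dbo.Stage_XmlData_3
UNION ALL
SELECT Col1, Col2, Col3 FROM dbo.Stage_XmlData_4;

-- Clear the staging tables for the next batch.
TRUNCATE TABLE dbo.Stage_XmlData_1;
TRUNCATE TABLE dbo.Stage_XmlData_2;
TRUNCATE TABLE dbo.Stage_XmlData_3;
TRUNCATE TABLE dbo.Stage_XmlData_4;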
The difficult part is dividing all your XML files into equal chunks.
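One rough way to do that, sketched below under the assumption that you first dump the file names into a work table (for example from a Script Task that lists the folder; the table and column names are placeholders):

-- Hypothetical work table: one row per XML file to be processed.
CREATE TABLE dbo.XmlFileQueue
(
    FilePath NVARCHAR(260) NOT NULL PRIMARY KEY,
    ChunkNo  TINYINT       NULL
);

-- After the 100,000 file names have been inserted,
-- deal them out into 4 roughly equal chunks.
WITH numbered AS
(
    SELECT ChunkNo,
           NTILE(4) OVER (ORDER BY FilePath) AS Bucket
    FROM dbo.XmlFileQueue
)
UPDATE numbered
SET ChunkNo = Bucket;

Each of the four parallel loops in the package then processes only the rows with its own ChunkNo value (1 to 4). If the file sizes vary a lot, ordering the NTILE by file size instead of by name would give more evenly weighted chunks.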
If you have Enterprise Edition, 8GB is seriously underpowered. I mean, even laptops at the discount store have 8GB RAM.
I don't know what else you do on that server, but personally I would shoot for at least 64GB.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 11, 2015 at 3:19 am
pinkuraj.l (3/11/2015)
We are using SQL Server 2012 Enterprise Edition. [...]
So the client is willing to pay for Enterprise Edition licenses, but won't fork out for some extra memory. :crazy: They might as well run Express Edition!
March 11, 2015 at 4:26 am
Thanks, Koen, for giving your input on my idea for processing the files. Is there any other way to go about it?
Yes, I agree the main challenge is splitting up the large number of XML files. As of now I don't have any idea how to divide those files.
There are no major services running apart from the SQL Server engine, SSIS, and SSRS.
So, as you said, I think I can propose 64 GB of RAM.
Thanks
March 11, 2015 at 4:32 am
Yes, Lempster. Sometimes the customer just doesn't want to understand at all. :-D