June 28, 2022 at 7:59 am
Hi everyone,
So I have a database with 10-12 tables with lots of rows (in the millions). Individual is the main parent table, and the rest are child tables.
I have designed an SSIS package that uses the "ZS Export XML Task".
This task first loads all of the tables' data into memory, and then in the next step it creates the XML files one by one and dumps the data into them.
Files can be created based on size or number of rows. I have chosen to split the files at 100,000 rows each.
I also have a test database with just a few rows, and it creates the XML files instantly.
But my main database takes forever to load and never gets to the next step.
Can anyone suggest how I can first load 100,000 rows from the SQL tables, create an XML file and dump the data into it, then load the next 100,000 rows and create the next file, and so on?
Or is there any other better way to perform this task?
Thanks
June 28, 2022 at 8:03 am
Or is there any other better way to perform this task?
Thanks
Presumably BACKUP/RESTORE is not an option.
For the volumes of data you are talking about, I'd probably use a pipe-delimited format. XML adds so much bloat that your file size will probably be twice as large as a well-formed text file.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 28, 2022 at 8:26 am
File size is not the issue.
I know there are many other methods, but let's just say I have been given the task to produce it in XML.
Any possible way to do it?
June 28, 2022 at 2:17 pm
Unless you have a powerful server, I suspect "loads all the tables data into memory" may be a problem. You may be better off exporting the data to flat files, and then using .NET stream reader/writer to process data by line. Is that an option w/ Zappy?
Fundamentally, you have a lot of data, and XML is an expensive file format to process. You could find worse ways than XML, but you'd have to work really hard to do so.
June 28, 2022 at 2:34 pm
XML will definitely be slow.
It sounds like you're going from SQL Server to SQL Server. If that's the case, native format would be fastest.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 28, 2022 at 3:08 pm
Can anyone suggest how I can first load 100,000 rows from the SQL tables, create an XML file and dump the data into it, then load the next 100,000 rows and create the next file, and so on?
If you believe the process will work for a data set of 100k rows, you could create a table that separates the main table into batches of 100k, with a sequential numeric BatchID, min and max column values to define the batch, and start and end times to log progress. Use an Execute SQL Task to return the BatchIDs into an object variable, and use a Foreach Loop to iterate through the BatchIDs and process the export for one batch at a time.
There are online tutorials explaining how to use a Foreach Loop with a data flow if you are not familiar with the process. I would set up the data flow to run manually for a batch of 100k rows, using variables (with hard-coded values) to define the min and max values of a column. If this works in a reasonable time, I would then put it inside a Foreach Loop and add logging for the start and end of each batch. The loop will know the BatchID, which will allow you to populate the min and max variables you need to limit the query. The BatchID will also allow you to update the batch table to record when the batch starts and when it completes. You might also add an error column so that, on error, you run an update to flag the batch that failed. You can use the BatchID in the file name too, so you don't keep trying to create the same file. I don't know how familiar you are with SSIS, but you would use a variable with an expression to create a dynamic file name and location.
As others have said, this is not a fast way to transfer data, but if you have to create XML files to store millions of rows, then exporting the data in batches might be the only way. If 100k takes 30 seconds, that's five minutes per million. Ten million might be manageable, but 100 million would be too slow by anyone's standards.
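Something along these lines could work for the batch table. This is only a rough sketch, and every object name in it (dbo.ExportBatch, dbo.Individual, IndividualID) is a placeholder I've invented, not anything from your actual schema:

-- Illustrative batch table; adjust names and data types to suit your schema
CREATE TABLE dbo.ExportBatch
(
    BatchID   INT IDENTITY(1,1) PRIMARY KEY,
    MinID     INT NOT NULL,
    MaxID     INT NOT NULL,
    StartTime DATETIME2 NULL,
    EndTime   DATETIME2 NULL,
    IsError   BIT NOT NULL DEFAULT (0)
);

-- Carve the main table into 100k-row batches, keyed on an assumed integer key
;WITH Numbered AS
(
    SELECT  IndividualID,
            (ROW_NUMBER() OVER (ORDER BY IndividualID) - 1) / 100000 AS Grp
    FROM    dbo.Individual
)
INSERT INTO dbo.ExportBatch (MinID, MaxID)
SELECT  MIN(IndividualID), MAX(IndividualID)
FROM    Numbered
GROUP BY Grp
ORDER BY Grp;  -- keeps BatchID numbering in key order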
June 28, 2022 at 5:50 pm
That seems to be some custom SSIS component you're using; it's entirely possible it's not optimized properly. Have you tried just generating the XML in SQL Server with FOR XML PATH and dumping that out?
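For example, something like this; it's only a sketch, and the table and column names are guesses rather than your real schema:

-- Rough FOR XML PATH illustration; object names are made up
SELECT  i.IndividualID AS '@id',
        i.FirstName    AS 'Name/First',
        i.LastName     AS 'Name/Last'
FROM    dbo.Individual AS i
FOR XML PATH('Individual'), ROOT('Individuals');

If a query like that returns in a reasonable time, writing the result out to a file (for example with bcp's queryout mode) becomes a separate, much cheaper step.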
June 29, 2022 at 10:12 am
That seems to be some custom SSIS component you're using; it's entirely possible it's not optimized properly. Have you tried just generating the XML in SQL Server with FOR XML PATH and dumping that out?
Yes, I am able to generate XML, but only for limited data; otherwise it takes forever, and I have not been successful in exporting the generated data into an XML file on a local drive.
I have tried various approaches, but none have been very effective so far.
June 29, 2022 at 10:16 am
XML will definitely be slow.
It sounds like you're going from SQL Server to SQL Server. If that's the case, native format would be fastest.
Basically I have 23 XML files ranging from 200 MB to 2 GB.
Each file consists of 100,000 to 1,000,000 rows.
The data in these files originated from the same source; it was just divided into 23 files, as a single XML file would be very huge.
I have been given the task of importing these files into my SQL database after making some changes,
and then exporting them again into 23 (or more) XML files.
The importing part is done. That was also done with an SSIS package using a ZappySys task.
Now I am stuck on the exporting part.
The original 23 XML files were generated by someone, somehow, so clearly it's not impossible. It may take longer to generate them, but it's not impossible.
June 29, 2022 at 10:23 am
Unless you have a powerful server, I suspect "loads all the tables data into memory" may be a problem. You may be better off exporting the data to flat files, and then using .NET stream reader/writer to process data by line. Is that an option w/ Zappy?
Fundamentally, you have a lot of data, and XML is an expensive file format to process. You could find worse ways than XML, but you'd have to work really hard to do so.
I am not familiar with the .NET stream process. So far I am trying to achieve my goal using SQL/SSIS only.
I don't think ZappySys has such an option.
June 29, 2022 at 10:24 am
Jobs90312 wrote: Can anyone suggest how I can first load 100,000 rows from the SQL tables, create an XML file and dump the data into it, then load the next 100,000 rows and create the next file, and so on?
If you believe the process will work for a data set of 100k rows, you could create a table that separates the main table into batches of 100k, with a sequential numeric BatchID, min and max column values to define the batch, and start and end times to log progress. Use an Execute SQL Task to return the BatchIDs into an object variable, and use a Foreach Loop to iterate through the BatchIDs and process the export for one batch at a time.
There are online tutorials explaining how to use a Foreach Loop with a data flow if you are not familiar with the process. I would set up the data flow to run manually for a batch of 100k rows, using variables (with hard-coded values) to define the min and max values of a column. If this works in a reasonable time, I would then put it inside a Foreach Loop and add logging for the start and end of each batch. The loop will know the BatchID, which will allow you to populate the min and max variables you need to limit the query. The BatchID will also allow you to update the batch table to record when the batch starts and when it completes. You might also add an error column so that, on error, you run an update to flag the batch that failed. You can use the BatchID in the file name too, so you don't keep trying to create the same file. I don't know how familiar you are with SSIS, but you would use a variable with an expression to create a dynamic file name and location.
As others have said, this is not a fast way to transfer data, but if you have to create XML files to store millions of rows, then exporting the data in batches might be the only way. If 100k takes 30 seconds, that's five minutes per million. Ten million might be manageable, but 100 million would be too slow by anyone's standards.
Whoa... that's a long reply. I am trying to understand what exactly you are telling me to do... might take a while. Sorry xD
June 30, 2022 at 4:06 pm
Couldn't you just use NTILE() to split the file into a given number of pieces?
DECLARE @Tiles INT;

-- Divide by a decimal so CEILING actually rounds up (plain integer division would truncate first)
SELECT @Tiles = CEILING(COUNT(*) / 10000.0)
FROM dbo.CovidData;

SELECT
    bucketNo = NTILE(@Tiles) OVER (ORDER BY submission_date),
    *
FROM dbo.CovidData;
June 30, 2022 at 4:28 pm
Whoa... that's a long reply. I am trying to understand what exactly you are telling me to do... might take a while. Sorry xD
Skip my attempt at describing the process and use a tutorial for the Foreach Loop instead.
If you can make your XML export work for 100k rows, and you can select those rows using variables, then you should be able to use a Foreach Loop to repeatedly run your export as many times as you need to account for all the rows. Each iteration of the loop sets the variables that return the next set of rows and names the export file according to the batch counter. It might take a really long time, and it might be better to do 500k or 1 million at a time; usually there is an optimal batch size beyond which the export starts to take exponentially longer.
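To sketch the moving parts (every object and variable name below is just an example): the data flow's source query is parameterised on the batch range, and the file connection gets its path from an expression built on the loop's variables.

-- Per-batch source query; the ? markers map to User::MinID and User::MaxID in the data flow
SELECT *
FROM   dbo.Individual
WHERE  IndividualID BETWEEN ? AND ?;

-- Example SSIS expression for a dynamic file name, driven by the loop's BatchID variable:
-- "C:\\Export\\Individuals_" + RIGHT("0000" + (DT_WSTR, 4) @[User::BatchID], 4) + ".xml"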
June 30, 2022 at 5:10 pm
If the files are to be used with a "near enough" version of SQL Server, then using XML, JSON, CSV, or TSV is just nuts. The absolute fastest way to do this is to fire up BCP to do a "native" export to the files. On the other side, where you want to do the import, either BCP or BULK INSERT will be the fastest.
If your export is in the same order as the keys for the clustered index, you can make BULK INSERT faster by using minimal logging, and there's zero need to split stuff up into batches in most cases.
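As a rough illustration (server, database, table and key names below are placeholders): export each table natively with bcp, then reload it on the other side with BULK INSERT using TABLOCK and an ORDER hint that matches the clustered index.

-- Native export from a command prompt:
--   bcp MyDb.dbo.Individual out C:\Export\Individual.dat -n -S MyServer -T

-- Minimally logged reload, assuming the clustered index is on IndividualID
BULK INSERT dbo.Individual
FROM 'C:\Export\Individual.dat'
WITH (DATAFILETYPE = 'native', TABLOCK, ORDER (IndividualID ASC));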
--Jeff Moden
Change is inevitable... Change for the better is not.
July 4, 2022 at 10:53 am
Couldn't you just use NTILE() to split the file into a given number of pieces?
DECLARE @Tiles INT;

-- Divide by a decimal so CEILING actually rounds up (plain integer division would truncate first)
SELECT @Tiles = CEILING(COUNT(*) / 10000.0)
FROM dbo.CovidData;

SELECT
    bucketNo = NTILE(@Tiles) OVER (ORDER BY submission_date),
    *
FROM dbo.CovidData;
I have just studied this NTILE function...
Can you please tell me how I can store the values generated by this function in the same table (in a new column)?
Or can I use these values in the WHERE clause of a SELECT statement?