April 29, 2016 at 11:53 am
Grant Fritchey (4/29/2016)
jasona.work (4/29/2016)
Grant Fritchey (4/29/2016)
Just a follow up on my other post (and apologies for bringing tech & real work into the Thread, but you guys have the answers so I may as well post the question here). All the feedback was good and helpful. Now, the remaining question is on the paging. I know of a number of different ways to get this done. However, I suspect you guys have run tests and can make a good recommendation on the best method for breaking apart the data for export (regardless of the precise method of the export). Please don't get hung up on the 10,000 row value either. It was just an arbitrary number. Could be 5,000 or 1,000,000. I just need to be able to efficiently chunk the data needed for export.
Many thanks!
As punishment for bringing "Real Work(TM)" into The Thread, you must now buy everyone who comes up to you at an event, who utters the phrase "the hippo sent me with the pork chop launcher," a drink, not to exceed $3 USD, from now until 6 May 2016.
:-D:hehe:
Hmmm... that's an interesting choice on dates.
Besides, according to this, I'm entitled to drinks from everyone.
Well, I figured for a first-time offense, a light sentence was warranted...
Plus, I have no idea if there even *ARE* any events that you'd be attending in that time frame, and didn't want to stick you with buying drinks for months to come...
But a second offense, well...
We'll just keep the day and month the same, and increase the year...
😀
April 29, 2016 at 12:08 pm
Grant Fritchey (4/29/2016)
Brandie Tarvin (4/29/2016)
So a coworker attended a gender diversity workshop. She forwarded me the presentation with a slide that says women constitute 40% of database administrators, the highest of any computer science occupation. Other occupations break down as follows:
Web Developers: 37%
Computer System Analysts: 35%
Computer & Information Systems Managers: 30%
and it keeps going down to Computer Network Architects: 11%
So what does it say about database administration that this job attracts more women than any of the other computer science occupations? Actually, what does it say about DBA culture that gender diversity is so high?
Good things. Is this published data?
I'm not sure. I wanted to post a pic of the slide, but then decided to err on the side of caution as the slide does belong to the people who generated it. It was generated by CEB (link note: I guessed based on the home slide's tag line. This may or may not be the same company).
April 29, 2016 at 12:28 pm
Brandie Tarvin (4/29/2016)
Grant Fritchey (4/29/2016)
Brandie Tarvin (4/29/2016)
So a coworker attended a gender diversity workshop. She forwarded me the presentation with a slide that says women constitute 40% of database administrators, the highest of any computer science occupation. Other occupations break down as follows:
Web Developers: 37%
Computer System Analysts: 35%
Computer & Information Systems Managers: 30%
and it keeps going down to Computer Network Architects: 11%
So what does it say about database administration that this job attracts more women than any of the other computer science occupations? Actually, what does it say about DBA culture that gender diversity is so high?
Good things. Is this published data?
I'm not sure. I wanted to post a pic of the slide, but then decided to err on the side of caution as the slide does belong to the people who generated it. It was generated by CEB (link note: I guessed based on the home slide's tag line. This may or may not be the same company).
This might work. http://www.bls.gov/cps/cpsaat11.pdf
April 29, 2016 at 12:36 pm
GilaMonster (4/29/2016)
rodjkidd (4/29/2016)
SQLBits - make sure to say hi ... It will be good to catch up with the usual mob (Gail, Steve, Grant etc... 😉 )
Errr....
http://sqlinthewild.co.za/index.php/2016/04/26/upcoming-conferences/
Ah yes, good point. I knew about insidesql. I think it was your tweet about sorting out hotels that made me think you were at Bits. Or I just got insidesql and Bits muddled up. In my defence, I haven't looked at the schedule past Friday lunchtime!
I'm not sure if I can make inside SQL.
So er... Until the next one then!
Rodders...
April 29, 2016 at 12:52 pm
jasona.work (4/29/2016)
Grant Fritchey (4/29/2016)
Just a follow up on my other post (and apologies for bringing tech & real work into the Thread, but you guys have the answers so I may as well post the question here). All the feedback was good and helpful. Now, the remaining question is on the paging. I know of a number of different ways to get this done. However, I suspect you guys have run tests and can make a good recommendation on the best method for breaking apart the data for export (regardless of the precise method of the export). Please don't get hung up on the 10,000 row value either. It was just an arbitrary number. Could be 5,000 or 1,000,000. I just need to be able to efficiently chunk the data needed for export.
Many thanks!
As punishment for bringing "Real Work(TM)" into The Thread, you must now buy everyone who comes up to you at an event, who utters the phrase "the hippo sent me with the pork chop launcher," a drink, not to exceed $3 USD, from now until 6 May 2016.
:-D:hehe:
That needs to be "not to exceed $8 USD." What are you going to get at a bar for $3 USD or less, water?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 29, 2016 at 1:59 pm
TomThomson (4/28/2016)
Jeff Moden (4/28/2016)
Grant Fritchey (4/28/2016)
Technical question: I want to export large amounts of data. However, I don't want to use SSIS. I'm going to use SQLCMD.EXE. I need to be able to script these exports so that I do, let's say, 10,000 rows at a time, in chunks. I'm doing this so that I have the ability to easily restart, and to allow for starting to migrate or move the exported files as each chunk finishes.
What's the best way to, in a relatively generic fashion, efficiently divide a table into these 10,000 row chunks?
Articles, blog posts, some documentation to get this going in the right direction. Please!
Oh, and, what the heck... URGENT!
Ha!
BCP and xp_CmdShell. Simple and very fast.
Yes, old school is often best. It worked a decade and a half ago, and it can still work today.
If the stuff contains duplicate rows (:sick:) you need to take care of that when recording where you are up to, and it may mean you can't use precise 10,000 row chunks. In any case you need to make sure your copying and your recording of where you are at happen in the same transaction. I don't recall there being any problems apart from that (but it's a long time since I last did it).
As an alternative to xp_cmdshell and BCP, do it with an SQL Agent job: a job step copies 10,000 rows with an insert-select statement (plus some extra rows, via another select, if the last of those rows is not the last or only row with particular values), then writes a where-I'm-up-to record, then perhaps delays for a time and picks itself as the next step (I can't remember whether you need two steps to do this - it's all so long ago). Or it can put the copying, place-marking, and delay code in a while loop within the step, or it can just exit (and take the next 10,000 rows when next it's scheduled). That too is old school that will still work.
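For anyone who wants to see roughly what that looks like, here is a minimal T-SQL sketch of the chunk-plus-progress-marker pattern Tom describes. Every object name is a placeholder (dbo.SourceTable with a unique Id key, dbo.ExportStage as the destination, dbo.ExportProgress pre-seeded with one row where LastId = 0), and it assumes a single worker running at a time:

DECLARE @LastId    bigint,
        @NewLastId bigint,
        @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    -- Where did the previous chunk stop?
    SELECT @LastId = LastId
    FROM dbo.ExportProgress WITH (UPDLOCK, HOLDLOCK);

    -- Upper key of the next chunk; keyed ranges make each chunk a cheap seek
    -- instead of an ever-deeper OFFSET-style scan.
    SELECT @NewLastId = MAX(Id)
    FROM (SELECT TOP (@BatchSize) Id
          FROM dbo.SourceTable
          WHERE Id > @LastId
          ORDER BY Id) AS k;

    IF @NewLastId IS NULL
    BEGIN
        COMMIT TRANSACTION;
        BREAK;   -- nothing left to copy
    END;

    INSERT INTO dbo.ExportStage
    SELECT *
    FROM dbo.SourceTable
    WHERE Id > @LastId AND Id <= @NewLastId;

    -- Record where we are up to in the SAME transaction as the copy,
    -- so a restart resumes from a consistent point.
    UPDATE dbo.ExportProgress SET LastId = @NewLastId;

    COMMIT TRANSACTION;

    WAITFOR DELAY '00:00:01';   -- optional breather between chunks
END;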
Just out of curiosity, how did you (or anyone else) solve the problem of consistency across multiple queries / transactions? I mean, without duplicating the entire table at the very start of the export process, wouldn't this process be subject to most of the pitfalls of sensitive / dynamic (i.e. non-STATIC) cursors and non-Serializable transactions? There will be rows that disappear between queries, new rows added (possibly at a position earlier in the sequence than records already exported), and rows updated after being written to file but before the end of the operation. Each of these scenarios (and possibly others) would leave the total set of export files potentially in conflict, depending on the business rules for how that particular data works. Hence the files might not be importable.
I have an idea or two, but I don't think that they would work in all situations so I am wondering how you overcame this in the past.
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
April 29, 2016 at 3:08 pm
Solomon Rutzky (4/29/2016)
TomThomson (4/28/2016)
Jeff Moden (4/28/2016)
Grant Fritchey (4/28/2016)
Technical question: I want to export large amounts of data. However, I don't want to use SSIS. I'm going to use SQLCMD.EXE. I need to be able to script these exports so that I do, let's say, 10,000 rows at a time, in chunks. I'm doing this so that I have the ability to easily restart, and to allow for starting to migrate or move the exported files as each chunk finishes.
What's the best way to, in a relatively generic fashion, efficiently divide a table into these 10,000 row chunks?
Articles, blog posts, some documentation to get this going in the right direction. Please!
Oh, and, what the heck... URGENT!
Ha!
BCP and xp_CmdShell. Simple and very fast.
Yes, old school is often best. It worked a decade and a half ago, and it can still work today.
If the stuff contains duplicate rows (:sick:) you need to take care of that when recording where you are up to, and it may mean you can't use precise 10,000 row chunks. In any case you need to make sure your copying and your recording of where you are at happen in the same transaction. I don't recall there being any problems apart from that (but it's a long time since I last did it).
As an alternative to xp_cmdshell and BCP, do it with an SQL Agent job: a job step copies 10,000 rows with an insert-select statement (plus some extra rows, via another select, if the last of those rows is not the last or only row with particular values), then writes a where-I'm-up-to record, then perhaps delays for a time and picks itself as the next step (I can't remember whether you need two steps to do this - it's all so long ago). Or it can put the copying, place-marking, and delay code in a while loop within the step, or it can just exit (and take the next 10,000 rows when next it's scheduled). That too is old school that will still work.
Just out of curiosity, how did you (or anyone else) solve the problem of consistency across multiple queries / transactions? I mean, without duplicating the entire table at the very start of the export process, wouldn't this process be subject to most of the pitfalls of sensitive / dynamic (i.e. non-STATIC) cursors and non-Serializable transactions? There will be rows that disappear between queries, new rows added (possibly at a position earlier in the sequence than records already exported), and rows updated after being written to file but before the end of the operation. Each of these scenarios (and possibly others) would leave the total set of export files potentially in conflict, depending on the business rules for how that particular data works. Hence the files might not be importable.
I have an idea or two, but I don't think that they would work in all situations so I am wondering how you overcame this in the past.
I had the same concern you did about consistency. The only items I came up with were Serializable Isolation (please no!), Snapshot Isolation (single connection), backup/restore an offline copy, CDC (+ some complex reconciliation scripts), and Database Snapshots. It seems like you either need to block everyone from making changes, or have some way of capturing and reconciling changes like snapshot isolation does.
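A minimal sketch of the snapshot isolation route, purely for illustration (the database name, table, and key ranges are placeholders; it assumes a single export connection reading every chunk inside one transaction):

ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;   -- one-time setting

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

-- Every statement in this transaction sees the row versions that existed when the
-- transaction first read data, so all chunks are mutually consistent and readers
-- never block writers (the cost is version-store growth in tempdb while it stays open).
SELECT * FROM dbo.SourceTable WHERE Id >     0 AND Id <= 10000 ORDER BY Id;
SELECT * FROM dbo.SourceTable WHERE Id > 10000 AND Id <= 20000 ORDER BY Id;
-- ...one SELECT per chunk, each result set handed to whatever writes the export file...

COMMIT TRANSACTION;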
Chad
April 29, 2016 at 3:11 pm
Brandie Tarvin (4/29/2016)
you can do anything you put your mind to.
Reminds me of the strip that showed up on my Dilbert calendar this week: http://dilbert.com/strip/2013-04-25 "If you work hard, you can be anything you want"
Chad
April 29, 2016 at 3:23 pm
Chad Crawford (4/29/2016)
Brandie Tarvin (4/29/2016)
you can do anything you put your mind to.
Reminds me of the strip that showed up on my Dilbert calendar this week: http://dilbert.com/strip/2013-04-25 "If you work hard, you can be anything you want"
HA.
April 29, 2016 at 3:37 pm
Brandie Tarvin (4/29/2016)
Chad Crawford (4/29/2016)
Brandie Tarvin (4/29/2016)
you can do anything you put your mind to.
Reminds me of the strip that showed up on my Dilbert calendar this week: http://dilbert.com/strip/2013-04-25 "If you work hard, you can be anything you want"
HA.
I like it!
:w00t:
April 29, 2016 at 3:43 pm
Also reminds me of a quote from 30 Rock that seems to be how a lot of people approach technology:
Tracy Jordan: Doctor Spaceman, when they check my DNA, will they tell me what diseases I might get, or help me to remember my ATM pin code?
Dr. Leo Spaceman: Absolutely. Science is whatever we want it to be.
Cheers!
April 29, 2016 at 3:43 pm
Luis Cazares (4/29/2016)
Brandie Tarvin (4/29/2016)
Grant Fritchey (4/29/2016)
Brandie Tarvin (4/29/2016)
So a coworker attended a gender diversity workshop. She forwarded me the presentation with a slide that says women constitute 40% of database administrators, the highest of any computer science occupation. Other occupations break down as follows:
Web Developers: 37%
Computer System Analysts: 35%
Computer & Information Systems Managers: 30%
and it keeps going down to Computer Network Architects: 11%
So what does it say about database administration that this job attracts more women than any of the other computer science occupations? Actually, what does it say about DBA culture that gender diversity is so high?
Good things. Is this published data?
I'm not sure. I wanted to post a pic of the slide, but then decided to err on the side of caution as the slide does belong to the people who generated it. It was generated by CEB (link note: I guessed based on the home slide's tag line. This may or may not be the same company).
This might work. http://www.bls.gov/cps/cpsaat11.pdf
Yeah, that's pretty good. However, statisticians & analysts have parity, which is really interesting.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 29, 2016 at 3:53 pm
Chad Crawford (4/29/2016)
Solomon Rutzky (4/29/2016)
TomThomson (4/28/2016)
Jeff Moden (4/28/2016)
Grant Fritchey (4/28/2016)
Technical question: I want to export large amounts of data. However, I don't want to use SSIS. I'm going to use SQLCMD.EXE. I need to be able to script these exports so that I do, let's say, 10,000 rows at a time, in chunks. I'm doing this so that I have the ability to easily restart, and to allow for starting to migrate or move the exported files as each chunk finishes.
What's the best way to, in a relatively generic fashion, efficiently divide a table into these 10,000 row chunks?
Articles, blog posts, some documentation to get this going in the right direction. Please!
Oh, and, what the heck... URGENT!
Ha!
BCP and xp_CmdShell. Simple and very fast.
Yes, old school is often best. It worked a decade and a half ago, and it can still work today.
If the stuff contains duplicate rows (:sick:) you need to take care of that when recording where you are up to, and it may mean you can't use precise 10,000 row chunks. In any case you need to make sure your copying and your recording of where you are at happen in the same transaction. I don't recall there being any problems apart from that (but it's a long time since I last did it).
As an alternative to xp_cmdshell and BCP, do it with an SQL Agent job: a job step copies 10,000 rows with an insert-select statement (plus some extra rows, via another select, if the last of those rows is not the last or only row with particular values), then writes a where-I'm-up-to record, then perhaps delays for a time and picks itself as the next step (I can't remember whether you need two steps to do this - it's all so long ago). Or it can put the copying, place-marking, and delay code in a while loop within the step, or it can just exit (and take the next 10,000 rows when next it's scheduled). That too is old school that will still work.
Just out of curiosity, how did you (or anyone else) solve the problem of consistency across multiple queries / transactions? I mean, without duplicating the entire table at the very start of the export process, wouldn't this process be subject to most of the pitfalls of sensitive / dynamic (i.e. non-STATIC) cursors and non-Serializable transactions? There will be rows that disappear between queries, new rows added (possibly at a position earlier in the sequence than records already exported), and rows updated after being written to file but before the end of the operation. Each of these scenarios (and possibly others) would leave the total set of export files potentially in conflict, depending on the business rules for how that particular data works. Hence the files might not be importable.
I have an idea or two, but I don't think that they would work in all situations so I am wondering how you overcame this in the past.
I had the same concern you did about consistency. The only items I came up with were Serializable Isolation (please no!), Snapshot Isolation (single connection), backup/restore an offline copy, CDC (+ some complex reconciliation scripts), and Database Snapshots. It seems like you either need to block everyone from making changes, or have some way of capturing and reconciling changes like snapshot isolation does.
Chad
Yeah, I had been thinking of CDC but that is only available in Enterprise Edition (hence not working in all situations). But I suppose only Change Tracking is needed since if one were to go down this path then it would be easier to just update the entire modified row (doesn't really matter if only one column changed). HOWEVER, that still leaves you with:
1) messy reconciliation script and process (i.e. added complication),
2) not sure if this works in situations where someone is already using Change Tracking,
3) not sure if this works in situations where there is no PK and ROW_NUMBER() needs to be added, because if rows that have already been written get deleted, those deletes can be handled via reconciliation, but removing them shifts every row past that point down; so if you track which row number you ended the last set at, the next starting row when you resume might be after where you actually ended, and rows would be skipped AND not reflected in Change Tracking since they didn't change.
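For reference, this is roughly what the Change Tracking route looks like; a minimal sketch with placeholder names (and note it requires a primary key on the tracked table, which is exactly the limitation in point 3):

ALTER DATABASE MyDb
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.SourceTable
    ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF);

-- Capture a baseline version before the export starts.
DECLARE @baseline bigint = CHANGE_TRACKING_CURRENT_VERSION();

-- ...run the chunked export here...

-- Afterwards, pull everything that changed during the export for reconciliation.
SELECT  ct.SYS_CHANGE_OPERATION,   -- I, U, or D
        ct.Id,                     -- the tracked primary key
        t.*                        -- current values (NULL for deleted rows)
FROM    CHANGETABLE(CHANGES dbo.SourceTable, @baseline) AS ct
LEFT JOIN dbo.SourceTable AS t
        ON t.Id = ct.Id;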
Copying the table or doing a snapshot seems to be required, but that could be problematic for very large tables (and I believe the desire is to be able to handle Warehouse-sized tables) since you might not have room for those copies.
Without knowing the specific end-goal (though it's understandable why it can't be shared), it is difficult to say if the following would work. But given the stated goal of wanting the migration to be able to start before the export completes, I just remembered that there is a way to do this without even hitting the file system: use the SqlBulkCopy class in .NET, which can be called from a Windows app, a console app, or a SQLCLR object. The particular medium doesn't matter, as it all does the same thing: take a result set and save it to a table somewhere. It allows for setting "BatchSize" and some other options that are also available in BCP, but the key factor here is that it essentially streams the result set to the destination if you simply call SqlCommand.ExecuteReader and then pass the DataReader to the WriteToServer method of SqlBulkCopy. Meaning, the migration would start the moment the first row of the result set is available to the DataReader. And being a single operation means it has that consistency, and can provide it without blocking if Snapshot Isolation has been enabled. No file system permissions to worry about, no running out of space for export files, and no slowing down the process just to write to disk and then read from disk. Ergo: 😎 and :w00t:
P.S. I forgot to mention: if one wanted to do this via SQLCLR, that is already available in the Free version of the SQL# library (which I am the author of). Just look for the Stored Procedure DB_BulkCopy.
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
April 30, 2016 at 4:01 pm
Chad Crawford (4/29/2016)
Solomon Rutzky (4/29/2016)
TomThomson (4/28/2016)
Jeff Moden (4/28/2016)
Grant Fritchey (4/28/2016)
Technical question: I want to export large amounts of data. However, I don't want to use SSIS. I'm going to use SQLCMD.EXE. I need to be able to script these exports so that I do, let's say, 10,000 rows at a time, in chunks. I'm doing this so that I have the ability to easily restart, and to allow for starting to migrate or move the exported files as each chunk finishes.
What's the best way to, in a relatively generic fashion, efficiently divide a table into these 10,000 row chunks?
Articles, blog posts, some documentation to get this going in the right direction. Please!
Oh, and, what the heck... URGENT!
Ha!
BCP and xp_CmdShell. Simple and very fast.
Yes, old school is often best. It worked a decade and a half ago, and it can still work today.
If the stuff contains duplicate rows (:sick:) you need to take care of that when recording where you are up to, and it may mean you can't use precise 10,000 row chunks. In any case you need to make sure your copying and your recording of where you are at happen in the same transaction. I don't recall there being any problems apart from that (but it's a long time since I last did it).
As an alternative to xp_cmdshell and BCP, do it with an SQL Agent job: a job step copies 10,000 rows with an insert-select statement (plus some extra rows, via another select, if the last of those rows is not the last or only row with particular values), then writes a where-I'm-up-to record, then perhaps delays for a time and picks itself as the next step (I can't remember whether you need two steps to do this - it's all so long ago). Or it can put the copying, place-marking, and delay code in a while loop within the step, or it can just exit (and take the next 10,000 rows when next it's scheduled). That too is old school that will still work.
Just out of curiosity, how did you (or anyone else) solve the problem of consistency across multiple queries / transactions? I mean, without duplicating the entire table at the very start of the export process, wouldn't this process be subject to most of the pitfalls of sensitive / dynamic (i.e. non-STATIC) cursors and non-Serializable transactions? There will be rows that disappear between queries, new rows added (possibly at a position earlier in the sequence than records already exported), and rows updated after being written to file but before the end of the operation. Each of these scenarios (and possibly others) would leave the total set of export files potentially in conflict, depending on the business rules for how that particular data works. Hence the files might not be importable.
I have an idea or two, but I don't think that they would work in all situations so I am wondering how you overcame this in the past.
I had the same concern you did about consistency. The only items I came up with were Serializable Isolation (please no!), Snapshot Isolation (single connection), backup/restore an offline copy, CDC (+ some complex reconciliation scripts), and Database Snapshots. It seems like you either need to block everyone from making changes, or have some way of capturing and reconciling changes like snapshot isolation does.
Chad
It's all too long ago, and I'm not sure what we did. I have a vague recollection of two quite different things that were used in two different cases.
The second one was pretty trivial - the stuff being exported was history, and essentially the records in the original db that were old enough to be of interest didn't get updated or deleted (actually they eventually did get deleted; we cleared stuff out of the active db once it had been included in the history db's full backup at least twice), and cluster keys were strictly increasing so there was no problem with inserts. The apps there ran with either repeatable read or serializable, and I think the export used repeatable read.
The first (much earlier) one was rather more complicated, and it was on a strictly non-critical system (research, not production - not product development). But it was history oriented in a sense. The active database was such that its state at any time in the past was recoverable; an app could choose to read the database as it had been at a certain date. The initial export was done by reading the state at a past instant and exporting it; subsequent updates to the exported data were done by reading a newer state and updating anything that was different from the exported state or was not in the exported state. There were of course timestamps all over the place. It was at least partly based on some of Date's ideas (I think - they might have been someone else's; I've been out of touch with database research for too long to be sure that they came from him) about a normal form for databases that would permit complete knowledge of database history.
It strikes me as very possible that neither of those history-based approaches would be applicable to Grant's problem. But the idea of trying to do it using just isolation levels doesn't seem likely to work either. Serializable is probably OTT since repeatable read may be adequate (maybe that depends on what sort of updating is going on; I haven't thought it through), but I suspect it isn't going to deliver high performance. Nor is snapshot isolation: tempdb is probably going to grow too much when a vast collection of export chunks have to be mutually compatible and snapshot isolation is used to ensure that by building them all in one enormous transaction.
Tom
April 30, 2016 at 9:47 pm
Grant Fritchey (4/28/2016)
Technical question: I want to export large amounts of data. However, I don't want to use SSIS. I'm going to use SQLCMD.EXE. I need to be able to script these exports so that I do, let's say, 10,000 rows at a time, in chunks. I'm doing this so that I have the ability to easily restart, and to allow for starting to migrate or move the exported files as each chunk finishes.
What's the best way to, in a relatively generic fashion, efficiently divide a table into these 10,000 row chunks?
Articles, blog posts, some documentation to get this going in the right direction. Please!
Oh, and, what the heck... URGENT!
Ha!
Ok, so based on some of the problems folks think you might have with chunking data, I'm curious. How many columns and how many rows do you want to export? And will they be delimited or fixed field?
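For the delimited case, the per-chunk export side can stay very simple. A rough sketch only (the server name, database, path, and key range are all placeholders, and it assumes xp_cmdshell is enabled):

-- One chunk: character mode (-c), pipe-delimited (-t"|"), trusted connection (-T).
-- A fixed-field layout would swap -c and -t for a format file supplied with -f.
EXEC xp_cmdshell
    'bcp "SELECT * FROM MyDb.dbo.SourceTable WHERE Id > 0 AND Id <= 10000 ORDER BY Id" queryout "C:\Export\chunk_00001.txt" -c -t"|" -S MyServer -T';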
--Jeff Moden
Change is inevitable... Change for the better is not.