September 16, 2017 at 1:02 am
xsevensinzx - Friday, September 15, 2017 6:00 PM
Chris Harshman - Friday, September 15, 2017 7:39 AM
Jeff Moden - Thursday, September 14, 2017 5:38 PM
You mentioned what you don't like. What method for large data transfers do you prefer?
I actually prefer pipe-delimited text files. I've almost never seen pipes used in the value of a column, whereas commas turn up far more often, and a pipe-delimited file is more compact than a fixed-width text file. I do prefer fixed width to comma delimited, though.
Depends on the industry. In advertising, pipes, hyphens, and even underscores are commonly entered by end users on the front end, which forces you either to escape them in a pipe-delimited export or to have the application reject them.
This is why I use þorn over pipe.
That þorn can be a thorn in one's side if the text is Icelandic 😉
😎
September 16, 2017 at 7:39 am
Eirikur Eiriksson - Saturday, September 16, 2017 1:02 AM
That þorn can be a thorn in one's side if the text is Icelandic 😉
Yep, thorns rock. I try to use them where I can.
September 17, 2017 at 4:44 pm
Jeff Moden - Friday, September 15, 2017 11:31 PM
If I had my preference, I'd use CHAR(31) for the delimiter and CHAR(30) for the "end-of-line".
They'd certainly work; trying to get people to use them would be the hard part. I have one feed that comes to me every month that's both delimited (semicolons) and fixed-width. It's needlessly large because of all the padding, but it still beats the bloated formats hands down.
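To make that concrete, here's a minimal Python sketch of the idea (the file name and sample rows are invented for illustration): with ASCII 31 between fields and ASCII 30 between records, no quoting or escaping rules are needed at all, because the separators never appear in user-entered text.

# CHAR(31) as the field separator, CHAR(30) as the record separator.
US, RS = "\x1f", "\x1e"

rows = [
    ("1001", "O'Brien, Pat", "likes pipes | and commas, too"),
    ("1002", "Smith; Ann", "semicolons; also fine"),
]

# Write: join fields with US and records with RS -- no escaping required.
with open("feed.dat", "w", encoding="utf-8", newline="") as f:
    f.write(RS.join(US.join(fields) for fields in rows))

# Read it back with two plain splits.
with open("feed.dat", encoding="utf-8", newline="") as f:
    parsed = [record.split(US) for record in f.read().split(RS)]

assert parsed == [list(r) for r in rows]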
September 18, 2017 at 1:21 am
If you have known columns represented in a fixed-width or delimited file, there is so little processing to do that it is always going to be fast on any platform.
I can remember Buck Woody joking that you could upset a NoSQL fan by observing how much their brand-new format resembled ancient COBOL flat files. I'm probably committing heresy by wishing they really had used such a format. At least awk, grep, and bcp could make a decent fist of it.
I've never had any particular problems with XML for less than 2 GB of data. That's mainly because I don't shred it using SQL; I use XSLT to get it into a format where bcp can be brought to bear.
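Roughly, that pipeline looks like this in Python (a hedged sketch only: the stylesheet, file names, table, and server are all hypothetical, and it assumes lxml and the bcp utility are installed):

import subprocess
from lxml import etree  # pip install lxml

# flatten.xsl is assumed to emit one pipe-delimited line per record.
transform = etree.XSLT(etree.parse("flatten.xsl"))
flat = transform(etree.parse("big_feed.xml"))

with open("big_feed.txt", "w", encoding="utf-8") as f:
    f.write(str(flat))

# Hand the flattened file to bcp for the actual bulk load.
# -c = character mode, -t = field terminator, -T = trusted connection.
subprocess.run(
    ["bcp", "dbo.BigFeed", "in", "big_feed.txt",
     "-S", "MyServer", "-T", "-c", "-t", "|"],
    check=True,
)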
A format where random attributes can be missing is always going to be problematic. Apache Spark can generate a JSON schema from a JSON document. Obviously, if elements are missing from the sample document, that schema has to be edited by hand, but once you have the schema document you get a guaranteed contract, and you can write ANSI SQL:2011 against the documents without needing strange constructs in the FROM clause.
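In PySpark that workflow looks something like the sketch below (paths and column names are invented): infer a schema once from a representative sample, hand-edit and save it, then enforce it on every subsequent load so missing attributes surface as NULLs rather than surprises.

import json
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.appName("json-contract").getOrCreate()

# One-off: infer a schema from a representative sample document and
# persist it (edit it by hand here if the sample lacked optional fields).
sample = spark.read.json("sample.json")
with open("contract.schema.json", "w") as f:
    f.write(sample.schema.json())

# Every later load applies the saved schema as a guaranteed contract.
with open("contract.schema.json") as f:
    contract = StructType.fromJson(json.load(f))

orders = spark.read.schema(contract).json("orders/*.json")
orders.createOrReplaceTempView("orders")

# Plain ANSI SQL against the documents, no strange FROM-clause constructs.
spark.sql("SELECT customer_id, COUNT(*) AS n FROM orders GROUP BY customer_id").show()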
To give a crude performance guideline: 2 billion records BCP'd takes under 20 minutes. The same data as JSON on a small Spark cluster takes more than 24 hours.
September 18, 2017 at 7:30 am
David.Poole - Monday, September 18, 2017 1:21 AM
To give a crude performance guideline: 2 billion records BCP'd takes under 20 minutes. The same data as JSON on a small Spark cluster takes more than 24 hours.
Understood on the "crude" performance comparison. I am curious, though: how big was each record, and how many columns of what types, in each case? A SWAG will satisfy that curiosity. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2017 at 8:15 am
@Jeff: 5 integers, a datetime, and 6 varchars, 2 of which were varchar(max), for the bcp version.
Similar for the JSON version, but with some additional attributes and pointless nesting. I quite like Spark, and, used for the right thing, it has a lot to offer. I'm just not convinced that something that can be achieved with bcp should be replaced with something that cannot.
September 18, 2017 at 5:22 pm
David.Poole - Monday, September 18, 2017 8:15 AM
I'm just not convinced that something that can be achieved with bcp should be replaced with something that cannot.
I strongly second that!
--Jeff Moden
Change is inevitable... Change for the better is not.