April 4, 2016 at 2:32 pm
patrickmcginnis59 10839 (4/4/2016)
Jeff Moden (4/4/2016)
patrickmcginnis59 10839 (4/4/2016)
But whatever floats your boat, if you can't see two inches beyond your TSQL experience, the least you could do is admit that separate language linking is at least a valid approach.You don't need to get snotty about it, Patrick. My point is that I've worked in shops where Active X, Perl, VBS, VB, C#, DOS, and a couple of other "tools" were all used in a single system that not only worked at a snail's pace (took 45 minutes just to get 1 file ready for import, never mind actually importing it) but no one that was currently employed knew about many of the tools and so no one did anything to improve the situation. What "floats my boat" is people being able to fix other people's code and, if you have "Tower of Babel" code, that task is made much more complex even for people that have more than one oar in the water.
Right, just like there is no dot net programmer who has ever studied sql servers. Forgive me for even bothering to comment.
edit: My point still stands, and I believe it, and even if your point has merit, two languages is hardly a Tower of Babel.
Snotty indeed.
As everyone has been quick to point out, both C# and VB can be used to create an SQLCLR. Why use a C# envelope on a VB core especially since it's the VB core that's going to do all the work? It doesn't make any sense.
Heh... and I'm sure that at the start of the little project I mentioned, someone said the same thing about two languages not constituting a Tower of Babel... but it sure laid the foundation for what came to be.
Not that you're interested but I did rewrite the code for that whole problem. It was all done using only T-SQL and did all of the pre-processing and final imports for 8 files in 2 minutes (still kinda slow for me) instead of just getting 1 file ready for import in 45 minutes. I know, I know... I just can't see more than two inches past my T-SQL experience but the people that have to maintain it are pretty happy now.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2016 at 3:21 pm
Jeff Moden (4/4/2016)
patrickmcginnis59 10839 (4/4/2016)
Jeff Moden (4/4/2016)
patrickmcginnis59 10839 (4/4/2016)
But whatever floats your boat, if you can't see two inches beyond your TSQL experience, the least you could do is admit that separate language linking is at least a valid approach.You don't need to get snotty about it, Patrick. My point is that I've worked in shops where Active X, Perl, VBS, VB, C#, DOS, and a couple of other "tools" were all used in a single system that not only worked at a snail's pace (took 45 minutes just to get 1 file ready for import, never mind actually importing it) but no one that was currently employed knew about many of the tools and so no one did anything to improve the situation. What "floats my boat" is people being able to fix other people's code and, if you have "Tower of Babel" code, that task is made much more complex even for people that have more than one oar in the water.
Right, just like there is no dot net programmer who has ever studied sql servers. Forgive me for even bothering to comment.
edit: My point still stands, and I believe it, and even if your point has merit, two languages is hardly a Tower of Babel.
Snotty indeed.
As everyone has been quick to point out, both C# and VB can be used to create an SQLCLR. Why use a C# envelope on a VB core especially since it's the VB core that's going to do all the work? It doesn't make any sense.
Heh... and I'm sure that at the start of the little project I mentioned, someone said the same thing about two languages not constituting a Tower of Babel... but it sure laid the foundation for what came to be.
Not that you're interested but I did rewrite the code for that whole problem. It was all done using only T-SQL and did all of the pre-processing and final imports for 8 files in 2 minutes (still kinda slow for me) instead of just getting 1 file ready for import in 45 minutes. I know, I know... I just can't see more than two inches past my T-SQL experience
but the people that have to maintain it are pretty happy now.
The fellow used an already written vb module. Whats not to like? I get it if you in particular can't read vb and c#, but I just frankly think its silly to extrapolate this into some straw palace "tower of babel" because you saw some bad programming once that did contain multiple languages. By golly if I spot a pokey module, it loses a level of abstraction or I lose some context switches, but if its fast enough for the job (that by golly at least I can judge even if you can't), programmer time gets saved for the next priority item. Because that's what grownups do, we prioritize.
I don't care that you were compelled to rewrite ANYTHING because "it wasn't invented here." Your compulsions aren't my issue.
Plus, Jeff Moden the imperial wizard can't work everywhere, occasionally we poor mortals just have to make do, right?
There! Hopefully I upped the snot level to your satisfaction LOLOL let me know how you liked it!
edit: I've never understood why folks here get touchy about contrary opinions and have to descend into personal adjectives like "snotty", "condescending", "troll". Can't people just step up and correct mistaken views on their own lack of merit? Heck for that matter, some things are just not black or white and really do amount to judgement calls often related to experience or resource levels. Heck, Jeff can't work everywhere right? Can't a regular joe like me call some vb from c# and then just call it a day?
Just wondering aloud but whatever.
April 4, 2016 at 3:56 pm
You're correct. I mostly can't tell the difference between C# and VB. I was going on what Sachin stated near his code...
Between I am not a C# expert, not even close by the slightest of margin . But when I consulted one of my friend whom I do consider a C# expert he directed me to this sweet six line piece of C# code that references the Visual Basic TextFieldParser Object.Throw anything at and it will split it for you
But, there you go being snotty again. Have fun.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 4, 2016 at 4:53 pm
Jeff Moden (4/4/2016)
You're correct. I mostly can't tell the difference between C# and VB. I was going on what Sachin stated near his code...Between I am not a C# expert, not even close by the slightest of margin . But when I consulted one of my friend whom I do consider a C# expert he directed me to this sweet six line piece of C# code that references the Visual Basic TextFieldParser Object.Throw anything at and it will split it for you
But, there you go being snotty again. Have fun.
you were awfully willing to denigrate calling a vetted dll in the vb namespace from c# without knowing what the heck you are even talking about. "tower of babel" and all that.
but it is honestly ok if this makes me the snotty one just as long as someone here supports a perfectly reasonable construct against such comments. glad to do my part
April 4, 2016 at 6:58 pm
So then you agree that there should be no language barriers and that anyone can bring whatever they're comfortable with to any task, correct? Whether I know C# or not has nothing to do with that.
As for having a bunch of tools written in two languages (one of your previous posts), that's a part of the very problem that I'm trying to convey. It's nice that ".Net is .Net" and that the DLLs will work with C# or VB but why would you start out with both to begin with and how many more languages that aren't .Net friendly will be allowed in?
But, let me ask, is there no well vetted splitter available in C#? And, yes, it's a serious question that may well justify the call to VB dll. If so, why not use the "C# DLL" instead? If not, why not use VB to call the "VB DLL" instead?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2016 at 3:18 am
Jeff Moden (4/4/2016)
So then you agree that there should be no language barriers and that anyone can bring whatever they're comfortable with to any task, correct? Whether I know C# or not has nothing to do with that.
Your (lack of?) knowledge of C# actually caused your mischaracterization in my opinion. Have you did a "project->new" in visual studio lately? Theres already a hodgepodge of tech there, its sort of what Microsoft does. c# calling vb namespace is not a "tower of babel".
Go back to calling me snotty, maybe I'll disagree with you less LOL
April 5, 2016 at 9:43 am
patrickmcginnis59 10839 (4/5/2016)
Theres already a hodgepodge of tech there, its sort of what Microsoft does
So why make it more of a hodgepodge? Both C# and VB are good for what they do. When using well vetted DLL in a fairly dedicated fashion, why use a different envelope for that functionality? And, no, not trying to be snarky here. I'm trying to see it your way.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2016 at 11:03 am
Jeff Moden (4/5/2016)
patrickmcginnis59 10839 (4/5/2016)
Theres already a hodgepodge of tech there, its sort of what Microsoft doesSo why make it more of a hodgepodge? Both C# and VB are good for what they do. When using well vetted DLL in a fairly dedicated fashion, why use a different envelope for that functionality? And, no, not trying to be snarky here. I'm trying to see it your way.
In my case, I'm picking up c# because in general it seems to pick up features sooner, and there seems to be a preference for it. Since it also links to vb code, theres not really any downside.
I think it depends on the staff, it certainly could be a bad move to bring a bunch of esoteric languages in that you now have to support when you don't have the staff to support it. Certainly if its a very small place or market, its a good bet to keep it mainstream. Both C# and VB are pretty mainstream, they both use dot net, the same ide even, so I think if you aren't put off by dot net, its a perfectly ordinary construct we see here.
Its been like that for a while with Microsoft technology, before dot net, it was common to use the same object files and link them together if you kept track of the calling methods. Reusing code was what it was about.
When you get to shops with different staffing levels, its probably even more common. Heck sometimes you call web services and you're not even concerned with what languages they're written in. Many companies even delve into language design themselves, I've been to vendor specific classes to learn their particular programming languages. In particular, I've often benefitted from prior language experience in learning new languages.
You can certainly overdo it but it also depends on the organization. Maybe you could accept that organizations differ widely in what mixes of tech they can employ and even much wider variances in tech have their place.
Its true that there is a skills disadvantage in linking between code, but even with that, theres just as much skills requirements within c# itself, heck you have folks migrating from winforms to wpf, you are probably challenged linking to vendor libraries, the very process of being a developer means you get used to this stuff. I am perfectly confident that this sort of churn DWARFS the relative mundane nature and degree of dipping into another namespace for vb textfieldparser.
Textfieldparser does perform slower according to at least one blogger. http://www.dotnetperls.com/textfieldparser
It might be that for other use cases, textfieldparser has useful features, and despite its performance downside could still be useful.
But the idea that this construct is some sort of "tower of babel" is just silly in my opinion.
April 5, 2016 at 9:05 pm
patrickmcginnis59 10839 (4/5/2016)
But the idea that this construct is some sort of "tower of babel" is just silly in my opinion.
I've seen some pretty bad things in various shops, much like what you wrote of above but worse, and that's the reason why my guard is so high. As a friend of mine once wrote about a different subject, it's become a "visceral fear".
It was interesting to see you state the very points I was poorly trying to express because I originally missed that you agreed but simply not about the bit of code we've been talking about (heh... I definitely got that part :-D). Based on what you've identified about .Net and, in particular, C#/VB (and thank you for the explanation), I have to agree... the code we're talking about isn't likely to incite the free-for-all that I've seen occur at various shops, a few of which have gone down the tubes because of it. You've proven that I need to temper the learned fear that I have on the subject. Thanks for that.
Textfieldparser does perform slower according to at least one blogger. http://www.dotnetperls.com/textfieldparser
That certainly caught my interest. After Googling for https://www.google.com/?gws_rd=ssl#q=csv+tokenizer+performance, it was amazing to see that the "other world" suffers a similar quest for a good splitter as the SQL world. Heh... if they only knew how even the slower splitters blow the doors off of what can be done using only T-SQL. It's also interesting to see the parallels between those worlds and the world of SQL when it comes to the accuracy, or rather, inaccuracy of many of the splitters.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 5, 2016 at 9:21 pm
Few months back I had a situation where I had to create instance of class that supported multiple inheritance(MI).C# does not support MI but C++(CPP) does.C# supports MI through interfaces but that was to cumbersome to implement due to the scope of the requirements.So I ended up creating a class wrapper in CPP that could inherit multiple classes.This CPP class was then used all over the project.Now just because I used a CPP class in C# project, does that mean I should recode the entire project in CPP so that it would not be labelled as "Tower of Babble".Of course I would not.I have used Excel objects in couple of my C# projects.So should I discard my C# code and recode everything instead in VBA because I made a reference to an MS Office object using COM Iterop .If one goes back and check my code it only had referenced a VB class and there was absolutely no Vb.Net code in it.
Textfieldparser does perform slower
Yes it does perform slower but at the same time it reduces the code complexity.So if you are ok with average performance and your files arent that big this is the best option.
But if the performance is a show stopper there loads of other options like using the .Net Streamreader class which is quite robust and faster than TextFieldParser or custom built objects on GitHub.For example this one
https://github.com/phatcher/CsvReader
where approximately 10 million records were parsed and imported in 11 minutes.
Another one http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader where a 45 MB CSV file containing 145 fields and 50,000 records was processed at a rate of 30 MB/sec and took only 1.5 seconds to import on a P4 3.0 GHz, 1024 MB machine.
Or take this one in PowerShell that inherited .Net Objects(another Tower of Babble) https://gallery.technet.microsoft.com/scriptcenter/Import-Large-CSVs-into-SQL-216223d9 which imported around 5 million rows in around a minute.The author made C# and VB.Net equivalent code as well available which I think is a bit flawed because I cannot see any option to specify the delimiter in them.
We had a requirement where CSV files had to be parsed and imported and a few columns from each of those files where to be ftped through individual CSV's.We utilized a custom CSV parser from GitHub for parsing and recoded it with minimal effort so that as the records were being parsed and imported,the required columns from the parsed CSV were simultaneously moved to a new csv file and ftped through a different thread and was all done at the sametime.
My point is that there are multitude of ready-made options available for CSV parsing apart from TSQL that provides you with very high performance and less code manageability along with the added flexibility of error logging,custom column mappings,serialization support,UI support,Thread management etc etc.Throw in a nice WPF or a silverlight grid that displays the records being imported as they happen along with a few fancy counters and a progress bar and with an additional functionality of emailing the user the exact problem value,row number and column name that could not be imported and I am sure you will make your user very happy.
--------------------------------------------------------------------------------------------------
I am just an another naive wannabe DBA trying to learn SQL Server
April 6, 2016 at 4:18 pm
That's how many towers start out. Simple project. Hope it doesn't bite you or the people that follow you in the long run.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 7, 2016 at 3:32 am
So there were a number of passionate replies.
Thanks for the links in the passionate replies, they might lead to something usefull for me. But for me there is still the problem of my unfamiliarity with the different techniques.
So for a .net/C#/other language solution it should be more 'contained' than a TSQL solution. Because I am definitly not 'fluent' in one of those techniques.
As for multilanguages.
I am multilangual, I wouldn't be able to communicate in this group with my native language. So there is definitly something so say for multilanguages. But I speak only a limited amount of natural languages, and some of them I speak badly.
As for non natural languages, I have programmed in quite a number of languages (and build some), but last few years I have mainly programmed in T-SQL, being different and set based, it's a very different language and at this time I am not comfortable in using a different development surroundings.
So at the moment I try to stick to the techniques I know I can handle.
So I still hope to solve the complete problem (convert a folder of .CSV files into tables in SQL-server) within the techniques I am comfortable with. I have made some progress with this. I have build a stored procedure which does just this. At the moment I exclude the large and complex files, because of the performance. But still hope tackle these within T-SQL as wel.
If there is a CLR solution with better performance I would welcome that. But this has to tackle the CSV files, and come with an instruction how to use/do this. *)
Ben
*)
Yes I think that a CLR (for example C# or VB) can handle large and complex CSV files far more efficient than T-SQL can.
What supprises me that I could not find a complete solution on the internet. Yes code in several languages which does the correct (or incorrect) split, but not an application or a 'complete' T-SQL script which transforms a single .CSV file or a complete folder into tables in SQL-server.
The Upside of this that this gives me the oppertunity to build my own T-SQL solution.
April 7, 2016 at 6:32 am
was anybody able to get a tally table looking solution to work with the delimiter inside quotes exclusion? its obviously easy with a loop, but was curious about the tally table or other faster solutions?
edit: sorry, missed Eirikur Eiriksson's post, pretty sharp!
April 7, 2016 at 8:10 am
Sachin Nandanwar (4/5/2016)
Few months back I had a situation where I had to create instance of class that supported multiple inheritance(MI).C# does not support MI but C++(CPP) does.C# supports MI through interfaces but that was to cumbersome to implement due to the scope of the requirements.So I ended up creating a class wrapper in CPP that could inherit multiple classes.This CPP class was then used all over the project.Now just because I used a CPP class in C# project, does that mean I should recode the entire project in CPP so that it would not be labelled as "Tower of Babble".Of course I would not.I have used Excel objects in couple of my C# projects.So should I discard my C# code and recode everything instead in VBA because I made a reference to an MS Office object using COM Iterop .If one goes back and check my code it only had referenced a VB class and there was absolutely no Vb.Net code in it.
Have to agree with that. CPP is very typical of using the language without knowing the entirety of it, and moving an entire project to CPP just to use CPP would be nuts.
On the other hand, introducing CPP at all would be a judgement call. Decisions like this are made all the time. Who is qualified to make it, would it be someone who specializes in T-SQL to the exclusion of everything else, or would it be more of a generalist type guy who has general programming experience. Does he program in such a way to ensure maintainability should he depart the company?
I'd go the more generalist route of "supportability" period, whether its multiple languages or even a single new language. Can your organization support it or not? Can your organization contract out the support when needed, if that's an appropriate direction?
Textfieldparser does perform slower
Yes it does perform slower but at the same time it reduces the code complexity.So if you are ok with average performance and your files arent that big this is the best option.
Agree with that too, Textfieldparser has other features that could be useful. That's why people cross the namespace to include the reference to it, and it can also be done without knowing or supporting any vb net at all, so in this case I don't even understand the "tower of babel" argument, but I disagree with Jeff on other topics too so it could just be me!
But if the performance is a show stopper there loads of other options like using the .Net Streamreader class which is quite robust and faster than TextFieldParser or custom built objects on GitHub.For example this one
https://github.com/phatcher/CsvReader
where approximately 10 million records were parsed and imported in 11 minutes.
Another one http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader where a 45 MB CSV file containing 145 fields and 50,000 records was processed at a rate of 30 MB/sec and took only 1.5 seconds to import on a P4 3.0 GHz, 1024 MB machine.
Or take this one in PowerShell that inherited .Net Objects(another Tower of Babble) https://gallery.technet.microsoft.com/scriptcenter/Import-Large-CSVs-into-SQL-216223d9 which imported around 5 million rows in around a minute.The author made C# and VB.Net equivalent code as well available which I think is a bit flawed because I cannot see any option to specify the delimiter in them.
We had a requirement where CSV files had to be parsed and imported and a few columns from each of those files where to be ftped through individual CSV's.We utilized a custom CSV parser from GitHub for parsing and recoded it with minimal effort so that as the records were being parsed and imported,the required columns from the parsed CSV were simultaneously moved to a new csv file and ftped through a different thread and was all done at the sametime.
My point is that there are multitude of ready-made options available for CSV parsing apart from TSQL that provides you with very high performance and less code manageability along with the added flexibility of error logging,custom column mappings,serialization support,UI support,Thread management etc etc.Throw in a nice WPF or a silverlight grid that displays the records being imported as they happen along with a few fancy counters and a progress bar and with an additional functionality of emailing the user the exact problem value,row number and column name that could not be imported and I am sure you will make your user very happy.
I think Bens criteria is using anything other than T-SQL exceeds his comfort level, even using basic dot net code is outside his comfort level. Otherwise I for one agree with you, procedural programming does have advantages over set oriented programming / T-SQL in many cases, and while acknowledging others reservations here, mixing languages is not unheard of. I think using the "supportability" criteria is something we can all agree on right?
April 7, 2016 at 8:18 am
patrickmcginnis59 10839 (4/7/2016)
was anybody able to get a tally table looking solution to work with the delimiter inside quotes exclusion? its obviously easy with a loop, but was curious about the tally table or other faster solutions?edit: sorry, missed Eirikur Eiriksson's post, pretty sharp!
As said the Eirikur Eiriksson does that.
But also the Ben Brugman Solution does solve that.
Shortcommings of the Ben Brugman Solution.
1. Because of the intermediate results it is probably a bit slow.
2. Does not handle elements starting with a " correctly. (Two double qoutes are removed by an optimising statement. This should not occure for an odd numbered qoute followed by a qoute.) Discovered after posting the code.
Eirikur Eiriksson (shortcommings?) :
Does not handle CRLF. (CRLF being the terminator of a 'row' in CSV files).
A CRLF can also be a part of an element.
There is a severe length restriction. (8000 is not enough for most CSV files).
It uses the Lead and Lag constructions (not supported by 2008).
But although in the first post 'able to process any csv file' was mentioned, this was left to the 'reader'. (I didn't want to complicate the question to much).
The solution from Eirikur Eiriksson can be used to construct a solution where the split is on both lineterminators and on a delimitor. At the moment I am working on this. (First use the code to split on the CRLF except for the quoted CRLF's. Second use the code to split on the delimitor. Maybe this is even possible in one pass ?).
Ben
Viewing 15 posts - 31 through 45 (of 62 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy