Processing strings.

  • ben.brugman (4/7/2016)


    patrickmcginnis59 10839 (4/7/2016)


    Was anybody able to get a tally-table-looking solution to work with the delimiter-inside-quotes exclusion? It's obviously easy with a loop, but I was curious about the tally table or other faster solutions?

    edit: sorry, missed Eirikur Eiriksson's post, pretty sharp!

    As said, Eirikur Eiriksson's solution does that.

    But the Ben Brugman solution also solves that.

    Shortcomings of the Ben Brugman solution:

    1. Because of the intermediate results it is probably a bit slow.

    2. It does not handle elements starting with a " correctly. (Two double quotes are removed by an optimising statement; this should not occur for an odd-numbered quote followed by a quote.) Discovered after posting the code.

    Eirikur Eiriksson (shortcomings?):

    It does not handle CRLF. (CRLF being the terminator of a 'row' in CSV files.)

    A CRLF can also be part of an element.

    There is a severe length restriction. (8000 characters is not enough for most CSV files.)

    It uses the LEAD and LAG constructions (not supported by SQL Server 2008).

    Although the first post mentioned being 'able to process any CSV file', this was left to the 'reader'. (I didn't want to complicate the question too much.)

    The solution from Eirikur Eiriksson can be used to construct a solution that splits on both line terminators and on a delimiter. At the moment I am working on this. (First use the code to split on the CRLFs, except for the quoted ones. Second, use the code to split on the delimiter. Maybe this is even possible in one pass?)
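    For illustration of the delimiter-inside-quotes rule under discussion, here is a minimal sketch in Python (not the T-SQL tally approach; the function name and defaults are my own): a single left-to-right pass that splits on the delimiter only when outside a text qualifier, treating a doubled qualifier as a literal quote.

```python
def split_outside_quotes(text, delimiter=',', qualifier='"'):
    """Split text on delimiter, ignoring delimiters inside qualified regions."""
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == qualifier:
            if in_quotes and i + 1 < len(text) and text[i + 1] == qualifier:
                current.append(qualifier)  # doubled qualifier -> literal quote
                i += 1
            else:
                in_quotes = not in_quotes  # enter or leave a qualified region
        elif ch == delimiter and not in_quotes:
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append(''.join(current))
    return fields

print(split_outside_quotes('a,"b,c",d'))  # ['a', 'b,c', 'd']
```

    The same even/odd-qualifier reasoning drives the tally-table solutions in this thread; the state machine just makes the rule explicit.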

    Ben

    As hardcore as I am about using T-SQL for database functionality, I'll readily admit that there are many things that need to be done outside of T-SQL and even outside of SQL Server. For example, I clearly stated that if you really want performance for a splitter function, a well written SQLCLR is absolutely the way to go (emphasis on "well written" because many solutions have faults like leaving off the last "empty" element if the string ends with a delimiter).

    The same holds true for importing data from a file. Even SSIS isn't a panacea for all of the variations of "CSV" files that are out there because even Microsoft doesn't follow the "real" standards for true CSV (especially those bloody exports from Excel). Add to that that many providers of data send poorly formed data that doesn't appear to follow any standard in particular and you end up making the realization that no one solution is correct for all imports even if relegated to just handling supposed "CSV" files.

    Because most people do attempt to apply some type of standard to their CSV files (especially with my negotiations with them to do so), I've been mostly successful with using BCP and BULK INSERT (even though BCP and, therefore, BULK INSERT, is clearly advertised as having never been intended to handle CSV) in conjunction with the occasional BCP Format File (which can also handle quotes as part of a consistent delimiter but fails miserably for inconsistent quote usage as in an Excel export).

    Although I do try to avoid it, there are times when you just have to do something else besides T-SQL (although I'll frequently use xp_CmdShell to more easily interface with whatever that is to make life a little easier). Importing CSV files is no exception. Owing to the sometimes miserable failure of BCP and Bulk Insert to handle certain things, I'll sometimes resort to calling something "native" (I do try to avoid specialized external-to-SQL code not normally available from machine to machine, if possible) to accomplish a given task.

    For example, we get a whole lot of Excel spreadsheet data, which has the problem of using text qualifiers ONLY if there's a contained delimiter in the data. To get around that for files from sources where I expect such problems to exist on an unpredictable basis, I will run the file through the Import-Csv | Export-Csv piped PowerShell command combination to "standardize" the inputs (predictably puts double quotes around every "field" and doubles up on embedded quotes, for example). Then I can have a properly constructed BCP format file that handles the quotes as a part of the delimiter, leading quote, and line terminator.
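    As a hedged sketch of the same "standardize first" idea (Python's stdlib csv module standing in for the Import-Csv | Export-Csv pipe): rewrite the data so every field is quoted, which makes the quote a predictable part of the delimiter for a format file.

```python
import csv
import io

# Input where quotes appear only when a field contains the delimiter
raw = 'name,notes\nWidget,plain\nGadget,"has, comma"\n'

reader = csv.reader(io.StringIO(raw))
out = io.StringIO()
# QUOTE_ALL: predictably wrap every field; embedded quotes get doubled
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator='\r\n')
writer.writerows(reader)

print(out.getvalue())
```

    Every row now has the identical `","` pattern between fields, which is exactly what a quote-aware BCP format file needs.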

    Make no mistake about it, it kills me to have to use a disparate method such as that, but it does use what is "native" on each machine.

    That notwithstanding, sometimes you just have to bite the bullet and rely on "The Team" to come up with a better solution in the form of managed code. Yes, it's still important NOT to allow the willy-nilly adoption of whatever technology strikes someone, to prevent "Tower of Babel" code, but a well thought out "black boxable" solution that mostly won't need to be maintained over time is sometimes the best way to go.

    Now, if we could just get Microsoft to make the same realization and come up with a much more flexible BULK INSERT (especially where CSV, TSV, and other delimited data is concerned) and create an equally flexible BULK EXPORT instead of trying to drive us towards the likes of SSIS and other "packages", which may or may not be available on every instance, we'd be all set and we wouldn't be having this conversation. 😀

    Shifting gears, although I believe that folks can come a whole lot closer, it's not possible to write code that will be "able to process ANY CSV", just as there is no code that will successfully handle ANY XML file. People do have to realize there are standards that need to be followed and, just as there is the need for properly formed XML, there is the need for properly formed CSV data. Even if it doesn't follow the real standards for CSV, the format of the file should at least be consistent from row to row, including any header rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/7/2016)

    As hardcore as I am about using T-SQL for database functionality, I'll readily admit that there are many things that need to be done outside of T-SQL and even outside of SQL Server. For example, I clearly stated that if you really want performance for a splitter function, a well written SQLCLR is absolutely the way to go (emphasis on "well written" because many solutions have faults like leaving off the last "empty" element if the string ends with a delimiter).

    Although I think you are right, I am still hoping you are not.

    Because most people do attempt to apply some type of standard to their CSV files

    If a CSV file is not properly formed, I do not consider it to be a CSV file.

    It should conform to some sort of standard. The problem, of course, is that there are so many standards to choose from. *)

    The text qualifier, delimiter and line terminator can be chosen.

    If these symbols appear in a value, the value should be enclosed in text qualifiers.

    And a text qualifier should be doubled when it appears in a value.

    With these 'rules' almost all well-formed CSV files can be parsed.

    Some 'standards' allow the text qualifier to be switched, so that both a single quote and a double quote function as text qualifiers; the first one met in a field determines which text qualifier is used, and the field should end with it as well.

    (Working with even/odd text qualifier counts, this switching between ' and " does not work very well in the SQL solutions, so at the moment I am excluding these constructions.)
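    These rules (qualifier around values containing special characters, qualifier doubled inside a value) are exactly what a standard CSV encoder and decoder implement; a quick Python stdlib illustration of the round trip:

```python
import csv
import io

row = ['plain', 'has,comma', 'has "quotes"', 'line1\nline2']

# Encode: fields containing the delimiter, qualifier, or line break get
# qualified, and an embedded qualifier is doubled.
buf = io.StringIO()
csv.writer(buf, lineterminator='\r\n').writerow(row)
encoded = buf.getvalue()

# Decode: the same rules applied in reverse recover the original values.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == row)  # True
```

    Note that the embedded line break survives the round trip because it sits inside a qualified field, which is precisely the case the pure line-terminator splitters in this thread trip over.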

    For performance I hope to process about 1 MB every 10 to 20 seconds. For the smaller files this was achievable.

    But my solution is still not complete. (Friday I ran into trouble; the performance was especially bad. This morning I split a CTE into two parts, storing the intermediate result in a table. This gave a huge performance gain. I still do not understand exactly why. Probably the optimiser was making a 'mess' of it.)

    My time is limited and at the moment I am coming up with more ideas than I can properly test. At the moment I am testing: do a split on the line terminator, then count the text qualifiers within each line. If that is an odd number, the lines up to the next 'odd' line should be 'added' together. After the lines are split, start on the delimiters. (With the same principle or with the solutions already present in this thread.)
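    A quick Python sketch of that odd-qualifier-count rule (helper name is my own; note a doubled qualifier counts as two, so the parity test still works): merge physical lines until each logical row contains an even number of text qualifiers.

```python
def rejoin_rows(physical_lines, qualifier='"'):
    """Merge physical lines until each logical row has an even qualifier count."""
    rows, pending = [], None
    for line in physical_lines:
        pending = line if pending is None else pending + '\r\n' + line
        if pending.count(qualifier) % 2 == 0:
            rows.append(pending)       # balanced: a complete logical row
            pending = None
    if pending is not None:
        rows.append(pending)           # unbalanced tail: malformed input
    return rows

print(rejoin_rows(['a,"start', 'end",b', 'c,d']))
# ['a,"start\r\nend",b', 'c,d']
```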

    Ben

    *)

    By counting the number of appearances of " ' ; , CR and LF, it is often possible to determine which characters are chosen for the different functions.
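    That counting heuristic can be sketched in a few lines of Python (a hypothetical helper of my own; Python's csv.Sniffer does something similar): count each candidate character outside quoted regions and pick the most frequent.

```python
from collections import Counter

def guess_delimiter(sample, candidates=',;|\t'):
    """Count candidate delimiters outside double-quoted regions; pick the winner."""
    counts = Counter()
    in_quotes = False
    for ch in sample:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in candidates:
            counts[ch] += 1
    return counts.most_common(1)[0][0] if counts else None

print(guess_delimiter('a;b;"x,y";c\r\n1;2;3;4'))  # ;
```

    Ignoring characters inside qualified regions matters: a comma embedded in a quoted field should not vote for comma as the delimiter.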

  • ben.brugman (4/11/2016)


    For performance I hope to process about 1 MB every 10 to 20 seconds. For the smaller files this was achievable.

    But my solution is still not complete. (Friday I ran into trouble; the performance was especially bad. This morning I split a CTE into two parts, storing the intermediate result in a table. This gave a huge performance gain. I still do not understand exactly why. Probably the optimiser was making a 'mess' of it.)

    My time is limited and at the moment I am coming up with more ideas than I can properly test. At the moment I am testing: do a split on the line terminator, then count the text qualifiers within each line. If that is an odd number, the lines up to the next 'odd' line should be 'added' together. After the lines are split, start on the delimiters. (With the same principle or with the solutions already present in this thread.)

    There are several misunderstandings going on throughout this entire thread that are being counter-productive.

    The main thing that seems to be tripping folks up is that parsing a simple, single list of delimited values is not the same operation as parsing a true CSV file that has: multiple lines, possible text-qualifiers, possible embedded text-qualifiers, possible embedded field delimiters, and possible embedded row delimiters. Viewing these two things as being the same, just because they happen to be "things separated by commas", makes it much harder to deal with the actual problem of parsing true CSV data. And this, unfortunately, is a widespread confusion that is also evident in the DotNetPerls blog post that Patrick linked to. Whoever wrote that post did a very poor job of testing because they only tried data that could also be split using String.Split. That is blatantly irresponsible, or at least incompetent, because TextFieldParser can handle all of those extra CSV requirements, requirements that make no sense for a simple splitter that does nothing more than look for the delimiter(s). Nobody would ever use TextFieldParser to split simple strings, and String.Split simply can't handle actual CSV data; they do vastly different things and shouldn't be compared.
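    The distinction is easy to demonstrate in any language; here is a Python stdlib analogue of the String.Split versus TextFieldParser difference (str.split versus csv.reader):

```python
import csv
import io

line = 'Widget,"June 1, 2016",9.95'

# Naive split: breaks the qualified field apart at the embedded comma.
print(line.split(','))
# ['Widget', '"June 1', ' 2016"', '9.95']

# CSV-aware parser: honours the text qualifier.
print(next(csv.reader(io.StringIO(line))))
# ['Widget', 'June 1, 2016', '9.95']
```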

    The official question being asked here is "What's the best way to process strings [in T-SQL]?" with a footnote of "The end goal is to be able to process any csv file (imported as string in SQL-server)". Framing this problem in terms of "processing strings" confuses the issue by implying that simple splitters can be used to solve this problem, yet they just won't work. The actual question is: "What is the best way to import CSV files into SQL Server?". And the answer to that question is:

    You have a few choices (assuming you want to avoid SSIS, which I certainly understand, and it might not be able to handle embedded row delimiters anyway):

    • Use a .NET app (Console or Windows, but Console Apps are easier to schedule / automate) in combination with a Table-Valued Parameter (TVP), which is a User-Defined Table Type (UDTT; introduced in SQL Server 2008), to stream the data into a Stored Procedure which can do a simple INSERT INTO TableName (field_list) SELECT field_list FROM @TVPvariable; or something more complex to handle MERGE / Upsert scenarios. I have some example code posted on StackOverflow showing one method of doing this: How can I insert 10 million records in the shortest time possible?. The advantages here are that you can schedule it outside of SQL Server Agent, and that it isn't running from within the database (if that is a concern). Also, the transport mechanism, the UDTT, is strongly-typed and even allows for enforcing some data integrity rules via CHECK and UNIQUE constraints. However, the data layout needs to be fairly static since UDTTs / TVPs are not dynamic.
    • Similar to the option mentioned above, use a .NET app to package up the multiple rows of data into XML and send that XML document to a Stored Procedure which can do a simple INSERT INTO TableName (field_list) SELECT field_list FROM @XMLvariable.nodes('/something'); or something more complex to handle MERGE / Upsert scenarios. This technique can be done in all versions of SQL Server starting with 2005. XML has the option of being a dynamic structure, but that takes a bit of work so is best to have a static data structure.
    • Use a .NET app and the SqlBulkCopy class, which provides a lot of the same functionality as the BCP utility. The benefits are that it handles most of the same options as BCP, such as turning off triggers, keeping identity values, setting a batch size, requesting a Table Lock. The Table Lock is nice as it will attempt to do a minimally logged operation. Of course, the price to pay for that is that the data will first be loaded into tempdb so that it can be sorted (even if it is already in the proper order) and then inserted into the destination table. A downside to using SqlBulkCopy is that it does not natively accept anything read from a file so more work needs to be done to get it all working. This option does allow for the structure to be dynamic.
    • Use a SQLCLR Stored Procedure. You can either read the file and issue INSERT statements, or you can return the processed fields as a result set and use the INSERT INTO...EXEC construct. I have example code that does just this, described below. The advantage to this approach is that you can call the import proc inline within existing T-SQL without needing to shell out using xp_cmdshell or a CmdExec job step of a SQL Server Agent job. This option allows for the structure to be dynamic. A disadvantage of a dynamic approach is that getting strongly typed fields is much harder to accomplish. Also, a disadvantage of the INSERT...EXEC construct is that the EXEC call needs to fully complete before its results can be released to the INSERT, and that can require a lot of memory.
    • Use a SQLCLR Function. You can either read the file and issue INSERT statements (in which case a Scalar UDF would be fine, but so many INSERTs, even wrapped in a Transaction, is one of the slower methods mentioned here), or you can return the processed fields as a result set using a TVF and use the INSERT...SELECT construct. Using a TVF should even allow for using SELECT * INTO.... A TVF also allows for the result fields to be strongly-typed, but the downside is that the structure cannot be dynamic.

    In all cases, .NET is used for the processing that T-SQL is not built for, and then the result is shipped off to SQL Server for the processing that T-SQL is built for. Any of these choices will be much faster and less error-prone than the T-SQL acrobatics you are attempting.
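    The common shape of all of these options (parse in a general-purpose language, stream typed rows into the database) can be sketched in Python, with the stdlib sqlite3 module standing in for the SQL Server side; in practice the .NET app would use a TVP or SqlBulkCopy instead:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE import_demo (id INTEGER, name TEXT, notes TEXT)')

# csv.reader is an iterator, so rows stream into executemany one at a time;
# the qualified embedded newline in row 2 is handled by the parser, not by us.
data = io.StringIO('1,Widget,"note with, comma"\n2,Gadget,"line1\nline2"\n')
conn.executemany('INSERT INTO import_demo VALUES (?, ?, ?)', csv.reader(data))

count = conn.execute('SELECT COUNT(*) FROM import_demo').fetchone()[0]
print(count)  # 2
```

    The parameterized INSERT also gives the strong typing that ad-hoc string concatenation in T-SQL never can.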

    So, how best to parse the CSV file? Well, you can write your own parser, or use the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace (suggested by Sachin earlier in this thread). The nice thing about TextFieldParser is that it is already done and handles all aspects of CSV files: text-qualification (handles only some fields being text-qualified), embedded text-qualifiers, embedded field delimiters, and embedded row delimiters. The main down-side seems to be that if you do use this within the context of SQLCLR, it requires the Assembly to be set to UNSAFE, which is less than ideal. But that is a non-issue for Console apps and Windows apps. Or you can even find an existing CSV parser out on GitHub (also suggested by Sachin), or CodeProject, or probably a few other places.

    There is also some confusion as to what SQLCLR can and cannot do, and how well or how poorly it might do certain things. I have been trying to address this lack of understanding of SQLCLR in a few ways, two of which are:

    • Provide the information so that people can do this on their own. To help on this level I have been writing the Stairway to SQLCLR series (here on SQL Server Central) that already covers the deployment and security questions.
    • Provide the functionality itself so that people who don't have the time to learn how to do it can just install it and use it. This is the goal of my SQL# project.

    As I mentioned above in the list of methods for importing CSV data, SQLCLR is not the key factor: .NET is. Still, there are some conveniences to doing this .NET work in SQLCLR, mainly being able to integrate the import more directly into the overall import process (especially for those who work primarily in T-SQL). One convenience is being able to create a local temporary table and import directly into it, which allows for not needing a permanent ETL table that then either needs a field to distinguish each process, or requires that processing be done serially (i.e. single-threaded).

    In the interest of making all of these points a bit clearer, I have worked up a demo in the form of a SQLCLR Stored Procedure that utilizes the TextFieldParser class to read a CSV file (or string) and output the fields as a result set of all NVARCHAR(MAX) fields. The number of fields is dynamic, and is determined by the first row of the file or string. There are two tests shown: one using a string to show that all of the various CSV options can indeed be parsed, and one that imports a CSV file using the INSERT...EXEC construct. If you follow the exact instructions within the test script, you will generate an 80 MB file to then import. There are 3 scripts located at Pastebin.com:

    RESULTS ON MY SYSTEM: The 80 MB file containing 185,288 rows imported in 40 seconds, which is 1 MB every 0.5 seconds. Compare that with the stated goal of "1 MB every 10 to 20 seconds", which equates to 80 MB in 800 to 1600 seconds. So the method demonstrated here is 20x to 40x faster than the desired performance.

    KEEP IN MIND: This test was done on my slowish laptop with not enough memory and non-SSD HD.

    ALSO KEEP IN MIND: The INSERT...EXEC construct is one of the slower methods due to requiring that the result set from the EXEC sub-process be fully materialized in memory before being sent to the INSERT. Meaning, using a TVF (assuming it is done correctly and is a streaming TVF) would be faster due to the results being immediately available to the INSERT, and using a TVP should be similarly faster than the INSERT...EXEC construct.
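    The materialize-versus-stream difference being described is the same one a Python generator illustrates: a streaming producer lets the consumer start immediately and bounds memory, while materializing forces the full result set into memory first. A minimal sketch (names are mine):

```python
import itertools

def parsed_rows(n):
    """Stand-in parser that yields rows one at a time (streaming TVF analogue)."""
    for i in range(n):
        yield (i, f'row-{i}')

# Streaming consumer: rows are available immediately, one at a time.
first_three = list(itertools.islice(parsed_rows(10_000_000), 3))
print(first_three)  # [(0, 'row-0'), (1, 'row-1'), (2, 'row-2')]

# Materializing consumer (the INSERT...EXEC analogue) would instead be
# list(parsed_rows(10_000_000)): the entire result set is built in memory
# before anything downstream can run.
```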

    I have written articles about both TVPs and Streaming TVFs:

    - Streaming Data Into SQL Server 2008 From an Application

    - CLR Table-Valued Function Example with Full Streaming (STVF / TVF)

    Also, later this year I will be releasing a new product that is a command-line tool for doing automated bulk inserts. It will allow for full CSV parsing, strongly typed fields without needing a format file (unlike BCP), multiple source files, multiple destination tables (i.e. routing incoming data into different destinations), etc.


    Jeff Moden (4/2/2016)


    Also, instead of using C# to call Visual Basic, stop the tower of Babel and just make the CLR using Visual Basic.

    ...

    I'll also warn that the built in splitter for C# is relatively slow. It could be that the similar splitter in VB is just as slow.

    ...

    Why use a C# envelope on a VB core especially since it's the VB core that's going to do all the work? It doesn't make any sense.

    ...

    is there no well vetted splitter available in C#? And, yes, it's a serious question that may well justify the call to VB dll. If so, why not use the "C# DLL" instead? If not, why not use VB to call the "VB DLL" instead?

    Actually, all of the statements quoted directly above make absolutely no sense, and reflect a complete misunderstanding of .NET. .NET is structured very similarly to Java: source code is written in a high-level language (C# and VB.NET being the most common, but also F#, Visual C++, and others), and then compiled into a format that is common across all .NET languages, yet not runnable on the OS. This "common" format is called "MSIL" (Microsoft Intermediate Language), also standardized as "CIL" (Common Intermediate Language), though a lot of documentation still refers to MSIL. This MSIL / CIL code is what is contained in a .NET Assembly / DLL. And since this MSIL / CIL code cannot run on the OS, it needs to be translated into machine-specific (i.e. "native") code and executed by a lower-level layer: the CLR (Common Language Runtime), which is analogous to the JVM (Java Virtual Machine) for Java.

    .NET Assemblies can either contain only MSIL / CIL code (a.k.a. "pure") or a combination of MSIL / CIL and C++ (not Visual C++ / VC++) (a.k.a. "mixed-mode"). Assemblies being loaded into SQL Server (whether custom or unsupported .NET Framework libraries) are restricted to being "pure" MSIL Assemblies. This restriction is one reason why incorporating one or more unsupported .NET Framework libraries into a SQLCLR project is ill-advised: Microsoft can change a pure MSIL library to mixed-mode any time they feel it best to do so. And this has happened: Unable to register .NET framework assembly not in the supported list.

    • "using C# to call Visual Basic" -- There isn't any Visual Basic code in the example provided by Sachin; it is all C# code. The Microsoft.VisualBasic .NET Framework library does not contain VB code any more than the Microsoft.SqlServer library contains SQL Server. The names of the libraries refer to the subject matter of the functionality they provide. According to the MSDN page for Microsoft.VisualBasic.FileIO Namespace: "The Microsoft.VisualBasic.FileIO namespace contains types that support the My file system object in Visual Basic". Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.
    • "the built in splitter for C# is relatively slow" -- this is a bit over-stated and/or out of context. While there are some notes regarding performance in the MSDN documentation for String.Split, for splitting smallish strings it is just fine. If you are referring to the performance testing done against other splitter functions, whether various T-SQL methods or other SQLCLR methods, various considerations need to be pointed out, such as:

      * the Microsoft-provided example of using String.Split in a SQLCLR TVF is not the best way of using String.Split

      * T-SQL functions using VARCHAR(8000) input params have at least two, if not three, advantages: 1) the SQLCLR API only supports NVARCHAR, so there will always be at least some minor performance hit to convert the VARCHAR data to NVARCHAR, not to mention also sending twice as many bytes which does add up on larger values, 2) passing data to the CLR is separate memory, and 3) using MAX instead of a 1 - 4000 value on the NVARCHAR size has a definite performance hit, and while it would be similarly seen on the VARCHAR param in the T-SQL function, when testing VARCHAR(8000) data it is sometimes tested against NVARCHAR(MAX) in order to test the full 8000 characters.

    • "It could be that the similar splitter in VB is just as slow." -- No, it can't be because there is, again, only one String.Split method. It might be faster in C++ as that would by-pass the middle-layer of the CLR, but then it wouldn't be managed code and hence not allowed in a SAFE Assembly.
    • "a C# envelope on a VB core" -- again, this is incorrect because there is no "VB core": what you are thinking is VB is actually MSIL (since it is a provided Assembly), and as soon as the C# is compiled it too will be MSIL.
    • "tower of Babel" blah blah blah -- while an interesting topic on its own, the "Tower of Babel" discussion is entirely orthogonal to this particular thread, and is an unnecessary distraction that has only added to the confusion here.

    For more information on the general structure of .NET and the "CLR Integration" feature in SQL Server, please see the following resources:

    SQL# (https://SQLsharp.com/): SQLCLR library of over 340 Functions and Procedures
    Sql Quantum Lift (https://SqlQuantumLift.com/): company
    Sql Quantum Leap (https://SqlQuantumLeap.com/): blog
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (4/20/2016)


    Actually, all of the statements quoted directly above make absolutely no sense, and reflect a complete misunderstanding of .NET. .NET is structured very similarly to Java: source code is written in a high-level language (C# and VB.NET being the most common, but also F#, Visual C++, and others), and then compiled into a format that is common across all .NET languages, yet not runnable on the OS. This "common" format is called "MSIL" (Microsoft Intermediate Language), also standardized as "CIL" (Common Intermediate Language), though a lot of documentation still refers to MSIL. This MSIL / CIL code is what is contained in a .NET Assembly / DLL. And since this MSIL / CIL code cannot run on the OS, it needs to be translated into machine-specific (i.e. "native") code and executed by a lower-level layer: the CLR (Common Language Runtime), which is analogous to the JVM (Java Virtual Machine) for Java.

    Absolutely correct. With the possible exception of the functionality that "R" brings to the table, I don't understand the need for so many different languages to do the same thing.

    Fortunately, I no longer have to worry about such things. 😉

    --Jeff Moden



  • Solomon Rutzky (4/20/2016)


    •"tower of Babel" blah blah blah -- while an interesting topic on its own, the "Tower of Babel" discussion is entirely orthogonal to this particular thread, and is an unnecessary distraction that has only added to the confusion here.

    I have to continue to disagree. I continue to fight such problems at work where someone just didn't keep a lid on it and it got out of control. I'm not saying that such code should be forbidden, but it should be questioned, especially if the language being used for the envelope already has the same capability.

    --Jeff Moden



  • Jeff Moden (4/20/2016)


    Solomon Rutzky (4/20/2016)


    Actually, all of the statements quoted directly above make absolutely no sense, and reflect a complete misunderstanding of .NET. .NET is structured very similarly to Java: source code is written in a high-level language (C# and VB.NET being the most common, but also F#, Visual C++, and others), and then compiled into a format that is common across all .NET languages, yet not runnable on the OS.

    Absolutely correct. With the possible exception of the functionality that "R" brings to the table, I don't understand the need for so many different languages to do the same thing.

    I think it is more of a "desire" than a "need", and it is the same reason why there are probably no fewer than 10 offerings for any type of software ever conceived of (10 on average; some types of software have more, some less). I mean, how many RDBMSs are there? How many NoSQL options? How many word processors? How many simple text editors, even? Why has Linux been ported to so many old processors? Sometimes people are not aware of another program existing so they create a new one; others are aware but think they can do better; sometimes people are just bored; and I am sure that there are other reasons. What it all comes down to is: humans are quirky, highly irrational creatures.

    Jeff Moden (4/21/2016)


    Solomon Rutzky (4/20/2016)


    •"tower of Babel" blah blah blah -- while an interesting topic on its own, the "Tower of Babel" discussion is entirely orthogonal to this particular thread, and is an unnecessary distraction that has only added to the confusion here.

    I have to continue to disagree. I continue to fight such problems at work where someone just didn't keep a lid on it and it got out of control. I'm not saying that such code should be forbidden, but it should be questioned, especially if the language being used for the envelope already has the same capability.

    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, there is no reason for hijacking the discussion here (which should be focused on how to import CSV data). As I said, it is a worthy topic to discuss, and would be a great topic for another thread, probably in the "Anything that is NOT about SQL!" forum. And if there was such a thread, then it would be entirely appropriate to mention the concern here and link to that separate thread.

    Yes, I do understand that any solution needs to be maintained and that the O.P. stated reasons for preferring T-SQL over .NET / C#, but the simple fact of the matter is that this particular problem is not solvable solely within the realm of T-SQL (well, unless one is willing to have a painfully slow solution that is a huge mess of T-SQL that is probably just as unmaintainable as .NET). So a) introducing one additional language is not "Tower of Babel" by any stretch, and b) the fact that you have to fight it so much where you work is understandably frustrating (really, it is) but that still doesn't mean that it warrants anything more than a mention here.


  • Solomon Rutzky (4/25/2016)

    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, there is no reason for hijacking the discussion here (which should be focused on how to import CSV data). As I said, it is a worthy topic to discuss, and would be a great topic for another thread, probably in the "Anything that is NOT about SQL!" forum. And if there was such a thread, then it would be entirely appropriate to mention the concern here and link to that separate thread.

    I think it'd be a great thread; it would be interesting to see to what extent folks can support multiple languages, what extent is too much, and the real-world experiences behind that. I even promise not to reply to Jeff in it!

  • Solomon Rutzky (4/25/2016)


    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, ...

    I understand that "dot net is dot net" but, very specifically, is he not using a C# envelope to call a VB function?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/26/2016)


    Solomon Rutzky (4/25/2016)


    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, ...

    I understand that "dot net is dot net" but, very specifically, is he not using a C# envelope to call a VB function?

    "using C# to call Visual Basic" -- There isn't any Visual Basic code in the example provided by Sachin; it is all C# code. The Microsoft.VisualBasic .NET Framework library does not contain VB code any more than the Microsoft.SqlServer library contains SQL Server. The names of the libraries refer to the subject matter of the functionality they provide. According to the MSDN page for Microsoft.VisualBasic.FileIO Namespace: "The Microsoft.VisualBasic.FileIO namespace contains types that support the My file system object in Visual Basic". Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.
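    The point above can be shown in a few lines: the TextFieldParser class from the Framework-supplied Microsoft.VisualBasic.dll consumed from ordinary C#, with no VB source anywhere. This is a minimal sketch, not Sachin's actual code; the sample CSV string is made up for illustration.

```csharp
// Minimal sketch: calling the Framework-supplied TextFieldParser from C#.
// Assumes a reference to Microsoft.VisualBasic.dll (part of the .NET
// Framework itself, not a Visual Studio component); no VB source is used.
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

class CsvDemo
{
    static void Main()
    {
        // Quoted delimiters and an embedded CRLF, the hard cases from this thread.
        string csv = "1,\"hello, world\",3\r\n4,\"line\r\nbreak\",6";

        using (var parser = new TextFieldParser(new StringReader(csv)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // quoted commas/CRLFs stay in-field

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                Console.WriteLine(string.Join(" | ", fields));
            }
        }
    }
}
```

    Everything here is C#; the only connection to VB is the namespace name of the (already-compiled, MSIL) library being referenced.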

  • patrickmcginnis59 10839 (4/26/2016)


    Jeff Moden (4/26/2016)


    Solomon Rutzky (4/25/2016)


    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, ...

    I understand that "dot net is dot net" but, very specifically, is he not using a C# envelope to call a VB function?

    "using C# to call Visual Basic" -- There isn't any Visual Basic code in the example provided by Sachin; it is all C# code. The Microsoft.VisualBasic .NET Framework library does not contain VB code any more than the Microsoft.SqlServer library contains SQL Server. The names of the libraries refer to the subject matter of the functionality they provide. According to the MSDN page for Microsoft.VisualBasic.FileIO Namespace: "The Microsoft.VisualBasic.FileIO namespace contains types that support the My file system object in Visual Basic". Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.

    Yep. Saw that before. But, to quote what you posted...

    Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.

    I've never said the usage is restricted to VB callers.

    --Jeff Moden



  • Jeff Moden (4/26/2016)


    Solomon Rutzky (4/25/2016)


    Disagree with what, exactly? You started this whole sub-topic of "Tower of Babel" because you mistakenly thought that Sachin was introducing both VB and C#. Since that isn't the case, ...

    I understand that "dot net is dot net" but, very specifically, is he not using a C# envelope to call a VB function?

    The question in the second part negates the statement in the first part. Meaning, no, you do not understand what is going on here if you are thinking that there is a "VB function" anywhere in any of the code presented thus far in this thread.

    Very specifically: no, he is not using a C# envelope to call a VB function because there is no VB function. You are confusing my using the term "functionality" (what is being done) with "function" (the code, as it was written, that is doing something). As I explained in the section that you quoted above and replied "Absolutely correct" to, when code written in any .NET language is compiled into a DLL / EXE, it then exists as MSIL / CL (Common Language); it is no longer VB or C# or whatever it originally was. The Assemblies that come with the .NET Framework are already compiled into MSIL / CL. So whatever language they were originally written in is irrelevant. And the Microsoft.VisualBasic DLL is one of these provided .NET Framework libraries / assemblies.

    But, if you are still not convinced that there is no VB code, then take a look at the published portion of the source code for this "VB function" on referencesource.microsoft.com:

    http://referencesource.microsoft.com/#Microsoft.VisualBasic/Microsoft/VisualBasic/FileIO/TextFieldParser.cs

    As you can see, it is written in C#, not VB.NET. So there: Sachin was using a C# "envelope" to call a C# "function" 😛


  • So the following quote was incorrect or correct?

    Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.

    Also and since I don't know for sure on the latest VS, is it possible to install VS without the VB component and, if so, would that function still be available to C# users?

    --Jeff Moden



  • Jeff Moden (4/28/2016)


    So the following quote was incorrect or correct?

    Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.

    That quote is correct. As stated in the sentence in my post just prior to the one you quoted above:

    According to the MSDN page for Microsoft.VisualBasic.FileIO Namespace: "The Microsoft.VisualBasic.FileIO namespace contains types that support the My file system object in Visual Basic".

    So again: this functionality exists because it was a feature of the VB language prior to .NET, so they made it available in .NET. But that doesn't mean it is exclusive to a particular language. You can call the Microsoft.VisualBasic methods/functions from F#, Visual C++, etc.

    Also and since I don't know for sure on the latest VS, is it possible to install VS without the VB component and, if so, would that function still be available to C# users?

    I haven't tried setting up such an environment, but it wouldn't matter if I did since Visual Studio has nothing to do with the functionality provided by the .NET Framework libraries. The DLL is: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Microsoft.VisualBasic.dll, and comes with the .NET Framework.

    You can even try this for yourself (and it's quite easy). I provided links several posts ago. Go to the link for "Installer" and download and install it. Then run the tests. I assume you don't have Visual Studio installed. Yet the code works, even with a reference to Microsoft.VisualBasic.
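    That claim is also easy to check programmatically: resolve the assembly from the GAC by its strong name and print where it lives. A sketch follows; the fully-qualified name shown is the .NET Framework 4.x one and is an assumption about the installed version.

```csharp
using System;
using System.Reflection;

class Probe
{
    static void Main()
    {
        // Resolve Microsoft.VisualBasic from the GAC; no Visual Studio
        // (or any VB "component" of it) needs to be installed for this
        // to succeed -- the DLL ships with the .NET Framework.
        Assembly vb = Assembly.Load(
            "Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, " +
            "PublicKeyToken=b03f5f7f11d50a3a");

        Console.WriteLine(vb.Location);            // path under the Framework / GAC
        Console.WriteLine(vb.ImageRuntimeVersion); // CLR version the MSIL targets
    }
}
```

    On a machine with only the .NET Framework (no Visual Studio at all), the load succeeds and prints a path under the GAC, which is the whole point: the library's availability is tied to the Framework, not to any VB tooling.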


  • Solomon Rutzky (4/29/2016)


    Jeff Moden (4/28/2016)


    So the following quote was incorrect or correct?

    Hence this library exists because Visual Basic includes this functionality, but its usage is not restricted to VB callers.

    That quote is correct. As stated in the sentence in my post just prior to the one you quoted above:

    According to the MSDN page for Microsoft.VisualBasic.FileIO Namespace: "The Microsoft.VisualBasic.FileIO namespace contains types that support the My file system object in Visual Basic".

    So again: this functionality exists because it was a feature of the VB language prior to .NET, so they made it available in .NET. But that doesn't mean it is exclusive to a particular language. You can call the Microsoft.VisualBasic methods/functions from F#, Visual C++, etc.

    Also and since I don't know for sure on the latest VS, is it possible to install VS without the VB component and, if so, would that function still be available to C# users?

    I haven't tried setting up such an environment, but it wouldn't matter if I did since Visual Studio has nothing to do with the functionality provided by the .NET Framework libraries. The DLL is: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Microsoft.VisualBasic.dll, and comes with the .NET Framework.

    You can even try this for yourself (and it's quite easy). I provided links several posts ago. Go to the link for "Installer" and download and install it. Then run the tests. I assume you don't have Visual Studio installed. Yet the code works, even with a reference to Microsoft.VisualBasic.

    You know, when I was writing the response shown above, I realized that the current question -- if TextFieldParser would be available to C# if VB stuff wasn't installed as part of Visual Studio -- was off topic since the original concern was about the code being in VB. But I figured I would just answer the question to provide that info. BUT, I had completely forgotten the origin of this subtopic and hence didn't realize how completely off-topic it is, even from the moment it was first mentioned. Let's review the original concern (emphasis added):

    Jeff Moden (4/3/2016)


    Sachin Nandanwar (4/3/2016)


    Jeff Moden (4/3/2016)


    Also, instead of using C# to call Visual Basic, stop the tower of Babel and just make the CLR using Visual Basic.

    I don't think it should matter because that is what .Net interoperability is all about. All managed and unmanaged code can coexist within the same assembly.

    It's not the .Net interoperability that I'm concerned with, in such cases. That usually works just fine. Although the code is not likely to need a change, the code does rely on someone not only knowing about T-SQL and SQLCLR using C# but now they also need to know about VB to be able to maintain it properly. This is what Ben and I have been taking some good exception to in this thread.

    The concern was about maintainability due to needing to know VB. BUT, TextFieldParser is proprietary code supplied by Microsoft as part of the .NET Framework. So it doesn't matter if it is written in C#, VB, TurboPascal, JCL, etc since there is nothing to maintain. I used it for the example code I posted and I don't know VB. And even if I knew VB and wanted to make a change, there is nothing to change. So nobody ever needed to know VB in order to use this feature, even if it is written in VB, and even if the code in the DLL is still VB (which it isn't). This was all a completely absurd and unnecessary off-topic diversion and waste of everyone's time.


  • Nah... the question isn't off topic although I'll certainly agree that it's a sub-topic.

    If a shop forbids VB, it seems to me that they wouldn't allow it to be installed anywhere if it could be excluded from VS. If what you said about the function being available only because of VB being installed is true, then that function wouldn't be available.

    --Jeff Moden


