May 31, 2009 at 10:33 am
As a side bar, I really really hate these new code windows.... they haven't worked correctly for months now. They keep saying they're going to fix them, but they've made almost no progress. If you look at the code I just posted, the wrong stuff is RED and it continues to obliterate blank lines.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2009 at 3:30 pm
We've had this debate before, and the delimiting function's performance can practically always be improved on by a hand-written piece of SQL. But the delimiting function's flexibility and simplicity make it easily adaptable to many more uses, and that is why I suggested it for this thread.
For example, suppose the requirement changes to get the last monetary value: "SOLD FOR $10,000 on 6/1/2009". The delimiting function still makes this straightforward, but imagine an inline expression to do the same.
select ...
from SomeTable ST cross apply global.dbo.fListToVarchars( reverse(ST.text), ' ' ) as X
where ISNUMERIC( reverse(X.item) ) = 1
Suppose multiple values are enclosed ("2 SOLD FOR $10,000 and $11,000"). Suppose the values are related ("SOUTH: $10,000; NORTH: $15,000; WEST: $16,000 and $15,000"). All these scenarios can be handled by the delimiting function and the SQL would remain very straightforward.
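For anyone who doesn't have fListToVarchars handy: the real function isn't posted in this thread, so the following is only a guess at its shape, a simple loop-based splitter that returns one row per delimited item:
-- Sketch only: the actual fListToVarchars isn't shown in this thread;
-- this is just one common shape for such a splitter.
CREATE FUNCTION dbo.fListToVarchars (@List VARCHAR(8000), @Delim CHAR(1))
RETURNS @Items TABLE (ItemNumber INT IDENTITY(1,1), Item VARCHAR(8000))
AS
BEGIN
    -- Walk the string, cutting out everything between delimiters.
    DECLARE @Pos INT, @Next INT
    SET @Pos = 1
    WHILE @Pos <= LEN(@List) + 1
    BEGIN
        SET @Next = CHARINDEX(@Delim, @List + @Delim, @Pos)
        INSERT INTO @Items (Item) VALUES (SUBSTRING(@List, @Pos, @Next - @Pos))
        SET @Pos = @Next + 1
    END
    RETURN
END
Splitting each string into rows like this keeps the calling SQL simple, at the cost of a loop per value.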
May 31, 2009 at 5:16 pm
Hi guys,
Everyone's nailed it for the example I provided, though I have realised that there may be legitimate Trades with the exact same values (yes, there is no Primary Key or unique identifier, and it's a really big American company providing this data, if you really want to send some of those high-velocity pork chops), so Ramesh's example works perfectly for me. (It may be the way I'm approaching it, but if I use any of the @Variable setups mentioned, e.g. –
declare @Source table (Cashnotes varchar(100), [Random other field] varchar(100))
insert into @source
select 'FFX Buy USD/Sell AUD, Broker 703 MATURITY 11/05/09 SOLD AUD 1,557,526.65' as Cashnotes, 'Rand1' as 'Random other field'
union all
select 'FFX Buy USD/Sell AUD, Broker 703 MATURITY 11/05/09 SOLD AUD 1,557,526.65' as Cashnotes, 'Rand1' as 'Random other field'
union all
select 'FFX Buy USD/Sell AUD, Broker 703 MATURITY 11/05/09 SOLD AUD 18,669,911.96' as Cashnotes, 'Rand5' as 'Random other field'
declare @abc varchar(100)
set @abc = (
SELECT cashnotes
FROM @Source
)
Select [random other field],reverse(Substring(Reverse(@ABC),0,Patindex('% %', reverse(@ABC))))
from @Source
– it'd fail because the subquery returns more than one result; in practice I'd be looking at about a hundred results, with say 10 of them being duplicates. Let me know if I'm just setting this up incorrectly.)
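For reference, the same REVERSE/PATINDEX expression can be applied per row with no scalar variable at all. A rough sketch against the @Source table variable above (it assumes every Cashnotes value contains at least one space):
-- Sketch: apply the expression to each row's Cashnotes rather than a single variable
SELECT [Random other field],
       REVERSE(SUBSTRING(REVERSE(Cashnotes), 1, PATINDEX('% %', REVERSE(Cashnotes)) - 1)) AS LastValue
FROM @Source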
Arun's correct that Siva's second example required rounded numbers, and on Jeff's advice (the table is currently at 290,314 rows and increases daily) I've avoided Antonio's example for this problem in favour of Ramesh's, though it's not an approach I've seen before, so I might grab a coffee and see if I can use that code for any future needs I encounter.
Thanks guys,
Nathan
May 31, 2009 at 5:28 pm
P.S.: if you've got hundreds of thousands of entries and the table is growing regularly, then an explicit column to store the derived value should be added to the table, and that value should be populated when the row is added. If you're positive that the technique used to derive it is rock solid, use the derivation as a computed column. 🙂
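A minimal sketch of what such a computed column might look like; the table and column names here are hypothetical, and the expression is the same REVERSE/PATINDEX approach used earlier in the thread:
-- Sketch only: hypothetical table and column names.
ALTER TABLE dbo.Trades
ADD LastCashValue AS
    CASE WHEN Cashnotes LIKE '% %'
         THEN REVERSE(SUBSTRING(REVERSE(Cashnotes), 1, PATINDEX('% %', REVERSE(Cashnotes)) - 1))
    END
-- Add PERSISTED (and possibly an index) if you want it materialised rather than computed on read.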
May 31, 2009 at 5:40 pm
antonio.collins (5/31/2009)
We've had this debate before, and the delimiting function's performance can practically always be improved on by a hand-written piece of SQL. But the delimiting function's flexibility and simplicity make it easily adaptable to many more uses, and that is why I suggested it for this thread.
For example, suppose the requirement changes to get the last monetary value: "SOLD FOR $10,000 on 6/1/2009". The delimiting function still makes this straightforward, but imagine an inline expression to do the same.
select ...
from SomeTable ST cross apply global.dbo.fListToVarchars( reverse(ST.text), ' ' ) as X
where ISNUMERIC( reverse(X.item) ) = 1
Suppose multiple values are enclosed ("2 SOLD FOR $10,000 and $11,000"). Suppose the values are related ("SOUTH: $10,000; NORTH: $15,000; WEST: $16,000 and $15,000"). All these scenarios can be handled by the delimiting function and the SQL would remain very straightforward.
Yes, we have had this conversation before. I don't know about you, but I'm not willing to use a canned function that takes 22 seconds to process only 10,000 rows when a direct bit of SQL can do it in milliseconds; I don't care how simple it supposedly makes the code.
And, if you take a look at your function call and compare it to the simple method that Ramesh used, I think you're also losing out on the simplicity and straightforwardness you speak of.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2009 at 7:06 pm
antonio.collins (5/31/2009)
P.S.: if you've got hundreds of thousands of entries and the table is growing regularly, then an explicit column to store the derived value should be added to the table, and that value should be populated when the row is added. If you're positive that the technique used to derive it is rock solid, use the derivation as a computed column. 🙂
I didn't think to mention it, but only 38,458 of those rows will be of, well, I'll call it this "Transaction Type" (1,000 fewer if I remove deleted records), so this case probably wouldn't be the best example for another column, though I agree that if every record were like these I'd have it in the SSIS package that loads this information in.
Having said that, I can't add Columns, or even create Indexes(!) on the Production Server.
May 31, 2009 at 11:22 pm
NathanB (5/31/2009)
antonio.collins (5/31/2009)
P.S.: if you've got hundreds of thousands of entries and the table is growing regularly, then an explicit column to store the derived value should be added to the table, and that value should be populated when the row is added. If you're positive that the technique used to derive it is rock solid, use the derivation as a computed column. 🙂
I didn't think to mention it, but only 38,458 of those rows will be of, well, I'll call it this "Transaction Type" (1,000 fewer if I remove deleted records), so this case probably wouldn't be the best example for another column, though I agree that if every record were like these I'd have it in the SSIS package that loads this information in.
Having said that, I can't add Columns, or even create Indexes(!) on the Production Server.
Heh... don't get me started on the great disservice to mankind Microsoft did when they came up with things like DTS and SSIS. 😀
So far as not being able to add things to the production server, you're wrong. All you have to do is write a script to do it and submit it to the SDBA's... they'll take care of it for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 31, 2009 at 11:52 pm
Correct, I could write it and pass it on to Data Warehousing, but I honestly believe I may cause a heart attack by suggesting that tables in this particular Database should be amended, considering the number of processes that run off it!
For my own development, what would you suggest I use for importing CSV/XLS files into SQL Server 2005? I can't say I have any great love for SSIS packages either; it's simply what I was taught.
June 1, 2009 at 12:01 am
NathanB (5/31/2009)
Correct, I could write it and pass it on to Data Warehousing, but I honestly believe I may cause a heart attack by suggesting that tables in this particular Database should be amended, considering the number of processes that run off it!
For my own development, what would you suggest I use for importing CSV/XLS files into SQL Server 2005? I can't say I have any great love for SSIS packages either; it's simply what I was taught.
Bulk Insert. On a slow server, it'll load and do some data integrity checks while offloading "bad records" to a "rework file" on a 5.1 million row, 20 column wide CSV in less than 60 seconds. Throw in a little xp_CmdShell wizardry on your ETL system and you can pretty much blow DTS and SSIS away for development times and usually beat SSIS for performance even though it uses Bulk Insert behind the scenes.
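To illustrate the error-handling side (file paths, layout, and the staging table name are made up; treat this as a sketch, not a drop-in): a single BULK INSERT that loads a CSV and diverts rejected rows to an error file looks roughly like this:
-- Sketch only: hypothetical staging table and file paths.
BULK INSERT dbo.StagingTrades
FROM 'C:\Import\Trades.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,                          -- skip the header row
    ERRORFILE       = 'C:\Import\Trades.err',     -- the "rework file" for bad rows
    MAXERRORS       = 1000,
    TABLOCK                                       -- allows a minimally logged load
)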
--Jeff Moden
Change is inevitable... Change for the better is not.
June 1, 2009 at 2:43 am
Jeff Moden (6/1/2009)
Bulk Insert. On a slow server, it'll load and do some data integrity checks while offloading "bad records" to a "rework file" on a 5.1 million row, 20 column wide CSV in less than 60 seconds. Throw in a little xp_CmdShell wizardry on your ETL system and you can pretty much blow DTS and SSIS away for development times and usually beat SSIS for performance even though it uses Bulk Insert behind the scenes.
Sorry Jeff, I have to suggest otherwise. Though bulk insert is faster than SSIS, it has its own drawbacks/issues.
1. Works only with delimited files (does not support XLS files)
2. Requires "Bulk Administrators" or "ADMINISTER BULK OPERATIONS" permissions
3. Requires execute permissions on the xp_cmdshell procedure if multiple files need to be imported. Most DBAs are reluctant to grant permissions on this object, as it raises security concerns.
4. Requires manual error handling and redirecting invalid rows requires bcp/sqlcmd utility support through xp_cmdshell.
--Ramesh
June 1, 2009 at 11:41 am
Ramesh (6/1/2009)
Jeff Moden (6/1/2009)
Bulk Insert. On a slow server, it'll load and do some data integrity checks while offloading "bad records" to a "rework file" on a 5.1 million row, 20 column wide CSV in less than 60 seconds. Throw in a little xp_CmdShell wizardry on your ETL system and you can pretty much blow DTS and SSIS away for development times and usually beat SSIS for performance even though it uses Bulk Insert behind the scenes.
Sorry Jeff, I have to suggest otherwise. Though bulk insert is faster than SSIS, it has its own drawbacks/issues.
1. Works only with delimited files (does not support XLS files)
THAT would be a problem in itself. 😛
2. Requires "Bulk Administrators" or "ADMINISTER BULK OPERATIONS" permissions
Ummm... what kind of privs do you need for SSIS? And what's wrong with Bulk Admin privs for a system job on an ETL system (which shouldn't usually be public facing)?
3. Requires execute permissions on the xp_cmdshell procedure if multiple files need to be imported. Most DBAs are reluctant to grant permissions on this object, as it raises security concerns.
I agree about the xp_CmdShell thing, but see the note on item 2 above. So far as importing multiple files goes, Bulk Insert does not require the use of xp_CmdShell at all. The only time you need it is if you want to move processed files to an archive area. Again, think "system level job". Just having xp_CmdShell activated (especially on non-public-facing ETL systems) is not a risk in and of itself.
4. Requires manual error handling and redirecting invalid rows requires bcp/sqlcmd utility support through xp_cmdshell.
Not true in 2k5. Bulk Insert now has those capabilities and no longer requires xp_CmdShell, because BCP is no longer required to write to error logs or to sequester bad rows in an errata file.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 1, 2009 at 6:31 pm
Ah, xp_CmdShell. I wasn't even SysAdmin on our Dev environment when I last heard about this and first tried to get OpenRowset working for Excel, but they're things I can check out now (though as of now - I started replying yesterday - I keep getting issues with providing a null/blank server when looking at this around my other work). For example:
EXECUTE SP_CONFIGURE 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
GO
EXECUTE SP_CONFIGURE 'Ad Hoc Distributed Queries', '1'
RECONFIGURE WITH OVERRIDE
GO
SELECT *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\Source\Addresses.xls',
'SELECT * FROM [Sheet1$]')
1. Works only with delimited files (does not support XLS files)
I've probably got a few situations where I can work with CSVs, but I'll definitely have a few cases where I'm using XLS (though a few of those XLS files are XML in disguise and have to be manually opened and re-saved as XLS anyway!)
Ummm... what kind of privs do you need for SSIS? And what's wrong with Bulk Admin privs for a system job on an ETL system (which shouldn't usually be public facing)?
With SSIS, I believe you only need straightforward DataWriter permissions for inserts. I believe that if I write something, Data Warehousing would run it with a System Account (the same one I'd set up the SSIS package with, as I have the password) with db_owner over that particular Database, but without BulkAdmin, which I think would be needed for this. These things are a bit more challenging with limited permissions!
June 2, 2009 at 5:03 am
Jeff Moden (6/1/2009)
3. Requires execute permissions on the xp_cmdshell procedure if multiple files need to be imported. Most DBAs are reluctant to grant permissions on this object, as it raises security concerns.
I agree about the xp_CmdShell thing, but see the note on item 2 above. So far as importing multiple files goes, Bulk Insert does not require the use of xp_CmdShell at all. The only time you need it is if you want to move processed files to an archive area. Again, think "system level job". Just having xp_CmdShell activated (especially on non-public-facing ETL systems) is not a risk in and of itself.
Actually, I meant to say that xp_cmdshell is required to loop through files and to archive them. Think about it, Jeff: how many DBAs dare to allow xp_cmdshell on their production (or even test/UAT) servers?
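To make the point concrete, the kind of loop I mean looks roughly like this (the folder paths and staging table name are made up, and it only works if xp_cmdshell is enabled):
-- Sketch only: list the files with xp_cmdshell, BULK INSERT each one, then archive it.
CREATE TABLE #FileList (FileName VARCHAR(256))

INSERT INTO #FileList (FileName)
EXEC master.dbo.xp_cmdshell 'dir /b C:\Import\*.csv'

DECLARE @File VARCHAR(256), @Sql NVARCHAR(4000), @Cmd VARCHAR(512)

DECLARE FileCur CURSOR LOCAL FAST_FORWARD FOR
    SELECT FileName FROM #FileList WHERE FileName LIKE '%.csv'

OPEN FileCur
FETCH NEXT FROM FileCur INTO @File

WHILE @@FETCH_STATUS = 0
BEGIN
    -- load the current file
    SET @Sql = N'BULK INSERT dbo.StagingTrades FROM ''C:\Import\' + @File
             + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2)'
    EXEC (@Sql)

    -- move the processed file to the archive area
    SET @Cmd = 'move C:\Import\' + @File + ' C:\Archive\'
    EXEC master.dbo.xp_cmdshell @Cmd

    FETCH NEXT FROM FileCur INTO @File
END

CLOSE FileCur
DEALLOCATE FileCur
DROP TABLE #FileList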
Jeff Moden (6/1/2009)
4. Requires manual error handling and redirecting invalid rows requires bcp/sqlcmd utility support through xp_cmdshell.
Not true in 2k5. Bulk Insert now has those capabilities and no longer requires xp_CmdShell, because BCP is no longer required to write to error logs or to sequester bad rows in an errata file.
I am not sure ;-) how one can use BULK INSERT to redirect rows to an error (flat) file. I would be interested to see a practical example of it.
I think it's about time for BULK INSERT to make way for SSIS :-)
--Ramesh