I'm scraping data that sometimes has several footnotes per record. If so, the csv file which is my source material has additional columns named Footnote, Footnote2, Footnote3 etc. The number of such Footnote columns varies depending on the particular records scraped. Sometimes there is only 1, sometimes there are 3 or 4.
After ingesting the CSV into a table in my database, I'm trying to write a select script that will concatenate all of these footnotes into a single column.
The script with my attempt follows. As you can see it attempts to check for existence of the columns I am concatenating. If footnote2 doesn't exist, then the first option should be returned. However what is happening is that SQL appears to be checking for existence of all of the columns I've mentioned in subsequent 'WHEN' clauses before execution and it returns an 'Invalid column name' error if for example footnote2 does not exist.
Is there a way I can re-write this to accomplish what I am trying to do?
SELECT 'Footnotes' = CASE
WHEN
-- There is only 1 footnote column
NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
THEN h.Footnote
WHEN
-- There are 2 footnote columns
NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote3' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
THEN 'Footnotes for all transactions in this filing: ' + ISNULL(h.Footnote,'') + ISNULL(h.Footnote2,'')
WHEN
-- There are 3 footnote columns
NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote4' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
AND EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote3' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData')) -- Test for existence of Footnote number
THEN 'Footnotes for all transactions in this filing: ' + ISNULL(h.Footnote,'') + ISNULL(h.Footnote2,'')+ ISNULL(h.Footnote3,'')
ELSE NULL
END
FROM TempparsedHeaderCSVData h
Edit: adding sample data:
Here's sample data. Since as I mentioned the source table sometimes has more than 1 footnote column, I'm creating 2 tables here; one for a situation where there is just 1 footnote column, a second for a situation where there are 2 footnote columns. As I mentioned I'm trying to create a script that will take data from both footnote columns if both exists (like in the second table here) and concatenate it into a single column. Obviously if only 1 exists like the first table, I just want to take the text from that 1 footnote column.
-- Create table. This is an example where there is only 1 footnote column in the source table (that comes in from an external source)
CREATE TABLE TempparsedHeaderCSVData
(Footnote nvarchar(1000) NULL)
-- Insert test data
INSERT INTO TempparsedHeaderCSVData
(Footnote)
VALUES
('This is footnote 1')
, ('Helllloooooo')
, ('Size 9')
, ('Are these my legs?')
Second table:
-- Create table. This is an example where there are 2 footnote columns in the source table (that comes in from an external source)
CREATE TABLE TempparsedHeaderCSVData
(Footnote nvarchar(1000) NULL, Footnote1 nvarchar(1000) NULL)
-- Insert test data
INSERT INTO TempparsedHeaderCSVData
(Footnote, Footnote1)
VALUES
('This is a footnote', 'This also is a footnote' )
, ('Helllloooooo', 'Goodbye')
, ('Size 9', 'But what was the question?')
, ('Are these my legs?', 'Size 9')
April 17, 2020 at 2:12 pm
"I'm scraping data ..."
What do you mean by this?
Your query is checking for the existence of columns in tables, and has nothing to do with data.
Is TempparsedHeaderCSVData a table? Maybe you need to be querying that, rather than sys.columns?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
April 17, 2020 at 2:19 pm
What about if you break that out into IF statements instead?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
April 17, 2020 at 2:20 pm
You could build the query using dynamic SQL, that way you wouldn't be referring to columns that don't exist in your query.
Also, (this isn't going to fix your problem) instead of
NOT EXISTS(SELECT 1 FROM sys.columns WHERE Name = N'Footnote2' AND Object_ID = Object_ID(N'dbo.TempparsedHeaderCSVData'))
You could have
COLUMNPROPERTY(OBJECT_ID('dbo.TempparsedHeaderCSVData'),'Footnote2','ColumnId') IS NULL
which is a bit shorter to write.
April 17, 2020 at 3:08 pm
"I'm scraping data ..."
What do you mean by this?
Your query is checking for the existence of columns in tables, and has nothing to do with data.
The scraping I referred to isn't shown in this script Phil. I said that by way of background.
Is TempparsedHeaderCSVData a table? Maybe you need to be querying that, rather than sys.columns?
Yes TempparsedHeaderCSVData is a table. The FROM clause at the end queries TempparsedHeaderCSVData. My understanding is that querying sys.columns as I've attempted to do in the WHEN clauses allows to check for existence of columns in the table TempparsedHeaderCSVData (when done in isolation rather than as I've done here)
April 17, 2020 at 3:13 pm
You could build the query using dynamic SQL, that way you wouldn't be referring to columns that don't exist in your query
Thank you - I think this gets to the heart of the issue if doing it that way wouldn't result in an invalid column result. I'll test.
April 17, 2020 at 4:03 pm
I am not 100% sure if IF statements would handle that better or not. Been a while since I tried it. Just tried it and it does not work. So don't go with that idea.
Dynamic SQL should work though, but you will want to be VERY careful with that. It would be possible for someone to do SQL injection attack doing that. I've done similar things with dynamic SQL to get database names and build up my backup scripts (gets database names and their recovery model). Not quite the same thing you are going for, but a similar idea.
If you go with the dynamic SQL route, I would test it with some ' characters in the footer on a test system just to make sure you don't get someone being dumb and injecting SQL into that.
Sorry about the bad suggestion of IF statements; that suggestion will not work.
A different suggestion (that I tested) is to have multiple stored procedures for this. Use an IF statement to determine which stored procedure to run, then call the proper stored procedure depending on how many "footer"s there are.
I personally try to avoid dynamic SQL that can have user-input text in it. If I don't know or trust the input data, I don't put it in dynamic SQL.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
April 17, 2020 at 4:07 pm
select
concat(iif((columnproperty(object_id('dbo.TempparsedHeaderCSVData'),'Footnote','ColumnId') IS NOT NULL) and
(columnproperty(object_id('dbo.TempparsedHeaderCSVData'),'Footnote2','ColumnId') IS NULL),
'', 'Footnotes for all transactions in this filing: '),
isnull(h.Footnote,''), isnull(h.Footnote2,''), isnull(h.Footnote3,''))
from
TempparsedHeaderCSVData h;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 17, 2020 at 4:19 pm
scdecade - will that not fail still if columns don't exist (ie footnote3 may not exist)? I just did a test on that and, like my solution, SQL sees "footnote3" in the query and if it doesn't exist in the table it will throw an error.
I think the only good solutions are going to be dynamic SQL OR stored procedures.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
April 17, 2020 at 4:22 pm
Well I have no sample data so that was a stab in the dark. Makes sense though! Maybe more iif's
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 17, 2020 at 8:33 pm
Well I have no sample data so that was a stab in the dark. Makes sense though! Maybe more iif's
Sorry about that. Here's sample data. Since as I mentioned the source table sometimes has more than 1 footnote column, I'm creating 2 tables here; one for a situation where there is just 1 footnote column, a second for a situation where there are 2 footnote columns. As I mentioned I'm trying to create a script that will take data from both footnote columns if both exists (like in the second table here) and concatenate it into a single column. Obviously if only 1 exists like the first table, I just want to take the text from that 1 footnote column.
-- Create table. This is an example where there is only 1 footnote column in the source table (that comes in from an external source)
CREATE TABLE TempparsedHeaderCSVData
(Footnote nvarchar(1000) NULL)
-- Insert test data
INSERT INTO TempparsedHeaderCSVData
(Footnote)
VALUES
('This is footnote 1')
,('Helllloooooo')
,('Size 9')
,('Are these my legs?')
Second table:
-- Create table. This is an example where there are 2 footnote columns in the source table (that comes in from an external source)
CREATE TABLE TempparsedHeaderCSVData
(Footnote nvarchar(1000) NULL, Footnote1 nvarchar(1000) NULL)
-- Insert test data
INSERT INTO TempparsedHeaderCSVData
(Footnote, Footnote1)
VALUES
('This is a footnote', 'This also is a footnote' )
,('Helllloooooo', 'Goodbye')
,('Size 9', 'But what was the question?')
,('Are these my legs?', 'Size 9')
April 17, 2020 at 8:42 pm
Dynamic SQL should work though, but you will want to be VERY careful with that. It would be possible for someone to do SQL injection attack doing that. I've done similar things with dynamic SQL to get database names and build up my backup scripts (gets database names and their recovery model). Not quite the same thing you are going for, but a similar idea.
If you go with the dynamic SQL route, I would test it with some ' characters in the footer on a test system just to make sure you don't get someone being dumb and injecting SQL into that.
Very good point, thank you. I'm reasonably sure the data is solid because it's coming from a reputable source, that being said can't be too careful so I like the idea of your multiple stored procedures solution
A different suggestion (that I tested) is to have multiple stored procedures for this. Use an IF statement to determine which stored procedure to run, then call the proper stored procedure depending on how many "footer"s there are.
I personally try to avoid dynamic SQL that can have user-input text in it. If I don't know or trust the input data, I don't put it in dynamic SQL.
So to make sure I understand:
April 17, 2020 at 8:44 pm
As a thought, rather than handling this in SQL, could you handle this in your import or export tool? Instead of having multiple "footer" fields, could you concat them before putting them in the database?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
April 17, 2020 at 8:50 pm
As a thought, rather than handling this in SQL, could you handle this in your import or export tool? Instead of having multiple "footer" fields, could you concat them before putting them in the database?
At the moment I'm using BULK INSERT in T-SQL to import the CSV file that comes in from the external source, but I guess I could build something in visual basic that would concatenate...not sure how I'd do that 'dynamically' though to handle situations where the CSV sometimes has multiple columns like footnote, footnote1, footnote2 etc and sometimes only has a single footnote column.
I guess the ideal way to do it would be with Python or something, not really my area though.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply