May 11, 2020 at 10:57 pm
Since I'm stuck at home with way too much time on my hands, I was looking at a really horrid database and wondering if there were an easier way to query it into proper structure. It's a design I inherited and ended up writing VBA to fix in Access. And the super handy thing in VBA is that I can return essentially an array of columns if I use a TableDef object. Here' s a horrid table definition and a few records:
CREATE TABLE [dbo].[ToxBad](
[PatientID] [int] NOT NULL,
[Cycle] [tinyint] NOT NULL,
[ALOPECIA] [tinyint] NULL,
[Causality1] [tinyint] NULL,
[Relatedness1] [tinyint] NULL,
[ANEMIA] [tinyint] NULL,
[Causality2] [tinyint] NULL,
[Relatedness2] [tinyint] NULL,
[APOXIA] [tinyint] NULL,
[Causality3] [tinyint] NULL,
[Relatedness3] [tinyint] NULL,
[ANOREXIA] [tinyint] NULL,
[Causality4] [tinyint] NULL,
[Relatedness4] [tinyint] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[ToxBad] ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) VALUES (10001, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, NULL, NULL, NULL)
GO
INSERT [dbo].[ToxBad] ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) VALUES (10002, 1, NULL, NULL, NULL, 2, 2, 2, NULL, NULL, NULL, NULL, NULL, NULL)
GO
INSERT [dbo].[ToxBad] ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) VALUES (10002, 3, NULL, NULL, NULL, 2, 2, 2, NULL, NULL, NULL, NULL, NULL, NULL)
GO
INSERT [dbo].[ToxBad] ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) VALUES (100001, 1, 1, 1, 1, 2, 1, 2, NULL, NULL, NULL, 3, 2, 3)
GO
INSERT [dbo].[ToxBad] ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) VALUES (100002, 2, NULL, NULL, NULL, 2, 1, 2, 3, 1, 2, 3, 2, 3)
GO
I managed to sort of brute force the normalization like this, but I was wondering if there were a way to identify the (Symptom, Grade, Causality, Relatedness) using maybe dynamic SQL (because the tables I received could have any number of symptom "groups". If the columns were static, I could use CROSS APPLY to unpivot, like this:
SELECT PatientID
, Cycle
, ToxFixed.Toxicity
, ToxFixed.Grade
, ToxFixed.Relatedness
, ToxFixed.Causality
FROM dbo.ToxBad
CROSS APPLY (VALUES ('Alopecia', ALOPECIA, Causality1, Relatedness1),
('Anemia', ANEMIA, Causality2, Relatedness2),
('Apoxia', APOXIA, Causality3, Relatedness3),
('Anorexia',ANOREXIA, Causality4, Relatedness4)) ToxFixed(Toxicity, Grade, Causality, Relatedness)
WHERE ToxFixed.Grade IS NOT NULL;
is there a fairly painless way to write some dynamic SQL to get the groups? I tried, and I'm not getting terribly far. (Don't hurt yourself laughing.)
/* REPLACE the static VALUES list with the result a query of the sys.columns table */
-- Get the ID of each column that's in uppercase
SELECT columnName
--, Toxicity = LEFT(columnName,1) + LOWER(RIGHT(columnName,LEN(columnName-1)))
, Causality
, Relatedness
, TestConcat = columnName + ',' + QUOTENAME(columnName,'''') + ',' + QUOTENAME(Causality,'''') + QUOTENAME(Relatedness,'''')
FROM
(SELECT xj.colName AS colNameGrp
, xj.column_id
, xj.num
, xj.MatchColumnID
, tb.columnName
, Causality = LEAD(tb.columnName,1) OVER (PARTITION BY xj.colName ORDER BY MatchColumnID)
, Relatedness = LEAD(tb.columnName,2) OVER (PARTITION BY xj.colName ORDER BY MatchColumnID)
FROM
(
SELECT x.colName
, x.column_id
, n.num
, MatchColumnID = x.column_id + n.num
FROM
(SELECT ac.name AS colName
, ac.column_id
FROM sys.all_objects ao
INNER JOIN sys.all_columns ac
On ao.object_id = ac.object_id
where ao.name = 'ToxBad'
and ac.name COLLATE Latin1_General_CS_AS = UPPER(ac.name) ) x
CROSS JOIN
(VALUES (0),(1),(2)) n(num)
) xj
INNER JOIN (SELECT ac.name AS columnName
, ac.column_id
FROM sys.all_objects ao
INNER JOIN sys.all_columns ac
On ao.object_id = ac.object_id
where ao.name = 'ToxBad') tb
ON xj.MatchColumnID = tb.column_id
) colz
WHERE colz.Relatedness IS NOT NULL;
If I just write something like that, I get column names etc in different records, but the grouping is completely screwy. I understand all that about "column order shouldn't matter", etc. I solved this in VBA (on the original problem, which was in Access) by looping over columns and processing the groups one at a time by building dynamic SQL and inserting each chunk of data into a new table, but I can't get this one to work. Is there a way to fix this easily, or is this just a hot mess?
thanks!
Pieter
May 12, 2020 at 3:31 am
Oh wait... I'm so bright somedays, it hurts my eyes. Use a cursor to loop over the column list. After the first 2 columns, count off (1 => column name = Symptom, value = Grade, 2 = Causality, 3 = Relatedness ).
May 13, 2020 at 12:22 pm
Hi pietlinden, I saw this yesterday and was going to reply but then got distracted. Is the ToxBad table in the structure it was received or did you normalize what you receive into that table? The dynamic SQL could be made to work but maybe there's a different starting point?
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
May 13, 2020 at 3:10 pm
That's how I received them (and by the dozen) - that's why I was wondering how to normalize them in T-SQL . It was a totally screwed up situation where I wasn't allowed to change the table structure even when it really needed to be done. (I did the original fixing in Access, and it was infinitely easier for me for some reason.)
This is the solution I came up with... but I'm sure there's an easier way:
CREATE PROC UnpivotedMess
AS
/*
Sorry, this is a bit of a game. I was trying to figure out
how to 'unpivot' sets of repeating groups in T-SQL.
*/
DECLARE @TableName VARCHAR(100) = 'ToxBad';
DECLARE @SkipRows INT = 2;
DECLARE @Sql VARCHAR(5000);
DECLARE @ColumnListing VARCHAR(5000);
DECLARE @sqlFinal VARCHAR(5000);
/*
Answer contains this (I was gonna use it with CROSS APPLY):
('Alopecia', ALOPECIA, Causality1, Relatedness1), ('Anemia', ANEMIA, Causality2, Relatedness2), ('Apoxia', APOXIA, Causality3, Relatedness3), ('Anorexia', ANOREXIA, Causality4, Relatedness4)
*/
SELECT @ColumnListing = STRING_AGG(CONVERT(varchar(max),y.Final),', ')
FROM (
SELECT
1 AS OtherGrp
-- , FixedSymptom = UPPER(LEFT(x.Symptom,1)) + LOWER(RIGHT(x.Symptom,LEN(x.Symptom)-1))
-- , ColumnName
-- , Causality
-- , Relatedness
, Final = '(' + '''' + UPPER(LEFT(x.Symptom,1)) + LOWER(RIGHT(x.Symptom,LEN(x.Symptom)-1)) + '''' + ', ' + ColumnName + ', ' + Causality + ', ' + Relatedness + ')'
FROM
(
SELECT OtherGrp
, Symptom = LOWER(MAX(CASE WHEN GrpNo = 0 THEN columnName ELSE null END))
, ColumnName = MAX(CASE WHEN GrpNo = 0 THEN columnName ELSE null END)
, Causality = MAX(CASE WHEN GrpNo = 1 THEN columnName ELSE null END)
, Relatedness = MAX(CASE WHEN GrpNo = 2 THEN columnName ELSE null END)
FROM
(SELECT ac.column_id
, ac.name AS columnName
, GrpNo = ac.column_id % 3
, OtherGrp = FLOOR(ac.column_id/3.0)
FROM sys.all_objects ao
INNER JOIN sys.all_columns ac
ON ao.object_id = ac.object_id
WHERE ao.name = @TableName
AND ac.column_id > @SkipRows) x
GROUP BY OtherGrp
) x
) y;
SET @sqlFinal =
'SELECT tb.PatientID
, tb.cycle
, ca.Symptom
, ca.Grade
, ca.Causality
, ca.Relatedness
FROM dbo.ToxBad tb
CROSS APPLY ( VALUES ' + @ColumnListing
+ ') ca (Symptom, Grade, Causality, Relatedness)'
+ ' WHERE ca.Grade IS NOT NULL;';
EXEC (@sqlFinal);
It's really ugly, I'm sure, but it's my first dynamic SQL to actually work! (Yay me!)
May 13, 2020 at 4:36 pm
So....what in the world are you doing where you need to know how toxic Alopecia is?
-------------------------------------------------------------------------------------------------------------------------------------
Please follow Best Practices For Posting On Forums to receive quicker and higher quality responses
May 13, 2020 at 4:48 pm
They're grading side effects. It was cancer drug testing. It was one of those jobs was okay until things slowed down enough that I could stop and think about the data they were collecting and what it meant. It was creepy.
If you want really cheery, find a copy of the database that the CDC or whoever puts out of the Common Toxicity Classifications for cancer... (it's an Access database with a modified extension).
May 13, 2020 at 8:47 pm
This is really good. The syntax is a little inconsistent and the use of sub-queries instead of cte's is not really considered current. Also, no need for the floor function when dealing with integers. Also, instead of EXEC() it's a good idea to get in the habit of using sp_executesql to run dynamic sql. sp_executesql can be parameterized and it can re-use query plans.
drop table if exists dbo.test_ToxBad
go
create table [dbo].test_ToxBad(
[PatientID] [int] NOT NULL,
[Cycle] [tinyint] NOT NULL,
[ALOPECIA] [tinyint] NULL,
[Causality1] [tinyint] NULL,
[Relatedness1] [tinyint] NULL,
[ANEMIA] [tinyint] NULL,
[Causality2] [tinyint] NULL,
[Relatedness2] [tinyint] NULL,
[APOXIA] [tinyint] NULL,
[Causality3] [tinyint] NULL,
[Relatedness3] [tinyint] NULL,
[ANOREXIA] [tinyint] NULL,
[Causality4] [tinyint] NULL,
[Relatedness4] [tinyint] NULL
) ON [PRIMARY]
go
insert [dbo].test_ToxBad ([PatientID], [Cycle], [ALOPECIA], [Causality1], [Relatedness1], [ANEMIA], [Causality2], [Relatedness2], [APOXIA], [Causality3], [Relatedness3], [ANOREXIA], [Causality4], [Relatedness4]) values
(10001, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, NULL, NULL, NULL),
(10002, 1, NULL, NULL, NULL, 2, 2, 2, NULL, NULL, NULL, NULL, NULL, NULL),
(10002, 3, NULL, NULL, NULL, 2, 2, 2, NULL, NULL, NULL, NULL, NULL, NULL),
(100001, 1, 1, 1, 1, 2, 1, 2, NULL, NULL, NULL, 3, 2, 3),
(100002, 2, NULL, NULL, NULL, 2, 1, 2, 3, 1, 2, 3, 2, 3);
go
drop proc if exists test_UnpivotedMess;
go
create proc test_UnpivotedMess
AS
set nocount on;
set ansi_warnings off;
declare
@TableName nvarchar(100)=N'test_ToxBad',
@SkipRows int=2,
@ColumnListing nvarchar(max),
@sqlFinal nvarchar(max);
with
max_cte(OtherGroup, Symptom, ColumnName, Causality, Relatedness) as (
select
ac.column_id/3,
lower(max(iif(ac.column_id%3=0, ac.[name], null))),
max(iif(ac.column_id%3=0, ac.[name], null)),
max(iif(ac.column_id%3=1, ac.[name], null)),
max(iif(ac.column_id%3=2, ac.[name], null))
from
sys.all_objects ao
join
sys.all_columns ac ON ao.object_id = ac.object_id
where
ao.[name] = @TableName
AND ac.column_id > @SkipRows
group by
ac.column_id/3)
select
@ColumnListing=string_agg(concat('(', '''', upper(left(Symptom,1)), lower(right(Symptom, len(Symptom)-1)), '''', ', ', ColumnName, ', ', Causality, ', ', Relatedness, ')'),', ')
from max_cte;
select @sqlFinal =
N'SELECT tb.PatientID
, tb.cycle
, ca.Symptom
, ca.Grade
, ca.Causality
, ca.Relatedness
FROM dbo.test_ToxBad tb
CROSS APPLY ( VALUES ' + @ColumnListing
+ ') ca (Symptom, Grade, Causality, Relatedness)'
+ ' WHERE ca.Grade IS NOT NULL;';
exec sp_executesql @sqlFinal;
go
exec test_UnpivotedMess;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
May 13, 2020 at 11:10 pm
Oh cool, super helpful!! Like I said, I have a lot to learn still! But I'll work my way through it.
Thanks for the feedback!
May 14, 2020 at 5:38 am
Just my dip of the toe into the problem. I have this thing for making my dynamic SQL look the same as static SQL. I also used a little trick to combine MAX type of CROSSTAB with STRING_AGG() and I also tend to dry out formulas just as an old habit that I'll never shake.
This uses the test data that scdecade was kind enough to put together. I didn't use variables for a lot of the stuff he did either. I should really get into that habit, but not tonight. 😀
--===== Local Variables
DECLARE @SQL NVARCHAR(MAX)
;
--===== Create the dynamic SQL to unpivot/normalize the data.
WITH
cteGroup AS
(--==== Prep for unpivoting the column names.
-- This also DRYs out some of the formulas that follow and
-- we don''t need to subtract anything from column_id because
-- we got lucky with the column_id of the row we start with.
SELECT GroupNum = (column_id)/3
,ColNum = (column_id)%3
,ColName = name
FROM sys.Columns
WHERE column_id >= 3
AND object_id = OBJECT_ID(N'dbo.ToxBad') --No join to sys.objects required.
)
,cteValues AS
(--==== This puts together the lines for the VALUES operator.
-- It''s a funny but really short way of combining a CROSSTAB with a STRING_AGG.
SELECT Vals = N'('+STRING_AGG(
+IIF(ColNum = 0, QUOTENAME(ColName,N'''')+N',',N'') --Symptom
+IIF(ColNum = 0, ColName,N'') --Grade
+IIF(ColNum = 1, ColName,N'') --Causality
+IIF(ColNum = 2, ColName,N'') --Relatedness
,N',')+N')'+NCHAR(10)
FROM cteGroup
GROUP BY GroupNum
)--==== Put the query together as Dynamic SQL.
SELECT @SQL = N'
SELECT tb.PatientID
,tb.Cycle
,Symptom = STUFF(LOWER(ca.Symptom),1,1,LEFT(UPPER(ca.Symptom),1))
,ca.Grade
,ca.Causality
,ca.Relatedness
FROM dbo.ToxBad tb
CROSS APPLY
(VALUES
' + SPACE(8)+STRING_AGG(Vals,SPACE(8)+N',')
+ SPACE(8)+N') ca (Symptom,Grade,Causality,Relatedness)
WHERE ca.Grade IS NOT NULL
ORDER BY PatientID,Cycle,Symptom
;'
FROM cteValues
;
--===== Display the query and execute it
PRINT @SQL;
EXEC sp_ExecuteSQL @SQL
;
If you aren't interested in DRY code or comments, the code gets a lot shorter. I just wouldn't want to look at it 6 months from now. 😀
I'll also state how much I LOVE STRING_AGG() compared to earlier brute force methods.
DECLARE @SQL NVARCHAR(MAX)
;
WITH cteValues AS
(
SELECT Vals = N'('+STRING_AGG(
+IIF(column_id%3 = 0, QUOTENAME(name,N'''')+N',',N'')
+IIF(column_id%3 = 0, name,N'')
+IIF(column_id%3 = 1, name,N'')
+IIF(column_id%3 = 2, name,N'')
,N',')+N')'+NCHAR(10)
FROM sys.columns
WHERE column_id >= 3
AND object_id = OBJECT_ID(N'dbo.ToxBad')
GROUP BY column_id/3
)
SELECT @SQL = N'
SELECT tb.PatientID
,tb.Cycle
,Symptom = STUFF(LOWER(ca.Symptom),1,1,LEFT(UPPER(ca.Symptom),1))
,ca.Grade
,ca.Causality
,ca.Relatedness
FROM dbo.ToxBad tb
CROSS APPLY
(VALUES
' + SPACE(8)+STRING_AGG(Vals,SPACE(8)+N',')
+ SPACE(8)+N') ca (Symptom,Grade,Causality,Relatedness)
WHERE ca.Grade IS NOT NULL
ORDER BY PatientID,Cycle,Symptom
;'
FROM cteValues
;
EXEC sp_ExecuteSQL @SQL
;
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2020 at 12:52 pm
Jeff, you have 2 solutions here, which is the preferred? I like the 2nd one. And why is there no INITCAP() function in SQL Server? That's waaaay overdue.
Regarding labeling query columns with "group =" syntax, is there a functional reason for that style of notation? Maybe I have C# on the brain but a single "=" implies assignment and not necessarily equivalence. "Select @var=" is an assignment. "Select var=" is just a labeling of query columns. Unless there's a reason 🙂 Maybe this is just style?
The reason this solution makes me happy is because it uses sys tables to solve a problem. I would like to do more of this in my own projects.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
May 14, 2020 at 2:43 pm
To answer the question, I like the first one better because the first CTE DRYs out some of the formulas. It doesn't make much difference here but, when you have much larger formulas, it means the world when it comes to readability. It can also mean the world if aggregation is involved... if you can pre-aggregate answers AND use a blocking operator in that same CTE to kind of materialize the aggregation (almost as good as a TempTable that does the same thing), you can get some seriously massive performance improvements.
The first one is also "documented" where the second is devoid of any such documentation.
As for me using column names on the left side of an equals sign instead of with "AS" on the right, I prefer it on the left with the "=" sign. To explain, I agree that form of code is an "assignment" in most languages and it is in SQL Server, as well. In this case, I'm assigned a value to an assigned column alias for an expression instead of a variable. I used to be a front-end programmer and a "batch run" programmer. When I started working with SQL, it bugged me to have all the column aliases after "AS" on the right. The alias names were really hard to pick out in the clutter especially when really long formulas were used not to mention that I was already used to having them on the left as variables in assignment statements. I already wrote what I call "River Code" (vertically aligned related sections... a form of "indenting") and so it all came very natural to me to have them on the left.
That's also when I actually had the epiphany that made me understand what "Set-Based" programming is when you boil it down and I used to say it to myself when my old managed code habits crept in. It's actually in my signature below. "Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
That and the "River Code" has really helped a lot since "vertical selection and editing" came to pass and also made me have a real appreciation for leading commas. If I'm working with something that has a lot of column names or aliases and I need to copy them, the copy is super easy because they're all left aligned with the commas. It also makes the code wicked easy to read.
The other thing is based on my revelation about under standing "Set-Based" code better. I work with a whole lot of front-enders and, just like it did for me, it helps them more easily understand that instead of using variables for each row, your still using the same thought process but you're assigning values by column while SQL Server keeps track of the rows. I've had many developers tell me that it made things a whole lot easier to understand even after they'd been writing SQL for years.
Last but not least, since all of the column names (aliased or otherwise) are all left aligned at the beginning of each line in the SELECT list, it makes it a lot quicker to find column names during "emergency changes" when 10,000 managers all try to get into your cube at once to see what you're doing about the problem. 😀
p.s. And the "River Code" I write looks like it might be a real PITA to get used to. I can't speak to that because I've been doing it so long I just do it auto-magically. I sometimes have to go back to fix an alignment but that also forces me to look at the code I've already written and I've found many a mistake that way. It's all a part of the "Make it work, make it fast, make it pretty... and it ain't done 'til it's pretty" habit of mine.
As a bit of a side bar, I will usually write some comments as to WHY I need to write a section of code and then write the code. It not only forces me to document the code but it helps a whole lot because I get interrupted a whole lot. I almost never have to say "Ok, where the hell was I?" after returning to what I was doing after being interrupted.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2020 at 4:34 pm
Incorporating Jeff's improvements and refactoring again this is what I came up. Since this is the 2019 thread I created an INITCAPS scalar function which should execute inline, although it's not tested because I'm only on compatibility level 140. Inline scalar execution is on 'by default' in 2019 (compatibility 150). The code inside the function is quite borrowed 🙂 I also made it so the proc takes the table name as an input parameter.
drop function if exists dbo.test_initcaps;
go
create function dbo.test_initcaps(@string nvarchar(4000))
returns nvarchar(4000) as
begin
return stuff(lower(@string),1,1,left(upper(@string),1));
end
go
drop proc if exists test_UnpivotedMess;
go
create proc test_UnpivotedMess(
@TableName nvarchar(256))
AS
set nocount on;
declare
@SkipRows int=3,
@sql nvarchar(max);
with
max_cte(Vals) as (
select
concat(N'(', string_agg(concat(iif(column_id%@SkipRows = 0, quotename(name,N'''')+N',',N''),
iif(column_id%@SkipRows = 0, name,N''),
iif(column_id%@SkipRows = 1, name,N''),
iif(column_id%@SkipRows = 2, name,N'')), N','), N')')
from sys.columns
where column_id >= @SkipRows
and object_id = object_id(@TableName)
group by column_id/@SkipRows)
select
@sql = N'SELECT tb.PatientID
,tb.Cycle
,dbo.test_initcaps(ca.Symptom)
,ca.Grade
,ca.Causality
,ca.Relatedness
FROM ' + @TableName + N' tb
CROSS APPLY
(VALUES' + string_agg(Vals, N',') + N') ca (Symptom, Grade, Causality, Relatedness)
WHERE ca.Grade IS NOT NULL
ORDER BY PatientID,Cycle,Symptom;'
from
max_cte;
exec sp_executesql @sql;
go
exec test_UnpivotedMess N'test_ToxBad';
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
May 14, 2020 at 7:08 pm
Does the other "Initial Caps" function that you're talking about only hit the first letter of the entire string or is it more like "Title Caps"?
And this is what I love about the community here on SSC... innovative discussion without some moderator bashing someone for going past the original question.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2020 at 7:24 pm
It's title caps. In Oracle since 8i
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
May 14, 2020 at 7:29 pm
Interesting. Does it correctly handle names like MacDonald, McDonald, van Halen, O'Leary, and things like "This is a Book"?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply