May 17, 2012 at 4:02 am
I have a query that should return the fields of a table along with the field properties like type, nullability, default, etc. (and puts them into a string so that it can be used in a 'CREATE TABLE' statement but that's subsidiary). Here it is:
DECLARE @sql NVARCHAR(MAX) = ''
, @sema SYSNAME = 'dbo', @tabla SYSNAME = 'arbitrary_table'
SELECT
TOP 1000000000
+ CASE
WHEN c.column_id = 1 THEN ''
ELSE ','
END
+ c.[name] + ''
+ CASE
WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN UPPER( st.[name] ) + '(' + CAST( c.max_length AS VARCHAR( 64 ) ) + ')'
WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN UPPER( st.[name] ) + '(' + CAST( c.[precision] AS VARCHAR(64)) + ', ' + CAST(c.scale AS VARCHAR(64)) + ')'
ELSE UPPER( st.[name] )
END
+ CASE
WHEN c.is_identity = 1 THEN ' IDENTITY(1,1)'
ELSE ''
END
+ CASE c.is_nullable
WHEN 0 THEN ' NOT NULL'
WHEN 1 THEN ' NULL'
ELSE 'WTH???'
END
+ CASE WHEN dc.[name] IS NOT NULL THEN ' CONSTRAINT ' + dc.[name] + ' DEFAULT ' + dc.[definition] + ' ' ELSE ' -- no default ' END
+ CASE
WHEN RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1 THEN ''
ELSE CHAR( 10 )
END
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON t.[schema_id] = s.[schema_id]
INNER JOIN
sys.[columns] c
ONc.[object_id] = t.[object_id]
INNER JOIN
sys.systypes st
ONst.xtype = c.system_type_id
LEFT OUTER JOIN sys.default_constraints AS dc
ON t.[object_id] = dc.parent_object_id
AND c.column_id = dc.parent_column_id
WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4
ORDER BY c.column_id ASC
PRINT @sql
Though it might not cover all the special cases, it works fine. Until I comment the 'TOP 1000000000' line after which it returns only the last field. I'm obviously missing something important here but I can't figure what it is. Any ideas?
EDIT:
@michael-2 vessey: Actually, I removed an unnecessary line and the tail of it stayed there because of the word wrap. I've fixed that, thanks for the warning.
May 17, 2012 at 4:19 am
there seems to be some syntax issues in your code
Msg 156, Level 15, State 1, Line 28
Incorrect syntax near the keyword 'DESC'.
i think it's a cut and paste issue
MVDBA
May 17, 2012 at 4:36 am
Fixed, thanks for the warning.
May 17, 2012 at 4:40 am
if you take out the assertion and just return a recordset it works fine - i can't see why the top clause if affecting you when concatenating into a string
initially i was looking to see if there was some sort of NULL issue, but i tested it woth concat_null_yields_null OFF
🙁
MVDBA
May 17, 2012 at 5:30 am
Can you tell me what is use of statment
RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1
May 17, 2012 at 5:39 am
i also tested it though a cursor and the cursor worked fine...
interestinigly - if you remove the order by clause then the query also works - very frustrating
i've been comparing the query plans , and the only difference (other than sort/Sort top N) is the version with the TOP clause in has an extra Compute scalar that seems to be typecasting data.. I suspected this is some sort of typecasting, but i explicitly cast everything as nvarchar(max) and it still get truncated
i then tried this using varchar(8000) and you still get the same problem - although i can't replicate it with other queries
i just noticed that if you remove the char(10) it also begins to work - i replaced it with '' and it works, but if i put 'a' instead of char(10) it fails again
finally came to the conclusion - the RANK OVER function is causing an issue in some way
if i change the PARTITION BY to c.columnid it works (albeit the wrong data), but c.object_id does not
is definatly in that section of code , but it all looks fine to me - maybe raise a case with MS
how did you come to the step of using TOP 1000000?
MVDBA
May 17, 2012 at 5:40 am
srikant maurya (5/17/2012)
Can you tell me what is use of statmentRANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1
Sure. This query is a part of a snippet that creates a script that reproduces an arbitrary table with all of its side-objects (constraints, indices, etc.) The reason for the statement you ask is merely cosmetics (it checks whether there should be a linefeed or not). Without it the result would be something like this:
CREATE TABLE arbitrary_table_name
.
.
.
,lastcolumnSMALLINT NOT NULL CONSTRAINT DF_arbitrary_table_name__lastcolumn DEFAULT ((25))
-- removing that check would result in this unwanted blank line here
)
GO
The statement is meaningless in the example here.
May 17, 2012 at 5:50 am
The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.
There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.
May 17, 2012 at 5:52 am
michael vessey (5/17/2012)
how did you come to the step of using TOP 1000000?
Yes, I compared the exec plan as well, but I did not find anything meaningful. Tried a few things you did (except cursors) to no avail.
'TOP 1000000' was part of my anger management technique 🙂 I began with 'TOP 1' and then increased x in 'TOP x' to see which column is the suspect for this villainy. When x went over the max number of columns I removed it and stared dumbly at the one-line result once again. So I chose a random large number that was well over the column count to see if it makes any difference. It didn't 🙁
May 17, 2012 at 6:05 am
HowardW (5/17/2012)
The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.
I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).
May 17, 2012 at 7:05 am
gemisigo (5/17/2012)
HowardW (5/17/2012)
The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.
I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).
Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.
May 17, 2012 at 8:01 am
HowardW (5/17/2012)
Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.
I haven't said it was a bug. And yes, it's undocumented and though it was rather "reliable" in the last don't even know how many cases, I know it was stated unreliable by some (by the way, has anyone proved that yet?), therefore I heed your warning and handle it with care. If you compare execution plans they are pretty much the same (yes, Segments as well) all the way up to Sequence Project node. Beyond that there are differences, of course (TOP has TopN Sort and there is a "surplus" Compute Scalar, etc.). Also, both plans yield the same number of records.
The only other difference is that the TOP version retains the contents of the variable while the TOPless keeps only the bottom 🙂 I can live with the fact it is unreliable here and it does not work the way it's supposed. My question is, why does TOP alter that behavior?
p.s. I don't think it's unreasonable to expect that the internal query processor (or anything else) works in a way that makes my task easier, the query faster and the returned dataset consistent. I believe that's the reason why these things were created in the first place. I accept, though, that personal opinions about how to do this or that might differ. But then again, it's just my personal opinion. Cheers 😉
May 17, 2012 at 8:13 am
interestingly enough i tried another workarount using a CTE.
it works fine every time - saves you having to use a temp table
🙂
as for whether it's a bug or not, well why not let microsoft know ??? it's an interesting and replicable behaviour that may have other far reaching implications we don't know of!!!
CTE CODE BELOW
GO
PRINT 'next'
DECLARE @sql VARCHAR(8000) = ''
, @sema SYSNAME = 'dbo', @tabla SYSNAME = 'defer';
with x(data,id) as (
SELECT
CASE
WHEN c.column_id = 1 THEN convert(varchar(max),'')
ELSE convert(varchar(max),',')
END
+ convert(varchar(max),c.[name]) + convert(varchar(max),'' )
+ CASE
WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN convert(varchar(max),UPPER( st.[name] )) + convert(varchar(max),'(') + CAST( c.max_length AS VARCHAR( max ) ) + convert(varchar(max),')')
WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN convert(varchar(max),UPPER( st.[name] ) )+ convert(varchar(max),'(') + CAST( c.[precision] AS VARCHAR(max)) +convert(varchar(max), ', ') + CAST(c.scale AS VARCHAR(max)) + convert(varchar(max),')')
ELSE convert(varchar(max),UPPER( st.[name] ))
END
+ CASE
WHEN c.is_identity = 1 THEN convert(varchar(max),' IDENTITY(1,1)')
ELSE convert(varchar(max),'')
END
+ CASE c.is_nullable
WHEN 0 THEN convert(varchar(max),' NOT NULL')
WHEN 1 THEN convert(varchar(max),' NULL')
ELSE convert(varchar(max),'WTH???')
END
+ CASE WHEN dc.[name] IS NOT NULL THEN convert(varchar(max),' CONSTRAINT ') +convert(varchar(max), dc.[name]) + convert(varchar(max),' DEFAULT ') + convert(varchar(max),dc.[definition]) + convert(varchar(max),' ') ELSE convert(varchar(max),' -- no default ') END
+ CASE
RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC ) when 1 THEN ''
else CHAR(10)
END as data, RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC )as id
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON t.[schema_id] = s.[schema_id]
INNER JOIN
sys.[columns] c
ONc.[object_id] = t.[object_id]
INNER JOIN
sys.systypes st
ONst.xtype = c.system_type_id
LEFT OUTER JOIN sys.default_constraints AS dc
ON t.[object_id] = dc.parent_object_id
AND c.column_id = dc.parent_column_id
WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4
)
select @sql=@sql+DATA from x ORDER BY x.id ASC;
PRINT @sql
GO
MVDBA
Viewing 15 posts - 1 through 15 (of 39 total)
You must be logged in to reply to this topic. Login to reply