January 3, 2014 at 11:56 am
jimgawn (1/3/2014)
Thank you for your gracious response! With respect to formatted Date tables, I think a lot of data warehousing folks would beg to differ. People like Ralph Kimball advocate using a date table to (a) simplify query-writing and (b) reduce computational overhead when running queries and generating reports.
That would certainly be one of the exceptions to the rule. Thanks again for the feedback and welcome aboard!
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2014 at 8:19 am
Thank you Jeff for a good "filler"; there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far too easy to miss the fact that float and real are approximate-number data types and, in fact, not reliable past the 15th significant digit, as this little factorial sample shows:
/* Calculate Factorial */
/* Requires SQL Server 2012 or later (OFFSET-FETCH) */
DECLARE @TOP BIGINT = 22;
;WITH NUMBERS AS
(
SELECT
CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS FLOAT(53)) AS N
FROM
(VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS A(N)
,(VALUES (1), (1), (1), (1), (1), (1), (1), (1)) AS B(N)
ORDER BY 1 OFFSET 0 ROWS FETCH FIRST (@TOP) ROWS ONLY
)
,N_EXP AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY N2.N ORDER BY (SELECT NULL)) AS N_RID
,N2.N AS N
,EXP(SUM(LOG(N1.N)) OVER (PARTITION BY N2.N)) AS N_EXP
,CAST(EXP(CAST((SUM(CAST(LOG(N1.N) AS DECIMAL(38,16)))
OVER (PARTITION BY N2.N)) AS DECIMAL(38,16)))
AS DECIMAL(38,16)) AS N_DCEXP
FROM NUMBERS N1
CROSS JOIN NUMBERS N2
WHERE N1.N <= N2.N
)
SELECT
NX.N AS N
,NX.N_EXP AS NX_EXP
,CAST(NX.N_EXP AS DECIMAL(38,16)) AS NX_EXPD
,NX.N_DCEXP AS NX_DDEXP
,STR(NX.N_EXP,38,16) AS NX_STR
FROM N_EXP NX
WHERE NX.N_RID = 1;
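As a point of reference for the results (an added aside, not part of the original post): 22! is exactly 1,124,000,727,777,607,680,000, a 22-digit value, which sits well past FLOAT's roughly 15 reliable significant digits, so the trailing digits of the FLOAT-based columns drift away from the exact value.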
January 4, 2014 at 9:12 am
Eirikur Eiriksson (1/4/2014)
Thank you Jeff for a good "filler"; there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far too easy to miss the fact that float and real are approximate-number data types and, in fact, not reliable past the 15th significant digit, as this little factorial sample shows:
I have a question about your code above: why the semicolon (;) before the WITH when the preceding statement is already terminated by a semicolon?
January 4, 2014 at 9:52 am
Hi Lynn, it's an old habit, I guess, rather than a belt-and-suspenders kind of thing.
January 4, 2014 at 10:00 am
Okay, I was just curious, as it is a pet peeve of mine. The semicolon is a statement terminator, not a statement "begininator".
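To illustrate Lynn's point (a minimal sketch of my own, not from the thread; the variable and CTE names are throwaways): the semicolon people prepend to WITH is really the terminator the previous statement was missing.
DECLARE @i INT = 1;          -- the terminator belongs at the end of this statement...
WITH cte AS (SELECT @i AS n) -- ...so there is no need to write ";WITH"
SELECT n FROM cte;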
January 4, 2014 at 10:53 am
Eirikur Eiriksson (1/4/2014)
Thank you Jeff for a good "filler"; there are many cracks "float"ing around when it comes to math and numbers on the platform. It is also far too easy to miss the fact that float and real are approximate-number data types and, in fact, not reliable past the 15th significant digit, as this little factorial sample shows:
Thanks for the feedback, Eirikur.
Just to be clear, and to prevent other bombastic posts from appearing on this thread, both FLOAT and REAL are entirely reliable at what they were designed to do. The times when they appear to be "unreliable" are when you have full-precision decimal expectations rather than the expectations of what FLOAT and REAL were designed for. The other thing that folks need to understand is that the floating-point math behind the scenes of FLOAT and REAL is not base 10... it's base 2, or binary. That's why certain decimal values can't be represented exactly within the precision of FLOAT and REAL.
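A quick sketch of that base-2 point (my own illustration, with throwaway variable names): one tenth has no exact binary representation, so adding FLOAT 0.1 ten times doesn't land exactly on 1, while DECIMAL arithmetic does.
DECLARE @f FLOAT = 0, @d DECIMAL(10,1) = 0, @i INT = 0;
WHILE @i < 10
BEGIN
    SELECT @f = @f + 1.0E-1, -- FLOAT: a binary approximation of 0.1
           @d = @d + 0.1,    -- DECIMAL: exact base-10 arithmetic
           @i = @i + 1;
END;
SELECT @f - 1.0E0 AS FloatError,  -- tiny nonzero residue
       @d - 1.0   AS DecimalError; -- exactly 0.0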
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2014 at 11:04 am
Lynn Pettis (1/4/2014)
Okay, I was just curious as it is a pet peeve of mine. The semicolon is a statement terminator not a statement begininator.
BWAAA-HAAAA!!!! The use of semicolons at all used to be a pet peeve of mine and still kinda is. I love it when some programming language comes back at me and basically states "Error... you're missing a semicolon at line #nnn". Great... if the language is that bloody smart, why does it need a semicolon to begin with? 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2014 at 1:33 pm
I think you nailed it here; that's just what I was trying to say. 😉
January 4, 2014 at 8:47 pm
Lynn Pettis (1/4/2014)
Okay, I was just curious as it is a pet peeve of mine. The semicolon is a statement terminator not a statement begininator.
I confess to being one of Lynn's converts. 😉
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
January 5, 2014 at 10:58 am
One slip, Jeff, and I am surprised both that you made it and that no one else has mentioned it yet (in my quick review of the thread): you used an NVARCHAR(10) to store the output. That introduces a CONVERT_IMPLICIT into the Compute Scalar operator that is unnecessary and which skews the CPU and duration of the runs.
Second, I tried this, expecting that removing the implicit conversion to float would make the STR() run much more efficient. I was mistaken and quite surprised that it did not make it much faster!
CREATE TABLE #t2 (a FLOAT);
GO
-- #MyHead is the test table from the article's setup
INSERT #t2
SELECT CAST(SomeNumber AS FLOAT) -- pre-cast to FLOAT so STR() gets its native type (no CONVERT_IMPLICIT)
FROM #MyHead;

DECLARE @Bitbucket VARCHAR(10); -- VARCHAR, not NVARCHAR, so no implicit conversion on the output side
SELECT @Bitbucket = STR(a,10) FROM #t2;
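For anyone who wants to verify whether the conversion is actually gone (my own quick check, not part of Kevin's test), one option is to search the cached plans for CONVERT_IMPLICIT:
SELECT qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%CONVERT_IMPLICIT%';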
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 5, 2014 at 5:29 pm
Jeff Moden (1/3/2014)
...Ok... that tears it. :blush: I have to get off my dead hinny and install 2K12. Thanks to folks like you, Wayne Sheffield, and a couple of others, I'm not falling behind in knowledge of the version, but I really need to start playing with it because of things like the test you did above.
I knew some good would come of that!
Jeff Moden (1/3/2014)
I do have a suggestion though. As I pointed out in the article at http://www.sqlservercentral.com/articles/T-SQL/91724/, SET STATISTICS can have some really undesirable impact on what gets reported for performance. Would you try the same test using SQL Profiler and see if you get similar results?
Thanks for that and thanks for the feedback.
Indeed, I am happy to oblige. I didn't expect much difference given the magnitude of the original results I obtained.
SQL Profiler results (1 run):
Test                                     CPU    Reads  Writes  Duration
Cast, Concatenate, and Size              343    1612   0       369
The STR() Method is SLOWER               858    1612   0       888
The FORMAT() Method is WAY, WAY SLOWER   35319  1612   0       44126
To be fair to Microsoft and FORMAT, the function does a lot more than just what is being done here. It does follow the pattern that I espouse, namely that a generalized tool will usually perform more slowly than one that is specifically tailored to the task at hand. That doesn't mean you shouldn't use them, just that once your query is working you need to make sure you're satisfied with the performance, and accept that it might take great pains to make it only a little faster (clearly that is not the case here).
I am curious about one thing. I read the article you linked to when it came out, and it seemed like the specific case of biased timing results for SET STATISTICS was limited to the testing of UDFs. At least that is what I (for some reason) assumed when I read it.
I believe you've suggested before that it may not apply only to that limited case. Do you have some specific evidence of that, Jeff? It is not like you to make generalizations unless you've got some evidence to back them up.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
January 6, 2014 at 1:18 am
First, thanks for the test. On the subject of defending MS for the slowness of FORMAT based on the fact that it's a generic function: I'm sure it was designed to operate at the machine-language level, and the fact that it's 119 times slower than some T-SQL "hack" code is absolutely deplorable. They should turn in their "man" cards as punishment. 😉
On the SET STATISTICS thing, I haven't looked for evidence that it affects anything other than Scalar and Multi-Statement UDFs (not to be confused with iTVFs), but it has made me very, very cautious in its use. It would have been a joy if that were the reason why FORMAT ran so slowly, instead of FORMAT falling into the category of seriously performance-challenged code.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2014 at 4:27 am
Sorry if I'm just being dense by asking this; I've never used STR() either, so this is new to me. The article says many people use STR() for its ability to right-justify figures and that, besides the rounding issues (which I understand and am now aware of), there is a performance penalty when using it. You then run a timed test to compare it to a more complex but quicker approach. However, when I run a modified version of the code (stripping out the timing code and the assignment to a variable; I'll take your word for it that it's quicker), the faster approach doesn't right-justify the data like STR() does.
Here is what I'm running (limited to 10 rows here):
--===== Conditionally drop and rebuild a test table in a nice safe place
-- that everyone has. This only takes several seconds.
IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
DROP TABLE #MyHead
;
SELECT TOP 10
SomeNumber = IDENTITY(INT,1,1)
INTO #MyHead
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
SELECT RIGHT('0000000000' + CAST(SomeNumber AS VARCHAR(10)),10) -- left-pads with zeros
FROM #MyHead
;
SELECT STR(SomeNumber,10) -- left-pads with spaces
FROM #MyHead
and here is the output
(10 row(s) affected)
----------
0000000001
0000000002
0000000003
0000000004
0000000005
0000000006
0000000007
0000000008
0000000009
0000000010
(10 row(s) affected)
----------
1
2
3
4
5
6
7
8
9
10
(10 row(s) affected)
Is there something I'm not understanding, or am I doing something incorrectly? I would expect that any alternative would output the same results as STR(). Any insight would be appreciated.
Thanks
Paul
January 7, 2014 at 7:01 am
Sure. There are actually three implied reasons. The first is that zeros are simply easier to see, and to verify the presence of, in an article than a blob of spaces is. The second is to show a bit of flexibility in using different characters, the more common of which is "0". The third is that "a lot of people" have to produce output for fixed-length files where the numeric fields need to be left-padded with zeros, something that the already slower STR() doesn't do without extra code. Those weren't specifically mentioned in the article because I didn't want to distract folks from its main purposes. As with any article, there is the potential for casual learning. Hopefully, even beginners will immediately understand that you can simply replace the zeros with spaces to make the output like that of the STR() function, as sketched below.
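To make that last point concrete (a small sketch using the same #MyHead test table from the code above): swap the string of zeros for SPACE(10) and you get the space-padded, right-justified output that STR() produces.
SELECT RIGHT(SPACE(10) + CAST(SomeNumber AS VARCHAR(10)),10) -- left-pads with spaces, like STR()
FROM #MyHead;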
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2014 at 7:13 am
Thanks Jeff, that makes sense. If someone asked me to right-justify some text, I think the quicker solution would be closer to what I would have attempted anyway, as I'm more used to using those sorts of string manipulations, but at least now I know for sure that it's the better implementation.
Thanks
Paul