February 11, 2019 at 7:04 pm
Microsoft SQL Server 2016 (SP1-GDR) (KB4019089) - 13.0.4206.0 (X64)
I am performing a very simple table insert and seeing the dreaded
Msg 8152, Level 16, State 2, Line 669
String or binary data would be truncated.
The statement has been terminated.
I check all my lengths and everything is fine. This is a super simple insert. Only 33 rows! I can see everything, I even do a LEN on every column. Data types match as well.I get the idea to run this through a while loop and, amazingly, it works! Just for kicks I delete my inserts I made with the while loop and try again. Nope! It simply will not insert with the set based approach. What gives?
This works:
select 1 --just to set initial count so it runs reliably
while @@ROWCOUNT > 0
BEGIN
INSERT INTO BIOGRAPH_MASTER
(ID_NUM, GENDER, BIRTH_DTE
, user_name, job_name, job_time)
SELECT n.ID_NUM, t.Gender, t.Birth_date
,'TE_ADMIN' as USER_NAME, 'recruitment_data_insert_v99' as JOB_NAME, GETDATE() as JOB_TIME
from NAME_MASTER n join temp_ad_act_20190208 t
on n.EMAIL_ADDRESS = t.EMAIL
left join BIOGRAPH_MASTER b on b.ID_NUM = n.ID_NUM
where b.ID_NUM is null
and t.SEQ_NUM =
(
select MIN(b.SEQ_NUM)
from NAME_MASTER a join temp_ad_act_20190208 b
on a.EMAIL_ADDRESS = b.EMAIL
left join BIOGRAPH_MASTER c on c.ID_NUM = a.ID_NUM
where c.ID_NUM is null
)
END
This does NOT work!:
INSERT INTO BIOGRAPH_MASTER
(ID_NUM, GENDER, BIRTH_DTE
, user_name, job_name, job_time)
SELECT n.ID_NUM, t.Gender, t.Birth_date
,'TE_ADMIN' as USER_NAME, 'recruitment_data_insert_v99' as JOB_NAME, GETDATE() as JOB_TIME
from NAME_MASTER n join temp_ad_act_20190208 t
on n.EMAIL_ADDRESS = t.EMAIL
left join BIOGRAPH_MASTER b on b.ID_NUM = n.ID_NUM
where b.ID_NUM is null
I've even run it row by row, without the while loop, for 33 times, and every row inserts just fine. It's driving me nuts! Why won't the good ol' set based insert work? Where am I going wrong here?
February 11, 2019 at 7:41 pm
Could you post one or two "row-by-row" inserts that worked as well as the table definition of BIOGRAPH_MASTER table? There's not enough information here to figure out what's wrong.
February 12, 2019 at 7:16 am
Please post the table definitions for NAME_MASTER, temp_ad_act_20190208 and BIOGRAPH_MASTER. Or at least post the definitions of the columns used.
This has probably nothing to do with your problem, but I recommend that you use NOT EXISTS instead of the LEFT JOIN.
INSERT INTO BIOGRAPH_MASTER
(
ID_NUM,
GENDER,
BIRTH_DTE,
user_name,
job_name,
job_time
)
SELECT n.ID_NUM,
t.Gender,
t.Birth_date,
'TE_ADMIN' as USER_NAME,
'recruitment_data_insert_v99' as JOB_NAME,
GETDATE() as JOB_TIME
FROM NAME_MASTER n
JOIN temp_ad_act_20190208 t ON n.EMAIL_ADDRESS = t.EMAIL
WHERE NOT EXISTS( SELECT *
FROM BIOGRAPH_MASTER b
WHERE b.ID_NUM = n.ID_NUM);
February 12, 2019 at 11:25 am
Here are the column definitions of anything I believe is relevant. NAME_MASTER
BIOGRAPH_MASTER
temp_ad_act_20190208
I'm showing 90% of the temp file columns so you could see that it is a very simple file. What I am doing (for starters) is matching them on obvious fields like e-mail address. In this case, I have 33 matches, that I have verified to match on name as well. I've done this many, many times, (much to my chagrin!). As much as I try to get a "clean" file, there are always issues with column length, etc. That's why I just import most columns in as a large nvarchar. Yes, I HAVE tried LTRIM and RTRIM around all of these columns and run extensive LEN testing. It's all the same.The set based fails and the loop (or one by one) works. Could it be that SSMS is looking at those large nvarchars and freaking out? If so, why doesn't it also short-circuit when I run them one by one?
Sure, just leave the while loop out in the example above. Like this:
SELECT n.ID_NUM, t.Gender, t.Birth_date This section specifically is what restricts the output to one row at a time: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Thank You Luis. I'm assuming it's more efficient to do so? /***********************/ Some screenshots for y'all: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I randomized the data here, don't want to show the true ID_NUMs of course. They are the same length, though. A sample of what I am trying to insert: |
February 12, 2019 at 12:01 pm
The only thing that I could be happening is with the large gender column on your temp table. Maybe you have some non-visible characters in there.
Here's an example:
CREATE TABLE NAME_MASTER(
ID_NUM int NOT NULL,
EMAIL_ADDRESS char(60)
)
CREATE TABLE BIOGRAPH_MASTER(
ID_NUM int NOT NULL,
BIRTH_DTE datetime NULL,
GENDER char(1) NULL ,
USER_NAME varchar(513) NULL,
JOB_NAME varchar(30) NULL,
JOB_TIME datetime NULL
)
CREATE TABLE temp_ad_act_20190208 (
--Column_name Type Computed Length Prec Scale Nullable TrimTrailingBlanks FixedLenNullInSource Collation
Last_Name nvarchar(255),
First_Name nvarchar(255),
Addr_Line_1 nvarchar(255),
City nvarchar(255),
State nvarchar(255),
Zip float,
Email nvarchar(255),
Gender nvarchar(255),
Birth_date datetime,
SEQ_NUM int
)
INSERT INTO NAME_MASTER VALUES(1, 'luis@hotmail.com'),(2, 'lcazares@hotmail.com')
INSERT INTO temp_ad_act_20190208 VALUES( 'Cazares', 'Luis', 'Unknown', 'Atlanta', 'GA', 30302, 'luis@hotmail.com', 'M', '19000101', 1)
INSERT INTO temp_ad_act_20190208 VALUES( 'Cazares', 'Luis', 'Unknown', 'Atlanta', 'GA', 30302, 'luisverylongemailaddresthatwontfitonsixtycharacters@hotmail.com', 'M', '19000101', 2)
INSERT INTO temp_ad_act_20190208 VALUES( 'Cazares', 'Luis', 'Unknown', 'Atlanta', 'GA', 30302, 'lcazares@hotmail.com', 'M' + CHAR(0), '19000101', 3)
SELECT n.ID_NUM,
t.Gender,
t.Birth_date,
'TE_ADMIN' as USER_NAME,
'recruitment_data_insert_v99' as JOB_NAME,
GETDATE() as JOB_TIME
FROM NAME_MASTER n
JOIN temp_ad_act_20190208 t ON n.EMAIL_ADDRESS = t.EMAIL
WHERE NOT EXISTS( SELECT *
FROM BIOGRAPH_MASTER b
WHERE b.ID_NUM = n.ID_NUM);
INSERT INTO BIOGRAPH_MASTER
(
ID_NUM,
GENDER,
BIRTH_DTE,
user_name,
job_name,
job_time
)
SELECT n.ID_NUM,
t.Gender,
t.Birth_date,
'TE_ADMIN' as USER_NAME,
'recruitment_data_insert_v99' as JOB_NAME,
GETDATE() as JOB_TIME
FROM NAME_MASTER n
JOIN temp_ad_act_20190208 t ON n.EMAIL_ADDRESS = t.EMAIL
WHERE NOT EXISTS( SELECT *
FROM BIOGRAPH_MASTER b
WHERE b.ID_NUM = n.ID_NUM);
GO
--DROP TABLE BIOGRAPH_MASTER, NAME_MASTER, temp_ad_act_20190208
EDIT: Yes, NOT EXISTS is more efficient and clearer on what it does. This article explains it with examples: https://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/
February 12, 2019 at 12:24 pm
Ok. Just for kicks, I will move that column somewhere else, drop it, recreate it to char(1), restore, and give it a go again. Still, it doesn't make sense that it DOES work when I run them one by one. :\
February 12, 2019 at 12:52 pm
Check for INSERT triggers on the table being INSERTed into, just to be sure. The error could be occurring in a trigger rather than for the main INSERT.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 12, 2019 at 5:07 pm
Success! Yet, the mystery continues as I do not feel satisfied with WHY this happens in T-SQL.
When I ran this, in an attempt to modify the Gender column:alter table temp_ad_act_20190208
add Gender_Temp varchar(1)
BEGIN
update a
set Gender_Temp = a.Gender
from temp_ad_act_20190208 a
END
I received the EXACT same error as before. So there, it's definitely that column that is the culprit. Let me be clear, I did check for LEN before, however, I was only checking in my results set, NOT the entire temp file. I updated the two values (out of 3,000+ rows) with a long string to (null), then the code below worked just fine. INSERT INTO BIOGRAPH_MASTER
(ID_NUM, GENDER, BIRTH_DTE
, USER_NAME, JOB_NAME, JOB_TIME)
SELECT n.ID_NUM, t.Gender, t.Birth_date
,'TE_ADMIN' as USER_NAME, 'recruitment_data_insert_v99' as JOB_NAME, GETDATE() as JOB_TIME
from NAME_MASTER n join temp_ad_act_20190208 t
on n.EMAIL_ADDRESS = t.EMAIL and t.Gender is not null
where NOT EXISTS
(SELECT *
FROM BIOGRAPH_MASTER b
WHERE b.ID_NUM = n.ID_NUM)
The remaining BURNING question
What explains the inconsistency in how the SQL engine is evaluating column values? It seems that it evaluates ALL (distinct?) column values when using a set based approach, yet, when running a single row insert, it gives you a free pass.
February 12, 2019 at 6:10 pm
zuma-1122451 - Tuesday, February 12, 2019 5:07 PMSuccess! Yet, the mystery continues as I do not feel satisfied with WHY this happens in T-SQL.When I ran this, in an attempt to modify the Gender column:
alter table temp_ad_act_20190208
add Gender_Temp varchar(1)BEGIN
update a
set Gender_Temp = a.Gender
from temp_ad_act_20190208 a
END
I received the EXACT same error as before. So there, it's definitely that column that is the culprit. Let me be clear, I did check for LEN before, however, I was only checking in my results set, NOT the entire temp file. I updated the two values (out of 3,000+ rows) with a long string to (null), then the code below worked just fine.INSERT INTO BIOGRAPH_MASTER
(ID_NUM, GENDER, BIRTH_DTE
, USER_NAME, JOB_NAME, JOB_TIME)SELECT n.ID_NUM, t.Gender, t.Birth_date
,'TE_ADMIN' as USER_NAME, 'recruitment_data_insert_v99' as JOB_NAME, GETDATE() as JOB_TIME
from NAME_MASTER n join temp_ad_act_20190208 t
on n.EMAIL_ADDRESS = t.EMAIL and t.Gender is not null
where NOT EXISTS
(SELECT *
FROM BIOGRAPH_MASTER b
WHERE b.ID_NUM = n.ID_NUM)The remaining BURNING question
What explains the inconsistency in how the SQL engine is evaluating column values? It seems that it evaluates ALL (distinct?) column values when using a set based approach, yet, when running a single row insert, it gives you a free pass.
Checking for LEN() will NOT check for trailing spaces. Make the check using DATALENGTH(). If the data you're checking is NVARCHAR or NCHAR, don't forget to divide the result of DATALENGTH() by 2 to get the number of characters.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 12, 2019 at 6:23 pm
Good point Jeff. In this case, I didn't have any trailing spaces but your point is well taken. LEN indeed doesn't count trailing spaces. In this file, the gender column actually said 'No specification' in a couple of the rows. Any idea what explains the behavior I witnessed? Why would it look at column values that were not even present in my result set, yet conveniently ignore them when updating one row at a time?
February 12, 2019 at 6:56 pm
I think why it is failing on the set-based vs the RBAR approach is because of where the subquery exists in the statement. In order to evaluate the subquery, it has to evaluate the join conditions first, thus generating a set which includes rows that overflow the Gender column. (The way the engine works, it first builds a super-set based on the join criteria, and then whittles it down based on conditions in the WHERE clause. I've probably grossly over-simplified things, but you get the general idea.)
If you can find a way to pre-filter temp_ad_act_20190208 first before the insert, this should avoid the error on the set-based approach. (I assume by the naming convention that this is some sort of perma-temp table?)
February 13, 2019 at 11:44 am
That sounds plausible and it's what I have been thinking of as well. The execution plan is simply different for each approach. It's different enough that, as you state, the set-based approach makes it evaluate the whole table somehow. Perhaps I could poke around in the execution plans and try to make sense of them. I'm importing students from a list our Admissions department purchases, students who have taken their ACT tests. They use them for leads and, of course, a lot of them are already in our database from other sources. It's a lot of fun. :\
I'm kind of obsessing about this behavior now. When I get a moment, I'm going to attempt to recreate this scenario with just a couple of tables and very few rows. In the end, it all comes back to garbage in, garbage out. I know these files are not the greatest (even though we've paid for them!). Getting other departments to help in the cleanup of the file is like pulling teeth, though. Lesson learned. An ounce of prevention is better than...blah blah blah. You know the saying. You never know what SQL is gonna do sometimes! It doesn't even annoy me. Somehow I find it incredibly interesting. I guess that is just one more reason I really like being a DBA!
February 13, 2019 at 11:48 am
You can always select all that data into a new table and see what the datatypes are. It may clue you in on which column and row
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
February 13, 2019 at 12:15 pm
Mike01, we're past that now. It was two rouge rows in the Gender column that were making SQL unhappy. The grand mystery was that SQL was evaluating the entire column when deciding there would be a truncation error. The rows I was inserting into the table did NOT contain anything longer than char(1).
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply