May 2, 2011 at 8:33 am
Hello All,
One of my 'partners' sends me a csv file where they combine text and numeric data in one column. There seems to be junk in the file which was imported into a table and now I am getting the error: 'Error converting data type nvarchar to numeric'
The datatype is nvarchar(50)
An example of the data would be '0.99' but when I do LEN on this column I get 5 not 4.
RTRIM does not seem to eliminate the space. LEN(RTRIM(Value)) still returns 5.
How can I see what the bad data is? My partner doesn't seem to recognize the problem because Excel can open the file correctly and the problem is not visible in the csv file.
Any tips or pointers on identifying and cleaning this data would be most welcome.
Thanks.
May 2, 2011 at 8:38 am
the problem is not visible in the csv file.
Are you sure? How did you check? What do you want to do with the bad data?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 2, 2011 at 9:06 am
A very kludgey way to check, but try:
Select '*'+TheColumn+'*' from The DB
that will tell you if there are any stray characters.
Also, check out the IsNumeric function. While not perfect, it'd be a start.
Select * from MyDB where IsNumeric(TheColumn) = 0
(or add it to the above)
Also, if it's nvarchar you may have some odd unicode characters hanging out in the data, though not likely since it's coming from a csv file.
For grins, you may want to try a variation of this:
WITH tally ( N )
AS ( SELECT TOP 1000000
row_number() OVER ( ORDER BY sc1.id )
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
)
SELECT Tally.N AS subloc ,
UNICODE(SUBSTRING(@OriginalText, Tally.N,
1))
FROM Tally
WHERE Tally.N <= LEN(@OriginalText)
May 2, 2011 at 11:21 am
Phil Parkin (5/2/2011)
the problem is not visible in the csv file.
Are you sure? How did you check? What do you want to do with the bad data?
I checked in a text editor. I want to demonstrate to the provider of the data that they are not sending me quality data.
I guess the other alternative is that the source data is fine but the junk was brought in during the SSIS import.
May 2, 2011 at 11:31 am
SELECT'*'+IssuerValue+'*' returns '*0.99 *' as an example. It appears to be a space. If I add IssuerValue LIKE '% ' to my WHERE statement nothing is returned.
ISNUMERIC(IssuerValue) = 0 returns all rows.
I am trying to understand and correctly modify the tally code.
May 2, 2011 at 11:38 am
OK, so you do indeed have stray characters in the rows. IsNumeric=0 means all values have non-numeric data of some sort.
Try this:
DECLARE @OriginalText NVARCHAR(50)
SELECT @OriginalText = MyColumn
FROM MyTable
WHERE /* put in criteria to get you to the '*0.99 *' row*/
WITH tally ( N )
AS ( SELECT TOP 1000000
row_number() OVER ( ORDER BY sc1.id )
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
)
SELECT Tally.N AS subloc ,
UNICODE(SUBSTRING(@OriginalText, Tally.N, 1))
FROM Tally
WHERE Tally.N <= LEN(@OriginalText)
May 2, 2011 at 11:45 am
I get this
subloc(No column name)
148
246
357
457
513
May 2, 2011 at 11:49 am
That's it.
The last character (subloc 5) is a carriage return
(UNICODE 13)
see here: http://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000
So what you have in that column is:
0.99[Carriage return]
Sounds like your parsing needs to account for the carriage returns at the end.
May 2, 2011 at 11:59 am
Thank you so much. A few more questions if I may.
Am I correct in thinking that this character may not show up, that is be visible, in a normal text editor?
Is there any application that would allow me to view these characters?
My partner charges me for this data and I need to be able to demonstrate to them that the data file they are sending is flawed.
May 2, 2011 at 12:01 pm
You'll need a hex editor to see the 'invisible' characters.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 2, 2011 at 12:06 pm
What Phil said. That's why I do the *+datavalue+* method quick check.
Some text editors have settings to show these return characters. I think WordPad does but I'm not 100% positive.
May 2, 2011 at 1:10 pm
Thank you both. This has been educational and I have been able to unequivocally demonstrate to my very expensive and supposedly enterprise-level vendor that their product is just not very good.
May 2, 2011 at 1:16 pm
Wahoo!
Sucks that you have to go through this but good that you can demonstrate the issue. Let's hope they fix it fast for you.
May 2, 2011 at 1:30 pm
Chrissy321 (5/2/2011)
Thank you both. This has been educational and I have been able to unequivocally demonstrate to my very expensive and supposedly enterprise-level vendor that their product is just not very good.
Almost makes it fun when you can do that 😀
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply