Openrowset JSON and national characters - UTF8/ANSI problem

  • Hi
    A CMD line tool creates a file for me. It is the middle part below. Filename: slet.json
    This file cannot be read by SQL
    I open the file in Notepad and save it to another name. Changing UTF-8 to ANSI when saving. It is the top part below. Filename: sletansi.json

    Here is the SQL used:
    select 'ANSI', hygg.*
    from openrowset (bulk 'd:\hygge\sletansi.json', single_clob, CODEPAGE = 'UTF-8') as j
    cross apply openjson(bulkColumn)
    with (SourceFile varchar(100), [Comment-dan] varchar(100)) as hygg
    union
    select 'UTF8', hygg.*
    from openrowset (bulk 'd:\hygge\slet.json', single_clob, CODEPAGE = 'UTF-8') as j
    cross apply openjson(bulkColumn)
    with (SourceFile varchar(100), [Comment-dan] varchar(100)) as hygg

    No matter what value i user for Codepage... Nothing happens


    My windows 10 laptop is Danish and in CMD the CHCP tells me it uses CodePage 850
    SQL Server is ver. 13.0.4202.2  with Language English (United States)
    Server Collation Danish_Norwegian_CI_AS
    In advanced server properties is has Default Language = English

    I want to avoid having to copy and save using Notepad in order to get a file I can use as input
    Any suggestions ?

  • I know it would be a fair amount of work, but you could create an SSIS package that could transform the code page on the file (you may need to actually read the file in and write it back out in a Script Task), and then run the query against it in an Execute SQL Task.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • 1. You can convert the file encode to UTF-16 LE and use 

    DECLARE @ForumsDiscussions NVARCHAR(MAX)
    SELECT @ForumsDiscussions = BulkColumn
    FROM OPENROWSET (
        BULK 'E:\Forums.json', SINGLE_NCLOB
    ) as j
    select @ForumsDiscussions as OriginalText

    2. You can leave the file as it is (UTF-8) and use BULK INSERT (you can insert the data to temp table)

    DROP TABLE IF EXISTS MyBulk;
    create table MyBulk(txt NVARCHAR(MAX))
    BULK INSERT MyBulk FROM N'E:\Forums.json' WITH(ROWTERMINATOR =' |\n' );
    select txt from MyBulk

    Senior consultant and architect, data platform and application development, Microsoft MVP.

  • I get this Error on version 1.

    SINGLE_NCLOB requires a UNICODE (widechar) input file. The file specified is not Unicode.

    version 2 changes the letters

    I use for testing SQLEXPRESS (SQL Server 14.0.3381)

    the included file contains a json content of:

    {"name":"María Pérez Jiménez"}

    • This reply was modified 3 years, 5 months ago by  Joseph-164998.
  • Try SINGLE_BLOB instead of SINGLE_CLOB. Microsoft official documentation on OPENROWSET says

    We recommend that you import XML data only using the SINGLE_BLOB option, rather than SINGLE_CLOB and SINGLE_NCLOB, because only SINGLE_BLOB supports all Windows encoding conversions.

    I'm not sure if it is also true for JSON. But you can give it a try.

    What's the version of your SQL Server?

    Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding).

    Additionally, if your SQL Server is 2016 and onwards then try supplying the Code Page number 65001 instead of "UTF-8".

  • to bad, SINGLE_BLOB didnt work.

    I use for testing SQLEXPRESS 2017 (SQL Server 14.0.3381). Could it be a difference, when I'd work on the Fullversion on the Server?

  • can you post a sample file so we can try it?

     

    and what happens if you change the following to nvarchar - with (SourceFile varchar(100), [Comment-dan] varchar(100)) as hygg

  • DECLARE @JSON VARCHAR(MAX) --tried also  NVARCHAR(MAX)
    SELECT @JSON = BulkColumn
    FROM OPENROWSET
    (BULK 'C:\Users\jsail\Desktop\testCodeUtf8.json', SINGLE_BLOB) AS j    --tried also SINGLE_CLOB and SINGLE_NCLOB
    SELECT @JSON AS 'jsonText'

    The Uploaded File contains only

    {"name":"María Pérez Jiménez"}

    Attachments:
    You must be logged in to view attached files.
  • using the bulk insert option works fine (in SQL 2016)

    the openrowset I didn't manage to make it work - but didn't spend too much time on it either.

    DROP TABLE IF EXISTS MyBulk;
    create table MyBulk(txt NVARCHAR(MAX))
    BULK INSERT MyBulk FROM N'C:\temp\testCodeUtf8.json' WITH(ROWTERMINATOR =' |\n', CODEPAGE = '65001' );
    select txt from MyBulk

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply