parsing log data

  • Hi everyone, I have a case where I need to parse log data like this in SQL Server:

    { "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }

    I have tried SUBSTRING and PATINDEX, but without success because the fields don't appear in a consistent position or order 🙁
    I need help writing a query that returns results like this:

    Name      Gender  Address
    --------  ------  -------------------
    David     Male    125 Gambler Lane
    Steffany  Female  4312 Poco Mas Drive

  • Each row appears to be a JSON fragment, so you can just do:
    Declare @log varchar(max) ='{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }'
    select 
    X.name,
    X.address,
    X.gender
    from string_split(@log,char(13)) L
    cross apply openjson(L.value)
    with
    (
    name varchar(30) '$.Name',
    address varchar(300) '$.Address',
    gender varchar(10) '$.Gender'
    ) X
  • Here is a simple solution
    😎
    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @TMSG VARCHAR(MAX) = '{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }';

    SELECT
      X.[Name]
     ,X.[Address]
     ,X.Gender
    -- Wrap the text in [ ] (CHAR 91/93) and turn each carriage return (CHAR 13) into a comma (CHAR 44),
    -- so the whole log becomes one JSON array that OPENJSON can read in a single pass.
    FROM OPENJSON(REPLACE(CONCAT(CHAR(91),@TMSG,CHAR(93)),CHAR(13),CHAR(44)))
    WITH
    (
    NAME VARCHAR(30) '$.Name',
    ADDRESS VARCHAR(300) '$.Address',
    GENDER VARCHAR(10) '$.Gender'
    ) X;

    output

    Name      Address              Gender
    --------- -------------------- -------
    David     125 Gambler Lane     Male
    Steffany  4312 Poco Mas Drive  Female


    I haven't tested the performance difference between constructing a JSON set with the string REPLACE function and using STRING_SPLIT, but I have a feeling that the replace method is more efficient. Something to look into 😉 (a rough comparison sketch follows at the end of this post)

    How are you reading the log file?
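
    For anyone curious, here is a rough way to compare the two approaches. This is only a sketch: it reuses the @TMSG variable declared above, and real timings will depend on the size of the log.

    SET STATISTICS TIME ON;

    -- Variant 1: REPLACE the line breaks with commas and wrap in [ ] to build one JSON array
    SELECT X.[Name], X.[Address], X.Gender
    FROM OPENJSON(REPLACE(CONCAT(CHAR(91),@TMSG,CHAR(93)),CHAR(13),CHAR(44)))
    WITH
    (
    [Name] VARCHAR(30) '$.Name',
    [Address] VARCHAR(300) '$.Address',
    Gender VARCHAR(10) '$.Gender'
    ) X;

    -- Variant 2: STRING_SPLIT into one row per line, then OPENJSON each fragment
    SELECT X.[Name], X.[Address], X.Gender
    FROM STRING_SPLIT(@TMSG, CHAR(13)) L
    CROSS APPLY OPENJSON(L.value)
    WITH
    (
    [Name] VARCHAR(30) '$.Name',
    [Address] VARCHAR(300) '$.Address',
    Gender VARCHAR(10) '$.Gender'
    ) X;

    SET STATISTICS TIME OFF;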

  • My answer would be to simply pork chop the people that are creating such inconsistent data and doubling the size of the log files with totally unnecessary labels.  Things like this just shouldn't be allowed to happen in the world of data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • andycadley - Saturday, August 4, 2018 8:53 AM

    Each row appears to be a JSON fragment, so you can just do:
    Declare @log varchar(max) ='{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }'
    select 
    X.name,
    X.address,
    X.gender
    from string_split(@log,char(13)) L
    cross apply openjson(L.value)
    with
    (
    name varchar(30) '$.Name',
    address varchar(300) '$.Address',
    gender varchar(10) '$.Gender'
    ) X

    Hi Andy,

    Thanks for sharing your post.
    Can you explain how each part of the code works? I mean the use of the STRING_SPLIT function and the CROSS APPLY operator.

    Thanks in advance

  • Sure. The string_split is just breaking up the text into individual rows based on the newline character (this wouldn't be necessary if you were reading in the data via bulk import or similar), so we end up with a table with two rows of data:
    row1 = { "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    row2 = { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }

    Then we do an OPENJSON on the result and use its WITH clause to define the values we're interested in, mapping them to specific column types and names. Each line of the WITH clause defines the resulting column name, its type, and then the JSON path to the data we want in that column. The $ in the path refers to the outermost object (defined by the curly braces) and the dot syntax refers to named "fields" in the JSON.

    This is roughly equivalent to the slightly more verbose:

    Declare @T Table
    (
     rowtext nvarchar(max)
    )

    Insert Into @T
    Values (N'{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }'),
      (N'{ "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }')

    Select * From @T

    Select
    JSON_VALUE(rowtext, '$.Name') as name,
    JSON_VALUE(rowtext, '$.Gender') as gender,
    JSON_VALUE(rowtext, '$.Address') as address
    From @T

  • andycadley - Saturday, August 4, 2018 8:53 AM

    Each row appears to be a JSON fragment, so you can just do:
    Declare @log varchar(max) ='{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }'
    select 
    X.name,
    X.address,
    X.gender
    from string_split(@log,char(13)) L
    cross apply openjson(L.value)
    with
    (
    name varchar(30) '$.Name',
    address varchar(300) '$.Address',
    gender varchar(10) '$.Gender'
    ) X

    andycadley - Saturday, August 4, 2018 12:49 PM

    Sure. The string_split is just breaking up the text into individual rows based on the newline character (this wouldn't be necessary if you were reading in the data via bulk import or similar), so we end up with a table with two rows of data:
    row1 = { "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    row2 = { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }

    Then we do an OPENJSON on the result and use its WITH clause to define the values we're interested in, mapping them to specific column types and names. Each line of the WITH clause defines the resulting column name, its type, and then the JSON path to the data we want in that column. The $ in the path refers to the outermost object (defined by the curly braces) and the dot syntax refers to named "fields" in the JSON.

    This is roughly equivalent to the slightly more verbose:

    Declare @T Table
    (
     rowtext nvarchar(max)
    )

    Insert Into @T
    Values (N'{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }'),
      (N'{ "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }')

    Select * From @T

    Select
    JSON_VALUE(rowtext, '$.Name') as name,
    JSON_VALUE(rowtext, '$.Gender') as gender,
    JSON_VALUE(rowtext, '$.Address') as address
    From @T

    Thanks, it works with JSON_VALUE :kiss:

    Eirikur Eiriksson - Saturday, August 4, 2018 9:31 AM

    Here is a simple solution
    😎
    USE TEEST;
    GO
    SET NOCOUNT ON;

    DECLARE @TMSG VARCHAR(MAX) = '{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane" }
    { "Address": "4312 Poco Mas Drive", "Gender": "Female", "Name": "Steffany" }';

    SELECT
      X.[Name]
     ,X.[Address]
     ,X.Gender
    FROM OPENJSON(REPLACE(CONCAT(CHAR(91),@TMSG,CHAR(93)),CHAR(13),CHAR(44)))
    WITH
    (
    NAME VARCHAR(30) '$.Name',
    ADDRESS VARCHAR(300) '$.Address',
    GENDER VARCHAR(10) '$.Gender'
    ) X;

    output

    Name      Address              Gender
    --------- -------------------- -------
    David     125 Gambler Lane     Male
    Steffany  4312 Poco Mas Drive  Female


    I haven't tested the performance difference between constructing a JSON set with the string REPLACE function and using STRING_SPLIT, but I have a feeling that the replace method is more efficient. Something to look into 😉

    How are you reading the log file?

    The log file is originally in .dat format, but the data from the .dat file has already been loaded into a SQL Server table.
    And thanks, your simple solution works for me :kiss: (see the table-based sketch at the end of this post)

    Jeff Moden - Saturday, August 4, 2018 11:55 AM

    My answer would be to simply pork chop the people that are creating such inconsistent data and doubling the size of the log files with totally unnecessary labels.  Things like this just shouldn't be allowed to happen in the world of data.

    :laugh::laugh:
    You really know how I feel 😛
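
    Since the fragments are already loaded into a SQL Server table, the same CROSS APPLY OPENJSON pattern from above can be applied per row. A sketch, with made-up table and column names (dbo.LogData, LogText) standing in for the real ones:

    SELECT
      X.[Name]
     ,X.[Address]
     ,X.Gender
    FROM dbo.LogData L                 -- hypothetical table holding one log line per row
    CROSS APPLY OPENJSON(L.LogText)    -- hypothetical column containing the JSON fragment
    WITH
    (
    [Name] VARCHAR(30) '$.Name',
    [Address] VARCHAR(300) '$.Address',
    Gender VARCHAR(10) '$.Gender'
    ) X;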

  • and be thankful you are on 2016/2017.  JSON was not supported in earlier versions 🙂
    Alternatively, tell your boss it can't be done in SQL and get him to pay for a Python course.  Python is really good at that sort of string parsing.

    +1 for Jeff's suggestion of finding the original developers and warmly shaking them by the throat

  • I dunno, I'd take a slightly bloaty log file that can be robustly parsed by known tools over a comma/tab-delimited file every single time. Disk is cheap, and working around the quirks when stray delimiters get into a file is expensive.
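
    A quick illustration of the stray-delimiter point (the comma inside the address is made up for this example):

    DECLARE @line NVARCHAR(MAX) = N'{ "Name": "David", "Gender": "Male", "Address": "125 Gambler Lane, Apt 2" }';

    -- JSON parsing is unaffected by the embedded comma
    SELECT JSON_VALUE(@line, '$.Address') AS [Address];   -- 125 Gambler Lane, Apt 2

    -- A naive comma split scatters the address across two rows
    SELECT value FROM STRING_SPLIT(@line, ',');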

  • Jeff Moden - Saturday, August 4, 2018 11:55 AM

    My answer would be to simply pork chop the people that are creating such inconsistent data and doubling the size of the log files with totally unnecessary labels.  Things like this just shouldn't be allowed to happen in the world of data.

    Is it okay if the pork chops are frozen?

  • pietlinden - Tuesday, August 7, 2018 1:13 PM

    Jeff Moden - Saturday, August 4, 2018 11:55 AM

    My answer would be to simply pork chop the people that are creating such inconsistent data and doubling the size of the log files with totally unnecessary labels.  Things like this just shouldn't be allowed to happen in the world of data.

    Is it okay if the pork chops are frozen?

    When they hit you at 20,000 fps, you cannot tell the difference :-0
    😎


  • Eirikur Eiriksson - Tuesday, August 7, 2018 2:30 PM

    pietlinden - Tuesday, August 7, 2018 1:13 PM

    Jeff Moden - Saturday, August 4, 2018 11:55 AM

    My answer would be to simply pork chop the people that are creating such inconsistent data and doubling the size of the log files with totally unnecessary labels.  Things like this just shouldn't be allowed to happen in the world of data.

    Is it okay if the pork chops are frozen?

    When they hit you at 20,000 fps, you cannot tell the difference :-0
    😎

    Yeah, I'm pretty sure that a collision with a pork chop, frozen or otherwise, at a common space-object collision speed of about 4 miles per second just isn't going to matter as to the state of the pork chop. Neither the chop nor the target is going to survive to care one way or the other.... 😀😀😀

    Although ... if the collision is at head level, we're probably looking at explosive decapitation :crazy::crazy::crazy:, if any of that matters...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
