Assistance with flattening some data

  • EDIT

    How in the heck did I mark my question as the answer!!!!

    Eirikur's answer is what helped me. UGH!!

    As my parents used to say about me: The doofus is strong with this one.

    /EDIT

    I have been assigned to work with a legacy database that I am phasing out of existence 😀

    I have to move the data into a new application that requires a very specific format for input and I am having issues with figuring out how to output this format.

    The data is exam type data.

    As the schema and data are 'sensitive', I have had to create fake data.

    Each exam has an ExamID and PersonID. There are also questions, question codes, question id, answer, answer code and answer id.

    There are multiple types and version of exams with n number of questions/answers.

    I need to output in the following format:

    PersonID, ExamID, ExamVersion, "QuestionCode n", "QuestionText n", "AnswerCode n", "AnswerText n", "QuestionCode n+1", "QuestionText n+1", "AnswerCode n+1", "AnswerText n+1" etc...

    The above enclosed in double quotes are literal text "QuestionText 1" or "QuestionText 793" (not actual question text or answer text).

    The ordering of the questions is of little importance (examid->personid->question->answer must be maintained) as the QuestionCode will be mapped to the new application to determine question order.

    Sample table structure

    -- if temp table exists...drop it

    if object_id('TempDB..#exams','U') is not null

    drop table #exams

    -- create temp table

    create table #exams

    (

    StudentID int NOT NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , ExamID int NOT NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , ExamVersion decimal(6,4) NOT NULL

    , QuestionID int NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , QuestionText nvarchar(max) NULL

    , QuestionCode nvarchar(25) NULL

    , AnswerID int NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , Answer nvarchar(max) NULL

    , AnswerCode nvarchar(25) NULL

    )

    Populate the table

    -- populate temp table

    insert into #exams (StudentID, ExamID, ExamVersion, QuestionID, QuestionText, QuestionCode, AnswerID, Answer, AnswerCode)

    select 1, 1, 1.10, 1, 'Question 1 text, potentially 3000+ characters long', 'Q1 code', 3, 'Answer 1', 'Answer 1 code' UNION ALL

    select 1, 1, 1.10, 2, 'Question 2 text', 'Q2 code', 68, 'Answer 2', 'Answer 2 code' UNION ALL

    select 1, 1, 1.10, 3, 'Question 3 text', 'Q3 code', 189, 'Answer 3', 'Answer 3 code' UNION ALL

    select 2, 4, 1.70, 17, 'Question 1 text, can also contain line feed and/or copy-pasted MS Word junk', 'Q1 code', 210, 'Answer 1', 'Answer 1 code' UNION ALL

    select 2, 4, 1.70, 27, 'Question 2 text', 'Q2 code', 450, 'Answer 2', 'Answer 2 code' UNION ALL

    select 2, 4, 1.70, 16, 'Question 3 text', 'Q3 code', 456, 'Answer 3', 'Answer 3 code' UNION ALL

    select 2, 4, 1.70, 198, 'Question 4 text', 'Q4 code', 602, 'Answer 4', 'Answer 4 code' UNION ALL

    select 2, 4, 1.70, 7, 'Question 5 text', 'Q5 code', 17198, 'Answer 5', 'Answer 5 code' UNION ALL

    select 2, 4, 1.70, 299, 'Question 6 text', 'Q6 code', 700, 'Answer 6', 'Answer 6 code'

    Here is what I have done so far.

    Somewhat useful, but my brain is stuck and non-cooperative.

    -- OUTPUT CODE

    declare @cols as nvarchar(max);

    declare @query as nvarchar(max);

    select @cols = stuff((select distinct ', '

    + quotename('QuestionText ' + cast(row_number() over(partition by examid order by examid, studentid) as nvarchar(max)),

    QUOTENAME('Answer' + cast(row_number() over(partition by examid order by examid, studentid) as nvarchar(max))))

    from #exams

    FOR XML PATH(''), TYPE

    ).value('.', 'NVARCHAR(MAX)'), 1, 1, '');

    SELECT @query = 'SELECT examid, studentid, examversion, ' + @cols +

    'FROM

    (

    SELECT

    examid, studentid, examversion, questiontext,

    ''QuestionText '' + CAST(ROW_NUMBER()

    OVER(PARTITION BY examid

    ORDER BY examid, studentid) AS NVARCHAR(max)) rownum

    FROM #exams

    ) t

    PIVOT

    (

    MAX(QuestionText)

    FOR rownum IN(' + @cols + ')' +

    ') p';

    execute(@query);

    Any assistance would be appreciated!!

  • Quick suggestion, use cross tab aggregation, here is an example

    😎

    -- if temp table exists...drop it

    if object_id('TempDB..#exams','U') is not null

    drop table #exams

    -- create temp table

    create table #exams

    (

    StudentID int NOT NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , ExamID int NOT NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , ExamVersion decimal(6,4) NOT NULL

    , QuestionID int NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , QuestionText nvarchar(max) NULL

    , QuestionCode nvarchar(25) NULL

    , AnswerID int NULL -- this is actually a uniqueidentifier field, using INT because I am lazy and hate uniqueidentifier fields

    , Answer nvarchar(max) NULL

    , AnswerCode nvarchar(25) NULL

    );

    -- populate temp table

    insert into #exams (StudentID, ExamID, ExamVersion, QuestionID, QuestionText, QuestionCode, AnswerID, Answer, AnswerCode)

    select 1, 1, 1.10, 1, 'Question 1 text, potentially 3000+ characters long', 'Q1 code', 3, 'Answer 1', 'Answer 1 code' UNION ALL

    select 1, 1, 1.10, 2, 'Question 2 text', 'Q2 code', 68, 'Answer 2', 'Answer 2 code' UNION ALL

    select 1, 1, 1.10, 3, 'Question 3 text', 'Q3 code', 189, 'Answer 3', 'Answer 3 code' UNION ALL

    select 2, 4, 1.70, 17, 'Question 1 text, can also contain line feed and/or copy-pasted MS Word junk', 'Q1 code', 210, 'Answer 1', 'Answer 1 code' UNION ALL

    select 2, 4, 1.70, 27, 'Question 2 text', 'Q2 code', 450, 'Answer 2', 'Answer 2 code' UNION ALL

    select 2, 4, 1.70, 16, 'Question 3 text', 'Q3 code', 456, 'Answer 3', 'Answer 3 code' UNION ALL

    select 2, 4, 1.70, 198,'Question 4 text', 'Q4 code', 602, 'Answer 4', 'Answer 4 code' UNION ALL

    select 2, 4, 1.70, 7, 'Question 5 text', 'Q5 code', 17198, 'Answer 5', 'Answer 5 code' UNION ALL

    select 2, 4, 1.70, 299,'Question 6 text', 'Q6 code', 700, 'Answer 6', 'Answer 6 code' ;

    ;WITH BASE_DATA AS

    (

    SELECT

    EX.StudentID

    ,EX.ExamID

    ,EX.ExamVersion

    ,ROW_NUMBER() OVER

    (

    PARTITION BY EX.StudentID

    ,EX.ExamID

    ,EX.ExamVersion

    ORDER BY EX.QuestionID

    ) AS SQV_RID

    ,EX.QuestionID

    ,EX.QuestionText

    ,EX.QuestionCode

    ,EX.AnswerID

    ,EX.Answer

    ,EX.AnswerCode

    FROM #exams EX

    )

    SELECT

    BD.StudentID

    ,BD.ExamID

    ,BD.ExamVersion

    ,MAX(CASE WHEN BD.SQV_RID = 1 THEN BD.QuestionCode END) AS QuestionCode_01

    ,MAX(CASE WHEN BD.SQV_RID = 1 THEN BD.QuestionText END) AS QuestionText_01

    ,MAX(CASE WHEN BD.SQV_RID = 1 THEN BD.AnswerCode END) AS AnswerCode_01

    ,MAX(CASE WHEN BD.SQV_RID = 1 THEN BD.Answer END) AS Answer_01

    ,MAX(CASE WHEN BD.SQV_RID = 2 THEN BD.QuestionCode END) AS QuestionCode_02

    ,MAX(CASE WHEN BD.SQV_RID = 2 THEN BD.QuestionText END) AS QuestionText_02

    ,MAX(CASE WHEN BD.SQV_RID = 2 THEN BD.AnswerCode END) AS AnswerCode_02

    ,MAX(CASE WHEN BD.SQV_RID = 2 THEN BD.Answer END) AS Answer_02

    ,MAX(CASE WHEN BD.SQV_RID = 3 THEN BD.QuestionCode END) AS QuestionCode_03

    ,MAX(CASE WHEN BD.SQV_RID = 3 THEN BD.QuestionText END) AS QuestionText_03

    ,MAX(CASE WHEN BD.SQV_RID = 3 THEN BD.AnswerCode END) AS AnswerCode_03

    ,MAX(CASE WHEN BD.SQV_RID = 3 THEN BD.Answer END) AS Answer_03

    ,MAX(CASE WHEN BD.SQV_RID = 4 THEN BD.QuestionCode END) AS QuestionCode_04

    ,MAX(CASE WHEN BD.SQV_RID = 4 THEN BD.QuestionText END) AS QuestionText_04

    ,MAX(CASE WHEN BD.SQV_RID = 4 THEN BD.AnswerCode END) AS AnswerCode_04

    ,MAX(CASE WHEN BD.SQV_RID = 4 THEN BD.Answer END) AS Answer_04

    ,MAX(CASE WHEN BD.SQV_RID = 5 THEN BD.QuestionCode END) AS QuestionCode_05

    ,MAX(CASE WHEN BD.SQV_RID = 5 THEN BD.QuestionText END) AS QuestionText_05

    ,MAX(CASE WHEN BD.SQV_RID = 5 THEN BD.AnswerCode END) AS AnswerCode_05

    ,MAX(CASE WHEN BD.SQV_RID = 5 THEN BD.Answer END) AS Answer_05

    ,MAX(CASE WHEN BD.SQV_RID = 6 THEN BD.QuestionCode END) AS QuestionCode_06

    ,MAX(CASE WHEN BD.SQV_RID = 6 THEN BD.QuestionText END) AS QuestionText_06

    ,MAX(CASE WHEN BD.SQV_RID = 6 THEN BD.AnswerCode END) AS AnswerCode_06

    ,MAX(CASE WHEN BD.SQV_RID = 6 THEN BD.Answer END) AS Answer_06

    FROM BASE_DATA BD

    GROUP BY BD.StudentID

    ,BD.ExamID

    ,BD.ExamVersion;

    Results

    StudentID ExamID ExamVersion QuestionCode_01 QuestionText_01 AnswerCode_01 Answer_01 QuestionCode_02 QuestionText_02 AnswerCode_02 Answer_02 QuestionCode_03 QuestionText_03 AnswerCode_03 Answer_03 QuestionCode_04 QuestionText_04 AnswerCode_04 Answer_04 QuestionCode_05 QuestionText_05 AnswerCode_05 Answer_05 QuestionCode_06 QuestionText_06 AnswerCode_06 Answer_06

    ----------- ------- ------------- ---------------- ---------------------------------------------------- -------------- ---------- ---------------- ---------------- -------------- ---------- ---------------- ---------------------------------------------------------------------------- -------------- ---------- ---------------- ---------------- -------------- ---------- ---------------- ---------------- -------------- ---------- ---------------- ---------------- -------------- ----------

    1 1 1.1000 Q1 code Question 1 text, potentially 3000+ characters long Answer 1 code Answer 1 Q2 code Question 2 text Answer 2 code Answer 2 Q3 code Question 3 text Answer 3 code Answer 3 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL

    2 4 1.7000 Q5 code Question 5 text Answer 5 code Answer 5 Q3 code Question 3 text Answer 3 code Answer 3 Q1 code Question 1 text, can also contain line feed and/or copy-pasted MS Word junk Answer 1 code Answer 1 Q2 code Question 2 text Answer 2 code Answer 2 Q4 code Question 4 text Answer 4 code Answer 4 Q6 code Question 6 text Answer 6 code Answer 6

  • Thanks for the feedback.

    I thought of something like this, but was having trouble translating it to dynamic SQL.

    I have to make this generic enough to pull from any of the exams (we have hundreds) and it seems that no exam has the same number of questions/answer combinations (range from 30 questions to 400 questions and possible more :w00t: questions in near future).

  • I have been stuck in a MS environment way too long and my thought process has devolved because of it.

    I would have never bothered with trying to write dynamic SQL for something like this back in my Unix/Oracle days.

    I would have written a shell script to auto-create and auto-run the SQL for me.

    I broke out my very creaky PowerShell skills and wrote a script that use's Eirikur's code above and auto generates the aggregation section.

    As this is not going to be a repeatable process, I don't mind doing some of the work by hand.

    Here is the PowerShell code:

    $strVersion = "1.20"

    $strDivision = "Lab"

    $strAgeGroup = "Adult"

    $strFileName = $strDivision + "_" + $strAgeGroup + "_" + $strVersion + ".txt"

    $strOutputFile = "C:\YOUR\FILE\STRUCTURE\HERE\" + $strFileName

    out-file $strOutputFile -encoding ASCII

    for ($i = 1; $i -le 400; $i++) {

    $strSQL = ",MAX(CASE WHEN BD.SQV_RID = " + $i + " THEN BD.QuestionCode END) AS QuestioniCode_" + $i + "`n"

    $strSQL = $strSQL + ",MAX(CASE WHEN BD.SQV_RID = " + $i + " THEN BD.QuestionText END) AS QuestionText_" + $i + "`n"

    $strSQL = $strSQL + ",MAX(CASE WHEN BD.SQV_RID = " + $i + " THEN BD.AnswerCode END) AS AnswerCode_" + $i + "`n"

    $strSQL = $strSQL + ",MAX(CASE WHEN BD.SQV_RID = " + $i + " THEN BD.Answer END) AS Answer_" + $i + "`n`r"

    add-content $strOutputFile $strSQL

    }

    Later I will add in code to query each exam version for number of questions and meta data for file naming.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply