Working with strings

  • Hello,

    we have matrix stored like this:

    create table matrix(

    row int,

    column int,

    value float

    )

    Now we have to build a text representation of this data. Unfortunately the matrix has not 10 rows and 10 columns, but up to 100.000 columns and 10.000 rows.

    My first approach to build the string is very slow:

    Pseudocode:

    foreach row {

    foreach column {

    mystring += ', ' + valueAt(row, column)

    }

    }

    My second approach is as follows:

    1. build string for the first 100 columns and store it

    do

    step 2. build string for the 101 to 200th column and concatenate it with the intermmediate result of step 1

    until finished

    It is more set-based and probably faster, but i haven't yet testet it with the full amount of data.

    My Question is:

    - is there a better idea?

    - would ssis be faster?

    What are the costs of an operation like: Select big_text + small_text

    Tobias

  • There was an article on this page a little while back about generating running totals. The technique outlined in it should be able to do what you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • What is the primary key of the matrix? Also, not that we'll necessarily need it, but can Dynamic SQL be used?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/25/2008)


    What is the primary key of the matrix? Also, not that we'll necessarily need it, but can Dynamic SQL be used?

    The primary key is not specified. It could be the combination of row and col or a surrogate one.

    I'm not very used to dynamic sql. What would be the benefit?

  • Well... lemme shift gears here, a bit. You say you want a "text representation" of the matrix of 100,000 columns and 10,000 rows... that's 1,000,000,000 or a Billion "cells" of information. First, I don't know who's going to read all of that... second, the 100 columns you suggest won't fit on a single standard page.

    My question is... you have it all in a nice tight "array" table right now... why would you want to convert it to a "text representation" and what do you really want it to look like? Like I said, my thought is that 100 columns won't fit even on the old "green bar" style paper and it'll really be a bugger to read on screen.

    And what do you want for row and column names?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    the text representation is an exchange format for an third party tool. I question if the big picture will work in time, but first i have to validate, which time it takes to generate the file.

    Cheers,

    Tobias

  • Ah, ok... it get it.

    Still need to know though, what do you want for column names? I'm also pretty sure you'll need the row number... I guess I'm asking a bit more on what the details of the required format are. This isn't a difficult problem but I need to know the actual requirements of the output.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/29/2008)


    Ah, ok... it get it.

    Still need to know though, what do you want for column names? I'm also pretty sure you'll need the row number... I guess I'm asking a bit more on what the details of the required format are. This isn't a difficult problem but I need to know the actual requirements of the output.

    If the matrix was

    1 2 3

    4 5 6

    7 8 9

    the output would be

    {{1 2 3}

    {4 5 6}

    {7 8 9}}

  • And, you need that in a file 100,000 columns wide, right? You trolling here or ??? I ask because you asked for comma delimited in the first post and now you're asking for a different format.

    What is this application that will take a CSV that's 100,000 columns wide?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Did I? If so, I'm sorry. The task is to write the data to a file, the text format is an example and could differ.

    The width of the matrix of 100000 is only the worst case. It could also be 300 in many cases. The third party application will be mathematica first and a custom application in future.

    As I hope I mentioned, my target now is to determine if it is possible and reasonable to write that data to a file. And by this way, which time it will take.

    I would also appreciate any idea that differs from mine.

  • tobe_ha (5/5/2008)


    I would also appreciate any idea that differs from mine.

    The concatenation method show in the article at the following URL might help... I certainly haven't tried it on 100,000 columns so I don't know what the performance problems might be. Your method of concatenating smaller sets together and then concatenating those sets sounds like a fairly good idea.

    http://www.sqlservercentral.com/articles/Test+Data/61572/

    I'd consider writing the "3rd party" code to take the long, skinny EAV that you currently have, if that's a possibility. I don't know if Mathematica will take such a format, but it might be worth reasearching if you'll have a lot of almost 100,000 column matrices to pass.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply