How to get a variable resultset

  • Matt, I think your approach doesn't get the right answer. It interprets 1-2-8 and 1-2-4-5-6 as one sequence.

    Hans, to find any solution at all, let's describe a procedural algorithm for finding all the sequences. To make it interesting I'm going to use the same data but in a different order.

    4-5

    2-8

    2-4

    1-2

    7-8

    5-6

    3-4

    We start off by writing down all pairs that are the start of a sequence:

    1-2

    7-8

    3-4

    We go through the original set of data again

    4-5 can be attached to 3-4 giving us

    1-2

    7-8

    3-4-5

    2-8 can be attached to 1-2 giving us

    1-2-8

    7-8

    3-4-5

    2-4 can be attached to 1-2-8 but 2 not being at the end we need to create a new sequence giving us

    1-2-8

    7-8

    3-4-5

    1-2-4

    1-2 already exists in a sequence

    7-8 already exists in a sequence

    5-6 can be attached to 3-4-5 giving us

    1-2-8

    7-8

    3-4-5-6

    1-2-4

    3-4 already exists in a sequence

    We keep going through the data until nothing further can be done

    4-5 can be added to 1-2-4 giving us

    1-2-8

    7-8

    3-4-5-6

    1-2-4-5

    5-6 can be added to 1-2-4-5 giving us

    1-2-8

    7-8

    3-4-5-6

    1-2-4-5-6

    If we now go through the data again we are no longer able to add a pair anywhere. And we end up with the sequences that corresponds to the results produced by my union query.

    It seems to me that this should be written by the application's procedural language. It's not ideal for SQL.

  • It seems to me that this should be written by the application's procedural language. It's not ideal for SQL.

    But... I have read up a lot on recursive querying and hierarchical sets. There are a lot of solutions out there written in SQL; you yourself have written a solution in SQL. I agree though, that writing this in VB6 in this case would give you finer control on manipulating the data.

    On the other hand: I firmly believe you should do datamanipulation on the server or a data-tier, NOT on the client, if at all possible. Also, if you can use a SET for producing the data in the format as described it would be a far more efficient process. Consider SELECT vs using a CURSOR.

    Ideally you deliver the expected data to the client and fill the form with it. I know we do not live in an ideal world, but I am still not convinced using a non-SQL approach is better.

    Tnx again,

    Greetz,
    Hans Brouwer

  • Well - while i often would agree that a fair amount of data manipulation should be done server side, forcing the server to do things "it's not good at" doesn't do anyone any favors.  In this case - building variant-size linked lists is not a set-based operation, it's more iterative or recursive.

    That being said - you STILL can have the code be run server-side.  Just trying to force it to be done using only T-SQL as opposed to CLR is going to be more difficult.  Michael's last post is describing conceptually what I was more or less trying to achieve - I just didn't happen to catch all of the various subtleties (I'll confess my head kind of popped off my shoulders when I read about the 10-part union query, each segment of that having 9 subqueries in their respective WHERE clauses - just envisioning my server throwing the old "you're kidding right" error when I throw that at it, so I didn't catch the "split the sequence on branch" rule). 

    finally - it might be best to represent this in an XML structure - the beauty of that is - XML data structure have an implicit order - and in the case of your sequence question/linked lists - the order is important.  Or - a two-dimension array.  Anyway - lots of options not having some of the traditional SQL constraints which we're hitting against at this point.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt, yes you did essentially implement what I wrote with the exception of the 'sequence split'. What Hans needs to tell us if the 'vetical' orientation of the generated output is acceptable to his application, i.e. his application will take care of the pivoting (not supported in SS2K).

    But I think you will find implementing the 'sequence split' not trivial with SQL using your workingtable (it will give you a splitting headache ).

    My approach would be to pretty much literally implement what I described using string variables holding a current sequence using a character like a comma or dash for separating the values. Thus there would be:

    initial step for generating all starting sequences

    begin

      handle concatenation

      handle sequence split

      exit if no data was processed in the two steps above

    end

    This would be heavy on string manipulation(but nothing complicated) and the loop would repeat as often as the length of the longest sequence (mmmh, may not be true if there are loads of sequence splits).

    The advantage is that the final result is already pivoted. The end user application then has to call a stored procedure and query a temporary table which returns a sequence for each row.

    Hans, what do you think?

  • Michael,

    I think you summed it up pretty well. Your "UNION" script gives me the data in the format I need/want, the client just has to fill a Flexgrid with the records produced by your script. I was thinking of building the SQL statement based on the maximum sequence value. How to get that value without trial and error I do not know yet...

    I think a sequence split will not be that difficult to implement; look at this sequence:

    3-4-2-7-1

    which is stored as:

    Prev  Next

    3      4

    4      2

    2      7

    7      1

    If a user decides to break the chain between between 4 and 2 a requery will result in 2 sequences:

    3-4

    2-7-1

    which is stored as

    Prev  Next

    3      4

    2      7

    7      1

    Again, I don't see that as a problem.

    Tnx again both of you,

    Greetz,
    Hans Brouwer

  • I fully agree with you - the split sequence using SQL gets shot to hell with the pivot table.  Not irretrievable but highly messy, which was why I brought the alternate storage methods (XML/2D arrays).  I just kind of "forgot" to finish agreeing with you in the post...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hans,

    Let's go back to square 1.

    So the essential entity being managed is the job pair table where each row contains two jobs where one must occur before the other. Obviously, for any two jobs this needs to be expressed only once. Thus all entries in your job pair table must be unique. Is this true?

    What you call a sequence is a (conceptual) derived entity from this table as per the algorithm we have now established.

    In one of your previous posts, you said "The aim of the apl is to enable to view sequences, edit those sequences and create new sequences". Thus you want to manage sequences and have any changes to sequences be reflected in the job pairs table. Is this correct?

    So what is the primary entity?

    1. Is it the set of job pairs (as stored in the job pair table)

    or

    2. Is it the (derived) job sequences (as per the algorithm)?

    (I can't help thinking of: what came first, the chicken or the egg?)

    Getting back to your (real) data consisting of the job pair table

    1-2

    2-4

    2-8

    3-4

    4-5

    5-6

    7-8

    which results in (conceptual) sequences as per our algorithm

    1-2-4-5-6

    3-4-5-6

    7-8

    1-2-8

    What are the operations you want to perform on this list of (conceptual) sequences? Whatever they are you need to make the necessary changes in your job pair table to reflect these operations.

    Let's try a simple one: you want to delete sequence 1-2-4-5-6. That one is easy. You delete pair 2-4 from the job pair table. You simply try all pairs in the sequence (i.e. 1-2, 2-4, 4-5 and 5-6) and check if they are 'used' in some other sequence. 2-4 is the only one not used. So we delete 2-4. But just to be sure I regenerate and am able to confirm that you now have only the sequences

    1-2-8

    3-4-5-6

    7-8

    Do you agree that job pair 2-4 needs to be deleted to perform deleting of sequence 1-2-4-5-6?

    Can you describe any other operations, say add, split, concatenate? How about reverse?

    Or rather, where in the real world does this problem come from?

    And managing these conceptual sequences in a multi-user environment really makes me worry.

     

  • Obviously, for any two jobs this needs to be expressed only once. Thus all entries in your job pair table must be unique. Is this true?

    Thus you want to manage sequences and have any changes to sequences be reflected in the job pairs table. Is this correct?

    This is correct. Technically it's about updating the job-pair table, conceptually it is about managing sequences. But try to get that through to the designer...

    Your example is correct: when in the sequence 1-2-4-5-6 you remove pair 2-4, the resulting sequences are as you describe.

    This part of the process is not a problem: only 1 pair per action can be updated, after which a requery will display all (newly created) sequences. Realize, that this apl is part of a large system and only a SystemsManager can use this apl. The system is deployed in several orgamizations and only 1 SystemsAdmin exists in these organizations. Is it thinkable that 2 ppl could use this apl at the same time? Yes, but I am told not to worry about that...

    On the CREATION of sequences, that is done by combining jobs into pairs into the job-pair table. In your example, introducing the pair 2-4 again would give seq 1-2-4-5-6 again. A check needs to be done to prevent a circle referring: 1 job can reach another job only once: thus the sequences of 11-12-13-15 AND 12-14-15 may not exist: 12 would reach 15 in 2 ways, which is not allowed. But I can easily check this in the apl, noworries there.

    Tnx for bearing with me.

    Greetz,
    Hans Brouwer

  • No other suggestions?

    Greetz,
    Hans Brouwer

  • My suggestion is not a happy one. Basically because I seem to have run into this uncertainty of dealing with multiple splits. In any case, have a go at this which deals with the concept of handling splits. It returns the right results from your data and some data I have thrown at it.

    Let me know what you think.

    -- DEFINITIONS

    if exists (select name from sysobjects where name='clusters' and type='U') drop table clusters

    create table clusters(

      clusterid varchar(10),

      nextclusterid varchar(10))

    insert into clusters values('01','02')

    insert into clusters values('02','04')

    insert into clusters values('02','08')

    insert into clusters values('03','04')

    insert into clusters values('04','05')

    insert into clusters values('05','06')

    insert into clusters values('07','08')

    if exists (select name from sysobjects where name='sequences' and type='U') drop table sequences

    create table sequences(

      sequ varchar(4000))

    go

    if exists (select name from sysobjects where name='leftnode' and type='FN') drop function leftnode

    go

    create function leftnode(@pair varchar(4000)) returns varchar(10) as

    -- returns the first node in a sequence of nodes separated by dashes

    begin

      declare @ip as integer

      set @ip=charindex('-',@pair)

      return substring(@pair,1,@ip-1)

    end

    go

    if exists (select name from sysobjects where name='rightnode' and type='FN') drop function rightnode

    go

    create function rightnode(@seq varchar(4000)) returns varchar(10) as

    -- returns the last node in a sequence of nodes separated by dashes

    begin

      declare @ip1 as integer

      declare @ip2 as integer

      set @ip1=len(@seq)-1

      set @ip2=0

      while @ip2=0

        begin

          set @ip2=charindex('-',@seq,@ip1)

          set @ip1=@ip1-1

        end

      return substring(@seq,@ip2+1,len(@seq)-@ip2) 

    end

    go

    -- HERE WE GO

    delete from sequences

    -- add all pairs that represent the start of a sequence, i.e. left node must not exist as a right node

    insert into sequences

    select clusterid+'-'+nextclusterid

    from clusters

    where clusterid not in (select nextclusterid from clusters)

    declare @rows as integer

    set @rows=1

    -- loop until no more rows get inserted into table sequences

    while @rows>0

    begin

      set @rows=0

    -- handle attachments to the tail of a sequence

      update s

      set s.sequ=s.sequ+'-'+nextclusterid

      from sequences s,clusters c

      where c.clusterid=dbo.rightnode(s.sequ)

      set @rows=@rows+@@rowcount

      -- handle splits;

      insert into sequences

      select substring(s.sequ,1,charindex(c.clusterid,s.sequ)+len(c.clusterid)-1)+'-'+c.nextclusterid

      from sequences s,clusters c

      where charindex('-'+c.clusterid,'-'+s.sequ)>0

         and charindex(c.clusterid+'-'+c.nextclusterid,s.sequ)=0

         and c.clusterid<>dbo.rightnode(s.sequ)

         -- this makes sure a pair can be used for a split only once; is this legit?

         and c.clusterid+'-'+c.nextclusterid not in

          (

          select c.clusterid+'-'+c.nextclusterid

          from sequences s,clusters c

          where charindex(c.clusterid+'-'+c.nextclusterid,s.sequ)>0

          )

      set @rows=@rows+@@rowcount

    end

    select * from sequences

  • Hi Michael,

    It works fine, noproblem, even with 'real' data I have used. However, this will be difficult to implement, since the apl has to connect to both SQL Server and ORACLE; this means using 2 different  stored procedures, which will not be accepted for a variety of reasons. A way to find out the max length of the sequences with your previous script would be the most acceptable solution.

    Tnx for thinking up solutions, Michael. I won't ask more from you, you have done enough as it is. I will write the building of this array in VB6; that is far from wefficient, but it is what the customer would very much like to have.

    Tnx every1 for answering.

    Greetz,
    Hans Brouwer

  • Well, just counting the number of node separators should give you the sequence length. Load this function and do

    select sequ,dbo.seqlen(sequ) from sequences

    if exists (select name from sysobjects where name='seqlen' and type='FN') drop function seqlen

    go

    create function seqlen(@seq varchar(4000)) returns smallint as

    -- returns the sequence length by counting the number of dashes

    begin

      declare @count smallint

      declare @p smallint

      set @count=1

      set @p=charindex('-',@seq)

      while @p>0

      begin

        set @count=@count+1

        set @p=charindex('-',@seq,@p+1)

      end

     

      return @count

    end

  • Tnx Michael,

    But it would still mean I need your whole previous script. I'll solve it another way by creating a large base sequence-length and building the statement. Consequence is I have to restart when I find out the longest sequence is longer then my base length.

    Tnx again.

    Greetz,
    Hans Brouwer

Viewing 13 posts - 31 through 42 (of 42 total)

You must be logged in to reply to this topic. Login to reply