September 6, 2007 at 5:19 am
Matt, I think your approach doesn't get the right answer. It interprets 1-2-8 and 1-2-4-5-6 as one sequence.
Hans, to find any solution at all, let's describe a procedural algorithm for finding all the sequences. To make it interesting I'm going to use the same data but in a different order.
4-5
2-8
2-4
1-2
7-8
5-6
3-4
We start off by writing down all pairs that are the start of a sequence:
1-2
7-8
3-4
We go through the original set of data again
4-5 can be attached to 3-4 giving us
1-2
7-8
3-4-5
2-8 can be attached to 1-2 giving us
1-2-8
7-8
3-4-5
2-4 can be attached to 1-2-8 but 2 not being at the end we need to create a new sequence giving us
1-2-8
7-8
3-4-5
1-2-4
1-2 already exists in a sequence
7-8 already exists in a sequence
5-6 can be attached to 3-4-5 giving us
1-2-8
7-8
3-4-5-6
1-2-4
3-4 already exists in a sequence
We keep going through the data until nothing further can be done
4-5 can be added to 1-2-4 giving us
1-2-8
7-8
3-4-5-6
1-2-4-5
5-6 can be added to 1-2-4-5 giving us
1-2-8
7-8
3-4-5-6
1-2-4-5-6
If we now go through the data again we are no longer able to add a pair anywhere. And we end up with the sequences that corresponds to the results produced by my union query.
It seems to me that this should be written by the application's procedural language. It's not ideal for SQL.
September 6, 2007 at 6:28 am
It seems to me that this should be written by the application's procedural language. It's not ideal for SQL.
But... I have read up a lot on recursive querying and hierarchical sets. There are a lot of solutions out there written in SQL; you yourself have written a solution in SQL. I agree though, that writing this in VB6 in this case would give you finer control on manipulating the data.
On the other hand: I firmly believe you should do datamanipulation on the server or a data-tier, NOT on the client, if at all possible. Also, if you can use a SET for producing the data in the format as described it would be a far more efficient process. Consider SELECT vs using a CURSOR.
Ideally you deliver the expected data to the client and fill the form with it. I know we do not live in an ideal world, but I am still not convinced using a non-SQL approach is better.
Tnx again,
Greetz,
Hans Brouwer
September 6, 2007 at 6:51 am
Well - while i often would agree that a fair amount of data manipulation should be done server side, forcing the server to do things "it's not good at" doesn't do anyone any favors. In this case - building variant-size linked lists is not a set-based operation, it's more iterative or recursive.
That being said - you STILL can have the code be run server-side. Just trying to force it to be done using only T-SQL as opposed to CLR is going to be more difficult. Michael's last post is describing conceptually what I was more or less trying to achieve - I just didn't happen to catch all of the various subtleties (I'll confess my head kind of popped off my shoulders when I read about the 10-part union query, each segment of that having 9 subqueries in their respective WHERE clauses - just envisioning my server throwing the old "you're kidding right" error when I throw that at it, so I didn't catch the "split the sequence on branch" rule).
finally - it might be best to represent this in an XML structure - the beauty of that is - XML data structure have an implicit order - and in the case of your sequence question/linked lists - the order is important. Or - a two-dimension array. Anyway - lots of options not having some of the traditional SQL constraints which we're hitting against at this point.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 6, 2007 at 7:40 am
Matt, yes you did essentially implement what I wrote with the exception of the 'sequence split'. What Hans needs to tell us if the 'vetical' orientation of the generated output is acceptable to his application, i.e. his application will take care of the pivoting (not supported in SS2K).
But I think you will find implementing the 'sequence split' not trivial with SQL using your workingtable (it will give you a splitting headache ).
My approach would be to pretty much literally implement what I described using string variables holding a current sequence using a character like a comma or dash for separating the values. Thus there would be:
initial step for generating all starting sequences
begin
handle concatenation
handle sequence split
exit if no data was processed in the two steps above
end
This would be heavy on string manipulation(but nothing complicated) and the loop would repeat as often as the length of the longest sequence (mmmh, may not be true if there are loads of sequence splits).
The advantage is that the final result is already pivoted. The end user application then has to call a stored procedure and query a temporary table which returns a sequence for each row.
Hans, what do you think?
September 6, 2007 at 8:37 am
Michael,
I think you summed it up pretty well. Your "UNION" script gives me the data in the format I need/want, the client just has to fill a Flexgrid with the records produced by your script. I was thinking of building the SQL statement based on the maximum sequence value. How to get that value without trial and error I do not know yet...
I think a sequence split will not be that difficult to implement; look at this sequence:
3-4-2-7-1
which is stored as:
Prev Next
3 4
4 2
2 7
7 1
If a user decides to break the chain between between 4 and 2 a requery will result in 2 sequences:
3-4
2-7-1
which is stored as
Prev Next
3 4
2 7
7 1
Again, I don't see that as a problem.
Tnx again both of you,
Greetz,
Hans Brouwer
September 6, 2007 at 9:28 am
I fully agree with you - the split sequence using SQL gets shot to hell with the pivot table. Not irretrievable but highly messy, which was why I brought the alternate storage methods (XML/2D arrays). I just kind of "forgot" to finish agreeing with you in the post...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 6, 2007 at 2:11 pm
Hans,
Let's go back to square 1.
So the essential entity being managed is the job pair table where each row contains two jobs where one must occur before the other. Obviously, for any two jobs this needs to be expressed only once. Thus all entries in your job pair table must be unique. Is this true?
What you call a sequence is a (conceptual) derived entity from this table as per the algorithm we have now established.
In one of your previous posts, you said "The aim of the apl is to enable to view sequences, edit those sequences and create new sequences". Thus you want to manage sequences and have any changes to sequences be reflected in the job pairs table. Is this correct?
So what is the primary entity?
1. Is it the set of job pairs (as stored in the job pair table)
or
2. Is it the (derived) job sequences (as per the algorithm)?
(I can't help thinking of: what came first, the chicken or the egg?)
Getting back to your (real) data consisting of the job pair table
1-2
2-4
2-8
3-4
4-5
5-6
7-8
which results in (conceptual) sequences as per our algorithm
1-2-4-5-6
3-4-5-6
7-8
1-2-8
What are the operations you want to perform on this list of (conceptual) sequences? Whatever they are you need to make the necessary changes in your job pair table to reflect these operations.
Let's try a simple one: you want to delete sequence 1-2-4-5-6. That one is easy. You delete pair 2-4 from the job pair table. You simply try all pairs in the sequence (i.e. 1-2, 2-4, 4-5 and 5-6) and check if they are 'used' in some other sequence. 2-4 is the only one not used. So we delete 2-4. But just to be sure I regenerate and am able to confirm that you now have only the sequences
1-2-8
3-4-5-6
7-8
Do you agree that job pair 2-4 needs to be deleted to perform deleting of sequence 1-2-4-5-6?
Can you describe any other operations, say add, split, concatenate? How about reverse?
Or rather, where in the real world does this problem come from?
And managing these conceptual sequences in a multi-user environment really makes me worry.
September 7, 2007 at 3:51 am
Obviously, for any two jobs this needs to be expressed only once. Thus all entries in your job pair table must be unique. Is this true?
Thus you want to manage sequences and have any changes to sequences be reflected in the job pairs table. Is this correct?
This is correct. Technically it's about updating the job-pair table, conceptually it is about managing sequences. But try to get that through to the designer...
Your example is correct: when in the sequence 1-2-4-5-6 you remove pair 2-4, the resulting sequences are as you describe.
This part of the process is not a problem: only 1 pair per action can be updated, after which a requery will display all (newly created) sequences. Realize, that this apl is part of a large system and only a SystemsManager can use this apl. The system is deployed in several orgamizations and only 1 SystemsAdmin exists in these organizations. Is it thinkable that 2 ppl could use this apl at the same time? Yes, but I am told not to worry about that...
On the CREATION of sequences, that is done by combining jobs into pairs into the job-pair table. In your example, introducing the pair 2-4 again would give seq 1-2-4-5-6 again. A check needs to be done to prevent a circle referring: 1 job can reach another job only once: thus the sequences of 11-12-13-15 AND 12-14-15 may not exist: 12 would reach 15 in 2 ways, which is not allowed. But I can easily check this in the apl, noworries there.
Tnx for bearing with me.
Greetz,
Hans Brouwer
September 13, 2007 at 6:07 am
No other suggestions?
Greetz,
Hans Brouwer
September 13, 2007 at 1:48 pm
My suggestion is not a happy one. Basically because I seem to have run into this uncertainty of dealing with multiple splits. In any case, have a go at this which deals with the concept of handling splits. It returns the right results from your data and some data I have thrown at it.
Let me know what you think.
-- DEFINITIONS
if exists (select name from sysobjects where name='clusters' and type='U') drop table clusters
create table clusters(
clusterid varchar(10),
nextclusterid varchar(10))
insert into clusters values('01','02')
insert into clusters values('02','04')
insert into clusters values('02','08')
insert into clusters values('03','04')
insert into clusters values('04','05')
insert into clusters values('05','06')
insert into clusters values('07','08')
if exists (select name from sysobjects where name='sequences' and type='U') drop table sequences
create table sequences(
sequ varchar(4000))
go
if exists (select name from sysobjects where name='leftnode' and type='FN') drop function leftnode
go
create function leftnode(@pair varchar(4000)) returns varchar(10) as
-- returns the first node in a sequence of nodes separated by dashes
begin
declare @ip as integer
set @ip=charindex('-',@pair)
return substring(@pair,1,@ip-1)
end
go
if exists (select name from sysobjects where name='rightnode' and type='FN') drop function rightnode
go
create function rightnode(@seq varchar(4000)) returns varchar(10) as
-- returns the last node in a sequence of nodes separated by dashes
begin
declare @ip1 as integer
declare @ip2 as integer
set @ip1=len(@seq)-1
set @ip2=0
while @ip2=0
begin
set @ip2=charindex('-',@seq,@ip1)
set @ip1=@ip1-1
end
return substring(@seq,@ip2+1,len(@seq)-@ip2)
end
go
-- HERE WE GO
delete from sequences
-- add all pairs that represent the start of a sequence, i.e. left node must not exist as a right node
insert into sequences
select clusterid+'-'+nextclusterid
from clusters
where clusterid not in (select nextclusterid from clusters)
declare @rows as integer
set @rows=1
-- loop until no more rows get inserted into table sequences
while @rows>0
begin
set @rows=0
-- handle attachments to the tail of a sequence
update s
set s.sequ=s.sequ+'-'+nextclusterid
from sequences s,clusters c
where c.clusterid=dbo.rightnode(s.sequ)
set @rows=@rows+@@rowcount
-- handle splits;
insert into sequences
select substring(s.sequ,1,charindex(c.clusterid,s.sequ)+len(c.clusterid)-1)+'-'+c.nextclusterid
from sequences s,clusters c
where charindex('-'+c.clusterid,'-'+s.sequ)>0
and charindex(c.clusterid+'-'+c.nextclusterid,s.sequ)=0
and c.clusterid<>dbo.rightnode(s.sequ)
-- this makes sure a pair can be used for a split only once; is this legit?
and c.clusterid+'-'+c.nextclusterid not in
(
select c.clusterid+'-'+c.nextclusterid
from sequences s,clusters c
where charindex(c.clusterid+'-'+c.nextclusterid,s.sequ)>0
)
set @rows=@rows+@@rowcount
end
select * from sequences
September 14, 2007 at 4:02 am
Hi Michael,
It works fine, noproblem, even with 'real' data I have used. However, this will be difficult to implement, since the apl has to connect to both SQL Server and ORACLE; this means using 2 different stored procedures, which will not be accepted for a variety of reasons. A way to find out the max length of the sequences with your previous script would be the most acceptable solution.
Tnx for thinking up solutions, Michael. I won't ask more from you, you have done enough as it is. I will write the building of this array in VB6; that is far from wefficient, but it is what the customer would very much like to have.
Tnx every1 for answering.
Greetz,
Hans Brouwer
September 14, 2007 at 6:20 am
Well, just counting the number of node separators should give you the sequence length. Load this function and do
select sequ,dbo.seqlen(sequ) from sequences
if exists (select name from sysobjects where name='seqlen' and type='FN') drop function seqlen
go
create function seqlen(@seq varchar(4000)) returns smallint as
-- returns the sequence length by counting the number of dashes
begin
declare @count smallint
declare @p smallint
set @count=1
set @p=charindex('-',@seq)
while @p>0
begin
set @count=@count+1
set @p=charindex('-',@seq,@p+1)
end
return @count
end
September 19, 2007 at 7:30 am
Tnx Michael,
But it would still mean I need your whole previous script. I'll solve it another way by creating a large base sequence-length and building the statement. Consequence is I have to restart when I find out the longest sequence is longer then my base length.
Tnx again.
Greetz,
Hans Brouwer
Viewing 13 posts - 31 through 42 (of 42 total)
You must be logged in to reply to this topic. Login to reply