January 17, 2019 at 11:21 am
The data set that I have has 2 tables. One (trips) contains the start and end date in DATETIME format for bicycle trips as well as the ID code for the station that the bike belongs to, the other table (station) contains the same corresponding ID code for the stations along with the actual name of the station. The questions I need to answer, which means I will have to JOIN these two tables are:
January 17, 2019 at 12:28 pm
paige.miller91 - Thursday, January 17, 2019 11:21 AMThe data set that I have has 2 tables. One (trips) contains the start and end date in DATETIME format for bicycle trips as well as the ID code for the station that the bike belongs to, the other table (station) contains the same corresponding ID code for the stations along with the actual name of the station. The questions I need to answer, which means I will have to JOIN these two tables are:
- Station name
- Which station has the longest trips on average?
- Which station has the shortest trips on average?
- Extremely long / short trips can skew your results. How would avoid that?
- Let's call trips that start and end in the same station "round trips". Calculate the fraction of trips that were round trips and break it down by:
- Membership status
- Day of the week
Okay, what have you tried to answer these questions? We aren't just going to give you the answers as that doesn't he;p you learn.
January 17, 2019 at 12:58 pm
Haha, of course, ok so I got this so far:
CREATE VIEW stationtriplength AS
SELECT trips.start_station_code, trips.duration_sec, stations.name
FROM trips JOIN stations ON trips.start_station_code = stations.code
---to get the information I need from both tables into a view
THEN:
From here I answered which station has the longest and shortest trip durations, that part I figured out by using:
The only part I cannot get is how to do the round trips
January 17, 2019 at 1:24 pm
How do you know which of those trips had a round trip? You are only bringing the start point and how long it takes, unless I am missing something.
January 17, 2019 at 1:32 pm
astrid 69000 - Thursday, January 17, 2019 1:24 PMHow do you know which of those trips had a round trip? You are only bringing the start point and how long it takes, unless I am missing something.
I think the length of trip is duration not distance travelled.
The table has StartTime, EndTime and StationId.
So the calculation of a trip length (in time) can be done from one row.
January 17, 2019 at 1:34 pm
Ah yeah, I missed, there is an end_station_code as well,
January 17, 2019 at 1:35 pm
and yeah the curation of the trip is time, but there is a start station code and an end station code for where the bikes were picked up and where they were returned. Would that change anything I did above?
January 17, 2019 at 1:48 pm
paige.miller91 - Thursday, January 17, 2019 1:35 PMand yeah the curation of the trip is time, but there is a start station code and an end station code for where the bikes were picked up and where they were returned. Would that change anything I did above?
So if depends what "Which station has the longest trips on average?" means.
If the length of a trip from Station A to Station B is included as a statistic for Station A as well as Station B then it will make a difference.
January 17, 2019 at 1:57 pm
I also think we are missing information, are you checking trip per user? because you talk about trips, but you need an identifier per trip
the trip is from point a to b but 10 people can ride from a to b and you.
still confusing for me. can you show a sample of the data and exactly what are you trying to get?
January 17, 2019 at 2:52 pm
I'm doing a lot of guessing here; moving forward - DDL and sample data will help you get the best answer. Let's start with this:
USE tempdb
GO
IF OBJECT_ID('dbo.trip') IS NOT NULL DROP TABLE dbo.trip;
IF OBJECT_ID('dbo.station') IS NOT NULL DROP TABLE dbo.station;
CREATE TABLE dbo.station
(
stationId INT PRIMARY KEY CLUSTERED,
stationInfo VARCHAR(100)
);
CREATE TABLE dbo.trip
(
tripId INT IDENTITY PRIMARY KEY NONCLUSTERED,
starttime DATETIME,
endtime DATETIME,
stationId INT,
CONSTRAINT fk_trip__station FOREIGN KEY (stationID) REFERENCES dbo.station(stationId)
);
INSERT dbo.station (stationId, stationInfo)
VALUES (1, 'Greenville'), (2,'BooBerg'), (3,'ElseWhere');
INSERT dbo.trip (starttime, endtime, stationid)
VALUES
(GETDATE()-50.5,GETDATE()-50,1), (GETDATE()-100.775,GETDATE()-100,1),
(GETDATE()-60.85,GETDATE()-60.77,2), (GETDATE()-60.44,GETDATE()-60,2),
(GETDATE()-10.76,GETDATE()-10.7,3), (GETDATE()-11,GETDATE()-9.7,3),
(GETDATE()-11,GETDATE()-10,3), (GETDATE()-11.004,GETDATE()-11,3);
Then you could just do this:SELECT s.stationInfo, AvgMinutes = AVG(f.tm)
FROM dbo.trip AS t
CROSS APPLY (VALUES(DATEDIFF(MINUTE,t.starttime,t.endtime))) AS f(tm)
JOIN dbo.station AS s ON t.stationId = s.stationId
GROUP BY s.stationInfo;
Extremely long / short trips can skew your results. How would avoid that?
For this you could add a couple parameters to define an "Extremely long/short trip". For my example short = less than 10 min, long = more than a day. This will filter out the "short/long" trips prior to aggregation.
DECLARE @min-2 INT = 10, @max-2 INT = 1440; -- minutes; 1440 = 1 day
SELECT s.stationInfo, AvgMinutes = AVG(f.tm)
FROM dbo.trip AS t
CROSS APPLY (VALUES(DATEDIFF(MINUTE,t.starttime,t.endtime))) AS f(tm)
JOIN dbo.station AS s ON t.stationId = s.stationId
WHERE f.tm > @min-2 AND f.tm < @max-2
GROUP BY s.stationInfo;
-- Itzik Ben-Gan 2001
January 18, 2019 at 2:53 pm
Would you mind saving us the trouble of reading your mind and posting some DDL? It would also help if we had some sample data to work with.
Please post DDL and follow ANSI/ISO standards when asking for help.
January 24, 2019 at 3:58 am
Here an extended version of Alan Burstein's example.
A End station is added.
So roundtrips can be handled distinctly.
More examples are added to show the build in increments.
--
-- Example provided by Alan Burstein.
--
-- Extended
-- 20190124
-- Ben Brugman
USE tempdb
GO
IF OBJECT_ID('dbo.trip') IS NOT NULL DROP TABLE dbo.trip;
IF OBJECT_ID('dbo.station') IS NOT NULL DROP TABLE dbo.station;
CREATE TABLE dbo.station
(
stationId INT PRIMARY KEY CLUSTERED,
stationInfo VARCHAR(100)
);
CREATE TABLE dbo.trip
(
tripId INT IDENTITY PRIMARY KEY NONCLUSTERED,
starttime DATETIME,
endtime DATETIME,
stationId INT,
end_stationid INT,
CONSTRAINT fk_trip__station FOREIGN KEY (stationID) REFERENCES dbo.station(stationId)
,CONSTRAINT fk_trip__end_station FOREIGN KEY (end_stationID) REFERENCES dbo.station(stationId)
);
-- Values for 3 stations.
INSERT dbo.station (stationId, stationInfo)
VALUES (1, 'Greenville'), (2,'BooBerg'), (3,'ElseWhere');
-- Values for a number of trips.
INSERT dbo.trip (starttime, endtime, stationid, end_stationid)
VALUES
(GETDATE()-50.5,GETDATE()-50 ,1, 3),
(GETDATE()-100.775,GETDATE()-100,1, 2),
(GETDATE()-60.85,GETDATE()-60.77,2, 1),
(GETDATE()-60.44,GETDATE()-60 ,2, 3),
(GETDATE()-10.76,GETDATE()-10.7 ,3, 2),
(GETDATE()-11,GETDATE()-9.7 ,3, 1),
(GETDATE()-11,GETDATE()-10 ,3,3),
(GETDATE()-11.004,GETDATE()-11 ,3,2),
(GETDATE()-11.1,GETDATE()-11 ,3,3);
--
-- Show the data in the two tables.
--
select * from trip
select * from station
--
-- Show the timedifference in minutes for each trip.
--
select DATEDIFF(MINUTE, starttime, endtime) minutes,* from trip
--
-- Show the AVG, Min, Max for each station.
-- Remark:
-- The stationid field is used in the GROUP BY
-- Remark:
-- Trips of less then 10 minutes and of more than a day are excluded.
--
select
AVG (DATEDIFF(MINUTE, starttime, endtime)) AVG_minutes
, MIN (DATEDIFF(MINUTE, starttime, endtime)) min_minutes
, MAX (DATEDIFF(MINUTE, starttime, endtime)) max_minutes
, stationid from trip
WHERE DATEDIFF(MINUTE, starttime, endtime) > 10
AND
DATEDIFF(MINUTE, starttime, endtime) < 60*24
GROUP BY stationid
--
-- Get the 'name' of the station with the above query.
--
select
AVG (DATEDIFF(MINUTE, starttime, endtime)) AVG_minutes
, MIN (DATEDIFF(MINUTE, starttime, endtime)) min_minutes
, MAX (DATEDIFF(MINUTE, starttime, endtime)) max_minutes
, trip.stationid
, stationinfo
from trip join station on trip.stationid = station.stationid
WHERE DATEDIFF(MINUTE, starttime, endtime) > 10
AND
DATEDIFF(MINUTE, starttime, endtime) < 60*24
GROUP BY trip.stationid, stationinfo
--
-- Roundtrips. begin and startstation are the same.
--
select
AVG (DATEDIFF(MINUTE, starttime, endtime)) AVG_minutes
, MIN (DATEDIFF(MINUTE, starttime, endtime)) min_minutes
, MAX (DATEDIFF(MINUTE, starttime, endtime)) max_minutes
, trip.stationid
, stationinfo
from trip join station on trip.stationid = station.stationid
WHERE DATEDIFF(MINUTE, starttime, endtime) > 10
AND
DATEDIFF(MINUTE, starttime, endtime) < 60*24
AND
trip.stationId = trip.end_stationid
GROUP BY trip.stationid, stationinfo
--
-- Reuse the difference calculation several times.
-- Use a 'WITH' of CTE (Common Table Expression) construction for this.
-- A calculate the Delta time and resolve the stationinfo.
-- B Select the correct Delta times.
-- C Get the Average, Min and Max values of Delta.
-- (Roundtrips are still included)
-- A, B, C can be seen as temporary views. (Only during this statement).
--
;
WITH
A AS (SELECT DATEDIFF(MINUTE, starttime, endtime) Delta, stationinfo FROM trip join station on trip.stationId = station.stationId)
,B as (SELECT * FROM A WHERE Delta >= 10 AND Delta < 24*60)
,C as (SELECT AVG(Delta) AVG_Delta, MIN(Delta) Min_Delta, MAX(Delta) Max_Delta, stationinfo
FROM B GROUP BY stationinfo)
SELECT * FROM C
--
-- CleanUp.
--
IF OBJECT_ID('dbo.trip') IS NOT NULL DROP TABLE dbo.trip;
IF OBJECT_ID('dbo.station') IS NOT NULL DROP TABLE dbo.station;
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply