51 commits
95765c5
sofia and vanessa linux assignment
SofCora May 28, 2025
f11718d
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 28, 2025
b608967
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 29, 2025
076b9f2
so i accidentally made changes to the notebook but also these are my …
SofCora May 30, 2025
3699015
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 30, 2025
f07010d
multiline commands?
SofCora May 30, 2025
a96522c
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 30, 2025
fad88b6
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 30, 2025
363be84
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 30, 2025
b236a85
solutions to all but last questions
SofCora May 30, 2025
4bdfc11
ribbons
SofCora May 30, 2025
a5ee03d
added then deleted stuff no real changes
SofCora May 30, 2025
53ff747
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora May 30, 2025
351b877
finished all problems to plot trips.R assignment
SofCora Jun 2, 2025
4d04bf9
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 2, 2025
23a3eee
solutions for week 2 day 1 i can knit but i didnt do any of the r mar…
SofCora Jun 2, 2025
c0a07d9
practicing r markdown formatting
SofCora Jun 2, 2025
edf1210
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 2, 2025
3b55a9d
we did the wrong markdown exercise but ill do the rest tommorrow
SofCora Jun 2, 2025
cb6ca4a
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 3, 2025
bebaf04
solutions for week 2 day 2
SofCora Jun 4, 2025
e78dd0c
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 4, 2025
5d4ec4a
beginning of day 6/5/25
SofCora Jun 5, 2025
ccdd672
6/5/25
SofCora Jun 5, 2025
a106cab
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 5, 2025
33fff92
solutions
SofCora Jun 6, 2025
8d550f8
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 6, 2025
d32dc74
more solutions
SofCora Jun 6, 2025
97e2dff
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 9, 2025
9198d54
message
SofCora Jun 10, 2025
cce5076
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 10, 2025
7e0d98a
finishing week 2 day 5
SofCora Jun 10, 2025
3ebed5e
week 3 monday work
SofCora Jun 10, 2025
843f924
my solutions for up to week 3 day 2
SofCora Jun 11, 2025
16ee432
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 11, 2025
7731163
downloaded data for movie paper
SofCora Jun 12, 2025
0eac5fb
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 12, 2025
fff842e
movie lense
SofCora Jun 12, 2025
83d74cd
solutions to movielens.Rmd 6/13/25
SofCora Jun 13, 2025
d75718c
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 13, 2025
8f87681
fixed scale for last plot
SofCora Jun 13, 2025
49b414b
solutions from friday up to but not including the inlet graph
SofCora Jun 16, 2025
c76e0be
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 16, 2025
7f5eb1c
screenshot
SofCora Jun 16, 2025
9f2402f
best model for 6/16 citibike its not great lol
SofCora Jun 16, 2025
8d643ca
best model i got
SofCora Jun 17, 2025
0fad8ea
Merge branch 'master' of https://github.com/msr-ds3/coursework
SofCora Jun 17, 2025
a27e552
tested with 10% split 2014 data 3401.735 rmse
SofCora Jun 17, 2025
c2faf24
test
SofCora Jun 17, 2025
e1b42ff
rmse is unreasonably bad whatever lol
SofCora Jun 17, 2025
faa8a1b
changed weather scale
SofCora Jun 17, 2025
39 changes: 28 additions & 11 deletions week1/citibike.R
@@ -19,30 +19,47 @@ trips <- mutate(trips, gender = factor(gender, levels=c(0,1,2), labels = c("Unkn


########################################
# YOUR SOLUTIONS BELOW
########################################

# count the number of trips (= rows in the data frame)
# answer: 224736
summarize(trips, count = n())

# find the earliest and latest birth years (see help for max and min to deal with NAs)
# answer: earliest 1899, latest 1997
# convert birth_year to numeric first (non-numeric entries become NA), then drop NAs
summarize(trips,
          earliest = min(as.numeric(birth_year), na.rm = TRUE),
          latest = max(as.numeric(birth_year), na.rm = TRUE))

# use filter and grepl to find all trips that either start or end on broadway
filter(trips, grepl('Broadway', start_station_name) | grepl('Broadway', end_station_name))

# do the same, but find all trips that both start and end on broadway
filter(trips, grepl('Broadway', start_station_name), grepl('Broadway', end_station_name))

# find all unique station names
# distinct(start_station_name) alone would miss stations that only appear as a destination
union(trips$start_station_name, trips$end_station_name)

# count the number of trips by gender, the average trip time by gender, and the standard deviation in trip time by gender
# do this all at once, by using summarize() with multiple arguments
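# reconstructed from the output below; assumes the duration column is named tripduration
trips |>
  group_by(gender) |>
  summarize(count = n(),
            avg_trip_time = mean(tripduration),
            sd_trip_time = sd(tripduration))
#  gender   count avg_trip_time sd_trip_time
#  <fct>    <int>         <dbl>        <dbl>
#1 Unknown   6731         1741.        5566.
#2 Male    176526          814.        5021.
#3 Female   41479          991.        7115.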

# find the 10 most frequent station-to-station trips
# arrange() must be closed before piping into head(), otherwise head() is applied to desc(count)
View(trips |>
  group_by(start_station_name, end_station_name) |>
  summarize(count = n(), .groups = "drop") |>
  arrange(desc(count)) |>
  head(n = 10))

# find the top 3 end stations for trips starting from each start station
view(trips |>
  group_by(start_station_name, end_station_name) |>
  summarize(count = n()) |>
  group_by(start_station_name) |>
  arrange(desc(count)) |>
  mutate(rank = row_number()) |>
  filter(rank <= 3))

# find the top 3 most common station-to-station trips by gender
view(trips |>
  group_by(start_station_name, end_station_name, gender) |>
  summarize(count = n()) |>
  arrange(desc(count)) |>
  group_by(gender) |>
  mutate(rank = row_number()) |>
  filter(rank <= 3) |>
  arrange(gender))

# find the day with the most trips
# tip: first add a column for year/month/day without time of day (use as.Date or floor_date from the lubridate package)
# starttime is in year-month-day order, so the "%m/%d/%y" format does not apply; as.Date() alone drops the time of day
trips_date <- trips |> mutate(date = as.Date(starttime))
view(trips_date |> group_by(date) |> summarize(count = n()) |> arrange(desc(count)) |> head(n = 1))


# compute the average number of trips taken during each of the 24 hours of the day across the entire month
trips_hours <- trips |> mutate(hour = hour(starttime))
# February 2014 has 28 days; days_in_month() from lubridate would generalize this to other months
view(trips_hours |> group_by(hour) |> summarize(count = n(), mean = count / 28))
# what time(s) of day tend to be peak hour(s)?
# trips_hours (with the hour column) was created above
view(trips_hours |> group_by(hour) |> summarize(count = n()) |> arrange(desc(count)))

196 changes: 188 additions & 8 deletions week1/citibike.sh
@@ -3,21 +3,201 @@
# add your solution after each of the 10 comments below
#

# count the number of unique stations
# answer: 330 (column 9 is the end station name, and the header row is included in the count)
cut -d, -f9 201402-citibike-tripdata.csv | sort | uniq -c | wc -l

# count the number of unique bikes
# answer: 5700 (the header row is included in the count)
cut -d, -f12 201402-citibike-tripdata.csv | sort | uniq -c | wc -l
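Review note: both unique counts above include the CSV header line. A minimal sketch of one way to exclude it, assuming a hypothetical two-column miniature of the trip file (the file contents and the `unique_bikes` variable are illustrative, not part of the assignment):

```shell
# Hypothetical miniature of the trip file: a header plus four rides.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tripduration,bikeid
300,17001
450,17002
120,17001
900,17003
EOF

# tail -n +2 drops the header; sort -u keeps one line per distinct bike id.
unique_bikes=$(tail -n +2 "$sample" | cut -d, -f2 | sort -u | wc -l | tr -d ' ')
echo "unique bikes: $unique_bikes"
rm -f "$sample"
```

`sort -u` is equivalent to `sort | uniq` here; `uniq -c` is only needed when the per-value counts matter.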

# count the number of trips per day
cut -d, -f2 201402-citibike-tripdata.csv | cut -d' ' -f1 | sort | uniq -c
# 12771 2014-02-01
# 13816 2014-02-02
# 2600 2014-02-03
# 8709 2014-02-04
# 2746 2014-02-05
# 7196 2014-02-06
# 8495 2014-02-07
# 5986 2014-02-08
# 4996 2014-02-09
# 6846 2014-02-10
# 8343 2014-02-11
# 8580 2014-02-12
# 876 2014-02-13
# 3609 2014-02-14
# 2261 2014-02-15
# 3003 2014-02-16
# 4854 2014-02-17
# 5140 2014-02-18
# 8506 2014-02-19
# 11792 2014-02-20
# 8680 2014-02-21
# 13044 2014-02-22
# 13324 2014-02-23
# 12922 2014-02-24
# 12830 2014-02-25
# 11188 2014-02-26
# 12036 2014-02-27
# 9587 2014-02-28
# 1 starttime

# find the day with the most rides
cut -d, -f2 201402-citibike-tripdata.csv | cut -d' ' -f1 | sort | uniq -c | sort -nr | head -n1
# 13816 2014-02-02

# find the day with the fewest rides
# sort -n orders the counts numerically; the header row sorts first, so take two lines
cut -d, -f2 201402-citibike-tripdata.csv | cut -d' ' -f1 | sort | uniq -c | sort -n | head -n2
# 1 starttime
# 876 2014-02-13
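Review note: the per-day questions all reuse the same `uniq -c` table, and the header line shows up in it as a spurious "day". A small sketch, assuming a hypothetical six-ride sample file, that drops the header once and then sorts the counts numerically in both directions (`per_day`, `busiest`, and `quietest` are illustrative names):

```shell
# Hypothetical sample: three rides on Feb 1, one on Feb 2, two on Feb 3.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tripduration,starttime
300,2014-02-01 00:00:00
450,2014-02-01 08:15:00
120,2014-02-01 09:30:00
900,2014-02-02 10:00:00
200,2014-02-03 07:45:00
610,2014-02-03 18:20:00
EOF

# Skip the header, keep only the date part of starttime, and count rides per day.
per_day=$(tail -n +2 "$sample" | cut -d, -f2 | cut -d' ' -f1 | sort | uniq -c)

# -n compares the leading counts as numbers; -r reverses the order.
busiest=$(echo "$per_day" | sort -nr | head -n1 | awk '{print $2}')
quietest=$(echo "$per_day" | sort -n | head -n1 | awk '{print $2}')
echo "busiest: $busiest, quietest: $quietest"
rm -f "$sample"
```

With the header removed up front, `head -n1` is enough for the fewest-rides case and no manual skipping of the `starttime` row is needed.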

# find the id of the bike with the most rides
# sort -nr orders the counts numerically, largest first
cut -d, -f12 201402-citibike-tripdata.csv | sort | uniq -c | sort -nr | head -n1
# 130 20837

# count the number of rides by gender and birth year
cut -d, -f14,15 201402-citibike-tripdata.csv | sort | uniq -c
# 6717 \N,0
# 9 1899,1
# 68 1900,1
# 11 1901,1
# 5 1907,1
# 4 1910,1
# 1 1913,1
# 3 1917,1
# 1 1921,1
# 32 1922,1
# 5 1926,2
# 2 1927,1
# 1 1932,1
# 7 1932,2
# 10 1933,1
# 21 1934,1
# 14 1935,1
# 31 1936,1
# 24 1937,1
# 70 1938,1
# 5 1938,2
# 24 1939,1
# 19 1939,2
# 83 1940,1
# 1 1940,2
# 148 1941,1
# 16 1941,2
# 173 1942,1
# 9 1942,2
# 108 1943,1
# 22 1943,2
# 277 1944,1
# 34 1944,2
# 171 1945,1
# 43 1945,2
# 424 1946,1
# 30 1946,2
# 391 1947,1
# 60 1947,2
# 664 1948,1
# 143 1948,2
# 624 1949,1
# 101 1949,2
# 738 1950,1
# 152 1950,2
# 6 1951,0
# 1006 1951,1
# 146 1951,2
# 1040 1952,1
# 143 1952,2
# 1474 1953,1
# 301 1953,2
# 1636 1954,1
# 306 1954,2
# 1568 1955,1
# 349 1955,2
# 1777 1956,1
# 542 1956,2
# 1676 1957,1
# 562 1957,2
# 2333 1958,1
# 643 1958,2
# 2281 1959,1
# 539 1959,2
# 2679 1960,1
# 776 1960,2
# 2315 1961,1
# 432 1961,2
# 2808 1962,1
# 833 1962,2
# 3514 1963,1
# 715 1963,2
# 3679 1964,1
# 570 1964,2
# 2957 1965,1
# 687 1965,2
# 3440 1966,1
# 565 1966,2
# 4016 1967,1
# 634 1967,2
# 3931 1968,1
# 545 1968,2
# 4557 1969,1
# 898 1969,2
# 4657 1970,1
# 1079 1970,2
# 4132 1971,1
# 791 1971,2
# 4066 1972,1
# 962 1972,2
# 4097 1973,1
# 877 1973,2
# 4957 1974,1
# 891 1974,2
# 4185 1975,1
# 699 1975,2
# 4557 1976,1
# 1022 1976,2
# 4817 1977,1
# 1140 1977,2
# 5645 1978,1
# 1231 1978,2
# 6433 1979,1
# 1338 1979,2
# 6173 1980,1
# 1488 1980,2
# 6620 1981,1
# 1588 1981,2
# 6244 1982,1
# 1724 1982,2
# 6890 1983,1
# 1889 1983,2
# 7348 1984,1
# 1791 1984,2
# 7043 1985,1
# 2262 1985,2
# 6147 1986,1
# 1962 1986,2
# 5776 1987,1
# 1696 1987,2
# 6449 1988,1
# 1599 1988,2
# 5408 1989,1
# 1435 1989,2
# 4541 1990,1
# 1156 1990,2
# 8 1991,0
# 2377 1991,1
# 689 1991,2
# 1758 1992,1
# 410 1992,2
# 1398 1993,1
# 289 1993,2
# 927 1994,1
# 288 1994,2
# 664 1995,1
# 163 1995,2
# 234 1996,1
# 100 1996,2
# 164 1997,1
# 87 1997,2
# 1 birth year,gender

# count the number of trips that start on cross streets that both contain numbers (e.g., "1 Ave & E 15 St", "E 39 St & 2 Ave", ...)
cut -d, -f5 201402-citibike-tripdata.csv | grep '.*[0-9].*&.*[0-9]' | wc -l
# 90549
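Review note: the pattern requires a digit somewhere before the `&` and another somewhere after it. A quick sanity check on a handful of station names, as a sketch (the list below is a hypothetical sample, and `numbered` is an illustrative variable name):

```shell
# Hypothetical sample of start station names.
stations=$(mktemp)
cat > "$stations" <<'EOF'
1 Ave & E 15 St
E 39 St & 2 Ave
Broadway & W 58 St
W 4 St & 7 Ave S
Central Park S & 6 Ave
EOF

# grep -c counts matching lines directly, so no wc -l is needed.
# The leading and trailing .* in the original pattern are optional for grep.
numbered=$(grep -c '[0-9].*&.*[0-9]' "$stations")
echo "cross streets with numbers on both sides: $numbered"
rm -f "$stations"
```

Only the first, second, and fourth names match; the two names with a digit on just one side of the `&` are excluded.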


# compute the average trip duration
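# -F, splits fields on commas and NR > 1 skips the header row
awk -F, 'NR > 1 {sum += $1; n++} END {print sum / n}' 201402-citibike-tripdata.csv
# 874.516 (reported from the original command, which divided by NR and so counted the header row)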
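Review note: without `-F,` awk splits on whitespace, so `$1` is only numeric by accident, and dividing by `NR` counts the header as a trip. A sketch of the difference on a hypothetical three-ride sample (`naive` and `avg` are illustrative names):

```shell
# Hypothetical sample: three rides lasting 300, 450 and 150 seconds.
sample=$(mktemp)
cat > "$sample" <<'EOF'
tripduration,starttime
300,2014-02-01 00:00:00
450,2014-02-01 08:15:00
150,2014-02-02 09:30:00
EOF

# Original form: the header contributes 0 to the sum but 1 to NR.
naive=$(awk '{SUM+=$1} END {print SUM/NR}' "$sample")

# Comma-separated fields, header skipped, and a guard against an empty file.
avg=$(awk -F, 'NR > 1 { sum += $1; n++ } END { if (n > 0) print sum / n }' "$sample")
echo "naive: $naive, header-aware: $avg"
rm -f "$sample"
```

On the full month of data the extra row in the denominator shifts the average only slightly, but the header-aware form gives the mean over actual trips.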