Correlations in the Measles Dataset vis-à-vis Practice assignment. #4
Hello everyone. I am starting this Q&A to get constructive criticism or input on my R solution to the measles practice assignment, and to offer a solution to its bonus question and check that it is correct. I have commented the code and will give a basic rundown of what it is doing and how it works. My intent while writing the solution was to avoid loops and take advantage of R's built-in vectorized operations and functions.

The code:

# Store the data frame in the variable measles
measles <- read.csv("https://deepayan.github.io/BSDS/2025-01-DARP/slides/data/measles.csv")
# Apply a log transform to the values in the rate column
measles$rate <- log(measles$rate)
# Get the unique state names and store them in the vector states
states <- unique(measles$state)
# Get all unique pairs of states and store them as a two-column matrix in UniqStatePairs
UniqStatePairs <- t(combn(states, 2))
# For any given pair of states, ComputeCorr finds the correlation between their rates.
ComputeCorr <- function(pair) {
  state1 <- pair[1]
  state2 <- pair[2]
  state1df <- subset(measles, state == state1)
  state2df <- subset(measles, state == state2)
  mergedf <- merge(state1df, state2df, by = "year", suffixes = c("_A", "_B"))
  cor(mergedf$rate_A, mergedf$rate_B, use = "complete.obs")
}
# Apply the function ComputeCorr to all unique pairs of states
Corr_rates <- apply(UniqStatePairs, 1, ComputeCorr)
UniqStatePairs <- as.data.frame(UniqStatePairs) # Convert the matrix to data frame
colnames(UniqStatePairs) <- c("State1", "State2") # Rename the columns
UniqStatePairs$Correlation <- Corr_rates # Add correlation to each row
# Find the rows with minimum and maximum correlation
max_corr_row <- UniqStatePairs[which.max(UniqStatePairs$Correlation), ]
min_corr_row <- UniqStatePairs[which.min(UniqStatePairs$Correlation), ]
print(max_corr_row)
print(min_corr_row)

A couple of points where some explanation is required:

The [...]

We apply the function [...]

Solution to the Bonus Question

Before thinking of the solution, I first wanted to understand what problem the question poses. Supposing that we have removed all rows containing [...]

Assuming my interpretation is correct, we do not need to modify the given code at all, and in fact it returns the same output for the measles dataframe with [...]

I would deeply appreciate any corrections, optimisations or simplifications to this approach, and it would be equally wonderful to see alternative approaches. More than anything, I would like to know whether my solution to the bonus question is correct.
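One way to check the claim that the code needs no modification is to run the same merge-and-correlate logic on a small toy data frame, once with a missing rate left in and once with that row removed. This is only a sketch under stated assumptions: the data frame `toy`, its values, and the helper `pair_corr` are hypothetical, and it assumes the rows being removed are those with a missing rate.

```r
# Toy data (hypothetical values): two states over four years, one missing rate.
toy <- data.frame(
  state = rep(c("A", "B"), each = 4),
  year  = rep(2001:2004, times = 2),
  rate  = c(1.0, 2.0, 3.0, 5.0,
            2.0, NA, 5.0, 9.0)
)

# Same logic as ComputeCorr, but taking the data frame as an argument
# (pair_corr is a hypothetical helper, not part of the original solution).
pair_corr <- function(df, s1, s2) {
  d1 <- df[df$state == s1, ]
  d2 <- df[df$state == s2, ]
  # merge() keeps only the years present in both subsets (an inner join)
  m <- merge(d1, d2, by = "year", suffixes = c("_A", "_B"))
  # complete.obs drops any remaining year where either rate is NA
  cor(m$rate_A, m$rate_B, use = "complete.obs")
}

with_na    <- pair_corr(toy, "A", "B")
without_na <- pair_corr(toy[!is.na(toy$rate), ], "A", "B")

print(c(with_na = with_na, without_na = without_na))
```

Both calls end up correlating the same three years, because `merge()` discards years that are absent from either state and `use = "complete.obs"` discards years where a rate is `NA`.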
Replies: 1 comment 1 reply
@deepayan Professor, does this satisfactorily answer the bonus question?
Yes, this is fine. I would suggest using expand.grid() instead of combn(), just to avoid any mistakes with the ordering of the combinations.

An alternative approach is the following, which takes advantage of the fact that cor() works on a data.frame or matrix, giving the pairwise correlation matrix.
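A minimal sketch of how such an approach could look, which is an assumption on my part and not necessarily the code the reply had in mind: reshape the long data into a year-by-state matrix, then call `cor()` once on the whole matrix. The toy data frame and all variable names below are hypothetical, and the sketch assumes at most one rate per state and year.

```r
# Toy long-format data (hypothetical values) with the same columns as measles.
toy <- data.frame(
  state = rep(c("A", "B", "C"), each = 4),
  year  = rep(2001:2004, times = 3),
  rate  = c(1, 2, 3, 5,
            2, 4, 5, 9,
            9, 7, 4, 1)
)

# One row per year, one column per state. tapply() leaves NA where a
# state-year combination is absent; mean() is a no-op with one value per cell.
wide <- tapply(toy$rate, list(toy$year, toy$state), mean)

# Pairwise correlation matrix; each pair uses the years where both are observed.
cmat <- cor(wide, use = "pairwise.complete.obs")

# Keep each unordered pair once by blanking the diagonal and lower triangle.
cmat[lower.tri(cmat, diag = TRUE)] <- NA

# Locate the largest and smallest remaining correlations.
max_idx <- which(cmat == max(cmat, na.rm = TRUE), arr.ind = TRUE)[1, ]
min_idx <- which(cmat == min(cmat, na.rm = TRUE), arr.ind = TRUE)[1, ]

max_pair <- c(rownames(cmat)[max_idx["row"]], colnames(cmat)[max_idx["col"]])
min_pair <- c(rownames(cmat)[min_idx["row"]], colnames(cmat)[min_idx["col"]])
print(max_pair)
print(min_pair)
```

Because the row and column names of the correlation matrix carry the state names, the pair labels are read off directly and no separate list of combinations has to be kept in step with the results.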