Fall2024CS584 · dsasidharannair · Oct 11, 2024 · Oct 11, 2024
diff --git a/.DS_Store b/.DS_Store
diff --git a/.idea/.gitignore b/.idea/.gitignore
diff --git a/.idea/Project1.iml b/.idea/Project1.iml
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
diff --git a/.idea/misc.xml b/.idea/misc.xml
diff --git a/.idea/modules.xml b/.idea/modules.xml
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
diff --git a/README.md b/README.md
@@ -1,8 +1,94 @@
-# Project 1 
+# ElasticNetModel - Linear Regression with ElasticNet Regularization
 
-Put your README here. Answer the following questions.
+## Team Members
 
-* What does the model you have implemented do and when should it be used?
-* How did you test your model to determine if it is working reasonably correctly?
-* What parameters have you exposed to users of your implementation in order to tune performance? (Also perhaps provide some basic usage examples.)
-* Are there specific inputs that your implementation has trouble with? Given more time, could you work around these or is it fundamental?
+- **Darshan Sasidharan Nair** : dsasidharannair@hawk.iit.edu
+- **Ishaan Goel** : igoel@hawk.iit.edu
+- **Ayesha** : asaif@hawk.iit.edu
+- **Ramya** : rarumugam@hawk.iit.edu
+
+## 1. Overview
+
+The `ElasticNetModel` class implements **Linear Regression** with **ElasticNet Regularization**, which is a combination of **L1 (Lasso)** and **L2 (Ridge)** regularization techniques. This model is used for:
+
+- **Linear regression tasks** where you want to predict a continuous target variable.
+- **Feature selection** when you have a large set of features and want to eliminate irrelevant ones (due to L1 regularization).
+- **Dealing with multicollinearity** in the dataset, as L2 regularization stabilizes the model and reduces variance.
+
+### When to Use:
+
+- ElasticNet should be used when the data has high-dimensional features and some of the features are expected to be irrelevant.
+- It’s useful when there is **multicollinearity** (i.e., when predictors are correlated) in the data.
+- Use it if you want to **select important features** and **reduce model complexity**.
+
+## 2. Testing and Validation
+
+To check that the model works correctly, we performed the following tests:
+
+- **Initial Test**:
+  The model was first tested with synthetic datasets where the relationship between input features and target values was known. The model correctly identified trends and produced reasonable coefficient estimates.
+- **Real Data**:
+  The model was also validated on real-world datasets such as the Advertising dataset, where it behaved as expected: it regularized feature weights, selected important features, and generated sensible predictions.
+- **Benchmarking**:
+  The model was compared to `sklearn`'s `ElasticNet` implementation for consistency in predictions.
+
+### Metrics for Evaluation:
+
+- **Mean Squared Error (MSE)**: used to check how well the predicted values align with the actual target values.
+- **Coefficient Magnitude**: observing how regularization influences the weight values, especially when tuning the `lambda1` parameter.
+- **Actual vs. Predicted Graph**: To check how close the predicted values are to the actual values and whether the model actually fits the data points.
+- **Residual Graph / Histogram**: To check the distribution of the residuals and ensure homoscedasticity.
+
+### Correctness was established through:
+
+- **Convergence** of the loss function.
+- **Stability** in coefficients when tuning `lambda1`.
+- Comparing predictions against a baseline linear regression model.
+
+## 3. Parameters for Tuning
+
+The model can be optimized by tuning the following hyper-parameters:
+
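+- **lambda1 (default: 0.5)**: This is the weight of the L1 (Lasso) penalty. The implementation uses `1 - lambda1` as the weight of the L2 (Ridge) penalty, so the two are tuned together. Larger values encourage sparser solutions, where some feature coefficients are driven towards zero. A value closer to 0 leans more towards L2 regularization, making the model similar to Ridge regression. The constructor only requires `lambda1 > 0`, but a value above 1 makes the L2 weight negative, so values in (0, 1] are the sensible range.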
+- **threshold (default: 0.000001)**: This is the convergence threshold for gradient descent. The training process stops when the change in model weights (beta coefficients) is less than this threshold, indicating that the model has converged. Smaller values might lead to more accurate solutions but could require more iterations.
+- **learning_rate (default: 0.000001)**: This is the step size for each iteration of gradient descent. It controls how much the model updates its weights after each step. A smaller learning rate ensures more stable convergence but might require more iterations to reach an optimal solution.
+- **scale (default: False)**: This is a Boolean flag indicating whether to scale the features using `MinMaxScaler`. When set to `True`, the input features are scaled to a specified range, defined by `scale_range`.
+- **scale_range (default: (-10, 10))**: This is the range within which the features are scaled when scale is set to `True`. This ensures the input data is transformed into a desired range, which can be helpful for algorithms that are sensitive to the magnitude of features.
+
+## 4. Challenges and Limitations
+
+### Potential Issues:
+
+- **Highly Correlated Features**: While `ElasticNetModel` handles multicollinearity better than standard linear regression, data with significantly high correlations between features might still cause some instability in weight updates.
+- **Nonlinear Relationships**: The model assumes a linear relationship between features and the target. If there is a nonlinear relationship in the data, `ElasticNetModel` will not perform well unless features are transformed appropriately prior to inputting them into the model. It also performs poorly on binary or general categorical data.
+- **Imbalanced Datasets**: If the dataset has a high class imbalance, `ElasticNetModel` might struggle to fit the data well.
+- **Large Values**: Datasets with large feature values need to be scaled down first (for example with `scale=True`); otherwise the training time can grow very quickly.
+
+### Improvements:
+
+Given more time, improvements could include:
+
+- Implementing **cross-validation** to automatically determine optimal values for `lambda1`.
+- Developing **early stopping criteria** during gradient descent to avoid unnecessary iterations once the loss function stabilizes.
+- Implementing strategies to **handle missing data** would make the model more robust for real-world datasets, which often contain incomplete data.
+
+---
+
+### Usage Example:
+
+```python
+# Initialize the model
+elastic_net = ElasticNetModel(lambda1=0.7, learning_rate=0.01, threshold=0.0001, scale=True)
+
+# Fit the model to training data (2-D numpy arrays); fit returns an ElasticNetModelResults object
+results = elastic_net.fit(X_train, y_train)
+
+# Predict using test data
+y_pred = results.predict(X_test)
+```
+
+A working usage example can be seen in the `test_ElasticNetModel.py` file, which can be used as a reference for future testing.
+
+The datasets provided can also be used to verify that the model works. They are some of the real-world datasets the team used to check the validity of the model.
+
+This implementation offers flexibility to users to experiment with various parameter settings, providing both L1 and L2 regularization, making it suitable for various types of linear regression problems.
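
The `lambda1`, `learning_rate`, and `threshold` parameters described in the README interact inside the gradient-descent loop. The following is a minimal, vectorized sketch of such a loop, not the PR's code. It assumes the objective `||ys - A @ beta||^2 + lambda1 * ||beta||_1 + (1 - lambda1) * ||beta||_2^2`, uses `sign(beta)` as the subgradient of the L1 term, and averages the gradient over the number of samples (the PR's `change_weights` divides by the number of columns instead). The function names and the `max_iter` safeguard are hypothetical and used only for illustration.

```python
import numpy as np


def elastic_net_subgradient(A, ys, beta, lambda1):
    """Subgradient of the assumed ElasticNet objective at beta.

    A has shape (N, d), ys has shape (N, 1), and beta has shape (d, 1).
    """
    residual = ys - A @ beta
    # np.sign(0) is 0, so coefficients that are exactly zero get no L1 push.
    l1_term = lambda1 * np.sign(beta)
    l2_term = 2.0 * (1.0 - lambda1) * beta
    return -2.0 * (A.T @ residual) + l1_term + l2_term


def fit_by_gradient_descent(A, ys, lambda1=0.5, learning_rate=0.01,
                            threshold=1e-6, max_iter=100_000):
    """Run gradient descent until the update is smaller than threshold."""
    n_samples, n_columns = A.shape
    beta = np.zeros((n_columns, 1))
    for _ in range(max_iter):  # hypothetical cap so the loop always ends
        step = learning_rate * elastic_net_subgradient(A, ys, beta, lambda1) / n_samples
        beta = beta - step
        if np.linalg.norm(step) < threshold:
            break
    return beta
```

A smaller `threshold` lets the loop run longer before stopping, while a larger `learning_rate` takes bigger steps and can fail to converge on unscaled data, which is why the two are usually tuned together.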
diff --git a/elasticnet/.DS_Store b/elasticnet/.DS_Store
diff --git a/elasticnet/models/ElasticNet.py b/elasticnet/models/ElasticNet.py
@@ -1,17 +1,184 @@
-
+import matplotlib.pyplot as plt
+import numpy
+from sklearn.preprocessing import MinMaxScaler
 
 class ElasticNetModel():
-    def __init__(self):
-        pass
+    # The constructor takes lambda1 (the L1 weight, from which the L2 weight 1 - lambda1 is derived), the threshold that
+    # serves as the stopping criterion for gradient descent, and the learning rate that sets the step size of each update.
+    # scale lets the user choose whether to scale the data, and scale_range is the (min, max) range that all values are
+    # scaled into.
+    def __init__(self, lambda1 = 0.5, threshold = 0.000001, learning_rate = 0.000001, scale = False, scale_range = (-10,10)):
+        if not isinstance(lambda1, (float, int)) or lambda1 <= 0:
+            raise ValueError("lambda1 must be a positive number.")
+        self.lambda1 = lambda1
+
+        if not isinstance(threshold, (float, int)) or threshold <= 0:
+            raise ValueError("threshold must be a positive number.")
+        self.threshold = threshold
+
+        if not isinstance(learning_rate, (float, int)) or learning_rate <= 0:
+            raise ValueError("learning_rate must be a positive number.")
+        self.learning_rate = learning_rate
+
+        if not isinstance(scale, bool):
+            raise ValueError("scale must be a boolean (True or False).")
+        self.shouldScale = scale
+
+        if not (isinstance(scale_range, tuple) and len(scale_range) == 2 and
+                all(isinstance(x, (int, float)) for x in scale_range) and
+                scale_range[0] < scale_range[1]):
+            raise ValueError("scale_range must be a tuple of two numbers (min, max) where min < max.")
+        self.scaler = MinMaxScaler(feature_range=scale_range)
+
+    def fit(self, A, ys):
+        if not isinstance(A, numpy.ndarray):
+            raise ValueError("A must be a numpy array.")
+        if not isinstance(ys, numpy.ndarray):
+            raise ValueError("ys must be a numpy array.")
 
+        if numpy.any(A == None):
+            raise ValueError("A contains None values.")
+        if numpy.any(ys == None):
+            raise ValueError("ys contains None values.")
 
-    def fit(self, X, y):
-        return ElasticNetModelResults()
+        if numpy.isnan(A).any():
+            raise ValueError("A contains NaN values.")
+        if numpy.isnan(ys).any():
+            raise ValueError("ys contains NaN values.")
 
+        # Checks if scaling is required, if it is then scale the data
+        if(self.shouldScale):
+            # Scales the train data between the specified range
+            A = self.scaler.fit_transform(A)
+            ys = self.scaler.fit_transform(ys)
+        # Initialize the random number generator
+        rng = numpy.random.default_rng()
+        # Create a matrix with 1 column and len(A) rows to account for the intercept
+        intercept_ones = numpy.ones((len(A), 1))
+        # Append the matrix of all ones to the data to account for the intercept
+        A = numpy.c_[intercept_ones, A]
+        # Get the number of rows and number of columns for the data
+        Ny,dy = ys.shape
+        self.N, self.d = A.shape
+        if self.N == 0:
+            # If there are no rows then raise an error
+            raise ValueError("Number of samples cannot be zero.")
+        if(Ny != self.N):
+            raise ValueError("Number of samples has to be same for both Target and Features")
+        # Set a random starting point for the beta matrix
+        self.beta = rng.normal(loc=0, scale=0.01, size=(self.d, 1))
+        # Initialize beta_before with zeros so that we can check whether the required threshold has been met
+        self.beta_before = numpy.zeros(shape=(self.d, 1))
+        # Check whether the required threshold has been satisfied; if not, continue looping
+        while (numpy.linalg.norm(self.beta - self.beta_before) > self.threshold):
+            # Set the beta before to the current beta
+            self.beta_before = self.beta
+            # Update the weights
+            self.beta = self.change_weights(A, ys)
+        # Once the beta has converged return a Result Class with the beta value stored in it
+        return ElasticNetModelResults(self.beta,self.scaler,self.shouldScale)
+
+    def change_weights(self, A, ys):
+        # Create an empty gradient matrix filled with zeroes
+        gradient = numpy.zeros_like(self.beta)
+        # Get the predictions for the current values of beta using the dot product
+        predictions = numpy.dot(A, self.beta)
+        # Use the gradient formula for Elastic Net Regression to calculate each of the gradient values and store
+        # it in the gradient matrix
+        for i in range(self.d):
+            if self.beta[i, 0] > 0:
+                gradient[i, 0] = (-2 * numpy.dot(A[:, i], (ys - predictions)) + self.lambda1 + (
+                            2 * (1 - self.lambda1) * self.beta[i, 0])) / self.d
+            elif self.beta[i, 0] < 0:
+                gradient[i, 0] = (-2 * numpy.dot(A[:, i], (ys - predictions)) - self.lambda1 + (
+                            2 * (1 - self.lambda1) * self.beta[i, 0])) / self.d
+            else:
+                gradient[i, 0] = (-2 * numpy.dot(A[:, i], (ys - predictions)) + (
+                            2 * (1 - self.lambda1) * self.beta[i, 0])) / self.d
+        # Apply the learning rate to the gradient and subtract it from the beta to move the beta closer to its actual value
+        return self.beta - (self.learning_rate * gradient)
 
 class ElasticNetModelResults():
-    def __init__(self):
-        pass
 
+    # Initializing the model's parameters through the constructor function:
+    def __init__(self, beta, scaler, shouldScale):
+        #   beta: Coefficients of the ElasticNet model (including intercept)
+        #   scaler: Scaler object used to scale the features (e.g., StandardScaler or MinMaxScaler)
+        #   shouldScale: Boolean indicating whether the features should be scaled before prediction
+        self.beta = beta
+        self.scaler = scaler
+        self.shouldScale = shouldScale
+
+    # Predicting the output using the model's coefficients:
     def predict(self, x):
-        return 0.5
+        self.check_x(x)
+
+        # Scale the features if shouldScale is True (fit_transform refits the scaler on x rather than reusing the training fit)
+        if (self.shouldScale):
+            x = self.scaler.fit_transform(x)
+
+        # Adding a column of ones for the intercept term.
+        intercept_ones = numpy.ones((len(x), 1))
+
+        x_b = numpy.c_[intercept_ones, x]
+        return numpy.dot(x_b, self.beta)
+
+    def check_x(self,A):
+        if not isinstance(A, numpy.ndarray):
+            raise ValueError("A must be a numpy array.")
+        if numpy.any(A == None):
+            raise ValueError("A contains None values.")
+        if numpy.isnan(A).any():
+            raise ValueError("A contains NaN values.")
+
+    def check_y(self,ys):
+        if not isinstance(ys, numpy.ndarray):
+            raise ValueError("ys must be a numpy array.")
+        if numpy.any(ys == None):
+            raise ValueError("ys contains None values.")
+        if numpy.isnan(ys).any():
+            raise ValueError("ys contains NaN values.")
+
+    # Creating a scatter plot comparing actual vs predicted values:
+    def getActualVsTrueGraph(self, x, y):
+        self.check_x(x)
+        self.check_y(y)
+        if (self.shouldScale):
+            x = self.scaler.fit_transform(x)
+            y = self.scaler.fit_transform(y)
+        intercept_ones = numpy.ones((len(x), 1))
+        x_b = numpy.c_[intercept_ones, x]
+
+        # Calculating predicted values
+        pred = numpy.dot(x_b, self.beta)
+
+        # Creating a scatter plot with actual values on x-axis and predicted values on y-axis
+        plt.scatter(y[:, 0], pred[:, 0], color='green', alpha=0.5)
+        plt.plot([y.min(), y.max()], [y.min(), y.max()], color='red', linestyle='--')
+
+        plt.xlabel('Actual Values')
+        plt.ylabel('Predicted Values')
+        plt.title('Predicted vs. Actual Plot')
+        plt.show()
+
+    # Creating a residual plot, which visualizes the difference between actual and predicted values:
+    def getResidualGraph(self, x, y):
+        self.check_x(x)
+        self.check_y(y)
+        if (self.shouldScale):
+            x = self.scaler.fit_transform(x)
+            y = self.scaler.fit_transform(y)
+        intercept_ones = numpy.ones((len(x), 1))
+        x_b = numpy.c_[intercept_ones, x]
+        pred = numpy.dot(x_b, self.beta)
+
+        # Calculating the residuals (differences between actual and predicted values)
+        residual = y[:, 0] - pred[:, 0]
+
+        plt.scatter(pred[:, 0], residual, color='blue', alpha=0.5)
+        plt.axhline(y=0, color='red', linestyle='--')
+
+        plt.xlabel('Predicted Values')
+        plt.ylabel('Residuals')
+        plt.title('Residual Plot')
+        plt.show()
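
The `predict`, `getActualVsTrueGraph`, and `getResidualGraph` methods call `fit_transform` on whatever data they receive, so the `MinMaxScaler` is refitted each time and test data is scaled by its own minimum and maximum rather than by the training range. A minimal sketch of one alternative follows, assuming separate scalers for features and target that are fitted once on the training data. `fit_scalers` and `scale_features` are hypothetical helper names and are not part of this PR.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def fit_scalers(X_train, y_train, scale_range=(-10, 10)):
    """Fit one scaler on the features and another on the target."""
    x_scaler = MinMaxScaler(feature_range=scale_range).fit(X_train)
    y_scaler = MinMaxScaler(feature_range=scale_range).fit(y_train)
    return x_scaler, y_scaler


def scale_features(x_scaler, X):
    """Apply the training-time scaling without refitting."""
    return x_scaler.transform(X)


X_train = np.array([[0.0], [5.0], [10.0]])
y_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.5], [7.5]])

x_scaler, y_scaler = fit_scalers(X_train, y_train)

# Reusing the training fit maps 2.5 and 7.5 to -5 and 5.
X_test_scaled = scale_features(x_scaler, X_test)

# Refitting on the test data stretches the same points to -10 and 10.
X_test_refit = MinMaxScaler(feature_range=(-10, 10)).fit_transform(X_test)

# Scaled predictions can be mapped back to the original target units.
y_original = y_scaler.inverse_transform(np.array([[0.0]]))
```

Keeping a dedicated target scaler also makes it possible to report predictions in the original units through `inverse_transform`.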