
2.4 Hyperparameter Optimization Question #1

@ramir266

Description

Hey Jordan, looking at problem 2.4, how do you want us to implement the neural network? Do you want us to use:

Method #1
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))  # output layer, assuming 10-class one-hot labels
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adadelta(), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=1, validation_data=(x_test, y_test))

Method #2
import numpy as np

alpha = 0.01                                               # learning rate
theta_1 = np.random.normal(0, .1, size=(2, 3)); b1 = np.zeros((1, 3))  # init weights
theta_2 = np.random.normal(0, .1, size=(3, 2)); b2 = np.zeros((1, 2))

J = []
for i in range(10000):
    l1 = relu(np.dot(X, theta_1) + b1)                     # forward pass: l1 = relu(X . theta_1 + b1)
    y_hat = softmax(np.dot(l1, theta_2) + b2)              # y_hat = softmax(l1 . theta_2 + b2)

    cost = np.sum(-(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)))
    J.append(cost)                                         # store cost

    dJ_dZ2 = d_softmax(y_hat, Y)                           # gradient at the output pre-activation
    dJ_dtheta2 = np.dot(l1.T, dJ_dZ2)                      # compute gradients
    dJ_db2 = np.sum(dJ_dZ2, axis=0, keepdims=True)

    dJ_dZ1 = np.dot(dJ_dZ2, theta_2.T) * d_relu(l1)        # backprop through the hidden layer
    dJ_db1 = np.sum(dJ_dZ1, axis=0, keepdims=True)

    theta_2 -= alpha * dJ_dtheta2                          # gradient-descent weight updates
    b2 -= alpha * dJ_db2
    theta_1 -= alpha * np.dot(X.T, dJ_dZ1)
    b1 -= alpha * dJ_db1

    if J[-1] == 0 or J[-1] > 10:                           # stop if the cost hits zero or blows up
        break
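
(Method #2 above assumes the helper functions relu, d_relu, softmax, and d_softmax are defined separately; in case it matters for your answer, what I have in mind is roughly this:)

def relu(z):
    return np.maximum(0, z)                           # elementwise ReLU

def d_relu(a):
    return (a > 0).astype(float)                      # ReLU derivative, evaluated from the activations

def softmax(z):
    e = np.exp(z - np.max(z, axis=1, keepdims=True))  # shift for numerical stability
    return e / np.sum(e, axis=1, keepdims=True)

def d_softmax(y_hat, Y):
    return y_hat - Y                                  # softmax + cross-entropy gradient w.r.t. the pre-activation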

The issue with Method #1 is that I don't see how to implement the explicit learning-rate update you wanted us to use, so I am assuming it's Method #2, but I wanted to clarify with you. Please let me know.
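
(The closest thing I found for Method #1 is passing a learning rate to the optimizer, something like the line below with SGD as an example, but I don't think that is the same as the explicit update rule in Method #2:)

# depending on the Keras version the argument is lr= or learning_rate=
model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(learning_rate=0.01), metrics=['accuracy'])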
