Hey Jordan, looking at problem 2.4, how do you want us to implement the neural network? Do you want us to use:
Method #1

```python
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))  # output layer (assuming 10 classes; the original snippet omitted it)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=10, verbose=1,
          validation_data=(x_test, y_test))
```
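(For context, here's roughly how I'd prepare `x_train`/`y_train` for Method #1. I'm assuming MNIST-style data, since `input_shape=(784,)` suggests flattened 28x28 images, but that's just my guess:)

```python
from keras.datasets import mnist
from keras.utils import to_categorical

# Hypothetical data prep, assuming MNIST (28x28 grayscale digits, 10 classes)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
```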
Method #2

```python
import numpy as np

alpha = 0.01  # learning rate

# Initialize weights: 2 inputs -> 3 hidden units -> 2 output classes
theta_1 = np.random.normal(0, .1, size=(2, 3)); b1 = np.zeros((1, 3))
theta_2 = np.random.normal(0, .1, size=(3, 2)); b2 = np.zeros((1, 2))

J = []
for i in range(10000):
    # Forward pass
    l1 = relu(np.dot(X, theta_1) + b1)         # hidden activations
    y_hat = softmax(np.dot(l1, theta_2) + b2)  # output probabilities

    cost = np.sum(-(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)))
    J.append(cost)  # store cost

    # Backward pass: compute gradients
    dJ_dZ2 = d_softmax(y_hat, Y)  # gradient at the output pre-activation
    dJ_dtheta2 = np.dot(l1.T, dJ_dZ2)
    dJ_db2 = np.sum(dJ_dZ2, axis=0, keepdims=True)
    dJ_dZ1 = np.dot(dJ_dZ2, theta_2.T) * d_relu(l1)
    dJ_db1 = np.sum(dJ_dZ1, axis=0, keepdims=True)

    # Gradient-descent weight updates
    theta_2 -= alpha * dJ_dtheta2
    b2 -= alpha * dJ_db2
    theta_1 -= alpha * np.dot(X.T, dJ_dZ1)
    b1 -= alpha * dJ_db1

    if J[-1] == 0 or J[-1] > 10:  # stop if the cost hits zero or blows up
        break
```
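(For completeness, Method #2 relies on `relu`/`softmax` helpers and their derivative terms that aren't shown above. These are the standard definitions I have in mind — my sketch, not from the assignment — with `X` as the input matrix and `Y` as one-hot labels:)

```python
import numpy as np

def relu(z):
    # elementwise rectified linear unit
    return np.maximum(0, z)

def d_relu(a):
    # derivative of relu, computed from its activation (a = relu(z) > 0 iff z > 0)
    return (a > 0).astype(a.dtype)

def softmax(z):
    # row-wise softmax, shifted by the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def d_softmax(y_hat, Y):
    # combined gradient of the cross-entropy loss w.r.t. the pre-softmax input
    return y_hat - Y
```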
The issue with Method #1 is that it doesn't let us implement the learning-rate update the way you wanted us to, so I'm assuming you mean Method #2, but I wanted to clarify with you. Please let me know.