Input scaling, optimiser choice and other changes #3
Open
dquigley533 wants to merge 1 commit into TheodoreWolf:main from
Conversation
…parameters needed. Switched to LBFGS since the examples use whole-batch training. Adjusted training so that physics_loss dominates. Used smooth activation functions.
I came across your Medium post on this - thanks for making the code available. I'm currently learning about PINNs myself and found your tutorial to be a more useful introduction than many things I've read.
In my further reading and experimentation I made some tweaks which you might consider improvements, hence the PR in case you'd like to incorporate them.
Input/Output scaling. The input time is scaled into [0,1], as is the output. This dramatically reduces the number of parameters needed to get a good reproduction of the PDE solution. I'm using a few dozen nodes in a single hidden layer.
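The scaling step might look something like this (a minimal PyTorch sketch; the time span `T` and the output range are assumed illustrative values, not taken from the repository):

```python
import torch

# Minimal sketch of the input/output scaling described above.
# The physical time span T and the output range (u_min, u_max) are
# assumptions for illustration, not values from the original code.
T = 10.0                  # assumed physical time span [0, T]
u_min, u_max = 0.0, 5.0   # assumed range of the PDE solution

def scale_input(t):
    """Map physical time t in [0, T] onto the network input range [0, 1]."""
    return t / T

def unscale_output(u_hat):
    """Map the network's [0, 1] output back onto the physical range."""
    return u_min + (u_max - u_min) * u_hat

t = torch.linspace(0.0, T, 50)
x = scale_input(t)        # feed x, not raw t, to the network
```

Keeping both sides of the network in [0, 1] means the weights never have to compensate for mismatched scales, which is why a single small hidden layer suffices.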
Switch optimiser. Your example was training on the whole dataset rather than mini-batches, so using AdamW was unnecessary since you have exact (rather than estimated) gradients. Using LBFGS, the training converges in a handful of steps; note that I switched activation functions from ReLU to GELU to get smooth gradients.
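The closure-based LBFGS loop could be sketched as follows (the network size, learning rate, and the stand-in target are assumptions, since the repository's exact model isn't reproduced here):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed architecture: one hidden layer of a few dozen GELU units.
net = nn.Sequential(nn.Linear(1, 32), nn.GELU(), nn.Linear(32, 1))

t = torch.linspace(0.0, 1.0, 100).unsqueeze(1)  # scaled inputs, the whole "batch"
u_true = torch.sin(2 * torch.pi * t)            # stand-in target for illustration

with torch.no_grad():
    init_loss = ((net(t) - u_true) ** 2).mean()

opt = torch.optim.LBFGS(net.parameters(), lr=1.0, max_iter=20,
                        line_search_fn="strong_wolfe")

def closure():
    # LBFGS re-evaluates the objective several times per step,
    # so the forward/backward pass lives inside a closure.
    opt.zero_grad()
    loss = ((net(t) - u_true) ** 2).mean()
    loss.backward()
    return loss

for _ in range(5):                # a handful of outer steps
    final_loss = opt.step(closure)
```

Because the full dataset fits in one pass, each LBFGS step uses exact gradients and curvature estimates, which is why convergence takes so few steps compared with a first-order optimiser.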
With those changes the problem fixed by the L2 regularisation doesn't seem to occur, but I've left that in anyway.
In the examples which use physics_loss, I've weighted this massively higher than the MSE loss on the training points, to avoid overfitting to the noise on those points.

The whole thing now runs in a handful of seconds on a CPU, which did wonders for my confidence that PINNs would be tractable for problems in higher numbers of dimensions.
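That weighting could be sketched like this (the weight value and tensor shapes are assumptions for illustration):

```python
import torch

# Sketch of the loss weighting described above. The weight is an assumed
# value standing in for "massively higher"; the point is only that the
# PDE-residual term dominates the data-fit term, so the network is not
# pulled toward the noise on the training points.
physics_weight = 1e4

def total_loss(u_pred, u_data, residual):
    data_loss = ((u_pred - u_data) ** 2).mean()   # MSE on the noisy points
    physics_loss = (residual ** 2).mean()         # PDE residual at collocation points
    return data_loss + physics_weight * physics_loss

u_pred = torch.tensor([1.0, 2.0])
u_data = torch.tensor([1.1, 1.9])
residual = torch.tensor([0.01, -0.02])
loss = total_loss(u_pred, u_data, residual)
```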
Obviously feel free to ignore - just felt compelled to share what I learned by trying to push your example to its limits.