This project is my write-up of the Julia Academy Data Science course (website). The original Jupyter notebooks can be found in the official GitHub repo.
My write-ups of the notebooks are in Pluto.jl. Some notebooks use Pluto's built-in package management; others use a local environment (via Project.toml).
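For the notebooks that rely on a local environment, the first cell typically activates it. The following is a minimal sketch, assuming the notebook is started from the directory that holds Project.toml; the `isfile` guard is an illustrative addition and not taken from the notebooks themselves. As far as I am aware, calling `Pkg.activate` inside a notebook switches off Pluto's built-in package management for that notebook.

```julia
import Pkg

# Activate the environment in the current working directory
# (assumption: Julia or Pluto was started from the project folder).
Pkg.activate(".")

# Install the recorded dependencies only when a Project.toml is actually present.
# This guard is a hypothetical safety check, not part of the original setup.
if isfile("Project.toml")
    Pkg.instantiate()
end
```

Notebooks without such a cell leave dependency handling to Pluto, which records the package versions inside the notebook file itself.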
Run Julia as a project in the current directory:
julia --project=.
Load Project.toml:
] activate .
] instantiate
Install and load Pluto:
] add Pluto
using Pluto
Pluto.run()

- Introduction to loading, reading, saving data from/into various formats. Downloading data.
- Linear Algebra. Tools for working with Arrays, in particular Matrices. Factorisation, decomposition, sparse matrices, images, solving Ax=b.
- Basic statistics. Distributions, plotting, kernel density, sampling, fitting, hypothesis testing, correlations.
- Dimension reduction. PCA, t-SNE, UMAP. Reducing the dimensionality of data to find variables of significance and projecting data onto 2 or 3 dimensions for visualisation.
- Clustering. Hclust, K-means, K-medoids. Grouping similar data.
- Classification. Modelling categorical data, predicting which class a set of observations might fall into. Uses the Iris dataset. Splitting data into training and testing sets. Lasso, ridge regression & elastic net, decision trees & random forest, nearest neighbours, and support vector machines.
- Regression. Linear regression, generalised linear models, non-linear fitting. Minimising the least-squares error.
- Graphs. Map plotting, building graphs from adjacency matrices. Graph analysis including degree distribution, shortest distance, spanning trees, node importance measurements, clustering.
- Optimisation. Finding a solution to a problem given a set of constraints. Min/maxing optimisation problems. Convex problems, JuMP optimisation.
- Neural Networks. Basics of neural networks, training neural networks. Setting up training and testing data.
- Other languages. Calling functions from R, Python, and C.
- Visualisation. Various plot types: violin, barplots/histograms, line plots, ribbon/band plots, scatter plots.
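
To give a flavour of the Linear Algebra and Regression topics in the list above, here is a small sketch that uses only Julia's standard library. The matrix, the data, and the seed are made up for illustration and do not come from the course notebooks. It solves a square system Ax=b through an LU factorisation, then fits a straight line by least squares with the same backslash operator.

```julia
using LinearAlgebra
using Random

Random.seed!(1)   # hypothetical seed, only to make the noise reproducible

# Square system: solve A x = b through an explicit LU factorisation.
A = [4.0 1.0 0.0;
     1.0 3.0 1.0;
     0.0 1.0 2.0]
b = [1.0, 2.0, 3.0]
F = lu(A)         # pivoted LU: A[F.p, :] ≈ F.L * F.U
x = F \ b         # reuses the factorisation; A \ b gives the same solution

# Overdetermined system: least-squares fit of y ≈ β[1] + β[2] * t.
t = collect(0.0:0.5:5.0)
y = 2.0 .+ 0.5 .* t .+ 0.01 .* randn(length(t))   # synthetic noisy line
X = hcat(ones(length(t)), t)                      # design matrix with intercept column
β = X \ y                                         # minimises norm(X * β - y)

println("x = ", x)
println("β = ", β)
```

For a non-square matrix the backslash operator returns the least-squares solution, so the same syntax covers both the exact solve and the regression fit.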
Where possible I have used a pure Julia solution. For example, GLMNet.jl is a wrapper for a Fortran library, whereas Lasso.jl is a pure Julia version of GLMNet.
- I have used Pluto.jl rather than Jupyter notebooks.
- I used Makie (specifically CairoMakie.jl) in place of the standard Plots.jl.
- Lasso.jl instead of GLMNet.jl for Lasso and Ridge regressions in notebook 6.
