
18.337J/6.338J: Parallel Computing and Scientific Machine Learning

There are two main branches of technical computing: machine learning and scientific computing. Machine learning has received a lot of hype over the last decade, with techniques such as convolutional neural networks and t-SNE nonlinear dimensionality reduction powering a new generation of data-driven analytics. On the other hand, many scientific disciplines carry on with large-scale modeling through differential equations, looking at stochastic differential equations and partial differential equations that describe scientific laws.

However, there has been a recent convergence of the two disciplines. This field, scientific machine learning, has been showcasing results like how partial differential equation simulations can be accelerated with neural networks. New methods, such as probabilistic and differentiable programming, have started to be developed specifically for enhancing the tools of this domain. However, the techniques in this field combine two huge areas of computational and numerical practice, meaning that the methods are quite complex. How do you backpropagate through an ODE defined by neural networks? How do you perform unsupervised learning of a scientific simulator?

In this class we will dig into the methods and understand what they do, why they were made, and thus how to integrate numerical methods across fields to accentuate their pros while mitigating their cons. This class will be a survey of the numerical techniques, showcasing how many disciplines are doing the same thing under different names, and using a common mathematical language to derive efficient routines which capture both data-driven and mechanistic-based modeling.

However, these methods will quickly run into scaling issues if naively coded. To handle this problem, everything will have a focus on performance engineering. We will start by focusing on algorithms which are inherently serial and learn to optimize serial code. Then we will showcase how logic-heavy code can be parallelized through multithreading and distributed computing techniques like MPI, while direct mathematical descriptions can be parallelized through GPU computing.

The final part of the course will be a unique project which pulls together these techniques. Because this is a new field, students will be exposed to the “low hanging fruit” and will be directed towards an area in which they can make a quick impact. For the final project, students will team up to solve a new problem in the field of scientific machine learning, and receive help writing up a publication-quality analysis of their work.

Note About COVID-19

During the Fall of 2020, the special circumstances call for special approaches to teaching. In order to accommodate the lack of in-person instruction, the course will be very project-based, helping students grow as researchers in the area of parallel computing and scientific machine learning. The goal of this approach will be to help train students to become successful in the modern international online open research environment. As such, lectures will be delivered as pre-recorded videos. A Slack will be created for asynchronous communication (all students registered for the course will receive an email invitation; students who wish to follow along with the course should email to receive an invite). Drop-in online office hours will be available to discuss the topic with the instructor and other students over a video chat (time TBD depending on the current environment of the students).

Half of the assessment will be based on homework assignments. These will be timed to ensure that the students are keeping up-to-date with the course material. The other half of the grade will be from a final project.

Syllabus

Lectures: Pre-recorded online. Office Hours: TBD.

Prerequisites: While this course will be mixing ideas from high performance computing, numerical analysis, and machine learning, no one in the course is expected to have covered all of these topics before. Understanding of calculus, linear algebra, and programming is essential. 18.337 is a graduate-level subject, so mathematical maturity and the ability to learn from primary literature are necessary. Problem sets will involve use of Julia, a MATLAB-like environment (little or no prior experience required; you will learn as you go).

Textbook & Other Reading: There is no textbook for this course or the field of scientific machine learning. Some helpful resources are Hairer and Wanner’s Solving Ordinary Differential Equations I & II and Gilbert Strang’s Computational Science and Engineering. Much of the reading will come in the form of primary literature from journal articles posted here.

Grading: 50% problem sets, 10% for the final project proposal (due October 30th), and 40% for the final project (due December 18th). Problem sets and final projects will be submitted electronically.

Collaboration policy: Make an effort to solve the problem on your own before discussing with any classmates. When collaborating, write up the solution on your own and acknowledge your collaborators.

Final Project

The final project is a 10-20 page paper using the style template from the SIAM Journal on Numerical Analysis (or similar). The final project must include code for a high performance (or parallelized) implementation of the algorithm in a form that is usable by others. A thorough performance analysis is expected. Model your paper on academic review articles (e.g. read SIAM Review and similar journals for examples).

One possibility is to review an interesting algorithm not covered in the course and develop a high performance implementation. Some examples include:

Another possibility is to work on state-of-the-art performance engineering. This would be implementing a new auto-parallelization or performance enhancement. For these types of projects, implementing an application for benchmarking is not required, and one can instead benchmark the effects on already existing code to find cases where it is beneficial (or leads to performance regressions). Possible examples are:

Additionally, Scientific Machine Learning is a wide open field with lots of low hanging fruit. Instead of a review, a suitable research project can be chosen for the final project. Possibilities include:

Final project topics must be declared by October 30th with a 1 page extended abstract.

Schedule of Topics

Each topic is a group of three pieces: a numerical method, a performance-engineering technique, and a scientific application. These three together form a complete usable program that is demonstrated.

Homework 1: Parallelized dynamical system simulations and ODE integrators

Homework 2: Parameter estimation in dynamical systems and overhead of parallelism

Homework 3: Training neural ordinary differential equations (with GPUs)

Homeworks

Lecture Summaries and Handouts

Note that lectures are broken down by topic, not by day. Some lectures are more than 1 class day, others are less.

Lecture 1: Introduction and Syllabus

Lecture and Notes

This is to make sure we’re all on the same page. It goes over the syllabus and what will be expected of you throughout the course. If you have not joined the Slack, please use the link from the introduction email (or email me if you need the link!).

Lecture 1.1: Getting Started with Julia

Lecture and Notes

Optional Extra Resources

If you are not comfortable with Julia yet, here are a few resources as a sort of “crash course” to get you up and running:

Some deeper materials:

Steven Johnson will be running a Julia workshop on 9/8/2020 for people who are interested. More details TBA.

Lecture 2: Optimizing Serial Code

Lecture and Notes

Optional Extra Resources

Before we start to parallelize code, build huge models, and automatically learn physics, we need to make sure our code is “good”. How do you know you’re writing “good” code? That’s what this lecture seeks to answer. In this lecture we’ll go through the techniques for writing good serial code and checking that your code is efficient.

Lecture 3: Introduction to Scientific Machine Learning Through Physics-Informed Neural Networks

Optional Extra Resources

Now let’s take our first stab at the application: scientific machine learning. What is scientific machine learning? We will define the field by looking at a few approaches people are taking and what kinds of problems are being solved using scientific machine learning. The field of scientific machine learning and its span from computational science to applications in climate modeling and aerospace will be introduced. The methodologies that will be studied, under their various names, will be introduced, and the general formula that is arising in the discipline will be laid out: a mixture of scientific simulation tools like differential equations with machine learning primitives like neural networks, tied together through differentiable programming to achieve results that were previously not possible. After doing a survey, we will dive straight into developing a physics-informed neural network solver which solves an ordinary differential equation.
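As a rough sketch of what such a solver minimizes, assume the ODE is written as u' = f(u, t) with initial condition u(0) = u_0, and let N_θ(t) be a neural network with weights θ. These symbols are notation introduced here for illustration, not taken from the lecture. One common formulation penalizes the ODE residual at sample times t_i together with the mismatch in the initial condition:

```latex
L(\theta) = \sum_{i} \left( \frac{dN_\theta}{dt}(t_i) - f\big(N_\theta(t_i),\, t_i\big) \right)^2 + \big( N_\theta(0) - u_0 \big)^2
```

Minimizing L over θ pushes N_θ toward a function that satisfies both the differential equation and the initial condition. Other variants build the initial condition directly into the trial solution instead of penalizing it.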

Lecture 4: Introduction to Discrete Dynamical Systems

Optional Extra Resources

Now that the stage is set, we see that to go deeper we will need a good grasp on how both discrete and continuous dynamical systems work. We will start by developing the basics of our scientific simulators: differential and difference equations. A quick overview of geometric results in the study of differential and difference equations will set the stage for understanding nonlinear dynamics, which we will quickly turn to numerical methods to visualize. Even if there is no analytical solution to the dynamical system, overarching behavior such as convergence to zero can be determined through asymptotic means and linearization. We will see later that these same techniques form the basis for the analysis of numerical methods for differential equations, such as the Runge-Kutta and Adams-Bashforth methods.
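To make the linearization idea concrete, here is a minimal sketch in Python (chosen only for illustration; the course itself uses Julia). It assumes the logistic map as the example system, which is an assumption of this sketch rather than something named above. A fixed point x* of x_{n+1} = f(x_n) is locally stable when |f'(x*)| < 1, and the simulation is used to check that prediction.

```python
def logistic_map(x, r):
    """One step of the difference equation x_{n+1} = r * x_n * (1 - x_n)."""
    return r * x * (1.0 - x)


def logistic_map_derivative(x, r):
    """Derivative of the map with respect to x, used for linearization."""
    return r * (1.0 - 2.0 * x)


def iterate(f, x0, steps):
    """Apply the map f repeatedly and record the whole trajectory."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = f(x)
        trajectory.append(x)
    return trajectory


r = 2.5  # hypothetical parameter choice for this example
fixed_point = 1.0 - 1.0 / r  # nonzero solution of x = r * x * (1 - x)
multiplier = logistic_map_derivative(fixed_point, r)  # slope at the fixed point
is_locally_stable = abs(multiplier) < 1.0

# Numerical check: a nearby trajectory should approach the fixed point.
trajectory = iterate(lambda x: logistic_map(x, r), 0.1, 100)
```

For r = 2.5 the slope at the fixed point is about -0.5, so the linearized error roughly halves (and flips sign) on every step, and the simulated trajectory settles at x* = 0.6.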

Since the discretization of differential equations is indeed a discrete dynamical system, we will use this as a case study to see how serial scalar-heavy codes should be optimized. SIMD, in-place operations, broadcasting, heap allocations, and static arrays will be used to get fast codes for dynamical system simulation. These simulations will then be used to reveal some intriguing properties of dynamical systems which will be further explored through the rest of the course.

Lecture 5:

Optional Extra Resources

Now that we have a concrete problem, let’s start investigating ways to parallelize its solution. We will first see that many systems have an almost automatic way of parallelizing through array operations, which we will call array-based parallelism. The ability to easily parallelize large blocked linear algebra will be discussed, along with libraries like OpenBLAS, Intel MKL, cuBLAS (GPU parallelism), and Elemental.jl. This gives a form of within-method parallelism which we can use to optimize specific algorithms which utilize linearity. Another form of parallelism is to parallelize over the inputs. We will describe how this is a form of data parallelism, and use this as a framework to introduce shared memory and distributed parallelism. The interactions between these parallelization methods and application considerations will be discussed.
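Parallelizing over the inputs can be sketched as a parallel map: the same simulation runs independently for each initial condition. The example below is a minimal Python illustration using the standard-library `concurrent.futures` module; the `simulate` function, its parameters, and the worker count are hypothetical choices for this sketch. In standard CPython, threads do not speed up CPU-bound pure-Python loops because of the global interpreter lock, so this shows the structure of data parallelism rather than a performance result.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(x0, r=3.5, steps=1000):
    """Run an independent dynamical-system simulation from initial condition x0."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def parallel_map(func, inputs, max_workers=4):
    """Apply func to every input on a pool of workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, inputs))


initial_conditions = [0.1 + 0.01 * i for i in range(16)]

# The serial and parallel versions compute the same thing; only the scheduling differs.
serial_results = [simulate(x0) for x0 in initial_conditions]
parallel_results = parallel_map(simulate, initial_conditions)
```

Because no simulation depends on another, the work needs no communication between workers, which is what makes this pattern carry over to processes or distributed machines.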

Lecture 6: Styles of Parallelism

Here we continue down the line of describing methods of parallelism by giving a high-level overview of the types of parallelism. SIMD and multithreading are reviewed as the basic forms of parallelism where message passing is not a concern. Then accelerators, such as GPUs and TPUs, are introduced. Moving further, distributed parallel computing and its models are showcased. What we will see is that the kind of parallelism we are doing is not actually the main determinant of how we need to think about parallelism. Instead, the determining factor is the parallel programming model, where just a handful of models, like task-based parallelism or SPMD models, are seen across all of the different hardware abstractions.

Lecture 7: Ordinary Differential Equations: Applications and Discretizations

In this lecture we will describe ordinary differential equations, where they arise in scientific contexts, and how they are solved. We will see that understanding the properties of the numerical methods requires understanding the dynamics of the discrete system generated from the approximation to the continuous system, and thus stability of a numerical method is directly tied to the stability properties of the dynamics. This gives the idea of stiffness, which is a larger computational idea about ill-conditioned systems.
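The link between stability and the discrete dynamics can be seen on the linear test equation u' = λu, which is an example chosen for this sketch. Explicit Euler turns it into the recurrence u_{n+1} = (1 + hλ) u_n, and implicit Euler into u_{n+1} = u_n / (1 - hλ). The Python snippet below, with hypothetical values for λ and the step size h, checks when each recurrence decays.

```python
def explicit_euler_factor(h, lam):
    """Per-step growth factor of explicit Euler applied to u' = lam * u."""
    return 1.0 + h * lam


def implicit_euler_factor(h, lam):
    """Per-step growth factor of implicit Euler applied to u' = lam * u."""
    return 1.0 / (1.0 - h * lam)


def run_linear_recurrence(factor, u0, steps):
    """Iterate the discrete dynamical system u_{n+1} = factor * u_n."""
    u = u0
    for _ in range(steps):
        u = factor * u
    return u


lam = -50.0  # fast-decaying mode; the true solution goes to zero
small_h, large_h = 0.01, 0.05

# |1 + h*lam| = 0.5 < 1: the discrete system decays like the continuous one.
explicit_small = run_linear_recurrence(explicit_euler_factor(small_h, lam), 1.0, 100)
# |1 + h*lam| = 1.5 > 1: the discrete system blows up although the ODE is stable.
explicit_large = run_linear_recurrence(explicit_euler_factor(large_h, lam), 1.0, 100)
# |1 / (1 - h*lam)| < 1 for every h > 0 when lam < 0.
implicit_large = run_linear_recurrence(implicit_euler_factor(large_h, lam), 1.0, 100)
```

The step size that explicit Euler can tolerate is set by stability rather than accuracy here, which is the practical symptom of stiffness.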

Lecture 8: Forward-Mode Automatic Differentiation

As we will soon see, the ability to calculate derivatives underpins a lot of problems in both scientific computing and machine learning. We will specifically see it show up in later lectures on solving implicit equations f(x)=0 for stiff ordinary differential equation solvers, and in fitting neural networks. The common high-performance way that this is done is called automatic differentiation. This lecture introduces the methods of forward and reverse mode automatic differentiation to set up future uses of the technique.
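Forward-mode automatic differentiation is often explained with dual numbers: each value carries its derivative along with it, and every arithmetic operation applies the matching differentiation rule. The following is a minimal Python sketch of that idea; the `Dual` class, `derivative` helper, and `example` function are hypothetical names for this illustration and not the API of any particular library.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (derivative zero)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sin(x):
    """Sine with the chain rule applied to the derivative part."""
    x = Dual.lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Differentiate f at x by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv


def example(x):
    return x * x + 3 * x + sin(x)
```

Calling `derivative(example, 2.0)` evaluates 2x + 3 + cos(x) at x = 2 in a single pass through the function, without symbolic manipulation or finite-difference error.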

Lecture 9: Solving Stiff Ordinary Differential Equations

Lecture Notes

Additional Readings on Convergence of Newton’s Method

Solving stiff ordinary differential equations, especially those which arise from partial differential equations, is a common bottleneck of scientific computing. The largest-scale scientific computing models are generally using heavy compute power in order to tackle some implicitly timestepped PDE solve! Thus we will take a deep dive into how the different methods are combined to create a stiff ordinary differential equation solver, looking at different aspects of Jacobian computations and linear solving and the effects that they have.
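The pieces of a stiff solver can be sketched on a scalar problem. Implicit Euler requires solving x - u_n - h f(x) = 0 for the next value x at every step, and Newton's method does so using the derivative of f (the Jacobian, in the scalar case). The Python code below is a minimal illustration; the test equation u' = -100(u^3 - 1), the step size, and all function names are assumptions of this sketch. For systems, the division in the Newton update becomes a linear solve with the matrix I - hJ.

```python
def newton(g, dg, x0, tol=1e-12, max_iters=50):
    """Solve g(x) = 0 by Newton's method starting from x0."""
    x = x0
    for _ in range(max_iters):
        step = g(x) / dg(x)  # scalar stand-in for a linear solve
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def implicit_euler(f, dfdu, u0, h, steps):
    """Integrate u' = f(u) with implicit Euler, solving each step by Newton."""
    u = u0
    history = [u]
    for _ in range(steps):
        u_prev = u
        # Residual of the implicit equation x = u_prev + h * f(x) and its derivative.
        g = lambda x: x - u_prev - h * f(x)
        dg = lambda x: 1.0 - h * dfdu(x)
        u = newton(g, dg, u_prev)  # previous value is the initial guess
        history.append(u)
    return history


def f(u):
    """Stiff scalar right-hand side with a stable equilibrium at u = 1."""
    return -100.0 * (u**3 - 1.0)


def dfdu(u):
    """Derivative of f, playing the role of the Jacobian."""
    return -300.0 * u**2


history = implicit_euler(f, dfdu, u0=2.0, h=0.1, steps=20)
```

With the same step size, a single explicit Euler step from u = 2 would give 2 + 0.1 * f(2) = -68, far from the equilibrium, while the implicit steps relax to u = 1.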

https://youtu.be/XQAe4pEZ6L4

Lecture 10: Basic Parameter Estimation, Reverse-Mode AD, and Inverse Problems

Now that we have models, how do you fit the models to data? This lecture goes through the basic shooting method for parameter estimation, showcases how it’s equivalent to training neural networks, and gives an in-depth discussion of how reverse-mode automatic differentiation is utilized in the training process for the efficient calculation of gradients.
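A toy version of this pipeline can be written in a few dozen lines. The Python sketch below assumes the model u' = -p u discretized with explicit Euler, synthetic data generated from a hypothetical true parameter, and a hand-rolled tape for reverse-mode differentiation. None of these choices come from the lecture, and the `Tape` and `Var` classes are illustrative rather than any library's API. The loss is the squared mismatch between the simulated trajectory and the data (the shooting method), and one backward sweep over the tape yields its gradient with respect to p.

```python
class Tape:
    """Records every intermediate value in the order it was computed."""

    def __init__(self):
        self.nodes = []


class Var:
    """A value on the tape, with links to its inputs and local derivatives."""

    def __init__(self, tape, value, parents=()):
        self.tape, self.value, self.parents, self.grad = tape, value, parents, 0.0
        tape.nodes.append(self)

    def _lift(self, other):
        return other if isinstance(other, Var) else Var(self.tape, float(other))

    def __add__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = self._lift(other)
        return Var(self.tape, self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Propagate d(output)/d(node) from the output back to every input."""
    output.grad = 1.0
    for node in reversed(output.tape.nodes):  # reverse creation order
        for parent, local in node.parents:
            parent.grad += local * node.grad


def loss_and_grad(p_value, data, u0=1.0, h=0.1):
    """Simulate u' = -p*u with explicit Euler and compare against the data."""
    tape = Tape()
    p, u, loss = Var(tape, p_value), Var(tape, u0), Var(tape, 0.0)
    for target in data:
        u = u - (p * u) * h  # one explicit Euler step
        err = u - target
        loss = loss + err * err
    backward(loss)
    return loss.value, p.grad


# Synthetic data from a hypothetical "true" parameter.
true_p, h, n_steps = 1.5, 0.1, 10
data = [(1.0 - h * true_p) ** n for n in range(1, n_steps + 1)]

# Plain gradient descent on the shooting loss.
p_est = 0.5
for _ in range(200):
    _, grad = loss_and_grad(p_est, data, u0=1.0, h=h)
    p_est -= 0.2 * grad
final_loss, _ = loss_and_grad(p_est, data, u0=1.0, h=h)
```

The loop has the same shape as neural network training: a forward pass that evaluates the loss, a reverse pass that produces the gradient, and a parameter update. The cost of the reverse pass does not grow with the number of parameters, which is why reverse mode is preferred when there are many.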