Introduction

EE 290: Hardware for Machine Learning Spring 2020

Taught by Sophia Shao

Lectures

Lecture 9: Mapping

In the News

LLVM started as a research project at the University of Illinois. Its history shows the importance of getting industry to adopt research: Chris Lattner went to Apple, which put LLVM into widespread use.

Review

Dataflow

  • Definition: Dataflow is the execution order of a DNN operation, covering both:
    • Computation Order
    • Data Movement Order

Loops describe the order of execution:

  • for: temporal, one step at a time.
  • spatial_for: spatial, all iterations execute at the same time (sketched below).
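
A minimal sketch of the distinction (Python-style pseudocode; spatial_for is the lecture's notation for hardware parallelism, not a real Python construct):

    out = 0
    # for: temporal. One multiply-accumulate per step on a single unit.
    for i in range(N):
        out += w[i] * ia[i]

    # spatial_for: spatial. N multipliers all fire in the same cycle
    # (pseudocode; this unrolls into hardware rather than iterating in time).
    spatial_for i in range(N):
        out += w[i] * ia[i]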

Convolution Loop Nest
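
The convolution loop nest is, roughly, seven nested for loops. A sketch in plain Python; the dimension names N/K/C/P/Q/R/S (batch, output channels, input channels, output height, output width, filter height, filter width) are the usual convention, assumed here since the slide isn't reproduced:

    def conv_layer(IA, W, OA, N, K, C, P, Q, R, S):
        # Stride-1, no-padding convolution written as a naive loop nest.
        for n in range(N):                  # batch
            for k in range(K):              # output channels
                for p in range(P):          # output rows
                    for q in range(Q):      # output cols
                        for c in range(C):          # input channels
                            for r in range(R):      # filter rows
                                for s in range(S):  # filter cols
                                    OA[n][k][p][q] += W[k][c][r][s] * IA[n][c][p + r][q + s]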

Matrix Multiply
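
Matrix multiply is the same idea with three loops (a minimal, runnable sketch):

    def matmul(A, B, C, M, N, K):
        # C[i][j] += A[i][k] * B[k][j]
        for i in range(M):
            for j in range(N):
                for k in range(K):
                    C[i][j] += A[i][k] * B[k][j]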

Output Stationary

  • Finish the computation for one output before moving on, keeping the output stationary: the partial sum accumulates in place (see the sketch below).
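
In loop-nest terms, output stationary puts the reduction loop innermost, so each output is finished before the next one starts (a sketch, reusing the matmul nest above):

    def matmul_output_stationary(A, B, C, M, N, K):
        for i in range(M):
            for j in range(N):
                acc = 0                       # the output value stays put...
                for k in range(K):            # ...while the reduction runs to completion
                    acc += A[i][k] * B[k][j]
                C[i][j] = acc                 # each output is written exactly once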

Datapath Optimized on a TPU

Each output row is computed in parallel. Each "MAC" (multiply-accumulate unit; inputs and weights move through it) within a row also operates in parallel, as sketched below.
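
In the spatial_for notation, that parallelism looks roughly like this (pseudocode; whether the reduction loop is also spatial depends on the datapath, so this split is an assumption):

    # Every output row, and every MAC within a row, runs in the same cycle.
    spatial_for i in range(M):        # each output row in parallel
        spatial_for j in range(N):    # each MAC within the row in parallel
            for k in range(K):        # inputs and weights stream through over time
                C[i][j] += A[i][k] * B[k][j]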

What if your hardware only has a 2x2 unit?

Chunk the computation: tile it with more outer for loops that step over 2x2 blocks, as sketched below.
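
A sketch of the chunked version for a 2x2 array (pseudocode; assumes M and N are multiples of 2 for brevity):

    TILE = 2
    for i0 in range(0, M, TILE):             # temporal: step over row tiles
        for j0 in range(0, N, TILE):         # temporal: step over column tiles
            for k in range(K):
                spatial_for i in range(TILE):      # the fixed 2x2 hardware unit
                    spatial_for j in range(TILE):
                        C[i0 + i][j0 + j] += A[i0 + i][k] * B[k][j0 + j]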

What about memory?

mvin (move in) the weights and input activations into local memory before computing, then mvout (move out) the output activations afterwards.

With all of the tiling in place, add a shared buffer that sits higher in the memory hierarchy, so tiles are staged close to the compute unit (sketched below).
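
With the data movement written out explicitly (pseudocode; mvin/mvout stand for the move-in/move-out operations above, and the *_tile buffers are the hypothetical local storage):

    for i0 in range(0, M, TILE):
        for j0 in range(0, N, TILE):
            A_tile = mvin(A, i0)                  # move weights / input activations in first
            B_tile = mvin(B, j0)
            C_tile = compute(A_tile, B_tile)      # inner loop nest runs out of the buffer
            mvout(C, C_tile, i0, j0)              # move the output activations back out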

DNN Mapping Problem

Loop Bounds

The loop bounds (tile sizes) are set by the sizes of the buffers: each tile must fit, as sketched below.
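
The constraint is just that one tile of each operand is resident at once (a sketch with hypothetical tile sizes Ti, Tj, Tk and a buffer of buf_elems elements):

    def tiles_fit(Ti, Tj, Tk, buf_elems):
        # One tile of A (Ti x Tk), B (Tk x Tj), and C (Ti x Tj) must all
        # fit at the same time; counts are in elements, not bytes.
        return Ti * Tk + Tk * Tj + Ti * Tj <= buf_elems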

Loop Ordering

The loop ordering determines how much input activation (IA) or weight (W) data is moved, i.e. which operand the inner loop indexes through.

For example, IA is stationary in the first ordering, while W is stationary in the second (both are sketched below).
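
The two orderings differ only in which loop is outermost (a sketch; use() is a hypothetical stand-in for the multiply-accumulate):

    # IA-stationary: each input activation is fetched once, then reused
    # against every weight before moving on.
    for n in range(num_ia):
        for m in range(num_w):
            use(IA[n], W[m])

    # W-stationary: swap the loops, and now each weight is fetched once.
    for m in range(num_w):
        for n in range(num_ia):
            use(IA[n], W[m])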

Spatial Choice

Model Parallelism

Can you parallelize this across multiple devices? Model parallelism splits the model (the weights).

In loop-nest terms: a spatial_for over the weights of size N, one slice per device.

Note: the input activations are then shared across devices, so they must be multicast or duplicated (sketched below).
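
As a sketch (pseudocode; NUM_DEVICES and weight_slice() are hypothetical helpers):

    # Model parallelism: the spatial_for splits the weights across devices.
    spatial_for d in range(NUM_DEVICES):
        for m in weight_slice(d):           # each device owns a slice of W
            out[m] = dot(W[m], IA)          # IA is shared: multicast or duplicate it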

Data Parallelism

Data parallelism splits the data (the input activations) across devices.

Now the weights are the shared operand, so they must be replicated or broadcast (sketched below).
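
The mirror-image sketch (same hypothetical helpers as above):

    # Data parallelism: the spatial_for splits the input activations instead.
    spatial_for d in range(NUM_DEVICES):
        for n in batch_slice(d):            # each device owns a slice of IA
            out[n] = dot(W, IA[n])          # W is shared: replicate it on every device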

As an Optimization Problem

Mapping can be framed as a hardware-software co-optimization problem, though in practice it still involves a lot of manual tuning.
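
In the simplest framing, a mapper searches the space of loop bounds, orderings, and spatial choices for the cheapest legal mapping (a brute-force sketch with hypothetical legal() and cost() models; real mappers prune this space heavily):

    import itertools

    def best_mapping(tile_options, order_options, legal, cost):
        # Enumerate every (tile sizes, loop order) pair, keep the legal ones,
        # and return whichever the cost model likes best.
        candidates = [m for m in itertools.product(tile_options, order_options) if legal(*m)]
        return min(candidates, key=lambda m: cost(*m))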

Lecture 10: FireSim Tutorial

Open-source hardware, ISAs, OSes, and compilers are coming.

Berkeley has a lot of hardware research projects.

FireSim

AWS setup:

  • Create an AWS account.
  • Set up the manager instance.
  • Keep an eye on billing.