LLVM began as a research project at Illinois; its history shows the importance of getting industry to adopt a project. Chris Lattner went to Apple to get it used in production.
- Definition: Dataflow = the execution order of a DNN operation, comprising:
- Computation Order
- Data Movement Order
Loops describe the order of execution:
for: temporal, one step at a time.
spatial_for: spatial, all iterations in parallel at the same time.
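A minimal sketch of the distinction (function names are mine, and `spatial_for` is only modeled, since plain Python has no spatial construct):

```python
def temporal_sum(xs):
    # for: one addition per time step -> len(xs) steps total
    acc = 0
    for x in xs:
        acc += x
    return acc

def spatial_sum(xs):
    # spatial_for: conceptually, len(xs) adders fire in the same step;
    # here approximated with a single built-in reduction
    return sum(xs)

assert temporal_sum([1, 2, 3]) == spatial_sum([1, 2, 3]) == 6
```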
Output Stationary
- Finish the convolution for each output while keeping that output in place (accumulated locally).
Each output row computed in parallel; each MAC (multiply-accumulate, streaming in an input and a weight) in parallel.
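A hedged sketch of an output-stationary 1-D convolution loop nest (my own example, mirroring the pseudocode style above; on hardware the outer loop would be a `spatial_for`, one PE per output):

```python
def conv1d_output_stationary(inputs, weights):
    R = len(weights)
    out = [0] * (len(inputs) - R + 1)
    # spatial_for p: each output is computed by its own PE
    for p in range(len(out)):
        # for r: the partial sum for out[p] stays put ("stationary") while
        # inputs and weights stream past it, one MAC per step
        for r in range(R):
            out[p] += inputs[p + r] * weights[r]
    return out

print(conv1d_output_stationary([1, 2, 3, 4], [1, 1]))  # -> [3, 5, 7]
```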
Tiling: chunk the computation by adding more (nested) for loops.
mvin weights and input activations into local memory before compute;
mvout the output activations afterwards.
Do this for every tile.
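A sketch of tiling with explicit data movement, in the spirit of Gemmini-style mvin/mvout (the `mvin` function and tile size here are stand-ins of mine, not the real ISA):

```python
TILE = 2

def mvin(dram, start, size):
    # stand-in for mvin: copy a chunk from "DRAM" into a local buffer
    return dram[start:start + size]

def tiled_matvec(weights, x):  # weights: list of rows
    out = [0] * len(weights)
    for i0 in range(0, len(weights), TILE):          # tile the output rows
        hi = min(i0 + TILE, len(weights))
        w_tile = [mvin(weights[i], 0, len(x)) for i in range(i0, hi)]
        x_buf = mvin(x, 0, len(x))                   # input activations
        acc = [sum(w * xi for w, xi in zip(row, x_buf)) for row in w_tile]
        out[i0:hi] = acc                             # "mvout" the output tile
    return out

print(tiled_matvec([[1, 0], [0, 1], [1, 1]], [2, 3]))  # -> [2, 3, 5]
```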
Add a shared buffer, higher in the memory hierarchy.
Design knobs: the sizes of the buffers, and how much IA or W is moved (indexed in by pointer).
For example, IA is stationary in the first example, while W is stationary in the second.
Can you parallelize this with multiple devices? Option 1: split the model (the weights).
spatial_for over the weight dimension.
Note: the input activations are shared, so multicast or duplicate them.
Option 2: split the data (the input activations).
Now the weights are shared instead.
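The two splits above can be sketched for a matmul Y = X @ W (a toy two-device setup of mine; both splits reproduce the single-device result):

```python
def matmul(X, W):
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]

X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]

# Model parallel: each "device" owns half the weight columns; X is multicast.
W_dev0 = [[row[0]] for row in W]
W_dev1 = [[row[1]] for row in W]
Y_model = [a + b for a, b in zip(matmul(X, W_dev0), matmul(X, W_dev1))]

# Data parallel: each "device" owns half the batch; W is shared (duplicated).
Y_data = matmul(X[:1], W) + matmul(X[1:], W)

assert Y_model == Y_data == matmul(X, W)  # -> [[19, 22], [43, 50]]
```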
Hardware-software co-optimization works well here, but it requires lots of manual tuning.