Technology


Efficiency

Parameter and Activation Sparsity


Many neural networks are highly overparameterized. By introducing parameter sparsity during training, model size and the number of operations can each be reduced by a factor of 10.
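
As a rough illustration of how parameter sparsity can be introduced during training (a generic magnitude-pruning sketch, not necessarily Femtosense's own training utilities), the snippet below zeroes the smallest 90% of weights and keeps them at zero, which is where the roughly 10x reduction in model size and operations comes from:

    import numpy as np

    def magnitude_prune_mask(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
        """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
        threshold = np.quantile(np.abs(weights), sparsity)
        return (np.abs(weights) > threshold).astype(weights.dtype)

    # During sparse training, the mask is re-applied after each weight update
    # so that pruned parameters stay at exactly zero.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256))
    w *= magnitude_prune_mask(w, sparsity=0.9)   # ~90% of entries are now zero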

Many neural networks also perform a large number of unnecessary operations. By introducing activation sparsity during training, the number of computations can be reduced by another factor of 10.
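
One simple way to picture activation sparsity during training (again, an illustrative sketch rather than a specific Femtosense method) is a top-k activation that keeps only the largest 10% of responses in a layer, so the remaining ~90% of downstream multiplications involve a zero input and can be skipped:

    import numpy as np

    def topk_activation(x: np.ndarray, keep_fraction: float = 0.1) -> np.ndarray:
        """Zero all but the largest `keep_fraction` of activations in each row."""
        k = max(1, int(keep_fraction * x.shape[-1]))
        kth_largest = np.partition(np.abs(x), -k, axis=-1)[..., -k]
        return np.where(np.abs(x) >= kth_largest[..., None], x, 0.0)

    acts = np.random.default_rng(1).normal(size=(4, 512))
    sparse_acts = topk_activation(acts)   # ~90% of activations are exactly zero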

Combined, these two forms of sparsity (dual sparsity) can reduce power by 100x and required memory by 10x. To realize these gains, we've designed training utilities that make it easy to create high-sparsity models without harming performance.
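
The two savings compound because a multiply-accumulate can be skipped whenever either its weight or its activation is zero. A back-of-the-envelope estimate, assuming 90% sparsity on each side and roughly independent sparsity patterns:

    param_density = 0.1        # 10% of weights survive pruning  -> ~10x smaller model
    activation_density = 0.1   # 10% of activations are nonzero  -> ~10x fewer live inputs

    surviving_ops = param_density * activation_density   # 0.01 -> ~100x fewer operations
    weight_storage = param_density                        # ~10x less memory for weights
    print(f"~{1 / surviving_ops:.0f}x fewer operations, ~{1 / weight_storage:.0f}x less memory")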

Sparse math saves 100x on operations and 10x on storage

Speed

First-Class Sparsity Support and Near-Memory Compute


Dual sparsity gains cannot be realized without hardware support. We've designed the SPU around sparse data formats, so models stay compressed in memory and zero-skipping is exploited at runtime.
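
To make "compressed in memory, zero-skipping at runtime" concrete, here is a minimal CSR-style sketch (a generic compressed format used for illustration; the SPU's actual on-chip layout is not described here). Only nonzero weights and their column indices are stored, and the inner loop also skips any term whose activation is zero:

    import numpy as np

    def to_csr(dense: np.ndarray):
        """Store only the nonzero weights of each row plus their column indices."""
        values, columns, row_ptr = [], [], [0]
        for row in dense:
            nz = np.flatnonzero(row)
            values.extend(row[nz])
            columns.extend(nz)
            row_ptr.append(len(values))
        return np.array(values), np.array(columns), np.array(row_ptr)

    def sparse_matvec(values, columns, row_ptr, x):
        """Matrix-vector product that skips zero weights and zero activations."""
        y = np.zeros(len(row_ptr) - 1)
        for i in range(len(y)):
            for j in range(row_ptr[i], row_ptr[i + 1]):   # only nonzero weights stored
                if x[columns[j]] != 0.0:                  # zero-skip on the activation side
                    y[i] += values[j] * x[columns[j]]
        return y

    dense = np.array([[0.0, 2.0, 0.0], [1.5, 0.0, 0.0]])
    y = sparse_matvec(*to_csr(dense), np.array([0.0, 3.0, 4.0]))   # equals dense @ x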

Accessing off-chip memory uses significantly more power than the compute operations themselves. Moving memory on-chip minimizes data motion and reduces power. Breaking that memory into smaller banks placed near each processing element further reduces data motion and eliminates memory bottlenecks.
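
A toy picture of the banked, near-memory layout (purely illustrative; the real SPU microarchitecture is not detailed here): the weight matrix is partitioned across processing elements, each of which keeps its slice in a local bank, so every multiply reads its weights from the bank sitting next to the element that uses them:

    import numpy as np

    class ProcessingElement:
        """Toy PE that holds its slice of the weights in a local memory bank."""
        def __init__(self, weight_rows: np.ndarray):
            self.local_bank = weight_rows            # stays next to this PE

        def matvec(self, x: np.ndarray) -> np.ndarray:
            return self.local_bank @ x               # compute only touches the local bank

    weights = np.random.default_rng(2).normal(size=(8, 16))
    x = np.random.default_rng(3).normal(size=16)
    pes = [ProcessingElement(rows) for rows in np.array_split(weights, 4)]
    y = np.concatenate([pe.matvec(x) for pe in pes])  # same result as weights @ x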

Because of dual sparsity, large, complex models now fit in on-chip memory, bringing new use cases to smaller form factors without affecting battery life.

Femtosense hardware provides first-class support for sparsity with a near-memory compute architecture