Conclusion and Future Work
- achievable speedups of 17.6× and 9.0× (hypothetical infinite-compute system)
- lower bound
- Linux driver implementation
- comparison with real neural network workloads
- consider replacing library approach with compiler approach
- power comparison (power models needed)