## Conclusion and Future Work
- achievable speedup of 17.6× and 9.0× hypothetical infinite compute system
- lower bound
- Linux driver implementation
- comparison with real neural network workloads
- consider replacing library approach with compiler approach
- power comparison, power models needed