Open
Description
Running on a DISCO-L475VG-IOT01A development board (Cortex-M4F), a quantized network is much slower than a non-quantized one. A simple 33x20x10x5 MLP with ReLU activations between the dense layers and softmax on the output layer takes ~330 ms with quantization, but only ~43 ms without quantization. This is with all layers in flash.
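To put the timings in perspective, here is a rough operation count, a sketch based only on the layer sizes quoted above. It suggests the network is tiny (on the order of 910 multiply-accumulates per inference), so both timings are plausibly dominated by per-layer overhead such as flash access or quantize/dequantize steps rather than by the arithmetic itself.

```python
# Back-of-envelope MAC (multiply-accumulate) count for the 33x20x10x5 MLP
# described above. Layer widths come from the issue; biases and activation
# costs are ignored, so this is a lower bound on the work per inference.
layers = [33, 20, 10, 5]

# MACs per dense layer: input_width * output_width
macs_per_layer = [a * b for a, b in zip(layers, layers[1:])]
total_macs = sum(macs_per_layer)

print(macs_per_layer)  # [660, 200, 50]
print(total_macs)      # 910
```

At ~910 MACs, even a conservative estimate of a few cycles per MAC on a Cortex-M4F accounts for well under a millisecond of arithmetic, which is far below either measured latency.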
Is this a known issue, and are there any ideas on how to make it faster? Could it be flash overhead?