Open
Description
Running on a DISCO-L475VG-IOT01A development board (Cortex-M4F), a quantized network is much slower than a non-quantized one. A simple 33x20x10x5 MLP with ReLU activations between the dense layers and softmax on the output layer takes ~330 ms with quantization, but only ~43 ms without quantization. This is with all layers in flash.
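To put the timings in perspective, here is a rough operation count, a sketch based only on the layer sizes quoted above. It suggests the network is tiny (on the order of 910 multiply-accumulates per inference), so both timings are plausibly dominated by per-layer overhead such as flash access or quantize/dequantize steps rather than by the arithmetic itself.

```python
# Back-of-envelope MAC (multiply-accumulate) count for the 33x20x10x5 MLP
# described above. Layer widths come from the issue; biases and activation
# costs are ignored, so this is a lower bound on the work per inference.
layers = [33, 20, 10, 5]

# MACs per dense layer: input_width * output_width
macs_per_layer = [a * b for a, b in zip(layers, layers[1:])]
total_macs = sum(macs_per_layer)

print(macs_per_layer)  # [660, 200, 50]
print(total_macs)      # 910
```

At ~910 MACs, even a conservative estimate of a few cycles per MAC on a Cortex-M4F accounts for well under a millisecond of arithmetic, which is far below either measured latency.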
Is this a known issue, and are there any ideas on how to make it faster? Could it be flash overhead?