Quantized ops are slow #177

Open
@janjongboom

Description

Running on a DISCO-L475VG-IOT01A development board (Cortex-M4F), a quantized network is much slower than a non-quantized one. A simple 33x20x10x5 MLP with ReLU activations between the dense layers and a softmax output layer takes ~330 ms with quantization, but only ~43 ms without. This is with all layers in flash.

Is this a known issue, and are there any ideas on how to make it faster? Could it be the flash overhead?
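
For scale, here is a rough back-of-the-envelope sketch of how many CPU cycles each multiply-accumulate (MAC) costs on average at the timings above. The layer sizes and latencies come from the description. The 80 MHz clock is an assumption (it is not stated here), and the helper names `mac_count` and `cycles_per_mac` are made up for illustration. Biases and activations are ignored.

```python
# Layer widths of the MLP described above: 33 inputs -> 20 -> 10 -> 5 outputs.
LAYER_SIZES = [33, 20, 10, 5]

# ASSUMPTION: core clock of 80 MHz. This is not stated in the issue;
# change it to match the actual board configuration.
ASSUMED_CLOCK_HZ = 80_000_000


def mac_count(sizes):
    """Multiply-accumulates for one forward pass through the dense layers."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def cycles_per_mac(latency_ms, sizes, clock_hz=ASSUMED_CLOCK_HZ):
    """Average CPU cycles spent per MAC for a measured inference latency."""
    total_cycles = latency_ms / 1000 * clock_hz
    return total_cycles / mac_count(sizes)


macs = mac_count(LAYER_SIZES)                  # 33*20 + 20*10 + 10*5 = 910
quantized = cycles_per_mac(330, LAYER_SIZES)   # ~330 ms with quantization
floating = cycles_per_mac(43, LAYER_SIZES)     # ~43 ms without quantization

print(f"MACs per inference:        {macs}")
print(f"cycles per MAC, quantized: {quantized:.0f}")
print(f"cycles per MAC, float:     {floating:.0f}")
print(f"slowdown:                  {quantized / floating:.1f}x")
```

Under that assumed clock, both cases come out at thousands of cycles per MAC, which suggests the time goes to per-op overhead rather than the arithmetic itself. The slowdown ratio (330/43, roughly 7.7x) does not depend on the clock assumption.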
