Project Overview
This project demonstrates the complete pipeline for importing a PyTorch 7-layer MLP model from ONNX format
into MATLAB, rebuilding it as a native dlnetwork, quantizing to INT8, and generating embedded C code
through Simulink. The Option 5 (Native Rebuild) approach was used because direct ONNX import
produces custom autogenerated layers that block quantization and Simulink export.
Deployment Pipeline
model2.onnx → ONNX import attempts (custom layers block the direct path; weights extracted OK) → native dlnetwork rebuild (14 layers) → INT8 quantization (~56% rel. error) → Simulink layer blocks → ERT C code
Summary Table
| Property | Value |
|---|---|
| Model Architecture | 7-layer MLP (5 → 50 → 50 → 50 → 50 → 50 → 50 → 5) |
| ONNX Operators | MatMul, Add, Relu (opset 11) |
| Import Method | Option 5: Native dlnetwork Rebuild |
| Why Option 5? | importNetworkFromONNX creates a single custom autogenerated layer |
| Equivalence (Max) | 3.36e-4 (single-precision FP accumulation) |
| Equivalence (Relative) | 3.7e-7 |
| INT8 Quantization | Mean abs error: 248.7 (wide output range 230-450) |
| Simulink Models | FP32 (15 blocks) + INT8 (17 blocks) |
| C Code | Embedded Coder ERT with SSE2 intrinsics |
Network Architecture
ONNX Model (Original)
The ONNX model uses raw MatMul + Add + Relu operators (not ONNX Gemm), causing MATLAB importers to collapse everything into a single custom layer.
| Layer | Operation | Weight | Bias | Params |
|---|---|---|---|---|
| 1 | MatMul+Add+ReLU | 5×50 | 50 | 300 |
| 2 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 3 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 4 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 5 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 6 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 7 | MatMul+Add | 50×5 | 5 | 255 |
| Total | | | | 13,305 |
Native dlnetwork (Rebuilt)
The rebuilt network uses standard MATLAB layers that support quantization, Simulink export, and code generation.
Import Comparison
| Import Method | Result | Quantization | Simulink Export | Status |
|---|---|---|---|---|
| importNetworkFromONNX | Single custom autogenerated layer | Blocked | Blocked | FAIL |
| importONNXNetwork (legacy) | Requires format args, still custom layer | Blocked | Blocked | FAIL |
| importONNXFunction | Clean weight extraction, reference model | N/A (function) | N/A (function) | PASS |
| Option 5: Native Rebuild | 14 native layers, all standard types | Supported | Supported | BEST |
Weight Transfer Mapping
ONNX computes MatMul(X, W) where X is a row vector and W is [inputSize × outputSize].
MATLAB's fullyConnectedLayer computes Weights * X where X is a column vector and Weights is [outputSize × inputSize].
Therefore: Weights = Wᵀ (transpose).
| ONNX Weight | Shape | MATLAB Layer | Weights Shape | Bias Shape |
|---|---|---|---|---|
| W0 | [5, 50] | fc1 | [50, 5] | [50, 1] |
| W1 | [50, 50] | fc2 | [50, 50] | [50, 1] |
| W2 | [50, 50] | fc3 | [50, 50] | [50, 1] |
| W3 | [50, 50] | fc4 | [50, 50] | [50, 1] |
| W4 | [50, 50] | fc5 | [50, 50] | [50, 1] |
| W5 | [50, 50] | fc6 | [50, 50] | [50, 1] |
| W6 | [50, 5] | fc7 | [5, 50] | [5, 1] |
Numerical Equivalence Validation
Analysis
1,000 random test vectors (5-dimensional, normally distributed, single-precision) were passed through both the
original ONNX model (via importONNXFunction) and the rebuilt native dlnetwork.
| Metric | Value | Status |
|---|---|---|
| Max Absolute Error | 3.36e-4 | PASS |
| Mean Absolute Error | 9.37e-5 | PASS |
| Std Absolute Error | 4.87e-5 | PASS |
| 99th Percentile | 2.44e-4 | PASS |
| Mean Relative Error | 3.7e-7 | PASS |
All errors are within the expected single-precision floating-point tolerance.
The small absolute errors (~1e-4) arise from the different computation orders of
onnxMatMul (X × W) and fullyConnectedLayer (Wᵀ × X),
which are mathematically identical but accumulate different floating-point rounding through 7 layers.
INT8 Quantization
Why High Error?
The model outputs range from approximately 230 to 450 (a span of ~220 units). INT8 has only 256 quantization levels, so each step represents ~0.86 units of output range. This coarse granularity causes substantial quantization noise for this regression model.
Calibration Ranges
| Layer | Type | Min | Max | Range |
|---|---|---|---|---|
| input | Activation | -3.23 | 4.25 | 7.49 |
| fc1 | Activation | -1.88 | 2.76 | 4.65 |
| fc4 | Activation | -55.4 | 48.8 | 104.1 |
| fc6 | Activation | -205.8 | 8.8 | 214.6 |
| fc7 (output) | Activation | 230.9 | 449.6 | 218.6 |
Note the rapid growth in activation range deeper in the network, from roughly ±4 at the input to ±200 and beyond at later layers.
Recommendations
- Output normalization before quantization
- FP16 half-precision for better accuracy
- Mixed-precision: keep output layer in FP32
- Quantization-aware training (QAT)
Simulink: FP32 vs INT8 Models
| Feature | FP32 Model | INT8 Model |
|---|---|---|
| Simulink File | model2_simulink.slx | model2_simulink_int8.slx |
| Block Count | 15 | 17 (+2 cast blocks) |
| Data Type | double (64-bit) | int8 (8-bit) |
| Extra Blocks | None | fc1_in_cast, fc7_out_cast |
| Accuracy | Full precision | ~57% relative error |
| Memory Footprint | ~107 KB (weights) | ~27 KB (weights) |
Simulink Deployment
FP32 Simulink Model
Exported using exportNetworkToSimulink with ExpandNetworkSubsystem=true,
creating individual layer blocks.
| Block | Type |
|---|---|
| input | Inport |
| fc1 — fc7 | Subsystem (Matrix Multiply + Add) |
| relu1 — relu6 | Subsystem (Max with 0) |
| fc7_out | Outport |
INT8 Simulink Model
Quantized model adds data type conversion blocks at input/output boundaries.
| Block | Type |
|---|---|
| input | Inport |
| fc1_in_cast | Data Type Conversion (FP → INT8) |
| fc1 — fc7 | Subsystem (INT8 arithmetic) |
| relu1 — relu6 | Subsystem (INT8 ReLU) |
| fc7_out_cast | Data Type Conversion (INT8 → FP) |
| fc7_out | Outport |
C Code Generation Configuration
| Parameter | Value |
|---|---|
| System Target File | ert.tlc (Embedded Coder) |
| Solver | FixedStepDiscrete |
| Fixed Step Size | 1 |
| Target Language | C |
| Code Only | Yes (no compilation) |
| Hardware Target | Intel x86-64 |
Generated File Sizes
Generated C Code
FP32 Model — Step Function
The main inference function processes input through 7 FC layers with SSE2 SIMD intrinsics for vectorized computation.
model2_simulink.c — model2_simulink_step()
```c
/* Model step function */
void model2_simulink_step(void)
{
  __m128d tmp_1;
  real_T rtb_Add_b[50];
  real_T rtb_Max[50];
  real_T tmp[5];
  real_T tmp_0;
  int32_T i;
  int32_T i_0;

  /* FC Layer 1: input(5) -> hidden(50) */
  for (i_0 = 0; i_0 < 50; i_0++) {
    tmp_0 = 0.0;
    for (i = 0; i < 5; i++) {
      tmp_0 += model2_simulink_ConstP.Weights_Value[50 * i + i_0] *
        model2_simulink_U.input[i];
    }

    rtb_Add_b[i_0] = tmp_0 + model2_simulink_ConstP.Bias_Value[i_0];
  }

  /* ReLU 1 */
  model2_simulink_relu1(rtb_Add_b, rtb_Max);

  /* FC Layers 2-6: hidden(50) -> hidden(50) with ReLU */
  /* ... (repeated pattern for each layer) ... */

  /* FC Layer 7: hidden(50) -> output(5) */
  /* Uses SSE2 intrinsics for vectorized MAC operations */
}
```
FP32 Model — ReLU Function
model2_simulink.c — model2_simulink_relu1()
```c
/* Shared ReLU implementation for all 6 hidden layers */
void model2_simulink_relu1(const real_T rtu_In1[50], real_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    rty_Out1[i] = fmax(rtu_In1[i], 0.0);
  }
}
```
INT8 Model — Step Function
The INT8 model uses integer arithmetic throughout, with data type conversion at boundaries.
model2_simulink_int8.c — model2_simulink_int8_step()
```c
/* Model step function (INT8) */
void model2_simulink_int8_step(void)
{
  int32_T i;
  int32_T i_0;
  int32_T tmp_0;
  int8_T rtb_DataTypeConversion_m[50];
  int8_T rtb_Max_p[50];
  int8_T tmp[50];

  /* FC Layer 1 with INT8 weights and activations */
  for (i = 0; i < 50; i++) {
    tmp_0 = 0;
    for (i_0 = 0; i_0 < 5; i_0++) {
      tmp_0 += model2_simulink_int8_ConstP.Weights_Value_f[50 * i_0 + i] *
        model2_simulink_int8_U.input[i_0];
    }

    /* Saturate to INT8 range [-128, 127] */
    /* ... */
  }

  /* INT8 ReLU: simple comparison with 0 */
  model2_simulink_int8_relu1(rtb_DataTypeConversion_m, rtb_Max_p);
}
```
INT8 Model — ReLU Function
model2_simulink_int8.c — model2_simulink_int8_relu1()
```c
/* INT8 ReLU - branch instead of fmax */
void model2_simulink_int8_relu1(const int8_T rtu_In1[50], int8_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    int8_T u0;
    u0 = rtu_In1[i];
    if (u0 >= 0) {
      rty_Out1[i] = u0;
    } else {
      rty_Out1[i] = 0;
    }
  }
}
```
Generated Files
FP32 Code
| File | Description |
|---|---|
| model2_simulink.c | Main step function with SSE2 |
| model2_simulink_data.c | Weights & biases as const arrays |
| model2_simulink.h | Type definitions & externs |
| model2_simulink_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |
INT8 Code
| File | Description |
|---|---|
| model2_simulink_int8.c | Main step function (integer ops) |
| model2_simulink_int8_data.c | Quantized weights as int8 arrays |
| model2_simulink_int8.h | Type definitions & externs |
| model2_simulink_int8_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |