Project Overview

This project demonstrates the complete pipeline for importing a PyTorch 7-layer MLP model from ONNX format into MATLAB, rebuilding it as a native dlnetwork, quantizing to INT8, and generating embedded C code through Simulink. The Option 5 (Native Rebuild) approach was used because direct ONNX import produces custom autogenerated layers that block quantization and Simulink export.

7 FC layers · 13,305 parameters · 3.4e-4 max equivalence error · INT8 quantization · ERT C code target

Deployment Pipeline

PyTorch → model2.onnx
model2.onnx → importONNXNetwork: custom layers (dead end)
model2.onnx → importONNXFunction: weights extracted OK
Native dlnetwork (14 layers) → INT8 quantize (56% rel. error) → Simulink (layer blocks) → Embedded C (ERT)

Summary Table

| Property | Value |
|---|---|
| Model Architecture | 7-layer MLP (5 → 50 → 50 → 50 → 50 → 50 → 50 → 5) |
| ONNX Operators | MatMul, Add, Relu (opset 11) |
| Import Method | Option 5: Native dlnetwork Rebuild |
| Why Option 5? | importNetworkFromONNX creates a single custom autogenerated layer |
| Equivalence (Max) | 3.36e-4 (single-precision FP accumulation) |
| Equivalence (Relative) | 3.7e-7 |
| INT8 Quantization | Mean abs error: 248.7 (wide output range 230–450) |
| Simulink Models | FP32 (15 blocks) + INT8 (17 blocks) |
| C Code | Embedded Coder ERT with SSE2 intrinsics |

Network Architecture

ONNX Model (Original)

The ONNX model uses raw MatMul + Add + Relu operators (not ONNX Gemm), causing MATLAB importers to collapse everything into a single custom layer.

| Layer | Operation | Weight | Bias | Params |
|---|---|---|---|---|
| 1 | MatMul+Add+ReLU | 5×50 | 50 | 300 |
| 2 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 3 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 4 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 5 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 6 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 7 | MatMul+Add | 50×5 | 5 | 255 |
| Total | | | | 13,305 |
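The per-layer counts above can be sanity-checked with a short script (a Python sketch; the actual pipeline is MATLAB):

```python
# Parameter count for the 7-layer MLP, using the shapes from the table above:
# each fully connected layer contributes in*out weights plus out biases.
layer_dims = [(5, 50)] + [(50, 50)] * 5 + [(50, 5)]

total = sum(n_in * n_out + n_out for (n_in, n_out) in layer_dims)
print(total)  # 13305
```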

Native dlnetwork (Rebuilt)

The rebuilt network uses standard MATLAB layers that support quantization, Simulink export, and code generation.

featureInputLayer(5)
fullyConnectedLayer(50) — fc1
reluLayer — relu1
fullyConnectedLayer(50) — fc2
reluLayer — relu2
fullyConnectedLayer(50) — fc3
reluLayer — relu3
fullyConnectedLayer(50) — fc4
reluLayer — relu4
fullyConnectedLayer(50) — fc5
reluLayer — relu5
fullyConnectedLayer(50) — fc6
reluLayer — relu6
fullyConnectedLayer(5) — fc7

Import Comparison

| Import Method | Result | Quantization | Simulink Export | Status |
|---|---|---|---|---|
| importNetworkFromONNX | Single custom autogenerated layer | Blocked | Blocked | FAIL |
| importONNXNetwork (legacy) | Requires format args, still custom layer | Blocked | Blocked | FAIL |
| importONNXFunction | Clean weight extraction, reference model | N/A (function) | N/A (function) | PASS |
| Option 5: Native Rebuild | 14 native layers, all standard types | Supported | Supported | BEST |

Weight Transfer Mapping

ONNX computes MatMul(X, W) where W is [inputSize × outputSize]. MATLAB's fullyConnectedLayer computes Weights * X where Weights is [outputSize × inputSize]. Therefore Weights = Wᵀ (the ONNX weight, transposed).
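The transpose relationship can be verified numerically with a NumPy sketch (illustrative shapes only; the real transfer happens in MATLAB):

```python
import numpy as np

# The ONNX graph computes y = x @ W with W of shape [in, out];
# fullyConnectedLayer computes Weights @ x with Weights of shape [out, in].
# Setting Weights = W.T makes the two identical.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 50)).astype(np.float32)   # like ONNX weight W0
b = rng.standard_normal(50).astype(np.float32)
x = rng.standard_normal(5).astype(np.float32)

y_onnx = x @ W + b        # ONNX MatMul + Add
y_matlab = W.T @ x + b    # fullyConnectedLayer with Weights = W.T

assert np.allclose(y_onnx, y_matlab)
```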

| ONNX Weight | Shape | MATLAB Layer | Weights Shape | Bias Shape |
|---|---|---|---|---|
| W0 | [5, 50] | fc1 | [50, 5] | [50, 1] |
| W1 | [50, 50] | fc2 | [50, 50] | [50, 1] |
| W2 | [50, 50] | fc3 | [50, 50] | [50, 1] |
| W3 | [50, 50] | fc4 | [50, 50] | [50, 1] |
| W4 | [50, 50] | fc5 | [50, 50] | [50, 1] |
| W5 | [50, 50] | fc6 | [50, 50] | [50, 1] |
| W6 | [50, 5] | fc7 | [5, 50] | [5, 1] |

Numerical Equivalence Validation

1,000 test samples · 3.36e-4 max abs error · 9.37e-5 mean abs error · 3.7e-7 mean relative error

Analysis

1,000 random test vectors (5-dimensional, normally distributed, single-precision) were passed through both the original ONNX model (via importONNXFunction) and the rebuilt native dlnetwork.
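The metrics reported here can be computed with a helper along these lines (a hypothetical Python sketch with synthetic data; the actual comparison was run in MATLAB):

```python
import numpy as np

def equivalence_metrics(y_ref, y_new):
    """Absolute and relative error statistics between two model outputs."""
    err = np.abs(y_ref - y_new)
    rel = err / (np.abs(y_ref) + 1e-12)   # guard against division by zero
    return {
        "max_abs": err.max(),
        "mean_abs": err.mean(),
        "std_abs": err.std(),
        "p99_abs": np.percentile(err, 99),
        "mean_rel": rel.mean(),
    }

# Synthetic stand-ins for the ONNX reference and rebuilt-network outputs:
rng = np.random.default_rng(1)
y_ref = rng.standard_normal((1000, 5)).astype(np.float32)
y_new = y_ref + np.float32(1e-4) * rng.standard_normal((1000, 5)).astype(np.float32)

m = equivalence_metrics(y_ref.astype(np.float64), y_new.astype(np.float64))
```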

| Metric | Value | Status |
|---|---|---|
| Max Absolute Error | 3.36e-4 | PASS |
| Mean Absolute Error | 9.37e-5 | PASS |
| Std Absolute Error | 4.87e-5 | PASS |
| 99th Percentile | 2.44e-4 | PASS |
| Mean Relative Error | 3.7e-7 | PASS |

All errors are within expected single-precision floating-point tolerance. The small absolute errors (~1e-4) arise from the different computation order of onnxMatMul (Xᵀ × W) and fullyConnectedLayer (W × X): the two are mathematically identical but accumulate different floating-point rounding across 7 layers.
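The effect of evaluation order can be demonstrated with a NumPy sketch (random weights, illustrative only; differences at this scale are expected, not a bug):

```python
import numpy as np

# The same ReLU-MLP chain evaluated as x @ W versus (W.T @ x.T).T in
# float32: mathematically identical, but rounding accumulates differently.
rng = np.random.default_rng(0)
dims = [5, 50, 50, 50, 50, 50, 50, 5]
Ws = [rng.standard_normal((i, o)).astype(np.float32) * 0.2
      for i, o in zip(dims[:-1], dims[1:])]

x = rng.standard_normal((1, 5)).astype(np.float32)
a, b = x, x
for W in Ws:
    a = np.maximum(a @ W, 0)           # "X @ W" order (ONNX-style)
    b = np.maximum((W.T @ b.T).T, 0)   # "W @ X" order (fullyConnectedLayer-style)

print(np.max(np.abs(a - b)))           # tiny, typically nonzero
```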

INT8 Quantization

248.7 mean abs error · 348.6 max abs error · ~57% relative error

Why High Error?

The model outputs range from approximately 230 to 450 (a span of ~220 units). INT8 has only 256 quantization levels, so each step represents ~0.86 units of output range. This coarse granularity causes substantial quantization noise for this regression model.
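The granularity figure follows directly from the numbers above (a quick arithmetic check in Python):

```python
# One INT8 step across the observed output span (values from the text above).
out_min, out_max = 230.0, 450.0
levels = 256                          # distinct INT8 codes
step = (out_max - out_min) / levels   # output units per quantization level
print(round(step, 2))  # -> 0.86
```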

Calibration Ranges

| Layer | Type | Min | Max | Range |
|---|---|---|---|---|
| input | Activation | -3.23 | 4.25 | 7.49 |
| fc1 | Activation | -1.88 | 2.76 | 4.65 |
| fc4 | Activation | -55.4 | 48.8 | 104.1 |
| fc6 | Activation | -205.8 | 8.8 | 214.6 |
| fc7 (output) | Activation | 230.9 | 449.6 | 218.6 |

Note the exponential growth in activation ranges through the network, from ±4 at input to ±200+ at deeper layers.
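One way to see the consequence of this growth is to convert each calibrated range into an implied per-tensor scale (a sketch assuming the common symmetric convention scale = max(|min|, |max|) / 127; the toolbox's actual scheme may differ):

```python
# Calibration ranges from the table above.
ranges = {
    "input": (-3.23, 4.25),
    "fc1":   (-1.88, 2.76),
    "fc4":   (-55.4, 48.8),
    "fc6":   (-205.8, 8.8),
    "fc7":   (230.9, 449.6),
}

# Symmetric INT8: one quantization step covers scale units of activation.
scales = {k: max(abs(lo), abs(hi)) / 127.0 for k, (lo, hi) in ranges.items()}
for k, s in scales.items():
    print(f"{k}: one INT8 step = {s:.3f} units")
```

The output layer's step is roughly 100× coarser than the input's, which is why the regression outputs suffer most.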

Recommendations

  • Output normalization before quantization
  • FP16 half-precision for better accuracy
  • Mixed-precision: keep output layer in FP32
  • Quantization-aware training (QAT)

Simulink: FP32 vs INT8 Models

| Feature | FP32 Model | INT8 Model |
|---|---|---|
| Simulink File | model2_simulink.slx | model2_simulink_int8.slx |
| Block Count | 15 | 17 (+2 cast blocks) |
| Data Type | double (64-bit) | int8 (8-bit) |
| Extra Blocks | None | fc1_in_cast, fc7_out_cast |
| Accuracy | Full precision | ~57% relative error |
| Memory Footprint | ~107 KB (weights) | ~27 KB (weights) |

Generated C Code

FP32 Model — Step Function

The main inference function processes input through 7 FC layers with SSE2 SIMD intrinsics for vectorized computation.

model2_simulink.c — model2_simulink_step()
/* Model step function */
void model2_simulink_step(void)
{
  __m128d tmp_1;
  real_T rtb_Add_b[50];
  real_T rtb_Max[50];
  real_T tmp[5];
  real_T tmp_0;
  int32_T i;
  int32_T i_0;

  /* FC Layer 1: input(5) -> hidden(50) */
  for (i_0 = 0; i_0 < 50; i_0++) {
    tmp_0 = 0.0;
    for (i = 0; i < 5; i++) {
      tmp_0 += model2_simulink_ConstP.Weights_Value[50 * i + i_0] *
        model2_simulink_U.input[i];
    }
    rtb_Add_b[i_0] = tmp_0 + model2_simulink_ConstP.Bias_Value[i_0];
  }

  /* ReLU 1 */
  model2_simulink_relu1(rtb_Add_b, rtb_Max);

  /* FC Layers 2-6: hidden(50) -> hidden(50) with ReLU */
  /* ... (repeated pattern for each layer) ... */

  /* FC Layer 7: hidden(50) -> output(5) */
  /* Uses SSE2 intrinsics for vectorized MAC operations */
}
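The indexing in the FC loop is worth unpacking: `Weights_Value[50 * i + i_0]` is consistent with the [50 × 5] weight matrix stored column-major (MATLAB's native layout), so the loop is just y = W·x + b. A NumPy sketch of that reading (random data, illustrative only):

```python
import numpy as np

# Assumption: Weights_Value is the [50 x 5] matrix flattened column-major,
# so index 50*i + i0 addresses element (row i0, column i).
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 5))
b = rng.standard_normal(50)
x = rng.standard_normal(5)

weights_value = W.flatten(order="F")   # column-major, like the C const array

y = np.empty(50)
for i0 in range(50):                   # mirrors the generated C loop
    acc = 0.0
    for i in range(5):
        acc += weights_value[50 * i + i0] * x[i]
    y[i0] = acc + b[i0]

assert np.allclose(y, W @ x + b)       # identical to the matrix-vector form
```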

FP32 Model — ReLU Function

model2_simulink.c — model2_simulink_relu1()
/* Shared ReLU implementation for all 6 hidden layers */
void model2_simulink_relu1(const real_T rtu_In1[50], real_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    rty_Out1[i] = fmax(rtu_In1[i], 0.0);
  }
}

INT8 Model — Step Function

The INT8 model uses integer arithmetic throughout, with data type conversion at boundaries.

model2_simulink_int8.c — model2_simulink_int8_step()
/* Model step function (INT8) */
void model2_simulink_int8_step(void)
{
  int32_T i;
  int32_T i_0;
  int32_T tmp_0;
  int8_T rtb_DataTypeConversion_m[50];
  int8_T rtb_Max_p[50];
  int8_T tmp[50];

  /* FC Layer 1 with INT8 weights and activations */
  for (i = 0; i < 50; i++) {
    tmp_0 = 0;
    for (i_0 = 0; i_0 < 5; i_0++) {
      tmp_0 += model2_simulink_int8_ConstP.Weights_Value_f[50 * i_0 + i] *
        model2_simulink_int8_U.input[i_0];
    }
    /* Saturate to INT8 range [-128, 127] */
    /* ... */
  }

  /* INT8 ReLU: simple comparison with 0 */
  model2_simulink_int8_relu1(rtb_DataTypeConversion_m, rtb_Max_p);
}
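The elided saturation step clamps the INT32 accumulator back into INT8 range; in pseudocode terms it is simply (a Python sketch of the standard saturating cast, not the generated code itself):

```python
def sat_int8(acc):
    """Saturate a wider accumulator to the INT8 range [-128, 127]."""
    return max(-128, min(127, acc))

# Overflowing sums clip instead of wrapping around:
print(sat_int8(300), sat_int8(-300), sat_int8(42))  # 127 -128 42
```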

INT8 Model — ReLU Function

model2_simulink_int8.c — model2_simulink_int8_relu1()
/* INT8 ReLU - branch instead of fmax */
void model2_simulink_int8_relu1(const int8_T rtu_In1[50], int8_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    int8_T u0;
    u0 = rtu_In1[i];
    if (u0 >= 0) {
      rty_Out1[i] = u0;
    } else {
      rty_Out1[i] = 0;
    }
  }
}

Generated Files

FP32 Code

| File | Description |
|---|---|
| model2_simulink.c | Main step function with SSE2 |
| model2_simulink_data.c | Weights & biases as const arrays |
| model2_simulink.h | Type definitions & externs |
| model2_simulink_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |

INT8 Code

| File | Description |
|---|---|
| model2_simulink_int8.c | Main step function (integer ops) |
| model2_simulink_int8_data.c | Quantized weights as int8 arrays |
| model2_simulink_int8.h | Type definitions & externs |
| model2_simulink_int8_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |