Project Overview

This project demonstrates the complete pipeline for importing a PyTorch 7-layer MLP model from ONNX format into MATLAB, rebuilding it as a native dlnetwork, quantizing to INT8, and generating embedded C code through Simulink. The Option 5 (Native Rebuild) approach was used because direct ONNX import produces custom autogenerated layers that block quantization and Simulink export.

7 FC layers · 13,305 parameters · 3.4e-4 max equivalence error · INT8 quantization · ERT C code target

Deployment Pipeline

PyTorch → model2.onnx
model2.onnx → importONNXNetwork: custom layers (dead end)
model2.onnx → importONNXFunction: weights extracted OK
Native dlnetwork (14 layers) → INT8 quantize (56% rel. error) → Simulink (layer blocks) → Embedded C (ERT)

Summary Table

| Property | Value |
|---|---|
| Model Architecture | 7-layer MLP (5 → 50 → 50 → 50 → 50 → 50 → 50 → 5) |
| ONNX Operators | MatMul, Add, Relu (opset 11) |
| Import Method | Option 5: Native dlnetwork Rebuild |
| Why Option 5? | importNetworkFromONNX creates a single custom autogenerated layer |
| Equivalence (Max) | 3.36e-4 (single-precision FP accumulation) |
| Equivalence (Relative) | 3.7e-7 |
| INT8 Quantization | Mean abs error: 248.7 (wide output range 230–450) |
| Simulink Models | FP32 (15 blocks) + INT8 (17 blocks) |
| C Code | Embedded Coder ERT with SSE2 intrinsics |

Network Architecture

ONNX Model (Original)

The ONNX model uses raw MatMul + Add + Relu operators (not ONNX Gemm), causing MATLAB importers to collapse everything into a single custom layer.

| Layer | Operation | Weight | Bias | Params |
|---|---|---|---|---|
| 1 | MatMul+Add+ReLU | 5×50 | 50 | 300 |
| 2 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 3 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 4 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 5 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 6 | MatMul+Add+ReLU | 50×50 | 50 | 2,550 |
| 7 | MatMul+Add | 50×5 | 5 | 255 |
| Total | | | | 13,305 |
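The per-layer counts above can be sanity-checked with a short script (a Python sketch; the actual pipeline is MATLAB):

```python
# Parameter count for the 7-layer MLP, using the shapes from the table above:
# each fully connected layer contributes in*out weights plus out biases.
layer_dims = [(5, 50)] + [(50, 50)] * 5 + [(50, 5)]

total = sum(n_in * n_out + n_out for (n_in, n_out) in layer_dims)
print(total)  # 13305
```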

Native dlnetwork (Rebuilt)

The rebuilt network uses standard MATLAB layers that support quantization, Simulink export, and code generation.

featureInputLayer(5)
fullyConnectedLayer(50) — fc1
reluLayer — relu1
fullyConnectedLayer(50) — fc2
reluLayer — relu2
fullyConnectedLayer(50) — fc3
reluLayer — relu3
fullyConnectedLayer(50) — fc4
reluLayer — relu4
fullyConnectedLayer(50) — fc5
reluLayer — relu5
fullyConnectedLayer(50) — fc6
reluLayer — relu6
fullyConnectedLayer(5) — fc7

Import Comparison

| Import Method | Result | Quantization | Simulink Export | Status |
|---|---|---|---|---|
| importNetworkFromONNX | Single custom autogenerated layer | Blocked | Blocked | FAIL |
| importONNXNetwork (legacy) | Requires format args, still custom layer | Blocked | Blocked | FAIL |
| importONNXFunction | Clean weight extraction, reference model | N/A (function) | N/A (function) | PASS |
| Option 5: Native Rebuild | 14 native layers, all standard types | Supported | Supported | BEST |

Weight Transfer Mapping

ONNX computes MatMul(X, W) where W is [inputSize × outputSize]. MATLAB's fullyConnectedLayer computes Weights * X where Weights is [outputSize × inputSize]. Therefore Weights = Wᵀ (the ONNX weight, transposed).
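The transpose relationship can be verified numerically with a NumPy sketch (illustrative shapes only; the real transfer happens in MATLAB):

```python
import numpy as np

# The ONNX graph computes y = x @ W with W of shape [in, out];
# fullyConnectedLayer computes Weights @ x with Weights of shape [out, in].
# Setting Weights = W.T makes the two identical.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 50)).astype(np.float32)   # like ONNX weight W0
b = rng.standard_normal(50).astype(np.float32)
x = rng.standard_normal(5).astype(np.float32)

y_onnx = x @ W + b        # ONNX MatMul + Add
y_matlab = W.T @ x + b    # fullyConnectedLayer with Weights = W.T

assert np.allclose(y_onnx, y_matlab)
```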

| ONNX Weight | Shape | MATLAB Layer | Weights Shape | Bias Shape |
|---|---|---|---|---|
| W0 | [5, 50] | fc1 | [50, 5] | [50, 1] |
| W1 | [50, 50] | fc2 | [50, 50] | [50, 1] |
| W2 | [50, 50] | fc3 | [50, 50] | [50, 1] |
| W3 | [50, 50] | fc4 | [50, 50] | [50, 1] |
| W4 | [50, 50] | fc5 | [50, 50] | [50, 1] |
| W5 | [50, 50] | fc6 | [50, 50] | [50, 1] |
| W6 | [50, 5] | fc7 | [5, 50] | [5, 1] |

Numerical Equivalence Validation

1,000 test samples · 3.36e-4 max abs error · 9.37e-5 mean abs error · 3.7e-7 mean relative error

Analysis

1,000 random test vectors (5-dimensional, normally distributed, single-precision) were passed through both the original ONNX model (via importONNXFunction) and the rebuilt native dlnetwork.
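The metrics reported here can be computed with a helper along these lines (a hypothetical Python sketch with synthetic data; the actual comparison was run in MATLAB):

```python
import numpy as np

def equivalence_metrics(y_ref, y_new):
    """Absolute and relative error statistics between two model outputs."""
    err = np.abs(y_ref - y_new)
    rel = err / (np.abs(y_ref) + 1e-12)   # guard against division by zero
    return {
        "max_abs": err.max(),
        "mean_abs": err.mean(),
        "std_abs": err.std(),
        "p99_abs": np.percentile(err, 99),
        "mean_rel": rel.mean(),
    }

# Synthetic stand-ins for the ONNX reference and rebuilt-network outputs:
rng = np.random.default_rng(1)
y_ref = rng.standard_normal((1000, 5)).astype(np.float32)
y_new = y_ref + np.float32(1e-4) * rng.standard_normal((1000, 5)).astype(np.float32)

m = equivalence_metrics(y_ref.astype(np.float64), y_new.astype(np.float64))
```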

| Metric | Value | Status |
|---|---|---|
| Max Absolute Error | 3.36e-4 | PASS |
| Mean Absolute Error | 9.37e-5 | PASS |
| Std Absolute Error | 4.87e-5 | PASS |
| 99th Percentile | 2.44e-4 | PASS |
| Mean Relative Error | 3.7e-7 | PASS |

All errors are within expected single-precision floating-point tolerance. The small absolute errors (~1e-4) arise from the different computation order of onnxMatMul (Xᵀ × W) and fullyConnectedLayer (W × X): the two are mathematically identical but accumulate different floating-point rounding across 7 layers.
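The effect of evaluation order can be demonstrated with a NumPy sketch (random weights, illustrative only; differences at this scale are expected, not a bug):

```python
import numpy as np

# The same ReLU-MLP chain evaluated as x @ W versus (W.T @ x.T).T in
# float32: mathematically identical, but rounding accumulates differently.
rng = np.random.default_rng(0)
dims = [5, 50, 50, 50, 50, 50, 50, 5]
Ws = [rng.standard_normal((i, o)).astype(np.float32) * 0.2
      for i, o in zip(dims[:-1], dims[1:])]

x = rng.standard_normal((1, 5)).astype(np.float32)
a, b = x, x
for W in Ws:
    a = np.maximum(a @ W, 0)           # "X @ W" order (ONNX-style)
    b = np.maximum((W.T @ b.T).T, 0)   # "W @ X" order (fullyConnectedLayer-style)

print(np.max(np.abs(a - b)))           # tiny, typically nonzero
```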

INT8 Quantization

248.7 mean abs error · 348.6 max abs error · ~57% relative error

Why High Error?

The model outputs range from approximately 230 to 450 (a span of ~220 units). INT8 has only 256 quantization levels, so each step represents ~0.86 units of output range. This coarse granularity causes substantial quantization noise for this regression model.
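The granularity figure follows directly from the numbers above (a quick arithmetic check in Python):

```python
# One INT8 step across the observed output span (values from the text above).
out_min, out_max = 230.0, 450.0
levels = 256                          # distinct INT8 codes
step = (out_max - out_min) / levels   # output units per quantization level
print(round(step, 2))  # -> 0.86
```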

Calibration Ranges

| Layer | Type | Min | Max | Range |
|---|---|---|---|---|
| input | Activation | -3.23 | 4.25 | 7.49 |
| fc1 | Activation | -1.88 | 2.76 | 4.65 |
| fc4 | Activation | -55.4 | 48.8 | 104.1 |
| fc6 | Activation | -205.8 | 8.8 | 214.6 |
| fc7 (output) | Activation | 230.9 | 449.6 | 218.6 |

Note the exponential growth in activation ranges through the network, from ±4 at input to ±200+ at deeper layers.
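One way to see the consequence of this growth is to convert each calibrated range into an implied per-tensor scale (a sketch assuming the common symmetric convention scale = max(|min|, |max|) / 127; the toolbox's actual scheme may differ):

```python
# Calibration ranges from the table above.
ranges = {
    "input": (-3.23, 4.25),
    "fc1":   (-1.88, 2.76),
    "fc4":   (-55.4, 48.8),
    "fc6":   (-205.8, 8.8),
    "fc7":   (230.9, 449.6),
}

# Symmetric INT8: one quantization step covers scale units of activation.
scales = {k: max(abs(lo), abs(hi)) / 127.0 for k, (lo, hi) in ranges.items()}
for k, s in scales.items():
    print(f"{k}: one INT8 step = {s:.3f} units")
```

The output layer's step is roughly 100× coarser than the input's, which is why the regression outputs suffer most.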

Recommendations

  • Output normalization before quantization
  • FP16 half-precision for better accuracy
  • Mixed-precision: keep output layer in FP32
  • Quantization-aware training (QAT)

Simulink: FP32 vs INT8 Models

| Feature | FP32 Model | INT8 Model |
|---|---|---|
| Simulink File | model2_simulink.slx | model2_simulink_int8.slx |
| Block Count | 15 | 17 (+2 cast blocks) |
| Data Type | double (64-bit) | int8 (8-bit) |
| Extra Blocks | None | fc1_in_cast, fc7_out_cast |
| Accuracy | Full precision | ~57% relative error |
| Memory Footprint | ~107 KB (weights) | ~27 KB (weights) |

Generated C Code

FP32 Model — Step Function

The main inference function processes input through 7 FC layers with SSE2 SIMD intrinsics for vectorized computation.

model2_simulink.c — model2_simulink_step()
/* Model step function */
void model2_simulink_step(void)
{
  __m128d tmp_1;
  real_T rtb_Add_b[50];
  real_T rtb_Max[50];
  real_T tmp[5];
  real_T tmp_0;
  int32_T i;
  int32_T i_0;

  /* FC Layer 1: input(5) -> hidden(50) */
  for (i_0 = 0; i_0 < 50; i_0++) {
    tmp_0 = 0.0;
    for (i = 0; i < 5; i++) {
      tmp_0 += model2_simulink_ConstP.Weights_Value[50 * i + i_0] *
        model2_simulink_U.input[i];
    }
    rtb_Add_b[i_0] = tmp_0 + model2_simulink_ConstP.Bias_Value[i_0];
  }

  /* ReLU 1 */
  model2_simulink_relu1(rtb_Add_b, rtb_Max);

  /* FC Layers 2-6: hidden(50) -> hidden(50) with ReLU */
  /* ... (repeated pattern for each layer) ... */

  /* FC Layer 7: hidden(50) -> output(5) */
  /* Uses SSE2 intrinsics for vectorized MAC operations */
}
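The indexing in the FC loop is worth unpacking: `Weights_Value[50 * i + i_0]` is consistent with the [50 × 5] weight matrix stored column-major (MATLAB's native layout), so the loop is just y = W·x + b. A NumPy sketch of that reading (random data, illustrative only):

```python
import numpy as np

# Assumption: Weights_Value is the [50 x 5] matrix flattened column-major,
# so index 50*i + i0 addresses element (row i0, column i).
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 5))
b = rng.standard_normal(50)
x = rng.standard_normal(5)

weights_value = W.flatten(order="F")   # column-major, like the C const array

y = np.empty(50)
for i0 in range(50):                   # mirrors the generated C loop
    acc = 0.0
    for i in range(5):
        acc += weights_value[50 * i + i0] * x[i]
    y[i0] = acc + b[i0]

assert np.allclose(y, W @ x + b)       # identical to the matrix-vector form
```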

FP32 Model — ReLU Function

model2_simulink.c — model2_simulink_relu1()
/* Shared ReLU implementation for all 6 hidden layers */
void model2_simulink_relu1(const real_T rtu_In1[50], real_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    rty_Out1[i] = fmax(rtu_In1[i], 0.0);
  }
}

INT8 Model — Step Function

The INT8 model uses integer arithmetic throughout, with data type conversion at boundaries.

model2_simulink_int8.c — model2_simulink_int8_step()
/* Model step function (INT8) */
void model2_simulink_int8_step(void)
{
  int32_T i;
  int32_T i_0;
  int32_T tmp_0;
  int8_T rtb_DataTypeConversion_m[50];
  int8_T rtb_Max_p[50];
  int8_T tmp[50];

  /* FC Layer 1 with INT8 weights and activations */
  for (i = 0; i < 50; i++) {
    tmp_0 = 0;
    for (i_0 = 0; i_0 < 5; i_0++) {
      tmp_0 += model2_simulink_int8_ConstP.Weights_Value_f[50 * i_0 + i] *
        model2_simulink_int8_U.input[i_0];
    }
    /* Saturate to INT8 range [-128, 127] */
    /* ... */
  }

  /* INT8 ReLU: simple comparison with 0 */
  model2_simulink_int8_relu1(rtb_DataTypeConversion_m, rtb_Max_p);
}
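The elided saturation step clamps the INT32 accumulator back into INT8 range; in pseudocode terms it is simply (a Python sketch of the standard saturating cast, not the generated code itself):

```python
def sat_int8(acc):
    """Saturate a wider accumulator to the INT8 range [-128, 127]."""
    return max(-128, min(127, acc))

# Overflowing sums clip instead of wrapping around:
print(sat_int8(300), sat_int8(-300), sat_int8(42))  # 127 -128 42
```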

INT8 Model — ReLU Function

model2_simulink_int8.c — model2_simulink_int8_relu1()
/* INT8 ReLU - branch instead of fmax */
void model2_simulink_int8_relu1(const int8_T rtu_In1[50], int8_T rty_Out1[50])
{
  int32_T i;
  for (i = 0; i < 50; i++) {
    int8_T u0;
    u0 = rtu_In1[i];
    if (u0 >= 0) {
      rty_Out1[i] = u0;
    } else {
      rty_Out1[i] = 0;
    }
  }
}

Generated Files

FP32 Code

| File | Description |
|---|---|
| model2_simulink.c | Main step function with SSE2 |
| model2_simulink_data.c | Weights & biases as const arrays |
| model2_simulink.h | Type definitions & externs |
| model2_simulink_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |

INT8 Code

| File | Description |
|---|---|
| model2_simulink_int8.c | Main step function (integer ops) |
| model2_simulink_int8_data.c | Quantized weights as int8 arrays |
| model2_simulink_int8.h | Type definitions & externs |
| model2_simulink_int8_private.h | Internal declarations |
| ert_main.c | Application entry point |
| rtwtypes.h | Runtime type definitions |