Knowledge Base

Everything you need to know and understand to develop V2X applications.

TfLite Inference Example

The cube:evk makes real-time AI at the edge simple and efficient. Powered by the NXP i.MX 8M Plus SoC with an integrated 2.3 TOPS NPU, it can accelerate neural-network inference directly on the device — without cloud round-trips, high latency, or heavy CPU load.

To help you get started quickly, we provide a complete, open-source example that demonstrates object detection with LiteRT (TensorFlow Lite) using hardware acceleration via the VX Delegate.


Example Repository

➡️ tflite-inference-example https://github.com/cubesys-GmbH/tflite-inference-example

This example shows:

  • Loading and executing a TFLite model
  • Using the NPU (VX Delegate) for acceleration
  • Falling back to CPU-only inference
  • Drawing bounding boxes on detected objects
  • End-to-end inference pipeline for still images

It is designed as a minimal, readable starting point for your own perception workloads.


Getting Started

1. Clone the Repository

git clone https://github.com/cubesys-GmbH/tflite-inference-example.git
cd tflite-inference-example

2. Create a Python Virtual Environment

This keeps all dependencies self-contained.

python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

This installs:

  • tflite-runtime (LiteRT)
  • OpenCV
  • Pillow
  • Utility libraries used by the example

Hardware Acceleration: VX Delegate

The i.MX 8M Plus NPU is accessed via the VX Delegate (libvx_delegate.so). Verify that the delegate library is present on the cube:evk:

ls /usr/lib/libvx_delegate.so

If not found, the example will automatically fall back to CPU inference.
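The same availability check can also be done from Python before loading the interpreter. This is a minimal sketch, assuming the standard delegate location shown above; the helper name is ours, not part of the example repository:

```python
import os

VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"

def delegate_available(path: str = VX_DELEGATE_PATH) -> bool:
    # The VX delegate is just a shared library on disk; if the file
    # is missing, inference falls back to the CPU.
    return os.path.isfile(path)

if delegate_available():
    print("VX delegate found (NPU acceleration possible)")
else:
    print("VX delegate not found, CPU fallback will be used")
```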


Run the Example

Run inference with NPU acceleration:

python image_detection.py --input input/example.jpg --output output/result.jpg

If the VX delegate loads successfully, you'll see:

VX delegate loaded (NPU acceleration enabled)

Force CPU-only inference:

python image_detection.py --input input/example.jpg --output output/result.jpg --no-delegate

The output image with bounding boxes is saved under:

output/result.jpg

Understanding the Model

The example uses:

models/ssd_mobilenet_v1_1/ssd_mobilenet_v1_1.tflite

This is a Single Shot Detector (SSD) with MobileNet v1 as its backbone.

Why this model?

  • Fast – real-time capable even on embedded hardware
  • Lightweight – designed for edge devices
  • NPU friendly – fully quantized INT8 version runs efficiently
  • Widely used – ideal for demos, prototyping, education

SSD Model Outputs

The TFLite version provides three tensors:

  1. Bounding boxes – normalized coordinates (ymin, xmin, ymax, xmax)
  2. Class IDs
  3. Confidence scores

The example filters detections with confidence > 0.6 and draws boxes accordingly.
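With synthetic output tensors, this filtering step can be sketched as follows; the numbers are made up purely for illustration:

```python
import numpy as np

# Made-up SSD outputs for three candidate detections
boxes = np.array([[0.10, 0.20, 0.50, 0.60],
                  [0.00, 0.00, 1.00, 1.00],
                  [0.30, 0.30, 0.40, 0.90]])
classes = np.array([1, 17, 3])
scores = np.array([0.92, 0.41, 0.77])

CONFIDENCE_THRESHOLD = 0.6

# Keep only detections whose confidence exceeds the threshold
detections = [(int(c), float(s), b)
              for c, s, b in zip(classes, scores, boxes)
              if s > CONFIDENCE_THRESHOLD]

print([d[0] for d in detections])  # class IDs 1 and 3 pass the threshold
```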


Example Output

After running the script, the result includes bounding boxes around detected objects, saved as:

output/result.jpg

Code Overview – End-to-End Inference Pipeline

This section walks through the main building blocks of image_detection.py so you can quickly adapt it for your own models.

1. Loading the Interpreter (with optional VX Delegate)

from multiprocessing import cpu_count
from tflite_runtime.interpreter import Interpreter

def load_interpreter(model_path: str, use_delegate: bool) -> Interpreter:
    delegates = []
    if use_delegate:
        # Try to load VX delegate (NPU)
        try:
            from tflite_runtime.interpreter import load_delegate
            vx_delegate = load_delegate('/usr/lib/libvx_delegate.so')
            delegates.append(vx_delegate)
            print("VX delegate loaded (NPU acceleration enabled)")
        except Exception as e:
            print(e)
            print("Running on CPU fallback")
    else:
        print("Running inference on CPU (delegate disabled)")

    interpreter = Interpreter(
        model_path=model_path,
        experimental_delegates=delegates,
        num_threads=cpu_count(),  # use all CPU cores if on CPU
    )
    return interpreter

  • If use_delegate=True and libvx_delegate.so is available, inference is offloaded to the NPU.
  • If the delegate cannot be loaded, it automatically falls back to CPU.

2. Preprocessing the Input Image

import cv2
import numpy as np
from PIL import Image

def resize_image(cv_image: np.ndarray, height: int, width: int) -> np.ndarray:
    # Convert OpenCV BGR → RGB, then use PIL for resizing
    color_converted = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(color_converted)
    image_resized = pil_image.resize((width, height))
    # Return a NumPy array so the result matches the annotated return type
    return np.asarray(image_resized)

Usage inside the main script:

# Load image with OpenCV
cv_image = cv2.imread(INPUT_IMAGE)

# Get input tensor shape from the model
input_details = interpreter.get_input_details()
input_height = input_details[0]['shape'][1]
input_width = input_details[0]['shape'][2]

# Resize to model input size
image_resized = resize_image(cv_image=cv_image, height=input_height, width=input_width)

# Add batch dimension: (H, W, C) -> (1, H, W, C)
image_batch = np.expand_dims(image_resized, axis=0)

# Normalize if model expects float input (cast explicitly: '/' yields float64,
# but set_tensor requires the exact dtype the model declares)
if input_details[0]['dtype'] == np.float32:
    input_data = (image_batch / 255.0).astype(np.float32)
else:
    input_data = image_batch

3. Running Inference

# Allocate model tensors once
interpreter.allocate_tensors()
# Warmup (optional, but nice for timing)
import time
warmup_start = time.time()
interpreter.invoke()
warmup_end = time.time()
print(f"Interpreter warmup time: {warmup_end - warmup_start:.2f} sec")
# Set input tensor and run inference
inference_start = time.time()
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
inference_end = time.time()
print(f"Inference time: {inference_end - inference_start:.3f} sec")

4. Postprocessing SSD Outputs

SSD models usually expose:

  • boxes: bounding boxes (ymin, xmin, ymax, xmax)
  • classes: class indices
  • scores: confidence values

output_details = interpreter.get_output_details()
boxes = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
classes = np.squeeze(interpreter.get_tensor(output_details[1]['index'])).astype(int)
scores = np.squeeze(interpreter.get_tensor(output_details[2]['index']))

detections = []
for idx, class_id in enumerate(classes):
    if scores[idx] > 0.6:
        detections.append((class_id, scores[idx], boxes[idx]))

print("Detections (class_id, score, box):")
for det in detections:
    print(det)

5. Drawing Bounding Boxes

def draw_bounding_box(cv_image: np.ndarray, detections: list,
                      color=(0, 255, 0), thickness: int = 2) -> np.ndarray:
    image = cv_image.copy()
    frame_height, frame_width, _ = cv_image.shape
    for class_id, score, box in detections:
        y1, x1, y2, x2 = box  # normalized [0, 1]
        # Scale back to image coordinates
        x1 = int(x1 * frame_width)
        x2 = int(x2 * frame_width)
        y1 = int(y1 * frame_height)
        y2 = int(y2 * frame_height)
        # Clamp to the image bounds
        top = max(0, np.floor(y1 + 0.5))
        left = max(0, np.floor(x1 + 0.5))
        bottom = min(frame_height, np.floor(y2 + 0.5))
        right = min(frame_width, np.floor(x2 + 0.5))
        cv2.rectangle(
            image,
            (int(left), int(top)),
            (int(right), int(bottom)),
            color,
            thickness
        )
        label = f"{class_id}:{score:.2f}"
        cv2.putText(
            image, label,
            (int(left), int(top) - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            color,
            1,
            cv2.LINE_AA
        )
    return image

Usage:

import os

result_image = draw_bounding_box(cv_image, detections)
# Guard against an output path without a directory component
os.makedirs(os.path.dirname(OUTPUT_PATH) or ".", exist_ok=True)
cv2.imwrite(OUTPUT_PATH, result_image)
print(f"Output saved at {OUTPUT_PATH}")

6. Command-Line Interface

The script is controlled via simple CLI flags:

import argparse

parser = argparse.ArgumentParser(description="TFLite inference on cube:evk")
parser.add_argument("--input", type=str, default="input/example.jpg",
                    help="Path to input image")
parser.add_argument("--output", type=str, default="output/result.jpg",
                    help="Path to save output image")
parser.add_argument("--no-delegate", action="store_true",
                    help="Run inference without VX delegate (CPU only)")
args = parser.parse_args()

MODEL_PATH = "models/ssd_mobilenet_v1_1/ssd_mobilenet_v1_1.tflite"
INPUT_IMAGE = args.input
OUTPUT_PATH = args.output

interpreter = load_interpreter(MODEL_PATH, use_delegate=not args.no_delegate)

You can now just run:

python image_detection.py --input input/example.jpg --output output/result.jpg
# or
python image_detection.py --input my.jpg --output my-result.jpg --no-delegate

Going Further

This example is intentionally small and easy to adapt. You can extend it to:

  • Process video streams from USB cameras
  • Run inference through ROS 2, incorporating results into cube:its
  • Perform multi-modal fusion (e.g., GNSS + AI detection → CPM messages)
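
As a starting point for the video-stream extension, here is a hedged sketch. The `process_stream` helper and the `detect` callback are our own names, not part of the repository; `detect` stands for whatever per-frame inference you wire in, and the commented wiring assumes a USB camera at index 0:

```python
def process_stream(capture, detect, max_frames=None):
    """Run `detect` on each frame from an OpenCV-style capture object
    (anything exposing isOpened()/read()) and yield the results."""
    processed = 0
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break  # end of stream
        yield detect(frame)
        processed += 1
        if max_frames is not None and processed >= max_frames:
            break

# Wiring it up with a USB camera would look roughly like:
#   cap = cv2.VideoCapture(0)
#   for annotated in process_stream(cap, detect=run_pipeline):
#       cv2.imshow("detections", annotated)
#       if cv2.waitKey(1) & 0xFF == ord("q"):
#           break
#   cap.release()
```

Keeping the capture object behind a small generator like this makes the frame loop testable without camera hardware.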