# TFLite Inference Example
The cube:evk makes real-time AI at the edge simple and efficient. Powered by the NXP i.MX 8M Plus SoC with an integrated 2.3 TOPS NPU, it can accelerate neural-network inference directly on the device — without cloud round-trips, high latency, or heavy CPU load.
To help you get started quickly, we provide a complete, open-source example that demonstrates object detection with LiteRT (TensorFlow Lite) using hardware acceleration via the VX Delegate.
## Example Repository
➡️ [tflite-inference-example](https://github.com/cubesys-GmbH/tflite-inference-example)
This example shows:
- Loading and executing a TFLite model
- Using the NPU (VX Delegate) for acceleration
- Falling back to CPU-only inference
- Drawing bounding boxes on detected objects
- End-to-end inference pipeline for still images
It is designed as a minimal, readable starting point for your own perception workloads.
## Getting Started
### 1. Clone the Repository
```bash
git clone https://github.com/cubesys-GmbH/tflite-inference-example.git
cd tflite-inference-example
```

### 2. Create a Python Virtual Environment
This keeps all dependencies self-contained.
```bash
python3 -m venv venv
source venv/bin/activate
```

### 3. Install Dependencies
```bash
pip install --upgrade pip
pip install -r requirements.txt
```

This installs:
- tflite-runtime (LiteRT)
- OpenCV
- Pillow
- Utility libraries used by the example
## Hardware Acceleration: VX Delegate
The i.MX 8M Plus NPU is accessed through the VX Delegate (`libvx_delegate.so`). Verify that the delegate library is present on the cube:
```bash
ls /usr/lib/libvx_delegate.so
```

If not found, the example will automatically fall back to CPU inference.
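You can also perform the same check programmatically before constructing the interpreter. A minimal sketch, assuming the standard delegate path shown above:

```python
import os

VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"

def delegate_available(path: str = VX_DELEGATE_PATH) -> bool:
    # The delegate can only be loaded if the shared library
    # exists on the target filesystem.
    return os.path.exists(path)

if delegate_available():
    print("VX delegate found: NPU acceleration possible")
else:
    print("VX delegate not found: falling back to CPU")
```

This mirrors what the example's `try/except` around `load_delegate` achieves, but lets you branch earlier, e.g. to choose a different model variant for CPU-only runs.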
## Run the Example
Run inference with NPU acceleration:
```bash
python image_detection.py --input input/example.jpg --output output/result.jpg
```

If the VX delegate loads successfully, you'll see:
```
VX delegate loaded (NPU acceleration enabled)
```

Force CPU-only inference:
```bash
python image_detection.py --input input/example.jpg --output output/result.jpg --no-delegate
```

The output image with bounding boxes is saved under:
```
output/result.jpg
```

## Understanding the Model
The example uses:
```
models/ssd_mobilenet_v1_1/ssd_mobilenet_v1_1.tflite
```

This is a Single Shot Detector (SSD) with MobileNet v1 as its backbone.
### Why this model?
- ✔ Fast – real-time capable even on embedded hardware
- ✔ Lightweight – designed for edge devices
- ✔ NPU friendly – fully quantized INT8 version runs efficiently
- ✔ Widely used – ideal for demos, prototyping, education
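The "NPU friendly" point comes down to quantization: a fully quantized model stores tensors as 8-bit integers related to real values by the TFLite affine scheme, `real = scale * (q - zero_point)`. A minimal sketch of that arithmetic (the `scale` and `zero_point` values below are illustrative, not read from the actual model):

```python
import numpy as np

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # TFLite affine quantization: real_value = scale * (quantized - zero_point)
    return scale * (q.astype(np.int32) - zero_point)

def quantize(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Round to the nearest representable step, then clamp to uint8 range
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

# Illustrative parameters only (check the tensor's quantization metadata)
scale, zero_point = 0.007874, 128
x = np.array([0.0, 0.25, -0.25])
q = quantize(x, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Because every tensor is uint8, the NPU can execute the whole graph without falling back to float emulation, which is what makes this model run efficiently on the 2.3 TOPS accelerator.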
### SSD Model Outputs
The TFLite version provides three tensors:
- Bounding boxes: normalized coordinates `(ymin, xmin, ymax, xmax)`
- Class IDs
- Confidence scores
The example filters detections with confidence > 0.6 and draws boxes accordingly.
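The class IDs are indices into the label map the model was trained on (COCO for the stock SSD MobileNet). The example prints raw IDs; mapping them to readable names is a small addition. The map below is an illustrative subset with assumed indices, so verify it against the label file that ships with your model:

```python
# Illustrative subset of a COCO-style label map (assumed, not taken
# from the example repository; verify against the model's label file)
COCO_LABELS = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle"}

def label_for(class_id: int) -> str:
    # Fall back to the numeric ID for classes missing from the map
    return COCO_LABELS.get(class_id, f"class_{class_id}")

print(label_for(2))   # known class
print(label_for(41))  # unknown class falls back to its ID
```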
## Example Output
After running the script, the result includes bounding boxes around detected objects, saved as:
```
output/result.jpg
```

## Code Overview – End-to-End Inference Pipeline
This section walks through the main building blocks of image_detection.py so you can quickly adapt it for your own models.
### 1. Loading the Interpreter (with optional VX Delegate)
```python
from multiprocessing import cpu_count

from tflite_runtime.interpreter import Interpreter, load_delegate

def load_interpreter(model_path: str, use_delegate: bool) -> Interpreter:
    delegates = []

    if use_delegate:
        # Try to load the VX delegate (NPU)
        try:
            vx_delegate = load_delegate('/usr/lib/libvx_delegate.so')
            delegates.append(vx_delegate)
            print("VX delegate loaded (NPU acceleration enabled)")
        except Exception as e:
            print(e)
            print("Running on CPU fallback")
    else:
        print("Running inference on CPU (delegate disabled)")

    interpreter = Interpreter(
        model_path=model_path,
        experimental_delegates=delegates,
        num_threads=cpu_count(),  # use all CPU cores when running on CPU
    )
    return interpreter
```

- If `use_delegate=True` and `libvx_delegate.so` is available, inference is offloaded to the NPU.
- If the delegate cannot be loaded, it automatically falls back to CPU.
### 2. Preprocessing the Input Image
```python
import cv2
import numpy as np
from PIL import Image

def resize_image(cv_image: np.ndarray, height: int, width: int) -> np.ndarray:
    # Convert OpenCV BGR → RGB, then use PIL for resizing
    color_converted = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(color_converted)
    image_resized = pil_image.resize((width, height))
    return np.asarray(image_resized)  # back to ndarray, matching the annotation
```

Usage inside the main script:
```python
# Load image with OpenCV
cv_image = cv2.imread(INPUT_IMAGE)

# Get input tensor shape from the model
input_details = interpreter.get_input_details()
input_height = input_details[0]['shape'][1]
input_width = input_details[0]['shape'][2]

# Resize to model input size
image_resized = resize_image(cv_image=cv_image, height=input_height, width=input_width)

# Add batch dimension: (H, W, C) -> (1, H, W, C)
image_batch = np.expand_dims(image_resized, axis=0)

# Normalize if the model expects float input
if input_details[0]['dtype'] == np.float32:
    input_data = image_batch / 255.0
else:
    input_data = image_batch
```

### 3. Running Inference
```python
import time

# Allocate model tensors once
interpreter.allocate_tensors()

# Warmup (optional, but nice for timing)
warmup_start = time.time()
interpreter.invoke()
warmup_end = time.time()
print(f"Interpreter warmup time: {warmup_end - warmup_start:.2f} sec")

# Set input tensor and run inference
inference_start = time.time()
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
inference_end = time.time()
print(f"Inference time: {inference_end - inference_start:.3f} sec")
```

### 4. Postprocessing SSD Outputs
SSD models usually expose:
- `boxes`: bounding boxes `(ymin, xmin, ymax, xmax)`
- `classes`: class indices
- `scores`: confidence values
```python
output_details = interpreter.get_output_details()

boxes = np.squeeze(interpreter.get_tensor(output_details[0]['index']))
classes = np.squeeze(interpreter.get_tensor(output_details[1]['index'])).astype(int)
scores = np.squeeze(interpreter.get_tensor(output_details[2]['index']))

detections = []
for idx, class_id in enumerate(classes):
    if scores[idx] > 0.6:
        detections.append((class_id, scores[idx], boxes[idx]))

print("Detections (class_id, score, box):")
for det in detections:
    print(det)
```

### 5. Drawing Bounding Boxes
```python
def draw_bounding_box(cv_image: np.ndarray, detections: list,
                      color=(0, 255, 0), thickness: int = 2) -> np.ndarray:
    image = cv_image.copy()
    frame_height, frame_width, _ = cv_image.shape

    for class_id, score, box in detections:
        y1, x1, y2, x2 = box  # normalized [0, 1]

        # Scale back to image coordinates
        x1 = int(x1 * frame_width)
        x2 = int(x2 * frame_width)
        y1 = int(y1 * frame_height)
        y2 = int(y2 * frame_height)

        # Clamp to the image bounds
        top = max(0, np.floor(y1 + 0.5))
        left = max(0, np.floor(x1 + 0.5))
        bottom = min(frame_height, np.floor(y2 + 0.5))
        right = min(frame_width, np.floor(x2 + 0.5))

        cv2.rectangle(
            image,
            (int(left), int(top)),
            (int(right), int(bottom)),
            color,
            thickness,
        )

        label = f"{class_id}:{score:.2f}"
        cv2.putText(
            image,
            label,
            (int(left), int(top) - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            color,
            1,
            cv2.LINE_AA,
        )

    return image
```

Usage:
```python
import os

result_image = draw_bounding_box(cv_image, detections)
os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
cv2.imwrite(OUTPUT_PATH, result_image)
print(f"Output saved at {OUTPUT_PATH}")
```

### 6. Command-Line Interface
The script is controlled via simple CLI flags:
```python
import argparse

parser = argparse.ArgumentParser(description="TFLite inference on cube:evk")
parser.add_argument("--input", type=str, default="input/example.jpg",
                    help="Path to input image")
parser.add_argument("--output", type=str, default="output/result.jpg",
                    help="Path to save output image")
parser.add_argument("--no-delegate", action="store_true",
                    help="Run inference without VX delegate (CPU only)")
args = parser.parse_args()

MODEL_PATH = "models/ssd_mobilenet_v1_1/ssd_mobilenet_v1_1.tflite"
INPUT_IMAGE = args.input
OUTPUT_PATH = args.output

interpreter = load_interpreter(MODEL_PATH, use_delegate=not args.no_delegate)
```

You can now just run:
```bash
python image_detection.py --input input/example.jpg --output output/result.jpg
# or
python image_detection.py --input my.jpg --output my-result.jpg --no-delegate
```

## Going Further
This example is intentionally small and easy to adapt. You can extend it to:
- Process video streams from USB cameras
- Run inference through ROS 2, incorporating results into cube:its
- Perform multi-modal fusion (e.g., GNSS + AI detection → CPM messages)
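For the USB-camera case, the still-image pipeline generalizes to a frame loop. The sketch below keeps the frame source and the inference call pluggable; `get_frame` and `run_inference` are placeholder callables for your own capture and detection code (with OpenCV, `get_frame` could be `lambda: cap.read()[1]` on a `cv2.VideoCapture`):

```python
def process_stream(get_frame, run_inference, max_frames=None):
    """Pull frames, run detection on each, yield (frame, detections)."""
    count = 0
    while max_frames is None or count < max_frames:
        frame = get_frame()
        if frame is None:  # end of stream or camera read failure
            break
        detections = run_inference(frame)
        # Annotation (e.g. draw_bounding_box) and display/publishing go here
        yield frame, detections
        count += 1

# Untested sketch with a real USB camera:
# cap = cv2.VideoCapture(0)
# for frame, dets in process_stream(lambda: cap.read()[1], my_inference):
#     ...
```

Keeping the loop generic like this also makes it easy to swap the source for a ROS 2 image subscription later without touching the inference code.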

