Mobilint NPU – LLM Inference Demo Container

About Advantech Container Catalog (ACC)

Advantech Container Catalog is a comprehensive collection of ready-to-use, containerized software packages designed to accelerate the development and deployment of Edge AI applications. By offering pre-integrated solutions optimized for embedded hardware, ACC simplifies the challenge of software-hardware compatibility, especially in GPU/NPU-accelerated environments.

Feature / Benefit	Description
Accelerated Edge AI Development	Ready-to-use containerized solutions for faster prototyping and deployment
Hardware Compatible	Eliminates embedded hardware and software package incompatibility
GPU/NPU Access Ready	Supports passthrough for efficient hardware acceleration
Model Conversion & Optimization	Built-in AI model quantization and format conversion support
Optimized for CV & LLM Applications	Pre-optimized containers for computer vision and large language models

The Mobilint NPU LLM Inference Demo Container provides a fully integrated, ready-to-run environment for running large language models (LLMs) locally on Advantech’s edge AI devices equipped with Mobilint’s ARIES-powered MLA100 MXM AI accelerator module.

Overview

This edge LLM demo container features a user-friendly web-based GUI that allows users to select from a list of pre-compiled LLMs without any command-line configuration. It is designed for quick evaluation and demonstration of ARIES’s NPU acceleration in real-world LLM workloads.

All required runtime components and model binaries are preloaded to ensure a smooth out-of-the-box experience. Users can test different models and parameters from the GUI without editing configuration files or entering commands.

Key Features

Browser-based GUI – Model selection and inference execution from a single dashboard
Pre-compiled model set – Includes INT8-quantized LLMs
Optimized Runtime Library – Hardware-accelerated inference on ARIES NPUs, with Python and C++ backend integration for extended development

Mobilint NPU – LLM Performance Report

All LLM metrics were measured with NVIDIA GenAI-Perf, using 240 input tokens and 10 output tokens.

Model	Time To First Token (ms)	Output Token Throughput Per User (tokens/sec/user)
c4ai-command-r7b-12-2024	4,667.31	4.58
EXAONE-3.5-2.4B-Instruct	963.86	14.23
EXAONE-4.0-1.2B	329.37	31.62
EXAONE-Deep-2.4B	886.35	13.03
HyperCLOVAX-SEED-Text-Instruct-1.5B	435.50	22.46
Llama-3.1-8B-Instruct	4,430.71	5.81
Llama-3.2-1B-Instruct	430.56	30.73
Llama-3.2-3B-Instruct	1,218.22	12.16
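
The two columns above can be combined into a rough end-to-end latency estimate for a single request. The sketch below assumes that the per-user output throughput is approximately the inverse of the inter-token latency, so that total latency is about TTFT plus (N - 1) token gaps. The function name estimate_latency_ms is hypothetical and is not part of GenAI-Perf or the Mobilint SDK.

```shell
# estimate_latency_ms TTFT_MS TOKENS_PER_SEC N_OUTPUT
# Approximation only: TTFT plus (N_OUTPUT - 1) inter-token gaps, where each
# gap is taken to be 1000 / TOKENS_PER_SEC milliseconds (an assumption).
estimate_latency_ms() {
  awk -v ttft="$1" -v tps="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", ttft + (n - 1) * 1000 / tps }'
}

# Values taken from the table above (thousands separators removed).
estimate_latency_ms 430.56 30.73 10     # Llama-3.2-1B-Instruct
estimate_latency_ms 4430.71 5.81 10     # Llama-3.1-8B-Instruct
```

Because the benchmark used only 10 output tokens, TTFT dominates these estimates. For longer generations the throughput column matters more.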

Environmental Prerequisites on Host OS

Hardware

The container is designed to demonstrate Mobilint NPU’s local LLM capabilities as embedded in AIR-310, Advantech’s edge AI hardware. Other compatible hosts include:

Mobilint MLA100 Low Profile PCIe Card

Software

Docker Engine ≥ 28.2.2
Mobilint SDK modules
- Pre-compiled Mobilint-compatible LLM binaries (.mxq)
- Mobilint ARIES NPU Driver
  - NOTE: To access the files and modules, please contact tech-support@mobilint.com.
  1. To verify device recognition, run the following command in the terminal:
     ls /dev | grep aries0
     If the output includes aries0, the device is recognized by the system.
  2. For Debian-based operating systems, verify driver installation by running:
     dpkg -l | grep aries-driver
     If the output contains information about aries-driver, the device driver is installed.
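
The software checks above can be gathered into one script. This is a minimal sketch: the function names and the OK/MISSING messages are illustrative, and only the underlying commands (docker version, ls /dev, dpkg -l) come from the list above. It assumes a `sort` that supports -V (GNU coreutils) and a reachable Docker daemon.

```shell
#!/bin/sh
# Prerequisite check sketch; helper names and messages are hypothetical.

MIN_DOCKER="28.2.2"   # minimum Docker Engine version from the list above

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_docker() {
  # Empty when docker is missing or the daemon cannot be reached.
  v="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || v=""
  if [ -n "$v" ] && version_ge "$v" "$MIN_DOCKER"; then
    echo "OK: Docker Engine $v"
  else
    echo "MISSING: Docker Engine >= $MIN_DOCKER (found: ${v:-none})"
  fi
}

check_device() {
  if ls /dev | grep -q aries0; then
    echo "OK: aries0 device found"
  else
    echo "MISSING: aries0 device node"
  fi
}

check_driver() {
  # dpkg exists only on Debian-based systems; elsewhere this reports MISSING.
  if dpkg -l 2>/dev/null | grep -q aries-driver; then
    echo "OK: aries-driver installed"
  else
    echo "MISSING: aries-driver package"
  fi
}

check_docker
check_device
check_driver
```

The script only reports status and never exits with an error, so it is safe to run on a host that is not yet fully set up.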

Container Information

Directory Structure

├── backend
│   └── src
└── frontend
    ├── app
    │   └── components
    └── public
        └── fonts

Container Components

Mobilint Runtime Library (latest stable release)
Web-based GUI frontend (Next.js based)
Python LLM server backend (socket.io based)

Quick Start Guide

Note: Refer to the Mobilint repository in the Advantech Container Catalog GitHub for the detailed Quick Start Guide and scripts.

Install Docker

Follow the official Docker installation guide. After installation, add your user to the docker group by following the Linux post-installation steps.

Create Docker Network & Build Image

docker network create mblt_int
docker compose build
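
Running docker network create a second time fails because the network already exists. A re-runnable variant of the two commands above might look like the following sketch. The helper names ensure_network and build_demo are hypothetical, and the sketch assumes it is run from the directory that contains the compose file.

```shell
# ensure_network NAME: create the Docker network only if it does not exist yet.
ensure_network() {
  docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}

# build_demo: hypothetical wrapper around the two setup commands.
build_demo() {
  ensure_network mblt_int && docker compose build
}
```

Calling build_demo then behaves the same on a fresh host and on one where mblt_int was created earlier.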

Run

docker compose up

Set Production Mode

This demo was originally designed for single-user demonstrations. To enable multi-user use, set the production environment variable: copy backend/src/.env.example to backend/src/.env and set PRODUCTION="True". In production mode, a model change is not applied immediately. Instead, the server automatically loads the requested model for each LLM request as needed.
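
The copy-and-edit step can be scripted as in the sketch below. It assumes the .env file uses plain KEY="value" lines. The contents of .env.example are not shown here, so the function handles both the case where a PRODUCTION line already exists and the case where it does not. The function name enable_production is hypothetical.

```shell
# enable_production SRC_DIR: make sure SRC_DIR/.env contains PRODUCTION="True".
enable_production() {
  src_dir="$1"
  env_file="$src_dir/.env"
  # Start from the example file only when no .env exists yet.
  [ -f "$env_file" ] || cp "$src_dir/.env.example" "$env_file" || return 1
  if grep -q '^PRODUCTION=' "$env_file"; then
    # Rewrite the existing assignment through a temp file (avoids sed -i).
    sed 's/^PRODUCTION=.*/PRODUCTION="True"/' "$env_file" > "$env_file.tmp" &&
      mv "$env_file.tmp" "$env_file"
  else
    # No assignment present: append one.
    printf 'PRODUCTION="True"\n' >> "$env_file"
  fi
}
```

From the repository root this would be called as enable_production backend/src. Restarting the containers afterwards is presumably needed for the backend to pick up the new value.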

Change List of Models

You can change the list of LLMs by editing backend/src/models.txt. The changes are applied when the server is restarted.

Change Text Prompts

You can change the system prompts without rebuilding the Docker image by editing backend/src/system.txt and backend/src/inter-prompt.txt. The changes are applied when the conversation is reset.

Run in Background

docker compose up -d

Shut Down Background Containers

docker compose down

While the containers are running:

From the GUI, select a model from the list.
Interact with the loaded LLM as needed.
To troubleshoot unexpected errors, please contact tech-support@mobilint.com.

Advantech × Mobilint: About the Partnership

Advantech and Mobilint have partnered to run advanced deep learning applications, including large language models (LLMs), multimodal AI, and advanced vision workloads, entirely at the edge.

Advantech’s industrial edge hardware, integrating Mobilint’s NPU AI accelerators, provides high-throughput, low-latency inference without cloud dependency.

Preloaded and validated on Advantech systems, Mobilint’s NPU enables immediate deployment of optimized AI applications across industries, including manufacturing, smart infrastructure, robotics, healthcare, and autonomous systems.

License

Provided “as is” without warranties.