Running a YOLOv5s Model with the Hailo Async API (NV12 Input Format)

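In this article, we show how to run a YOLOv5s model with Hailo's asynchronous inference API on images supplied in NV12 format. We walk through each part of the code, from memory allocation to inference execution. The code is adapted from the official example on GitHub.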

Overview of the Async Inference API

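Hailo's asynchronous inference API lets you submit multiple inference jobs to the NPU without blocking the calling thread, which suits applications that need high throughput and low latency. In this example, we feed NV12 input images to a YOLOv5s model.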

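The workflow consists of the following steps:

Memory allocation and initialization
Loading and configuring the model
Buffer mapping
Asynchronous inference execution
Processing the inference results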

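1. Memory Allocation and Initialization

First, we define a utility function that allocates page-aligned memory. This kind of allocation is suitable for hardware that moves data with DMA (direct memory access), and it keeps transfers efficient.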

// Allocates a page-aligned buffer that is released automatically with its last owner.
static std::shared_ptr<uint8_t> page_aligned_alloc(size_t size) {
#if defined(__unix__)
    // Anonymous mmap regions always start on a page boundary.
    auto addr = mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (MAP_FAILED == addr) throw std::bad_alloc();
    return std::shared_ptr<uint8_t>(reinterpret_cast<uint8_t*>(addr), [size](void *addr) { munmap(addr, size); });
#elif defined(_MSC_VER)
    auto addr = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!addr) throw std::bad_alloc();
    return std::shared_ptr<uint8_t>(reinterpret_cast<uint8_t*>(addr), [](void *addr){ VirtualFree(addr, 0, MEM_RELEASE); });
#else
#error "Aligned alloc not supported"
#endif
}

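Because mmap and VirtualAlloc return memory that starts on a page boundary, buffers from this function meet the alignment the hardware requires, so DMA transfers can proceed without problems.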
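To make the alignment claim concrete, here is a minimal sketch (POSIX only) that allocates a buffer the same way as the Unix branch of page_aligned_alloc above and checks that the address is a multiple of the page size. The names page_size, alloc_pages, and is_page_aligned are hypothetical helpers introduced for illustration; they are not part of HailoRT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: system page size in bytes (POSIX only).
inline std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Same idea as the Unix branch of page_aligned_alloc: an anonymous private mapping.
inline std::shared_ptr<uint8_t> alloc_pages(std::size_t size) {
    assert(size > 0);  // mmap rejects zero-length mappings
    void *addr = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (addr == MAP_FAILED) throw std::bad_alloc();
    // The deleter captures the size because munmap needs it.
    return std::shared_ptr<uint8_t>(static_cast<uint8_t*>(addr),
                                    [size](uint8_t *p) { munmap(p, size); });
}

// True when the pointer starts exactly on a page boundary.
inline bool is_page_aligned(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % page_size() == 0;
}
```

A check such as is_page_aligned(buffer.get()) can be placed in an assert right before a buffer is mapped for DMA, which catches buffers that were accidentally allocated with new or malloc.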

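2. Loading and Configuring the Model

Next, we load the pretrained YOLOv5s model and configure its batch size and output format. In this example BATCH_SIZE is 1, because each inference processes a single image.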

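// The virtual device is the handle used to talk to the NPU.
auto vdevice = VDevice::create().expect("Failed to create vdevice");
std::shared_ptr<uint8_t> output_buffer;
const std::string filename = "image_nv12.yuv";
// NV12 stores 12 bits per pixel: a full-size Y plane plus a half-size UV plane.
const size_t image_size = (640 * 640 * 3) / 2;
std::vector<uint8_t> image_data = loadFromBinFile(filename, image_size);
auto infer_model = vdevice->create_infer_model("hefs/shortcut_net_nv12.hef").expect("Failed to create infer model");

// Ask for float32 output and set the batch size before the model is configured.
infer_model->output()->set_format_type(HAILO_FORMAT_TYPE_FLOAT32);
infer_model->set_batch_size(BATCH_SIZE);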

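In this part of the code, we:

Create a virtual device (VDevice), which is used to communicate with the NPU.
Read the raw NV12 frame from disk with the loadFromBinFile helper.
Load the YOLOv5s model from a HEF file.
Set the model's output format to float32 and configure the batch size.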
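The loadFromBinFile helper is called in the snippet above but its body is not shown. The following is a minimal sketch of what such a helper might look like, assuming it should read exactly the requested number of bytes and fail loudly otherwise. The signature is inferred from the call site; the error handling is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical implementation of the loadFromBinFile helper used in the article.
// Reads exactly expected_size bytes; throws if the file is missing or too short.
inline std::vector<uint8_t> loadFromBinFile(const std::string &path, std::size_t expected_size) {
    assert(!path.empty());  // an empty path is a programming error
    std::ifstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("Cannot open " + path);

    std::vector<uint8_t> data(expected_size);
    file.read(reinterpret_cast<char*>(data.data()),
              static_cast<std::streamsize>(expected_size));
    // gcount() reports how many bytes the last read actually delivered.
    if (static_cast<std::size_t>(file.gcount()) != expected_size) {
        throw std::runtime_error("File is smaller than expected: " + path);
    }
    return data;
}
```

Rejecting short files up front is a deliberate choice here: a truncated NV12 frame would otherwise be copied into the plane buffers and produce garbage input without any error.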

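3. Buffer Mapping

The model's input is in NV12 format, which consists of two planes: a Y (luma) plane and an interleaved UV (chroma) plane. We allocate memory for each plane separately and map both buffers to the virtual device so that data can be transferred efficiently.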

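// `bindings` is assumed to have been created earlier from the configured model (not shown in this excerpt).
for (const auto &input_name : infer_model->get_input_names()) {
    size_t input_frame_size = infer_model->input(input_name)->get_frame_size();
    // In NV12 the Y plane takes 2/3 of the frame and the UV plane takes 1/3.
    const auto Y_PLANE_SIZE = static_cast<uint32_t>(input_frame_size * 2 / 3);
    const auto UV_PLANE_SIZE = static_cast<uint32_t>(input_frame_size * 1 / 3);

    // Y plane: copy from the start of the image and map it for host-to-device DMA.
    auto y_plane_buffer = page_aligned_alloc(Y_PLANE_SIZE);
    memcpy(y_plane_buffer.get(), image_data.data(), Y_PLANE_SIZE);
    auto input_mapping_y = DmaMappedBuffer::create(*vdevice, y_plane_buffer.get(), Y_PLANE_SIZE, HAILO_DMA_BUFFER_DIRECTION_H2D).expect("Failed to map input buffer to VDevice");

    // UV plane: it follows the Y plane directly in the source image.
    auto uv_plane_buffer = page_aligned_alloc(UV_PLANE_SIZE);
    memcpy(uv_plane_buffer.get(), image_data.data() + Y_PLANE_SIZE, UV_PLANE_SIZE);
    auto input_mapping_uv = DmaMappedBuffer::create(*vdevice, uv_plane_buffer.get(), UV_PLANE_SIZE, HAILO_DMA_BUFFER_DIRECTION_H2D).expect("Failed to map input buffer to VDevice");

    // Describe both planes to HailoRT as one two-plane pixel buffer.
    hailo_pix_buffer_t pix_buffer{};
    pix_buffer.memory_type = HAILO_PIX_BUFFER_MEMORY_TYPE_USERPTR;
    pix_buffer.number_of_planes = 2;
    pix_buffer.planes[0].bytes_used = Y_PLANE_SIZE;
    pix_buffer.planes[0].plane_size = Y_PLANE_SIZE;
    pix_buffer.planes[0].user_ptr = reinterpret_cast<void*>(y_plane_buffer.get());
    pix_buffer.planes[1].bytes_used = UV_PLANE_SIZE;
    pix_buffer.planes[1].plane_size = UV_PLANE_SIZE;
    pix_buffer.planes[1].user_ptr = reinterpret_cast<void*>(uv_plane_buffer.get());

    bindings.input(input_name)->set_pix_buffer(pix_buffer);
    // NOTE: the plane buffers and DMA mappings are local to this iteration. Keep them alive
    // (for example in a container declared outside the loop) until inference has finished.
}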

This code binds the Y and UV planes to the model's input as one two-plane pixel buffer, so the frame can be transferred to the NPU.
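The 2/3 and 1/3 split used for the plane sizes follows from the NV12 layout itself: one luma byte per pixel, plus one interleaved U,V byte pair for every 2×2 block of pixels. The sketch below works this out for an arbitrary frame size. Nv12Layout, nv12_layout, and nv12_u_offset are hypothetical names introduced for illustration, and the offset helper assumes tightly packed planes (stride equal to width).

```cpp
#include <cassert>
#include <cstddef>

// Byte sizes of the two NV12 planes for a width x height frame.
struct Nv12Layout {
    std::size_t y_size;   // one luma byte per pixel
    std::size_t uv_size;  // one interleaved U,V pair per 2x2 pixel block
    std::size_t total() const { return y_size + uv_size; }
};

// NV12 needs even dimensions because chroma is subsampled by 2 in both directions.
inline Nv12Layout nv12_layout(std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t y_size = width * height;
    return Nv12Layout{y_size, y_size / 2};
}

// Offset of the U byte for pixel (x, y) inside the UV plane; V follows at +1.
// Assumes a tightly packed plane, i.e. the row stride equals the width.
inline std::size_t nv12_u_offset(std::size_t width, std::size_t x, std::size_t y) {
    return (y / 2) * width + (x / 2) * 2;
}
```

For the 640×640 input used here, this gives a Y plane of 409600 bytes and a UV plane of 204800 bytes, 614400 bytes in total, which matches the image_size of (640 * 640 * 3) / 2 from the loading step.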

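4. Asynchronous Inference Execution

With the model and buffers configured, we can start running inference asynchronously. A loop submits several inference jobs; each job runs asynchronously, so submitting it does not block the program.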

// `configured_infer_model` and `multiple_bindings` are assumed to have been set up earlier (not shown in this excerpt).
for (uint32_t i = 0; i < BATCH_COUNT; i++) {
    auto start_time = std::chrono::high_resolution_clock::now();

    // Wait until the pipeline has room for another BATCH_SIZE frames.
    auto status = configured_infer_model.wait_for_async_ready(std::chrono::milliseconds(1000), BATCH_SIZE);
    if (HAILO_SUCCESS != status) {
        throw hailort_error(status, "Failed to wait for async ready");
    }

    // The callback runs when this job completes; it captures what it needs by value.
    auto job = configured_infer_model.run_async(multiple_bindings, [start_time, multiple_bindings, output_buffer] (const AsyncInferCompletionInfo &completion_info) {
        auto end_time = std::chrono::high_resolution_clock::now();
        auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count();
        std::cout << "Async inference completed in " << duration_ms << " ms" << std::endl;

        // Processing results (detections, bounding boxes)
    }).expect("Failed to start async infer job");

    job.detach();  // Detach to allow parallel execution
}

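Each iteration launches one asynchronous job. Calling detach() releases the job handle so that the loop does not wait for the job to finish before submitting the next one; the jobs can then run in parallel, which raises inference throughput.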
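Detached jobs finish in the background, so the program has to stay alive until their callbacks have run. One generic way to do that is a small counter the callbacks decrement and the main thread waits on. The sketch below uses only the C++ standard library; PendingJobs is a hypothetical helper, not part of HailoRT, and whether you need it depends on how your application tracks the last job.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical helper: counts outstanding jobs and lets one thread wait for all of them.
class PendingJobs {
public:
    explicit PendingJobs(std::size_t count) : m_remaining(count) {}

    // Call once at the end of each completion callback.
    void job_done() {
        std::lock_guard<std::mutex> lock(m_mutex);
        assert(m_remaining > 0);  // more completions than submitted jobs is a bug
        if (--m_remaining == 0) m_cv.notify_all();
    }

    // Blocks until every job has reported completion; returns false on timeout.
    bool wait_all(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        return m_cv.wait_for(lock, timeout, [this] { return m_remaining == 0; });
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::size_t m_remaining;
};
```

In the loop above, this would mean constructing one PendingJobs with BATCH_COUNT before the loop, capturing a pointer or reference to it in the callback, calling job_done() as the callback's last statement, and calling wait_all() after the loop. The timeout keeps a lost completion from hanging the program forever.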

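5. Processing the Inference Results

Once inference completes, the callback processes the results and prints the bounding boxes and confidence scores of the detected objects.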

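// Runs inside the completion callback, once the output buffer has been filled.
uint8_t* buffer = output_buffer.get();
// The buffer is assumed to start with a float32 detection count, followed by the bounding boxes.
int num_detections = (int)*(float32_t*)buffer;
const size_t offset = sizeof(float32_t);  // skip the leading count
std::cout << "There are " << num_detections << " detections." << std::endl;
for (int i = 0; i < num_detections; i++) {
    hailo_bbox_float32_t detection = ((hailo_bbox_float32_t*)(buffer + offset))[i];
    // hailo_bbox_float32_t stores the confidence in its `score` field.
    std::cout << "Detection - y_min: " << detection.y_min
              << ", x_min: " << detection.x_min
              << ", y_max: " << detection.y_max
              << ", x_max: " << detection.x_max
              << ", confidence: " << detection.score << std::endl;
}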

We read the bounding-box data from the inference output and print the information for each detected object.
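The snippet above reads a single count and the boxes that follow it. If the output is organized per class, with each class contributing one float32 count followed by that many box records, the whole buffer can be walked as in the sketch below. That layout is an assumption to verify against your HailoRT version and model. BBox is a local stand-in struct of five packed floats, and parse_nms_by_class and Detection are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Local stand-in for a float32 bounding-box record (five packed floats).
struct BBox { float y_min, x_min, y_max, x_max, score; };
static_assert(sizeof(BBox) == 5 * sizeof(float), "BBox must be tightly packed");

struct Detection { std::size_t class_id; BBox box; };

// Walks a buffer laid out as: for each class, one float count, then that many BBox records.
// Stops early instead of reading past buffer_size if the data is truncated or malformed.
inline std::vector<Detection> parse_nms_by_class(const uint8_t *buffer, std::size_t buffer_size,
                                                  std::size_t num_classes) {
    assert(buffer != nullptr);
    std::vector<Detection> detections;
    std::size_t offset = 0;
    for (std::size_t class_id = 0; class_id < num_classes; ++class_id) {
        if (offset + sizeof(float) > buffer_size) break;
        float count_f = 0.0f;
        std::memcpy(&count_f, buffer + offset, sizeof(float));  // memcpy avoids unaligned reads
        offset += sizeof(float);
        if (count_f < 0.0f) break;  // a negative count means the buffer is not what we expect
        const std::size_t count = static_cast<std::size_t>(count_f);
        for (std::size_t i = 0; i < count; ++i) {
            if (offset + sizeof(BBox) > buffer_size) return detections;
            BBox box{};
            std::memcpy(&box, buffer + offset, sizeof(BBox));
            offset += sizeof(BBox);
            detections.push_back(Detection{class_id, box});
        }
    }
    return detections;
}
```

Copying each record out with memcpy, rather than casting the byte pointer to a struct pointer, sidesteps alignment and strict-aliasing problems, and the bounds checks make a wrong layout assumption show up as a short result instead of an out-of-bounds read.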

CMake Configuration

To build this project, `CMakeLists.txt` has to link against the Hailo library and the other dependencies. Here is a simplified `CMakeLists.txt` example:

cmake_minimum_required(VERSION 3.8)
project(detection_app)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Enable pkg-config support
find_package(PkgConfig REQUIRED)
# Use pkg-config to get the OpenCV flags
pkg_check_modules(OpenCV REQUIRED opencv4)

# Paths into the cross-compilation sysroot; adjust them to your SDK installation
set(HAILORT_INCLUDE "/opt/poky/4.0.2/sysroots/armv8a-poky-linux/usr/include")
set(HAILORT_LIB "/opt/poky/4.0.2/sysroots/armv8a-poky-linux/usr/lib/libhailort.so")

add_executable(detection_app async_infer_advanced_example.cpp)

# The C++ standard comes from CMAKE_CXX_STANDARD above, so no -std flag is passed here
target_compile_options(detection_app PRIVATE -Wall -O3 -g)

include_directories(${HAILORT_INCLUDE})
include_directories(${OpenCV_INCLUDE_DIRS})

# pkg_check_modules exports the library list as <PREFIX>_LIBRARIES
target_link_libraries(detection_app PRIVATE ${OpenCV_LIBRARIES})
target_link_libraries(detection_app PRIVATE ${HAILORT_LIB})
#target_link_libraries(detection_app PRIVATE opencv_core opencv_videoio opencv_imgproc opencv_imgcodecs)

You can add further dependencies as needed, such as additional OpenCV modules or other libraries, by extending `target_link_libraries`.

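This article explained how to run YOLOv5s inference with Hailo's asynchronous inference API. Running jobs asynchronously raises inference throughput, which makes the approach a good fit for real-time or high-performance applications.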
