A Lightweight 3D Object Feature Recognition Method Based on Camera Displacement and Pixel Change Analysis

Abstract

To address the bottlenecks of existing 3D perception technologies, such as high algorithm complexity, exorbitant hardware costs, and insufficient real-time performance, this paper innovatively proposes a lightweight 3D object property recognition method based on camera displacement. By employing a time-sliced focusing strategy and analyzing the objective laws of pixel changes during continuous camera displacement, combined with a dynamic region segmentation strategy along the vertical plumb line, the method achieves rapid identification and 3D modeling of high-convex, low-concave, and planar objects. Its core advantage lies in sharing camera data sources and computational cores with deep learning methods while supporting distributed collaborative operations. Unlike traditional deep learning-based approaches, the proposed method does not require pre-training of object attributes; instead, it determines 3D properties solely through the increasing, decreasing, or offset patterns of pixel differences, significantly reducing computational complexity. Experimental results demonstrate that the method reduces computational latency by approximately 60% while maintaining high recognition accuracy, making it effectively applicable to real-time scenarios such as blind navigation, mobile robotics, and autonomous driving. Additionally, the method can simultaneously resolve visual deception issues like mirror illusions and water reflections, providing a new technical pathway for intelligent perception systems while ensuring data credibility.

Main Contributions

  1. A 3D object characteristic recognition method based on pixel variation is proposed, which requires no deep learning models or object attribute training, and directly judges the convex, concave, and planar properties of objects through the increasing/decreasing patterns of pixel differences;
  2. A dynamic region segmentation strategy based on the plumb-line direction is designed, which adjusts the division ratios of the left, middle, and right regions in real time according to the camera pose (tilt angle), enhancing the dynamic adaptability of the method;
  3. A segmented focusing method is presented, which dynamically adjusts the focus point during continuous camera displacement to ensure the comparability of consecutive frames and improve the accuracy of pixel variation calculation;
  4. The pixel offset laws between different regions (left, middle, right) and different object characteristics (convex, concave, planar) are systematically analyzed, and a quantitative relationship between pixel variation and object distance is established;
  5. The method can simultaneously solve visual deception problems such as mirror illusions and water surface reflections, providing robust perception capability for unmanned navigation systems.
System Architecture and Core Principles

System Architecture Overview

The proposed image acquisition and 3D perception system comprises two primary subsystems: hardware and algorithm. The overall architecture is illustrated in Figure 1.

Figure 1. System architecture: (1) camera; (2) rotating motors; (3) firmware for the dual-degree-of-freedom rotating mechanism; (4) gravity sensor; (5) gyroscope; (6) fasteners; (7) integrated control center; (8) power supply.

Hardware Subsystem

The hardware subsystem consists of the following core components:

  1. Monocular Camera: Utilizes CMOS or CCD sensors with dynamic focusing capability. Specifications include 1920×1080 resolution, 30fps frame rate, and 90° Field of View (FOV);
  2. Two-Degree-of-Freedom (2-DOF) Rotation Mechanism: Comprises horizontal rotation motors (pan) and vertical rotation motors (tilt), with rotation ranges of ±180° and ±60° respectively, and angular resolution of 0.1°. Flexible camera adjustment is achieved through arc-shaped brackets and connecting rods;
  3. Attitude Sensor Module: Integrates gravity sensors and gyroscopes for real-time detection of camera tilt angle and angular velocity, with 100Hz sampling frequency. Sensor data is transmitted to the main control unit via I²C bus;
  4. Main Control Unit: Employs ARM Cortex-A72 processor or NVIDIA Jetson Nano, integrating MCU/GPU for pixel difference analysis and real-time decision-making. Supports multi-threaded parallel processing to ensure algorithmic real-time performance;
  5. Platform Body: Equipped with battery, motor drive module, and communication interface, enabling integration into mobile robots, assistive navigation devices, or autonomous vehicles.
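The hardware specifications above can be gathered into a single configuration object for the main control unit. The sketch below is purely illustrative; all field names are our own and are not part of the original design:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlatformConfig:
    # Monocular camera (CMOS/CCD with dynamic focusing)
    resolution: tuple = (1920, 1080)          # pixels
    frame_rate_fps: int = 30
    fov_deg: float = 90.0
    # 2-DOF rotation mechanism (pan / tilt)
    pan_range_deg: tuple = (-180.0, 180.0)
    tilt_range_deg: tuple = (-60.0, 60.0)
    angular_resolution_deg: float = 0.1
    # Attitude sensor module (gravity sensor + gyroscope over I2C)
    imu_sampling_hz: int = 100

cfg = PlatformConfig()
```

Freezing the dataclass keeps the calibration values immutable once the platform is initialized.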
Algorithm Subsystem

The algorithm subsystem comprises the following modules:

  1. Image Acquisition and Preprocessing Module: Responsible for continuous frame acquisition and preprocessing operations, including denoising and normalization;
  2. Dynamic Region Segmentation Module: Based on gravity sensor and gyroscope data, calculates the vertical plumb-line direction in real time, dividing the image into left, center, and right regions (Figure 2);


Figure 2. Dynamic Region Segmentation Module.

  3. Pixel Change Calculation Module: Compares pixel differences between the current and previous frames, extracting and stratifying color differences;
  4. Object Feature Recognition Module: Determines object properties (convex, concave, planar) based on pixel change patterns;
  5. 3D Modeling and Localization Module: Combines object properties and distance information to construct 3D spatial models and output object positions and contours.
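The first three modules can be sketched as a minimal processing pipeline. This is an illustrative skeleton under our own simplifying assumptions (grayscale frames, a 3×3 mean filter for denoising, a fixed center-region fraction); the function names are hypothetical, not taken from the paper:

```python
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    """Denoise (3x3 mean filter) and normalize pixel values to [0, 1]."""
    f = frame.astype(np.float32) / 255.0
    pad = np.pad(f, 1, mode="edge")
    h, w = f.shape
    return sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def segment_regions(width: int, plumb_x: int, center_frac: float = 0.08):
    """Split column indices into left / center / right bands about the plumb line."""
    half = int(width * center_frac / 2)
    return (0, plumb_x - half), (plumb_x - half, plumb_x + half), (plumb_x + half, width)

def pixel_change(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Per-pixel absolute difference between consecutive preprocessed frames."""
    return np.abs(curr - prev)

# Usage: two synthetic frames, full-range change everywhere
prev = preprocess(np.zeros((4, 6), dtype=np.uint8))
curr = preprocess(np.full((4, 6), 255, dtype=np.uint8))
diff = pixel_change(prev, curr)
```

Downstream modules would then aggregate `diff` per region to drive the feature-recognition rules described below.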

Core Principles

Pixel Change Pattern Analysis

When a camera moves continuously along a straight trajectory, different object types exhibit specific geometric patterns in the image. Let C denote the camera’s initial position, C1 the position after movement, and S the displacement vector. For an object in 3D space, its projection on the image plane changes with camera displacement.

  1. Planar Objects: Vertical edges of planar objects undergo an angular offset in consecutive frames. Specifically, vertical edges in the left region shift rightward, those in the right region shift leftward, and object height in the center region increases. The offset angle θ is inversely proportional to the object distance (Figure 3).

  2. High-Convex Objects: The apex of a convex object remains invariant along the vertical plumb-line direction across consecutive frames, while its edges undergo a parallel offset. Convex objects in the left region offset leftward, those in the right region offset rightward, and object height in the center region decreases. The offset magnitude Δp relates to the protrusion height h and the object distance (Figure 4).

  3. Low-Concave Objects: Internal surfaces of concave objects expose new areas as the camera moves. The right edge of a concave object in the left region expands outward, the left edge in the right region expands outward, and object height in the center region increases as internal details emerge (Figure 5).
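The qualitative rules above reduce to a small decision table. The sketch below encodes them directly; the function signature and the string labels for shift direction and height trend are our own simplification (in practice these inputs would come from edge tracking between consecutive frames):

```python
def classify_object(region: str, edge_shift: str,
                    height_trend: str = "stable", new_detail: bool = False) -> str:
    """Map an observed pixel-change pattern in one image region to an object type.

    region:       'left', 'center', or 'right'
    edge_shift:   direction vertical edges move ('left', 'right', 'outward', 'none')
    height_trend: 'increase', 'decrease', or 'stable' (used for the center region)
    new_detail:   True if previously hidden internal surfaces appear (concave cue)
    """
    if region == "center":
        if height_trend == "decrease":
            return "convex"
        if height_trend == "increase":
            # both planar and concave objects grow taller; emerging
            # internal detail disambiguates the concave case
            return "concave" if new_detail else "planar"
        return "unknown"
    if region == "left":
        return {"right": "planar", "left": "convex", "outward": "concave"}.get(edge_shift, "unknown")
    if region == "right":
        return {"left": "planar", "right": "convex", "outward": "concave"}.get(edge_shift, "unknown")
    return "unknown"
```

For example, a vertical edge in the left region shifting rightward is classified as planar, while one shifting leftward is classified as convex, matching rules 1 and 2.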

Dynamic Region Segmentation Strategy

Traditional image segmentation methods employ fixed grid divisions that cannot adapt to camera pose changes. The proposed dynamic region segmentation strategy uses the vertical plumb line detected by gravity sensors as a baseline to divide the image into three regions:
• Center Region (R_center): A narrow band centered on the plumb line (typically 5%–10% of the image width), primarily capturing depth changes of objects directly ahead.
• Left Region (R_left) and Right Region (R_right): Located to the left and right of the center region, occupying the majority of image width.
When the camera rotates around the Y-axis, its optical axis forms an angle α with the platform’s forward direction. According to projective geometry principles, the width proportions of the left, center, and right regions should be dynamically adjusted (Figure 6):


• When α < 90°, the right region width increases while the left region width decreases;
• When α > 90°, the left region width increases while the right region width decreases.
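These two rules can be realized with a simple width schedule. The paper states only the qualitative direction of adjustment, so the linear mapping and the `gain` constant below are our assumptions, chosen to illustrate the mechanism:

```python
def region_widths(image_width: int, alpha_deg: float,
                  center_frac: float = 0.08, gain: float = 0.005):
    """Split the image into (left, center, right) widths as a function of the
    optical-axis angle alpha (alpha = 90 deg means facing straight ahead).

    The linear 'gain' mapping is an assumption; the source only specifies
    that alpha < 90 widens the right region and alpha > 90 widens the left.
    """
    center_w = int(image_width * center_frac)
    side = image_width - center_w
    # shift in [-0.5, 0.5]: alpha below 90 deg favors the right region
    shift = max(-0.5, min(0.5, (alpha_deg - 90.0) * gain))
    left_w = int(side * (0.5 + shift))
    right_w = side - left_w
    return left_w, center_w, right_w
```

At α = 80° the right region is wider than the left; at α = 100° the proportions reverse, while the three widths always sum to the full image width.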

Segmented Focusing Method

During continuous camera displacement, objects at varying distances successively enter the focusing range. To ensure inter-frame comparability, this paper proposes a segmented focusing method (Figure 7):

  1. Environmental Feature Analysis: Based on current scene depth distribution, focusing distances are divided into three intervals: near-field (0-2m), mid-field (2-5m), and far-field (5m and beyond).
  2. Dynamic Focus Point Adjustment: Within each time segment t, the camera continuously focuses on a specific distance interval. When platform displacement exceeds threshold ∆S, focus switches to the next distance interval.
  3. Inter-Frame Association: Focus distance and platform position (x, y) are recorded for each frame to ensure compared frames focus on the same spatial area, avoiding errors from depth-of-field changes.
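The three steps above can be sketched as a small scheduling routine. The cyclic near→mid→far switching policy and the default ΔS value are our assumptions for illustration; the paper specifies only the interval boundaries and the displacement-triggered switch:

```python
from dataclasses import dataclass

# Near-field, mid-field, and far-field focusing intervals (metres), per step 1
FOCUS_INTERVALS = [(0.0, 2.0), (2.0, 5.0), (5.0, float("inf"))]

@dataclass
class FrameRecord:
    """Step 3: record platform position and focus interval for each frame."""
    x: float
    y: float
    focus_interval: tuple

def focus_schedule(displacement: float, delta_s: float = 1.0) -> tuple:
    """Step 2: each time the platform advances by delta_s, switch to the
    next focus interval, cycling near -> mid -> far (policy is an assumption)."""
    idx = int(displacement // delta_s) % len(FOCUS_INTERVALS)
    return FOCUS_INTERVALS[idx]

def comparable(a: FrameRecord, b: FrameRecord) -> bool:
    """Only frames that focused on the same depth interval are compared,
    avoiding errors from depth-of-field changes."""
    return a.focus_interval == b.focus_interval
```

Gating frame comparison on a matching focus interval is what keeps the pixel-difference computation free of defocus-induced artifacts.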
