Publications

2025

Learning Realistic Expressions for Humanoid Face Robots

Yongji Fu, et al.

In preparation (target venue: ICRA 2026), 2025

Humanoid face robots need expressions that look natural to a human observer while remaining physically executable on a motor-driven face with bounded actuation. This work learns a mapping from target facial signals to the robot’s low-level actuator commands that reproduces human-like micro-dynamics — not just static key poses — and stays stable under the hardware’s mechanical limits. The pipeline covers data collection, retargeting, and a learned controller that balances visual fidelity against physical feasibility.

@unpublished{fu2025humanoidface,
  title={Learning Realistic Expressions for Humanoid Face Robots},
  author={Fu, Yongji and others},
  note={Manuscript in preparation; target venue: IEEE International Conference on Robotics and Automation (ICRA) 2026},
  year={2025}
}

Learning to Search, Searching to Learn: A Closed-Loop Framework for Large-Scale Vehicle Routing Problems

Yongji Fu, Yong Wang, Jun Deng, et al.

NeurIPS 2026 (submission), Under review, 2025

Large-scale Vehicle Routing Problems (VRPs) face two long-standing difficulties. On the one hand, many scalable methods rely on partitioning, local candidate restriction, or staged decision making to control computation, which weakens their modeling of global structure. On the other hand, although many methods introduce search at test time to improve the final solution, search is still typically used only as a one-shot post-processing step after model prediction. The model makes a prediction, search repairs it, and little sustained feedback is formed between the two. Improved structural states are rarely fed back to the model for subsequent inference, and high-quality search solutions are seldom turned into later training supervision.

To address this issue, we propose LSL (Learning to Search, Searching to Learn), a closed-loop learning-search framework for large-scale VRPs. LSL first predicts search-friendly structural priors on a sparse candidate graph, and search then iteratively refines the current solution under the guidance of these priors. In turn, search does not leave the system after one round of refinement. At inference time, the structural states returned by search are fed back to the model for the next round of prediction, while at training time, multiple high-quality search solutions are reorganized into row-wise soft targets for model update. In this way, learning tells search where to explore, and search tells the model which structures are worth learning. Experiments show that LSL achieves strong scalability, efficiency, and solution quality across multiple large-scale VRP benchmarks.
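
As a rough illustration of how several search solutions could be reorganized into row-wise soft targets, the sketch below builds, for every node, a probability distribution over its successors. The function name `rowwise_soft_targets`, the cost-based weighting, the `temperature` parameter, and the single closed tour per solution are all assumptions made for this example; the abstract does not specify how LSL constructs its targets.

```python
import math
from collections import defaultdict

def rowwise_soft_targets(tours, costs, temperature=1.0):
    """Turn several search solutions into per-node successor distributions.

    tours: list of closed tours, each a list of node ids (last links back to first).
    costs: total cost of each tour; cheaper tours receive more weight (assumption).
    Returns {node: {successor: probability}} where each row sums to 1.
    """
    best = min(costs)
    # Softmax-style weights on negative cost, shifted by the best cost for stability.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    rows = defaultdict(lambda: defaultdict(float))
    for tour, w in zip(tours, weights):
        # Pair every node with its successor, wrapping around at the end.
        for a, b in zip(tour, tour[1:] + tour[:1]):
            rows[a][b] += w
    targets = {}
    for a, row in rows.items():
        total = sum(row.values())
        targets[a] = {b: v / total for b, v in row.items()}
    return targets

# Three toy solutions over four nodes (made-up data).
tours = [[0, 1, 2, 3], [0, 1, 3, 2], [0, 2, 1, 3]]
costs = [10.0, 10.0, 12.0]
targets = rowwise_soft_targets(tours, costs, temperature=2.0)
```

Here the edge 0 → 1 appears in both of the cheaper tours, so row 0 places most of its mass on successor 1, while the edge 0 → 2 from the costlier tour keeps a smaller share instead of being discarded.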

@article{fu2025learning,
  title={Learning to Search, Searching to Learn: A Closed-Loop Framework for Large-Scale Vehicle Routing Problems},
  author={Fu, Yongji and Wang, Yong and Deng, Jun and others},
  journal={Submitted to NeurIPS 2026},
  year={2025}
}

TouchSteer: Grounding Natural Language in Tactile Perception via Steering Vectors

Guanqun Cao, Yongji Fu, Yi Zhou, Gaojie Jin, Zhenyu Lu, Shan Luo

IEEE Transactions on Robot Learning (TRL) (submission), Under review, 2025

Tactile sensing provides robots with direct information about physical properties through contact, yet most existing methods describe tactile data using predefined attribute labels with limited semantic flexibility. Aligning tactile signals with human language enables richer, concept-level representations.

In this work, we propose a transformer-based tactile–language framework that structures the shared embedding space as a manipulable concept space using steering vectors. These vectors encode tactile properties as semantic directions, providing explicit semantic control under limited supervision. The framework supports two complementary tasks out of the same latent space: given a free-form natural language query describing desired tactile properties, the robot retrieves the most relevant material from its tactile experience; and after contacting a surface physically, the robot generates a natural language description of its tactile properties.
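
To make the idea of a semantic direction concrete, here is a minimal sketch that assumes a steering vector is the difference between the mean embeddings of samples with and without a property, which is one common construction and not necessarily the one used in TouchSteer. The embeddings, the material names, and the helpers `steering_vector` and `retrieve` are hypothetical.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(with_property, without_property):
    """Difference of class means: a direction pointing toward the property."""
    return [p - q for p, q in zip(mean(with_property), mean(without_property))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve(query, steer, alpha, library):
    """Shift the query along the steering direction, then rank by cosine similarity."""
    shifted = [q + alpha * s for q, s in zip(query, steer)]
    return max(library, key=lambda name: cosine(shifted, library[name]))

# Toy 3-d embeddings (made up for illustration).
rough = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
smooth = [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]]
steer_rough = steering_vector(rough, smooth)

library = {"sandpaper": [0.85, 0.15, 0.05], "glass": [0.15, 0.85, 0.05]}
neutral_query = [0.45, 0.55, 0.05]
plain = retrieve(neutral_query, steer_rough, 0.0, library)    # no steering
steered = retrieve(neutral_query, steer_rough, 1.0, library)  # pushed toward "rough"
```

With `alpha = 0.0` the query sits slightly closer to the smooth material, and adding the "rough" direction flips the retrieval to the rough one. The scale `alpha` controls how strongly the property is imposed.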

Experimental results show that the framework effectively retrieves tactile representations from free-form natural language and generates meaningful tactile descriptions grounded in tactile perception, supporting more effective human–robot interaction.

@article{cao2025touchsteer,
  title={TouchSteer: Grounding Natural Language in Tactile Perception via Steering Vectors},
  author={Cao, Guanqun and Fu, Yongji and Zhou, Yi and Jin, Gaojie and Lu, Zhenyu and Luo, Shan},
  journal={Submitted to IEEE Transactions on Robot Learning},
  year={2025}
}

Visuo-Tactile Latent World Models

Yongji Fu, et al.

In preparation (target venue: ICRA 2026), 2025

Contact-rich manipulation needs a world model that predicts not only what the scene will look like after an action but also what it will feel like at the fingertips. This work trains a visuo-tactile latent world model that jointly encodes vision and tactile signals into a shared latent space and rolls out future states in that space. The model supports planning and policy learning for tasks where contact events — slip, stick, sudden force changes — carry information that pure vision cannot see.

We evaluate on manipulation tasks that are ambiguous under vision alone, and show that adding the tactile channel to the latent dynamics improves both predictive accuracy and downstream task success.

@unpublished{fu2025visuotactile,
  title={Visuo-Tactile Latent World Models},
  author={Fu, Yongji and others},
  note={Manuscript in preparation; target venue: IEEE International Conference on Robotics and Automation (ICRA) 2026},
  year={2025}
}

2024

Constructing Dynamic S-boxes Based on Chaos and Irreducible Polynomials for Image Encryption

Chunlei Luo, Yong Wang, Yongji Fu, et al.

Nonlinear Dynamics (Springer, JCR Q1, IF 6.0), 2024

Substitution boxes (S-boxes) lie at the heart of modern symmetric ciphers, and the cryptographic strength of a cipher depends heavily on the nonlinearity, differential uniformity, and unpredictability of its S-box. Static S-boxes, however, are vulnerable to algebraic and side-channel attacks once their structure is fixed. This work proposes a dynamic S-box construction scheme that combines a two-dimensional chaotic map with irreducible polynomials over a finite field.

The chaotic map provides key-dependent sensitivity and a vast parameter space, so that each key induces a distinct S-box; irreducible polynomials over that field supply algebraic structure that bounds worst-case cryptographic metrics. We conduct ablation experiments to evaluate the generated S-boxes on the standard battery of cryptographic criteria (nonlinearity, strict avalanche criterion (SAC), bit independence criterion (BIC), and differential and linear approximation probabilities) and on generation efficiency. The constructed S-boxes consistently meet or exceed the requirements for cryptographic use.
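
For readers unfamiliar with the nonlinearity criterion, the sketch below shows how it can be measured for an 8-bit S-box with a fast Walsh transform. It does not reproduce the paper's chaotic construction: as a stand-in it uses the fixed multiplicative-inverse S-box, and it assumes the field GF(2^8) with the AES polynomial 0x11B, neither of which is stated in the abstract. All function names are illustrative.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) modulo an irreducible polynomial (assumed 0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly  # reduce once the degree reaches 8
        b >>= 1
    return result

def inversion_sbox(poly=0x11B):
    """Stand-in S-box: x maps to its multiplicative inverse, with 0 mapped to 0."""
    sbox = [0] * 256
    for x in range(1, 256):
        for y in range(1, 256):
            if gf_mul(x, y, poly) == 1:
                sbox[x] = y
                break
    return sbox

def walsh_max(truth_table):
    """Largest absolute Walsh coefficient of a Boolean function (in-place fast transform)."""
    w = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < len(w):
        for start in range(0, len(w), 2 * step):
            for i in range(start, start + step):
                u, v = w[i], w[i + step]
                w[i], w[i + step] = u + v, u - v
        step *= 2
    return max(abs(c) for c in w)

def nonlinearity(sbox):
    """Minimum nonlinearity over all nonzero output masks of an 8-bit S-box."""
    worst = 0
    for mask in range(1, 256):
        # Component function: parity of the masked output bits.
        table = [bin(sbox[x] & mask).count("1") & 1 for x in range(256)]
        worst = max(worst, walsh_max(table))
    return 128 - worst // 2

sbox = inversion_sbox()
is_bijective = sorted(sbox) == list(range(256))
nl = nonlinearity(sbox)
```

The inversion map reaches a nonlinearity of 112, the value that 8-bit S-box proposals are commonly compared against. A key-dependent S-box could be passed to `nonlinearity` in the same way.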

We further integrate the dynamic S-box into a block-cipher image encryption pipeline, and show on standard test images that it delivers strong pixel-level confusion and diffusion, near-uniform ciphertext histograms, high NPCR/UACI scores against plaintext-sensitivity attacks, and robust resistance to differential, statistical, and brute-force attacks.
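
The NPCR and UACI scores mentioned above have standard definitions, sketched here for two equally sized 8-bit cipher images given as flat lists of pixel values. The function name `npcr_uaci` and the toy pixel data are illustrative only and are not results from the paper.

```python
def npcr_uaci(cipher_a, cipher_b):
    """Return (NPCR, UACI) in percent for two equal-sized 8-bit images as flat lists."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and the same size")
    n = len(cipher_a)
    # NPCR: share of pixel positions whose values differ.
    changed = sum(1 for p, q in zip(cipher_a, cipher_b) if p != q)
    # UACI: mean absolute intensity difference, normalised by the maximum value 255.
    intensity = sum(abs(p - q) for p, q in zip(cipher_a, cipher_b))
    return 100.0 * changed / n, 100.0 * intensity / (255 * n)

# Toy four-pixel "images" (made-up values).
c1 = [0, 255, 10, 10]
c2 = [255, 255, 20, 10]
npcr, uaci = npcr_uaci(c1, c2)
```

In a plaintext-sensitivity test, `cipher_a` and `cipher_b` would be the ciphertexts of two plain images that differ in a single pixel. For two independent, uniformly random 8-bit images the expected values are roughly 99.61% for NPCR and 33.46% for UACI, which is why scores near those figures are read as strong diffusion.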

@article{luo2024dynamic,
  title={Constructing Dynamic S-boxes Based on Chaos and Irreducible Polynomials for Image Encryption},
  author={Luo, Chunlei and Wang, Yong and Fu, Yongji and others},
  journal={Nonlinear Dynamics},
  year={2024},
  publisher={Springer}
}