Yongji Fu 符永骥

I am Yongji Fu (符永骥), an MSc student in Robotics Engineering at the University of Bristol (2025.09 – 2026.10), advised by Nathan F. Lepora and Guanqun Cao. Before Bristol I received my BSc in Information Management and Information Systems from Chongqing University of Posts and Telecommunications.

Goal To build robotic and agentic systems that continuously learn and iteratively self-improve through interaction with the physical world.

Research Interests large-scale machine learning · world models for robot learning · continually self-evolving agents · general-purpose loco-manipulation

yongji.fu7@gmail.com · GitHub · About

Publications

See all publications

Learning Realistic Expressions for Humanoid Face Robots

Yongji Fu, et al.

In preparation; target: ICRA 2026, 2025

Humanoid face robots need expressions that look natural to a human observer while remaining physically executable on a motor-driven face with bounded actuation. This work learns a mapping from target facial signals to the robot’s low-level actuator commands that reproduces human-like micro-dynamics — not just static key poses — and stays stable under the hardware’s mechanical limits. The pipeline covers data collection, retargeting, and a learned controller that balances visual fidelity against physical feasibility.

@unpublished{fu2025humanoidface,
  title={Learning Realistic Expressions for Humanoid Face Robots},
  author={Fu, Yongji and others},
  note={Manuscript in preparation; target venue: IEEE International Conference on Robotics and Automation (ICRA) 2026},
  year={2025}
}

Learning to Search, Searching to Learn: A Closed-Loop Framework for Large-Scale Vehicle Routing Problems

Yongji Fu, Yong Wang, Jun Deng, et al.

NeurIPS 2026 (submission), Under review, 2025

Large-scale Vehicle Routing Problems (VRPs) face two long-standing difficulties. On the one hand, many scalable methods rely on partitioning, local candidate restriction, or staged decision making to control computation, which weakens their modeling of global structure. On the other hand, although many methods introduce search at test time to improve the final solution, search is still typically used only as a one-shot post-processing step after model prediction. The model makes a prediction, search repairs it, and little sustained feedback is formed between the two. Improved structural states are rarely fed back to the model for subsequent inference, and high-quality search solutions are seldom turned into later training supervision.

To address this issue, we propose LSL (Learning to Search, Searching to Learn), a closed-loop learning-search framework for large-scale VRPs. LSL first predicts search-friendly structural priors on a sparse candidate graph, and search then iteratively refines the current solution under the guidance of these priors. In turn, search does not leave the system after one round of refinement. At inference time, the structural states returned by search are fed back to the model for the next round of prediction, while at training time, multiple high-quality search solutions are reorganized into row-wise soft targets for model update. In this way, learning tells search where to explore, and search tells the model which structures are worth learning. Experiments show that LSL achieves strong scalability, efficiency, and solution quality across multiple large-scale VRP benchmarks.
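As a rough illustration of the "row-wise soft targets" idea, the sketch below aggregates several search solutions into one probability distribution per node over its successors. It is not the paper's implementation: the function name `row_wise_soft_targets`, the softmax-style weighting by tour cost, and the `temperature` parameter are all assumptions made for this example.

```python
import math
from collections import defaultdict


def row_wise_soft_targets(tours, costs, temperature=1.0):
    """Hypothetical helper: merge several tours into per-node soft targets.

    tours: list of tours, each a list of node ids forming a closed cycle.
    costs: one cost per tour; cheaper tours receive larger weights.
    Returns {node: {successor: probability}}, where each row sums to 1.
    """
    best = min(costs)
    # Assumed weighting: exponential in the cost gap to the best tour.
    weights = [math.exp(-(c - best) / temperature) for c in costs]

    rows = defaultdict(lambda: defaultdict(float))
    for tour, weight in zip(tours, weights):
        for i, node in enumerate(tour):
            successor = tour[(i + 1) % len(tour)]  # wrap around to close the cycle
            rows[node][successor] += weight

    targets = {}
    for node, successors in rows.items():
        total = sum(successors.values())
        targets[node] = {j: w / total for j, w in successors.items()}
    return targets


if __name__ == "__main__":
    tours = [[0, 1, 2, 3], [0, 2, 1, 3]]
    print(row_wise_soft_targets(tours, costs=[10.0, 12.0]))
```

Edges that appear in many good solutions end up with high probability in their row, while edges on which the solutions disagree share the mass, which is one plausible way for search to signal which structures are worth learning.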

@article{fu2025learning,
  title={Learning to Search, Searching to Learn: A Closed-Loop Framework for Large-Scale Vehicle Routing Problems},
  author={Fu, Yongji and Wang, Yong and Deng, Jun and others},
  journal={Submitted to NeurIPS 2026},
  year={2025}
}

TouchSteer: Grounding Natural Language in Tactile Perception via Steering Vectors

Guanqun Cao, Yongji Fu, Yi Zhou, Gaojie Jin, Zhenyu Lu, Shan Luo

IEEE Transactions on Robot Learning (TRL) (submission), Under review, 2025

Tactile sensing provides robots with direct information about physical properties through contact, yet most existing methods describe tactile data using predefined attribute labels with limited semantic flexibility. Aligning tactile signals with human language enables richer, concept-level representations.

In this work, we propose a transformer-based tactile–language framework that structures the shared embedding space as a manipulable concept space using steering vectors. These vectors encode tactile properties as semantic directions, providing explicit semantic control under limited supervision. The framework supports two complementary tasks out of the same latent space: given a free-form natural language query describing desired tactile properties, the robot retrieves the most relevant material from its tactile experience; and after contacting a surface physically, the robot generates a natural language description of its tactile properties.

Experimental results show that the framework effectively retrieves tactile representations from free-form natural language and generates meaningful tactile descriptions grounded in tactile perception, supporting more effective human–robot interaction.
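To make the notion of a steering vector as a semantic direction concrete, here is a minimal sketch on toy 2-D embeddings. It assumes a difference-of-means construction, which is one common way to build such a direction and not necessarily the one used in TouchSteer; every function name and all of the data are hypothetical.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def steering_vector(with_property, without_property):
    """Assumed construction: unit direction from the 'without' mean to the 'with' mean."""
    mu_pos, mu_neg = mean(with_property), mean(without_property)
    return normalize([p - n for p, n in zip(mu_pos, mu_neg)])


def steer(embedding, direction, strength):
    """Move an embedding along a semantic direction."""
    return [e + strength * d for e, d in zip(embedding, direction)]


def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, library):
    """Return the name of the library entry most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))


if __name__ == "__main__":
    rough = [[1.0, 0.9], [1.0, 1.1]]      # toy embeddings of rough surfaces
    smooth = [[1.0, -1.0], [1.0, -0.9]]   # toy embeddings of smooth surfaces
    roughness = steering_vector(rough, smooth)
    library = {"sandpaper": [1.0, 1.0], "glass": [1.0, -1.0]}
    print(retrieve(steer([1.0, 0.0], roughness, 2.0), library))
    print(retrieve(steer([1.0, 0.0], roughness, -2.0), library))
```

In this toy setup, pushing a neutral query along the roughness direction retrieves the rough material, and pushing it the opposite way retrieves the smooth one.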

@article{cao2025touchsteer,
  title={TouchSteer: Grounding Natural Language in Tactile Perception via Steering Vectors},
  author={Cao, Guanqun and Fu, Yongji and Zhou, Yi and Jin, Gaojie and Lu, Zhenyu and Luo, Shan},
  journal={Submitted to IEEE Transactions on Robot Learning},
  year={2025}
}

Visuo-Tactile Latent World Models

Yongji Fu, et al.

In preparation; target: ICRA 2026, 2025

Contact-rich manipulation needs a world model that predicts not only what the scene will look like after an action but also what it will feel like at the fingertips. This work trains a visuo-tactile latent world model that jointly encodes vision and tactile signals into a shared latent space and rolls out future states in that space. The model supports planning and policy learning for tasks where contact events — slip, stick, sudden force changes — carry information that pure vision cannot see.

We evaluate on manipulation tasks that are ambiguous under vision alone, and show that adding the tactile channel to the latent dynamics improves both predictive accuracy and downstream task success.

@unpublished{fu2025visuotactile,
  title={Visuo-Tactile Latent World Models},
  author={Fu, Yongji and others},
  note={Manuscript in preparation; target venue: IEEE International Conference on Robotics and Automation (ICRA) 2026},
  year={2025}
}

Constructing Dynamic S-boxes Based on Chaos and Irreducible Polynomials for Image Encryption

Chunlei Luo, Yong Wang, Yongji Fu, et al.

Nonlinear Dynamics (Springer, JCR Q1, IF 6.0), 2024

Substitution boxes (S-boxes) lie at the heart of modern symmetric ciphers, and the cryptographic strength of a cipher depends heavily on the nonlinearity, differential uniformity, and unpredictability of its S-box. Static S-boxes, however, are vulnerable to algebraic and side-channel attacks once their structure is fixed. This work proposes a dynamic S-box construction scheme that combines a two-dimensional chaotic map with irreducible polynomials over a finite field.

The chaotic map provides key-dependent sensitivity and a vast parameter space, so that each key induces a distinct S-box; irreducible polynomials over that field supply algebraic structure that bounds worst-case cryptographic metrics. We conduct ablation experiments to evaluate the generated S-boxes on the standard cryptographic criteria, namely nonlinearity, the strict avalanche criterion (SAC), the bit independence criterion (BIC), and differential and linear approximation probabilities, as well as on generation efficiency. The constructed S-boxes consistently meet or exceed the requirements for cryptographic use.

We further integrate the dynamic S-box into a block-cipher image encryption pipeline, and show on standard test images that it delivers strong pixel-level confusion and diffusion, near-uniform ciphertext histograms, high NPCR/UACI scores against plaintext-sensitivity attacks, and robust resistance to differential, statistical, and brute-force attacks.
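For readers unfamiliar with the NPCR and UACI scores mentioned above, the sketch below computes both from two cipher images given as flat lists of pixel values, using the standard definitions. The function name `npcr_uaci` and the toy data are illustrative and not taken from the paper's code.

```python
def npcr_uaci(cipher_a, cipher_b, max_value=255):
    """Return (NPCR, UACI) in percent for two equal-size cipher images.

    cipher_a, cipher_b: flat sequences of pixel intensities in [0, max_value].
    NPCR: percentage of pixel positions whose values differ.
    UACI: mean absolute intensity difference, normalized by max_value.
    """
    if len(cipher_a) != len(cipher_b) or len(cipher_a) == 0:
        raise ValueError("images must be non-empty and of equal size")

    n = len(cipher_a)
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    intensity = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))

    npcr = 100.0 * changed / n
    uaci = 100.0 * intensity / (max_value * n)
    return npcr, uaci


if __name__ == "__main__":
    # Toy 4-pixel "images"; real evaluations encrypt two plaintexts that
    # differ in a single pixel and compare the resulting ciphertexts.
    print(npcr_uaci([0, 10, 255, 100], [0, 20, 0, 100]))
```

For 8-bit images, the commonly cited expected values for two independent uniformly random ciphertexts are about 99.61% for NPCR and about 33.46% for UACI, so scores near those figures indicate strong plaintext sensitivity.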

@article{luo2024dynamic,
  title={Constructing Dynamic S-boxes Based on Chaos and Irreducible Polynomials for Image Encryption},
  author={Luo, Chunlei and Wang, Yong and Fu, Yongji and others},
  journal={Nonlinear Dynamics},
  year={2024},
  publisher={Springer}
}

Projects

See all projects

AURA: Autoresearch via Reflective Adaptation for Compound AI Systems

Inspired by Karpathy's *autoresearch* direction, AURA is a sample-efficient prompt optimizer for compound AI systems: after every rollout it hands the full trace back to the LLM and asks for one named edit to its own prompt. Across multi-hop QA, instruction following, and AIME-style math, AURA matches GRPO with up to 35× fewer rollouts and beats MIPROv2 by ~10 points on aggregate.

LLM
Prompt Optimization
Compound AI
Reflection
Autoresearch

Continually Learning Interactive Robot

An embodied agent that keeps expanding its behavior repertoire through ongoing human–robot interaction — new skills, new object concepts, and new language grounding are acquired online rather than baked in at training time.

HRI
Continual Learning
Multimodal
Agent

Real-time Packaging QA on a Hazardous-Explosive Production Line

Industrial vision QA system for a live hazardous-powder-explosive packaging line. ≥ 99% accuracy over a 30-day production run on the customer's RTX 4060; operator-level alarms via a fixed-protocol online-monitoring API.

Multi-scale Feat
Boundary Loss
Cython
TensorRT
Reparam

Blog

See all posts

Jun 3, 2026 PhD Research Vision
Jan 15, 2026 Muon Optimizer