TouchSteer: Grounding Natural Language in Tactile Perception via Steering Vectors
IEEE Transactions on Robot Learning (TRL) (submission), Under review, 2025
Abstract
Tactile sensing provides robots with direct information about physical properties through contact, yet most existing methods describe tactile data using predefined attribute labels with limited semantic flexibility. Aligning tactile signals with human language enables richer, concept-level representations.
In this work, we propose a transformer-based tactile–language framework that structures the shared embedding space as a manipulable concept space using steering vectors. These vectors encode tactile properties as semantic directions, providing explicit semantic control under limited supervision. The framework supports two complementary tasks from the same latent space. Given a free-form natural language query describing desired tactile properties, the robot retrieves the most relevant material from its tactile experience. After physically contacting a surface, the robot generates a natural language description of its tactile properties.
Experimental results show that the framework effectively retrieves tactile representations from free-form natural language and generates meaningful tactile descriptions grounded in tactile perception, supporting more effective human–robot interaction.
Motivation
Most tactile-learning pipelines describe contact data with a short list of fixed attribute labels (“smooth”, “rough”, “sticky”). This is convenient for classification but loses the richness of how people actually talk about touch. We want a shared space where free-form language and tactile signals live together, and where the semantic axes are manipulable rather than implicit.
Framework
The framework is a transformer-based tactile–language model built around steering vectors, which encode tactile properties as explicit semantic directions. A single backbone and shared latent space support two complementary tasks:
- Language → tactile retrieval. Given a natural-language description of a desired tactile property, retrieve the most relevant material from tactile experience.
- Tactile → language description. After physically contacting a surface, generate a natural-language description of what it felt like.
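
To make the idea of a "semantic direction" concrete, here is a minimal toy sketch in Python with NumPy. It is not the paper's implementation: the 3-D embeddings, the material names, the `smooth` and `sticky` directions, the scale `alpha`, and the helper functions `steer`, `retrieve`, and `describe` are all hypothetical, chosen only for illustration. In particular, `describe` reads attributes off by thresholding projections, which is a crude stand-in for the generated natural-language description, not the generation method itself.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def steer(embedding: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift an embedding along a concept direction, then renormalize."""
    return normalize(embedding + alpha * normalize(direction))

def retrieve(query: np.ndarray, bank: np.ndarray, names: list[str]) -> list[str]:
    """Rank materials by cosine similarity between the query and stored embeddings."""
    scores = normalize(bank) @ normalize(query)
    return [names[i] for i in np.argsort(-scores)]

def describe(tactile: np.ndarray, directions: dict[str, np.ndarray],
             threshold: float = 0.5) -> list[str]:
    """List attributes whose direction has a large projection on the embedding."""
    t = normalize(tactile)
    return [name for name, d in directions.items()
            if float(t @ normalize(d)) > threshold]

# Hypothetical tactile embeddings for three materials (assumed toy values).
names = ["silk", "sandpaper", "rubber"]
bank = np.array([
    [1.0, 0.0, 0.2],    # silk
    [-1.0, 0.1, 0.0],   # sandpaper
    [0.0, 1.0, 0.1],    # rubber
])

# Hypothetical steering vectors: one direction per tactile property.
directions = {
    "smooth": np.array([1.0, 0.0, 0.0]),
    "sticky": np.array([0.0, 1.0, 0.0]),
}

# Language -> tactile: a query embedding, before and after steering toward "smooth".
query = normalize(np.array([0.0, 0.3, 1.0]))
ranked_plain = retrieve(query, bank, names)
ranked_steered = retrieve(steer(query, directions["smooth"], alpha=2.0), bank, names)

# Tactile -> language: read attributes off a tactile embedding.
rubber_attributes = describe(bank[2], directions)

print(ranked_plain)
print(ranked_steered)
print(rubber_attributes)
```

In this toy setup, adding the `smooth` direction to the query moves silk to the top of the ranking, which is the sense in which the semantic axes are manipulable rather than implicit. A real system would obtain the embeddings and directions from the trained model rather than from hand-written vectors.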