Learning to Navigate Inside the Human Body
Endovascular procedures are among the most challenging manual tasks in medicine. A surgeon must guide a thin wire through a tortuous vascular network using only 2D X-ray feedback, with millimetre precision and sub-second reaction times. Could a machine learn to do this?
The challenge
Reinforcement learning has achieved strong results in games, locomotion, and manipulation, but endovascular navigation adds constraints that those domains rarely combine:
- Safety — a single wrong move can perforate a vessel wall
- Partial observability — 2D projections lose depth information
- Sparse rewards — the goal is reached only after navigating the entire tree
- Continuous control — guidewire motion is smooth, not discrete
Our approach
We framed the problem as a continuous-control reinforcement learning task with shaped rewards based on proximity to the target vessel. The agent observes:
- Current and past X-ray frames
- Guidewire tip position (segmented automatically)
- Target vessel centreline (extracted from preoperative CT)
The action space is 3-DOF: tip advancement, rotation, and articulation.
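To make the interface concrete, here is a minimal sketch of how the three observation components and the 3-DOF action could be represented. The class names, field names, the 2D point type, and the normalised [-1, 1] action range are all illustrative assumptions, not CathSim's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed representations, chosen only for illustration.
Frame = List[List[float]]      # one grayscale X-ray frame as rows of pixel intensities
Point2D = Tuple[float, float]  # a point in image coordinates


@dataclass
class Observation:
    frames: List[Frame]        # past frames first, current frame last
    tip_position: Point2D      # guidewire tip from automatic segmentation
    centreline: List[Point2D]  # target vessel centreline derived from preoperative CT


@dataclass
class Action:
    advance: float     # tip advancement
    rotate: float      # rotation
    articulate: float  # articulation

    def clipped(self, limit: float = 1.0) -> "Action":
        """Return a copy with every component clamped to [-limit, limit]."""
        def clamp(value: float) -> float:
            return max(-limit, min(limit, value))

        return Action(clamp(self.advance), clamp(self.rotate), clamp(self.articulate))
```

Clamping before the command reaches the simulator is one simple way to keep a continuous policy from issuing out-of-range motions, for example `Action(2.0, -3.0, 0.5).clipped()` keeps only the in-range articulation value unchanged.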
Architecture choices
We found that recurrent policies outperformed feed-forward baselines by a significant margin. The temporal context of past frames helps the agent infer depth from motion parallax — a cue human surgeons use implicitly.
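The difference between a recurrent and a feed-forward policy can be sketched in a few lines. The example below is an untrained, Elman-style recurrent cell in plain Python; the post does not say which recurrent architecture was used, so the cell type, layer sizes, and all names here are assumptions made purely for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random weights; a real policy would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentPolicy:
    """Hypothetical Elman-style policy: h_t = tanh(W_in x_t + W_rec h_{t-1})."""

    def __init__(self, obs_dim: int, hidden_dim: int = 16, action_dim: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = random_matrix(hidden_dim, obs_dim, rng)
        self.w_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = random_matrix(action_dim, hidden_dim, rng)

    def initial_state(self) -> Vector:
        return [0.0] * self.hidden_dim

    def step(self, features: Vector, hidden: Vector) -> Tuple[Vector, Vector]:
        """Map one observation feature vector to a 3-DOF action in [-1, 1]."""
        pre_activation = [
            a + b for a, b in zip(matvec(self.w_in, features), matvec(self.w_rec, hidden))
        ]
        new_hidden = [math.tanh(p) for p in pre_activation]
        action = [math.tanh(a) for a in matvec(self.w_out, new_hidden)]
        return action, new_hidden
```

The hidden state is what carries temporal context: calling `step` twice with the same features but the updated hidden state yields a different internal state, whereas a feed-forward policy would return the same action both times.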
Lessons learned
- Reward shaping matters more than architecture — a well-designed dense reward can compensate for a simpler policy
- Evaluation is harder than training — we spent more time building the evaluation framework than the agent itself
- Clinical relevance requires clinical input — every design decision was validated with interventional radiologists
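The reward-shaping point is easiest to see by comparing a sparse reward with a dense, proximity-based one. The sketch below is one possible formulation, assuming a Euclidean distance between the guidewire tip and a single target point; the function names, the tolerance, and the success bonus are hypothetical values, not the ones used in CathSim.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def sparse_reward(tip: Point, target: Point, tolerance: float = 1.0) -> float:
    """Signal only on success: 1.0 inside the tolerance radius, otherwise 0.0."""
    return 1.0 if math.dist(tip, target) <= tolerance else 0.0


def dense_reward(
    tip: Point, target: Point, tolerance: float = 1.0, success_bonus: float = 10.0
) -> float:
    """Proximity-shaped signal: negative distance to the target, plus a bonus on arrival."""
    distance = math.dist(tip, target)
    if distance <= tolerance:
        return success_bonus
    return -distance
```

With the sparse version, every step far from the target looks identical to the agent. With the dense version, a step that moves the tip closer is immediately rewarded more than one that moves it away, which gives the learner a gradient to follow long before it first reaches the goal.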
The full system is open-sourced as CathSim. We welcome contributions and feedback from the community.