
RoadTwin · ROS2 · CUDA · Connected vehicles

Where camera, lidar, and V2X have to agree.

A perception and replay tool for connected-vehicle stacks. The premise: if your sensors disagree at an intersection, you do not get to ask the planner to be smart. RoadTwin makes that disagreement legible, replayable, and regressible.

  • +small obj recall
  • 4 streams: cam / lidar / V2X / IMU
  • replay: planner regression set
  • night / desync modes

ROS2 · C++ · CUDA · PyTorch · TensorRT · V2X · Foxglove

01 · The scene

The intersection that taught me about disagreement.

A vehicle approaches a four-way intersection. Its onboard camera sees a pedestrian. The roadside lidar sees nothing. A V2X message from another vehicle reports an obstruction from one second ago. Three of the most expensive sensors humanity has built, and they cannot agree on what is in front of them.

The planner does not get a tie-breaker. RoadTwin gives one.

02 · The system

A perception consistency layer.

RoadTwin runs as a ROS2 graph alongside the perception stack. It ingests the four streams, time-aligns them on the V2X clock, and runs a consistency model that scores agreement at the object level. Disagreements above a threshold publish a discordance message on a topic the planner can subscribe to.

ROS2 · discordance topic

/perception/objects/cam     -> /roadtwin/aligned/cam
/perception/objects/lidar   -> /roadtwin/aligned/lidar
/v2x/objects                -> /roadtwin/aligned/v2x

# Consistency model output
/roadtwin/discordance       std_msgs/Float32MultiArray
/roadtwin/recommend         roadtwin_msgs/PerceptionFix
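
To make "scores agreement at the object level" concrete, here is a minimal sketch. It is not RoadTwin's consistency model, which is a learned transformer. It assumes every stream reports objects as 2D positions in a shared frame, and it scores a candidate object by the fraction of streams that have nothing within a gating radius. The names `discordance`, `flag_objects`, `GATE_M`, and `THRESHOLD`, and their values, are hypothetical.

```python
from math import hypot

GATE_M = 1.5       # assumed gating radius in metres (hypothetical value)
THRESHOLD = 0.25   # assumed flag threshold (hypothetical value)


def discordance(candidate, streams, gate=GATE_M):
    """Fraction of streams with no detection within `gate` of `candidate`.

    candidate: (x, y) position in a shared frame.
    streams: dict mapping stream name -> list of (x, y) detections.
    Returns 0.0 when every stream agrees, 1.0 when none sees the object.
    """
    if not streams:
        raise ValueError("at least one stream is required")
    misses = 0
    for detections in streams.values():
        seen = any(hypot(x - candidate[0], y - candidate[1]) <= gate
                   for x, y in detections)
        if not seen:
            misses += 1
    return misses / len(streams)


def flag_objects(streams, threshold=THRESHOLD, gate=GATE_M):
    """Score every detection from every stream; return those above threshold."""
    flagged = []
    for name, detections in streams.items():
        for det in detections:
            score = discordance(det, streams, gate)
            if score > threshold:
                flagged.append((name, det, score))
    return flagged


# The scene from section 01: the camera sees a pedestrian, the lidar sees
# nothing, and V2X reports something nearby.
scene = {
    "cam": [(10.0, 2.0)],
    "lidar": [],
    "v2x": [(10.5, 2.2)],
}
flags = flag_objects(scene)
```

In this toy scene one stream out of three misses the object, so both the camera and V2X detections score 1/3 and are flagged. A real scorer would also need to weigh sensor range, occlusion, and message age, none of which this sketch models.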

Replay mode reads from a recorded bag and lets you toggle:

  • packet loss on V2X (uniform, bursty, geographic)
  • sensor desync (per-stream temporal offset)
  • night / low-light visual mode (camera fall-off)
  • small-object suppression (pedestrian / cyclist)
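
Two of these toggles, packet loss and sensor desync, can be sketched as pure functions over a list of recorded messages. This is an illustration under stated assumptions, not the replay implementation: messages are modelled as `(timestamp, payload)` tuples, the bursty mode uses a simple two-state good/bad loss model, and every function name and parameter value is hypothetical. Geographic loss and the visual modes are not covered.

```python
import random


def drop_uniform(messages, loss_prob, rng):
    """Drop each message independently with probability `loss_prob`."""
    return [m for m in messages if rng.random() >= loss_prob]


def drop_bursty(messages, p_enter, p_exit, rng):
    """Two-state (good/bad) loss model: messages are dropped while in 'bad'.

    p_enter: chance of moving good -> bad before each message.
    p_exit:  chance of moving bad -> good before each message.
    """
    kept, bad = [], False
    for m in messages:
        if bad:
            if rng.random() < p_exit:
                bad = False
        elif rng.random() < p_enter:
            bad = True
        if not bad:
            kept.append(m)
    return kept


def apply_offset(messages, offset_s):
    """Shift every timestamp in one stream by a fixed offset (desync)."""
    return [(stamp + offset_s, payload) for stamp, payload in messages]


# Hypothetical V2X stream: (timestamp in seconds, payload) at 10 Hz.
v2x = [(i * 0.1, f"msg-{i}") for i in range(100)]
rng = random.Random(42)  # seeded so a replayed failure is reproducible

lossy = drop_uniform(v2x, loss_prob=0.2, rng=rng)
bursty = drop_bursty(v2x, p_enter=0.05, p_exit=0.3, rng=rng)
late = apply_offset(v2x, offset_s=0.05)
```

Seeding the generator matters for regression work: the same bag with the same seed drops the same packets, so a planner failure found once can be replayed exactly.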

03 · The hard part

Time was harder than the model.

The model itself was a transformer over object embeddings, nothing exotic. The hard part was that none of the four streams agreed on what time it was. The roadside camera ran on NTP. The lidar ran on its own quartz oscillator. The V2X messages arrived stamped with a vehicle's GPS time, off by tens of milliseconds. We aligned everything on the earliest reliable monotonic clock and re-stamped on ingest.
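
The re-stamp-then-match idea can be sketched in a few lines. This is a simplified illustration, not the RoadTwin ingest code: it assumes each stream's clock offset relative to the reference clock has already been estimated elsewhere and is constant, which ignores drift. The helpers `restamp` and `nearest`, the 30 ms offset, and the 20 ms tolerance are all hypothetical.

```python
from bisect import bisect_left


def restamp(stamps, offset_s):
    """Map a stream's native timestamps onto the reference clock.

    offset_s is the stream clock minus the reference clock, assumed to be
    estimated elsewhere (for example from a shared sync event).
    """
    return [t - offset_s for t in stamps]


def nearest(sorted_stamps, t, tolerance_s):
    """Index of the stamp closest to `t`, or None if outside the tolerance."""
    if not sorted_stamps:
        return None
    i = bisect_left(sorted_stamps, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_stamps)]
    best = min(candidates, key=lambda j: abs(sorted_stamps[j] - t))
    return best if abs(sorted_stamps[best] - t) <= tolerance_s else None


# Hypothetical example: the lidar clock runs 30 ms ahead of the reference.
cam = [0.00, 0.10, 0.20]
lidar_native = [0.03, 0.13, 0.23, 0.33]
lidar = restamp(lidar_native, offset_s=0.03)

# Pair each camera frame with the closest lidar sweep within 20 ms.
pairs = [(i, nearest(lidar, t, tolerance_s=0.02)) for i, t in enumerate(cam)]
```

Returning `None` outside the tolerance keeps a missing or late frame from being silently paired with a stale one.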

Lesson

Half of perception is consensus. The other half is timekeeping.

04 · The result

Recall, replay, and regression.

  • +small obj recall: pedestrian / cyclist class, night mode
  • Replayable failures: bag → toggle modes → re-emit
  • < 50 ms discordance latency: from frame ingest to flag
  • 4 streams, time-aligned: cam · lidar · V2X · IMU

05 · The artifact

Open the toggle, pick a failure mode.

  • /lab → Autonomy Garage — interactive intersection with packet loss / desync toggles.
  • Architecture diagram and ROS2 graph in the repo README.
  • Foxglove panel layout for live debugging shipped with the repo.

06 · The reflection

What I'd build next.

RoadTwin is currently single-vehicle plus roadside. The next iteration is a fleet of ego vehicles cross-validating each other's discordance flags as a swarm-perception signal. There is also a long-overdue port of the consistency model to TensorRT for embedded deployment.