Hi all.
We've got a very weird case. Our configuration:
- main_computer: running roscore
- slave_A_computer (ROS_MASTER_URI=main_computer)
- slave_B_computer (ROS_MASTER_URI=main_computer)
- slave_C_computer (ROS_MASTER_URI=main_computer)
- ROS: indigo
- Ubuntu 14.04 (x86)
All computers are connected to one GigE network and time-synced using chrony. On main_computer we run https://github.com/ros-drivers/nmea_navsat_driver, and on each slave computer we run a GigE camera driver and record the image and GPS topics (along with some other low-bandwidth topics such as diagnostics and tf) with rosbag (the C++ recorder, i.e. `rosrun rosbag record`). Recording runs simultaneously on all 3 slave computers, but each one records locally.
The mysterious part is that in roughly 1 out of 100 bags recorded on the slave computers, ONE of the GPS topics (out of the 4 that nmea_navsat_driver publishes) is missing. For instance, slave_A and slave_B have all the topics, but on slave_C `/gps/fix` is missing. This all happens on a field robot during operation, where it is impossible to pause and debug.
So my question is: how do I debug an issue like this? Clearly the topic is advertised and active, since 2 computers receive it. Also, since the 3rd computer receives 3 of the 4 topics from nmea_navsat_driver, the network link is up. Is it then the rosbag tool that fails to establish all of its socket connections? Can I somehow continuously log the list of active topics on every computer?
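To make the last question concrete, here is a minimal sketch of the kind of logging we have in mind, assuming the master's `getSystemState()` call (via `rosgraph.Master`) is a good enough view of who subscribes to what. The function names, the caller id `/subscription_watch`, the 5 second period, and the `/record` node-name prefix are our own assumptions for illustration; the prefix in particular would need to be checked against the actual recorder node name shown by `rosnode list`. Note that this only shows what is registered at the master, not whether the underlying TCP connection was actually established.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7 (indigo)
import time


def missing_subscriptions(system_state, node_prefix, expected_topics):
    """Return the expected topics that no node whose name starts with
    node_prefix is registered as subscribing to.

    system_state is the [publishers, subscribers, services] structure
    returned by the master, where each entry is [name, [node, ...]].
    """
    _publishers, subscribers, _services = system_state
    subscribed = set()
    for topic, nodes in subscribers:
        if any(node.startswith(node_prefix) for node in nodes):
            subscribed.add(topic)
    return sorted(t for t in expected_topics if t not in subscribed)


def watch(expected_topics, node_prefix="/record", period_s=5.0):
    """Poll the master forever and print a timestamped line per poll.

    node_prefix and period_s are assumed defaults, not verified values.
    """
    import rosgraph  # imported lazily so the helper above works without ROS
    master = rosgraph.Master("/subscription_watch")  # hypothetical caller id
    while True:
        state = master.getSystemState()
        missing = missing_subscriptions(state, node_prefix, expected_topics)
        print("%.3f missing=%s" % (time.time(), missing))
        time.sleep(period_s)
```

Running `watch(["/gps/fix"])` on each slave next to the recorder, with the output redirected to a file, would at least tell us whether the recorder was ever registered as a subscriber for the topic that later turns out to be missing from the bag.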
Thanks in advance.