Applying Deep Learning for Autodocking Calibration on the Ohmni Robot
Deep learning is a powerful tool when applied to robotics. This article discusses how OhmniLabs uses deep learning for autodocking calibration in its telepresence robot.
The Ohmni telepresence robot can find its docking station and drive to it for recharging. An important step in autodocking is calibration between the camera image and the robot's coordinates in the real world. To do this, one first has to find a set of keypoints on the robot base in the camera image. From these points, the transformation from camera-image coordinates to robot coordinates can then be computed. For the Ohmni telepresence robot, there are four keypoints, as depicted in the figure below.
Previously, the keypoints were chosen manually. This was not only time-consuming but could also cause poor calibration if the neck was tilted or the lens type was changed. To address this issue, rather than saving the keypoint coordinates ahead of time, Ohmni uses an image-based algorithm that detects the exact keypoints in the camera image so that the robot can dock accordingly.
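The article does not say which transformation model is used. Assuming the four keypoints lie on the floor plane and their positions in the robot frame are known (for example from the base geometry), a planar homography is one standard way to map image pixels to robot ground coordinates. The sketch below estimates it with the direct linear transform (DLT); `fit_homography` and `image_to_robot` are hypothetical names, not OhmniLabs code.

```python
import numpy as np


def fit_homography(img_pts, robot_pts):
    """Estimate the 3x3 homography H mapping image pixels (u, v) to robot
    ground-plane coordinates (x, y) from four or more correspondences (DLT).
    Assumes all keypoints lie on one plane (here: the floor)."""
    rows = []
    for (u, v), (x, y) in zip(img_pts, robot_pts):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-u, -v, -1, 0, 0, 0, u * x, v * x, x])
        rows.append([0, 0, 0, -u, -v, -1, u * y, v * y, y])
    A = np.asarray(rows, dtype=float)
    # H (flattened) is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def image_to_robot(H, u, v):
    """Project an image pixel into robot ground-plane coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

With more than four correspondences the same code returns a least-squares (algebraic) fit; normalising pixel coordinates before the DLT (Hartley normalisation) improves numerical conditioning.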
Deep Learning-based Keypoint Detection Model
Most image-based methods extract low-level visual features from keypoints or regions. Such low-level representations usually lack semantic interpretation, which means they cannot capture the high-level appearance of an object category. To improve robustness, Ohmni can integrate external constraints such as CAD models or robot kinematics, but the image-driven approach is still central to building robust and generalisable systems.
Deep learning has emerged as the method of choice for AI tasks such as computer vision, speech recognition and natural language processing. Deep learning allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. Instead of relying on features designed by human engineers, deep learning models learn features automatically from the raw data.
To take advantage of this approach, OhmniLabs extended the OpenPose model, an efficient method originally developed for multi-person pose estimation that uses Part Affinity Fields. The model is composed of a set of deep neural networks that jointly learn image features and localise the keypoints in the image. The architecture of the model is depicted in Figure 2. To take advantage of GPU/TPU parallel computing and of TensorFlow's support for serving trained models, OhmniLabs implemented the model in TensorFlow.
The model first extracts features from an input image using a pre-trained convolutional neural network. The image features are then fed into two parallel branches of further convolutional layers. The first branch predicts a set of confidence maps, one per keypoint; each map is a matrix that stores, for every pixel, the network's confidence that the pixel contains that keypoint. Figure 3 shows an example of the keypoint confidence maps.
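To make the confidence-map idea concrete, here is a NumPy sketch, assuming a Gaussian training target centred on each labelled keypoint (similar in spirit to the targets used by OpenPose) and a simple 4-neighbour non-maximum suppression at inference time. The function names and the `sigma` and `threshold` values are illustrative assumptions, not OhmniLabs' settings.

```python
import numpy as np


def gaussian_confidence_map(height, width, center, sigma=3.0):
    """Training target for one keypoint: confidence 1.0 at the keypoint,
    decaying as a Gaussian with distance from it."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def find_peaks(conf_map, threshold=0.1):
    """Return (x, y, score) for pixels above threshold that are not smaller
    than any of their 4 neighbours (simple non-maximum suppression)."""
    padded = np.pad(conf_map, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > threshold)
        & (center >= padded[:-2, 1:-1])  # neighbour above
        & (center >= padded[2:, 1:-1])   # neighbour below
        & (center >= padded[1:-1, :-2])  # neighbour left
        & (center >= padded[1:-1, 2:])   # neighbour right
    )
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(conf_map[y, x])) for x, y in zip(xs, ys)]
```

Each detected peak becomes a candidate location for that keypoint; several candidates per keypoint are possible, which is why the association step in the next paragraph is needed.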
The second branch predicts a set of Part Affinity Fields (PAFs), which represent the degree of association between keypoints. PAFs are matrices that encode the position and orientation of the connections between pairs of keypoints. They come in couples: for each pair of connected keypoints there is one PAF for the horizontal (x) component and one for the vertical (y) component of the connection's direction, as illustrated in Figure 4. Successive stages refine the predictions made by each branch.
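The association score between two candidate keypoints can be sketched as the average, along the segment joining them, of the dot product between the PAF vector and the segment's unit direction; this is a discretised version of the line integral described in the OpenPose paper. `paf_score` and its uniform sampling scheme are illustrative assumptions, not OhmniLabs' implementation.

```python
import numpy as np


def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Approximate the line integral of a PAF (x and y component maps)
    along the segment from candidate p1 to candidate p2, both (x, y)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    vec = p2 - p1
    norm = np.linalg.norm(vec)
    if norm < 1e-8:
        return 0.0  # coincident candidates carry no direction
    unit = vec / norm
    # Sample points uniformly along the segment, rounded to pixel indices.
    ts = np.linspace(0.0, 1.0, num_samples)
    pts = p1 + ts[:, None] * vec
    xs = np.clip(np.round(pts[:, 0]).astype(int), 0, paf_x.shape[1] - 1)
    ys = np.clip(np.round(pts[:, 1]).astype(int), 0, paf_x.shape[0] - 1)
    # Alignment of the predicted field with the candidate connection direction.
    dots = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    return float(dots.mean())
```

A score near 1 means the field points along the candidate connection; a score near 0 or below means the two candidates are unlikely to belong together.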
Using the part confidence maps, candidate keypoints are extracted and bipartite graphs are formed between the candidates of each pair of parts. Using the PAF values, weaker links in the bipartite graphs are pruned. Through these steps, the keypoints and the skeleton of the robot base can be estimated. Figure 5 illustrates an example of the estimated keypoints and the connections between them.
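The OpenPose paper formulates this pruning as a maximum-weight bipartite matching. The sketch below uses a simple greedy approximation: candidate links are sorted by PAF score, and a link is kept only if its score exceeds a threshold and neither endpoint has already been matched. The function name and the `min_score` value are assumptions for illustration.

```python
def greedy_bipartite_match(scores, min_score=0.05):
    """scores[i][j] is the PAF score between candidate i of part A and
    candidate j of part B. Returns the kept links (i, j, score), strongest first."""
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,
    )
    used_a, used_b, links = set(), set(), []
    for s, i, j in candidates:
        if s < min_score:
            break  # every remaining link is weaker: prune it
        if i in used_a or j in used_b:
            continue  # each candidate may join at most one link
        used_a.add(i)
        used_b.add(j)
        links.append((i, j, s))
    return links
```

For example, with scores `[[0.9, 0.2], [0.8, 0.7]]` the links (0, 0) and (1, 1) are kept and the competing (1, 0) link is pruned, because candidate 0 of part B is already taken by a stronger link.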
Training a deep learning model normally requires a large amount of data. OhmniLabs collected thousands of images under various conditions to make the trained model robust. Specifically, OhmniLabs used many types of robots to capture images under different conditions, including different floors, overall illumination, camera types and camera tilt angles.
OhmniLabs also employed data augmentation techniques to increase the size of the dataset and help the model generalise better. After a series of experiments, the final model achieved a mean average precision (mAP) of approximately 98% on the test data. The model can also be improved continuously over time, since more training data generally leads to better performance.
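The article does not list the augmentations used. As an illustration only, the sketch below applies two common ones, a random translation and a random brightness change, and moves the keypoint labels together with the image. The function name, shift range and brightness range are assumptions.

```python
import numpy as np


def augment(image, keypoints, rng, max_shift=10, brightness=(0.7, 1.3)):
    """Randomly translate an HxWxC uint8 image and scale its brightness.
    Keypoints (x, y) are shifted by the same offset; brightness leaves them unchanged."""
    h, w = image.shape[:2]
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(image)
    # Source region that stays inside the frame after the shift.
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    shifted[src_y0 + dy:src_y1 + dy, src_x0 + dx:src_x1 + dx] = (
        image[src_y0:src_y1, src_x0:src_x1]
    )
    # Photometric change: scale intensities and clip to the valid uint8 range.
    scale = rng.uniform(*brightness)
    out = np.clip(shifted.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    kps = np.asarray(keypoints, dtype=float) + np.array([dx, dy], dtype=float)
    return out, kps
```

Geometric augmentations must always transform the labels as well; a horizontal flip, for instance, would also require swapping the labels of mirrored keypoint pairs, and keypoints pushed outside the frame need to be marked as not visible.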
Model Training and Serving Architecture
Creating deep learning models is only one part of the problem; the next challenge is serving them in production. The model serving system must be able to handle a large volume of traffic, so it is important for OhmniLabs to ensure that the software and hardware infrastructure serving these models is scalable, reliable and fault-tolerant.
OhmniLabs decided to use TensorFlow Serving, a system written in C++ for serving machine learning models. TensorFlow Serving treats each model as a servable object. It periodically scans the local file system and loads or unloads models based on the state of the file system and the model versioning policy. Trained models can therefore be hot-deployed simply by copying the exported models to the specified file path while TensorFlow Serving keeps running. TensorFlow Serving comes with a reference front-end implementation based on gRPC, a high-performance, open-source RPC framework from Google.
OhmniLabs train the TensorFlow models on cloud GPU instances. Once trained and validated, the models are exported and published to Ohmni's model repository. To serve them, OhmniLabs developed a model serving network (MSN) built on TensorFlow Serving. The MSN manages the job queues, the pre-processing of images and the post-processing of TensorFlow Serving predictions. It also load-balances requests from Ohmni telepresence robots and manages the updating of models from the repository. OhmniLabs have generalised this model training and serving architecture to serve other models and data as part of OhmniLabs' deep learning AI framework.
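TensorFlow Serving expects each model's base path to contain numbered version subdirectories (for example `/models/keypoints/1/` and `/models/keypoints/2/`, where the path is hypothetical) and, by default, serves the latest version. The sketch below is a simplified Python mimic of that selection step, not TensorFlow Serving's actual code; it also shows why versions must be compared numerically rather than as strings.

```python
from pathlib import Path


def pick_latest_version(names):
    """Return the highest numeric version among directory names, or None.
    Non-numeric names (e.g. a temporary export folder) are ignored."""
    versions = [int(n) for n in names if n.isascii() and n.isdigit()]
    return max(versions) if versions else None


def latest_version_dir(model_base_path):
    """Scan a model base path once and return the directory that a
    'serve the latest version' policy would load, or None."""
    base = Path(model_base_path)
    if not base.is_dir():
        return None
    latest = pick_latest_version(p.name for p in base.iterdir() if p.is_dir())
    return None if latest is None else base / str(latest)
```

Calling `latest_version_dir` periodically and loading its result whenever it changes is the essence of hot deployment: exporting a new model into a higher-numbered folder is enough for the next scan to switch to it. Note that a string comparison would rank version `"2"` above `"10"`.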
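The MSN's code is not published, so the following is a hypothetical sketch of the responsibilities listed above: a job queue, image pre-processing, round-robin load balancing across TensorFlow Serving backends, and post-processing of predicted confidence maps into keypoint coordinates. The backends are plain callables standing in for TensorFlow Serving clients; the class name, method names and array layouts are all assumptions.

```python
import itertools
import queue

import numpy as np


class ModelServingNetwork:
    """Hypothetical MSN sketch: queue jobs, pre-process images, dispatch them
    round-robin to model backends, and post-process the predictions."""

    def __init__(self, backends):
        self._backends = itertools.cycle(backends)  # round-robin order
        self._jobs = queue.Queue()

    def submit(self, robot_id, image):
        self._jobs.put((robot_id, image))

    @staticmethod
    def preprocess(image):
        # uint8 HxWxC image -> float32 batch of one, scaled to [0, 1].
        return image.astype(np.float32)[None] / 255.0

    @staticmethod
    def postprocess(heatmaps):
        # HxWxK confidence maps -> list of K (x, y) locations of each maximum.
        h, w, k = heatmaps.shape
        flat = heatmaps.reshape(h * w, k).argmax(axis=0)
        return [(int(i % w), int(i // w)) for i in flat]

    def process_pending(self):
        results = {}
        while not self._jobs.empty():
            robot_id, image = self._jobs.get()
            backend = next(self._backends)  # spread load across backends
            heatmaps = backend(self.preprocess(image))
            results[robot_id] = self.postprocess(heatmaps)
        return results
```

In a production system the backends would be gRPC clients of separate TensorFlow Serving instances and the queue would be drained by worker threads, but the division of responsibilities stays the same.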
Conclusion on Deep Learning for Autodocking Calibration
The results demonstrate the power of deep learning for autodocking calibration: OhmniLabs' model tackles the camera calibration problem efficiently. OhmniLabs also built an infrastructure for model training and serving that allows the deep learning models to be improved continuously. This architecture is deployed as part of OhmniLabs' deep learning AI framework.
To learn more about Ohmni Developer Edition, please visit: https://www.robots4good.com.au/robot-developer-platform
About Robots4Good
A leading provider of robots as a service, Robots4Good is the exclusive supplier of OhmniLabs robots and services in Australia and New Zealand for business, manufacturing, schools, hospitals, disability and aged care settings.