
Hercules: An Autonomous Logistic Vehicle for Contact-less Goods Transportation During the COVID-19 Outbreak

Tianyu Liu∗, Qinghai Liao∗, Lu Gan, Fulong Ma, Jie Cheng, Xupeng Xie, Zhe Wang, Yingbing Chen, Yilong Zhu, Shuyang Zhang, Zhengyong Chen, Yang Liu, Yang Yu, Zitong Guo, Guang Li, Peidong Yuan, Dong Han, Yuying Chen, Haoyang Ye, Jianhao Jiao, Peng Yun, Zhenhua Xu, Hengli Wang, Huaiyang Huang, Sukai Wang, Peide Cai, Yuxiang Sun, Yandong Liu, Lujia Wang, and Ming Liu

Fig. 1. A worker dressed in a protective suit is collecting goods from our Hercules logistic autonomous vehicle. There is no person-to-person contact during the process of goods transportation. This photo was taken in Shenzhen, Guangdong, China, in February 2020 during the outbreak of COVID-19.

Since December 2019, the coronavirus disease 2019 (COVID-19) has spread rapidly across China. As of the writing of this article, the disease has been reported globally in 100 countries, infected over 100,000 people and caused over 3,000 deaths. Avoiding person-to-person transmission is an effective approach to control and prevent the epidemic. However, many daily activities, such as the logistics of transporting goods in our daily life, inevitably involve person-to-person contact. To achieve contact-less goods transportation, using an autonomous logistic vehicle has become the preferred choice. This article presents Hercules, an autonomous logistic vehicle used for contact-less goods transportation during the outbreak of COVID-19. The vehicle is designed with autonomous navigation capability. We provide details on the hardware and software, as well as the algorithms to achieve autonomous navigation, including perception, planning and control. This paper is accompanied by a demonstration video and a dataset, which are available here1.

I. HARDWARE SYSTEM

The hardware system of our vehicle mainly consists of a Drive-by-Wire (DBW) chassis, an Industrial Personal Computer (IPC), sensors, and accessories. Fig. 2(a) shows the

∗ Equal Contribution
1 https://sites.google.com/view/contact-less-transportation

TABLE I
THE SPECIFICATIONS OF OUR DEVELOPED HERCULES LOGISTIC AUTONOMOUS VEHICLE.

Items                        | Values
Dimensions                   | 3200 × 1300 × 1650 (mm)
Total Car Mass               | 600 kg
Maximum Payload              | 1000 kg
Maximum Capacity             | 3 m^3
Maximum Speed                | 35 km/h
Maximum Distance per Charge  | 80 km
Minimum Turning Radius       | 4 m
Battery Type                 | LiFePO4
Battery Capacity             | 8.4 kWh @ 72 V
Charging Power               | 3 kW @ 220 V AC
Charging Time                | 5 h
Working Temperature          | -10 °C to 50 °C

sensors on the 3-D model of our vehicle. Fig. 2(b) shows the top view of the modules in the chassis.

The main modules of the chassis are those related to motion control: the Motor Control Unit (MCU), Electric Power Steering (EPS), Electro-Hydraulic Brake (EHB) and Electronic Parking Brake (EPB). The MCU supports both speed control and torque control, with the input commands coming from the Vehicle Control Unit (VCU). The EPS controls the vehicle steering. The combination of the MCU and EPS controls the vehicle longitudinal and lateral motions. The EHB controls the brake, and the EPB controls the parking brake. The Body Control Module (BCM) controls various electronic accessories in the vehicle body, such as the lights and horn. The VCU is the core of the DBW chassis and is responsible for coordinating each module. It keeps communicating with the IPC, performing parameter checking and sending commands to the other modules. In addition, the VCU responds to critical safety signals, such as the stopping signal from the emergency button. There are several sensors in the chassis, such as a bumper, encoders and a Tire Pressure Monitoring System (TPMS). The bumper and TPMS are both safety-critical sensors. Specifically, the bumper is used to avoid collisions. The encoder is a low-cost and effective sensor for odometry, which measures the wheel rotation precisely.

There are two batteries in our vehicle. A 12 V Lead-acid



Fig. 2. The sensors used in our vehicle and the modules in the chassis. (a) The sensors on the 3-D model (with the cargo box removed): main GNSS antenna, main LiDAR, 4G & GNSS backup antenna, fish-eye cameras (×4), ultrasonic sensors (×16), LiDARs (×3), stereo camera, LED display, IMU, and encoders (×2). (b) The sensors and control units (as annotated) on the mobile base: bumper, removable lithium-ion battery, rear axle, VCU, OBC, MCU, EPB, motor, EPS, EHB, DCDC converter, and encoder. Note that the cargo box is replaceable; it is not included in the figure.

starter battery and a 72 V removable lithium-ion battery with around 9 kWh of electricity, which supports the maximum running distance of 80 km. The lithium-ion battery powers the chassis, IPC, sensors and accessories. It has a built-in Battery Management System (BMS) to monitor and manage the battery. The removable design allows the vehicle to operate 24 hours a day without stopping for a recharge. An On-Board Charger (OBC) with a Direct Current-Direct Current (DCDC) converter takes about 5 hours to fully recharge the battery. The detailed specifications of our vehicle are displayed in Tab. I.

The IPC contains an Intel i7-8700 CPU with 6 cores and 12 threads, 32 GB of memory, and an NVIDIA 1050 Ti graphics card. It is able to run deep learning-based algorithms. The accessories mainly consist of a 4G/5G Data Transfer Unit (DTU), a Human Machine Interface (HMI), an LED display, a remote controller and a cargo box. The DTU allows the IPC to connect to our cloud management platform via the Internet. The cargo box is replaceable.

II. SOFTWARE SYSTEM

Fig. 3 shows the software architecture of our autonomous vehicle. It can be generally divided into two parts: the software system running on the vehicle, and the software system running on the cloud.

A. Software Platforms on the Vehicle

There are three main computing platforms on the vehicle: the IPC, the Electronic Control Unit (ECU) and the BCM. The IPC is used to run the algorithms for autonomous navigation. The ECU is used to ensure the safety of the vehicle through energy management and safety diagnosis. It has three main functions: unified diagnostic services, driving torque coordination control and distribution, and brake energy recovery. The unified diagnostic services are a series of applications checking and monitoring the mechanical and control systems. The driving torque coordination control is an application that dynamically distributes motor torque according to the vehicle and road status. The brake energy recovery is an application aimed at saving power and improving the travel distance. These applications run on a Real-Time Operating System (RTOS), which satisfies the real-time requirements.

The BCM connects the IPC and the ECU. It also runs on an RTOS, which meets the real-time requirements. It monitors the communication between the nodes of the CAN network through heartbeat protocols. When major nodes of this network experience an outage or crash, the BCM stops transmitting high-level CAN signals from the IPC to the VCU and waits for human intervention.
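The heartbeat-based monitoring described above can be illustrated with a minimal sketch. The node names, timeout value, and the blocking behaviour below are assumptions for illustration; the actual CAN protocol of the BCM is not described in this article.

import time

# Hypothetical heartbeat watchdog: if a major CAN node stops sending heartbeats,
# high-level commands from the IPC are no longer forwarded to the VCU.
HEARTBEAT_TIMEOUT_S = 0.5          # assumed timeout, not from the article
MAJOR_NODES = ["IPC", "VCU", "MCU", "EPS", "EHB"]

last_seen = {node: time.monotonic() for node in MAJOR_NODES}
forwarding_enabled = True

def on_heartbeat(node_id: str) -> None:
    """Called whenever a heartbeat frame from `node_id` is received on the CAN bus."""
    last_seen[node_id] = time.monotonic()

def check_heartbeats() -> None:
    """Periodically verify that every major node is alive; otherwise stop forwarding."""
    global forwarding_enabled
    now = time.monotonic()
    for node, stamp in last_seen.items():
        if now - stamp > HEARTBEAT_TIMEOUT_S:
            forwarding_enabled = False   # wait for human intervention
            print(f"[BCM] node {node} lost, blocking IPC->VCU commands")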

The sensors on the vehicle are synchronized by a 1 Hz Pulse Per Second (PPS) signal from the external GNSS receiver. The IPC receives data from the sensors and uses them at different frequencies. For example, the state estimation module updates at 100 Hz, providing localization results for the system. The Lidar object detection module runs at 10 Hz according to the refresh rate of the Lidar. All the modules are developed on the Robot Operating System (ROS) to facilitate data communication.
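As a rough illustration of how modules run at different rates on the IPC, the sketch below uses ROS 1 timers. The topic names and the empty callback bodies are placeholders, not the actual implementation.

#!/usr/bin/env python
import rospy
from sensor_msgs.msg import PointCloud2, Imu

# Placeholder callbacks: the real modules (state estimation, Lidar detection)
# are described in Section III.
def imu_callback(msg):
    pass  # feed the 100 Hz state estimator

def lidar_callback(msg):
    pass  # feed the 10 Hz object detector

def state_estimation_step(event):
    pass  # publish localization results at 100 Hz

def detection_step(event):
    pass  # run 3-D object detection at 10 Hz (Lidar refresh rate)

if __name__ == "__main__":
    rospy.init_node("hercules_perception_sketch")
    rospy.Subscriber("/imu/data", Imu, imu_callback, queue_size=200)
    rospy.Subscriber("/lidar/points", PointCloud2, lidar_callback, queue_size=5)
    rospy.Timer(rospy.Duration(0.01), state_estimation_step)  # 100 Hz
    rospy.Timer(rospy.Duration(0.1), detection_step)          # 10 Hz
    rospy.spin()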

B. Software Platforms on the Cloud

The software platforms on the cloud mainly include the map server, the scheduling server, and the log and simulation server. The map server stores the pre-built maps of the traversed environments. The scheduling server performs task allocation and collects the status of every registered running vehicle. It also plays the role of accessing the map data for routing, transmitting sensor data to the map server, recording the key information in the log server, and replaying the recorded data for convenient trace-back. The log and simulation server runs the end-to-end simulators Carla2 and 51Sim-One3. The clock synchronization between the platforms on the vehicle and the cloud is based on the network time through the Network Time Protocol (NTP).

III. PERCEPTION

Perception serves as the fundamental component of autonomous navigation. It provides the necessary information for planning and control. This section describes some of the key perception technologies used in our vehicle.

2 Available at: http://carla.org/
3 Available at: http://www.51hitech.com/


Fig. 3. The software architecture of our vehicle. The part shown in the yellow box is running on the vehicle, and the part shown in the blue box is running on the cloud. The figure is best viewed in color.

Fig. 4. The architecture of the 3-D object detection using Lidar point clouds. The inputs are multiple point clouds captured by synchronized and well-calibrated Lidars. The 3-D object detector contains mainly three components: a feature learning network (FLN) for feature extraction; middle layers (ML) for feature embedding with sparse convolution; and a region proposal network (RPN) for box prediction and regression. The final outputs are a series of 3-D bounding boxes: the blue cubes denote the detection results, and the green cubes denote the ground-truth boxes.

A. 3-D Object Detection with Lidar Point Clouds

The 3-D object detection aims to recognize and classify objects, as well as estimate their poses with respect to a specific coordinate system. We use multiple Lidars for object detection. To exploit the information from multiple Lidars, it is important to calibrate them. To this end, we propose a marker-based approach [1] for automatic calibration without any additional sensors or human intervention. We assume that three linearly independent planar surfaces forming a wall-corner shape are provided as the calibration targets, ensuring that the geometric constraints are sufficient to calibrate each pair of Lidars. After matching the corresponding planar surfaces, our method can recover the unknown extrinsic parameters in two steps: a closed-form solution for initialization based on the Kabsch algorithm [2], and a plane-to-plane iterative closest point (ICP) [3] for refinement.
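The closed-form initialization step can be sketched with the standard Kabsch solution: given matched directions (e.g., the plane normals of the calibration surfaces) observed by two Lidars, the rotation comes from an SVD. This is a generic illustration of [2], not the exact routine of [1], and the input arrays are assumed to be pre-matched.

import numpy as np

def kabsch_rotation(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Closed-form rotation aligning row vectors in `src` to `dst` (Kabsch [2]).

    Here the rows could be the matched plane normals of the three linearly
    independent calibration surfaces observed by a pair of Lidars.
    """
    H = src.T @ dst                          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T                    # rotation R such that R @ src_i ≈ dst_i

# Example with three linearly independent (wall-corner-like) normals:
normals_lidar_a = np.eye(3)
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
normals_lidar_b = (R_true @ normals_lidar_a.T).T
R_est = kabsch_rotation(normals_lidar_a, normals_lidar_b)   # recovers R_true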

The overview of our 3-D object detection is shown in Fig. 4. The inputs to our approach are multiple point clouds captured by different Lidars. We adopt an early-fusion scheme to fuse the data from multiple calibrated Lidars at the input stage. With the assumption that the Lidars are synchronized, we transform the raw point clouds captured by all the Lidars into the base frame, and then feed the fused point cloud into the 3-D object detector. The final output is a series of 3-D bounding boxes. Each bounding box b ∈ B is parameterized as [c, x, y, z, w, l, h, γ], where c is the class of the bounding box, [x, y, z] denotes the bottom center of the box, [w, l, h] denotes its size, and γ denotes its heading angle. We employ VoxelNet [4] as the 3-D object detector, which consists of three main components: a Feature Learning Network (FLN) for feature extraction; Middle Layers (ML) for feature embedding with sparse convolution; and a Region Proposal Network (RPN) for box prediction and regression.
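A minimal sketch of the early-fusion step, assuming each Lidar's extrinsic (a 4×4 Lidar-to-base transform) has already been obtained by the calibration above; the point clouds are plain N×3 arrays rather than the actual driver messages.

import numpy as np

def fuse_point_clouds(clouds, extrinsics):
    """Early fusion: transform every Lidar's points into the base frame and stack them.

    clouds     -- list of (N_i, 3) arrays, one per Lidar
    extrinsics -- list of (4, 4) Lidar-to-base homogeneous transforms
    """
    fused = []
    for pts, T in zip(clouds, extrinsics):
        homo = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (N_i, 4)
        fused.append((homo @ T.T)[:, :3])                    # apply T to each point
    return np.vstack(fused)  # single cloud fed to the 3-D object detector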


Fig. 5. The schematic diagram of our 3-D point-cloud mapping system. The inputs are the 3-D Lidar point cloud and the IMU accelerometer and gyroscope readings; after the initialization, the Lidar-inertial odometry sub-module (point-cloud undistortion, feature extraction and local-map management, propagation and pre-integration, joint non-linear optimization) produces the undistorted point cloud and Lidar pose, and the rotationally constrained mapping sub-module refines the global map and Lidar poses.

1) Feature Learning Network: The FLN encodes the sparse points into a dense feature representation. In [4], the authors first voxelize the input point clouds and then apply a stack of Voxel Feature Encoding layers (VFELayers). A VFELayer is composed of multi-layer perceptrons and a max-pooling layer. In the max-pooling layer, the point-wise features inside a voxel are aggregated into voxel-wise features, which are then forwarded to the ML for feature embedding.
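A simplified VFELayer in the spirit of [4] might look as follows; the feature sizes are arbitrary, and the concatenation of point-wise and voxel-wise features used in the full VoxelNet layer is omitted for brevity.

import torch
import torch.nn as nn

class SimpleVFELayer(nn.Module):
    """Simplified Voxel Feature Encoding layer: point-wise MLP + max-pooling per voxel."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, out_channels),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, voxel_points: torch.Tensor) -> torch.Tensor:
        # voxel_points: (num_voxels, max_points_per_voxel, in_channels)
        V, P, C = voxel_points.shape
        point_feats = self.mlp(voxel_points.view(V * P, C)).view(V, P, -1)
        voxel_feats, _ = point_feats.max(dim=1)   # aggregate points into voxel-wise features
        return voxel_feats                        # (num_voxels, out_channels)

# Example: 10 voxels, up to 32 points each, point features [x, y, z, intensity]
features = SimpleVFELayer(4, 64)(torch.randn(10, 32, 4))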

2) Middle Layers: The ML converts the 3-D dense grid into a 2-D Bird's-Eye-View (BEV) representation. To process the grids, the ML is implemented with several 3-D convolution layers, where the 3-D convolution, BN and ReLU operations are applied sequentially. The convolution kernel size is set to [d, 1, 1], since we denote the dense grids in the order of z, x, y. As a result, these layers aggregate voxel-wise features within a progressively expanding receptive field along the z-axis, as well as enrich the shape description.

In [4], the ML costs almost 50% of the computation of the entire network due to the dense 3-D convolution operations. The non-empty voxels usually occupy ≤ 1% of all grids; however, in dense convolution, calculations are required on every voxel. Inspired by [5], we utilize spatially sparse convolution [6] to replace the dense convolution layers in the basic ML. Hence, the inference time is reduced from approximately 250 ms to 56 ms, even with a larger sampling resolution. Additionally, we add DropBlock [7] to randomly deactivate patches of the feature map during training. This forces the network to search for more evidence to detect an object with several invisible parts.
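The toy, dictionary-based sketch below only illustrates why skipping empty voxels saves computation; it is not the submanifold sparse convolution library of [6], and the kernel indexing convention is simplified.

import numpy as np

def naive_sparse_conv3d(voxels, kernel, kernel_offsets):
    """Toy sparse 3-D convolution: work is proportional to the occupied voxels only.

    voxels         -- dict mapping (z, y, x) -> input feature vector (C_in,)
    kernel         -- dict mapping offset (dz, dy, dx) -> (C_out, C_in) weight matrix
    kernel_offsets -- iterable of (dz, dy, dx) offsets covered by the kernel
    """
    out = {}
    for (z, y, x), feat in voxels.items():          # iterate non-empty voxels only
        for dz, dy, dx in kernel_offsets:
            key = (z + dz, y + dy, x + dx)          # scatter contribution to output site
            out[key] = out.get(key, 0) + kernel[(dz, dy, dx)] @ feat
    return out  # still sparse: only sites reachable from occupied voxels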

3) Region Proposal Network: The RPN applies convolutional layers to predict the probability and regression maps from the extracted features. Since the increasing invariance and large receptive fields of top-level nodes yield smooth responses and cause inaccurate localization, the RPN adopts skip layers [8] to combine high-level semantic features with low-level spatial features [4]. Each entry of the probability map estimates the probability of objectness, and each entry of the regression map regresses the pose and size of a bounding box. We jointly estimate (sin θ, cos θ) instead of θ to avoid the angular singularity problem described in [9].

We adopt the classification and regression loss [10] to train the network:

$$ \mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{reg} = \eta \sum_i f_{cls}(p_i) + \sum_i f_{reg}(u_i, u_i^*), \qquad (1) $$

where $p_i$ represents the posterior probability of objectness, and $u_i$ and $u_i^* \in \mathbb{R}^7$ are the regression output and the ground truth for the positive anchors. $\mathcal{L}_{cls}$ is defined as the classification loss, $f_{cls}(\cdot)$ denotes the focal loss [11], and $\mathcal{L}_{reg}$ is defined as the normalized regression loss, where we use the Smooth-L1 function to calculate $f_{reg}(\cdot)$.

Fig. 6. The schematic diagram of our moving-object detection system. The white point cloud and the blue point cloud represent the detected objects in the current frame and the last frame, respectively.
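A compact version of Eq. (1) could be written as below, using the focal loss [11] for classification and a smooth-L1 loss for box regression; the η weighting and the anchor selection details follow the description above rather than the exact training code.

import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, reg_preds, reg_targets,
                   eta=1.0, alpha=0.25, gamma=2.0):
    """L = eta * sum_i f_cls(p_i) + sum_i f_reg(u_i, u_i*), cf. Eq. (1)."""
    # Focal loss on the objectness probabilities p_i.
    p = torch.sigmoid(cls_logits)
    p_t = torch.where(cls_targets > 0.5, p, 1.0 - p)
    alpha_t = torch.where(cls_targets > 0.5, torch.full_like(p, alpha),
                          torch.full_like(p, 1.0 - alpha))
    focal = -(alpha_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).sum()

    # Smooth-L1 regression loss on the positive anchors (u_i, u_i* in R^7).
    reg = F.smooth_l1_loss(reg_preds, reg_targets, reduction="sum")
    return eta * focal + reg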

B. 3-D Point-cloud Mapping

The 3-D point-cloud mapping aims to build the 3-D map of the traversed environments. Fig. 5 shows the diagram of our mapping system. The inputs to the system are the raw data from the IMU and 3-D Lidar (i.e., accelerometer and gyroscope readings from the IMU and point clouds from the 3-D Lidar). The system starts with an adapted initialization procedure, followed by two major sub-modules: the Lidar-inertial odometry and the rotationally constrained mapping. Since the vehicle usually remains still at the beginning of mapping, we do not need to excite the IMU to initialize the module as described in [12], which is more suitable for hand-held applications. With the stationary IMU readings, the initial orientation for the first body frame $F_{B_0}$ can be obtained by aligning the average of the IMU accelerations to the opposite of the gravity direction in the world frame $F_W$. The initial velocity $v_{B_0}$ and the IMU biases $b_{a_0}, b_{g_0}$ are set to zero. The state estimation problem in the Lidar-inertial odometry sub-module is aimed at optimally fusing Lidar and IMU measurements in a local window. The state $X$ in our odometry contains the body pose $T^W_{B_i}$ and velocity $v^W_{B_i}$ of the IMU in $F_W$, the accelerometer and gyroscope biases $b_{a_i}, b_{g_i}$ in $F_{B_i}$, and the extrinsic parameter between the Lidar and IMU, $T^L_B$.

First, with the new IMU readings, we propagate the previously estimated IMU pose to the current predicted IMU pose $T^W_{B_i}$ and pre-integrate the IMU measurements. Transformed by $T^L_B$, the predicted Lidar pose $T^W_{L_i}$ can be obtained from $T^W_{B_i}$. The Lidar point cloud is undistorted by linear interpolation between $T^W_{L_i}$ and the previously optimized Lidar pose $T^W_{L_{i-1}}$. Similar to [13], Lidar features are extracted from each sweep. Using $T^W_{L_i}$, we can associate feature points to those in the local map and form the relative Lidar measurements according to [12].


Fig. 7. The schematic diagram of planning for our autonomous vehicle. The planning process consists of four layers: global planning, behavioral planning, path planning and motion planning. The four layers are coloured in pink.

With the pre-integration and relative Lidar measurements, we can write the state estimation problem in a fixed-size window of N as:

$$ \arg\min_{X} \frac{1}{2} \Big\{ \| r_P(X) \|^2 + \sum_{\substack{m \in \mathbf{m}_{L_\alpha} \\ \alpha \in \{i+1,\cdots,i+N-1\}}} \| r_L(m, X) \|^2_{C_m} + \sum_{\beta \in \{i,\cdots,i+N-2\}} \big\| r_B(z^{\beta}_{\beta+1}, X) \big\|^2_{C^{B_\beta}_{B_{\beta+1}}} \Big\}, \qquad (2) $$

where $r_P(X)$ is the prior term from marginalizing out old states, $r_L(m, X)$ is the relative Lidar measurement term, and $r_B(z^{\beta}_{\beta+1}, X)$ is the pre-integration term. The optimized $X$ can be obtained by solving (2) with the Levenberg-Marquardt algorithm. After the optimization, the state for propagation and the local map can be updated. In our odometry sub-module, the window size is used to limit the number of parameters. Although the odometry sub-module can make the system align with gravity well, it lacks a larger global map to constrain the local map. Thus, after a long time, the system may drift when using only the local constraints.

To reduce the drift, we register the undistorted point cloud and refine the Lidar pose against an incremental global map. Following the normal mapping module from [13], we can write the global mapping problem as

$$ C_M = \sum_{m \in \mathbf{m}_L} \big\| r_M(m, T^W_L) \big\|^2. \qquad (3) $$

By registering a point cloud to the global map, we can achieve a more consistent map than by using only the local-window Lidar-inertial odometry. However, if we only use the pose from the odometry as a prior to minimize (3), we would waste the property that the orientation of the Lidar-inertial odometry is aware of the gravity direction. Therefore, we propose rotationally constrained mapping. During the optimization, we apply a modified Jacobian $J^{C}_{\boldsymbol{\theta}_z}$ according to the confidence of the orientation estimation and update the Lidar pose as stated in [12].

C. Ego-motion Compensated Moving-object Detection

Moving objects are very common in traffic environments. Detection of moving objects is crucial for the motion planning of our autonomous vehicle in dynamic environments [14]. Based on the aforementioned object detection and odometry estimation, we propose an ego-motion compensated moving-object detection system. The schematic diagram is shown in Fig. 6. The input to our moving-object detection system is consecutive point-cloud frames with object detection results, including both moving and static objects. We first use the aforementioned Lidar-inertial odometry estimation algorithm to find the transformation between the two frames. Then, the last point-cloud frame is transformed to the current coordinate frame with the transformation matrix. We use object tracking to associate the bounding boxes across the two frames. The moving objects are detected by differencing the corresponding bounding boxes between the warped and current point-cloud frames. Specifically, we set a threshold on the distance between the corresponding bounding boxes; objects with distances above the threshold are detected as moving. As shown in Fig. 6, there are three detected objects: a cyclist and two cars. With ego-motion compensation and frame differencing, the moving cyclist and a moving vehicle are detected.
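The frame-differencing step can be summarized as below; boxes are reduced to their center points and the association is assumed to be given by the tracker, so this is only a sketch of the distance-threshold test.

import numpy as np

def detect_moving_objects(prev_centers, curr_centers, T_prev_to_curr, threshold=0.5):
    """Ego-motion compensated moving-object test on associated bounding-box centers.

    prev_centers, curr_centers -- (N, 3) arrays of associated box centers (same order)
    T_prev_to_curr             -- 4x4 transform from the last frame to the current frame
                                  (from the Lidar-inertial odometry)
    threshold                  -- distance in metres above which an object is "moving"
    """
    homo = np.hstack([prev_centers, np.ones((prev_centers.shape[0], 1))])
    warped = (homo @ T_prev_to_curr.T)[:, :3]     # warp last frame into current frame
    dists = np.linalg.norm(curr_centers - warped, axis=1)
    return dists > threshold                       # boolean mask: True = moving object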

IV. PLANNING

Planning enables autonomous vehicles to acquire future paths or motions towards the destination. Planning for autonomous driving is challenging, because traffic environments usually contain dynamic objects, bringing risks and uncertainties [15]. Autonomous vehicles are required to interact with various road participants, including cars, motorcycles, bicycles, pedestrians, etc. The planner needs to meet the vital requirements of safety, the kinematic and dynamic constraints of the vehicle, as well as the traffic rules. To satisfy these requirements, our planning is hierarchically divided into four layers: route planning, behavioral planning, local path planning, and motion planning. Fig. 7 shows the four-layer planning process.

A. Route Planning

The route planning aims at finding the global path from the global map. For autonomous driving, the route planner


Fig. 8. (a) An example of global planning and local planning inside an industrial estate. The red lines show the global path and the blue lines show the candidate path. The online local path planning examples use the global path as the reference: the dashed lines show the global path, the bold lines show the planned local path, the dots show the samples for path planning, and the autonomous vehicle is represented as the rectangle. (b) An example of online motion planning given the local path as the reference. The green trace shows the planned path. The yellow curve shows the planned trajectory.

typically plans a route given the road network. For structured environments with clear road maps, we can use path planning algorithms, such as A*, to find the route. However, in unstructured environments, such as an industrial estate or housing district, we collect the past drivable routes in a teaching process. Fig. 8(a) shows the global route established inside an industrial estate. The red lines show the global path and the blue lines show the candidate path as an alternative.
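For structured road maps, the route query reduces to a shortest-path search on a graph. The sketch below runs A* with straight-line distance as the heuristic on a hypothetical node/edge representation of the road network; it is an illustration of the algorithm, not the planner used on the vehicle.

import heapq
import math

def a_star_route(nodes, edges, start, goal):
    """A* over a road graph.

    nodes -- dict: node id -> (x, y) position
    edges -- dict: node id -> list of (neighbour id, edge length)
    """
    def h(n):  # admissible heuristic: straight-line distance to the goal
        return math.dist(nodes[n], nodes[goal])

    open_set = [(h(start), start)]
    came_from, g = {}, {start: 0.0}
    while open_set:
        _, current = heapq.heappop(open_set)
        if current == goal:                  # reconstruct the route
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for nbr, length in edges.get(current, []):
            tentative = g[current] + length
            if tentative < g.get(nbr, float("inf")):
                came_from[nbr], g[nbr] = current, tentative
                heapq.heappush(open_set, (tentative + h(nbr), nbr))
    return None  # no route found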

B. Behavioral Planning

Behavioral planning decides the manoeuvres for local navigation. It is a high-level representation of a sequence of vehicle motions. Typical manoeuvres are lane keeping and overtaking. This layer receives information from the global maps and finds the category of the local area to give specifications for path planning. For example, unstructured environments, like parking lots, have different requirements for planning. Given the road map and the localization of the ego-vehicle, features of the local area can be obtained. As shown in Fig. 7, road information that indicates the road environment classification of the global path segments is helpful for behavioral planning. Furthermore, traffic information from traffic signs helps in making decisions. The road and traffic information, together with the estimation of other moving agents, allows the behavioral planner to follow or overtake the front car, or pull over the ego-vehicle.

C. Local Path Planning

The local path planning generates a geometric path from the starting pose to the goal pose for the vehicle to follow. The time complexity of this process increases with the path length, so it is often limited to a local range to ensure real-time planning. The local path planner needs to handle the motion constraints of the vehicle to generate collision-free paths that conform to the lane boundaries and traffic rules. Fig. 7 shows the online local path planning for driving on standard roads. Here we plan the path in the Frenet coordinate system. With the global path as the reference path, it defines the lateral shift to the path and the distance travelled along the path from the start position. We draw multiple samples with different speeds and lateral offsets. Then a graph search method is adopted to find the path with the minimum cost. To define the cost of each candidate curve, we take into consideration the quality of the curve, the ending offset to the global path, and other factors (e.g., the potential trajectories of other agents).
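A stripped-down version of the sampling step: candidate end states are enumerated over lateral offsets and speeds, each candidate is scored, and the cheapest collision-free one is kept. The cost weights and the collision check are placeholders, and the actual planner searches a graph of connected samples rather than independent candidates.

import numpy as np

def pick_local_path(lateral_offsets, speeds, ref_speed,
                    collision_free, w_offset=1.0, w_speed=0.5):
    """Choose the cheapest collision-free (lateral offset, speed) sample in the Frenet frame.

    lateral_offsets -- candidate lateral shifts d from the global reference path [m]
    speeds          -- candidate speeds along the path [m/s]
    collision_free  -- callable (d, v) -> bool, e.g. checking predicted obstacle trajectories
    """
    best, best_cost = None, float("inf")
    for d in lateral_offsets:
        for v in speeds:
            if not collision_free(d, v):
                continue
            cost = w_offset * d ** 2 + w_speed * (v - ref_speed) ** 2
            if cost < best_cost:
                best, best_cost = (d, v), cost
    return best

# Example: stay close to the reference path at roughly the reference speed.
choice = pick_local_path(np.linspace(-2.0, 2.0, 9), np.linspace(2.0, 8.0, 7),
                         ref_speed=6.0, collision_free=lambda d, v: abs(d) < 1.8)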

D. Motion Planning

Given the planned path, motion planning is the final layer, which optimizes the trajectory under the dynamic constraints of the vehicle and the requirements for comfort and energy consumption. The planned trajectory specifies the velocity and acceleration of the vehicle at different timestamps, so it is also called trajectory planning. Though the path planned in the Frenet frame contains speed information, the dynamic constraints of the vehicle are not yet considered. Besides this, the local planning process is time-consuming and has a low update rate, which is inadequate for handling dynamic obstacles and emergency cases. The motion planner optimizes the trajectory given the information about obstacles, the constraints of the vehicle, and the path from the local path planner. It outputs the final trajectories for the controller to follow at a much higher update rate to ensure safety. Fig. 8(b) shows the planned trajectory as the yellow curve, and the planned path as the green trace. The planned trajectory differs from the path in the first segment, which is critical for collision avoidance. The motion planner imposes stricter requirements on safe distances to ensure safety.


V. CONTROL

The main task of vehicle control is to track the planned trajectory. In the past decade, many trajectory tracking controllers have been developed, among which the Model Predictive Controller (MPC) [16] is the most popular one. The schematic diagram of our controller is shown in Fig. 9.

As we can see, there are two inputs to the trajectory tracking controller. One is the trajectory s(t), which includes the information (e.g., desired coordinates, curvatures, speed) from the motion planner; the other is the feedback information x'(t) from the state estimator. Sometimes, sensor feedback from the chassis cannot be directly sent to the controller, or more feedback quantities are required by the controller than can be obtained directly from the sensors. In such cases, a state feedback estimator is required; otherwise, it is not a must. In Fig. 9, the output of the trajectory tracking controller u(t) is sent to the chassis after being processed by a lower controller. The lower controller can serve many purposes. For example, our autonomous vehicle can work in both the autonomous-driving mode and the parallel-driving mode (i.e., the remote control mode). The trajectory tracking controller only functions in the autonomous-driving mode, which means that the lower controller takes u(t) as input only in the autonomous-driving mode.

The control of an unmanned car can be divided into lateral control, which controls the steering angle, and longitudinal control, which controls the car speed. There are two types of MPCs in the field of autonomous vehicles: one is kinematics-based while the other is dynamics-based. A kinematics-based MPC is a combined controller that integrates the lateral and longitudinal control. Therefore, the longitudinal PID controller highlighted in the dashed box in Fig. 9 is not required: the vector of two control quantities u(t) (i.e., steering angle and speed) is directly given by the MPC. In contrast, the dynamics-based MPC is a standalone lateral controller whose output is the steering-angle control quantity. In such a case, a longitudinal PID controller that outputs the speed control quantity is required, and the outputs of these two controllers together constitute u(t).

No matter whether it is kinematics-based or dynamics-based, an MPC is essentially composed of three key parts: the predictive model, rolling optimization, and feedback correction, as shown in Fig. 9. The main function of the predictive model is to predict the future output of the controlled plant based on the simplified plant model, historical information and future input. The goal of rolling optimization is to determine the final control quantities through the optimization of an objective function; the rolling optimization must be repeatedly carried out in real time. Feedback information from the car is used to correct the predictive results so that the rolling optimization can proceed in the next period. The major difference between the kinematics-based MPC and the dynamics-based MPC is the simplified vehicle model. The vehicle kinematics model is purely geometry-based, treating the vehicle as a two-wheeled bicycle model, namely the Ackermann model; it applies while the vehicle is operated at low speed. The dynamics-based MPC is much more accurate than the kinematics-based MPC, and is derived from force analysis. However, the model complexity increases as well, leading to a more complicated optimization process and reduced real-time performance.

Fig. 9. The schematic diagram of our controller. The main component is the trajectory tracking controller (an MPC controller plus an optional longitudinal PID controller), which receives the trajectory s(t) from the motion planner and the state feedback x'(t) from the state estimator, and sends its output u(t) to the unmanned car through a lower controller.
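To make the kinematics-based predictive model concrete, the sketch below rolls out the standard bicycle (Ackermann) model over a prediction horizon for a candidate control sequence and scores the tracking error; a real MPC would wrap such a rollout in the constrained rolling optimization described above. The wheelbase and time step values are illustrative, not the vehicle's actual parameters.

import numpy as np

def rollout_bicycle(state, controls, wheelbase=1.8, dt=0.1):
    """Predict future poses with the kinematic bicycle model used at low speed.

    state    -- (x, y, yaw) of the rear axle
    controls -- sequence of (speed, steering angle) pairs over the prediction horizon
    """
    x, y, yaw = state
    poses = []
    for v, delta in controls:
        x += v * np.cos(yaw) * dt
        y += v * np.sin(yaw) * dt
        yaw += v / wheelbase * np.tan(delta) * dt
        poses.append((x, y, yaw))
    return poses

def tracking_cost(poses, reference):
    """Quadratic tracking error of the predicted poses against the reference trajectory."""
    return sum((px - rx) ** 2 + (py - ry) ** 2
               for (px, py, _), (rx, ry) in zip(poses, reference))

# An MPC would minimize tracking_cost over `controls` subject to actuator limits,
# apply only the first control, and repeat at the next step (rolling optimization).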

VI. EVALUATION

This section describes the real tasks of contact-less goods transportation using our vehicle during the outbreak of COVID-19 in China. From Feb. 2, 2020 to Mar. 7, 2020, our vehicles were deployed in 3 different places (Zibo, Shandong; Suzhou, Jiangsu; Shenzhen, Guangdong) in China. The total running distance of each vehicle reached 1100 km. The details of the transportation tasks are summarized in Tab. II. Selected demonstration photos taken during the tasks are displayed in Fig. 10. Note that in case of failures during autonomous navigation, such as unavoidable accidents and system errors, we built a parallel driving system for our vehicle to back up the navigation. The parallel driving system is a remote control system based on 4G/5G technology, with which a human driver immediately takes over the vehicle control in case of any failure. We expect less human intervention during our goods transportation tasks, so the vehicle is evaluated by the number of occurrences of human interventions.

A. Tasks in Zibo, Shandong

We first deploy our vehicle in Zibo, Shandong, due to the urgent requirement of a local logistics company. The tasks are to deliver fresh vegetables from a supplier to quarantined residential areas. There are two routes for the tasks, both of which have good traffic conditions (e.g., very few cars running on the roads). The tasks require the vehicle to be loaded with goods approaching the maximum payload, e.g., 960 kg. To ensure safety, the vehicle speed is limited to 20 km/h. There is no human intervention during these tasks.

B. Tasks in Suzhou, Jiangsu

The tasks in Suzhou are divided into two parts. One task is to deliver lunch meals from a restaurant to a company. The vehicle receives 100 boxes of meals from the restaurant, and delivers them to the building where the company is located.


Fig. 10. Selected task routes and demonstration photos. The first column includes three representative routes: (a) A 9.6 km route in Zibo, Shandong for vegetable delivery; (f) A 1.2 km route in Suzhou, Jiangsu for lunch meal delivery; (i) A 1.6 km route in Shenzhen, Guangdong for lunch meal delivery. Photos taken in Zibo, Shandong: (b) Starting from the logistics company; (c) Crossing the gate of the logistics company; (d) and (e) Urban main roads. Photos taken in Suzhou, Jiangsu: (g) Left turn; (h) Spraying disinfectant in a residential area. Photos taken in Shenzhen, Guangdong: (j) Heavy traffic environment; (k) Surrounded by road users and meeting oncoming cars; (l) U-turn in a narrow road; (m) Traffic light at night; (n) Contact-less picking up of a meal.


TABLE II
THE DETAILS OF THE CONTACT-LESS GOODS TRANSPORTATION TASKS DURING THE OUTBREAK OF COVID-19 IN CHINA.

City                | Task               | Distance | Time Duration | Payload                     | Environments                       | Characteristics
Zibo, Shandong      | Vegetable Delivery | 9.6 km   | 75 min        | 600 kg                      | Urban Main Road                    | Light Traffic, Heavy Payload, Slow Speed
Zibo, Shandong      | Vegetable Delivery | 5.4 km   | 50 min        | 960 kg                      | Urban Main Road, Residential Area  | Light Traffic, Heavy Payload, Slow Speed
Suzhou, Jiangsu     | Meal Delivery      | 1.2 km   | 30 min        | 80 kg (100 boxes of meals)  | Urban Main Road                    | Medium Traffic, Left-turn, U-turn
Suzhou, Jiangsu     | Road Disinfection  | 0.6 km   | 10 min        | -                           | Residential Area, Narrow Road      | Crowded Obstacles, Slow Speed
Shenzhen, Guangdong | Meal Delivery      | 1.6 km   | 20 min        | 64 kg (80 boxes of meals)   | Urban Main Road, Residential Area  | Heavy Traffic, Narrow Road, U-turn
Shenzhen, Guangdong | Meal Delivery      | 4.0 km   | 40 min        | 96 kg (120 boxes of meals)  | Urban Main Road, Residential Area  | Heavy Traffic, Narrow Road, U-turn

TABLE III
THE DESCRIPTIONS OF THE 4-LEVEL HUMAN INTERVENTIONS, AND THE NUMBER OF OCCURRENCES IN THE TASKS IN SHENZHEN, GUANGDONG.

Level | Description                              | No. of Occurrences
Z1    | Human intervention to avoid an accident  | 0
Z2    | Human intervention due to system errors  | 5
Z3    | Human intervention due to freezing       | 25
Z4    | Human intervention due to too slow speed | 2

As shown in the route map, the vehicle needs to take two U-turns and one left turn. The second task is to disinfect a road in a residential district every night. We mount a sprayer on our cargo box so that the vehicle can be used to disinfect the traversed areas. The traversed route is approximately 600 m long, but the environment is very crowded (e.g., vehicles and bicycles parked along the road). In these two tasks, a few human interventions are involved, to give way to other vehicles to avoid accidents or to go through areas crowded with obstacles.

C. Tasks in Shenzhen, Guangdong

Similar to the tasks in Suzhou, the tasks in Shenzhen are to deliver lunch meals, but the routes in Shenzhen have heavy traffic, so human intervention is frequently involved. We classify the human interventions into 4 levels, Z1 to Z4, the definitions of which are shown in Tab. III. The Z1 level indicates the most serious case: without human intervention, an accident would occur. The seriousness decreases from Z1 to Z4.

As shown in Tab. III, Z1 never occurred during the tasks in Shenzhen. There are 5 occurrences of Z2, which are caused by the initialization failure of the Lidar-inertial-based localization and the false detections of traffic lights at night. There are 25 occurrences of Z3. The main reason is that there are so many road users around the vehicle that it stops to avoid collisions. There are 2 occurrences of Z4, both of which happen on very narrow roads. On such roads, even experienced human drivers cannot pass through quickly.

VII. LESSONS LEARNED AND CONCLUSIONS

For object detection, we found that real-time performance deserves much more attention than accuracy in practice. The perception algorithms should be efficient since they are always the front-end of the entire autonomous navigation system. Therefore, we replaced the dense convolution layers with spatially sparse convolution [6] in our 3-D object detection module. As a result, the inference time is reduced from approximately 250 ms to 56 ms.

For the point-cloud mapping, we found that our system was capable of dealing with rapid motion of the vehicle and short-term point-cloud occlusions. Since most of the Lidars are mounted parallel to the ground and the vehicle always moves along the ground, the typical ring structure of the point clouds makes translational movement perpendicular to the ground plane difficult to observe. Drift in this direction is inevitable during long-term operation. In practice, we learned that the GPS localization results can signal potential loops, which leads to a consistent larger-region map for Lidar-based localization. In very crowded dynamic traffic environments, the system could be degraded by the disturbances from moving objects. To tackle this issue, we use semantic segmentation to remove movable objects and obtain clean point-cloud data.

For the planning part, we found the four-layer hierarchical planning necessary and effective. The application environments of our vehicle are complex in terms of road structures, traffic conditions, driving etiquette, etc. Layer-wise planning makes the system extensible to multiple environments and various requirements. Furthermore, it is essential to account for the uncertainty from the perception modules. The uncertainty comes from the limited accuracy of the perception system as well as the time delay in processing. In planning, we avoid hitting the critical conditions and leave safe distances for the planned trajectory.

For the control part, we found that the greatest advantage of using an MPC can be gained by adding multiple constraints in the control process. When the vehicle operates at low speed, the kinematic constraints constrain the vehicle motion planning and control, but with the increase of speed, dynamic characteristics become more influential. As aforementioned, the dynamics-based MPC is much more accurate than the kinematics-based MPC since the predictive model is more accurate. However, we found that the complex-model prediction is not the best option: for low- and middle-speed operation environments, a simplified predictive model with multiple constraints is sufficient. As is known, the introduction of multiple constraints in the optimization may narrow down the range of solutions and even lead to no solution, which may corrupt the program. In such a case, it is wise to bring in a relaxing factor when designing the objective function to avoid the no-solution situation. Another factor that needs to be considered carefully when applying the MPC is the determination of the prediction horizon. A long prediction horizon can improve the control accuracy to a certain degree, but consumes more computation power and reduces the real-time performance. Different controlled plants in different operation environments may have different requirements on the control frequency. Under the guarantee of the control frequency, our strategy is to select a prediction horizon that is as long as possible.
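One common way to introduce such a relaxing factor is to soften hard constraints with slack variables, for example in a generic soft-constrained MPC objective of the following form (a textbook formulation, not the exact objective used on the vehicle):

$$ \min_{u,\,\epsilon}\; \sum_{k=0}^{N_p-1} \left( \|x_k - x_k^{\mathrm{ref}}\|_Q^2 + \|u_k\|_R^2 \right) + \rho\,\|\epsilon\|^2 \quad \text{s.t.}\quad g(x_k, u_k) \le \epsilon,\ \ \epsilon \ge 0, $$

where the penalty weight ρ trades constraint violation against tracking performance, so the optimization remains feasible even when the original hard constraints admit no solution.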

The successful daily operations demonstrated that using our autonomous logistic vehicle could effectively avoid the virus spread caused by human contact. It effectively provides a physical barrier between the recipient and the sender in the process of goods transportation. For example, one task is to deliver meals to quarantined persons suspected of infection. Our vehicle automatically drives to the infected area and notifies the persons through short messages to collect the food. There is no contact between the persons sending the food and the quarantined persons during the whole process, which successfully avoids virus spread through human-to-human contact. Currently, there is a huge demand for contact-less goods transportation in many infected areas. We believe that continuous operations can extensively improve our vehicle and speed up the maturity of our autonomous-navigation technology.

VIII. ACKNOWLEDGEMENTS

This work was supported by the Research Grants Council of the Hong Kong SAR Government, China, under Project No. 11210017 and No. 21202816; the Guangdong Science and Technology Plan Guangdong-Hong Kong Cooperation Innovation Platform (Grant Number 2018B050502009); the Shenzhen Science and Technology Innovation Commission (Grant Number JCYJ2017081853518789); and the Science and Technology Development Fund, Macao S.A.R. (FDCT) (Project reference number 0015/2019/AKP).

REFERENCES

[1] J. Jiao, Q. Liao, Y. Zhu, T. Liu, Y. Yu, R. Fan, L. Wang, and M. Liu, "A novel dual-lidar calibration algorithm using planar surfaces," in 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 1499–1504.
[2] W. Kabsch, "A discussion of the solution for the best rotation to relate two sets of vectors," Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, vol. 34, no. 5, pp. 827–828, 1978.
[3] F. Pomerleau, F. Colas, R. Siegwart, and S. Magnenat, "Comparing ICP variants on real-world data sets," Autonomous Robots, vol. 34, no. 3, pp. 133–148, 2013.
[4] Y. Zhou and O. Tuzel, "VoxelNet: End-to-end learning for point cloud based 3d object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490–4499.
[5] Y. Yan, Y. Mao, and B. Li, "SECOND: Sparsely embedded convolutional detection," Sensors, vol. 18, no. 10, p. 3337, 2018.
[6] B. Graham and L. van der Maaten, "Submanifold sparse convolutional networks," arXiv preprint arXiv:1706.01307, 2017.
[7] G. Ghiasi, T.-Y. Lin, and Q. V. Le, "DropBlock: A regularization method for convolutional networks," in Advances in Neural Information Processing Systems, 2018, pp. 10727–10737.
[8] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[9] M. Simon, S. Milz, K. Amende, and H.-M. Gross, "Complex-YOLO: An Euler-region-proposal for real-time 3d object detection on point clouds," in ECCV. Springer, 2018, pp. 197–209.
[10] P. Yun, L. Tai, Y. Wang, C. Liu, and M. Liu, "Focal loss in 3d object detection," IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1263–1270, April 2019.
[11] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[12] H. Ye, Y. Chen, and M. Liu, "Tightly coupled 3d lidar inertial odometry and mapping," in 2019 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2019.
[13] J. Zhang and S. Singh, "LOAM: Lidar odometry and mapping in real-time," in Robotics: Science and Systems, vol. 2, no. 9, 2014.
[14] Y. Sun, M. Liu, and M. Q.-H. Meng, "Improving RGB-D SLAM in dynamic environments: A motion removal approach," Robotics and Autonomous Systems, vol. 89, pp. 110–122, 2017.
[15] M. Liu, "Robotic online path planning on point cloud," IEEE Transactions on Cybernetics, vol. 46, no. 5, pp. 1217–1228, 2015.
[16] C. E. Garcia, D. M. Prett, and M. Morari, "Model predictive control: Theory and practice - a survey," Automatica, vol. 25, no. 3, pp. 335–348, 1989.

Tianyu Liu, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Qinghai Liao, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Lu Gan, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Fulong Ma, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Jie Cheng, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Xupeng Xie, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].


Zhe Wang, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Yingbing Chen, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Yilong Zhu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Shuyang Zhang, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Zhengyong Chen, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Yang Liu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Yang Yu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Zitong Guo, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Guang Li, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Peidong Yuan, Shenzhen Unity Drive Innovation Technology Co. Ltd., Shenzhen, China. Email: [email protected].
Dong Han, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Email: [email protected].
Yuying Chen, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Haoyang Ye, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Jianhao Jiao, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Peng Yun, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Zhenhua Xu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Hengli Wang, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Huaiyang Huang, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Sukai Wang, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Peide Cai, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Yuxiang Sun, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].
Yandong Liu, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Email: [email protected].
Lujia Wang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Email: [email protected].
Ming Liu, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. Email: [email protected].