

Sub-optimal Control of Autonomous Wheel Loader with Approximate Dynamic Programming

Tohid Sardarmehni1 and Xingyong Song 2,∗

Abstract

Optimal control of wheel loaders in short loading cycles is studied in this paper. For modeling the wheel loader, the data from a validated diesel engine model is used to find a control oriented mean value engine model. The driveline is modeled as a switched system with three constant gear ratios (modes) of −60 for backwarding, 60 for forwarding, and zero for stopping. With these three modes, the sequence of active modes in a short loading cycle is fixed as backwarding, stopping, forwarding, and stopping. For the control part, it is assumed that the optimal path is known a priori. Given the mode sequence, the control objective is finding the optimal switching time instants between the modes while the wheel loader tracks the optimal path. To solve the optimal control problem, approximate dynamic programming is used. Simulation results are provided to show the effectiveness of the solution.

Keywords- Optimal Control, Approximate Dynamic Programming, Autonomous Wheel Loader, Short Loading Cycles.

TABLE I: Nomenclature

$\omega_e$      rotational speed of engine
$J_e$           engine moment of inertia
$T_e$           engine torque
$p_{im}$        intake manifold pressure
$u_f$           mass of fuel per engine cycle
$\omega$        angular velocity of the boom
$V$             vehicle velocity
$u_p$           pressure in the lift cylinder
$u_s$           time derivative of steering angle
$\tau_p$        time constant in $p_{im}$
$p_{stat}$      stationary intake manifold pressure
$F_{cyl}$       lifting force
$R$             a parameter in the boom geometry
$\alpha$        angle between lift force and lift arm
$\theta$        lift arm angle
$T_{buc}$       torque due to the bucket load
$T_{arm,w}$     torque due to boom weight
$I_{boom}$      boom moment of inertia
$F_w$           wheels' traction force
$F_{roll}$      rolling resistance force
$M_{tot}$       mass of WL and load
$X$             position of WL
$Y$             position of WL
$\beta$         heading angle
$R_{turn}$      turning radius
$\delta$        steering angle

I. INTRODUCTION

Wheel loaders (WLs) are essential equipment in off-road construction. From the control perspective, minimizing fuel consumption and minimizing the operation time of wheel loaders are two crucial and contradictory control goals.

In general, in a WL an engine provides the power. The three actions which consume power in a WL are traction, lifting, and steering. Traction is linked to the driveline for moving the vehicle or stopping it. Lifting is related to the hydraulic system, which raises or lowers the boom/bucket, and steering navigates the vehicle.

In a construction field, wheel loaders mostly go through repetitive cycles to pick up a load at one point and drop it at another point. This cycle includes four actions: backwarding, stopping, forwarding, and then stopping. After filling the bucket with a load, the vehicle accelerates backward from the resting point. At time $t_1$ the operator applies the brakes while putting the gearbox in neutral to bring the vehicle to a full stop at time $t_2$. Then the operator puts the gearbox in the forward mode and accelerates toward the unloading point. At time $t_3$, the operator applies the brakes again while putting the gearbox in neutral to bring the vehicle to a full stop at time $t_4$. When the vehicle stops, the operator unloads the bucket. The path that a WL follows in such a cycle has a V-shape, and it is called a Short Loading Cycle (SLC).

There are some critical questions in autonomous control of a WL, such as
1- What is the best SLC to follow?

1 Tohid Sardarmehni, Postdoctoral Research Fellow, Department of Engineering Technology and Industrial Distribution, Texas A&M University, College Station, TX, 77843, USA. [email protected]

2,∗ Xingyong Song, Assistant Professor, Department of Engineering Technology and Industrial Distribution; Department of Mechanical Engineering; College of Engineering; Texas A&M University, College Station, TX, 77843, USA (Corresponding Author). [email protected]

This research was partially supported by the National Science Foundation under Grant No. 1826410. This article has been accepted for oral presentation at 2019 Dynamic System and Control Conference. The content might NOT be the same as the final edition of the accepted paper for the DSCC.

arXiv:1907.11993v1 [eess.SY] 28 Jul 2019


2- What are the optimal times to switch from one mode to another mode?
3- What is the optimal time of operation?
4- When is the optimal time to bring the bucket down or lift it?
5- What is the optimal steering angle in an SLC?
6- How can one get the minimum fuel consumption?

In this paper, we address some of these questions in the context of optimal control and leave the rest for future research. The behavior of a WL in an SLC portrays a switched system with controlled subsystems and a fixed mode sequence [1]. Switched systems are systems comprised of several subsystems/modes and a switching rule that assigns the active mode [2]–[4]. The sequence of active modes in a switched system is called the mode sequence. For example, consider a system with two modes and only one switching which happens at $t = t_1$. One possible mode sequence is (mode 1, mode 2), which means that for all $t < t_1$ mode 1 is active and for $t \ge t_1$ mode 2 is active. As seen in this example, when the mode sequence is fixed, the controller is only responsible for assigning the switching time instants and not the active mode. Also, when the subsystems include a continuous control, for example steering angle, braking, and hydraulic pressure in a WL, the subsystems are called controlled subsystems. In general, optimal control of switched systems is a challenging problem due to the discontinuous nature of switching [2], [3], [5], [6].

A complete engine model has many variables/parameters, and it is not suitable for control purposes. In order to model a diesel engine, the data from a real engine [7] is used to tune the parameters of a mean value engine model, introduced in [8], which captures only the essential characteristics of the engine but maintains a good level of accuracy. After the engine is modeled, the driveline of a WL is modeled as a switching system with forwarding, backwarding, and stopping modes. In the present study, it is assumed that the desired path is known. Hence, an optimal control problem is formulated to track the desired path and to find the optimal switching times.

Compared to the existing literature, the present work neglects the filling and emptying dynamics of the bucket that were studied in [9]. Also, the path is assumed to be known a priori. This is unlike [10]–[14], where path planning is investigated. The present study borrows most of the dynamical modeling from [8]. The current study also follows the idea in [8] of modeling the gearbox as a switched system and finding the optimal switching time. However, the optimal control problem formulation and solution presented in this paper are entirely different from the ones discussed in [8]. In [8], a mixed integer programming solution provided by a numerical solver is used for a specific initial condition. The solution developed in this paper, by contrast, is a closed loop feedback optimal policy, which means that the solution is valid for all initial conditions in a compact set selected as a domain of interest. Also, since in this paper the mode sequence is enforced, by a change of mode sequence one can generate the results reported in [15] in a closed loop feedback policy. Compared to [16], [17], this paper provides the optimal policy over the whole time horizon rather than a non-optimal control method. In [18], optimal control of a wheel loader for gravel applications was performed with dynamic programming. In [19], an energy management strategy for hybrid wheel loaders was studied with dynamic programming along with an advisory control policy. In general, dynamic programming is a strong method which provides the closed loop feedback optimal policy [20]. However, as the order of the system increases, rapid access to memory becomes prohibitive, which is known as the curse of dimensionality [20]. The method provided in this paper is based on Approximate Dynamic Programming (ADP), which finds a near optimal solution instead of the exact optimal solution. This is in fact a remedy for the curse of dimensionality in dynamic programming.

The rest of this paper is organized as follows. In Section II, modeling of the diesel engine and the WL is discussed. In Section III, the optimal control problem formulation is presented first, and then the optimal control solution is discussed. Simulation results are provided in Section IV, and Section V concludes the paper.

II. DYNAMICS OF THE WHEEL LOADER

A model for a wheel loader was introduced in [8] which includes four main parts: powertrain, driveline, steering, and hydraulics. This model can be summarized as follows.

$$ \dot{\omega}_e = \frac{1}{J_e}\Big(T_e(\omega_e, p_{im}, u_f) - P_{e,load}(\omega_e, \omega, V, u_p, u_s)\Big) \tag{1} $$

$$ \dot{p}_{im} = \frac{1}{\tau_p(\omega_e)}\Big(p_{stat}(\omega_e, u_f) - p_{im}\Big) \tag{2} $$

$$ \dot{\omega} = \frac{F_{cyl}(u_p)\,R\,\sin(\alpha(\theta)) - T_{buc}(\theta) - T_{arm,w}(\theta)}{I_{boom}} \tag{3} $$

$$ \dot{\theta} = \omega \tag{4} $$

$$ \dot{V} = \frac{F_w(\omega_e, V) - \mathrm{sign}(V)\,(u_b + F_{roll})}{M_{tot}} \tag{5} $$

$$ \dot{X} = V\cos(\beta) \tag{6} $$

$$ \dot{Y} = V\sin(\beta) \tag{7} $$

$$ \dot{\beta} = \frac{V}{R_{turn}(\delta)} \tag{9} $$

$$ \dot{\delta} = u_s \tag{10} $$

In the following subsection, the engine model and its components are explained as needed. Due to page constraints, interested readers are referred to [8] for the details of the lifting dynamics and the driveline.

A. Engine Model

Equations (1) and (2) are the mean value engine model for the engine torque and manifold pressure. In order to find a function for $T_e$, the data from a validated diesel engine model developed in [7] is used. Random square wave input signals with short (0.1 sec) and long (1.5 sec) pulse widths were used to excite both the transient and the steady-state response of the engine. For gathering the training patterns, the engine model was run for 300 sec with the short pulse width and 100 sec with the long pulse width. Then the values of $T_e$, $\omega_e$, $p_{im}$, and $u_f$ were first normalized and then saved. For finding a mean value model, $T_e : \mathbb{R}^3 \to \mathbb{R}$ was selected as

$$ T_e(\omega_e, p_{im}, u_f) = W_{T_e}^T\,[1, \omega_e, p_{im}, u_f]^T \tag{11} $$

where $W_{T_e} \in \mathbb{R}^4$ is a tunable weight vector and superscript $T$ denotes the transpose operator. For finding $W_{T_e}$, least squares in batch mode was used on the entire data. The Mean Absolute Error (MAE) between the engine torque predicted from (11) and the training data from the engine was only $7.5\times10^{-3}$. For validating the model, a new set of random inputs was given to the engine. For gathering the validation data, the short pulse width was used in the first 50 sec and the long pulse width in the second 50 sec. The $W_{T_e}$ that was found before was used, without retraining, to predict the engine torque. The MAE between the predicted and measured values of engine torque was only $4\times10^{-2}$ with the validation data, which shows the effectiveness of the training with least squares. The performance of the model introduced in (11) against the data from the engine in the validation is shown in Figure 1.
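The batch least-squares fit of (11) and the MAE check can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic, normalized stand-in data rather than the engine data of [7]; the function names, the noise level, and the weights in `true_weights` are hypothetical and chosen only for illustration.

```python
import numpy as np

def features(omega_e, p_im, u_f):
    """Regressor [1, omega_e, p_im, u_f] of (11), one row per sample."""
    return np.column_stack([np.ones_like(omega_e), omega_e, p_im, u_f])

def fit_torque_model(omega_e, p_im, u_f, torque):
    """Batch least-squares estimate of the weight vector in (11)."""
    weights, *_ = np.linalg.lstsq(features(omega_e, p_im, u_f), torque, rcond=None)
    return weights

def predict_torque(weights, omega_e, p_im, u_f):
    return features(omega_e, p_im, u_f) @ weights

def mean_absolute_error(predicted, measured):
    return float(np.mean(np.abs(predicted - measured)))

# Synthetic stand-in data (assumption): NOT the data from the engine model of [7].
rng = np.random.default_rng(0)
true_weights = np.array([-0.15, 0.56, 4.85, 0.45])  # hypothetical values

def make_data(n):
    omega_e, p_im, u_f = rng.uniform(0.0, 1.0, size=(3, n))
    torque = predict_torque(true_weights, omega_e, p_im, u_f)
    torque = torque + rng.normal(0.0, 0.01, size=n)  # small measurement noise
    return omega_e, p_im, u_f, torque

train = make_data(2000)
valid = make_data(500)

w_hat = fit_torque_model(*train)
# Validation reuses w_hat without retraining, as in the text.
valid_mae = mean_absolute_error(predict_torque(w_hat, *valid[:3]), valid[3])
```

Because the model is linear in the weights, a single call to a least-squares solver is enough; no iterative training is needed for this part.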

Fig. 1: Comparison between the predicted $T_e$ and the measured $T_e$ with the validation data (normalized $T_e$ versus time in sec).

In WLs, the engine provides power for hydraulics, traction, and steering. Hence [8]

$$ P_{e,load} = P_{lift}(\omega, u_p) + P_{steering}(u_s) + P_{traction}(\omega_e, V) \tag{12} $$

In (12), $P_{lift}$ is the power required for lifting and can be defined as

$$ P_{lift}(\omega, u_p) = \max\left(0, \frac{Q(\omega)\,u_p}{\eta_{lift}}\right) \tag{13} $$

In (13), $Q(\cdot)$ is the mass flow rate of the hydraulic fluid into the lifting cylinder and $\eta_{lift}$ is the lifting efficiency. Also, the steering power can be written as

$$ P_{steering} = C_{st}\,u_s^2 \tag{14} $$

where $C_{st}$ is a constant. For modeling the traction power, [8] proposed a hybrid model based on the gear ratio $\gamma$ as

$$ P_{traction}(\omega_e, V) = P_{pump}(\omega_e, V, \gamma)\,|\mathrm{sign}(\gamma)| \tag{15} $$

Assuming three values $\gamma \in \{-60, 0, +60\}$, one can consider the switching dynamics as backwarding, stopping, and forwarding.
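As a small illustration of how (12) to (15) combine, the sketch below evaluates the engine load power for given numerical values. The models for $Q(\omega)$ and $P_{pump}$ are in [8] and are not reproduced here, so their values are passed in as plain numbers; all function and argument names are hypothetical.

```python
def lift_power(flow_rate, u_p, eta_lift):
    """Lifting power as in (13); flow_rate stands for the value of Q(omega)."""
    return max(0.0, flow_rate * u_p / eta_lift)

def steering_power(u_s, c_st):
    """Steering power as in (14)."""
    return c_st * u_s ** 2

def traction_power(pump_power, gamma):
    """Traction power as in (15); |sign(gamma)| is 0 in neutral and 1 otherwise."""
    return pump_power * (0 if gamma == 0 else 1)

def engine_load_power(flow_rate, u_p, u_s, pump_power, gamma, eta_lift, c_st):
    """Total engine load as in (12)."""
    return (lift_power(flow_rate, u_p, eta_lift)
            + steering_power(u_s, c_st)
            + traction_power(pump_power, gamma))

# Illustrative numbers only (assumptions, not values from the paper).
load_neutral = engine_load_power(0.01, 100.0, 0.1, 50.0, 0, 0.5, 10.0)
load_reverse = engine_load_power(0.01, 100.0, 0.1, 50.0, -60, 0.5, 10.0)
```

The only mode-dependent term is the traction power, which is what makes the driveline a switched system.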

For identifying the manifold pressure, the same approach of using the data from the validated engine model was used to tune the parameters of the model introduced in [21], [22] as

$$ \tau_p(\omega_e) = \tau_1\omega_e + \tau_2 \tag{16} $$

$$ p_{stat} = a_1\omega_e + a_2 u_f + a_3 \tag{17} $$


In (16) and (17), $\tau_1$, $\tau_2$, $a_1$, $a_2$, and $a_3$ are constants. For this tuning, a gradient descent algorithm was used with the sequential offline data. Once the training concluded, the model was validated with a new set of data. The MAE between the trained model and the validation data was only $5.3\times10^{-2}$.
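A sample-by-sample gradient descent update of the kind mentioned above can be sketched as follows. This is only a partial illustration under stated assumptions: it fits the static map (17) alone, not the time constant (16), and it uses synthetic noise-free data; the learning rate, the number of passes, and the values in `true_params` are hypothetical.

```python
import numpy as np

def sequential_gradient_descent(omega_e, u_f, p_stat, learning_rate=0.1, epochs=200):
    """Fit [a1, a2, a3] of (17) one sample at a time on stored (offline) data."""
    params = np.zeros(3)
    for _ in range(epochs):
        for w, u, target in zip(omega_e, u_f, p_stat):
            phi = np.array([w, u, 1.0])              # regressor for a1, a2, a3
            error = params @ phi - target            # prediction error on this sample
            params -= learning_rate * error * phi    # gradient of 0.5 * error**2
    return params

# Synthetic normalized data (assumption): not the data used in the paper.
rng = np.random.default_rng(1)
omega_e = rng.uniform(0.0, 1.0, 200)
u_f = rng.uniform(0.0, 1.0, 200)
true_params = np.array([3.0, 0.5, 0.4])  # hypothetical a1, a2, a3
p_stat = true_params[0] * omega_e + true_params[1] * u_f + true_params[2]

fitted = sequential_gradient_descent(omega_e, u_f, p_stat)
```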

B. State space model

It is desired to show the dynamics of the WL as

$$ \dot{x}(t) = f_v(x(t)) + g_v(x(t))\,u(t), \quad v \in \{1,2,3\}, \quad x(0) = x_0 \tag{18} $$

where $x \in \mathbb{R}^n$ is the state vector, $u \in \mathbb{R}^m$ is the control vector, and the Lipschitz functions $f_v : \mathbb{R}^n \to \mathbb{R}^n$ and $g_v : \mathbb{R}^n \to \mathbb{R}^{n\times m}$ denote the dynamics of the subsystems. The active mode at time instant $t$ is shown by the sub-index $v$, and the set of all subsystems is $\{1,2,3\}$, which corresponds to the forwarding, backwarding, and stopping modes of the WL. Equation (18) shows a switched system dynamics where each subsystem includes a continuous control. In the literature of switched systems, this type of subsystem is called a controlled subsystem. Also, equation (18) shows that each mode is control affine. To show the dynamics of the WL as (18), let the state vector be

$$ x(t) = [x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9]^T \tag{19} $$

where the time dependency of $x_i$, $1 \le i \le 9$, is dropped for notational simplicity, i.e., $x_i = x_i(t)$. Also, $x_1 = \omega_e$, $x_2 = p_{im}$, $x_3 = \theta$, $x_4 = \omega$, $x_5 = X$, $x_6 = Y$, $x_7 = V$, $x_8 = \beta$, $x_9 = \delta$. In [8], the control vector was selected as $u(t) = [u_f, u_p, u_b, u_s]^T$. In this research we are interested in control affine dynamics, and the nonlinear terms with respect to $u_s$ and $u_p$ do not permit such a selection for $u(t)$. To solve this problem, similar to the method used in [23], two new states can be defined as $x_{10} = u_p$ and $x_{11} = u_s$. Taking the time derivative of $x_{10}$ and $x_{11}$, one has

$$ \dot{x}_{10} = u_1 \tag{20} $$

$$ \dot{x}_{11} = u_2 \tag{21} $$

Also, one can select $u_3 = u_f$ and $u_4 = u_b$. Therefore, $u(t) = [u_1, u_2, u_3, u_4]^T$.

Remark 1: In order to normalize the range of variations of the variables, one can nondimensionalize the dynamics. For this purpose, let $X_1$ be the maximum value of $|x_1|$. Hence, one can define $\bar{x}_1 = \frac{x_1}{X_1}$. With this transformation, one can see that $|x_1| \in [0, X_1]$ and $|\bar{x}_1| \in [0, 1]$. Considering a scalar system $\dot{x}_1 = f(x_1) + g(x_1)u(t)$, it is straightforward to see that $\dot{\bar{x}}_1 = \frac{1}{X_1}\big(f(X_1\bar{x}_1) + g(X_1\bar{x}_1)u(t)\big)$. The discussed nondimensionalization is used in this paper for optimal controller design.
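The scaling in Remark 1 can be checked numerically with a short sketch. The scalar system, the input, and the bound `x_max` below are hypothetical and serve only to show that simulating the normalized dynamics and rescaling gives the same trajectory as simulating the original dynamics.

```python
def simulate(f, g, u, x0, dt, steps):
    """Forward-Euler simulation of the scalar system x' = f(x) + g(x) u(t)."""
    x = x0
    for k in range(steps):
        x = x + dt * (f(x) + g(x) * u(k * dt))
    return x

def normalize_dynamics(f, g, x_max):
    """Dynamics of x_bar = x / x_max, following Remark 1."""
    def f_bar(x_bar):
        return f(x_max * x_bar) / x_max
    def g_bar(x_bar):
        return g(x_max * x_bar) / x_max
    return f_bar, g_bar

# Hypothetical scalar system used only for illustration.
def f(x):
    return -0.5 * x

def g(x):
    return 1.0 + 0.1 * x

def u(t):
    return 2.0

x_max = 50.0
x_final = simulate(f, g, u, 10.0, 0.01, 500)

f_bar, g_bar = normalize_dynamics(f, g, x_max)
xb_final = simulate(f_bar, g_bar, u, 10.0 / x_max, 0.01, 500)
# x_max * xb_final reproduces x_final up to floating-point rounding.
```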

III. OPTIMAL CONTROL FORMULATION

Considering the dynamics as (18) and assuming the mode sequence is known, it is desired to find the optimal switching times and a feedback control policy that minimize the cost function

$$ J(x_0) = \big(x(t_f) - r(t_f)\big)^T S \big(x(t_f) - r(t_f)\big) + \int_{t_0}^{t_f} \frac{1}{2}\Big(\big(x(t) - r(t)\big)^T Q \big(x(t) - r(t)\big) + u(t)^T R\,u(t)\Big)\,dt \tag{22} $$

In (22), $t_0$ is the initial time, $t_f$ is the final time, and $r \in \mathbb{R}^n$ is the reference signal, which is a known function of time. $S \in \mathbb{R}^{n\times n}$ is a positive semi-definite matrix for penalizing the terminal cost, $Q \in \mathbb{R}^{n\times n}$ is the state penalizing matrix, which is assumed to be positive semi-definite, and $R \in \mathbb{R}^{m\times m}$ is a positive definite control penalizing matrix.

A. Including Mode Sequence

To solve the optimal switching problem, one needs to solve a two-level optimization [24]. In the upper level, switching times are assigned, and in the lower level, the control policies that ensure the tracking are sought. To introduce the idea, consider a switched system with two subsystems and only one switching which happens at $t = t_1$. Also, let the mode sequence be (mode 1, mode 2). To make the switching time instant an independent parameter, let [24]

$$ t = \begin{cases} t_0 + (t_1 - t_0)\,\hat{t} & \text{if } 0 \le \hat{t} < 1 \\ t_1 + (t_f - t_1)(\hat{t} - 1) & \text{if } 1 \le \hat{t} \le 2 \end{cases} \tag{23} $$

Using the chain rule, one can find $x'(\hat{t}) = \frac{dx}{d\hat{t}} = \frac{dx}{dt}\frac{dt}{d\hat{t}}$ as

$$ x'(\hat{t}) = \begin{cases} \Big(f_1\big(x(\hat{t})\big) + g_1\big(x(\hat{t})\big)u(\hat{t})\Big)(t_1 - t_0) & \text{if } 0 \le \hat{t} < 1 \\ \Big(f_2\big(x(\hat{t})\big) + g_2\big(x(\hat{t})\big)u(\hat{t})\Big)(t_f - t_1) & \text{if } 1 \le \hat{t} \le 2 \end{cases} \tag{24} $$
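The change of time variable in (23) and the scaled right-hand side in (24) can be written as a short sketch for a scalar system with one switching. The two modes below are hypothetical placeholders, and the names `real_time`, `time_scale`, and `scaled_rhs` are introduced only for this illustration.

```python
def real_time(t_hat, t0, t1, tf):
    """Map the scaled time t_hat in [0, 2] to real time t, following (23)."""
    if not 0.0 <= t_hat <= 2.0:
        raise ValueError("t_hat must lie in [0, 2]")
    if t_hat < 1.0:
        return t0 + (t1 - t0) * t_hat
    return t1 + (tf - t1) * (t_hat - 1.0)

def time_scale(t_hat, t0, t1, tf):
    """dt/dt_hat: the factor that multiplies the active mode in (24)."""
    return (t1 - t0) if t_hat < 1.0 else (tf - t1)

def scaled_rhs(t_hat, x, u, modes, t0, t1, tf):
    """Right-hand side of (24) for a scalar state x and scalar control u."""
    f, g = modes[0] if t_hat < 1.0 else modes[1]
    return (f(x) + g(x) * u) * time_scale(t_hat, t0, t1, tf)

# Hypothetical scalar modes (f_v, g_v), for illustration only.
modes = [
    (lambda x: -x, lambda x: 1.0),   # mode 1
    (lambda x: 0.0, lambda x: 1.0),  # mode 2
]

switch_instant = real_time(1.0, 0.0, 2.86, 3.0)       # t_hat = 1 maps to t1
rhs_mode_1 = scaled_rhs(0.5, 1.0, 0.0, modes, 0.0, 2.86, 3.0)
```

In the scaled time, the switching always happens at $\hat{t} = 1$ regardless of $t_1$, so $t_1$ enters the problem only as a parameter of the dynamics and the cost.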

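Also, changing the independent variable in the cost function (22) from $t$ to $\hat{t}$ leads to

$$ \begin{aligned} J(x_0) \equiv J(t_1, x_0) ={}& \big(x(2) - r(2)\big)^T S \big(x(2) - r(2)\big) \\ &+ \int_0^1 \frac{1}{2}\Big(\big(x(\hat{t}) - r(\hat{t})\big)^T Q\,(t_1 - t_0)\big(x(\hat{t}) - r(\hat{t})\big) + u(\hat{t})^T R\,(t_1 - t_0)\,u(\hat{t})\Big)\,d\hat{t} \\ &+ \int_1^2 \frac{1}{2}\Big(\big(x(\hat{t}) - r(\hat{t})\big)^T Q\,(t_f - t_1)\big(x(\hat{t}) - r(\hat{t})\big) + u(\hat{t})^T R\,(t_f - t_1)\,u(\hat{t})\Big)\,d\hat{t} \end{aligned} \tag{25} $$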


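Letting $\delta t$ be a small sampling time, by using the Euler method one can discretize (24) and (25) as

$$ x_{k+1} = \begin{cases} \bar{f}_1(x_k) + \bar{g}_1(x_k)\,u_k & \text{if } 0 \le k < \frac{N'}{2} \\ \bar{f}_2(x_k) + \bar{g}_2(x_k)\,u_k & \text{if } \frac{N'}{2} \le k \le N' \end{cases} \tag{26} $$

$$ \begin{aligned} J(t_1, x_0) ={}& (x_{N'} - r_{N'})^T S\,(x_{N'} - r_{N'}) \\ &+ \frac{1}{2}\sum_{k=1}^{N'/2}\Big((x_k - r_k)^T Q\,(t_1 - t_0)\,\delta t\,(x_k - r_k) + u_k^T R\,(t_1 - t_0)\,\delta t\,u_k\Big) \\ &+ \frac{1}{2}\sum_{k=N'/2+1}^{N'-1}\Big((x_k - r_k)^T Q\,(t_f - t_1)\,\delta t\,(x_k - r_k) + u_k^T R\,(t_f - t_1)\,\delta t\,u_k\Big) \end{aligned} \tag{27} $$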

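In (26), $\bar{f}_1(x_k) = x_k + f_1(x_k)(t_1 - t_0)\,\delta t$, $\bar{g}_1(x_k) = g_1(x_k)(t_1 - t_0)\,\delta t$, $\bar{f}_2(x_k) = x_k + f_2(x_k)(t_f - t_1)\,\delta t$, and $\bar{g}_2(x_k) = g_2(x_k)(t_f - t_1)\,\delta t$. Also, in (26), $k \in [1, N']$ is the discrete time index where $N' = \frac{\text{number of switching} + 1}{\delta t}$ [1]. For finding the minimum cost-to-go from discrete time index $k$ to $N'$ (value function) one has¹

$$ V(t_1, x_k) = \min_{u(\cdot)}\Big(\frac{1}{2}(x_k - r_k)^T Q\,(t_1 - t_0)\,\delta t\,(x_k - r_k) + \frac{1}{2}u_k^T R\,(t_1 - t_0)\,\delta t\,u_k + V(t_1, x_{k+1})\Big) \tag{28} $$

The optimal policy can be defined as

$$ u_k^* = \arg\min_{u(\cdot)}\Big(\frac{1}{2}(x_k - r_k)^T Q\,(t_1 - t_0)\,\delta t\,(x_k - r_k) + \frac{1}{2}u_k^T R\,(t_1 - t_0)\,\delta t\,u_k + V(t_1, x_{k+1})\Big) \tag{29} $$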

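Taking the gradient of the optimal value function, one can define the optimal costate as

$$ \lambda_k(t_1, x_k) = \begin{cases} \bar{Q}_1 + \dfrac{\partial x^*_{k+1}}{\partial x_k}\,\lambda_{k+1}(t_1, x^*_{k+1}) & \text{if } 0 \le k < \frac{N'}{2} \\ \bar{Q}_2 + \dfrac{\partial x^*_{k+1}}{\partial x_k}\,\lambda_{k+1}(t_1, x^*_{k+1}) & \text{if } \frac{N'}{2} \le k \le N' \end{cases} \tag{30} $$

where

$$ \bar{Q}_1 = Q\,(t_1 - t_0)\,\delta t\,(x_k - r_k), \qquad \bar{Q}_2 = Q\,(t_f - t_1)\,\delta t\,(x_k - r_k) \tag{31} $$

It is straightforward to see that if the optimal costates are known, one can directly calculate the optimal policy as

$$ u_k^* = \begin{cases} -\big(R\,(t_1 - t_0)\,\delta t\big)^{-1}\,\bar{g}_1^T(x_k)\,\lambda_{k+1}(t_1, x^*_{k+1}) & \text{if } 0 \le k < \frac{N'}{2} \\ -\big(R\,(t_f - t_1)\,\delta t\big)^{-1}\,\bar{g}_2^T(x_k)\,\lambda_{k+1}(t_1, x^*_{k+1}) & \text{if } \frac{N'}{2} \le k \le N' \end{cases} \tag{32} $$

In (30) and (32), the superscript $*$ denotes optimality. In the next subsection, a solution is presented to find the optimal costates.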

B. ADP Solution

The backbone of the solution is training neural networks to approximate $\lambda_{k+1}(t_1, x_{k+1})$ from $(t_1, x_k)$. Based on the Weierstrass Approximation Theorem [25], linear-in-parameter neural networks with polynomial basis functions can uniformly [26] approximate continuous functions to a desired degree of precision in a compact set. Assuming that the value functions are continuously differentiable, one can use linear-in-parameter neural networks to approximate the costates. Hence, the approximate costates can be calculated as

$$ \lambda_{k+1}(t_1, x_{k+1}) = W_k^T\,\phi(t_1, x_k) \tag{33} $$

where $W_k \in \mathbb{R}^{m_\lambda \times n}$ is a tunable weight matrix and $m_\lambda$ is the number of polynomial basis functions (neurons). The weights are tuned through the training process backward in time. Once the costates are known, one finds the approximate optimal policy as

$$ u_k(t_1, x_k) = \begin{cases} -\bar{R}_1^{-1}\,\bar{g}_1^T(x_k)\,\lambda_{k+1}(t_1, x_{k+1}) & \text{if } 0 \le k < \frac{N'}{2} \\ -\bar{R}_2^{-1}\,\bar{g}_2^T(x_k)\,\lambda_{k+1}(t_1, x_{k+1}) & \text{if } \frac{N'}{2} \le k \le N' \end{cases} \tag{34} $$

In (34), $\bar{R}_1 = R\,\delta t\,(t_1 - t_0)$ and $\bar{R}_2 = R\,\delta t\,(t_f - t_1)$. As mentioned before, for training, one can go backward in time, find the costates, and save them for online control.
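The linear-in-parameter approximation in (33) and the policy in (34) can be sketched for a scalar state and a scalar control. Several pieces here are assumptions made only for illustration: the particular polynomial basis, the synthetic target costate $(2 + t_1)x$ (in the method above the targets come from the backward-in-time training), and the values of `g_bar` and `R_bar`.

```python
import numpy as np

def basis(t1, x):
    """A hypothetical polynomial basis phi(t1, x) for a scalar state."""
    return np.array([1.0, x, t1, x * t1, x ** 2, t1 ** 2])

def fit_weights(samples, targets):
    """Least-squares fit of W so that W^T phi(t1, x) matches the target costates."""
    Phi = np.array([basis(t1, x) for t1, x in samples])
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return W  # shape (m_lambda, n)

def approx_costate(W, t1, x):
    """Approximate costate as in (33)."""
    return W.T @ basis(t1, x)

def policy(W, t1, x, g_bar, R_bar):
    """Approximate policy as in (34): u = -R_bar^{-1} g_bar(x)^T lambda."""
    lam = approx_costate(W, t1, x)
    return -np.linalg.solve(R_bar, g_bar(x).T @ lam)

# Synthetic training set (assumption): targets are NOT produced by the recursion (30).
samples = [(t1, x) for t1 in np.linspace(0.5, 2.5, 9) for x in np.linspace(-1.0, 1.0, 11)]
targets = np.array([[(2.0 + t1) * x] for t1, x in samples])  # shape (N, n) with n = 1

W = fit_weights(samples, targets)

def g_bar(x):
    return np.array([[0.01]])   # hypothetical scaled input matrix, shape (n, m)

R_bar = np.array([[0.5]])       # hypothetical scaled control penalty, shape (m, m)
u = policy(W, 1.5, 0.3, g_bar, R_bar)
```

Because the switching time $t_1$ is an input of the basis functions, one trained set of weights covers a range of switching times and initial states instead of a single trajectory.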

Once the training is concluded, one needs to find the optimal switching times from the costates for a selected initial condition $x_0 \in \Omega$, where $\Omega$ is the domain of interest. For this purpose, one can propagate the states along all candidate switching times by using the costates and find the cost-to-go for each candidate. One then chooses the switching times which lead to the minimum cost.
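The search over candidate switching times can be sketched for a scalar two-mode system. This is a simplified illustration under stated assumptions: the rollout follows the spirit of (26) and (27) with a plain forward-Euler sum, the policy is a zero-input placeholder instead of the costate-based policy (34), and the modes, reference, and weights are hypothetical.

```python
import numpy as np

def rollout_cost(t1, x0, modes, policy, reference, S, Q, R, t0, tf, dt_hat):
    """Propagate the scaled dynamics for one candidate t1 and accumulate the cost."""
    n_half = int(round(1.0 / dt_hat))            # steps spent in each mode
    x, cost = x0, 0.0
    for k in range(2 * n_half):
        first = k < n_half
        f, g = modes[0] if first else modes[1]
        t_hat = k * dt_hat
        # Real time from the scaled time, as in (23).
        t = t0 + (t1 - t0) * t_hat if first else t1 + (tf - t1) * (t_hat - 1.0)
        scale = ((t1 - t0) if first else (tf - t1)) * dt_hat
        u = policy(k, t1, x)
        e = x - reference(t)
        cost += 0.5 * (Q * e * e + R * u * u) * scale   # running cost
        x = x + (f(x) + g(x) * u) * scale               # Euler step
    e = x - reference(tf)
    return cost + S * e * e                             # terminal cost

def best_switching_time(candidates, **kwargs):
    """Evaluate every candidate switching time and keep the cheapest one."""
    costs = [rollout_cost(t1, **kwargs) for t1 in candidates]
    i = int(np.argmin(costs))
    return float(candidates[i]), float(costs[i])

# Hypothetical example: mode 1 moves at unit speed, mode 2 is at rest.
modes = [(lambda x: 1.0, lambda x: 1.0), (lambda x: 0.0, lambda x: 1.0)]
zero_policy = lambda k, t1, x: 0.0          # placeholder for the trained policy
reference = lambda t: min(t, 2.0)           # move for 2 sec, then hold position

best_t1, best_cost = best_switching_time(
    np.linspace(0.5, 2.5, 21), x0=0.0, modes=modes, policy=zero_policy,
    reference=reference, S=1.0, Q=1.0, R=1.0, t0=0.0, tf=3.0, dt_hat=0.01)
```

In this toy setting the state matches the reference exactly only when the switch happens at 2 sec, so the search returns that candidate. With a trained network, the placeholder policy would be replaced by the costate-based policy and the same loop would be run from the selected initial condition.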

¹ For $k < N'/2$. Otherwise, $(t_1 - t_0)$ in (28) should be replaced by $(t_f - t_1)$.


TABLE II: Parameters of WL model

Parameter                              Description
$J_e = 0.43$                           engine inertia
$r_1 = 2$                              lift arm dimensions
$r_2 = 2$                              lift arm dimensions
$\theta_1 = \pi/6$                     bend angle of the lift arm
$n_{lc} = 2$                           number of lift cylinders
$A_{lc} = 0.0284$                      lift cylinder cross section area
$y_g = 2.13$                           height between body and lift arm hinge
$\eta_{lift} = 0.5$                    lift system efficiency
$y_{off} = 0.5$                        boom hinge offset
$M_{tot} = 31330$                      mass of WL and load
$\mu_{roll} = 0.03$                    rolling resistance factor
$F_{roll} = 9.81\,\mu_{roll} M_{tot}$  rolling resistance
$L = 3.7$                              WL wheel base
$\rho_f = 832$                         fuel density
$R_w = 0.3175$                         wheel radius
$T_1 = 5$                              constant for $P_{pump}$
$T_2 = -2.5$                           constant for $P_{pump}$
$\eta_{gb} = 0.9$                      gearbox efficiency
$C_{st} = 105$                         constant for steering power
$F_{buc} = 0.0981\,M_{tot}$            bucket and load weight
$F_{arm} = F_{buc}$                    arm weight
$I_{boom} = 1200$                      moment of inertia for the boom

IV. SIMULATION RESULTS

To start the simulations, the values selected for the parameters of the WL model are given in Table II. Some of these parameters are not introduced in this paper due to page constraints. Interested readers are directed to [8] for more details. Also, $W_{T_e} = [-0.154712845456646, 0.555616949283938, 4.84689263976440, 0.450411638112411]^T$ in (11), and the parameters in (16) and (17) are given as $a_1 = 3.07399910699804$, $a_2 = 0.467292644030092$, $a_3 = 0.467292644032403$, $\tau_1 = 5.94705195850280$, $\tau_2 = -0.703974251212264$.

In order to see the performance of the model, some selected control inputs are given to the WL, and the performance of the WL is evaluated. The mode sequence in this example is backwarding, stopping, forwarding, stopping. All the switching time instants and the continuous control inputs to the wheel loader are selected manually to simulate a short loading cycle. The braking input, vehicle velocity, and the path are shown in Figures 2, 3, and 4, respectively.

In order to apply the ADP method to find the optimal switching time and control, one assumes that the reference signal is a known function of time. For this example, the reference signal is selected as $r(t) = [0,0,0,0,\sin(\pi t),\sin(\pi t),0,0,0,0,0]$. The start and end times are selected as $t_0 = 0$ and $t_f = 3$. In this example, the mode sequence is selected as backwarding, stopping. In other words, half of an SLC is considered. Since there is only one switching in the system, by choosing $\delta t = 0.001$, one can calculate $N' = 2000$. The state and control penalizing matrices are selected as S = diag([0,0,0,0,104,104,104,0,0,0,0]), Q = diag([0,0,0,0,104,104,0,0,0,0,0]), and $R = \mathrm{diag}([1,1,1,1])/\delta t$, where diag([a,b]) is a diagonal matrix with the values a and b on the main diagonal and zero elsewhere. In choosing the $S$ matrix for terminal state penalizing, note that the velocity is penalized because it is important to bring the vehicle to rest at the final time.

In order to start the training, one needs to select a good neural network to capture the dynamics of the costates. Mostly, such a selection is performed by trial and error. After selecting a relatively rich set of basis functions, the training process was performed. The history of the neural network weights is shown in Figure 5. As one can see from Figure 5, the history of the weights shows a jump at $\hat{t} = 1$, which is the switching instant in the scaled time.

When the training process concluded, a random initial condition was selected in the domain of training. The initial condition for the reference signal was selected as $r_0 = [0,0,0,0,0.3,-0.1,0,0,0,0,0]^T$ and the optimal switching time was found as $t_1 = 2.86$. Afterward, the optimal costates were used to propagate the states from the previously selected random initial condition. The history of the states is shown in Figure 6. The poor tracking is the result of approximation errors in capturing the dynamics of the costates; resolving this problem is ongoing work in our research group.

Fig. 2: Illustration of the normalized braking input $u_b$ versus time (sec). The maximum value for normalization was selected as $U_4 = 5\times10^5$.

Fig. 3: Illustration of the normalized velocity of the WL in the presented SLC versus time (sec).

Fig. 4: Illustration of the path of the wheel loader with the selected inputs. The start and end points are highlighted in the figure. The wheel loader follows a V-shaped path which simulates an SLC.

Fig. 5: The history of the neural network weights used to approximate the costates.

Fig. 6: State history in the closed loop system (states $x_5$ and $x_6$ with references $r_5$ and $r_6$). The optimal switching time was found as $t_1 = 2.86$.

V. CONCLUSION

An approximate dynamic programming solution is used for optimal control of a wheel loader. For this purpose, the dynamics of the wheel loader are modeled as a switched system with controlled subsystems and a fixed mode sequence. Some simulation results are provided to show the effectiveness of the solution.

REFERENCES

[1] A. Heydari and S. N. Balakrishnan, “Optimal switching and control of nonlinear switching systems using approximate dynamic programming,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 6, pp. 1106–1117, June 2014.

[2] T. Sardarmehni and A. Heydari, “Suboptimal scheduling in switched systems with continuous-time dynamics: A least squares approach,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2167–2178, June 2018.

[3] T. Sardarmehni and A. Heydari, “Sub-optimal scheduling in switched systems with continuous-time dynamics: A gradient descent approach,” Neurocomputing, vol. 285, pp. 10–22, 2018.

[4] C. Zhang, M. Gan, and J. Zhao, “Data-driven optimal control of switched linear autonomous systems,” International Journal of Systems Science, vol. 0, no. 0, pp. 1–15, 2019.

[5] A. G. Khiabani and A. Heydari, “Optimal switching of voltage source inverters using approximate dynamic programming,” in ASME 2018 Dynamic Systems and Control Conference (DSCC 2018), 2018, pp. V001T01A006–V001T01A015.

[6] T. Sardarmehni and A. Heydari, “Optimal switching in anti-lock brake systems of ground vehicles based on approximate dynamic programming,” in ASME 2015 Dynamic Systems and Control Conference (DSCC 2015), 2015, pp. V003T50A010–V003T50A020.

[7] J. Wahlstrom and L. Eriksson, “Modelling diesel engines with a variable-geometry turbocharger and exhaust gas recirculation by optimization of model parameters for capturing non-linear system dynamics,” Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, vol. 225, no. 7, pp. 960–986, 2011.

[8] V. Nezhadali, B. Frank, and L. Eriksson, “Wheel loader operation—optimal control compared to real drive experience,” Control Engineering Practice, vol. 48, pp. 1–9, 2016.

[9] S. Sarata, H. Osumi, Y. Kawai, and F. Tomita, “Trajectory arrangement based on resistance force and shape of pile at scooping motion,” in IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA ’04. 2004, vol. 4, April 2004, pp. 3488–3493.

[10] S. Sarata, Y. Weeramhaeng, and T. Tsubouchi, “Approach path generation to scooping position for wheel loader,” in Proceedings of the 2005 IEEE International Conference on Robotics and Automation, April 2005, pp. 1809–1814.

[11] S. Shigeru, O. Hisashi, H. Yusuke, and M. Gen, “Trajectory arrangement of bucket motion of wheel loader,” in Proc. 20th Int. Symp. Autom. Robot. Constr. ISARC 2003, 2017, pp. 135–140.

[12] B. Alshaer, T. Darabseh, and M. Alhanouti, “Path planning, modeling and simulation of an autonomous articulated heavy construction machine performing a loading cycle,” Applied Mathematical Modelling, vol. 37, no. 7, pp. 5315–5325, 2013.

[13] B. Hong and X. Ma, “Path planning for wheel loaders: A discrete optimization approach,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Oct 2017, pp. 1–6.

[14] T. Takei, T. Hoshi, S. Sarata, and T. Tsubouchi, “Simultaneous determination of an optimal unloading point and paths between scooping points and the unloading point for a wheel loader,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2015, pp. 5923–5929.

[15] V. Nezhadali, L. Eriksson, and A. Froberg, “Modeling and optimal control of a wheel loader in the lift-transport section of the short loading cycle,” IFAC Proceedings Volumes, vol. 46, no. 21, pp. 195–200, 2013, 7th IFAC Symposium on Advances in Automotive Control.

[16] K. Oh, H. Kim, K. Ko, P. Kim, and K. Yi, “Integrated wheel loader simulation model for improving performance and energy flow,” Automation in Construction, vol. 58, pp. 129–143, 2015.

[17] R. Ghabcheloo, M. Hyvonen, J. Uusisalo, O. Karhu, J. Jara, and K. Huhtala, “Autonomous motion control of a wheel loader,” in ASME 2009 Dynamic Systems and Control Conference, vol. 2, Oct. 2009, pp. 427–434.

[18] B. Frank, J. Kleinert, and R. Filla, “Optimal control of wheel loader actuators in gravel applications,” Automation in Construction, vol. 91, pp. 1–14, 2018.

[19] F. Wang, M. Mohd Zulkefli, Z. Sun, and K. Stelson, “Energy management strategy for a power-split hydraulic hybrid wheel loader,” Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, vol. 230, no. 8, pp. 1105–1120, 7 2016.

[20] D. Kirk, Optimal Control Theory: An Introduction. Dover Publications, 2004.

[21] T. Nilsson, A. Froberg, and J. Aslund, “Optimal operation of a turbocharged diesel engine during transients,” Apr. 2012. [Online]. Available: https://doi.org/10.4271/2012-01-0711

[22] G. Rizzoni, L. Guzzella, and B. M. Baumann, “Unified modeling of hybrid electric vehicle drivetrains,” IEEE/ASME Transactions on Mechatronics, vol. 4, no. 3, pp. 246–257, Sep. 1999.

[23] A. Heydari and S. N. Balakrishnan, “Path planning using a novel finite horizon suboptimal controller,” Journal of Guidance, Control, and Dynamics, vol. 36, no. 4, pp. 1210–1214, 2013.

[24] X. Xu and P. J. Antsaklis, “Optimal control of switched systems based on parameterization of the switching instants,” IEEE Transactions on Automatic Control, vol. 49, no. 1, pp. 2–16, 2004.

[25] W. Rudin, Principles of Mathematical Analysis, 3rd ed. McGraw-Hill, 1976.

[26] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.