bayesian optimisation with prior reuse for motion planning

Bayesian Optimisation with Prior Reuse for Motion Planning inRobot Soccer

Abhinav [email protected] Institute of Technology

Kharagpur

Arnav Kumar [email protected]

Indian Institute of TechnologyKharagpur

KV [email protected]


Arpit Tarang [email protected]


Jayanta [email protected]


ABSTRACTWe integrate learning and motion planning for soccer playing dif-ferential drive robots using Bayesian optimisation. Trajectoriesgenerated using end-slope cubic Bezier splines are �rst optimisedglobally through Bayesian optimisation for a set of candidate pointswith obstacles. �e optimised trajectories along with robot and ob-stacle positions and velocities are stored in a database. �e closestplanning situation is identi�ed from the database using k-NearestNeighbour approach. It is further optimised online through reuseof prior information from previously optimised trajectory. Ourapproach reduces computation time of trajectory optimisation con-siderably. Velocity pro�ling generates velocities consistent withrobot kinodynamoic constraints, and avoids collision and slipping.Extensive testing is done on developed simulator, as well as onphysical di�erential drive robots. Our method shows marked im-provements in mitigating tracking error, and reducing traversaland computational time over competing techniques under the con-straints of performing tasks in real time.ACM Reference format:Abhinav Agarwalla, Arnav Kumar Jain, KV Manohar, Arpit Tarang Saxena,and Jayanta Mukhopadhyay. 2018. Bayesian Optimisation with Prior Reusefor Motion Planning in Robot Soccer. In Proceedings of Conference on DataScience, Goa, India, Jan 2018 (CoDS-COMAD), 10 pages.DOI: 10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION�e environment of a robot soccer game is highly dynamic andadversarial with fast moving di�erent drive robots, and a compet-ing opponent team. High-level complex manoeuvres like a�ackingand passing are performed through coordination of multiple robotsmaking e�cient planning even more challenging. Consider a casewhere a robot has to intercept a moving ball and shoot it to goal.�e robot needs to reach the set interception point at the same timeinstant as the ball. High deviations from the generated trajectorywould lead to missing the ball which is highly undesirable. Also,

Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor pro�t or commercial advantage and that copies bear this notice and the full citationon the �rst page. Copyrights for third-party components of this work must be honored.For all other uses, contact the owner/author(s).CoDS-COMAD, Goa, India© 2018 Copyright held by the owner/author(s). 978-x-xxxx-xxxx-x/YY/MM. . .$15.00DOI: 10.1145/nnnnnnn.nnnnnnn

Figure 1: Planning Overview: Initial trajectory generated isoptimised using Bayesian optimisation for a combination ofvarious starting and end points, and saved to a database. Inthe online optimisation, the closest prior is queried usingk-Nearest Neighbour algorithm and trajectory is further op-timised. Using the optimised trajectory and vision input,tracker calculates and sends robot velocities.

there is an opponent team whose bots are racing to be the �rstto reach the ball. �is motivates the need for trajectory optimi-sation to intercept the ball in least possible time. Both trajectorytraversal time and trajectory optimisation time contribute to inter-ception time, and must be reduced. In this paper, we address thisissue and determine time-e�cient computation of trajectories fordi�erentiable drive robots under real time constraints.

Previous works in robotics have addressed the problem of tra-jectory generation and tracking in a dynamic environment. �eseapproaches were built on joining points by lines to avoid obsta-cles [3, 25], and later extended by interpolating the points usingcurves to make the path di�erentiable and smooth at the way-points [21, 32]. However, the path is not globally optimal withrespect to time, and also may not be free of collision. Recent workslike Dynamic Window [35] and Potential Field [14] are more fo-cused on locally optimal trajectories to avoid collision with obsta-cles. Particle Swarm Optimization(PSO) [33] have been used tooptimise trajectories, but take a lot of time to converge. �is makesthem unsuitable for real time optimisation such as our use case. We

arX

iv:1

611.

0185

1v3

[cs

.RO

] 1

8 O

ct 2

017

CoDS-COMAD, Jan 2018, Goa, India A. Agarwalla et al.

adopt Bayesian optimisation for trajectory optimisation since it isa global optimisation technique requiring very few evaluations ofobjective function. Fig. 1 gives an overview of our approach.

Another interesting property of Bayesian optimisation is the useof a surrogate function. Surrogate function forms the prior, andcan be used to guide optimisation to regions containing minimapoints. We reuse this prior as obtained by Bayesian optimisationof trajectories, and store it in an o�ine reference database. �isreduces computation time since many similar situations arise in ro-bot soccer. We use starting and ending positions of the robots, theirvelocities, and obstacle positions as features for training k-NearestNeighbour [7] model. �e k-Nearest Neighbour technique identi-�es the closest prior from the reference database, which is reusedto reduce optimisation time. We test our approach1 on simulatordeveloped by us. We also experimented with robots designed anddeveloped by Kharagpur RoboSoccer Students’ Group2 (KRSSG),and report signi�cant improvements in the computation of trajecto-ries as compared to the state of the art techniques. �e kinematicsof the manufactured robot is depicted in Fig. 2.

Our main contributions summarized:• We introduce the use of Bayesian optimisation for deter-

mining control points leading to time-e�cient trajectoriesthat avoid obstacles, and kinematic constraints of the dif-ferential drive robots.

• We reuse prior information of surrogate model in Bayesianoptimisation from previously optimised trajectories to re-duce trajectory optimisation time.

• We identify most suitable prior to be reused for a particularsituation using k-Nearest Neighbours algorithm [7].

• We extensively test our approach on a simulator as well ason physical soccer playing robots.

2 RELATEDWORKTraditional approaches such as PolarBased [9], MergeSCurve [27]and Dynamic Window [5] [35] have several pitfalls, most impor-tant to us being: (1) sub-optimal traversal due to being reactive innature, (2) no guarantees and estimates of travel time and distanceleading to erroneous high level behaviours, (3) no incorporationof kinodynamic constraints of the bot (except Dynamic Window)and (4) frequent slipping of wheels at high speed. Trajectory basedmethods generally use Bezier curves [13] [31], B-splines [36] or�intic Bezier Splines [17] for trajectory generation along withbuilding velocity pro�les that incorporate kinodynamic constraints.Building on work by Shiller and Gwo [36], and Lau et al. [17], weuse cubic Bezier splines for trajectory generation. We optimise itglobally using Bayesian Optimisation [22].

Potential Field [14] and Visibility Graphs [16] are widely usedalgorithms for obstacle avoidance. While the former minimizesa potential function with obstacles as repulsive and �nal goal asa�ractive poles, the optimisation procedure does not guarantee aglobally optimal path. �e la�er generates paths that are very closeto the obstacles frequently leading to collisions. Particle SwarmOptimisation [33, 44] optimises Ferguson cubic spline trajectories,but are slow in real time because of large search space. On the other

1h�ps://github.com/abhinavagarwalla/motion-simulation2h�p://www.krssg.in

Figure 2: Robot Kinematic Model. (x ,y,θ ,vl ,vw ) constituterobot state, which are governed by standard model for dif-ferential robots with x ′ = vlcos(θ ),y′ = vl sin(θ ),θ ′ = vω .

hand, our approach generates obstacle free optimised trajectoriesin real time.

Bayesian optimisation has been employed for planning, sensingand exploration tasks in robotics. In [24], [23] and [38], authors pro-pose an active learning algorithm for online robot path planning andreducing uncertainty in its state and environment under a time con-straint. �ey essentially model the trade-o� between exploration(uncertainty in environment) and exploitation (optimising plannedtrajectories) using a Gaussian process. In [20] and [6], authorsaddress the problem of gait optimisation for speed and smoothnesson a quadruped robot. In a more machine learning context, [1]proposes a generic method for e�cient tuning of new problemsusing previously optimised tasks. �ey use a novel Bayesian optimi-sation technique through combining ranking and optimisation. [12]analyses the e�ect of hyper-parameters on performance througha functional ANOVA framework imposed on RandomForest [19]predictions. Similar technique is discussed in [4], where authorsmap features and optimal parameters using an Arti�cial NeuralNetwork, and then optimise using Covariance Matrix AdaptationEvolution Strategy [11]. We extend these approaches in a kinody-namic planning framework using splines through reuse of priorinformation from Bayesian optimisation.

Rosman et al. [30] formalise Bayesian Policy Reuse (BPR) assolving a task within a limited number of trials by a decision makingentity equipped with a library of policies. BPR introduces theproblem where multiple policies are learnt for solving a single taskin a reinforcement learning context. In contrast, we only store asingle policy for di�erent tasks of planning in a robot soccer domaintackling planning. We restrict the more general concept of BPR topolicy reuse of Bayesian optimisation to suit our use case.

�is paper is organised as follows: Section 3 details trajectoryoptimisation process using Bayesian optimisation and database for-mation for prior reuse. Section 4 introduces trajectory generationusing splines and velocity pro�ling approach for kinematic con-straints. Section 5 describes the experimental setup of the simulatorand kinematics of the manufactured robots. Section 6 comparesresults of our approach with various other approaches, is followedby concluding Section 7.

Bayesian Optimisation with Prior Reuse for Motion Planning in Robot Soccer CoDS-COMAD, Jan 2018, Goa, India

3 TRAJECTORY OPTIMISATIONOptimised trajectories improve winning chances by increasing ballpossession and decreasing trajectory traversal time. However, opti-mising trajectories in a soccer match severely limits the reactiontime of the robots due to computational overheads. We split tra-jectory optimisation into o�ine and online computation. First,trajectories are optimised o�ine and stored in a database. Foronline optimisation, the database is queried for similar situationsto optimise it further. �e proposed method is able to optimisetrajectories under strict real time constraints.

We employ Bayesian optimisation technique for both o�ine andonline optimisation of positions of control point. Se�ing controlpoints at di�erent locations leads to di�erent trajectories, as de-scribed in Section 4. Traversal time for the generated trajectories istaken as the objective to be minimised. Apart from time, avoidingwheel slipping and high curvature paths is also important. �eseare handled implicitly in the trajectory generation process as de-scribed in Section 4. Formally, the objective F is set to minimisetraversal time depending on the position of control points of thegenerated trajectory. Control points are positions through whicha trajectory must pass, and can be interpreted as parameters fortrajectory generation. �ese are denoted using CPi,x and CPi,ywhere i , x and y refer to ith control point and coordinate axis. jdenotes the total number of control points. �e set of control pointsis denoted by P , where P = {CP1,x ,CP1,y . . .CPj,x ,CPj,y }. P formsthe parameter set for optimisation.

F (P) = F ({CP1,x ,CP1,y . . .CPj,x ,CPj,y }) =Traversal time of generated trajectory (1)

3.1 Bayesian OptimisationSince F is a computationally expensive non-linear function forwhich closed form computation or gradient calculation is not pos-sible, Bayesian optimisation [22] is employed to minimise F .

P∗ = arg minP

F (P) (2)

Bayesian optimisation is a surrogate based global optimisationmethod that tries to optimise with very few function evaluations.As opposed to other optimisation methods, it remembers the his-tory of evaluations i.e. acquired samples on which objective hasalready been evaluated. Using this history and an acquisition func-tion, it decides the next sampling point P to evaluate F on. It cane�ciently handle exploration-exploitation dilemma by not onlymodelling uncertainty in surrogate model, but also predicting valueof objective F at the next sampling point. �e next sampling pointis selected such that it minimises the uncertainty in the functionalspace of the objective.

Bayesian optimisation e�ectively shi�s costly evaluations ofobjective function to cheap evaluations of the surrogate model G.Gaussian Process [28] is utilised as the surrogate model. �e advan-tage of using Gaussian Process as surrogate is that a closed formsolution for acquisition function is obtainable since Gaussian distri-bution is a self-conjugate distribution. Gaussian process models thefunction f : P → F (P), through a mean function and a covariancefunction k(A,B), where A and B denote two parameter sets. We �xmean as a constantm, and use Automatic Relevance Determination

(ARD) Matern 5/2 kernel [37] as the covariance function. ARDMatern 5/2 kernel, denoted by (3) avoids overly smooth functions,as generated by primitive kernels. σs denotes covariance amplitude,and is generally kept as 1. In (4), Ai , Bi and li denote ith parameterfor set A and B, and characteristic length-scale factor respectively.n denotes the number of parameters, and d captures the distancebetween two parameter sets.

k(A,B) = σ 2s (1 +

√5d + 5

3d) exp(−√

5d) (3)

d =n∑i=1

(Ai − Bi )2

l2i(4)

Given a history of evaluated points, the next evaluation point isselected as the one which maximises the acquisition function. Weuse Expected Improvement [37] as acquisition functionC , given by

CEI (P) = σ (P)(uΦ(P) + ϕ(u)) (5)

u =F (Pbest ) − µ(P)

σ (P) (6)

where µ and σ denote the mean and variance of objective functionas estimated by Gaussian Process model at P . µ and σ map controlpoint set P to a real number signifying the estimate of the valueand variance of the objective function respectively. Φ and ϕ denotethe cumulative standard normal function and standard normalfunction in one dimension respectively. Bayesian optimisation hasits own set of hyperparameters, namely length-scaling factors li ,covariance amplitude σs , observation noise, and kernel parametersu and σ . �ese are optimised by maximising the marginal likelihoodof Gaussian process, as discussed in [2].

Table 1: Bayesian Optimisation parameters

Parameter ValueNumber of dimensions: n | P |

Range [-1000, 1000]Objective: F (P) Equation (1)

Acquisition: C(P) Expected ImprovementKernel: K(A,B) ARD Matern 5/2 kernel

Prior: P(F ) GP(0,K(θ = {li . . . , ln,σs }) + σ 2n I )

An initial set X of points are sampled using Latin hypercubesampling [40] which generates random numbers from a strati�edinput distribution. �e surrogate modelG is incrementally updatedto encode the observation history, from which the point maximis-ing the acquisition function is selected as next evaluation point.�e process is repeated until a speci�ed number of evaluations orlimited computation time. We experiment with various kernel func-tions, namely Matern 5/2 kernel, Matern 3/2 kernel and SquaredExponential kernel [37]. Various acquisition functions such asExpected Improvement [34], A-optimality criteria and Lower Con-�dence Bound [8]. �e chosen design parameters for Bayesianoptimisation is summarised in Table 1.

During a robot soccer game, the generated trajectories should besafe i.e. it should have a good distance with obstacles to avoid colli-sions. In our case, obstacle avoidance is simply achieved throughadding a penalty to the objective function if the current trajectory


Algorithm 1: Trajectory optimisation and Database formationusing Bayesian optimisationData: Start position SP(x ,y), Ending position EP(x ,y),

Starting velocity SV (x ,y), Ending velocity EV (x ,y),Obstacle position OP(x ,y), Number of control points j

Result: D is the database1 x = (SP ,EP , SV ,EV ,OP);2 P = {CP1,x ,CP1,y . . .CPj,x ,CPj,y };3 K(P1, P2): Kernel Function;4 C(P): Acquisition Function;5 N : Number of iterations;6 {X ,y} = ϕ;7 while iter < N do8 θ = Bayesian optimisation hyperparameters;9 Posterior Model Update:

Pr (F |D) ∝∫Pr (D |F ,θ )Pr (F )Pr (θ )dθ ;

10 Select Pi = argmaxP

C(P |Pr (F |D));

11 {X ,y} = {X ,y} ∪ {Pi , F (Pi )};12 end13 Best P∗ = argmax

Py;

14 B∗ = getTrajectory(P∗, SP, EP, SV, EV, OP) ; // refer Section 415 Update database D with input x , P∗ and GP Model as prior

Algorithm 2: Run-time Bayesian optimisationData: Start position SP(x ,y), Ending position EP(x ,y),

Starting velocity SV (x ,y), Ending velocity EV (x ,y),Obstacle position OP(x ,y), Database D

Result: B∗ is the optimal trajectory1 x = (SP ,EP , SV ,EV ,OP);2 for i=1 to m do3 Compute distance d(Di ,x);4 end5 Compute set I containing indices for the k smallest distances;6 �ery D to get k priors;7 Average k priors;8 Run Algorithm 1, and get optimised trajectory B∗;9 Return B∗;

intersects an opponent robot. �is results in the optimisation steprejecting the trajectories that collide with the obstacles. �e �eldboundaries are also taken as line obstacles, resulting in rejection oftrajectories with any point outside the soccer �eld.

3.2 Prior Database CreationIn a robot soccer match, trajectories must be generated within atime window of a few hundred milliseconds. Despite being samplee�cient, Bayesian optimisation is impractical for real time optimi-sation. Since we use Gaussian process as the surrogate model, bothprior and posterior are Gaussian distributions. To reduce trajectorygeneration time, we reuse prior from o�ine optimised trajecto-ries for online optimisation. Assuming that we have enough data

points of optimised trajectories, prior is rich with areas of samplepoints that lead to minimisation of objective function. �is resultsin a surrogate model which has lower uncertainty about the ob-jective function surface. It e�ciently trades o� exploration andexploitation, leading to reduced function evaluations and trajectorygeneration time.

We form a database by optimising trajectories through Bayesianoptimisation for various combinations of starting and ending posi-tions of our robots and obstacles, summarised in Algorithm 1. Wealso vary planning scenario with the starting and ending velocity,number of obstacles and number of control points which greatlyin�uence the generated trajectory. Trajectories generated for di�er-ent planning scenarios are optimised o�ine using a simulator. Foreach trajectory, the database stores the optimal placement of controlpoints as well as the prior parameters. Prior parameters are mainlyconstituted by surrogate model along with its mean function andkernel function. Other parameters such as variance in observationnoise, and acquisition function are also utilised. Planning scenariocomprising of starting and ending positions, velocities, number ofobstacles, number of control points and obstacle positions are alsostored. In this study, we simulate a total of 20, 000 random planningscenarios, and save them to the database.

3.3 Prior ReuseWith numerous starting and ending positions of �ve enemy obsta-cles and our bots, the space of possible con�gurations is really huge.�is makes storing all situations computationally prohibitive facil-itating the need to reuse previously optimised trajectories. Also,o�ine optimisation is done on a simulator and not on physicalrobots. A lot of factors such as lighting conditions, wheel mount,playing surface, etc. can alter playing conditions. �us, onlineoptimisation is necessary to adapt from simulator environment tocurrent playing conditions.

A�er o�ine optimisation, we have a database which can bequeried for identifying similar planning scenarios. A planningscenario is characterised through starting and ending positions,velocities, number of obstacles and obstacle positions. �ese char-acteristics are used as features for training a k-Nearest Neighbourmodel [7]. We use k-NN since it is a non-parametric technique thatdoesn’t require learning of model parameters. Also, the model isnot very sensitive to hyperparameters which eliminates the needto look for optimum hyperparameters.

�e k-Nearest Neighbour (k-NN) technique is based on con-structing a space partitioning tree on the training dataset. �econstructed tree can be searched at testing time to identify datapoints similar to test query. It uses a distance function to identifythe similarity between the test query point and the training dataset.Many distance functions can be utilised for measuring the similaritybetween two n-dimensional instances A,B. We utilise L1-distanced , as denoted by (7), as the distance function in this study. Ai andBi denotes the component of A and B along ith dimension.

d(A,B) =n∑i=1| Ai − Bi | (7)

Our method is general in the sense that other sophisticated machinelearning models that employ a notion of similarity can also be


employed. �e model identi�es the most similar planning scenarioand its corresponding prior using k-NN. �e prior comprising ofthe mean function and kernel function of surrogate model is usedto initialise Bayesian optimisation. It then optimises trajectory invery few steps until either convergence is achieved or computationtime runs out. Algorithm 2 summarises prior reuse for real timeBayesian optimisation.

4 TRAJECTORY GENERATIONIn this section, we explain how a trajectory is generated and tra-versal time is computed. Given the starting point, ending pointand control points, trajectory are parametrized by splines. Veloc-ity pro�ling algorithm ensures the kinematic constraints of thebot, and pro�les velocity vector at each point on the trajectory.Trajectory traversal time is calculated by accumulating time andvelocity a�ained between two successive points. A tracking algo-rithm ensures that the robot does not deviate from the generatedpath.

Cubic Bezier Splines [10] are chosen to represent trajectoriesbecause of their desirable smoothness properties. Two dimensionalpaths are obtained by parametrization of path along x and y direc-tions through the parameter u along the curve. Initial trajectory isgenerated by stitching together n segments of Cubic Bezier Splinesalongn − 1 control points. �e equation for the Cubic Bezier Splinesis given by (8), which follows the continuity and di�erentiabilityconstraints at the knots in (9), (10) and the boundary conditionsin (11). Initial robot velocity which is captured from the overheadcamera is incorporated in (12) resulting in formation of end slopecubic splines. Bi denotes ith segment of the spline with Pi0, Pi1,Pi2 and Pi3 as coe�cients.

Bi (u) =

(1 − u)3

3(1 − u)2u3(1 − u)u2

u3

T

Pi0Pi1Pi2Pi3

(8)

Bi (1) = Bi+1(0) (9)B′i (1) = B

′i+1(0),B

′′i (1) = B

′′i+1(0) (10)

Bi (0) = Pi0,Bi (1) = Pi3 (11)B′i (0) = vl (0),B

′i (1) = vl (1) (12)

�intic Bezier Splines [17] which have the additional propertyof curvature continuity at end points, are compared with theircubic counterparts in terms of trajectory traversal time, trackingerror and average velocity of the robot. Our method is inherentlybidirectional by taking account of the direction of velocity at thetime of trajectory generation.

4.1 Arc Length Re-parametrisationArc length re-parametrisation is required to compute the parameteru in (7), from the arc length S , which is equivalent to �nding pointon trajectory at which the arc length equals S . Re-parametrisationis implemented through Bezier splines [43] and inverting Beziercurves [42], of which former is �nally chosen because it providesbe�er performance. Equidistant points are sampled from u ∈ [0, 1],and arc length on each of these points is computed. �en, a 1DCubic Bezier Spline is ��ed on these points, approximating u as

a function of S . Now, ge�ing the corresponding u for a given arclength, is a simple evaluation of the cubic spline atu using equation(8).

4.2 Velocity Pro�lingVelocity pro�le must be generated incorporating robot dynamicsto minimise the error between observed and expected positionat any moment of time. While keeping the errors to minimum,velocity at each point is maximised while respecting the kino-dynamic constraints. On the trajectory B(u), we �x a set of Ncequidistant planning points pi along the arc length using arc lengthre-parametrisation. Planning points are just sampled from the gen-erated trajectory. We have assumed that the acceleration betweentwo planning points pi and pi+1 is constant, which holds true formost trajectories in practice due to selection of su�ciently manypoints. Each translational velocity vi corresponds to a planningpoint pi . Initial velocity v0 is set to initial velocity of robot, and�nal velocity vNc is set according to game-play. Forward consis-tency is established by accelerating with the maximum constrainedacceleration from point pi to reach pi+1 with maximum velocity.�e process is repeated in the reverse direction. �is ensures thatthe robot traverses between any two points obeying the constraints.�e total time needed to complete the trajectory is also obtained,once the velocity vi at each planning point pi is determined.

Assuming that the acceleration is constant between the twoplanning points pi and pi+1, we compute the time taken to reachany of the equidistant planning points using (13), where vi andvi+1 are determined from velocity pro�ling. Time taken to reachplanning points pi and pi+1 is denoted by ti and ti+1 respectively.∆s denotes the distance between pi and pi+1. �e time at the endpoint Nc is given by (14).

ti+1 = ti +2∆s

vi +vi+1(13)

totalTime = tNc (14)

4.3 Constraints�e kinodynamic constraints limit the maximum velocity vi thatcan be set at each planning point pi . �e maximum translationalvelocity vmax |ω for angular velocity ω at a point pi is given by(15), where ωmax and κi denote maximum rotational velocity andcurvature at that point, respectively.

vi ∈ [0,vmax |ω ],vmax |ω =ωmax|κi |

(15)

�e acceleration constraints at any planning point pi depends onthe velocities of the adjacent points pi−1 and pi+1. �e accelerationconstraints are symmetric for the motion planning to be smoothand to avoid slipping of the robot. At any planning point pi , vi ∈[vmin |at ,vmax |at ] for acceleration at can be a�ained using (16),and (17) where atmax ∆t denotes maximum translation acceleration.

vmin |at = vi−1 − atmax ∆t (16)

vmax |at = vi−1 + atmax ∆t (17)�e boundary conditions can now be obtained for translationalvelocities obeying atmax , where distance between any two planning


Figure 3: System Setup and Overview: Raw camera feed isused by vision module to identify robot positions and orien-tations. Strategy module initiates trajectory generation pro-cess, a�er which velocity commands are sent to the robots.

points pi , pi+1 is ∆s . �e reader can refer to [39] for details.

vmin |at =

{√v2i−1 − 2atmax ∆s v2

i−1 ≥ 2atmax ∆s

0 else(18)

vmax |at =√v2i−1 + 2atmax ∆s (19)

4.4 Tracking�e tracking module ensures that the robot follows the trajectoryby minimising the error between the current robot state and theexpected state at that moment of time. �e current tracker is verysimilar to [15]. Characteristic frequency ωn (t) of the system isgiven by (20) where vr , ωr are tangential and angular velocity,respectively. Controller gains k1, k2, k3 are given by (21) where ζrepresents damping coe�cient, and д is simply a parameter. Errorsin position (x ,y), orientation θ , and feed forward tangential andangular velocity are denoted by e1, e2, e3, ur1 and ur2 respectively.�e �nal tangential velocity v and angular velocity w sent to thebot are given by (22) and (23) respectively.

ωn (t) =√ωr (t)2 + дvr (t)2 (20)

k1 = k3 = 2ζωn (t),k2 = д |vr (t)| (21)v = ur1cos(e3) + k1e1 (22)

w = ur2 + sдn(ur1)k2e2 + k3e3 (23)

5 EXPERIMENTAL SETUP5.1 Robot DynamicsFederation of International Robot-Soccer Association (FIRA)3 or-ganises Mirosot league every year, in which a team of �ve mobilerobots compete against an opponent team to win a soccer match.A Mirosot robot is a di�erential drive robot with two wheels onthe sides and supportive caster wheels on the front and back. �erobot has zero turn radius, and is ��ed a cube of side length 7.5cm.�e system setup is described in Fig. 3. Raw values obtained fromcamera are passed through Kalman [29] �lter at 60fps, thereby up-dating ball and robot state. An NRF module then transmits thevelocities to the micro-controller ��ed on each robot. �ere is an3h�ps://www.�ra.net/

Table 2: A comparison of Bayesian Optimisation with �veother algorithms in terms of average time(in sec), trackingerror(calculated using (25)) and average velocity(in cm/s) byrunning them on our robots. �e best scores are bold-faced.

Planner Time(s) te Avg. Velocity(cm/s)

S-curve [27] 2.675 6.634 83.338Polar-Bidirectional [9] 4.190 4.836 42.536Dynamic Window [35] 2.800 7.067 85.049

PSO [33] 2.331 4.279 94.614�intic Spline [17] 2.472 4.631 90.470Proposed approach 2.231 4.024 96.903

internal Fuzzy-PID [41] loop running on the motors, which decidesthe velocity of motors.

5.2 Physical ConstraintsTaking into account wheel dimensions, Fuzzy-PID control loop, cir-cuit design and motors speci�cations, the maximum loaded velocitya�ainable by the robot is set to 200 cm/s. For calculating the radialacceleration, the robot is run on a circular path with a �xed radiusas in [18]. �e value of radial acceleration arad is increased in stepstill the slipping point. Using linear regression, radial accelerationis obtained as a function of radius of curvature r , given by (24).

arad = −5.92r + 700 (24)

5.3 SimulatorA simulator is also developed for testing the locomotion algorithms.It is specially designed to enable simulation with kinematic andphysical model of our robots. �e dimensions of the simulatedrobots, and actuator limits are set as the same as that of actualrobots. It discretises the transmission into steps of 16ms, accountsfor packet delay and adds noise as encountered while workingon real robots. �e mobile robot does not receive the sent veloci-ties instantaneously. �ere is a time lag between transmi�ing andreceiving of packets, which we de�ne as packet delay, and is cal-culated to be four packets i.e. 64ms. Due to the special care takenin designing the simulator, only minimal retuning of parameters isrequired and the same code works on both the simulator and thereal robots. �is has also proved useful in pre-tuning parametersin simulator, and then tuning them on real robots.

6 RESULTS AND DISCUSSIONWe analyse the performance of our approach on multiple experi-ments, and compare its performance with other competitive motionplanners. In the �rst set of experiments, we utilise Cubic BezierSplines [10] for trajectory generation, and Bayesian optimisationwith prior reuse for trajectory optimisation. Fig. 4 shows the gen-erated trajectory for various planners for di�erent starting andending points. �e �eld setup is similar to a robot soccer matchwith over-crowding of obstacles. S-Curve [26] algorithm generatessmooth path but the trajectory has very low radius of curvature atsome points. Due to kinematic constraints of the robot, the robotsslow down at these points leading to increase in total traversal


(a) S-curve (b) Dynamic Window (c) PSO (d) Our method

Figure 4: Trajectory generated by a) S-curve, b) DynamicWindow, c) PSO and d) OurMethod, for a set of start and end positionson our developed simulator. �eblue circle and red circle represent the start and end positions respectively. Black line from thecentre of the circles displays the orientation of the robot at those points, and obstacles are shown as blue squares. Trajectoryis denoted by a blue curve between start and end points. In case of Our Method, the position of generated control points areshown by black circles.

time. �e path generated by Dynamic Window [35] has got verysharp turns at the end points which cause slipping. Since the gen-erated trajectory is very close to the obstacles, the robot collideswith the obstacles while following the path. �e path generated byPSO [33] looks promising but it is computationally expensive withfew high curvature points. �ere are some cases in Fig. 4 wherethe trajectory is very close to the obstacles. In our experiments, the

population of PSO contains 15 particles, and takes around 100 iter-ations to converge. Trajectory generated by splines a�er BayesianOptimisation has smooth turns, and maintains good distance withobstacles at all points. �us, our method leads to trajectories thatonly avoid obstacles, but also reduce trajectory traversal time.

We plan a �xed set of manoeuvres for the robot comprising of aset of 10 starting and ending positions. �e task of the robot is to


Figure 5: Obtained control points as overlayed on a kerneldensity estimation plot for 300 randomly initialised startingand ending points. Obstacles are depicted using red squares,and control points using black dots. In density plot, darkerregions depict denser regions.

reach from starting point to end point in minimum time possiblewhile avoiding slipping. �e robot must also result in low trackingerrors for high-level a�acking and ball interception abilities. Track-ing error te is taken as the log of mean squared error given by (25),where xi,p , yi,p , xi,a , yi,a denote the planned and actual positionsof the robot for x and y coordinate respectively. n denotes the totallength of the trajectory. �e tracking error te might become largeeither due to slipping of wheels, or due to collision with anotherrobot.

te = log

∑10

√(xi,p − xi,a )2 + (yi,p − yi,a )2

n(25)

Table 2 summarises the performances on the above task. Weevaluate planners on average time taken to reach the destination,tracking error te and average velocity of the robot while followingthe path. Our method performs be�er than other approaches onall the metrics. Online methods like Dynamic Window [35] andS-curve [26] show promising results in average time taken andaverage velocity, but have high tracking errors te due to wheelslipping. �e tracking error te of Polar-Bidirectional [9] planneris small compared to other online methods, but the velocity ofthe robot at each point is less leading to high traversal time. �e�intic Spline [17] method results in sharp turns in some cases,which increase the average traversal time of the robot. Our methodgenerates trajectories with high radius of curvature at all points,which helps the robot to move with higher speeds, and reduces therisk of slipping. �us, the average speed is quite higher leadingto less traversal time and tracking error te , when compared withother methods.

It was also observed that on planning trajectories using S-curve,Polar-Bidirectional and Dynamic Window, the robot �rst reachesthe end point at a high velocity, over shoots and keep oscillatingaround the end point. �is property can also be seen by zooming

Figure 6: Performance comparison of Bayesian Optimisa-tion with (in red) and without (in green) reuse of prior in-formation from database.

into generated trajectories in Fig. 4. Also, these approaches didn’tconsider the orientation and velocity of the bot at end point, andreached the end point with an arbitrary velocity and orientation.�is led to failure in correctly intercepting the ball, and poor ballhandling performance.

We evaluate the computation time for online Bayesian optimi-sation for both cases, one involving prior reuse and other withoutprior reuse. Fig. 6 shows the number of iterations on the x-axis, andthe current best minima of objective function on y-axis. For priorreuse, we utilise the closest prior as obtained from k-Nearest Neigh-bour to initialise the optimisation. It can be seen that prior reusegreatly reduces the number of iterations required to minimize theobjective function. �e optimisation process converges in around15 iterations. Without prior use, it takes around 60 iterations toconverge to the minima. Although the minima achieved withoutreusing prior is slightly be�er, but is insigni�cant considering thattrajectory optimisation time is limited. �is shows that our methodis able to generate near optimal trajectories in a limited time se�ing.

Figure 5 shows the obtained control points for 300 di�erent com-binations of starting and ending point with �xed obstacle positions.We see that the obtained control points avoid obstacles for all com-binations, as no black point lies on the red square. �e controlpoints are concentrated in certain safe regions that are denoted bydarker green regions. �e generated trajectories not only avoidobstacles by traversing through the region in-between obstacles,but also reduce traversal time.

We also experiment with di�erent hyperparameter se�ings forBayesian optimisation. We evaluate the performance of di�erent ac-quisition functions, namely Expected Improvement [34] (EI), LowerCon�dence Bound [8] (LCB) and A-optimality criteria (Aopt). Wealso try di�erent kernel function, namely Matern 5/2 (MaternARD5),Matern 3/2 (MaternARD5) and Square Exponential (SEARD) [28].We use Automatic Relevance Determination (ARD) for all thesekernels. �e trajectory travel times for all di�erent combinations


Table 3: Trajectory traversal time (in seconds) for di�er-ent combinations of kernel and acquisition function inBayesian optimisation

Acquisition

KernelMaternARD3 MaternARD5 SEARD

EI 1.437 1.436 1.452LCB 1.436 1.436 1.458Aopt 1.449 1.472 1.526

Figure 7: Performance comparison in terms of number of it-erations required to converge to minima for di�erent com-binations of kernel and acquisition function

of kernel function and acquisition function are analysed for 20 dif-ferent starting and ending points. As shown in Table 3, we see thatdi�erent combinations lead to very similar results. �e best resultsare obtained using either EI or LCB as acquisition function, alongwith MaternARD5 or MaternARD3 as kernel function. However,the number of iterations required to reach minima is di�erent forthese combinations, as shown in Fig. 7. We see that using Mater-nARD5 as kernel function, and EI as acquisition function takes theminimum number of iterations.

In the next experiment, we �x the starting and ending positionsof our robot and the obstacles. Se�ing k = 6, we query the data-base using k-NN, and initialise the prior for Bayesian optimisation.�e prior is initialised by averaging the prior parameters acrossall the neighbours. We evaluate the performance of control pointsobtained a�er optimisation by measuring the trajectory traversaltime for a dense grid surrounding the region. Figure 8 shows theobtained contour plot, where bluer region denote regions minimis-ing trajectory traversal time. We validate the use of k-NN for priorreuse as the control point obtained a�er optimisation minimisesthe objective.

7 CONCLUSIONPerforming online optimisation under real time constraints is thekey limitation of common optimisation methods, especially in com-plex robotics tasks. We combine learning and planning to addressthis problem in the context of trajectory optimisation in robot soccer.

Figure 8: Contour plot for trajectory traversal time (in sec-onds) generated using control points optimised using priorreuse. �e bluer the region, lower the traversal time. Greenpoint denotes control point obtained a�er Bayesian optimi-sation. �e similar planning scenarios identi�ed k-NN areshown in yellow.

Our work utilises a database of previously optimised trajectories toe�ectively reduce optimisation time, and lead to time-e�cient tra-jectories. Speci�cally, we store resultant prior by running Bayesianoptimisation o�ine for di�erent input con�gurations. At test time,the closest planning scenario is queried using k-Nearest Neighboursapproach for reuse of prior. Bayesian optimisation is initialised withthis prior, and run for few function evaluations. We test our methodon multiple planning scenarios, where it outperforms traditionaloptimisation techniques.

We suggest that current optimisation technique are inadequatefor real time optimisation for complicated tasks such as planning.If we are to make signi�cant advances in this �eld, similar e�ortstargeting both theory and experimentation for combining learningand planning must be explored. Using more sophisticated machinelearning algorithms that can be�er capture the relation betweenplanning scenario and control point locations is le� to future work.�erying multiple prior from the database, and combining themappropriately for a rich prior is also an interesting future direction.

ACKNOWLEDGMENTSWe thank Prof. Sudeshna Sarkar, Prof. Alok Kanti Deb and Prof.Dilip Kumar Pratihar for their constant guidance and support as co-Principal Investigators of Kharagpur RoboSoccer Students’ Group(KRSSG). We also thank all the alumni and present members ofKRSSG especially Kumar Abhinav, Dhananjay Yadav and Mad-hav Mantri for addressing game play, mechanical and embeddedchallenges in robot soccer. �is work is supported by SponsoredResearch and Industrial Consultancy (SRIC), Indian Institute ofTechnology, Kharagpur.

REFERENCES[1] Remi Bardenet, Matyas Brendel, Balazs Kegl, and Michele Sebag. 2013. Collabo-

rative hyperparameter tuning.. In ICML (2). 199–207.[2] James S Bergstra, Remi Bardenet, Yoshua Bengio, and Balazs Kegl. 2011. Al-

gorithms for hyper-parameter optimization. In Advances in Neural InformationProcessing Systems. 2546–2554.

[3] Johann Borenstein and Yoram Koren. 1991. �e vector �eld histogram-fast obsta-cle avoidance for mobile robots. IEEE Transactions on Robotics and Automation 7,3 (1991), 278–288.


[4] Matyas Brendel and Marc Schoenauer. 2011. Instance-Based Parameter Tuningand Learning for Evolutionary AI Planning. In PAL 2011 3rdWorkshop on Planningand Learning. 5.

[5] Oliver Brock and Oussama Khatib. 1999. High-speed navigation using the globaldynamic window approach. In Robotics and Automation, 1999. Proceedings. 1999IEEE International Conference on, Vol. 1. IEEE, 341–346.

[6] Roberto Calandra, Andre Seyfarth, Jan Peters, and Marc Peter Deisenroth. 2016.Bayesian optimization for learning gaits under uncertainty. Annals of Mathe-matics and Arti�cial Intelligence 76, 1-2 (2016), 5–23.

[7] �omas Cover and Peter Hart. 1967. Nearest neighbor pa�ern classi�cation.IEEE transactions on information theory 13, 1 (1967), 21–27.

[8] Dennis D Cox and Susan John. 1992. A statistical method for global optimization.In Systems, Man and Cybernetics, 1992., IEEE International Conference on. IEEE,1241–1246.

[9] Alessandro De Luca, Giuseppe Oriolo, and Marilena Vendi�elli. 2001. Controlof wheeled mobile robots: An experimental overview. In Ramsete. Springer,181–226.

[10] Gerald Farin. 2014. Curves and surfaces for computer-aided geometric design: apractical guide. Elsevier.

[11] Nikolaus Hansen, Sibylle D Muller, and Petros Koumoutsakos. 2003. Reducingthe time complexity of the derandomized evolution strategy with covariancematrix adaptation (CMA-ES). Evolutionary computation 11, 1 (2003), 1–18.

[12] Holger Hoos and Kevin Leyton-Brown. 2014. An e�cient approach for assessinghyperparameter importance. (2014).

[13] KG Jolly, R Sreerama Kumar, and R Vijayakumar. 2009. A Bezier curve based pathplanning in a multi-agent robot soccer system without violating the accelerationlimits. Robotics and Autonomous Systems 57, 1 (2009), 23–33.

[14] Oussama Khatib. 1986. Real-time obstacle avoidance for manipulators and mobilerobots. In Autonomous robot vehicles. Springer, 396–404.

[15] Gregor Klancar and Igor Skrjanc. 2007. Tracking-error model-based predictivecontrol for mobile robots in real time. Robotics and Autonomous Systems 55, 6(2007), 460–469.

[16] R Kunigahalli and JS Russell. 1994. Visibility graph approach to detailed pathplanning in cnc concrete placement. Automation and robotics in construction XI(1994).

[17] Boris Lau, Christoph Sprunk, and Wolfram Burgard. 2009. Kinodynamic mo-tion planning for mobile robots using splines. In 2009 IEEE/RSJ InternationalConference on Intelligent Robots and Systems. IEEE, 2427–2433.

[18] Marko Lepetic, Gregor Klancar, Igor Skrjanc, Drago Matko, and Bostjan Potocnik.2003. Time optimal path planning considering acceleration limits. Robotics andAutonomous Systems 45, 3 (2003), 199–210.

[19] Andy Liaw and Ma�hew Wiener. 2002. Classi�cation and regression by random-Forest. R news 2, 3 (2002), 18–22.

[20] Daniel J Lizo�e, Tao Wang, Michael H Bowling, and Dale Schuurmans. 2007.Automatic Gait Optimization with Gaussian Process Regression.. In IJCAI, Vol. 7.944–949.

[21] Rajko Mahkovic and Tomaz Slivnik. 1997. Smooth Path Planning on the Basisof Reduced Visibility Graph. MODELLING IDENTIFICATION AND CONTROL(1997), 267–270.

[22] Ruben Martinez-Cantin. 2014. BayesOpt: a Bayesian optimization library fornonlinear optimization, experimental design and bandits. Journal of MachineLearning Research 15, 1 (2014), 3735–3739.

[23] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, Jose Castellanos, andArnaud Doucet. 2009. A Bayesian exploration-exploitation approach for optimalonline sensing and planning with a visually guided mobile robot. AutonomousRobots 27, 2 (2009), 93–103.

[24] Ruben Martinez-Cantin, Nando de Freitas, Arnaud Doucet, and Jose A Castel-lanos. 2007. Active Policy Learning for Robot Planning and Exploration underUncertainty.. In Robotics: Science and Systems. 321–328.

[25] H Meng and PD Picton. 1992. A neural network for collision-free path planning.Arti�cial neural networks 2, 1 (1992), 591–4.

[26] Kim Doang Nguyen, I-Ming Chen, and Teck-Chew Ng. 2007. Planning algorithmsfor s-curve trajectories. In 2007 IEEE/ASME international conference on advancedintelligent mechatronics. IEEE, 1–6.

[27] Kim Doang Nguyen, Teck-Chew Ng, and I-Ming Chen. 2008. On algorithmsfor planning S-curve motion pro�les. International Journal of Advanced RoboticSystems 5, 1 (2008), 99–106.

[28] Carl Edward Rasmussen. 2006. Gaussian processes for machine learning. (2006).[29] Maria Isabel Ribeiro. 2004. Kalman and extended kalman �lters: Concept, deriva-

tion and properties. Institute for Systems and Robotics 43 (2004).[30] Benjamin Rosman, Majd Hawasly, and Subramanian Ramamoorthy. 2016.

Bayesian policy reuse. Machine Learning 104, 1 (2016), 99–127.[31] Alireza Sahraei, Mohammad Taghi Manzuri, Mohammad Reza Razvan, Masoud

Tajfard, and Saman Khoshbakht. 2007. Real-time trajectory generation for mobilerobots. In Congress of the Italian Association for Arti�cial Intelligence. Springer,459–470.

[32] M Saska, M Kulich, G Klancar, and J Faigl. 2006. Transformed net-collisionavoidance algorithm for robotic soccer. In Proceedings 5th MATHMOD Vienna-5thVienna Symposium on Mathematical Modelling.

[33] Martin Saska, Martin Macas, Libor Preucil, and Lenka Lhotska. 2006. Robot pathplanning using particle swarm optimization of Ferguson splines. In EmergingTechnologies and Factory Automation, 2006. ETFA’06. IEEE Conference on. IEEE,833–839.

[34] Ma�hias Schonlau, William J Welch, and Donald R Jones. 1998. Global versuslocal search in constrained optimization of computer models. Lecture Notes-Monograph Series (1998), 11–25.

[35] Marija Seder and Ivan Petrovic. 2007. Dynamic window based approach tomobile robot motion control in the presence of moving obstacles. In Proceedings2007 IEEE International Conference on Robotics and Automation. IEEE, 1986–1991.

[36] Zvi Shiller and Y-R Gwo. 1991. Dynamic motion planning of autonomous vehicles.IEEE Transactions on Robotics and Automation 7, 2 (1991), 241–249.

[37] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical bayesianoptimization of machine learning algorithms. In Advances in neural informationprocessing systems. 2951–2959.

[38] Je�erson R Souza, Roman Marchant, Lionel O�, Denis F Wolf, and Fabio Ramos.2014. Bayesian optimisation for active perception and smooth navigation. InRobotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE,4081–4087.

[39] Christoph Sprunk. 2008. Planning motion trajectories for mobile robots usingsplines. University of Freiburg (2008).

[40] Michael Stein. 1987. Large sample properties of simulations using Latin hyper-cube sampling. Technometrics 29, 2 (1987), 143–151.

[41] Antonio Visioli. 2001. Tuning of PID controllers with fuzzy logic. IEE Proceedings-Control �eory and Applications 148, 1 (2001), 1–8.

[42] Marcelo Walter and Alain Fournier. 1996. Approximate arc length parameteri-zation. In Proceedings of the 9th Brazilian symposium on computer graphics andimage processing. Citeseer, 143–150.

[43] Hongling Wang, Joseph Kearney, and Kendall Atkinson. 2002. Arc-length param-eterized spline curves for real-time simulation. In 5th international conference onCurves and Surfaces.

[44] Li Wang, Yushu Liu, Hongbin Deng, and Yuanqing Xu. 2006. Obstacle-avoidancepath planning for soccer robots using particle swarm optimization. In Roboticsand Biomimetics, 2006. ROBIO’06. IEEE International Conference on. IEEE, 1233–1238.

bayesian optimisation with prior reuse for motion planning

Documents