Omnidirectional Vision-Based Formation Control*

René Vidal    Omid Shakernia    Shankar Sastry
Department of Electrical Engineering and Computer Sciences
University of California at Berkeley, Berkeley CA 94720-1774
{rvidal, omids, sastry}@eecs.berkeley.edu

*This research is supported by ONR grant N00014-00-1-0621.
Abstract

We consider the problem of distributed leader-follower formation control for nonholonomic mobile robots equipped with paracatadioptric cameras. Our approach is to translate the formation control problem from the configuration space into a separate visual servoing task for each follower. First, we present an algorithm for infinitesimal multi-body motion segmentation from multiple paracatadioptric views. We derive a rank constraint on the optical flows across many frames, from which one can estimate the position of each leader in the paracatadioptric image plane of the follower. Then, we study the relative leader-follower kinematics in the paracatadioptric image plane and design a tracking controller for each follower using feedback linearization techniques. Finally, we present experimental results on multi-body motion segmentation of a real image sequence and simulation results on vision-based formation control.
1 Introduction
Formation control of unmanned ground and aerial vehicles is gaining significant importance in the control and robotics communities thanks to recent advances in communication and computer vision. Examples of formation control are ubiquitous in nature and in man-made devices, from bird flight and fish swimming to air traffic control, satellite clustering, automated highways, and mobile robotics.
Previous work in formation control assumed that communication among the robots is available and concentrated on the study of the stability and feasibility of formations. Swaroop et al. [23] proposed the notion of string stability for line formations. They showed that, although the pair-wise leader-follower system can be stable, the overall formation can become unstable due to the propagation of errors through the string. The authors showed that a string formation is stabilizable when each follower has access to the leader's state. Pant et al. [16] generalized this notion to formations in a planar mesh through the concept of mesh stability. Tanner [25] concentrated on formations in acyclic graphs and showed that the structure of the interconnections and the amount of information can affect the input-to-state stability of the formation. Fax et al. [8] analyzed the stability of formations in arbitrary graphs and proposed a Nyquist-like stability criterion that can be derived from the spectral properties of the graph Laplacians. Tabuada et al. [24] studied the conditions under which a desired formation is feasible, i.e., whether it is possible to design a trajectory that both maintains the formation and satisfies the kinematic constraints of the agents. They showed that in order for a formation to be feasible, certain differential-geometric conditions need to be satisfied.
In the absence of communication, the formation control problem becomes quite challenging from a sensing viewpoint, since one is faced with the problem of simultaneously estimating the motion of multiple moving objects. Previous work [7] avoids this difficulty by painting each leader a different color and then using color tracking to identify them. Here we consider the case in which the detection of the leaders is based solely on their motion in the image plane.
At present (see [13]), there are many perspective and orthographic structure from motion algorithms for analyzing a static scene, i.e., a scene in which the camera moves and the world is static, or vice versa. The study of dynamic scenes, i.e., scenes in which both the camera and multiple objects move independently, is much more recent and only partially understood. The main difficulty lies in the need for simultaneous motion estimation (rotation and translation of each target relative to the camera) and motion segmentation (determination of the pixels in the image that belong to each moving target). Previous work includes the case of points moving linearly with constant speed [12, 20] or in a conic section [1], multiple moving objects seen by an orthographic camera [5, 14], and motion segmentation from two perspective views [27]. For central panoramic cameras [2], there exist two-view motion estimation algorithms in both the discrete [22, 10, 4] and the differential [11, 26] case. More recently, multi-frame algorithms were proposed in both the discrete [21] and the differential [18] case. To the best of our knowledge, there is no work on the analysis of dynamic scenes seen by a central panoramic camera.
1.1 Contributions of this work
We consider the problem of distributed leader-follower formation control for nonholonomic mobile robots equipped with paracatadioptric cameras. Our work differs from previous work on vision-based formation control for mobile robots, such as [6], in that our control laws are defined in the paracatadioptric image plane. Our approach is to translate the formation control problem from the configuration space into a separate visual servoing control task for each follower. In Section 2, we develop an algorithm for infinitesimal multi-body motion segmentation from multiple paracatadioptric views. We derive a rank constraint on the paracatadioptric optical flows across many frames, from which one can estimate the position of each leader in the image plane of the follower. In Section 3 we derive the leader-follower dynamical equations in the paracatadioptric image plane using a nonholonomic kinematic model for the motion of the robots and use feedback linearization techniques to design a tracking controller for each follower. In Section 4 we present experimental results on multi-body motion segmentation of a real image sequence. We also present simulation results validating our vision-based formation control scheme.
2 Paracatadioptric Motion Segmentation
2.1 Paracatadioptric projection model
A paracatadioptric camera [9] projects a 3D point q = (X, Y, Z)^T onto the image plane by first projecting q onto the surface Z = (X^2 + Y^2 - 1)/2, which represents a paraboloidal mirror of focal length 1/2 with focus at the origin, and then orthographically projecting the back-projection ray b onto the plane Z = 0 (see Figure 1). The composition of these two projections gives:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{-Z + \sqrt{X^2 + Y^2 + Z^2}} \begin{bmatrix} X \\ Y \end{bmatrix} \triangleq \frac{1}{\lambda} \begin{bmatrix} X \\ Y \end{bmatrix}. \quad (1)$$
Figure 1: The projection model for paracatadioptric cameras and the back-projection ray b associated with the image point x.
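To make the model concrete, the projection (1) reduces to a few lines of code. The following sketch is our own illustration, not the authors' implementation; the function name and the NumPy dependency are assumptions:

```python
import numpy as np

def project_paracatadioptric(q):
    """Project a 3D point q = (X, Y, Z) onto the paracatadioptric
    image plane following equation (1): (x, y) = (X, Y) / lambda."""
    X, Y, Z = q
    lam = -Z + np.sqrt(X**2 + Y**2 + Z**2)  # lambda in equation (1)
    return np.array([X, Y]) / lam
```

For example, a ground-plane point with Z < 0 projects inside the unit circle x^2 + y^2 < 1, i.e., below the horizon.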
2.2 Paracatadioptric optical flow
If the camera undergoes a linear velocity V ∈ ℝ³ and an angular velocity Ω ∈ ℝ³, then the coordinates of q in the camera frame evolve as q̇ = Ω × q + V. The motion of the camera induces a motion in the paracatadioptric image plane, which can be computed by differentiating (1) with respect to time. We showed in [18] that the optical flow (ẋ, ẏ)^T induced by a paracatadioptric camera undergoing a motion (Ω, V) is given by:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} xy & z - x^2 & -y \\ y^2 - z & -xy & x \end{bmatrix} \Omega + \frac{1}{\lambda} \begin{bmatrix} 1 - \frac{x^2}{1+z} & -\frac{xy}{1+z} & \frac{x}{1+z} \\ -\frac{xy}{1+z} & 1 - \frac{y^2}{1+z} & \frac{y}{1+z} \end{bmatrix} V,$$

where z ≜ (x² + y² − 1)/2 and λ ≜ −Z + √(X² + Y² + Z²).
If the camera is attached to a mobile robot moving on the ground plane with the optical axis parallel to the Z axis, then the angular velocity is Ω = (0, 0, Ω_z)^T and the linear velocity is V = (V_x, V_y, 0)^T. Thus the optical flow equation simplifies to:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix} \Omega_z + \frac{1}{\lambda} \begin{bmatrix} 1 - \frac{x^2}{1+z} & -\frac{xy}{1+z} \\ -\frac{xy}{1+z} & 1 - \frac{y^2}{1+z} \end{bmatrix} \begin{bmatrix} V_x \\ V_y \end{bmatrix}. \quad (2)$$
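As a numerical sanity check, equation (2) is easy to evaluate for a given pixel. The sketch below is our own; it uses the relation λ = Z/z (noted in Section 3.3), so it requires the depth Z of the corresponding ground-plane point:

```python
import numpy as np

def planar_flow(x, y, omega_z, Vx, Vy, Z):
    """Optical flow (xdot, ydot) of pixel (x, y) induced by planar
    camera motion (Vx, Vy, omega_z), following equation (2)."""
    z = (x**2 + y**2 - 1.0) / 2.0
    lam = Z / z  # lambda = 2Z / (x^2 + y^2 - 1), valid off the horizon
    A = np.array([[1.0 - x**2 / (1.0 + z), -x * y / (1.0 + z)],
                  [-x * y / (1.0 + z), 1.0 - y**2 / (1.0 + z)]]) / lam
    return np.array([-y, x]) * omega_z + A @ np.array([Vx, Vy])
```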
2.3 Paracatadioptric Motion Segmentation
Let (x_i, y_i), i = 1, …, n, be a pixel in the zeroth frame, and imagine we are given measurements of its optical flow (ẋ_ij, ẏ_ij) in frame j = 1, …, m relative to the zeroth. If the scene is static, then from (2) we get

$$\begin{bmatrix} \dot{x}_{ij} & \dot{y}_{ij} \end{bmatrix} = S_i M_j^T,$$

where

$$S_i = \begin{bmatrix} -y_i & x_i & \frac{1+z_i-x_i^2}{(1+z_i)\lambda_i} & -\frac{x_i y_i}{(1+z_i)\lambda_i} & \frac{1+z_i-y_i^2}{(1+z_i)\lambda_i} \end{bmatrix} \in \mathbb{R}^{1\times 5}, \qquad M_j = \begin{bmatrix} \Omega_{zj} & 0 & V_{xj} & V_{yj} & 0 \\ 0 & \Omega_{zj} & 0 & V_{xj} & V_{yj} \end{bmatrix} \in \mathbb{R}^{2\times 5}.$$

Therefore the optical flow matrix W ∈ ℝ^{n×2m} satisfies

$$W \triangleq \begin{bmatrix} \dot{x}_{11} & \dot{y}_{11} & \cdots & \dot{x}_{1m} & \dot{y}_{1m} \\ \vdots & \vdots & & \vdots & \vdots \\ \dot{x}_{n1} & \dot{y}_{n1} & \cdots & \dot{x}_{nm} & \dot{y}_{nm} \end{bmatrix} = S M^T, \quad (3)$$

where S = [S_1^T S_2^T ⋯ S_n^T]^T ∈ ℝ^{n×5} is the structure matrix and M = [M_1^T M_2^T ⋯ M_m^T]^T ∈ ℝ^{2m×5} is the motion matrix. We conclude that, for a static scene taken from a paracatadioptric camera moving on a plane, the collection of optical flows across many frames lies on a 5-dimensional subspace of ℝ^{2m}.
For a dynamic scene with k independent motions, the optical flow matrix can be decomposed as

$$W = \begin{bmatrix} S_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & S_k \end{bmatrix} \begin{bmatrix} M_1^T \\ \vdots \\ M_k^T \end{bmatrix} = S M^T, \quad (4)$$

where S ∈ ℝ^{n×5k} and M ∈ ℝ^{2m×5k}. We showed in [19] that the block-diagonal structure of W implies that its leading singular vector has the same value for pixels corresponding to the same motion, and a different value for pixels corresponding to different motions. This gives an effective criterion for segmenting the pixels of the current frame into a collection of k groups corresponding to the independent motions. One of these groups corresponds to the motion of the camera, which is generated by the static background. For a formation control scenario, this group can always be identified as the largest one.
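A minimal sketch of this segmentation criterion follows. It is our own illustration rather than the authors' code, and the simple 1D gap-based grouping of the singular vector entries stands in for whatever clustering rule [19] actually uses:

```python
import numpy as np

def segment_flows(flows, k):
    """Segment n pixels into k independently moving groups.
    `flows` has shape (n, m, 2): optical flow of pixel i in frame j.
    Returns an array of n integer labels in {0, ..., k-1}."""
    n, m, _ = flows.shape
    W = flows.reshape(n, 2 * m)        # optical flow matrix of eq. (3)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    v = U[:, 0]                        # leading singular vector of W
    # Entries of v are (ideally) constant within each group, so cluster
    # the sorted 1D values at their k-1 largest gaps.
    order = np.argsort(v)
    labels = np.zeros(n, dtype=int)
    if k > 1:
        gaps = np.diff(v[order])
        cuts = np.sort(np.argsort(gaps)[-(k - 1):]) + 1
        for g, idx in enumerate(np.split(order, cuts)):
            labels[idx] = g
    return labels
```

The largest resulting group can then be taken as the static background, as noted above.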
3 Omnidirectional Vision-Based Formation Control
In the previous section, we presented the equations of motion of a pixel in the paracatadioptric image plane and proposed an algorithm for segmenting the image into k groups corresponding to independent motions. In doing so, we assumed that the camera and the observed objects undergo arbitrary motions. In this section, we derive the equations of motion in the particular case of a leader-follower configuration in the XY plane, with the paracatadioptric camera mounted on-board the follower. We assume that the mounting is such that the coordinate system of the on-board camera coincides with the robot coordinate system; thus the camera's optical center is located at (X, Y, Z) = 0 in the follower frame. We then derive a tracking controller for each follower to maintain the desired formation.
3.1 Relative leader-follower kinematics
Let us start with the simplest case of a single follower tracking a leader. Let (R_f(t), T_f(t)) ∈ SE(3) and (R_ℓ(t), T_ℓ(t)) ∈ SE(3) be the poses of the follower and the leader at time t with respect to a fixed reference frame. Let q ∈ ℝ³ be a point located on the leader, written in the leader frame. The coordinates of q relative to the reference frame are q_ℓ(t) = R_ℓ(t)q + T_ℓ(t), and the coordinates of q relative to the follower frame are:

$$q_{\ell f}(t) = R_f^T(t) R_\ell(t)\, q + R_f^T(t)\,(T_\ell(t) - T_f(t)). \quad (5)$$

Differentiating (5) yields:

$$\dot{q}_{\ell f} = (\dot{R}_f^T R_\ell + R_f^T \dot{R}_\ell)\, q + \dot{R}_f^T (T_\ell - T_f) + R_f^T (\dot{T}_\ell - \dot{T}_f). \quad (6)$$

Combining (5) and (6) gives:

$$\dot{q}_{\ell f} = (\dot{R}_f^T R_f + R_f^T \dot{R}_\ell R_\ell^T R_f)\, q_{\ell f} + R_f^T (\dot{T}_\ell - \dot{T}_f - \dot{R}_\ell R_\ell^T (T_\ell - T_f)). \quad (7)$$

Let Ω̂ ∈ so(3) be the skew-symmetric matrix generating the cross product, i.e., Ω̂q = Ω × q for all q ∈ ℝ³. Since ṘR^T ∈ so(3), $\widehat{R^T \Omega} = R^T \hat{\Omega} R$ and $\dot{R}^T R = -R^T \dot{R} R^T R$ [15], we may define the angular velocities Ω_f, Ω_ℓ ∈ ℝ³ by:

$$\hat{\Omega}_f = \dot{R}_f R_f^T \quad \text{and} \quad \hat{\Omega}_\ell = \dot{R}_\ell R_\ell^T. \quad (8)$$

Combining (7) and (8) yields:

$$\dot{q}_{\ell f} = \widehat{R_f^T(\Omega_\ell - \Omega_f)}\, q_{\ell f} + R_f^T\big(\dot{T}_\ell - \dot{T}_f - \hat{\Omega}_\ell (T_\ell - T_f)\big) \triangleq \hat{\Omega}_{\ell f}\, q_{\ell f} + V_{\ell f}, \quad (9)$$

where Ω_ℓf and V_ℓf, defined by the equation above, are the angular and translational velocities of the leader relative to the follower.
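In code, equation (9) is a direct transcription. The sketch below is ours, with an assumed `hat` helper for the skew-symmetric operator:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix with hat(w) @ q == np.cross(w, q)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def relative_velocities(Rf, wf, Tf, Tf_dot, wl, Tl, Tl_dot):
    """Angular and translational velocity of the leader relative to
    the follower, per equation (9). Rf is the follower rotation; wf, wl
    are angular velocities; Tf, Tl poses and Tf_dot, Tl_dot their rates."""
    Omega_lf = Rf.T @ (wl - wf)
    V_lf = Rf.T @ (Tl_dot - Tf_dot - hat(wl) @ (Tl - Tf))
    return Omega_lf, V_lf
```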
3.2 Nonholonomic mobile robot dynamics
We consider a nonholonomic kinematic model for the dynamics of the ground robots. The dynamics of the i-th robot are given by:

$$\dot{X}_i = v_i \cos\theta_i, \qquad \dot{Y}_i = v_i \sin\theta_i, \qquad \dot{\theta}_i = \omega_i,$$

where the state (X_i, Y_i, θ_i) ∈ SE(2), and the inputs v_i and ω_i are the linear and angular velocities, respectively. Letting T_i = (X_i, Y_i, 0)^T ∈ ℝ³, R_i = exp(ê_3 θ_i) ∈ SO(3), Ω_i = (0, 0, ω_i)^T ∈ ℝ³, with e_3 = (0, 0, 1)^T ∈ ℝ³, and replacing in (9), we obtain:

$$\Omega_{\ell f} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} (\omega_\ell - \omega_f), \qquad V_{\ell f} = -\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} v_f + \begin{bmatrix} F_{\ell f} \\ 0 \end{bmatrix}, \quad (10)$$

where F_ℓf ≜ F_ℓf(T_ℓ, T_f, θ_ℓ, θ_f, v_ℓ, ω_ℓ) ∈ ℝ² is given by:

$$F_{\ell f} = \begin{bmatrix} \cos(\theta_\ell - \theta_f) \\ \sin(\theta_\ell - \theta_f) \end{bmatrix} v_\ell - \begin{bmatrix} \cos\theta_f & \sin\theta_f \\ -\sin\theta_f & \cos\theta_f \end{bmatrix} \begin{bmatrix} -(Y_\ell - Y_f) \\ X_\ell - X_f \end{bmatrix} \omega_\ell. \quad (11)$$
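For simulation purposes, the model above can be integrated with a simple Euler step. This sketch is our own (the time step `dt` is an assumption, not part of the paper):

```python
import numpy as np

def unicycle_step(state, v, omega, dt):
    """One Euler step of the nonholonomic model:
    Xdot = v cos(theta), Ydot = v sin(theta), thetadot = omega."""
    X, Y, theta = state
    return np.array([X + dt * v * np.cos(theta),
                     Y + dt * v * np.sin(theta),
                     theta + dt * omega])
```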
3.3 Paracatadioptric leader-follower dynamics
Replacing the expressions for Ω_ℓf and V_ℓf in (2), we obtain the following equations for the motion of a pixel in the paracatadioptric plane:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = -\begin{bmatrix} \frac{1+z-x^2}{(1+z)\lambda} & -y \\ -\frac{xy}{(1+z)\lambda} & x \end{bmatrix} \begin{bmatrix} v_f \\ \omega_f \end{bmatrix} + \begin{bmatrix} \frac{1+z-x^2}{(1+z)\lambda} & -\frac{xy}{(1+z)\lambda} & -y \\ -\frac{xy}{(1+z)\lambda} & \frac{1+z-y^2}{(1+z)\lambda} & x \end{bmatrix} \begin{bmatrix} F_{\ell f} \\ \omega_\ell \end{bmatrix}.$$

Since z = (x² + y² − 1)/2 and λ = 2Z/(x² + y² − 1) = Z/z, if we assume a ground plane constraint, i.e., if we assume that Z = Z_ground < 0 is known, then we can write the equations of motion of a pixel as the drift-free control system:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = H(x, y)\, u_f + G(x, y)\, d_{\ell f}, \quad (12)$$

where u_f = (v_f, ω_f) ∈ ℝ² is the control action for the follower and d_ℓf = (F_ℓf, ω_ℓ) ∈ ℝ³ can be thought of as an external input that depends on the state and control action of the leader and the state of the follower.
3.4 Omnidirectional visual servoing
We now design a control law for the follower to keep a desired distance r_d and angle α_d from the leader. Although the desired formation is usually specified in the configuration space, under the ground plane assumption such a formation can always be converted into a desired formation in image space. Hence we assume that we are given a desired pixel location (x_d, y_d) for the leader in the image plane of the follower, where (x_d, y_d) = (r_d cos(α_d), r_d sin(α_d)).

We now apply feedback linearization to the control system (12) with output (x, y). We observe that the system has a well defined vector relative degree¹ of (1, 1) for all pixels (x, y) such that H(x, y) is of rank 2, i.e., whenever x ≠ 0 and x² + y² ≠ 1. We assume that x ≠ 0 and x² + y² ≠ 1 from now on². In this case, the relative degree of the system is 1 + 1 = 2, which equals the dimension of the state; thus the system has no zero dynamics and is trivially exponentially minimum phase. Therefore the control law

$$u_f = H(x, y)^{-1} \left( -G(x, y)\, d_{\ell f} + \begin{bmatrix} -k_1 (x - x_d) \\ -k_2 (y - y_d) \end{bmatrix} \right) \quad (13)$$

results in a locally exponentially stable system around (x_d, y_d) whenever k_1 > 0 and k_2 > 0.

In order to achieve exact inversion and tracking using the control law (13), it is necessary to measure the effect of the unknown external input G(x, y) d_ℓf ∈ ℝ². Although this term is a function of the state and control of the leader and the state of the follower, we emphasize here that it is not necessary to measure any of these quantities. Instead, we only need to estimate the two-dimensional vector G(x, y) d_ℓf. Furthermore, such a vector can be easily estimated using the image measurements provided by the motion segmentation techniques developed in Section 2.3, as we show below.
¹See [17], page 408, for the definition of relative degree.
²The degenerate configuration x = 0 is due to the nonholonomy of the follower's dynamics, while the degenerate configuration x² + y² = 1 corresponds to the projection of the horizon, hence it is due to the paracatadioptric camera model.
Let (x_w, y_w) and (ẋ_w, ẏ_w) be the position and optical flow of a pixel that corresponds to a static 3D point in the world. From (12), the velocities of the follower causing that optical flow are given by³:

$$u_f = H(x_w, y_w)^{-1} \begin{bmatrix} \dot{x}_w \\ \dot{y}_w \end{bmatrix}. \quad (14)$$

Notice that in the presence of noise one may improve the estimate of u_f in (14) by using more than one pixel and solving the equations in a least squares sense.

Now let (x_ℓ, y_ℓ) and (ẋ_ℓ, ẏ_ℓ) be the position and optical flow of a pixel that corresponds to a 3D point on the leader. From (12) we have:

$$G(x_\ell, y_\ell)\, d_{\ell f} = \begin{bmatrix} \dot{x}_\ell \\ \dot{y}_\ell \end{bmatrix} - H(x_\ell, y_\ell)\, H(x_w, y_w)^{-1} \begin{bmatrix} \dot{x}_w \\ \dot{y}_w \end{bmatrix}. \quad (15)$$

³Notice that since the point in 3D space is static, i.e., (v_ℓ, ω_ℓ) = (0, 0), the second term in (12) is zero.
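Equations (13)-(15) suggest a compact control loop for each follower: estimate the follower's own velocity from a static-background pixel, estimate the leader term from a leader pixel, and invert H. The sketch below is our own reconstruction under the ground-plane assumption; the helper names are hypothetical and the explicit form of H(x, y) follows our reading of (12):

```python
import numpy as np

def H(x, y, Z_ground):
    """Input matrix H(x, y) of the drift-free system (12)."""
    z = (x**2 + y**2 - 1.0) / 2.0
    lam = Z_ground / z
    return -np.array([[(1.0 + z - x**2) / ((1.0 + z) * lam), -y],
                      [-x * y / ((1.0 + z) * lam), x]])

def follower_control(p_l, f_l, p_w, f_w, p_d, k1, k2, Z_ground):
    """Tracking law (13) for one follower.
    p_l, f_l: leader pixel and its optical flow; p_w, f_w: a static
    background pixel and its flow; p_d: desired leader pixel."""
    Hl = H(p_l[0], p_l[1], Z_ground)
    Hw = H(p_w[0], p_w[1], Z_ground)
    # Equation (15): effect of the unknown leader input G(x_l, y_l) d_lf.
    Gd = f_l - Hl @ np.linalg.solve(Hw, f_w)
    # Equation (13): feedback linearization with gains k1, k2 > 0.
    e = np.array([k1, k2]) * (p_l - p_d)
    return np.linalg.solve(Hl, -Gd - e)  # u_f = (v_f, omega_f)
```

In practice, as (14) suggests, the static-pixel term would be averaged over many background pixels in a least-squares sense before forming (15).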
4 Experimental Results
We tested the proposed motion segmentation algorithm on a real image sequence. Figure 2(a) shows one of 200 frames taken by a paracatadioptric camera observing two moving robots. The optical flow was computed using Black's algorithm [3]. Figure 2(b) shows the results of applying the segmentation algorithm described in Section 2.3. The image sequence is correctly segmented into two independent motions.
Figure 2: An example of motion segmentation based on paracatadioptric optical flow. (a) One frame of the sequence with the optical flow superimposed. (b) Segmentation results.
Figure 3: A V-formation configuration, with leader L and followers F1 and F2.
[Figure 4 panels: twelve snapshots of the leader and follower trajectories (t = 0, 1, 3, 4, 5, 10, 20, 34, 50, 66, 88, 94), together with plots of the follower-to-leader distance in pixels and the follower-to-leader angle in degrees, with legends for the Leader, Follower 1, and Follower 2.]

Figure 4: Simulation results for a V-formation. For t ∈ [0, 10] the followers flip positions from the initial configuration to the desired one. For t > 10 the followers maintain their formation into a full circle and then into a line.