A manipulator (or a robot, in general) can be seen as a set of rigid bodies (its links) that move (translate and rotate) as the robot's motors are actuated.
It is important to know how to describe the position and orientation of each link as time varies. This is especially true for the end-effector.
For this, we need to know how to describe these bodies in space.
We will start learning how to describe rotations.
We have the following definition:
Definition: A rotation matrix is a matrix \(Q \in \mathbb{R}^{3 \times 3}\) with two properties:
where \(I_{3 \times 3}\) is the identity matrix of order 3.
The first condition (\(Q^TQ=I_{3 \times 3}\)) has a geometric interpretation. Suppose it is written as:
$$ Q = \LARGE{\Big(}\normalsize{}\begin{array}{ccc}
Q_{xx} & Q_{xy} & Q_{xz} \\
Q_{yx} & Q_{yy} & Q_{yz} \\
Q_{zx} & Q_{zy} & Q_{zz}
\end{array}\LARGE{\Big)}\normalsize{} = \left(\begin{array}{ccc}
Q_{x} & Q_y & Q_z
\end{array}\right)$$
where \(Q_x, Q_y\) and \(Q_z\) are the column vectors that form the matrix.
Then, reading the first condition in terms of the columns, we deduce that they are orthonormal, that is, they describe vectors that are orthogonal to each other and of unit length:
$$\begin{eqnarray} && Q_x^TQ_y = Q_x^TQ_z = Q_y^TQ_z = 0 \nonumber \\ && Q_x^TQ_x = Q_y^TQ_y = Q_z^TQ_z = 1. \nonumber \end{eqnarray}$$
In other words, if we choose a reference frame (a set of three orthogonal unit vectors) \(\mathcal{F}_0\) and consider each column of \(Q\) as a vector written in this reference frame, we will have, after drawing the three vectors corresponding to the three columns, another set of orthogonal unit vectors, that is, another reference frame, \(\mathcal{F}_1\).
The second condition, (\(\det(Q)=+1\)) implies that this new reference frame \(\mathcal{F}_1\) is positively oriented according to the right-hand rule.
This means that if you place your right hand with your thumb on the \(x\) axis and your index finger on the \(y\) axis, then the \(z\) axis will be in the palm of your hand.
This procedure (interpreting the columns of a rotation matrix \(Q\) as vectors in a reference frame \(\mathcal{F}_0\), drawing them, and obtaining another reference frame \(\mathcal{F}_1\)) is very relevant and will be addressed soon.
We then have the first interpretation for a rotation matrix \(Q\):
Note that the two reference frames have the same center!
Let's now move on to the second interpretation of a rotation matrix \(Q\):
A rotation matrix \(Q\) implements the rotation, around a specific axis written in a reference frame \(\mathcal{F}_0\) - following the right-hand rule - and a specific angle, of a point!
Example: Consider the rotation matrix \(Q\): $$Q = \Large{\Bigg(}\normalsize{}\begin{array}{ccc} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & 0 \\ \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & 1 \end{array}\Large{\Bigg)}\normalsize{}.$$
Consider a reference frame \(\mathcal{F}_0\), and a point that is written as a column vector \(p\) in \(\mathcal{F}_0\). Then, the vector \(p'=Qp\) represents, also written in \(\mathcal{F}_0\), the point \(p\) rotated by 45 degrees around the \(z_0\) axis.
Then, if: $$p = \Large{\Bigg(}\normalsize{}\begin{array}{c} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \\ \frac{1}{2} \end{array}\Large{\Bigg)}\normalsize{} \ \ \ , \ \ \ p' = Qp = \Large{\Bigg(}\normalsize{}\begin{array}{c} 0 \\ 1 \\ \frac{1}{2} \end{array}\Large{\Bigg)}.$$
The point \(p'\) is the point \(p\) rotated by 45 degrees around the \(z_0\) axis.
In UAIBot it is possible to calculate the axis and angle (in radians) of a rotation matrix using the function Utils.axis_angle:
import numpy as np
import uaibot as ub
Q = np.array([[0,-1,0],[1,0,0],[0,0,1]])
axis, angle = ub.Utils.axis_angle(Q)
print(axis) #Prints [0. 0. 1.]
print(angle) #Prints 1.570 rad
Note that the axis is written in a generic reference frame \(\mathcal{F}_0\). The axis is contextualized when we apply the rotation matrix to a point \(p\) written in a specific reference frame.
Two consecutive rotations represented by matrices \(Q_1\) and \(Q_2\) (in that order!) in a reference frame \(\mathcal{F}_0\) form another rotation in the reference frame \(\mathcal{F}_0\) represented by a rotation matrix \(Q_3=Q_2Q_1\) (the product of the matrices).
Note the order! The second rotation comes first in the product! More recent rotations come to the left!
Also note that the rotation axes are always written in the reference frame \(\mathcal{F}_0\), which is fixed throughout the rotations.
Attention! rotations do not generally commute: $$Q_2Q_1 \not= Q_1Q_2.$$
Example:
$$Q_1 = \large{\Bigg(}\normalsize{}\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{array}\large{\Bigg)}\normalsize{} \ \ , \ \ Q_2 = \large{\Bigg(}\normalsize{}\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{array}\large{\Bigg)}\normalsize{}.$$
The first matrix represents a 90-degree rotation in x, and the second 90 degrees in y.
Then:
$$Q_1Q_2 = \large{\Bigg(}\normalsize{}\begin{array}{ccc} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array} \large{\Bigg)}\normalsize{}\ \ , \ \ Q_2Q_1 = \large{\Bigg(}\normalsize{}\begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{array} \large{\Bigg)}\normalsize{}.$$
The first matrix is a rotation around the axis \(r=( \frac{\sqrt{3}}{3} \ \frac{\sqrt{3}}{3} \ \frac{\sqrt{3}}{3} )^T\) of an angle of 2.09 radians.
The second matrix is a rotation around the axis \(r=( \frac{\sqrt{3}}{3} \ \frac{\sqrt{3}}{3} \ -\frac{\sqrt{3}}{3} )^T\) of an angle of 2.09 radians.
We have the elementary rotations, rotations in x, y, and z:
$$R_x(\theta) = \Large{\Bigg(}\normalsize{}\begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \ \ \cos(\theta) \end{array}\Large{\Bigg)}\normalsize{}.$$
$$R_y(\theta) = \Large{\Bigg(}\normalsize{}\begin{array}{ccc} \ \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{array}\Large{\Bigg)}\normalsize{}.$$
$$R_z(\theta) = \Large{\Bigg(}\normalsize{}\begin{array}{ccc} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \ \ \cos(\theta) & 0 \\ 0 & 0 & 1 \end{array}\Large{\Bigg)}\normalsize{}.$$
In UAIBot, the elementary rotations are implemented:
import uaibot as ub
theta=3.14/2
Rx = ub.Utils.rotx(theta)
Ry = ub.Utils.roty(theta)
Rz = ub.Utils.rotz(theta)
In this case, the matrices will be 4x4 instead of 3x3, but you can recover the rotation matrices by discarding the last row and the last column. The reason why these matrices are embedded in the 4x4 matrix will become clear soon.
Rotations around an arbitrary axis \(r\) and an angle \(\theta\) are also implemented
import uaibot as ub
theta=3.14/2
r = [1,2,3]
R = ub.Utils.rot(r, theta)
The axis \(r\), if not normalized, is normalized internally in the function. The length of the vector \(r\) does not matter.
As before, the matrices will be 4x4 instead of 3x3, but again, you can recover the rotation matrices by discarding the last row and the last column.
Theorem: every rotation matrix \(Q\) can be decomposed into:
$$Q = R_z(\alpha)R_y(\beta)R_x(\gamma).$$
The angles \(\alpha, \beta, \gamma\) are called the Euler angles of the rotation.
There are other conventions for Euler angles (e.g., \(R_x(\alpha)R_y(\beta)R_x(\gamma)\)).
In UAIBot, the Euler angles can be calculated using the function Utils.euler_angles
import uaibot as ub
import numpy as np
Q = np.array([[0,0,1],[1,0,0],[0,1,0]])
alpha, beta, gamma = ub.Utils.euler_angles(Q)
print([alpha, beta, gamma])
Note that a rotation matrix has 9 numbers.
However, these 9 numbers cannot be chosen arbitrarily! That is, not every 3x3 matrix is a rotation matrix.
It can be shown that the constraint \(Q^TQ = I_{3 \times 3}\) imposes that there are only 3 degrees of freedom to choose the numbers.
So 9 numbers seem to be an uneconomical representation of a rotation. Are there others?
Yes! For example, Euler Angles can be used to represent a rotation with only 3 numbers (the minimum amount of information).
This representation has problems, however. For example: how to compose two rotations represented by Euler angles? In rotation matrices, just multiply them.
The representation is not unique! For example, the triples \(\alpha = t, \beta = \frac{\pi}{2}, \gamma = t\) represent the same rotation for all \(t\).
It is discontinuous! Rotations that are close can have radically different Euler angles. This hinders differentiation.
There are others. For example, Quaternions represent a rotation with 4 numbers.
It is almost unique (only two quaternions for each rotation), and it is a continuous representation.
We will use transformation matrices in this course, however, as they are more common in the robotics literature.
We have seen that we have two interpretations for a rotation matrix:
(A): It represents a reference frame \(\mathcal{F}_1\) written in a reference frame \(\mathcal{F}_0\)...
(B): It represents a rotation operation around a specific axis and angle written in a reference frame \(\mathcal{F}_0\)...
What connects the two?
Consider a reference frame \(\mathcal{F}_0\). In this reference frame, the axes \(x_0\), \(y_0\), and \(z_0\) are written as the vectors \(x_0=(1 \ 0 \ 0)^T, y_0 = (0 \ 1 \ 0)^T\) and \(z_0 = (0 \ 0 \ 1)^T\).
If we apply the rotation specified by \(Q\) (interpretation (B)) to each of these vectors, we will obtain new vectors \(x_1=Qx_0, y_1=Qy_0, z_1=Qz_0\)...
These three vectors, when considered as written in \(\mathcal{F}_0\), form a new positively oriented reference frame \(\mathcal{F}_1\), which is exactly the same as we would obtain using interpretation (A)!
From the point of view of Physics, a rigid body is an idealized object that cannot undergo deformations, contractions, expansions, or breaks due to the forces acting on it.
Obviously, it is an approximation, and whether an object is a rigid body or not depends on the forces we are considering.
For reasonable forces, we can say that a steel cylinder is a rigid body, while a beach ball is not.
From the point of view of Mathematics, a rigid body is a set in which, for any two particles \(P_1\) and \(P_2\) of it, the distance between them remains constant over time.
In other words, if \(p_1(t)\) and \(p_2(t)\) are the positions of two particles \(P_1\) and \(P_2\) over time, written with respect to a reference frame \(\mathcal{F}_0\), then for all times \(t_A\) and \(t_B\) we have: $$\|p_1(t_A)-p_2(t_A)\| = \|p_1(t_B)-p_2(t_B)\|$$
where \(\| \cdot \|\) is the Euclidean norm of the vector.
We now want to make the mathematical description of the movement of a rigid body.
By this, we mean describing the trajectory of all the particles that make up a rigid body.
Note that this would be extremely difficult for a generic object, such as a beach ball. After all, the beach ball can translate, rotate, contract, distort, etc... in infinite ways, extremely complex.
Fortunately, for a rigid body, the movement of the particles is severely limited, which allows us to make a concise description of the movement.
Before understanding how we can do this, we need to understand what a rigid transformation is.
Definition: A rigid transformation is a function \(T: \mathbb{R}^3 \mapsto \mathbb{R}^3\) with the following property: for any two points \(p_1\) and \(p_2\):
$$\|T(p_1)-T(p_2)\|=\|p_1-p_2\|.$$
In other words, a rigid transformation preserves the distance between points!
We have the following theorem:
The interpretation of this theorem is that every rigid transformation is composed of three sub-operations:
Note that, in all this, a reference frame \(\mathcal{F}_0\) is implied.
We will work here with proper rigid transformations, without reflection:
$$T(p) = Qp+s$$
that is, a rotation and a translation.
Obviously, there is a connection between the movement of rigid bodies and a rigid transformation.
Let \(\mathcal{P}(t_A), \mathcal{P}(t_B) \subseteq \mathbb{R}^3\) be sets representing the positions of the particles of the same rigid body \(\mathcal{P}\) at two different times, \(t=t_A\) and \(t=t_B\).
Then there exists a unique rigid transformation \(T_{AB}\) such that every point in \(\mathcal{P}(t_B)\) can be obtained from a point in \(\mathcal{P}(t_A)\) by applying \(T_{AB}\). Mathematically, there exists a transformation \(T_{AB}\) such that: $$\forall p_B \in \mathcal{P}(t_B), \exists p_A \in \mathcal{P}(t_A) \ \mbox{such that} \ p_B = T_{AB}(p_A).$$
In other words, the set at time \(t_B\) can be obtained from the set at time \(t_A\) by applying a rigid transformation \(T_{AB}\) to all points.
In the drawing on the side, for example, the set at time \(t_B\) is the set at time \(t_A\) applying the rigid transformation \(T_{AB}\) that represents the displacement of one unit along the \(y_0\) axis.
Note that, again, we have the implicit figure of a reference frame \(\mathcal{F}_0\).
We could then describe the movement of the object by providing at every moment in time the rigid transformation between time \(t=0\) and the current time!
There is a more convenient way...
Let's attach a reference frame \(\mathcal{F}_{obj}\) to the object!
By attach we mean that any rigid transformation the object undergoes, the reference frame will also undergo.
Note that a reference frame, if considered as formed by three orthogonal rigid "arrows", is also a rigid body. So we can think, without problems, of applying a rigid transformation to a reference frame.
Also note that if we know how the attached reference frame is at an instant \(t\), we know how the entire object is at that same time \(t\).
Then, instead of specifying the transformation of the object from time \(t=0\) to a generic time \(t\), we specify the transformation \(T_t\) between the base reference frame, \(\mathcal{F}_0\), and the object's reference frame, \(\mathcal{F}_{obj}\), at that time \(t\)!
This is simpler because (i) we no longer need to explicitly know how the object was at \(t=0\) and (ii) it is easier to specify rigid transformations between simple objects like reference frames than between complex objects like an arbitrary rigid body.
Example:
At the time \(t\) shown on the side, the transformation \(T_t\) would be a rotation of -30 degrees around the \(x_0\) axis (note that it is rotated 30 degrees in the negative direction), a displacement of 2 units in \(y_0\) and 1 unit in \(z_0\).
How was the object at \(t=0\)? There is no way to know...
To describe the movement of a rigid body, we need to provide at every time \(t\) the transformation \(T_t\).
Since \(T_t(p) = Q(t)p+s(t)\), this is equivalent to providing, at every moment in time \(t\), the rotation matrix \(Q(t)\) and the displacement vector \(s(t)\).
We can provide this information using only one matrix.
Definition: A homogeneous transformation matrix (HTM) is a matrix \(H \in \mathbb{R}^{4 \times 4}\) constructed as follows:
$$H = \left(\begin{array}{cc} Q & s \\ 0_{1 \times 3} & 1 \end{array}\right)$$where \(Q \in \mathbb{R}^{3 \times 3}\) is a rotation matrix, \(s \in \mathbb{R}^{3 \times 1}\) is a column vector, and \(0_{1 \times 3}\) is a zero row vector of three elements.
A homogeneous transformation matrix represents a rigid transformation, and vice versa.
HTMs have several utilities when representing rigid transformations.
For example, the composition of rigid transformations is also a rigid transformation. If \(T_1(p) = Q_1p+s_1\) and \(T_2(p) = Q_2p+s_2\). Then: $$T_2(T_1(p)) = (Q_2Q_1)p + (Q_2s_1+s_2).$$
Note that, as in the case of rotations, the transformations do not necessarily commute: (\(T_2T_1 \not= T_1T_2\))
Let the respective HTMs of the transformations \(T_1\), \(T_2\), and \(T_3 = T_2T_1\):
$$H_1 = \left(\begin{array}{cc} Q_1 & s_1 \\ 0_{1 \times 3} & 1 \end{array}\right) \ , \ H_2 = \left(\begin{array}{cc} Q_2 & s_2 \\ 0_{1 \times 3} & 1 \end{array}\right),$$
$$H_3 = \left(\begin{array}{cc} Q_2Q_1 & Q_2s_1+s_2 \\ 0_{1 \times 3} & 1 \end{array}\right).$$
It is not difficult to see that \(H_3 = H_2H_1\), that is, if we represent rigid transformations by HTMs, the composition of two transformations is the matrix product (in the correct order) of the respective HTMs!
We have 4 basic HTMs, representing rotations in \(x, y\) and \(z\) and a translation by a vector \(s \in \mathbb{R}^{3 \times 1}\)
$$\tilde{R}_x(\theta) = \left(\begin{array}{cc} R_x(\theta) & 0_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{array}\right) \ , \ \tilde{R}_y(\theta) = \left(\begin{array}{cc} R_y(\theta) & 0_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{array}\right), $$
$$\tilde{R}_z(\theta) = \left(\begin{array}{cc} R_z(\theta) & 0_{3 \times 1} \\ 0_{1 \times 3} & 1 \end{array}\right) \ , \ \tilde{D}(s) = \left(\begin{array}{cc} I_{3 \times 3} & s \\ 0_{1 \times 3} & 1 \end{array}\right). $$
Where \(R_x(\theta), R_y(\theta)\) and \(R_z(\theta)\) are the 3x3 rotation matrices.
In UAIBot, these four matrices are implemented with the functions Utils.rotx, Utils.roty, Utils.rotz, Utils.trn
import numpy as np
import uaibot as ub
theta=3.14/2
s = [1,2,3]
Rx = ub.Utils.rotx(theta)
Ry = ub.Utils.roty(theta)
Rz = ub.Utils.rotz(theta)
D = ub.Utils.trn(s)
Now it is explained why the rotation matrices are 4x4 (and not 3x3): they are in their HTM form.
We have seen that a rotation matrix \(Q\) has two interpretations (as a reference frame and as an operator).
A HTM also has two, analogously. We have already seen one:
A rotation matrix represents a rotation operation, which is a special case of a rigid transformation. A HTM represents a more general rigid transformation.
We can imagine what the second interpretation is:
The difference is that, while a rotation matrix represents a reference frame \(\mathcal{F}_1\) that has the same center as \(\mathcal{F}_0\), a HTM represents a generic reference frame whose centers do not necessarily coincide.
It is very simple to read the reference frame.
Just as in the case of rotation matrices, there is a connection between the two interpretations.
If we apply the transformation described by \(H\) to the three vectors of the reference frame \(\mathcal{F}_0\), we obtain a reference frame \(\mathcal{F}_1\) that is the same as we would obtain by the interpretation that \(H\) represents a reference frame.
Consider a rigid body in which, at a certain moment in time \(t=10s\), its pose is described by the HTM \(H\) in the reference frame \(\mathcal{F}_0\). Draw how the object will be at time \(t=10s\).
Very important: the information given only allows us to know how the rigid body is at time \(t=10s\)! Without more information, it is not possible to infer anything else. For example, we know nothing about how it was at time \(t=0\) nor how it moved from there to time \(t=10s\).
To know the complete movement, we would need to know, at every moment in time \(t\), the homogeneous transformation matrix \(H(t)\).
In the animation on the side, the matrix \(H(t)\) is provided at every moment in time. In this way, it is possible to continuously track the movement of the object.
Finally, we need to talk a little more about the composition of rigid transformations, described as HTMs.
From now on, we will use the symbols \(R_x(\theta), R_y(\theta), R_z(\theta), D_x(d), D_y(d), D_z(d)\) for the HTMs of rotations and displacements, respectively, where \(\theta, d \in \mathbb{R}\).
Consider a reference frame \(\mathcal{F}_0\). We apply the transformations represented by the HTMs \(H_1, H_2, H_3\) and \(H_4\), in that order.
We will then have a final HTM \(H_{F} = H_4 H_3 H_2 H_1\), where the most recent transformations come to the left, multiplying the matrix.
Note that all transformations are interpreted in \(\mathcal{F}_0\). Therefore, if \(H_2\) is a displacement in \(z\) of 5 units, we will apply a displacement in \(z_0\) of 5 units.
Consider the following sequence of transformations:
$$H = R_x(\pi/4) D_y(-0.25) R_z(\pi/2) D_x(1).$$
Note the order (\(D_x(1)\) first, \(R_z(\pi/2)\) later, etc...).
We read from right to left!
Watch the video on the side with the sequence of transformations.
Note that all transformations are made with respect to the reference frame \(\mathcal{F}_0\).
There is another way to interpret the same transformation!
We will reach the same final reference frame \(H\) if we read from left to right, but considering that the transformations are interpreted in the current reference frame instead of \(\mathcal{F}_0\)!
Watch the video on the side with the sequence of transformations.
Note that the final reference frame is the same! Only the intermediate reference frames change.
Thus, when making a composition of transformations, we can multiply the matrix to the left or to the right:
Both applications are useful. In robotics, however, it is more common to perform applications to the right (in the current reference frame).
The idea of multiplying to the right (written in the current reference frame) is often used to make a chain of transformations!
Consider three reference frames \(\mathcal{F}_A\), \(\mathcal{F}_B\), and \(\mathcal{F}_C\). Suppose we have the rigid transformation \(T_A^B\) between \(\mathcal{F}_A\) and \(\mathcal{F}_B\), and the rigid transformation \(T_B^C\) between \(\mathcal{F}_B\) and \(\mathcal{F}_C\). How to obtain the transformation between \(\mathcal{F}_A\) and \(\mathcal{F}_C\)?
First, we transform from the reference frame \(\mathcal{F}_A\) to \(\mathcal{F}_B\), which is the matrix \(T_A^B\). Then, since the transformation \(T_B^C\) is written in \(\mathcal{F}_B\), we multiply to the right to obtain \(T_A^C = T_A^B T_B^C\).
The same logic would be valid if we had, for example, a fourth reference frame \(\mathcal{F}_D\) and the transformation \(T_C^D\). We would have \(T_A^D = T_A^B T_B^C T_C^D\)...
From now on, we will use the notation \(T_A^B\) to represent the transformation from \(\mathcal{F}_A\) to \(\mathcal{F}_B\). By the other interpretation of HTMs, this matrix also gives the reference frame \(\mathcal{F}_B\) written in \(\mathcal{F}_A\).
We can describe the movement of a rigid body if (1) we choose a base reference frame \(\mathcal{F}_0\) and (2) attach a reference frame \(\mathcal{F}_{obj}\) to it.
From this, we just need to provide at every moment in time \(t\) the HTM that represents the rigid transformation between the reference frame \(\mathcal{F}_0\) (fixed) and the object's reference frame \(\mathcal{F}_{obj}\) (moving) at that moment in time \(t\).
Knowing how the reference frame \(\mathcal{F}_{obj}\) is at that time, we can say exactly how all the particles of the rigid body are at that moment in time, fulfilling our goal of making the description of movement of the object.