# ilqr

**Repository Path**: Zivecho/ilqr

## Basic Information

- **Project Name**: ilqr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: GPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2023-08-27
- **Last Updated**: 2023-08-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Iterative Linear Quadratic Regulator
====================================

.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master
   :target: https://travis-ci.org/anassinator/ilqr

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR)
for non-linear trajectory optimization based on Yuval Tassa's `paper `_. It is
compatible with both Python 2 and 3 and has built-in support for
auto-differentiating both the dynamics model and the cost function using
`Theano `_.

Install
-------

To install, clone and run:

.. code-block:: bash

  python setup.py install

You may also install the dependencies with `pipenv` as follows:

.. code-block:: bash

  pipenv install

Usage
-----

After installing, :code:`import` as follows:

.. code-block:: python

  from ilqr import iLQR

See the `examples `_ directory for `Jupyter `_ notebooks that show how common
control problems can be solved with iLQR.

Dynamics model
^^^^^^^^^^^^^^

You can set up your own dynamics model by extending the :code:`Dynamics`
class and hard-coding it and its partial derivatives. Alternatively, you can
write it up as a `Theano` expression and use the :code:`AutoDiffDynamics`
class so that it is auto-differentiated for you. Finally, if all you have is
a function, you can use the :code:`FiniteDiffDynamics` class to approximate
the derivatives with finite differences.

This section demonstrates how to implement the following dynamics model:

.. math:: m \dot{v} = F - \alpha v

where :math:`m` is the object's mass in :math:`kg`, :math:`\alpha` is the
friction coefficient, :math:`v` is the object's velocity in :math:`m/s`,
:math:`\dot{v}` is the object's acceleration in :math:`m/s^2`, and :math:`F`
is the control (or force) you're applying to the object in :math:`N`.

Automatic differentiation
"""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T
  from ilqr.dynamics import AutoDiffDynamics

  x = T.dscalar("x")  # Position.
  x_dot = T.dscalar("x_dot")  # Velocity.
  F = T.dscalar("F")  # Force.

  dt = 0.01  # Discrete time-step in seconds.
  m = 1.0  # Mass in kg.
  alpha = 0.1  # Friction coefficient.

  # Acceleration.
  x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

  # Discrete dynamics model definition.
  f = T.stack([
      x + x_dot * dt,
      x_dot + x_dot_dot * dt,
  ])

  x_inputs = [x, x_dot]  # State vector.
  u_inputs = [F]  # Control vector.

  # Compile the dynamics.
  # NOTE: This can be slow as it's computing and compiling the derivatives.
  # But that's okay since it's only a one-time cost on startup.
  dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)

*Note*: If you want to be able to use the Hessians (:code:`f_xx`,
:code:`f_ux`, and :code:`f_uu`), you need to pass the :code:`hessians=True`
argument to the constructor. This will increase compilation time. Note that
:code:`iLQR` does not require second-order derivatives to function.
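For reference, enabling the Hessians only changes the constructor call in the
snippet above; everything else stays the same. This sketch reuses
:code:`f`, :code:`x_inputs`, and :code:`u_inputs` from that example, and the
variable name is purely illustrative:

.. code-block:: python

  # Same f, x_inputs and u_inputs as above. Compiling the second-order
  # derivatives (f_xx, f_ux and f_uu) makes this call noticeably slower.
  hessian_dynamics = AutoDiffDynamics(f, x_inputs, u_inputs, hessians=True)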
Batch automatic differentiation
"""""""""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T
  from ilqr.dynamics import BatchAutoDiffDynamics

  state_size = 2  # [position, velocity]
  action_size = 1  # [force]

  dt = 0.01  # Discrete time-step in seconds.
  m = 1.0  # Mass in kg.
  alpha = 0.1  # Friction coefficient.

  def f(x, u, i):
      """Batched implementation of the dynamics model.

      Args:
          x: State vector [*, state_size].
          u: Control vector [*, action_size].
          i: Current time step [*, 1].

      Returns:
          Next state vector [*, state_size].
      """
      x_ = x[..., 0]
      x_dot = x[..., 1]
      F = u[..., 0]

      # Acceleration.
      x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

      # Discrete dynamics model definition.
      return T.stack([
          x_ + x_dot * dt,
          x_dot + x_dot_dot * dt,
      ]).T

  # Compile the dynamics.
  # NOTE: This can be slow as it's computing and compiling the derivatives.
  # But that's okay since it's only a one-time cost on startup.
  dynamics = BatchAutoDiffDynamics(f, state_size, action_size)

*Note*: This is a faster version of :code:`AutoDiffDynamics` that doesn't
support Hessians.

Finite difference approximation
"""""""""""""""""""""""""""""""

.. code-block:: python

  import numpy as np

  from ilqr.dynamics import FiniteDiffDynamics

  state_size = 2  # [position, velocity]
  action_size = 1  # [force]

  dt = 0.01  # Discrete time-step in seconds.
  m = 1.0  # Mass in kg.
  alpha = 0.1  # Friction coefficient.

  def f(x, u, i):
      """Dynamics model function.

      Args:
          x: State vector [state_size].
          u: Control vector [action_size].
          i: Current time step.

      Returns:
          Next state vector [state_size].
      """
      [x, x_dot] = x
      [F] = u

      # Acceleration.
      x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

      return np.array([
          x + x_dot * dt,
          x_dot + x_dot_dot * dt,
      ])

  # NOTE: Unlike with AutoDiffDynamics, this is instantaneous, but will not be
  # as accurate.
  dynamics = FiniteDiffDynamics(f, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if
you run into numerical instability issues.

Usage
"""""

Regardless of the method used for constructing your dynamics model, you can
use it as follows:

.. code-block:: python

  curr_x = np.array([1.0, 2.0])
  curr_u = np.array([0.0])
  i = 0  # This dynamics model is not time-varying, so this doesn't matter.

  >>> dynamics.f(curr_x, curr_u, i)
  ... array([ 1.02   ,  2.01998])
  >>> dynamics.f_x(curr_x, curr_u, i)
  ... array([[ 1.     ,  0.01   ],
             [ 0.     ,  1.00999]])
  >>> dynamics.f_u(curr_x, curr_u, i)
  ... array([[ 0.    ],
             [ 0.0001]])

The outputs of the :code:`AutoDiffDynamics` and :code:`FiniteDiffDynamics`
models should generally agree, but the auto-differentiated method will always
be more accurate. Generally, the finite difference approximation will be
faster unless you're also computing the Hessians: in which case, Theano's
compiled derivatives are more optimized.
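Because this particular model is linear in the state and control, its
Jacobians can also be worked out by hand, which makes a handy sanity check on
the values above. The following sketch is not part of the library; it simply
recomputes the same quantities with plain NumPy:

.. code-block:: python

  import numpy as np

  dt = 0.01  # Discrete time-step in seconds.
  m = 1.0  # Mass in kg.
  alpha = 0.1  # Friction coefficient.

  # Hand-derived Jacobians of the discrete model defined earlier:
  #   x'     = x + x_dot * dt
  #   x_dot' = x_dot + (x_dot * (1 - alpha * dt / m) + F * dt / m) * dt
  A = np.array([
      [1.0, dt],
      [0.0, 1.0 + (1.0 - alpha * dt / m) * dt],
  ])
  B = np.array([
      [0.0],
      [dt**2 / m],
  ])

  curr_x = np.array([1.0, 2.0])
  curr_u = np.array([0.0])

  print(A.dot(curr_x) + B.dot(curr_u))  # [1.02, 2.01998], matches dynamics.f.
  print(A)  # Matches dynamics.f_x.
  print(B)  # Matches dynamics.f_u.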
Cost function
^^^^^^^^^^^^^

Similarly, you can set up your own cost function by extending the
:code:`Cost` class and hard-coding it and its partial derivatives.
Alternatively, you can write it up as a `Theano` expression and use the
:code:`AutoDiffCost` class so that it is auto-differentiated for you.
Finally, if all you have is a loss function, you can use the
:code:`FiniteDiffCost` class to approximate the derivatives with finite
differences.

The most common cost function is the quadratic form used by Linear Quadratic
Regulators:

.. math:: (x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})

where :math:`Q` and :math:`R` are the matrices weighting your quadratic state
and control errors, and :math:`x_{goal}` and :math:`u_{goal}` are your target
state and control. For convenience, an implementation of this cost function
is made available as the :code:`QRCost` class.

:code:`QRCost` class
""""""""""""""""""""

.. code-block:: python

  import numpy as np

  from ilqr.cost import QRCost

  state_size = 2  # [position, velocity]
  action_size = 1  # [force]

  # The coefficients weigh how much your state error is worth to you vs
  # the size of your controls. You can favor a solution that uses smaller
  # controls by increasing R's coefficient.
  Q = 100 * np.eye(state_size)
  R = 0.01 * np.eye(action_size)

  # This is optional if you want your cost to be computed differently at a
  # terminal state.
  Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

  # State goal is set to a position of 1 m with no velocity.
  x_goal = np.array([1.0, 0.0])

  # NOTE: This is instantaneous and completely accurate.
  cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)

Automatic differentiation
"""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T
  from ilqr.cost import AutoDiffCost

  x_inputs = [T.dscalar("x"), T.dscalar("x_dot")]
  u_inputs = [T.dscalar("F")]

  x = T.stack(x_inputs)
  u = T.stack(u_inputs)

  # Q, R, Q_terminal and x_goal are reused from the QRCost example above.
  x_diff = x - x_goal
  l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)
  l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)

  # Compile the cost.
  # NOTE: This can be slow as it's computing and compiling the derivatives.
  # But that's okay since it's only a one-time cost on startup.
  cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)

Batch automatic differentiation
"""""""""""""""""""""""""""""""

.. code-block:: python

  import theano.tensor as T
  from ilqr.cost import BatchAutoDiffCost

  def cost_function(x, u, i, terminal):
      """Batched implementation of the quadratic cost function.

      Args:
          x: State vector [*, state_size].
          u: Control vector [*, action_size].
          i: Current time step [*, 1].
          terminal: Whether to compute the terminal cost.

      Returns:
          Instantaneous cost [*].
      """
      Q_ = Q_terminal if terminal else Q
      l = x.dot(Q_).dot(x.T)
      if l.ndim == 2:
          l = T.diag(l)

      if not terminal:
          l_u = u.dot(R).dot(u.T)
          if l_u.ndim == 2:
              l_u = T.diag(l_u)
          l += l_u

      return l

  # Compile the cost.
  # NOTE: This can be slow as it's computing and compiling the derivatives.
  # But that's okay since it's only a one-time cost on startup.
  cost = BatchAutoDiffCost(cost_function, state_size, action_size)

Finite difference approximation
"""""""""""""""""""""""""""""""

.. code-block:: python

  from ilqr.cost import FiniteDiffCost

  def l(x, u, i):
      """Instantaneous cost function.

      Args:
          x: State vector [state_size].
          u: Control vector [action_size].
          i: Current time step.

      Returns:
          Instantaneous cost [scalar].
      """
      x_diff = x - x_goal
      return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)

  def l_terminal(x, i):
      """Terminal cost function.

      Args:
          x: State vector [state_size].
          i: Current time step.

      Returns:
          Terminal cost [scalar].
      """
      x_diff = x - x_goal
      return x_diff.T.dot(Q_terminal).dot(x_diff)

  # NOTE: Unlike with AutoDiffCost, this is instantaneous, but will not be as
  # accurate.
  cost = FiniteDiffCost(l, l_terminal, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps` and :code:`u_eps`) used when computing the approximation if
you run into numerical instability issues.

Usage
"""""

Regardless of the method used for constructing your cost function, you can
use it as follows:

.. code-block:: python

  >>> cost.l(curr_x, curr_u, i)
  ... 400.0
  >>> cost.l_x(curr_x, curr_u, i)
  ... array([   0.,  400.])
  >>> cost.l_u(curr_x, curr_u, i)
  ... array([ 0.])
  >>> cost.l_xx(curr_x, curr_u, i)
  ... array([[ 200.,    0.],
             [   0.,  200.]])
  >>> cost.l_ux(curr_x, curr_u, i)
  ... array([[ 0.,  0.]])
  >>> cost.l_uu(curr_x, curr_u, i)
  ... array([[ 0.02]])

Putting it all together
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

  N = 1000  # Number of time-steps in trajectory.
  x0 = np.array([0.0, -0.1])  # Initial state.
  us_init = np.random.uniform(-1, 1, (N, 1))  # Random initial action path.

  ilqr = iLQR(dynamics, cost, N)
  xs, us = ilqr.fit(x0, us_init)

:code:`xs` and :code:`us` now hold the optimal state and control trajectory
that reaches the desired goal state with minimum cost.

Finally, a :code:`RecedingHorizonController` is also bundled with this
package to use the :code:`iLQR` controller in Model Predictive Control.
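The sketch below shows one way such a receding-horizon loop might look. The
import path, the constructor arguments, and the :code:`control()` generator
used here are assumptions rather than documented API, so verify them against
the class's own docstrings before relying on them. It reuses :code:`x0`,
:code:`ilqr`, :code:`N`, and :code:`np` from the example above:

.. code-block:: python

  # ASSUMPTION: RecedingHorizonController lives in ilqr.controller, takes the
  # current state and a fitted iLQR instance, and exposes a control()
  # generator that re-optimizes over a receding horizon, yielding (xs, us).
  from ilqr.controller import RecedingHorizonController

  mpc = RecedingHorizonController(x0, ilqr)

  us_init = np.random.uniform(-1, 1, (N, 1))
  plan = mpc.control(us_init)

  # Take the first re-planned segment; in a real loop you would apply its
  # controls to the system and feed the measured state back to the controller.
  xs_mpc, us_mpc = next(plan)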
Important notes
^^^^^^^^^^^^^^^

To quote from Tassa's paper: "Two important parameters which have a direct
impact on performance are the simulation time-step :code:`dt` and the horizon
length :code:`N`. Since speed is of the essence, the goal is to choose those
values which minimize the number of steps in the trajectory, i.e. the largest
possible time-step and the shortest possible horizon. The size of :code:`dt`
is limited by our use of Euler integration; beyond some value the simulation
becomes unstable. The minimum length of the horizon :code:`N` is a
problem-dependent quantity which must be found by trial-and-error."

Contributing
------------

Contributions are welcome. Simply open an issue or pull request on the
matter.

Linting
-------

We use `YAPF `_ for all Python formatting needs. You can auto-format your
changes with the following command:

.. code-block:: bash

  yapf --recursive --in-place --parallel .

You may install the linter as follows:

.. code-block:: bash

  pipenv install --dev

License
-------

See `LICENSE `_.

Credits
-------

This implementation was partially based on Yuval Tassa's :code:`MATLAB`
`implementation `_, and `navigator8972 `_'s `implementation `_.