White Paper

June 2014

Ultra Low-Power 9D Fusion Implementation: A Case Study

Author: Pieter Struik, R&D Engineer, Sr. Staff, Synopsys Inc.

Abstract

This paper presents a case study on computing the 3D orientation of a device by means of a 9D fusion algorithm. The focus is on optimizing the fusion algorithm for execution on the DesignWare® Sensor IP Subsystem. Performance measurements show the benefits of using ARC® Processor EXtension (APEX) accelerators, which improve both cycle count and energy consumption by a factor of 6-25 compared to other commercial processors.

Introduction

Today, the ability to track the orientation or position of a device is a common feature in many portable and wearable products. Computing the orientation is a non-trivial task that converts the inputs from multiple motion sensors into accurate position information. This computation is called sensor fusion, and it eliminates inaccuracies caused by noisy sensor inputs. This paper shows how to implement a power-efficient 9D fusion algorithm on an IP subsystem: a processor core augmented with hardware accelerators. This makes the implementation suitable for products that require low energy consumption.

We will provide a brief introduction to 3D orientation and then present a 9D sensor fusion¹ algorithm. This fusion algorithm has been selected from a broad range of possible fusion solutions; it offers good tracking results while performing computations that are common to other fusion algorithms. Next, we will describe the implementation of the 9D fusion application on the DesignWare Sensor IP Subsystem. Performance optimization for this platform is based on an analysis of the fusion algorithm's hot spots. Switching to a fixed-point implementation is one optimization. A second optimization applies a number of hardware accelerators that are part of the Sensor IP Subsystem. Both optimizations yield a significant reduction in cycle count. Finally, we will present the results.

3D Orientation Using Inertial/Magnetic Sensors

This paper describes a technique to efficiently compute 3D orientation from the sensory inputs of three motion sensors:

1. An accelerometer measuring linear acceleration along three axes (XYZ)

2. A gyroscope measuring angular velocity around three axes

3. A magnetometer (compass) measuring the magnetic field strength along three axes.

In addition to traditional application areas, today such motion sensors are also commonly applied in a broad range of consumer products, including smartphones, wearable devices, game controllers and sports watches, with applications ranging from screen orientation to indoor navigation.

¹ 9D fusion refers to fusing the inputs of three 3D motion sensors (nine inputs in total).


In this example application, the motion sensors are attached to the device and each sensor is sampled at a rate of 50 Hz. After sampling, the sensory data (nine inputs) is used to update the device orientation, expressed in terms of Euler angles, which specify a sequence of rotations around the XYZ axes of the device.

Figure 1: 9D fusion algorithm to compute 3D orientation from motion sensor data: accelerometer, gyroscope, and magnetometer samples are fused into a 3D orientation expressed as yaw (applied first), pitch (second), and roll (third) rotations around the device axes

9D Sensor Fusion Algorithm

In theory, computing the orientation angles is simple: it just requires integrating the angular velocities measured by the gyroscope. In practice, the results are not accurate due to gyroscope drift. Drift in the sensor's output data causes an accumulation of errors in the computed orientation and must be compensated for. This is done by also taking accelerometer data and compass data into account in a sensor fusion algorithm.
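As a toy illustration of why drift must be compensated (the 0.5 deg/s bias below is an assumed value chosen for illustration, not a measured one), integrating a biased rate at 50 Hz accumulates a large orientation error within a minute:

#include <stdio.h>

/* Toy 1-D illustration of gyroscope drift (not part of the fusion algorithm):
   a device that is not rotating, sampled at 50 Hz, with an assumed constant
   gyroscope bias of 0.5 deg/s, accumulates a 30-degree orientation error
   after one minute of pure integration. */
int main(void)
{
    const float f = 50.0f;      /* sampling frequency [Hz] */
    const float bias = 0.5f;    /* assumed gyroscope bias [deg/s] */
    float angle = 0.0f;         /* integrated orientation [deg] */

    for (int n = 0; n < (int)(f * 60.0f); n++)
        angle += bias / f;      /* measured rate = true rate (0) + bias */

    printf("drift after one minute: %.1f degrees\n", angle);
    return 0;
}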

Many sensor fusion solutions exist. The 9D fusion algorithm in this paper is based on the principles presented in [1] and is claimed to be at least as accurate as algorithms that are based on Kalman filtering, another popular class of fusion algorithms.

The fusion algorithm has the following elements.

- A quaternion is used to represent the current orientation of the device. Quaternion Q = [ q0 q1 q2 q3 ] is a 4-dimensional representation that defines a 3x3 rotation matrix 'R', which maps each point x = (x,y,z) of the initial device orientation onto its current position, Rx. The Hamilton product '⊗' of two quaternions, Q0 ⊗ Q1, is a quaternion that represents the compound rotation of performing the rotation represented by Q0 first, followed by the rotation represented by Q1. Using quaternions to represent orientation (rotation) has the benefit that most computations do not involve trigonometric functions.

- The current orientation, represented by Q, is updated based on the gyroscope data (gx, gy, gz). The additional rotation of the device since the last measurement is represented by [ 1 (gx/f) (gy/f) (gz/f) ], where f is the sampling frequency. As quaternion Q will be normalized later, it is easier to represent the additional rotation as [ C gx gy gz ], where C is a constant based on the sampling frequency.

- Fusion with accelerometer data is applied, since the current orientation may not be accurate. Note that the vector (ax, ay, az) measured by the accelerometer equals the gravity vector when the device is not moving.



Figure 2: At rest, the measured acceleration vector (ax, ay, az) equals gravity; in the initial orientation the gravity vector points along the negative Z axis, (0, 0, -g)

When the device is rotated (by rotation R), the gravity vector remains pointed towards the earth's center. Relative to the sensors attached to the device, however, the gravity vector appears to have been rotated by the same rotation R. As a result, after rotation, the initial gravity vector v ends up as Rv relative to the device.

If the device is not moving, the computed gravity vector Rv must be aligned with the measured accelerometer data (ax, ay, az); see Figure 2. When Rv and (ax, ay, az) do not point in the same direction, we conclude that rotation R is not accurate and, hence, that Q is not accurate and needs adjustment. The adjustment is proportional to the vector cross product Rv × (ax, ay, az).

- Next, fusion with magnetometer data is applied. We use the fact that the device is subject to the earth's magnetic field and assume that there is no interference from another magnetic field, for example a permanent magnet. In this case, the measured magnetic field (mx, my, mz) can be related to the initial device orientation by applying the reverse² rotation R*: let w = R*(mx, my, mz). Assuming that the earth's north is aligned with the direction of the X axis, an alignment error is computed by comparing vector w to vector w', which is obtained by rotating w onto the device plane [1]. The alignment error vector c is proportional to the vector cross product c = w × w'. As c is related to the initial device orientation, Rc is the adjustment that must be applied to the current orientation, represented by quaternion Q.

- The 9D fusion algorithm updates the orientation at the sampling rate of the sensors. At the end of each iteration, quaternion Q is converted into three Euler angles. The relation between Q and the Euler angles involves inverse sine and inverse tangent functions [1]. As an example, the rotation angle around the Y axis equals -asin(2*q1*q3 + 2*q0*q2) for quaternion Q = [ q0 q1 q2 q3 ]. A C sketch of this update and conversion follows the note below.

Note: the fusion steps appear to be quite robust. For a moving device, the measured accelerometer data (ax, ay, az) includes a non-zero linear acceleration on top of gravity, yet the fusion algorithm still provides good results. In addition, the fusion of compass data remains quite accurate even when the magnetic field is distorted.
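The following minimal floating-point sketch (referenced above) shows the quaternion update via the Hamilton product and the extraction of the Y-axis rotation angle. The helper and constant names follow the description above; the remaining Euler angles use the atan2 expressions given in [1]. This is an illustrative sketch, not the optimized subsystem implementation.

#include <math.h>

/* Q is the current orientation quaternion; (gx, gy, gz) is the gyroscope
   sample after the accelerometer/magnetometer corrections have been added;
   C is the constant derived from the sampling frequency. */

static void hamilton_product(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3];
    out[1] = a[0]*b[1] + a[1]*b[0] + a[2]*b[3] - a[3]*b[2];
    out[2] = a[0]*b[2] - a[1]*b[3] + a[2]*b[0] + a[3]*b[1];
    out[3] = a[0]*b[3] + a[1]*b[2] - a[2]*b[1] + a[3]*b[0];
}

static void update_orientation(float Q[4], float gx, float gy, float gz,
                               float C, float *pitch)
{
    const float delta[4] = { C, gx, gy, gz };    /* incremental rotation */
    float u[4];
    hamilton_product(Q, delta, u);               /* compose the rotations */

    /* normalize so Q stays a unit quaternion */
    float norm_inv = 1.0f / sqrtf(u[0]*u[0] + u[1]*u[1] + u[2]*u[2] + u[3]*u[3]);
    for (int i = 0; i < 4; i++)
        Q[i] = u[i] * norm_inv;

    /* rotation angle around the Y axis, as given in the text;
       the other two Euler angles use atan2 expressions from [1] */
    *pitch = -asinf(2.0f*Q[1]*Q[3] + 2.0f*Q[0]*Q[2]);
}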


² In principle, this is similar to the relationship between the measured accelerometer data and the initial gravity vector. For the accelerometer, however, the forward rotation R is used, since the gravity vector v = (0, 0, -g) contains two zero coordinates, which makes the evaluation of Rv computationally efficient.


DesignWare Sensor IP Subsystem

The 9D fusion application has been implemented on a specific configuration of the DesignWare Sensor IP Subsystem, which is a pre-verified IP subsystem that is optimized to process data from digital and analog sensors, enabling power-efficient processing of the sensor data [2]. It consists of an ARC EM4 32-bit processor [3] and a user-configurable set of serial digital interfaces, analog-to-digital converter interfaces, hardware accelerators, and a software library of DSP functions and I/O software drivers.

Figure 3: DesignWare Sensor IP Subsystem integrated hardware and software solution: the ARC EM4 pipeline with tightly coupled hardware accelerators (fast math, filtering, vector, and interpolation accelerators), peripheral interfaces (I2C master/slave, SPI master, ADC interface, GPIO, timers), embedded ROM/SRAM, host and debug interfaces (AHB master/slave, JTAG), and a software layer of I/O drivers, host drivers, and DSP functions

The Sensor IP Subsystem exploits ARC Processor EXtension (APEX) technology to implement tightly coupled I/O peripherals and tightly coupled hardware accelerators. These extensions are accessed directly from the processor pipeline without the overhead of additional hardware infrastructure such as buses, bridges, and adapters.

The subsystem configuration used for the 3D orientation application contains two I2C master peripherals, an SPI master peripheral, and general-purpose I/O (GPIO). The ARC EM4 processor is configured with a single-cycle 32-bit integer multiplier. In addition, the subsystem contains fixed-point hardware accelerators for multiply-accumulate (MAC), trigonometric functions, and the square root function. As an example of the advantage of using hardware accelerators, consider the square root accelerator: it takes a 32-bit fixed-point number³ as input and computes the square root in 31 cycles, evaluating one result bit per cycle. This is much faster than the software square root delivered with a compiler's standard library.
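For illustration, the following is a minimal C sketch of a bit-serial square root of the kind the accelerator implements, producing one result bit per iteration; it is not the accelerator's actual implementation.

#include <stdint.h>

/* Restoring, bit-serial integer square root: one result bit per iteration.
   This only illustrates the technique; the APEX accelerator implements the
   equivalent logic in hardware. */
static uint32_t isqrt64(uint64_t v)
{
    uint64_t rem = 0, root = 0;
    for (int i = 0; i < 32; i++) {
        rem = (rem << 2) | (v >> 62);   /* bring down the next two input bits */
        v <<= 2;
        root <<= 1;
        if (root + 1 <= rem) {          /* trial subtraction: can this bit be set? */
            rem -= root + 1;
            root += 2;
        }
    }
    return (uint32_t)(root >> 1);
}

/* Square root of a Q31 value x in [0, 1): y = sqrt(x), also in Q31.
   With X = x * 2^31, the result is floor(sqrt(X * 2^31)). The top result bit
   is always zero for a non-negative Q31 input, which is why the hardware
   needs only 31 of the 32 iterations shown here. */
static int32_t sqrt_q31(int32_t x)
{
    if (x <= 0)
        return 0;                       /* guard against zero or negative input */
    return (int32_t)isqrt64((uint64_t)x << 31);
}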

Optimizing the 9D Sensor Fusion Algorithm for Ultra Low Power

This section presents implementation choices for the 9D sensor fusion algorithm. The algorithm repetitively processes input samples from the three sensors:

gyr[3]  /* XYZ measurement of the gyroscope */
acc[3]  /* XYZ measurement of the accelerometer */
mag[3]  /* XYZ measurement of the magnetometer */

and updates the quaternion that represents the device orientation:

Q[4] /* state variable: quaternion */


³ A 32-bit fixed-point number is a signed 32-bit integer that represents the integer value divided by 2^31; hence, it represents a number in the range [-1..1) with a granularity of 2^-31 (also known as Q0.31 or, for short, Q31).


Consider a single iteration of the algorithm:

/* fusion step #1 */
/* use accelerometer data and rotated (initial) gravity [0,0,-1] */
gravity[3] = rotate( [0,0,-1] );
/* compute inverse norm of accelerometer data */
norm_inv = 1 / sqrt( accX*accX + accY*accY + accZ*accZ );
/* compute vector cross product of normalized accelerometer data and rotated gravity */
err[3] = vector_cross_product(acc[3], gravity[3]) * norm_inv;
/* fusion: compensate gyroscope data using compensation factor ACF (constant) */
gyr[3] += ACF * err[3];

/* fusion step #2 */
/* compute inverse norm of magnetometer data */
norm_inv = 1 / sqrt( magX*magX + magY*magY + magZ*magZ );
/* compute inverse rotation of normalized magnetometer data */
rev_mag[3] = rotate_inverse( mag[3] ) * norm_inv;
/* align earth's north in device plane */
rev_magY = sqrt( rev_magY*rev_magY + rev_magZ*rev_magZ );
/* rotate back and compute cross product with measured magnetometer data */
dir_mag[3] = rotate( [rev_magX, rev_magY, 0] );
err[3] = vector_cross_product(mag[3], dir_mag[3]);
/* fusion: compensate gyroscope data using compensation factor MCF (constant) */
gyr[3] += MCF * err[3];

/* update device orientation based on adjusted gyroscope data */
u[4] = hamilton_product( Q[4], [ C, gyrX, gyrY, gyrZ ] );
/* normalize */
norm_inv = 1 / sqrt( u0*u0 + u1*u1 + u2*u2 + u3*u3 );
Q[4] = u[4] * norm_inv;
/* compute Euler angles for updated Q */
Angles[3] = euler_angles( Q[4] );

Take a closer look at the operations used in this algorithm:

- Rotation and inverse rotation are matrix-vector multiplications where the (inverse) rotation matrix has coefficients based on quaternion Q. These matrix coefficients are expressions like "2*q0*q0 - 1 + 2*q1*q1" and "2*(q1*q2 + q0*q3)" [1]. Hence, (inverse) rotation involves many multiplications and additions; a sketch of these helpers follows this list.

- Computing the inverse norm of a vector or quaternion takes multiplications, additions, a square root, and a division

- Computing a vector cross product takes multiplications and additions

- Computing the Hamilton product of two quaternions takes many multiplications and additions [1]

- Computing the Euler angles involves trigonometric functions (atan2 and asin), whose arguments are expressions based on quaternion Q. Again, these expressions involve multiplications and additions.
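As referenced in the first bullet, here is a minimal floating-point sketch of the rotation and cross-product helpers. The matrix layout and sign conventions follow [1]; rotate_inverse() is assumed to use the transposed matrix.

/* Rotation matrix built from quaternion Q and applied to a vector.
   The coefficients follow the expressions quoted above (conventions per [1]). */
static void rotate(const float q[4], const float v[3], float out[3])
{
    const float q0 = q[0], q1 = q[1], q2 = q[2], q3 = q[3];
    const float R[3][3] = {
        { 2*q0*q0 - 1 + 2*q1*q1, 2*(q1*q2 + q0*q3),     2*(q1*q3 - q0*q2)     },
        { 2*(q1*q2 - q0*q3),     2*q0*q0 - 1 + 2*q2*q2, 2*(q2*q3 + q0*q1)     },
        { 2*(q1*q3 + q0*q2),     2*(q2*q3 - q0*q1),     2*q0*q0 - 1 + 2*q3*q3 },
    };
    /* note: for the initial gravity vector v = (0, 0, -g), only the third
       column of R contributes, which is the efficiency argument of footnote 2 */
    for (int i = 0; i < 3; i++)
        out[i] = R[i][0]*v[0] + R[i][1]*v[1] + R[i][2]*v[2];
}

/* rotate_inverse() is identical except that it uses the transpose of R,
   since R is orthonormal. */

static void vector_cross_product(const float a[3], const float b[3], float out[3])
{
    out[0] = a[1]*b[2] - a[2]*b[1];
    out[1] = a[2]*b[0] - a[0]*b[2];
    out[2] = a[0]*b[1] - a[1]*b[0];
}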

The algorithm was first implemented in floating point. The implementation can be optimized by converting it to a fixed-point version, but the conversion requires special care with respect to overflow and underflow in the computations. The benefit, however, is that after the conversion all floating-point multiplications and additions have been replaced by single-cycle 32-bit multiplications and simple 32-bit additions (or subtractions).
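For illustration, a minimal sketch of the kind of Q31 helpers such a fixed-point port relies on (the Q31 format is defined in footnote 3); the saturating behavior shown here is one possible policy, and the scaling of intermediate values is application-specific and not shown.

#include <stdint.h>

typedef int32_t q31_t;               /* value = integer / 2^31, range [-1, 1) */

static q31_t q31_from_float(float x) /* saturating conversion to Q31 */
{
    if (x >=  1.0f) return INT32_MAX;
    if (x <  -1.0f) return INT32_MIN;
    return (q31_t)(x * 2147483648.0f);
}

static q31_t q31_mul(q31_t a, q31_t b)   /* (a*b) >> 31 with 64-bit intermediate */
{
    return (q31_t)(((int64_t)a * b) >> 31);
}

static q31_t q31_add_sat(q31_t a, q31_t b)   /* saturating addition */
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (q31_t)s;
}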

Further analysis of the resulting fixed-point version of the algorithm reveals that the square root function, which is executed four times per iteration, consumes many cycles. Replacing it with the square root function from the library of DSP functions, which uses the square root hardware accelerator, further reduces the cycle count.

A final optimization reduces the cycle count of the Euler angle computation by applying a hardware accelerator for the inverse sine function. As the inverse tangent function can be expressed in terms of inverse sine and square root operations, the hardware accelerators also pay off here.
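For reference, one way to express atan2 using only asin and a square root is sketched below (in floating point for readability; a fixed-point variant can follow the same structure and map onto the asin and square-root accelerators).

#include <math.h>

/* atan2 rewritten in terms of asin and sqrt, matching the idea in the text. */
static float atan2_via_asin(float y, float x)
{
    const float pi = 3.14159265358979f;
    float r = sqrtf(x * x + y * y);
    float a;
    if (r == 0.0f)
        return 0.0f;                 /* undefined input; pick a convention */
    a = asinf(y / r);                /* angle in [-pi/2, pi/2] */
    if (x >= 0.0f)
        return a;                    /* right half-plane: asin is already correct */
    return (y >= 0.0f) ? (pi - a) : (-pi - a);   /* reflect into left half-plane */
}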


In summary, the cycle count of the 9D fusion algorithm has been reduced by implementing a fixed-point version of the algorithm that effectively exploits the available hardware accelerators. Reducing the cycle count lowers the energy, as the energy used to execute the algorithm is proportional to the required processor performance.

Results

Table 1 highlights the cycle count numbers of the 9D fusion algorithm, executed on three platforms: a platform based on the DesignWare Sensor IP Subsystem, containing an ARC EM4 core and APEX accelerators, and two platforms based on well-known commercial processors, referred to as Processor A and Processor B.

Based on the cycle count, the required processor performance in MHz is computed for a 3D orientation application that samples its motion sensors at a rate of 50 Hz. Multiplying this frequency by the µW/MHz figure for each processor core yields the total energy for one second of operation.
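For example, the fixed-point fusion function on the Sensor IP Subsystem takes about 1.7k cycles per sample; at 50 samples per second this amounts to roughly 1.7k × 50 = 85,000 cycles per second, i.e. the 0.09 MHz listed in Table 1.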

These energy numbers relate to the processor core only. The topic of reducing the energy consumed by memories, peripherals, and hardware infrastructure (buses, bridges, etc.) is addressed in the white paper entitled "Building an Efficient, Tightly-Coupled Embedded System Using an Extensible Processor" [4].

The table shows that the Sensor IP Subsystem outperforms Processors A and B in energy consumption by a factor of 6-25.

Table 1: Energy consumption results

Platform            | 9D fusion function | Cycle count [#] | Required frequency @ 50 Hz [MHz] | Energy (50 samples), 40LP HS SVT [uJ] | Ratio vs. optimized Sensor IP Subsystem
A                   | Floating point     | 62.1k           | 3.11                             | 21.5                                  | 24.9x
A                   | Fixed point        | 24.7k           | 1.23                             | 8.54                                  | 9.9x
B                   | Floating point     | 23.0k           | 1.15                             | 13.3                                  | 15.4x
B                   | Fixed point        | 9.1k            | 0.46                             | 5.3                                   | 6.1x
Sensor IP Subsystem | Fixed point        | 1.7k            | 0.09                             | 0.86                                  | -

Note: Energy consumption numbers for Processors A and B are derived from publicly available power specifications; energy consumption numbers for the ARC EM4 are derived from netlist power measurements.

Conclusion

This paper describes the need for sensor fusion when implementing applications like 3D orientation. The basic operation of a 9D sensor fusion algorithm is shown, including an outline of a C implementation. The cycle count of the algorithm is reduced in a number of optimization steps. Converting the floating-point version of the algorithm into a fixed-point version enables the use of a single-cycle integer multiplier and, more importantly, opens the way for further optimization by applying hardware accelerators. The DesignWare Sensor IP Subsystem, built around an ARC EM4 processor with a number of user-configurable APEX extensions, can reduce energy consumption by a factor of 6-25 compared to other commercial processors.

Figure 4: Demonstration kit for 9D sensor fusion: on the left-hand side, an ARC EM Starter Kit (with the Sensor IP Subsystem mapped onto an FPGA) interfacing to three motion sensors; 9D fusion is executed on the embedded core and the orientation is sent to the tablet to be displayed


06/14.AP.CS4302.

Synopsys, Inc. • 700 East Middlefield Road • Mountain View, CA 94043 • www.synopsys.com

©2014 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is available at http://www.synopsys.com/copyright.html . All other names mentioned herein are trademarks or registered trademarks of their respective owners.

References

[1] "An efficient orientation filter for inertial and inertial/magnetic sensor arrays", Sebastian O.H. Madgwick, April 30, 2010 (http://www.x-io.co.uk/res/doc/madgwick_internal_report.pdf)

[2] "DesignWare Sensor IP Subsystem", Synopsys, Inc. (http://www.synopsys.com/dw/ipdir.php?ds=sensor_subsystem)

[3] "DesignWare ARC EM Processor Core Family", Synopsys, Inc. (http://www.synopsys.com/IP/ProcessorIP/ARCProcessors/ARCEM)

[4] "Building an Efficient, Tightly-Coupled Embedded System Using an Extensible Processor", Jeroen Geuzebroek, white paper, Synopsys, Inc., June 2014 (https://www.synopsys.com/dw/doc.php/wp/building_an_efficient_tightly_coupled_embedded_system_using_an_extensible_processor.pdf)

[5] "Leveraging Processor Extensibility to Build an Ultra Low-Power Embedded Subsystem", Jeroen Geuzebroek, white paper, Synopsys, Inc., March 2014 (https://www.synopsys.com/dw/doc.php/wp/leveraging_processor_extensibility.pdf)