
ARMv8: Advantages for Android

Ian Rickards

2

Contents

Why 64-bit in mobile?

64-bit support in Android™ Lollipop

Where next?

Summary

3

The ARM® 64-bit Architecture : ARMv8-A

Full native 32-bit execution, side-by-side with 64-bit

New, modern A64 instruction set architecture (ISA)

Double the number (and size) of registers

New instructions for both A32 and A64

[Diagram: ARMv8-A with two execution states, AArch32 (T32 + A32 ISA) and AArch64 (A64 ISA), with Crypto, Advanced SIMD and Scalar FP shown above both.]

4

Why 64-bit in mobile?

Performance through architecture

Cleaner instruction set architecture

Hard-float ABI by default in ARMv8-A

More registers, less stack spillage

Cheaper function calls

Up to 16x crypto acceleration

Preparation for larger memory devices

5

64-bit support in Android Lollipop

64-bit support for ARM®

32-bit & 64-bit apps exist in the same build

Also introduces the ART runtime

Source: http://www.android.com/

6

What does L mean for developers?

Pure Java apps get the ARMv8-A benefit for free via ART

32-bit NDK apps run without change, and at full performance

Rebuild NDK code with APP_ABI="arm64-v8a" to take full advantage of A64

Interworking rules mean Java apps run as 32-bit if they call 32-bit NDK code

7

What is ART?

ART is a replacement for Dalvik

Ahead-of-time (AOT) compilation, i.e. at install, instead of Dalvik's just-in-time (JIT) compilation

Redesigned to be better on multi-core systems

Fits well with big.LITTLE™ technology

[Chart: Quadrant CPU and Linpack MT results for Dalvik and ART, relative to Dalvik JIT (axis 0% to 200%). Measured on Nexus 7 with Dalvik/ART Preview on 4.4.]

8

ART on ARMv8-A: performance features

Utilizes the modern A64 ISA for 64-bit apps

Single-cycle instructions for Java long & double types

Uses hard-float ABI

32-bit object references: no 64-bit pointer penalty

Rocket by Luis Prado from the Noun Project

9

Considerations for native developers

Porting C code to 64-bit is the same as for any other architecture

Review your feature detection code when moving to 64-bit

Assembly code needs to be ported to the more efficient A64 ISA

NEON™ code can simply be recompiled if written using compiler intrinsics


10

NEON Intrinsics

Include intrinsics header file (ACLE standard)

#include <arm_neon.h>

Use special NEON data types which correspond to D and Q registers, e.g.

int8x8_t     D-register    8x 8-bit values
int16x4_t    D-register    4x 16-bit values
int32x4_t    Q-register    4x 32-bit values

Use NEON intrinsics versions of instructions

vin1 = vld1q_s32(ptr);

vout = vaddq_s32(vin1, vin2);

vst1q_s32(ptr, vout);

Strongly typed!

Use vreinterpret_s16_s32( ) to change the type

static inline void Filter_32_opaque_neon(unsigned x, unsigned y,
                                         SkPMColor a00, SkPMColor a01,
                                         SkPMColor a10, SkPMColor a11,
                                         SkPMColor *dst) {
    uint8x8_t vy, vconst16_8, v16_y, vres;
    uint16x4_t vx, vconst16_16, v16_x, tmp;
    uint32x2_t va0, va1;
    uint16x8_t tmp1, tmp2;

    vy = vdup_n_u8(y);                // duplicate y into vy
    vconst16_8 = vmov_n_u8(16);       // set up constant in vconst16_8
    v16_y = vsub_u8(vconst16_8, vy);  // v16_y = 16-y

    va0 = vdup_n_u32(a00);            // duplicate a00
    va1 = vdup_n_u32(a10);            // duplicate a10
    va0 = vset_lane_u32(a01, va0, 1); // set top to a01
    va1 = vset_lane_u32(a11, va1, 1); // set top to a11

    tmp1 = vmull_u8(vreinterpret_u8_u32(va0), v16_y); // tmp1 = [a01|a00] * (16-y)
    tmp2 = vmull_u8(vreinterpret_u8_u32(va1), vy);    // tmp2 = [a11|a10] * y

Fully compatible with AArch64

11

Compatibility

C/intrinsics will port with no effort

Asm requires reworking of .s file (mostly cosmetic, but can take advantage of additional registers)

AArch64 NEON optimization in progress

ARM & Linaro working on key Android libraries using intrinsics

ffmpeg AArch64 NEON decoders (asm)

x264 AArch64 NEON encoder (asm)

AArch64 NEON coding technique    Compatible?
Vectorized "C"                   Fully compatible
Intrinsics ("arm_neon.h")        Fully compatible
Asm (.s)                         Some porting required
Library routines                 Yes, if library available

12

Performance – Native

[Charts: AArch64 improvement over AArch32 (axis 0% to 30%). First chart: AnTuTu 32/64bit CPU Test v5.0, Single Thread and Multithreaded. Second chart: bionic-benchmarks (bionic). Measured on Juno (2x Cortex-A57, 4x Cortex-A53).]

13

Performance – ART

[Charts: AArch64 improvement over AArch32 (axis 0% to 30%). First chart: Quadrant 2.0, CPU Score. Second chart: Linpack, Multi-threaded. Measured on Juno (2x Cortex-A57, 4x Cortex-A53).]

14

Want to know more?

Join us in the Android group at Connected Community!

http://community.arm.com/groups/android-community

ARMv8-A Porting Guide:

http://community.arm.com/docs/DOC-8453

Taming ARMv8-A NEON: from theory to benchmark results

http://youtu.be/ixuDntaSnHI?list=UUIVqQKxCyQLJS6xvSmfndLA

Porting & optimizing for 64-bit, a compiler perspective

http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf

https://www.youtube.com/watch?v=epzYErIIx0Y

An OS X perspective of the 32-bit to 64-bit transition

https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/64bitPorting/intro/intro.html

15

Summary

The ARMv8-A architecture makes the difference for mobile and 64-bit

Android Lollipop provides multi-arch support, enabling both 32-bit and 64-bit applications

Performance gains for those taking advantage of the ARMv8-A architecture

Come join us at Connected Community

16

Thank You

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners.

The Android robot is reproduced or modified from work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.

Google Play is a trademark of Google Inc.