Transcript
Page 1: LCU14 303- Toolchain Collaboration

LCU14 BURLINGAME

Ryan Arnold, LCU14

LCU14-303: Toolchain Collaboration

Page 2: LCU14 303- Toolchain Collaboration

● Participants
  ● Linaro
  ● ARM
  ● QuIC
  ● Cavium
  ● ST

● Topics
  ● Participant Introductions and Development Focus
  ● GNU Toolchain Roadmaps
  ● GNU Toolchain Specifics
  ● LLVM Roadmaps
  ● LLVM Specifics
  ● System Libraries, Linkers, Debuggers, and Tools

Toolchain Collaboration For The Next 6 Months

Page 3: LCU14 303- Toolchain Collaboration

● Representation
  ● Ryan Arnold - Engineering Manager
  ● Maxim Kuvyrkov - Tech Lead
  ● Team - 6 Linaro employees and 6 member assignees
    ● Kugan Vivekanandarajah, Venkataramanan Kumar, Bernie Ogden, Omair Javaid, Will Newton, Rob Savoye, Michael Collison, Christophe Lyon, Charles Baylis, Yvan Roux, Renato Golin, Wang Deqiang

● Purpose
  ● Improve Collaboration
  ● Eliminate Roadmap Redundancy
  ● Identify gaps in ecosystem

Linaro - Introduction & Purpose

Page 4: LCU14 303- Toolchain Collaboration

● Product Validation Framework Improvements
  ● Backport, Release, and Binary Toolchain validation automation and reporting
● Toolchain Performance
  ● GCC and LLVM Performance
● Benchmark Automation
  ● Backport, Release, and Binary Toolchain benchmark automation and reporting
● Product offering expansions in 2015
  ● x86_64 hosted cross toolchains
  ● AArch32 targeted cross toolchains
  ● ARMv7 and ARMv8 hosted toolchains

Linaro - Focus from LCA14 into 2015

Page 5: LCU14 303- Toolchain Collaboration

PUBLIC

Open Source Core Toolchains
The Next Six Months

Matthew Gretton-Dann
August 2014

Page 6: LCU14 303- Toolchain Collaboration

▪ Tell you what ARM plans to work on, and what its current priorities are
  ▪ However, things are likely to change – so:
  ▪ We do not promise to achieve all of this in the next six months; nor
  ▪ Do we promise not to do other work
▪ If your plans include the same topics, or work in the same areas
  ▪ Come and talk to us – we should work together
  ▪ Preferably this conversation should happen in the appropriate upstream communities.
▪ If you feel that we’re doing the wrong thing
  ▪ Come and talk to us – we’re happy to work out a better way forward
▪ We are moving to tracking all our ‘public’ work in the appropriate community Bugzilla databases.
  ▪ This is the best place to have the conversation about best ways forward.

Purpose of this Presentation

Page 7: LCU14 303- Toolchain Collaboration

▪ Support the Architecture & Cores
  ▪ Teams are involved in development of new cores and architecture extensions
  ▪ We will not discuss those here
  ▪ However, we plan to upstream functionality as soon as possible after public announcements
▪ Support the Community
▪ Improve Performance:
  ▪ Focus on Cortex-A57 performance improvement
  ▪ Focus on a range of benchmarks, including industry standard CPU benchmarks.
  ▪ We analyze benchmarks both:
    ◦ for improvements we can make to the toolchains; and
    ◦ to note any regressions and get them fixed in co-operation with the community

Overview of Goals for Year

Page 8: LCU14 303- Toolchain Collaboration

ST - Introduction & Purpose

Page 9: LCU14 303- Toolchain Collaboration

ST - Focus from LCU14 into 2015

Page 10: LCU14 303- Toolchain Collaboration

QuIC - Introduction & Purpose

Page 11: LCU14 303- Toolchain Collaboration

QuIC - Focus from LCU14 into 2015

Page 12: LCU14 303- Toolchain Collaboration

● Supports GNU based ThunderX toolchain internally (and other Cavium products)
● Make sure that GCC performance areas are covered but not twice
● Implemented ILP32 support in the kernel and glibc, and parts of the gcc and binutils support
● Helped in getting some performance improvements for AArch64 already
  ○ Naveen implemented many patterns in the back-end for the instructions which were not being emitted
  ○ Andrew helped with part of conditional compares; improving ifcombine
  ○ Added issue rate to the AArch64 cost table
  ○ Added trap pattern so abort function is not used for __builtin_trap
  ○ Removed some redundant cmp’s
● Added many new testcases

Cavium - Introduction

Page 13: LCU14 303- Toolchain Collaboration

● Finish upstreaming ILP32 support
  ○ Including gdb and glibc support
  ○ glibc patch is almost done, just finalizing the patch set
● Upstream base ThunderX support
  ○ Will not include a schedule model to begin with
● Upstreaming patches for GCC 6 stage 1
  ○ Conditional moves improvements
  ○ Improvements to conditional compares
  ○ Large system extension support in GCC
    ■ Joel posted an infrastructure change that was rejected; might need to rewrite it
  ○ LSE HWCAP support in glibc and kernel
    ■ Need to know what path is acceptable for glibc
  ○ Some tweaks to the cost tables in AArch64; needed for ThunderX support
● Looking into prefetch loop arrays

Cavium - Focus from LCU14 into 2015
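
The ILP32 work on these Cavium slides refers to the AArch64 data model in which int, long, and pointers are all 32 bits wide, as opposed to LP64 where long and pointers are 64 bits. A minimal sketch of how that changes structure layout is given here. The `struct_size` helper, the simplified "align each field to its own size" rule, and the example `NODE` field list are illustrative assumptions and not part of any toolchain.

```python
# Type sizes in bytes for the two AArch64 data models.
# ILP32: int, long and pointers are 32-bit; LP64: long and pointers are 64-bit.
DATA_MODELS = {
    "LP64": {"char": 1, "short": 2, "int": 4, "long": 8, "pointer": 8},
    "ILP32": {"char": 1, "short": 2, "int": 4, "long": 4, "pointer": 4},
}


def struct_size(fields, model):
    """Size of a C-like struct, assuming each field is aligned to its own
    size and the struct is padded to its largest field alignment."""
    sizes = DATA_MODELS[model]
    offset = 0
    max_align = 1
    for field_type in fields:
        size = sizes[field_type]
        # Round the offset up to the next multiple of the field's alignment.
        offset = (offset + size - 1) // size * size
        offset += size
        max_align = max(max_align, size)
    # Tail padding so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


# Hypothetical node: struct node { struct node *next; int value; long count; };
NODE = ["pointer", "int", "long"]

if __name__ == "__main__":
    for model in DATA_MODELS:
        print(model, struct_size(NODE, model))
```

Under these assumptions the same node takes 24 bytes in LP64 and 12 bytes in ILP32, which illustrates the smaller memory footprint that pointer-heavy code can gain from ILP32.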

Page 14: LCU14 303- Toolchain Collaboration

GNU Toolchain Collaboration

Page 16: LCU14 303- Toolchain Collaboration

● Continue Member Driven Optimizations; current examples in development:
  ● Zero/sign-extension elimination using value-range propagation
  ● NEON intrinsics improvements in Libvpx on ARM & AArch64
  ● STREAMS performance improvements
● Identify Linaro Toolchain product driven optimizations
  ● Benchmarking Linaro toolchain products
  ● Identifying Regressions
  ● Improving performance based on investigations
● Performance Comparisons
  ● Identify potential optimizations based on performance gains seen on other architectures.
● Future
  ● Whole System Profiling & Workload Profiling
  ● Feature exploitation
    ● LTO for AArch64

Linaro - What’s Next for GCC Performance?
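
The idea behind extension elimination with value-range propagation can be sketched as follows: when a narrow value lives in a wider register, the compiler normally re-extends it after arithmetic, but if the known value range already fits the narrow type, that extension changes nothing and can be dropped. The function names and the range representation below are assumptions for illustration only; this is a toy model, not GCC's implementation.

```python
def zero_extend(value, bits):
    """Keep only the low `bits` bits, as a zero extension would."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def extension_is_redundant(lo, hi, bits, signed):
    """True when every value in the inclusive range [lo, hi] already fits
    the narrow type, so extending it would leave it unchanged."""
    if signed:
        return -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1
    return 0 <= lo and hi <= (1 << bits) - 1


if __name__ == "__main__":
    # An 8-bit unsigned value known to be in [0, 100], plus 50, is in [50, 150].
    print(extension_is_redundant(50, 150, 8, signed=False))
    # With no range information the sum may reach 256 and wrap around.
    print(extension_is_redundant(1, 256, 8, signed=False))
```

The first query reports the extension as redundant, while the second does not, because 256 no longer fits in eight bits and the extension is what produces the wrapped value.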

Page 17: LCU14 303- Toolchain Collaboration

● Improve NEON testing coverage and correctness
● GCC community stewardship
  ● bug triage
  ● patch review
● Unified Driver Development
● LLVM Community Releases

Linaro - Community Involvement

Page 18: LCU14 303- Toolchain Collaboration

● Improve validation of Linaro GCC source package backports
  ● Improve automation
  ● Add default configurations validated per backport: 8 17
● Provide expansive source release validation of existing products
  ● all default configurations
  ● all enabled secondary configurations
  ● all supported languages
  ● various tunings
● Offer new products
  ● arm and aarch64 native binary toolchains
  ● x86_64 hosted cross toolchains
  ● AArch32 targeted cross toolchains

Linaro - What’s Next for Product Offerings?

Page 19: LCU14 303- Toolchain Collaboration

● Release Candidate Benchmarking
  ○ Current Release Benchmarking
    ■ Manual SPEC2K (looking for release regressions)
  ○ Future Release Benchmarking
    ■ Automated SPEC2K, SPEC2K6, EEMBC Suite
● Backport Validation Benchmarking
  ○ Current Backport Benchmarking
    ■ None
  ○ Future Backport Benchmarking
    ■ Automated Coremark in development
● Reporting - uploading permitted relative results to members only portal
● Why does Linaro do benchmarking?
  ○ Guides future development
  ○ Informs validity of patches in development
    ■ Current Development Benchmarking
      ● as-needed: Coremark, SPEC2K, SPEC2K6

Linaro - What’s next for Benchmarking?
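
Reporting relative results and looking for release regressions could be automated roughly as sketched here. The benchmark names, the scores, the 2% threshold, and both function names are hypothetical placeholders; the slides do not describe Linaro's actual tooling.

```python
def relative_results(baseline, candidate):
    """Candidate/baseline score ratio for every benchmark present in both runs.
    Assumes a higher score is better."""
    return {
        name: candidate[name] / baseline[name]
        for name in baseline
        if name in candidate
    }


def regressions(relative, threshold=0.98):
    """Names of benchmarks whose relative score fell below the threshold."""
    return sorted(name for name, ratio in relative.items() if ratio < threshold)


# Hypothetical scores for a previous release and a release candidate.
BASELINE = {"bench_a": 1000.0, "bench_b": 800.0, "bench_c": 500.0}
CANDIDATE = {"bench_a": 1010.0, "bench_b": 760.0}

if __name__ == "__main__":
    rel = relative_results(BASELINE, CANDIDATE)
    for name in sorted(rel):
        print(f"{name}: {rel[name]:.3f}")
    print("regressions:", regressions(rel))
```

Publishing only the ratios, rather than the raw scores, is one way to share results where absolute numbers may not be redistributed.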

Page 20: LCU14 303- Toolchain Collaboration

GNU Roadmap : Cortex-A

[Figure: roadmap chart. Lanes: Mobile, Enterprise, Common. Timeline: 2014, H1 2015, Future. Status legend: Released, Development, Adv. Planning, Concept.]

Chart items:
▪ GCC 4.9
▪ GCC 4.10 / 5.0
▪ ARMv8 A32 - ISA extension
▪ Cortex-A12 - Arch support
▪ A64 toolchain production ready - GCC 4.9
▪ Cortex-A12/A17 - uArch tuning, cost model
▪ A64 performance gains
▪ ACLE 64 - Specification
▪ Cortex-A57 - uArch tuning, cost model
▪ ILP32 - User space & production
▪ ACLE 64 - Implementation
▪ Big Endian - AArch64 auto-vectorization
▪ Maths libraries
▪ A64 GOLD
▪ Big Endian – Basic AArch64 support
▪ A7/A15 A32 big.LITTLE
▪ Performance optimization - CPU-centric performance enhancements
▪ Toolchain features - Continuous ecosystem contribution for performance and features, NEON intrinsics

Page 21: LCU14 303- Toolchain Collaboration

▪ Reworked AArch64 RTX costs
▪ Improved Neon intrinsics code generation
▪ PUSH_ARGS_REVERSED improvements.
▪ GLIBC math library improvements for AArch32 and AArch64
▪ Improved code generation for copysign intrinsic
▪ Improved choice of spill size for FP registers (decreasing memory bandwidth)
▪ Restructured and improved prologue/epilogue sequences – especially with -fomit-frame-pointer.
▪ Improved addressing modes for vectors on AArch64
▪ Improve AArch32 memset inlining

What We’ve Done In the Past Three Months or So
GNU Toolchain

Page 22: LCU14 303- Toolchain Collaboration

▪ General bug fixes and maintenance
▪ Enable shrink-wrapping for AArch64 (GNUTOOLS-2476)
▪ Investigate and initial RFC for better load store pair generation (GNUTOOLS-154)
▪ Improved bit field handling instructions (GNUTOOLS-197)
▪ Big Endian AArch64 fixes (Focused on SIMD and vectorisation correctness)
▪ Improved Register move costs (GNUTOOLS-4528)
▪ Misc performance improvements based on scheduler / backend tweaks (GNUTOOLS-4317, GNUTOOLS-4508)
▪ Improved csinc / csneg generation (GNUTOOLS-4335)
▪ Conditional compares
▪ Core tuning: Cortex-A57, Cortex-A12 and Cortex-A17
▪ IVOpts improvements
▪ Memcpy for AArch64 – inlining and improved alignment

What’s Next
GCC – Things to do before Stage 1 closes (mid-October 2014)
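
For readers unfamiliar with the csinc / csneg instructions named on this slide, their semantics can be modelled as below: CSINC selects its first operand when the condition holds and otherwise the second operand plus one, while CSNEG selects the first operand or the negated second operand. The Python helpers are an illustrative model of 64-bit register behaviour; the names `abs_via_csneg` and `bool_via_csinc` are made up for this sketch and say nothing about how GCC chooses these instructions.

```python
MASK64 = (1 << 64) - 1


def csinc(cond, xn, xm):
    """CSINC Xd, Xn, Xm, cond: Xn if cond holds, else Xm + 1 (mod 2**64)."""
    return xn & MASK64 if cond else (xm + 1) & MASK64


def csneg(cond, xn, xm):
    """CSNEG Xd, Xn, Xm, cond: Xn if cond holds, else -Xm (mod 2**64)."""
    return xn & MASK64 if cond else (-xm) & MASK64


def to_signed(x):
    """Reinterpret a 64-bit register value as a signed integer."""
    return x - (1 << 64) if x & (1 << 63) else x


def abs_via_csneg(x):
    """abs(x) as a compare plus one CSNEG: keep x when x >= 0, else negate."""
    return to_signed(csneg(x >= 0, x, x))


def bool_via_csinc(cond):
    """Materialise a condition as 0 or 1: CSINC of the zero register with
    itself under the inverted condition, which is how the CSET alias works."""
    return csinc(not cond, 0, 0)


if __name__ == "__main__":
    print(abs_via_csneg(-5), abs_via_csneg(7))
    print(bool_via_csinc(True), bool_via_csinc(False))
```

Patterns like these let a short branch be replaced by a single conditional-select instruction, which is the kind of code the improved generation aims at.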

Page 23: LCU14 303- Toolchain Collaboration

▪ Stage 3
  ▪ Bug fixes/Regression fixes.
  ▪ Improved conformance and performance for Advanced SIMD Intrinsics.
▪ Stage 4
  ▪ Regression fixes.
  ▪ Help community get GCC 5.0 released.

What’s Next
GCC – During Stage 3 (October – December 2014) and Stage 4 (Early 2015)

Page 24: LCU14 303- Toolchain Collaboration

▪ Maintenance
▪ Support the architectural roadmap
▪ Help community get Binutils 2.25 released.

What’s Next
Binutils & GDB

Page 25: LCU14 303- Toolchain Collaboration

QuIC - GNU Toolchain Roadmap

Page 26: LCU14 303- Toolchain Collaboration

QuIC - GNU Toolchain Details

Page 27: LCU14 303- Toolchain Collaboration

ST - GNU Toolchain Roadmap

Page 28: LCU14 303- Toolchain Collaboration

ST - GNU Toolchain Details

Page 29: LCU14 303- Toolchain Collaboration

LLVM Collaboration

Page 31: LCU14 303- Toolchain Collaboration

● Become the compiler of choice for all Qualcomm processor cores
  ● Today LLVM is the compiler of choice for DSP and GPU
  ● Would like to see LLVM reach that level of acceptance for CPU before the end of 2015
● Realize the full benefits of code hygiene on ARM from LLVM’s family of projects, i.e., sanitizers.

QuIC - Goals for LLVM

Page 32: LCU14 303- Toolchain Collaboration

● Collaborated with ARM on initial AArch64 backend
● Worked with the community on the ARM64/AArch64 merge
● Cortex-A53 machine description
● Cortex-A57 machine description
● Contributed initial AArch64 ELF support to lld
● ASAN bug fixes

QuIC - What has QuIC done with LLVM

Page 33: LCU14 303- Toolchain Collaboration

● Continue weekly collaboration with ARM on performance optimizations, particularly AArch64.
● Greedy inliner
● PGO
● Incremental use of sanitizers

QuIC - What QuIC will be working on

Page 34: LCU14 303- Toolchain Collaboration

● Community Maintainership
  ● LLVM 3.5 and LLVM 3.6 release maintainership
● Support
  ● LLVM Kernel initiative, Android bugs, buildbots, member support
● LLVM Toolchain Stability
  ● Assembler, compiler libraries, linker, tools, libc++
● LLVM Performance
  ● Benchmarking & Profiling
  ● Comparing against GCC/x86
  ● Performance parity of 32-bit vs. 64-bit
● Sanitizers - might be covered under GCC development plan
● LLVM Linker
● LLVM Integration on Android for AArch64

Linaro - What’s Next For LLVM in Linaro?

current staff coverage line

Page 35: LCU14 303- Toolchain Collaboration

LLVM Roadmap : Cortex-A

[Figure: roadmap chart. Lanes: Mobile, Enterprise, Common. Timeline: 2014, H1 2015, Future. Status legend: Released, Development, Adv. Planning, Concept.]

Chart items:
▪ LLVM 3.4
▪ LLVM 3.5
▪ LLVM 3.6
▪ v8 NEON - AArch64
▪ Big Endian - Basic
▪ Benchmarking infrastructure – Public performance tracking buildbot
▪ libc++ buildbot
▪ Initial Autovectorization
▪ AArch32 buildbot
▪ Cortex-A53 - uArch tuning
▪ AArch64 and Cortex-A57 - Performance tuning
▪ ARM64 / AArch64 backend merge

Page 36: LCU14 303- Toolchain Collaboration

▪ Completion of the ARM64 and AArch64 backend merge
▪ Performance improvements:
  ▪ Improve code generation for converting in-memory 16-bit integer to 64-bit float (LLVM-1508)
  ▪ Optimistically use ‘sqrt’ instruction where available, and only fall back to a library call in the presence of NaNs (LLVM-1509)
  ▪ Reduce spilling of Q registers (LLVM-1538)
  ▪ Improve code selection between conditional instructions and branches. (LLVM-1489)
  ▪ A57 Fused multiply tuning (LLVM-1610)
  ▪ Improve Global Value Numbering (LLVM-1612)
▪ Re-engineering of ARM Neon intrinsic support
▪ Big Endian Support - AArch32 & initial AArch64 support
▪ Stack size reduction patches – some work still to do.

What We’ve Done In the Past Three Months or So
LLVM Toolchain
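
The optimistic sqrt item on this slide can be pictured with the following sketch: compute the result with the fast instruction first, and call the library routine only when that result is NaN, since that is the case where the library may have extra work to do such as reporting a domain error. Here `hw_sqrt` and `lib_sqrt` are stand-ins written for this illustration, and the `library_calls` list exists only to make the slow path observable; none of this is LLVM code.

```python
import math

library_calls = []  # inputs that ended up on the slow library path


def hw_sqrt(x):
    """Stand-in for a hardware square-root instruction: yields NaN for
    negative or NaN input and has no other side effects."""
    if math.isnan(x) or x < 0.0:
        return math.nan
    return math.sqrt(x)


def lib_sqrt(x):
    """Stand-in for the C library sqrt(), which may additionally report
    a domain error; here it only records that it was called."""
    library_calls.append(x)
    return hw_sqrt(x)


def optimistic_sqrt(x):
    """Use the fast path first; fall back only when the result is NaN."""
    result = hw_sqrt(x)
    if result != result:  # NaN is the only value unequal to itself
        return lib_sqrt(x)
    return result
```

In this model `optimistic_sqrt(9.0)` returns 3.0 without touching the library path, while `optimistic_sqrt(-1.0)` returns NaN after one recorded library call.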

Page 37: LCU14 303- Toolchain Collaboration

▪ Inline parameter tuning (LLVM-1500)
▪ Improve spilling heuristics (LLVM-1524, LLVM-1504, LLVM-1586)
▪ Common expression hoisting (LLVM-1247, LLVM-1490, LLVM-1550)
▪ TBNZ and CBNZ optimization (LLVM-1575)
▪ Register coalesce and rematerialization (LLVM-1582)
▪ Redundant common comparison expressions (LLVM-1491)
▪ Loop induction variable selection (LLVM-1492)
▪ Remove redundant stores
▪ Improved usage of vectorization opportunities using structs (LLVM-1501)
▪ Reduce xzr assignment on cbz target (LLVM-1583)

What’s Next
LLVM – Performance

Page 38: LCU14 303- Toolchain Collaboration

▪ Global variable store should be hoisted (LLVM-1493)
▪ Too many MOVs on function call boundaries (LLVM-1504)
▪ Optimise LDR, LDRSW sequence into LDR, SXTW (LLVM-1581)
▪ Tune loop unrolling (LLVM-1587, LLVM-1590, LLVM-1646)

What’s Next
LLVM – Performance

Page 39: LCU14 303- Toolchain Collaboration

▪ Buildbots & benchmarking infrastructure
  ▪ Plan to set up a public performance tracking bot on Juno-A57
  ▪ To be publicly visible, maintained, and continuously producing performance numbers
  ▪ Running the LLVM LNT test-suite as a benchmark
▪ Various bug fixes and improvements
  ▪ Focus on ARMv8-A, ARMv7, and ARMv6-M.
▪ Support for selected ACLE (non Neon) intrinsics

What’s Next
LLVM - Other

Page 40: LCU14 303- Toolchain Collaboration

ST - LLVM Toolchain Roadmap

Page 41: LCU14 303- Toolchain Collaboration

ST - LLVM Toolchain Details

Page 42: LCU14 303- Toolchain Collaboration

System Libraries, Tools, Debuggers Collaboration

Page 43: LCU14 303- Toolchain Collaboration

● System Libraries
  ● malloc benchmarking
  ● malloc improvements
  ● string and memory function optimizations for arm-linux-gnueabihf
  ● Linaro GDB and glibc source package releases with backported optimizations
● GDB
  ● Finish GDB on Android for ARMv8 support - catchpoints
  ● AArch32/AArch64 completeness - test-suite parity
  ● AArch32 mixed-mode debugging (thumb and arm modes)

Linaro - What’s next for system libs & dev tools?

Page 44: LCU14 303- Toolchain Collaboration

▪ String routine improvements
▪ Maintenance activities.
▪ Help community get 2.21 released.

What’s Next
Glibc – up to 2.21 release (end of 2014)

Page 45: LCU14 303- Toolchain Collaboration

▪ Linkers: LLD & Gold
▪ Libc++
▪ Sanitizers
▪ ILP32

What We Are Not Currently Doing
But Are Interested In…

Page 46: LCU14 303- Toolchain Collaboration

QuIC - Libraries, Linkers, Debuggers, Tools

Page 47: LCU14 303- Toolchain Collaboration

ST - Libraries, Linkers, Debuggers, Tools

Page 48: LCU14 303- Toolchain Collaboration

More about Linaro Connect: connect.linaro.org
Linaro members: www.linaro.org/members
More about Linaro: www.linaro.org/about/