Tizen Architectural Specification: Crash Reporting TIZEN ADS 0000 Ver. 0.4 2013-11-18 Leonid Moiseichuk


Tizen Architectural Specification: Crash Reporting

TIZEN ADS 0000
Ver. 0.4
2013-11-18
Leonid Moiseichuk


Contents

• Introduction
• Legacy solutions
• Architecture
• Detailed Architecture
• Appendix


Introduction

Crash reporting for embedded systems differs from crash reporting on desktops and servers in several ways, mainly because of resource limitations and the number of devices.

For example, we cannot keep debugging symbols installed because they consume several hundred megabytes of space. Thus we cannot get a backtrace on the device.

On the other hand, centralized, secure storage of crash information opens new opportunities for cross-component analysis that identifies the most important issues to fix first, often across several products if the server side supports it.

This presentation shows how we can improve the existing solution by extending it towards a server-based approach.


Copyright © 2012 Samsung Electronics, Co., Ltd. All rights reserved.

Introduction | Feature Overview

• Easy crash collection on release images: just install the Crash Reporter packages. They may also be installed permanently in all images, but in a disabled state.

• Kernel and application crashes/oopses will be collected, as well as any kind of device runtime information, so coverage of crash reasons will be close to 100%.

• No symbols are required, but they may be installed if a developer needs to analyze traces on the device, e.g. for security reasons.

• All available information will be collected at the moment of the crash, which will simplify later analysis by the developer fixing the issue.

• Centralized processing makes it possible to identify the most critical or most frequent issues and to verify an integrated fix using statistics from the device population, i.e. the absence or reduction of new crashes with the same backtrace.

• Integration with test cases (auto-upload), JIRA and probably source indexing services (we have used Mozilla MXR) will greatly reduce the effort of reporting, identifying, prioritizing, fixing and verifying issues.

• Secure delivery of dumps/crashes from the device to collection servers, depending on the dump type and application.
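
The idea of ranking issues by how often the same backtrace appears across the device population could be sketched roughly as follows. This is only an illustration: the report layout (a `frames` list per report), the choice of hashing the top five frames, and the names `backtrace_signature` and `rank_crashes` are assumptions, not part of the proposal.

```python
import hashlib
from collections import Counter


def backtrace_signature(frames, depth=5):
    """Hash the top `depth` frames so identical crashes share one signature.

    The depth of 5 is an arbitrary assumption for this sketch.
    """
    top = "\n".join(frames[:depth])
    return hashlib.sha1(top.encode("utf-8")).hexdigest()[:12]


def rank_crashes(reports):
    """Return (signature, count) pairs, most frequent signature first."""
    counts = Counter(backtrace_signature(r["frames"]) for r in reports)
    return counts.most_common()


# Hypothetical reports already symbolized on the server side.
reports = [
    {"device": "A", "frames": ["memcpy", "parse_header", "main"]},
    {"device": "B", "frames": ["memcpy", "parse_header", "main"]},
    {"device": "C", "frames": ["strlen", "log_message", "main"]},
]
ranking = rank_crashes(reports)
for signature, count in ranking:
    print(signature, count)
```

Verifying a fix then amounts to checking that the count for a given signature stops growing in builds that contain the fix.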


Legacy architecture: Tizen


Legacy architecture: Tizen

The legacy implementation has the following areas that would be good to improve:

• Kernel oopses (crashes) are not supported.

• Using the preload library libsys-assert.so leads to unwanted code execution during every application startup and requires symbols to be installed on the device.

• Crash Worker starts its work a short but uncontrollable time after the crash, so the reported data is already outdated relative to the crash.

• Core dump files are large (in theory up to 3 GB), so the core file cannot always be copied as the workflow expects.

• Crash processing jeopardizes consumer-facing device qualities such as performance and reliability (many copy operations on large files, use of gzip, use of the same space to store data).

• The server part, ccr.samsung.com, was turned off for security reasons, which makes cross-analysis very difficult or even practically impossible.


Android native – Google Breakpad


Android native – Google Breakpad

Google Breakpad is the best multiplatform solution, but:

• It must be linked into every process. According to the documentation this requires code changes, although it can be done as a shared library.

• Not all application crashes can be handled: only those that occur after Breakpad is initialized and whose signal is not handled by the application.

• It produces minidumps based on:
  • ptracing the crashed process, which requires CAP_SYS_PTRACE
  • processing the core file, which might be up to 3 GB in size

• Dump generation is done from a server, so we may not get dumps when the server has not started, has crashed or has already shut down.

• By the way, debuggerd in 4.2.1 leaks about 25 MB of dirty memory per 1000 crashes.

• The file format is very strict and not compressed.

• The processor is already implemented and works for clients from Linux, Android, Windows, iOS, MacOS, Solaris, arm, x86, x86_64, ppc, ppc64, mips, sparc.

• Kernel panics are not supported at all, even though the Android Panic facility is part of the kernel (apanic.c and apanic_mmc.c).

• VM crashes are not supported.


Android VM – e.g. ACRA/Acralyzer


Android VM – e.g. ACRA/Acralyzer

There is a huge number of VM-based crash reporters. Their common problems:

• They cover only VM cases and provide only basic information about the system.

• They expect the server to be available on-line, use insecure and unthrottled connectivity, or have problems with their logic (e.g. if a file cannot be sent, it is deleted).

• The analyzer (server) part is primitive or simply proprietary.
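
To make the "cannot send, so delete the file" problem concrete, here is a minimal sketch of the opposite policy: a dump is removed only after delivery is confirmed, and the number of uploads per run is capped as a crude throttle. The function name `upload_pending`, the `send` callback and the `max_per_run` limit are hypothetical and chosen only for illustration.

```python
import os


def upload_pending(spool_dir, send, max_per_run=10):
    """Upload queued dumps; delete a file only after `send` confirms delivery.

    `send(path)` is a hypothetical callable that returns True once the server
    has acknowledged the dump. `max_per_run` is a simple throttle (assumption).
    Returns the names of the files that were delivered and removed.
    """
    delivered = []
    for name in sorted(os.listdir(spool_dir))[:max_per_run]:
        path = os.path.join(spool_dir, name)
        try:
            ok = send(path)
        except OSError:
            ok = False  # server unreachable: keep the file for the next run
        if ok:
            os.remove(path)
            delivered.append(name)
    return delivered
```

With this policy a device that is off-line simply keeps its queue and retries on the next run instead of losing the data.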


Architecture


Architecture: on device part


Architecture: oops processing


Architecture: crash processing


Architecture: VM crash processing


Architecture: server farm


Architecture: dump file format


Architecture: configurability

• The uploader and crash reporting are controlled from Settings and could be part of the product, but in disabled mode.

• The uploader partition need not be used on production devices (not many crashes are expected).

• Configuration for type=crash may look like the following:

• /etc/dumper -- main configuration folder
  • config -- general configuration file
  • crash/ -- configuration for crash reporting
    • config -- file to be used for all crashes
    • app1 -- crash settings for app1 if non-standard
      • config -- e.g. own upload server or files
    • app2 -- crash settings for app2 if non-standard
    • etc…
  • statistics/ -- configuration for statistics
    • config -- file to be used for all statistics uploads
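
One way to read this layout is as a chain of overrides from general to specific. The sketch below collects the config files that apply to a given dump type and application in that order. The precedence rule (application settings override type settings, which override the general file) and the function name `resolve_config` are assumptions; the slides only give the directory structure.

```python
import os


def resolve_config(root, dump_type, app=None):
    """List the existing config files, from most general to most specific.

    root      -- configuration folder, e.g. /etc/dumper
    dump_type -- sub-folder such as "crash" or "statistics"
    app       -- optional application name with non-standard settings
    Assumption: later entries override earlier ones.
    """
    candidates = [
        os.path.join(root, "config"),
        os.path.join(root, dump_type, "config"),
    ]
    if app:
        candidates.append(os.path.join(root, dump_type, app, "config"))
    return [path for path in candidates if os.path.isfile(path)]
```

For an application without its own folder the result simply ends with the type-level file, so standard applications need no per-application setup.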


Architecture: remarks

The proposed architecture is not final; it is mostly a process, because a crash reporting service will require constant work to cover new builds and requests from customers. Running through /proc/../core_pattern avoids any impact on userspace until a crash happens. The absence of a daemon and the use of separate non-reflashable partitions guarantee that crashes will be delivered from a bricked device after re-flashing, if a reboot happens during uploading, and in other cases.

Adapting the proposals to other components (MobileCare etc.) is the next step in this process; most likely some pieces should be re-used or replaced, because they are implemented in a better way than I could imagine based on my experience.

Lifelogging (e.g. memory, power, system logs), kernel OOPS support, basic and extended crash dumping, and support for reporting VM problems from Java, Python, etc. could be done in further steps and in parallel.
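
When core_pattern starts with a pipe character, the Linux kernel runs the named program at crash time and feeds the core image to its standard input (for example `|handler %p %e`, where `%p` is the PID and `%e` the executable name, as described in core(5)). The sketch below shows what the storing step of such a handler might look like. The function name `store_core`, the file naming scheme and the size cap are assumptions for illustration, not the design from the slides.

```python
import os

CHUNK = 64 * 1024  # read size per iteration


def store_core(stream, spool_dir, pid, exe, limit_bytes):
    """Stream a core image from `stream` into spool_dir, capped at limit_bytes.

    The cap is an assumption motivated by cores of up to 3 GB; a truncated
    core is incomplete, so a real handler would likely extract a smaller
    dump instead of cutting the file.
    Returns (path, bytes_written).
    """
    path = os.path.join(spool_dir, f"{exe}.{pid}.core")
    written = 0
    with open(path, "wb") as out:
        while written < limit_bytes:
            chunk = stream.read(min(CHUNK, limit_bytes - written))
            if not chunk:
                break  # end of the core image
            out.write(chunk)
            written += len(chunk)
    return path, written
```

In a real handler `stream` would be `sys.stdin.buffer`, and nothing runs in userspace until the kernel invokes the program for a crash.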


Appendix


Revision History

Version | Author | Description | Date
0.1 | Leonid Moiseichuk | First draft, mostly focused on an overview of existing solutions | 08-Nov-2013
0.2 | Leonid Moiseichuk | Updated according to comments and discussion with the Tizen Crash Reporter team | 08-Nov-2013
0.3 | Leonid Moiseichuk | Small polishing to explain the relations between Breakpad and the proposed approach | 11-Nov-2013
0.4 | Leonid Moiseichuk | Removed shell scripts according to security hardening actions proposed for requirements version 0.11 | 18-Nov-2013