Understanding the Characteristics of Android Wear OS
Renju Liu, Felix Xiaozhu Lin
Purdue ECE
Presentation by: Pratik Jain
Motivation
Interactive wearables, like smart watches, are a newcomer to the spectrum of mobile computers.
Integrate computing even tighter with our daily lives.
Substantial increase in demand for smart watches.
Usage Patterns & Device Hardware
Users interact with wearable devices frequently throughout the day.
Each interaction is short (< 10 s) and is dedicated to a simple task.
Due to the limited content that can be displayed on one screen, users spend a short time on one screen before switching to the next.
Tiny battery capacity (200 – 400 mAh)
Slower CPU – fewer cores
Simpler CPU – scaled-down but often architecturally identical to a handheld's CPU
Android Wear OS
One of the most popular OSes for interactive wearables.
The wearable OS with the most public information.
Supports third-party applications and features a redesigned system UI, including Cards for notifications, Context streams, and voice input.
Apps
– Renovated UI
– Follow Android's conventional programming paradigm
– Written in Java
– Compiled ahead-of-time
– Executed atop the managed Android Runtime
Major OS components
– System Server: key daemon hosting the core OS services
– Surface Flinger: daemon controlling UI animation
– Clockwork: OS shell that implements the system UI
Benchmark Scenarios
A benchmark suite of 15 benchmarks falling into the following 4 categories:
1. Wakeup – Due to internal or external events, the device transitions out of suspended mode and presents brief information. Because wakeups are frequent in daily use, energy efficiency is the most important metric.
2. Single input – A waking wearable device responds to a single input from the user. Because the user is waiting, the device needs to achieve low UI latency.
3. Continuous interaction – Users interact with the device continuously. The resulting UI animation requires the device to produce a steady stream of graphic frames.
4. Sensing – During the execution of wearable apps, sensor data is sampled and processed periodically to collect context information.
METHODOLOGY
Experimental Setup
All the benchmarks are run on 2 state-of-the-art Android Wear devices
LG Watch R
Samsung Gear Live
Qualcomm's APQ8026 system-on-chip
Android Wear 5.0 "Lollipop"
Power Management
Batteries have tiny contacts which are incompatible with commodity power monitors.
A compatible interface circuit is carved out from a smartphone battery.
Used the interface as an adapter between the smart watch and an external power monitor.
The battery interface carved out from Nexus 5
The interface (flipped) connected to the LG watch R
Toolset
Used the following to examine system behaviors at different levels and granularities
Systrace – for capturing global system events such as scheduling, I/O activities, and IPC
Android Runtime’s built-in function tracer – for recording function call history in individual processes
Linux perf – for sampling CPU performance counters.
Tackling profiling overhead
Event tracing is the major source of profiling overhead.
Memory overhead can be overwhelming when tracing function invocations.
Two ways are used to tackle it:
In quantifying global system behaviors, the paper relies only on system events. Function traces are collected in extra runs.
In quantifying function-level activities, a constant overhead of 4 µs is deducted from each traced function invocation.
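The constant-overhead deduction can be illustrated with a minimal Python sketch. The 4 µs figure comes from the slide; the function name, its parameters, and the choice to clamp at zero are assumptions made only for this example, not the paper's tooling.

```python
TRACE_OVERHEAD_US = 4.0  # constant per-invocation overhead stated on the slide


def corrected_time_us(measured_us, traced_invocations, overhead_us=TRACE_OVERHEAD_US):
    """Subtract the constant tracing overhead from a measured duration.

    traced_invocations is the number of traced calls that contributed to
    measured_us (hypothetical bookkeeping for this sketch).
    """
    corrected = measured_us - traced_invocations * overhead_us
    return max(corrected, 0.0)  # assumption: clamp, since a duration cannot be negative


# 100 us measured across 5 traced invocations leaves 80 us after deduction.
print(corrected_time_us(100.0, 5))
```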
CPU Usage
CPU usage is collected at two granularities:
Task-level breakdown. An analyzer is built to identify the tasks.
Function-level breakdown. To further locate the performance hot spots in System Server, the following 2 metrics are employed:
Exclusive CPU cycles are spent in the function's own code.
Inclusive CPU cycles are spent in the function's code as well as in all subroutines being called.
Both metrics include the time spent in both user and kernel spaces and do not cover the time when a task is off CPU due to being scheduled out.
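The difference between the two metrics can be shown on a toy call tree. This is a sketch under assumptions: the `Call` record, the cycle counts, and the function names in the example tree are hypothetical and do not come from the paper's traces.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    self_cycles: int  # on-CPU cycles spent in the function's own code
    children: list = field(default_factory=list)


def accumulate(call, exclusive, inclusive):
    """Walk one call tree and return the inclusive cycles of `call`."""
    total = call.self_cycles
    for child in call.children:
        total += accumulate(child, exclusive, inclusive)
    exclusive[call.name] += call.self_cycles
    inclusive[call.name] += total
    return total


def profile(root):
    exclusive, inclusive = defaultdict(int), defaultdict(int)
    accumulate(root, exclusive, inclusive)
    return dict(exclusive), dict(inclusive)


# Hypothetical tree: dispatch -> (layout -> measure), set_light
root = Call("dispatch", 10, [Call("layout", 30, [Call("measure", 20)]),
                             Call("set_light", 5)])
exclusive, inclusive = profile(root)
print(exclusive["layout"], inclusive["layout"])
```

Note that this simple accumulation counts a recursive function's inclusive cycles once per nesting level, so a real analyzer would need to handle recursion separately.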
Idle Time Analysis
The number and duration of the observed idle episodes are unusual.
Some idle episodes can be matched to system events known to cause idle, e.g. I/O and power management.
Others are often rooted in OS services stalling while serving apps' requests.
IdleChecker is an analyzer that helps map anomalous idle episodes to the responsible code regions, based on a simple rationale:
The function calls and IPC transactions spanning an anomalous idle episode are suspicious.
IdleChecker runs the following steps for each idle episode:
1. Identifies suspicious app tasks that are blocked throughout the entire idle episode but run after the episode.
2. For each suspicious task, identifies two suspicious CPU time quanta: the one right before the idle episode and the one right after it.
3. Examines the suspicious quanta, looking for IPC transactions spanning the idle episode.
4. Identifies the function invocations that either coincide with the IPC or span the idle episode.
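The IdleChecker steps above can be sketched roughly as follows. This is not the authors' implementation: the `Span` record, the trace layout, and the task, IPC, and function names in the example are hypothetical, and the sketch simplifies the IPC search by checking timestamps directly rather than scanning inside the two quanta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    task: str
    name: str    # function or IPC name; a placeholder for CPU quanta
    start: float
    end: float


def spans_idle(s, idle_start, idle_end):
    return s.start <= idle_start and s.end >= idle_end


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def idle_checker(idle_start, idle_end, quanta, ipcs, calls):
    report = {}
    for task in sorted({q.task for q in quanta}):
        mine = sorted((q for q in quanta if q.task == task), key=lambda q: q.start)
        # Step 1: blocked throughout the episode, but runs afterwards.
        ran_during = any(q.start < idle_end and q.end > idle_start for q in mine)
        after = [q for q in mine if q.start >= idle_end]
        if ran_during or not after:
            continue
        # Step 2: the quanta right before and right after the episode.
        before = [q for q in mine if q.end <= idle_start]
        quantum_before = before[-1] if before else None
        quantum_after = after[0]
        # Step 3: IPC transactions of this task that span the episode.
        task_ipcs = [i for i in ipcs
                     if i.task == task and spans_idle(i, idle_start, idle_end)]
        # Step 4: calls that span the episode or coincide with such an IPC.
        suspects = [c.name for c in calls if c.task == task and
                    (spans_idle(c, idle_start, idle_end)
                     or any(overlaps(c, i) for i in task_ipcs))]
        report[task] = {"before": quantum_before, "after": quantum_after,
                        "ipcs": [i.name for i in task_ipcs], "functions": suspects}
    return report


# Hypothetical trace with one idle episode from t=10 to t=20.
quanta = [Span("app", "-", 0, 10), Span("app", "-", 20, 25), Span("other", "-", 5, 12)]
ipcs = [Span("app", "binder_call", 8, 21)]
calls = [Span("app", "requestLayout", 7, 22), Span("app", "log", 9, 9.5),
         Span("app", "draw", 21, 24)]
report = idle_checker(10, 20, quanta, ipcs, calls)
print(report["app"]["functions"])
```

In this toy trace, the task named "other" is skipped because it ran during the episode, while "app" is flagged along with the IPC and the calls surrounding it.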
Thread-level Parallelism
Metric widely used for gauging an interactive system’s need for core count.
Average number of busy CPU cores during the non-idle time.
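\[ \mathrm{TLP} = \frac{\sum_{i=1}^{n} c_i \cdot i}{1 - c_0} \]
c_0 - fraction of the total time when no threads are running
c_i - fraction of the total time when exactly i threads are running simultaneously
n - number of cores available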
For measurement, all 4 cores are forced online
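As a worked example of the definition above (average number of busy cores during non-idle time), the sketch below computes TLP from a histogram of how long exactly i cores were busy. The function name and the sample numbers are made up for illustration. With raw durations, the denominator is the total non-idle time, which is equivalent to normalizing by the non-idle fraction.

```python
def tlp(time_by_busy_cores):
    """time_by_busy_cores[i] is the time during which exactly i cores were busy."""
    busy_time = sum(time_by_busy_cores[1:])  # excludes index 0, the idle time
    if busy_time == 0:
        raise ValueError("no non-idle time in the trace")
    weighted = sum(i * t for i, t in enumerate(time_by_busy_cores))
    return weighted / busy_time


# Hypothetical 4-core trace: 50 ms idle, 30 ms with 1 core busy,
# 15 ms with 2 cores busy, 5 ms with 3 cores busy, 0 ms with all 4 busy.
print(tlp([50, 30, 15, 5, 0]))
```

A value close to 1 would mean the non-idle time is almost entirely single-threaded, while values approaching n would mean all cores are kept busy.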
Microarchitectural behaviors
Microarchitecture design is a mystery.
Using Linux perf, the paper samples the performance counters of the Cortex-A7 CPU on the test devices.
Branch prediction, cache, and TLB behaviors are observed in all benchmarks.
RESULTS
Where do CPU cycles go?
Intensive OS execution often dominates the global CPU usage.
Many costly OS services are likely to make software unnecessarily complicated
The CPU time distribution of hot functions is highly skewed.
Manipulating basic data structures consumes substantial CPU cycles.
Legacy OS functions may become serious performance bottlenecks
OS execution bottlenecks: setLight(), Layout(), computeOom(), getSimpleName()
Idle Episodes
Plentiful and of a variety of lengths
Improper OS designs
Interference from voice UI
Legacy support for device suspending
Performance overprovision during continuous interaction
Design implications
Hunting OS inefficiencies
Filling idle time with useful work
Reducing CPU & GPU clock rates, which will shrink idle episodes
Predictive execution
Thread-level parallelism
Short interactions exhibit substantial TLP, which is on par with desktop workloads.
While apps are mostly single-threaded, OS daemons contribute to TLP significantly.
A wearable device needs at least two cores.
Microarchitectural behaviors
A significant mismatch exists between the OS and CPU microarchitecture, particularly in L1 icache, iTLB, and branch predictor.
The mismatch is largely due to the OS code complexity, and will not be eliminated by a unilateral enhancement of wearable CPU.
OS should be trimmed down to match the simplicity of its apps.
Related Work
Gao et al. find that smartphone workloads show limited TLP, concluding that they need no more than two cores.
ProfileDroid contributes an approach for characterizing smartphone apps at multiple layers.
Min et al. study the battery usage of smart watches.
WearDrive creates synthetic benchmarks to shed light on wearable storage.
RisQ and TypingRing target gesture recognition.
iShadow tracks gaze in real time.
Ha et al. build wearables for cognitive assistance.
Cornelius et al. focus on user identification.
Recap
In-depth analysis of one of the most popular wearable OSes, Android Wear.
Examination of 4 key aspects: CPU usage, idle episodes, TLP, and micro-architectural behaviors – in fifteen benchmarks.
Discovery of serious OS inefficiencies and system bottlenecks that were widespread but unknown before.
The results clearly point out the system bottlenecks for immediate optimization and have strong implications on future wearable system software and hardware design.
THANK YOU!