Logging/Request Tracing in Distributed Environment

2017 February 07
Hieu LE ([email protected])
Fujitsu Vietnam Limited, PODC (Platform Offshore Development Center)
Vietnam OpenStack Community - VFOSSA

Copyright 2017 Fujitsu Vietnam Limited

Apricot2017 Request tracing in distributed environment

/me

2 APRICOT 2017

Hieu LE
Vietnam Official OpenStack Community Organizer
VFOSSA Executive Member
OpenStack Project leader @ Fujitsu
OpenStack ATC/AUC
Email: [email protected]

Outline

1. Intro

2. Current Logging solution

   – Pros

   – Cons

3. Tracing requirements

4. Request tracing

   – Demo with OpenStack

Intro

Distributed Environment:

Cloud Computing – Fog Computing.

IoT environment.

Micro-services architecture.

IoT – Fog – Cloud

[Figure: IoT – Fog – Cloud architecture. Labels: (Virtual) Storage Services/Servers; Virtual Compute Resources; Virtual Network; Platforms (O2M2, Thingworx, DeviceHive, Other); Multiple Clouds; Routing, Optimizing paths, Data pre-processing.]

• What if something goes wrong in our system?

• How can we resolve the problem as quickly as possible?

Current Logging solution (1)

ELK, Graylog:

Collecting logs from systems and appliances.

Indexing and filtering for root cause analysis (RCA).

Multiple alerting/notification mechanisms.

Visualization based on user’s needs.

Current Logging solution (2)

Pros:

• Quickly troubleshoot problems of systems/appliances.

• Reduce cost for storing logs, based on PCI DSS or HIPAA requirements.

Cons:

• Mostly depend on system/appliance logs.

• Require more effort for sizing/deploying, maintaining and operating these logging solutions.

• Eat up resources (mostly storage).

• May not be suitable for small sensors.

Current Logging solution (3)

Example 01:

A single request to launch one VM in an OpenStack cloud system can go through at least four micro-services.

INFO-level logs sometimes contain misleading information, or not enough information for troubleshooting.

Turning on the DEBUG log level:

• Produces too much information and eats up storage.

• Makes the overhead threshold hard to control.

Current Logging solution (4)

Example 02:

ELK/Graylog require tweaks and effort for visualizing, collecting, profiling and RCA in a distributed environment.

Consider the following queries in environments with more than 10 services:

“Find me the root cause of all failed requests that process X business.”

“Find me requests where the user was logged in and the request took more than two seconds and a DB transaction was held open for more than 500 ms.”
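
The second query above is awkward to answer from raw log lines spread over many services, but it becomes a plain filter once every request is stored as one structured trace record. A minimal sketch in Python, assuming a hypothetical record layout: the class `RequestTrace` and its fields `request_id`, `user_logged_in`, `duration_ms` and `db_tx_open_ms` are illustrative names, not part of ELK, Graylog or any tracing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all field names are assumptions."""
    request_id: str
    user_logged_in: bool
    duration_ms: float
    db_tx_open_ms: float  # longest time a DB transaction was held open


def slow_logged_in_requests(traces: List[RequestTrace]) -> List[str]:
    """Return IDs of requests from logged-in users that took more than
    two seconds and held a DB transaction open for more than 500 ms."""
    return [
        t.request_id
        for t in traces
        if t.user_logged_in and t.duration_ms > 2000 and t.db_tx_open_ms > 500
    ]


# Made-up sample data, only to exercise the filter.
traces = [
    RequestTrace("req-1", True, 2500.0, 800.0),
    RequestTrace("req-2", True, 2500.0, 100.0),
    RequestTrace("req-3", False, 3000.0, 900.0),
    RequestTrace("req-4", True, 1200.0, 700.0),
]
print(slow_logged_in_requests(traces))
```

Only `req-1` satisfies all three conditions. The hard part in practice is producing such per-request records in the first place, which is what the tracing approaches in the following slides address.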

Tracing Requirements

Address the Data Explosion: Logs, Metrics, Events, Active/Passive Checks

Operator Needs

• End-to-End Debugging: understand what the real issue is and what is affected when errors occur.

• Visibility: deliver centralized intelligence for cloud operations at scale.

• Resource Utilization: understand resource availability and utilization.

Solution Requirements

• Able to collect, store and access all types of data in one place.

• Highly performant and scalable platform.

• Flexible processing pipeline that can support multiple use cases: diagnostics, root cause analysis, SLA calculations, utilization reporting, …

• Extensible platform that can be extended to support new types of data and processing.

Tracing Requirements

• Users need a centralized solution that provides enough information for both the machine-centric view (monitoring) and the workflow-centric view (tracing).

– Provide a general picture of every workflow: the communication steps and the request/response time of each step, for performance review.

– Show monitoring metrics of hardware/services for each step at the time of investigation.

– Provide a general-purpose RCA method for quick troubleshooting.

Workflow Centric solution quick survey

Many solutions aim at workflow-centric tracing. They fall into three categories [1]:

1. Explicit metadata propagation: inject tracing metadata into the current system (Zipkin, Kieker, X-Trace, Tracelytics, Cloudera HTrace, ExplorViz, OpenTracing - CNCF).

2. Schema-based: rely on the event semantics of the system and use a temporal schema of custom log messages for tracing (Magpie).

3. Black-box tracing: rely on log analysis to infer relationships among events (FChain, NetMedic).

[1]. HANSEL: Diagnosing Faults in OpenStack – IBM Research

Workflow centric solutions (1)

[Figure: traditional workflow. Labels: Req; Service A, Service B, Service C, Service D.]

Workflow centric solutions (2)

• Explicit metadata propagation

[Figure: explicit metadata tracing workflow. Metadata is injected into each request/response and sent to a tracing mechanism (Zipkin, Dapper, ...). Labels: Req; Service A, Service B, Service C, Service D; Tracing Mechanism.]

Workflow centric solutions (3)

• Explicit metadata propagation

Pros:

• Gives enough detail for tracing problems.

• High scalability.

Cons:

• The code base must be modified to inject metadata into the header of each request and response.

• Increases network packet size (only slightly; for Zipkin, around 500 bytes).
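
The mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Zipkin or Dapper are implemented: the header names `X-Trace-Id` and `X-Parent-Span-Id`, the function `traced_call` and the in-memory list `COLLECTED_SPANS` (standing in for the tracing mechanism) are all hypothetical, and services are modeled as nested function calls instead of real network requests.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional

# In-memory stand-in for the tracing mechanism (hypothetical collector).
COLLECTED_SPANS: List[dict] = []

TRACE_HEADER = "X-Trace-Id"         # assumed header name
PARENT_HEADER = "X-Parent-Span-Id"  # assumed header name


def traced_call(service: str, headers: Dict[str, str],
                work: Optional[Callable[[Dict[str, str]], None]] = None) -> None:
    """Handle one request: reuse the incoming trace ID (or start a new
    trace), create a span, pass updated headers downstream, and report
    the span to the collector when the work is done."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    if work is not None:
        # Downstream calls see this span as their parent.
        work({TRACE_HEADER: trace_id, PARENT_HEADER: span_id})
    COLLECTED_SPANS.append({
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": headers.get(PARENT_HEADER),
        "service": service,
        "duration_s": time.perf_counter() - start,
    })


# Service A calls B, which calls C, which calls D.
traced_call("A", {}, lambda h: traced_call(
    "B", h, lambda h: traced_call(
        "C", h, lambda h: traced_call("D", h))))

for span in COLLECTED_SPANS:
    print(span["service"], span["trace_id"], span["parent_id"])
```

All four spans share one trace ID, and each span records its caller's span ID, so the collector can rebuild the call tree and the time spent in each step. The sketch also shows both drawbacks listed above: every service has to run the propagation code, and every request carries extra header bytes.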

Workflow centric solutions (4)

• Schema-based: based on the semantics of events generated by the system (including OS, services and applications); all related event schemas are then joined for the final inference.

[Figure: schema-based tracing workflow. Labels: Req; Service A, Service B, Service C, Service D; Authenticate (three times); Get Image; Create port, IP and attach; Read/Write; DB; Event Listener.]

Workflow centric solutions (5)

• Schema-based

Pros:

• Fewer modifications to the code base.

Cons:

• Low scalability (the result is delayed until all events are collected).

• Less detail than explicit metadata propagation. The event semantics, the event list and the way schemas are joined determine the success of this approach, so a warehouse of event semantics has to be built.
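
The joining step can be illustrated with a small Python sketch. It is not Magpie's algorithm: the `SCHEMA` table, the event types and the attribute names (`request_id`, `token`, `image_id`, `port_id`) are hypothetical, chosen only to show how events with no shared trace ID can still be stitched together through the attributes a schema declares as joinable.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical event schema: for each event type, the attributes that may
# be used to join it with other events. All names are illustrative.
SCHEMA: Dict[str, List[str]] = {
    "api_request": ["request_id"],
    "authenticate": ["request_id", "token"],
    "get_image": ["token", "image_id"],
    "create_port": ["token", "port_id"],
}


def join_events(events: List[dict]) -> List[List[str]]:
    """Group events that share a join-attribute value, directly or
    transitively, and return each group's event types in time order."""
    parent = list(range(len(events)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: Dict[tuple, int] = {}
    for idx, event in enumerate(events):
        for attr in SCHEMA.get(event["type"], []):
            if attr not in event:
                continue
            key = (attr, event[attr])
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx, event in enumerate(events):
        groups[find(idx)].append(event)
    return [
        [e["type"] for e in sorted(group, key=lambda e: e["ts"])]
        for group in groups.values()
    ]


# Made-up events: request r1 is authenticated with token t1, which is
# then used to fetch an image and create a port; r2 has no follow-ups.
events = [
    {"type": "api_request", "ts": 1, "request_id": "r1"},
    {"type": "api_request", "ts": 2, "request_id": "r2"},
    {"type": "authenticate", "ts": 3, "request_id": "r1", "token": "t1"},
    {"type": "get_image", "ts": 4, "token": "t1", "image_id": "img"},
    {"type": "create_port", "ts": 5, "token": "t1", "port_id": "p1"},
]
print(join_events(events))
```

The first group reconstructs the workflow of request `r1`; `r2` stays on its own. The result is only as good as the schema: a join attribute shared by unrelated requests (for example a common image ID) would merge their workflows, and nothing can be concluded until all relevant events have arrived.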

Workflow centric solutions (6)

• Black-box tracing: collect the logs of all services, then analyze them to infer the root cause of a problem.

[Figure: black-box tracing workflow. Labels: Service A, Service B, Service C, Service D; DB; Logs (five times); Log Collector and Analyzer.]

Workflow centric solutions (7)

• Black-box tracing:

Pros:

• No modification to the code base.

Cons:

• High error rate (most approaches are probabilistic data mining).
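
A toy Python example shows where the error rate comes from. It is not the method used by FChain or NetMedic: the function `infer_call_edges`, its `window_s` and `min_count` parameters and the sample log entries are assumptions made for illustration. The heuristic guesses that one service depends on another whenever their log entries repeatedly appear close together in time.

```python
from collections import Counter
from typing import List, Tuple

LogEntry = Tuple[float, str]  # (timestamp in seconds, service name)


def infer_call_edges(logs: List[LogEntry], window_s: float = 0.5,
                     min_count: int = 2) -> List[Tuple[str, str]]:
    """Guess 'caller -> callee' edges: service B is assumed to depend on
    service A if B logs within window_s after A at least min_count times."""
    ordered = sorted(logs)
    counts: Counter = Counter()
    for i, (t_a, svc_a) in enumerate(ordered):
        for t_b, svc_b in ordered[i + 1:]:
            if t_b - t_a > window_s:
                break  # entries are sorted, so later ones are farther away
            if svc_b != svc_a:
                counts[(svc_a, svc_b)] += 1
    return sorted(edge for edge, n in counts.items() if n >= min_count)


# Made-up logs: two requests flow A -> B -> C; D logs once on its own.
logs = [
    (0.0, "A"), (0.1, "B"), (0.2, "C"),
    (5.0, "A"), (5.1, "B"), (5.2, "C"),
    (9.0, "D"),
]
print(infer_call_edges(logs))
```

The heuristic reports the edges A to B and B to C, but also A to C, although in the assumed scenario A never calls C directly. With concurrent requests and clock skew between hosts, such false relationships multiply, which is why purely log-based inference stays probabilistic.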

Example (1)

Magpie: Schema-based

Example (2)

Zipkin: Explicit metadata propagation

Demo with OpenStack

OSProfiler: a small library for explicit metadata propagation

Q & A

THANK YOU!
