Metric Abuse: Frequently Misused Metrics in Oracle

This is a presentation I created for RMOUG 2014 which I was sadly unable to attend. However, I wanted to share it with the Oracle community so that you can learn a bit about metrics that are frequently cited, frequently demonized, and frequently misused. In this deck we will go through the steps to diagnose issues and what NOT to blame as you go through the process. The topics and concepts discussed here were originally formed in a blog post on the OracleAlchemist.com site: http://www.oraclealchemist.com/news/these-arent-the-metrics-youre-looking-for/

METRIC ABUSE
Frequently Misused Metrics in Oracle

Steve Karam

Technical Manager at Delphix
Oracle Certified Master, ACE, and other acronyms
Just a little social

Blog: http://www.oraclealchemist.com
Twitter: @OracleAlchemist
Google Plus: +SteveKaram
Facebook: OracleAlchemist

Hunting for Metrics

Oracle has more metrics than you can shake a stick at
Automatic Workload Repository (AWR)
Active Session History (ASH)
STATSPACK
V$ and X$ views

These Aren’t the Metrics You’re Looking For

The problem is not a lack of data, it’s knowing how—and when—to use it.

The Database is…

Broken
Slow
Down
Not working
Giving me errors

Step 1: What is the actual problem?

I’m going to…

Gather stats
Add an index to something
Bounce the database
Blame the SysAdmins
Blame the code
Kill the backup

Step 2: Don’t be hasty. Suppress kneejerk reactions. They have no place in problem analysis.

I think I know what to do!

That’s great! Thinking is good. But if you only think you found a solution, chances are good that there’s more to it.

Step 3: Don’t immediately think in terms of fixes. Think in terms of findings and recommendations.

Gathering Stats

Just like the optimizer needs to gather stats for proper query analysis, you need to gather stats for problem analysis.

Think of it like a popular TV medical drama:

Your database is the patient. It’s their job to be sick.
Your end users are the concerned family and friends. It’s their job to be panicky.
You are the doctor and team. It’s your job to be brilliant.

Step 4: Be brilliant.

Problem Analysis

Okay, so “Be Brilliant” isn’t a good step. At this point, what you really need to do is choose a path to solving the issue at hand. There are a few methods for doing this:

Top down: Review events and waits at a global level and drill down from there.
Scientific Method: Do background research, form a hypothesis, test your hypothesis, analyze the outcome.
Differential Diagnosis: Shrink the probability of various issues using the process of elimination.

The Top Down Approach

Top down tuning is a viable method, and is almost always preferable to bottom up tuning.

This is very useful when you know the issue is global and you need to drill down into a root cause. It’s good when things suddenly go wrong; however, it can be difficult when there are multiple root causes.

Scientific Method

This method is highly effective at ensuring factual resolutions to problems. While it may not always be suitable for quickly resolving a critical issue, it’s always suitable for case studies and post-fix root cause analysis.

Differential Diagnosis (DDX)

This method is great for global issues where the root cause is unknown and no significant change has occurred.

Gather information
List symptoms
List possible conditions based on the symptoms
Test
Test
Test
Eliminate conditions
Don’t kill the patient

Speaking of House

In the show “House”, the main character has a saying: Everybody lies.

DBA: So everyone, what changed?
Developer: Nothing.
SysAdmin: Nothing.
Network Admin: Nothing.
Project Manager: Nothing.

You never told us the real Step 4

Step 4 is simple.

Solve the problem.

Well sure, but how?

The methods we’ve discussed are all well and good for looking into problems and figuring out the cause and a solution. For the most part, it will be up to you to:

Gather the right metrics
Synthesize your data
Create findings and recommendations
Test for success

What are the right metrics?

There are tons of papers and articles out there on wait events, metrics, and other metadata you should look for. We’re not here for that.

There are guides on how to use the metrics you find. We’re not here for that either.

No, we’re here to discuss…

WHAT PEOPLE DO WRONG

And how we can fix that

#5: db file scattered read

What it is:
An indication of a multiblock I/O

What it is not:
A full table scan
A reason to panic
The culprit (not always, anyways)

#5: db file scattered read

The ‘db file scattered read’ event happens when Oracle performs a multiblock I/O; for instance, when a full table scan occurs.

Index fast full scans also result in multiblock I/O (a plain index full scan, by contrast, walks the index in order using single-block reads). But those don’t sound so horrible, now do they?

Why is that?

#5: db file scattered read

Over the years, DBAs and developers have cultivated a mortal terror of full table scans. Of course, they can be a problem, but are they always the problem? Of course not.

Some facts about db file scattered reads:
They are an incredibly optimal way to utilize disk to gather large amounts of unordered data
They aren’t the only indication of full scans or multiblock I/O. The ‘direct path read’ and ‘db file parallel read’ events also are.

#5: db file scattered read

Before you go off on a witch hunt because of a ‘db file scattered read’ event, consider the following:

Are there any indications that full scans are actually the problem?

Are you sure that an index read would be more efficient in this case?

Do your other symptoms match up with the conclusion that a query performing a full table scan is your culprit?

Full table scans are the devil!

#4: Parse to Execute Ratio

What it is:
An indication of how often you’re parsing vs. executing queries

What it is not:
An indication of how often you’re hard parsing vs. executing queries

#4: Parse to Execute Ratio

Based on this formula:

round(100*(1-:parse/:execute),2)

If you hard parse a query and then execute it, your Execute to Parse % is 0.

If you soft parse a query and then execute it, your Execute to Parse % is 0.
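The arithmetic behind those two statements can be checked with a small sketch. It simply evaluates the formula above in Python; the function name and the sample counts are illustrative assumptions, not anything Oracle provides.

```python
def execute_to_parse_pct(parse_calls: int, executions: int) -> float:
    """Execute to Parse % from the formula above: 100 * (1 - parse / execute)."""
    if executions == 0:
        raise ValueError("executions must be greater than zero")
    return round(100 * (1 - parse_calls / executions), 2)


# One parse call followed by one execution. The formula only sees the
# parse count, so a hard parse and a soft parse score exactly the same.
one_hard_parse = execute_to_parse_pct(parse_calls=1, executions=1)
one_soft_parse = execute_to_parse_pct(parse_calls=1, executions=1)

# One parse call reused for 100 executions (hypothetical numbers).
parsed_once = execute_to_parse_pct(parse_calls=1, executions=100)

print(one_hard_parse, one_soft_parse, parsed_once)
```

The only way to move the number is to execute more often than you parse; the kind of parse never enters into it.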

#4: Parse to Execute Ratio

What about all those articles and forum posts that say adding bind variables will improve your Execute to Parse %?

They’re not wrong, but incomplete. Adding bind variables will improve your Execute to Parse %... IF you have some form of statement caching enabled.
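To see why the caching condition matters, here is a toy model, not a description of Oracle internals or of any real driver. It assumes a client that keeps an LRU cache of statements keyed by SQL text and issues one parse call whenever the text is not in the cache. The `simulate` function, the `orders` table, and the cache size of 50 are all hypothetical.

```python
from collections import OrderedDict


def simulate(statements, cache_size):
    """Return (parse_calls, executions) for a client with an LRU statement cache.

    Toy model: a statement found in the cache is executed without a parse
    call; anything else costs one parse call before it is executed.
    """
    cache = OrderedDict()
    parses = executions = 0
    for sql in statements:
        if sql in cache:
            cache.move_to_end(sql)  # mark as most recently used
        else:
            parses += 1
            if cache_size > 0:
                cache[sql] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
        executions += 1
    return parses, executions


def execute_to_parse_pct(parses, executions):
    return round(100 * (1 - parses / executions), 2)


# 1000 executions with literals (every text distinct) vs. one bind-variable text.
literals = [f"SELECT * FROM orders WHERE id = {i}" for i in range(1000)]
binds = ["SELECT * FROM orders WHERE id = :id"] * 1000

for label, workload, size in [
    ("literals, no cache", literals, 0),
    ("binds, no cache", binds, 0),
    ("literals, cache of 50", literals, 50),
    ("binds, cache of 50", binds, 50),
]:
    parses, executions = simulate(workload, size)
    print(f"{label}: {execute_to_parse_pct(parses, executions)}%")
```

In this model only the last combination, bind variables plus a statement cache, lifts the percentage off zero. Binds without caching still parse once per execution, and caching without binds never gets a cache hit.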

#4: Parse to Execute Ratio

Hard Parses can take up valuable CPU cycles
Soft Parses can still cripple your Oracle instance
The best way to reduce library cache contention is to not touch it at all!

#4: Parse to Execute Ratio

Tom Kyte said it best:

there are three types of parses (well, maybe four) in Oracle...
there is the dreaded hard parse - they are VERY VERY VERY bad.
there is the hurtful soft parse - they are VERY VERY very bad.
there is the hated softer soft parse you might be able to achieve with session cached cursors - they are VERY very very bad.
then there is the absence of a parse, no parse, silence. This is golden, this is perfect, this is the goal.

#4: Parse to Execute Ratio

To see hard parses vs. soft parses, check out the Parse Count (total) and Parse Count (hard) in an AWR report or V$ views
To reduce parsing as a whole (the actual goal), make sure the code does not explicitly parse per execution OR that the client software has statement caching enabled.
For example, in JBoss, you can set the prepared-statement-cache-size parameter
SESSION_CACHED_CURSORS is not the same thing!

#3: Buffer Hit Ratio

What it is:
Another ratio
A proportional view of LIOs to PIOs

What it is not:
A silver bullet
A magic ratio
A valuable performance indicator

#3: Buffer Hit Ratio

Wait, buffer hit ratio isn’t valuable?

Okay, maybe that was a little heavy handed. It can be valuable as an “at-a-glance” metric to see if something is absolutely abysmal.

#3: Buffer Hit Ratio

It is important to remember that a high buffer hit ratio doesn’t necessarily mean the data you needed was available in cache when it was needed. It also doesn’t mean the queries you’re running are optimal…they just happen to be getting their data from cache.

100% of crap in RAM is still crap. It’s just logical crap.
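A quick worked example makes the point concrete. It assumes the commonly used form of the ratio, 100 * (1 - physical reads / logical reads), and compares two hypothetical versions of the same report; every number here is invented for illustration.

```python
def buffer_hit_ratio(logical_reads: int, physical_reads: int) -> float:
    """Percentage of logical reads (LIOs) satisfied without a physical read (PIO)."""
    if logical_reads == 0:
        raise ValueError("logical_reads must be greater than zero")
    return round(100 * (1 - physical_reads / logical_reads), 2)


# Hypothetical block counts for the same report written two ways.
efficient = {"logical_reads": 2_000, "physical_reads": 20}
wasteful = {"logical_reads": 2_000_000, "physical_reads": 20_000}

for name, stats in (("efficient", efficient), ("wasteful", wasteful)):
    ratio = buffer_hit_ratio(**stats)
    print(f"{name}: hit ratio {ratio}%, block touches {stats['logical_reads']}")
```

Both versions report the same hit ratio, yet the second touches a thousand times as many blocks. The ratio alone gives no hint which one you are running.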

#3: Buffer Hit Ratio

So what is it good for?
If you know your queries are perfect (lolright) then it can indicate that you don’t have enough RAM allocated to your buffer cache.
That’s it, I just have a second bullet here to keep the other one company.

#2: CPU %

What it is:
CPU Usage per CPU

What it is not:
Equivalent to your laptop’s CPU %
A viable measure of CPU usage (alone)
A way to diagnose performance

#2: CPU %

This isn’t your Windows laptop.

When your PC shows 99% or 100% CPU usage, you panic. That’s because you only have one CPU (usually), and 99% means you can barely drag a window from one side of the screen to the other.

#2: CPU %

In the multi-processor world, it’s not as big of a problem. In fact, it can be a huge benefit.

You have multiple CPUs on your servers. 99% usage of one or more is probably not a big deal.

CPU is the processor, and the part of the system that performs work (as opposed to wait). You want this to be heavily utilized.

#2: CPU %

What do you pay licensing based on?

Number of CPUs.

So what do you actually want to be as fully utilized as possible?

#2: CPU %

Instead, we should be looking at:

Run queue length – Provided by vmstat, uptime, top, and other tools. Shows the number of processes actively waiting or working on CPU at any given time.

Oracle Average Active Sessions – This metric is usually more pertinent from the DBA side, as it shows the number of sessions actively waiting or working at any given time.
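Both numbers only mean something relative to how many CPUs the server has. The sketch below assumes you already have a run queue sample (for instance the r column of vmstat) and, for Average Active Sessions, the usual definition of DB time divided by elapsed wall-clock time over a snapshot interval. The function names and sample values are hypothetical.

```python
def run_queue_per_cpu(run_queue_length: float, cpu_count: int) -> float:
    """Run queue length normalized by CPU count; above 1.0 means work is queuing."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return run_queue_length / cpu_count


def average_active_sessions(db_time_seconds: float, elapsed_seconds: float) -> float:
    """Average number of sessions working or waiting during the interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return db_time_seconds / elapsed_seconds


# Hypothetical 8-CPU server sampled over a one-hour snapshot interval.
cpu_count = 8
load = run_queue_per_cpu(run_queue_length=6, cpu_count=cpu_count)
aas = average_active_sessions(db_time_seconds=14_400, elapsed_seconds=3_600)

print(f"run queue per CPU: {load}")
print(f"average active sessions: {aas} on {cpu_count} CPUs")
```

With these made-up numbers the box is busy but not queuing: fewer runnable processes than CPUs, and about four active sessions against eight CPUs. The same CPU % could sit in front of a very different picture.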

#2: CPU %

The focus should be on concurrency
Using a single CPU heavily is only a problem if the other CPUs are fairly dormant…but that’s another issue entirely.
Even run queue is not a perfect metric; some things, like uninterruptible I/O wait, can skew the results.
I/O wait should be part of the bigger picture along with run queue length.

#1: Cost

What it is:
A numerical estimation proportional to the expected resources necessary to execute a statement with a given plan.

What it is not:
Anything else.

#1: Cost

This one comes up all. the. time.

Here’s a simple thing to keep in mind:
Oracle’s optimizer is cost based
Your tuning practices are not

#1: Cost

Cost is good to understand, so you can understand why Oracle chose the plan it did.

However, you shouldn’t try to tune specifically to reduce cost.

Cost is not proportional to time. A high or low cost doesn’t necessarily mean a query will be slower or faster.

#1: Cost

Why is cost misused?

“Gather stats” is like the “restart Windows” of the Oracle world. Gathering stats changes plans. Plans have costs. I should tune costs.

The cost based optimizer changed my plan. It’s cost based. I’m cost based.

#1: Cost

Cost is not a bottleneck, nor is it indicative of actual work. It’s indicative of relative work based on parameters that exist purely in the calculations of your particular Oracle instance.

Instead of tuning to reduce cost, tune to reduce bottlenecks. Those are real things that cause real wait.

#1: Cost

Real things to tune
Reduce block touches (both physical and logical) by improving your query selectivity, join order, index usage, etc.
Reduce parses, both hard and soft.
Investigate execution plans and use statistics, hints, or other methods to improve Oracle’s costing; just don’t try to ‘tune down cost’ directly.

Step 4…

Step 4, if you remember, was “solve the problem.”

That advice still stands.

But make sure you use the right metrics to do it.

And good luck!

Q&A?