Story Points considered harmful – a new look at estimation techniques


Story Points Considered Harmful

Or why the future of estimation is really in our past...

OOP 2012, Munich

All pictures available on

Vasco Duarte

@duarte_vasco, http://bit.ly/vasco_blog

Joseph Pelrine

@josephpelrine, www.metaprog.com/blogs

Blog: 185 on 18.01.2012; Slideshare: 48 on 18.01.2012

Tell me again: why did we move here?

I used to live in a warm, sunny place: Portugal. But for some reason (which I don't understand yet) I decided to move to Finland :O)

Galileo started by asking questions about what he saw...

The Flat Earth Society (also known as the International Flat Earth Society or the International Flat Earth Research Society) is an organization that seeks to further the belief that the Earth is flat instead of an oblate spheroid

More at: http://theflatearthsociety.org

Just like we should...

Expert estimation

Consensus estimation

Function Point Analysis

COCOMO

SDM

We've all been exposed to various estimation techniques. Just quickly: can you name a few? Expert estimation, consensus estimation, and other more complex techniques such as Gantt, PERT and Function Point Analysis. Then we have cost-versus-time techniques such as COCOMO, SDM, etc.

And of course, the topic for today: Story Point estimation. What do all of these have in common? They all look at the future. Why is this important?

Precognition [pree-kog-nish-uhn]: knowledge of a future event or situation, especially through extrasensory means.

Because looking at the future is always difficult. We humans are very good at anticipating immediate events in the physical world, but in the software world what we estimate is neither immediate, nor does it follow any physical laws that we intuitively understand!

The undisputed fact is that we humans are very bad at predicting the future. But that is not all! Lately, and especially in the agile field, we have been discovering a new field of study: Complexity Sciences.

"Hindsight is always twenty-twenty" - Anonymous (the other one!)

"Life can only be understood backwards, but it must be lived forwards" - Søren Kierkegaard

Retrospective coherence... the Fibonacci series as an example of this

A field of study that tries to identify rules that help us navigate a world where even causality (cause and effect) is challenged. An example you may have heard of: the butterfly effect...

Complexity Sciences are helping us develop our own understanding of software development based on the theories developed in the last few years. Scrum is a perfect example of a method that has used complexity to inspire and justify its approach to many of the common problems we face in software development. Scrum has used self-organization and emergence as concepts in explaining why its approach works. However, there's a catch.

GREEN

What do you think of when you read this word? This is just a simple example of something that Complexity Sciences explore. In a complex environment we don't have discernible causality! Sometimes this is due to delayed effects of our actions; most often it is because we attribute causality to events in the past when in fact no cause-effect relationship exists (retrospective coherence).

In the field of estimation this manifests itself in a different way. In order to estimate, we need to assume that causality exists (if I ask Tom for the code review, then Helen will be happy with my proactiveness and give me a bonus). But in a complex environment, the assumption that causality holds is not valid! Without causality, the very basic assumption that justifies estimation falls flat!

Cognitive ping-pong to reinforce the complexity point.

To be or not to be complex! That is the question!

So, which is it? Do we have a complex environment in software development or not? If we do, then we cannot at the same time argue for estimation (and build a whole religion on it)! And if we are not in a complex environment, we cannot then claim that Scrum, with its focus on solving a problem in the complex domain, can work!

So then, the question for us is: can Story Point based estimation really be so important that its creator now charges exorbitant amounts of money to teach you something that either does not work or fully invalidates the method (Scrum) which he will also happily sell you?

Luckily we have a simple alternative that allows for the existence of a complex environment and solves the same problems that Story Points were designed to solve (but failed to).

Not everything scales

Not everything scales equally

Not everything scales linearly

Looking for an alternative...

The alternative to Story Point estimation is simple: just count the number of Stories you have completed (done) in the previous iterations. They are the best indicator of future performance! Then use that information to project future progress. Basically, the best predictor of the future is your past performance! Can it really be that simple? To test this approach I looked at data from different projects and tried to answer a few simple questions:
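A minimal sketch of this projection, assuming a hypothetical list of stories completed per iteration (the numbers and the function name `project_completed` are invented for illustration and do not come from any of the projects discussed here):

```python
from statistics import mean


def project_completed(history, sprints_ahead):
    """Project how many stories will be done after `sprints_ahead` more sprints.

    `history` holds the number of stories completed (done) in each past
    iteration. Returns a (low, expected, high) tuple based on the worst,
    average and best past iteration.
    """
    if not history:
        raise ValueError("need at least one completed iteration")
    return (
        min(history) * sprints_ahead,
        mean(history) * sprints_ahead,
        max(history) * sprints_ahead,
    )


# Hypothetical history: stories completed in the last four iterations.
past = [6, 8, 7, 7]
low, expected, high = project_completed(past, sprints_ahead=5)
print(f"In 5 sprints: between {low} and {high} stories, most likely about {expected:.0f}")
```

Reporting a range rather than a single number is a design choice of this sketch: it keeps the spread of past iterations visible instead of hiding it behind an average.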

Q1: Is there sufficient difference between what Story Points and number of items measure to say that they don't measure the same thing?

Q2: Which one of the two metrics is more stable? And what does that mean?

Q3: Are both metrics close enough so that measuring one (# of items) is equivalent to measuring the other (Story Points)?

Here are the questions that I started with...

Data summary

Nine (9) data sets

I was not a stakeholder in any of these projects, nor did I have any role in them

Data came from different companies and different sized teams

The Data

Correlation: 0,755

Team A / Company N

Team HC / Company N

Correlation (w/out normalization): 0,83

Correlation (w/out normalization): 0,92

Team CB / Company N

Team CF / Company N

Correlation: 0,51 (0,71 without spr14)

!!

The Data

Team HCM / Company N

Correlation (w/out normalization): 0,88

Correlation = 0,86

Team A / Company JO

Correlation: 0,70

Team 2 / Company RF

Correlation: 0,75

Team 1 / Company RF

The Data

What does this mean:

Q1: With such a high correlation it is likely that both metrics represent a signal of the same underlying information.

Q2: The normalized data has a similar standard deviation for both metrics (equally stable). No significant difference in stability.

Q3: They seem to measure the same thing so...
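For readers who want to run the same comparison on their own data, here is a minimal sketch of the calculation. The velocity figures are hypothetical, and because the slides do not say how the data was normalized, dividing each series by its maximum is only one assumed option.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def normalize(series):
    """Scale a series so its largest value is 1 (one possible normalization)."""
    top = max(series)
    return [value / top for value in series]


# Hypothetical velocities for six sprints of one team.
story_points = [21, 34, 29, 40, 31, 38]
items_done = [5, 9, 7, 10, 9, 9]

raw = pearson(story_points, items_done)
scaled = pearson(normalize(story_points), normalize(items_done))
print(f"correlation: {raw:.2f} (raw), {scaled:.2f} (normalized)")
```

The Pearson coefficient does not change when a series is rescaled by a positive constant, so with this kind of normalization the raw and normalized correlations agree. Normalizing matters for comparing the spread of the two metrics (Q2), not for the correlation itself.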

Team AT / Company AT

Correlation: 0,75

We should analyse the claims that justify Story Points...

Claim 1: allows us to change our mind whenever we have new information about a story

Claim 2: works for both epics and smaller stories

Claim 3: doesn't take a lot of time

Claim 4: provides useful information about our progress and the work remaining

Claim 5: is tolerant of imprecision in the estimates

Claim 6: can be used to plan releases

Source: Mike Cohn, User Stories Applied, page 87

Claim 1: allows us to change our mind whenever we have new information about a story

No explanation about what this means in the User Stories Applied book

Measuring the number of completed items gives immediate visibility of the impact of new items on progress (project burndown)

Claim 2: works for both epics and smaller stories

Allowing for large estimates for items in the backlog does help to account for the impact of very large items by adding uncertainty.

The same uncertainty exists in any way we may use to measure progress. The fact is that we don't really know if an Epic (say 100 SPs) is really equivalent to a similar-size aggregate of User Stories (say 100 stories of 1 SP each). Conclusion: no significant information is added by classifying a story in a 100 SP category.

Exchange rate: how people transform SPs into time (real, understandable)

Story Points

Hours

Claim 3: doesn't take a lot of time

Not my experience. Although some progress has been made by people like Ken Power (at Cisco) with the Silent Grouping technique, the fact that we need such a technique should dispel any idea that estimating in SPs doesn't take a lot of time.

Silent Grouping technique: http://slidesha.re/AgileKonstanz_silentgrouping

Claim 4: provides useful information about our progress and the work remaining

This claim holds if, and only if, you have estimated all of your stories, even the stories that will only be developed a few months or even a year later (for a long project). This approach is not very efficient (see Claim 3).

Basing your progress assessment on the Number of Items completed in each Sprint is faster to calculate (# of items in the PBL / velocity per Sprint = number of Sprints left) and can be used to provide critical information about project progress. Example:

Shelf-life and how long-term estimates don't work

The example you are about to see is a real life example. One where the data collected made a big impact on an important business decision.

The names have been changed to protect the innocent...

Sprint x

[Chart: Evolution of velocity. Labels: Start of pilot/beta, Release date, Actual progress trend, What progress trend should be]

This was a very important project for this company. They were under pressure in a market segment that was very important for their revenues, and they needed a good product with lots of features. Here's how you should read this graph; let me go slowly to make sure you all understand this.

Sprint x + 1

The Velocity Bet

Their history stated the following velocity evolution in the last 3 sprints: 1 8 8

They were learning the product and area in the first few sprints, which allowed for a getting-up-to-speed assumption. Additionally they had committed to 15 items in the Sprint planning meeting.

The Product Owner stated that the R&D team would start doing 15 items per sprint (which would help them meet the goal of releasing the pilot and the release on time).

What was the result after the sprint?

Let's quickly survey the room. How many items did that team complete in the third Sprint? Why?

Sprint x + 2

They did 10 items. Up from 8, that is a 25% increase in velocity, but well short of the 15 that was bet on.
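To see what the bet means for the release date, the formula from Claim 4 (# of items in the PBL / velocity per Sprint = number of Sprints left) can be applied to the three velocities in this example: the 8 from recent history, the 15 that was bet on and the 10 actually delivered. The backlog size of 120 items is a made-up number for illustration, since the real figure is not given here, and `sprints_left` is a hypothetical helper name.

```python
from math import ceil


def sprints_left(backlog_items, velocity):
    """Number of whole sprints needed: items in the backlog / items per sprint."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return ceil(backlog_items / velocity)


BACKLOG_ITEMS = 120  # hypothetical: the real backlog size is not given

for label, velocity in [("recent history", 8), ("the bet", 15), ("actual result", 10)]:
    print(f"{label:>14}: {sprints_left(BACKLOG_ITEMS, velocity)} sprints left")
```

With the assumed backlog of 120 items, the bet promises 8 sprints while the observed velocity of 10 points to 12.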

Finally...

We release Stories/Backlog items, not story points...

The final question is really: why do we estimate stories in Story Points at all, and why have we built an entire religion with ceremonies, sacrifices, priests and a pope of Story Point estimation? The fact is that we don't release Story Points! When all is said and done, the question that you or I as a customer will be interested in answering is: does this product have the story that I am interested in?

The Number of Items technique in a nutshell

When doing Backlog Grooming or Sprint Planning just ask: can this Story be completed in a Sprint by one person? If not, break the story down!

For large projects use a further level of abstraction: Stories fit into Sprints, therefore Epics fit into meta-Sprints (for example: meta-Sprint = 4 Sprints)

Why it works

By continuously harmonizing the size of the Stories/Epics you are creating a distribution of the sizes around the median:

Assuming a normal distribution of story sizes, you can treat all stories as the same size for the purposes of long-term estimation and progress tracking, and can therefore measure progress by counting the number of items completed per Sprint.
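The argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not a proof: the story sizes are random numbers drawn from a normal distribution, and the parameters `mean_size` and `spread` (in arbitrary effort units) are invented for illustration.

```python
import random


def count_based_error(num_stories, rng, mean_size=3.0, spread=1.0):
    """Relative error made by treating every story as average-sized.

    Story sizes are drawn from a normal distribution (clipped at a small
    positive minimum). The true total effort is the sum of the sizes;
    the count-based estimate is num_stories * mean_size.
    """
    sizes = [max(0.1, rng.gauss(mean_size, spread)) for _ in range(num_stories)]
    true_total = sum(sizes)
    estimate = num_stories * mean_size
    return abs(estimate - true_total) / true_total


rng = random.Random(2012)  # fixed seed so the run is repeatable
for n in (5, 50, 500):
    print(f"{n:>3} stories: count-based estimate off by {count_based_error(n, rng):.1%}")
```

Over- and under-sized stories tend to cancel each other out as the number of stories grows, which is why counting items can stand in for summing sizes over the long term. For a handful of stories the error can still be large.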

