Sunday, March 25, 2012

Safety in estimates

For software projects it is usually necessary to estimate the effort upfront in order to secure funding, obtain approval, or find out which resources are missing and need to be hired. When a new project is explored, developers start with a vague specification, divide the functionality into user stories or use cases, and estimate the time they will need to implement each one of them.
The histogram below shows the cycle time for a real project.

A cycle time of 1 day or less has a 64.15% probability of occurring, 2 days or less has 79.25%, and 3 days or less has 96.23%. The higher the uncertainty, the longer the tail of the distribution. The median of the distribution is 1 day.
The difference between the median of the probability distribution and the actual estimate is the safety we put in. We include safety in order to protect against uncertainty or, in other words, Murphy’s Law.
In most environments there is little positive incentive, if any, to finish ahead of time.
When we add safety, or pad the estimate, in order to provide an estimate with a very high probability, such as more than 90%, we will give 3 days. When we compare the time indicated by the median (1 day) to the time indicated as a reasonable estimate (3 days), the safety added is clearly not in the range of 31%. The padded estimate is 300 percent of the median!
The cycle time estimate that gives us 50% chance is much shorter than the one that provides 90% chance of completing a task before the estimated cycle time. And the higher the uncertainty the bigger the difference between the two.
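To make the arithmetic concrete, here is a minimal Python sketch that reads an estimate off the percentages quoted above, treating them as cumulative probabilities. The function name `estimate_for_confidence` and the dictionary layout are assumptions made for illustration; only the three probabilities come from the project data.

```python
# P(cycle time <= d days), as quoted in the text for the real project.
CUMULATIVE = {1: 0.6415, 2: 0.7925, 3: 0.9623}


def estimate_for_confidence(cumulative, confidence):
    """Return the smallest cycle time whose cumulative probability
    reaches the requested confidence level."""
    for days in sorted(cumulative):
        if cumulative[days] >= confidence:
            return days
    # The quoted data stops at 3 days; anything above 96.23% is in the tail.
    raise ValueError("confidence lies in the tail beyond the quoted data")


median_days = estimate_for_confidence(CUMULATIVE, 0.50)  # the 50% estimate
safe_days = estimate_for_confidence(CUMULATIVE, 0.90)    # the "safe" estimate

print(median_days, safe_days, safe_days / median_days)
```

With these figures the 50% estimate is 1 day and the 90% estimate is 3 days, so the safe estimate is three times the median.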
Uncertainty exists. People are not blind to it and they do add a lot of safety in their estimates.

Reasons why safety is inserted into the time estimates

1. Pessimistic experience

The time estimates are impacted in a major way by the last overrun the developer had.

2. Dark matter

Dark matter is the set of tasks that need to be done but aren't captured in the estimates.

3. Every management level adds safety

The more management levels are involved, the higher the total estimate, because each level adds its own safety factor.
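A small Python sketch shows how quickly this compounds. It assumes each level pads the number it receives by a percentage; the helper `padded_estimate`, the 10-day starting estimate, and the 20 percent factors are all hypothetical values chosen for illustration.

```python
def padded_estimate(base_days, safety_factors):
    """Apply each management level's safety factor in turn.

    Assumption: every level pads the figure it receives from the level
    below, so the factors multiply rather than add.
    """
    total = base_days
    for factor in safety_factors:
        total *= 1 + factor
    return total


# Hypothetical: a 10-day estimate passes through three levels, each adding 20%.
three_levels = padded_estimate(10, [0.20, 0.20, 0.20])
print(round(three_levels, 2))  # 17.28
```

Under these assumptions, three modest 20 percent pads turn 10 days into more than 17.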

4. Protecting estimates from a management cut

Top management is frequently unhappy with the final estimate of when the project is expected to be completed. They need the results sooner for political or business reasons. So in some cases, once all the tasks are estimated, management will demand that the total time be cut by a percentage, say 20 percent. As time goes by developers get used to this, so they inflate the task estimates by the same percentage.
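As a side note on the arithmetic, inflating by the same percentage does not quite cancel the cut. The Python sketch below works it through; the function names and the 10-day figure are hypothetical, and the 20 percent cut is the one used as an example above.

```python
def after_cut(estimate, cut):
    """Estimate that remains once management removes `cut` (a fraction)."""
    return estimate * (1 - cut)


def inflation_to_offset(cut):
    """Padding (as a fraction) needed so the estimate survives the cut unchanged."""
    return 1 / (1 - cut) - 1


honest = 10.0  # hypothetical honest estimate, in days
cut = 0.20     # the 20 percent cut from the example

# Inflating by the same 20 percent and then cutting leaves about 9.6 days.
same_percent = after_cut(honest * (1 + cut), cut)

# Fully offsetting a 20 percent cut takes about 25 percent padding.
needed = inflation_to_offset(cut)

print(round(same_percent, 2), round(needed, 2))
```

So a developer who wants to end up with the honest 10 days after a 20 percent cut would have to pad by roughly 25 percent, not 20.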

5. Having a bonus if the project delivers according to estimates

People have a tendency to strongly prefer avoiding losses to making gains. This is called loss aversion, and it was demonstrated by Amos Tversky and Daniel Kahneman. It explains the shape of the prospect theory utility graph. Prospect theory is a behavioural economic theory that describes how people choose between probabilistic alternatives and evaluate potential losses and gains.

Let's say that the person estimating a task will be positively or negatively affected by how correct the estimate is - for instance, there will be a bonus if the project delivers according to estimates. That implies that there will be no bonus if the estimates are incorrect - which is a loss!
Hence the person will want to avoid the loss by making the estimate as safe as possible - by putting safety in it.
The above applies to Story Point estimating as well, because the team is measured using the burn-down chart. If there is a bonus for burning all the points committed for a given iteration, loss aversion will affect the estimate.

Three mechanisms that waste the safety inserted into the time estimates

1. The student syndrome

First, people fight for safety time. Once they get it, there is enough time in the estimate, so there is no need to rush and work starts at the last minute. That’s human nature.

2. Multi-tasking

Multi-tasking is probably the biggest killer of lead time. Call it meetings, call it emergencies, call it other jobs, or call it browsing the Internet. Whenever developers give a time estimate, they know that the actual work time (touch time, engineering time) is just a fraction of the estimate, and they intuitively factor in the impact of multi-tasking.

3. Dependencies between tasks

Dependencies cause delays to accumulate and advances to be wasted.
In the case of two consecutive tasks in the project, a delay in the first step is passed on in full to the next step.

An advance made in the first step is usually wasted because:

  • There is no reward for finishing early, but there may be a penalty, because management will put pressure on the team to cut future estimates.
  • In the unlikely event that the early finish is reported, the second step will not start immediately, because the person who is supposed to do it knows there is sufficient time and sees no need to rush.

In the case of parallel tasks in the project, the biggest delay is passed on to the next task. All other early finishes (if any are reported) do not count at all.
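The two rules above can be sketched in a few lines of Python. This is a simplified model under the stated assumptions (delays pass on in full, early finishes are wasted); the function names and the sample deviations are hypothetical.

```python
def chain_slip(deviations):
    """Total slip of consecutive tasks, in days.

    Each deviation is actual minus estimated time. A late task (positive)
    passes its delay on in full; an early finish (negative) is wasted,
    so it counts as zero.
    """
    return sum(max(0, d) for d in deviations)


def parallel_slip(deviations):
    """Slip handed to the next task by parallel tasks that feed into it.

    Only the biggest delay counts; early finishes on the other
    branches do not help.
    """
    return max(0, max(deviations))


# Hypothetical deviations in days: negative = early, positive = late.
print(chain_slip([-2, 3, -1, 1]))   # 4
print(parallel_slip([-2, 3, -1]))   # 3
```

In the chain example the deviations sum to only 1 day late, yet the project slips by 4, because the 3 days gained by early finishes are never used.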

What to do?

There is no point in asking people how much safety they put in their estimates, because people believe they give realistic estimates. The problem lies in what they call realistic. Instead, we should ask people about the chance of finishing the task within the time they estimated. We can then do the translation ourselves. For instance, an 80% chance would mean an estimate of some 200 percent of the median, as with the 2-day cycle time in the project above.
Better still is to ask for the riskiest estimate (the one with a 50% chance) and apply risk profiles.
