The real reason so many software projects are late, over budget and cancelled.
If you want to build a ship, don’t drum up people to collect wood and don’t assign them tasks and work, but rather teach them to long for the endless immensity of the sea.
Most software development projects, in fact most IT development projects, have unforeseen problems that cause them to run late, overrun budgets, collapse under their own weight or get cancelled. This is best practice, it is not a problem, it only looks like a problem.
A Thought Experiment
A bank has an annual budget of $10 million for internal software projects, and all projects are the same size and complexity. Over the years it has collected all sorts of data about the projects it runs, and it has found that projects fall into two distinct groups:
Group One
These are projects run by standard development groups, and they behave pretty normally. They usually succeed, but often take longer and cost more than originally expected, and sometimes have unforeseen problems during or after development. In general things work out, but very few projects go completely to plan, and approximately 15% of projects are cancelled outright and never result in anything usable. Projects normally cost about $100,000 to complete and involve a number of developers, testers, a documenter and a project manager. They typically take 3-6 months.
Group Two
These projects are critical and use the critical development process. They are no larger nor more complicated than group one projects, but the cost of error or project failure is very high (e.g. death, bankruptcy etc). These projects are very thoroughly researched, designed and developed. They are verified using tools and manual review at each stage, and all products are formally tested before being integrated into the whole product. Prototypes are created to illuminate unfamiliar areas. All issues are tracked back to their point of origin and analysed, and the software is checked for errors with the same pathology or method of introduction. Sometimes new processes are added to the methodology to prevent similar errors being introduced. The bank has never cancelled one of these projects after initial evaluation and, so far, they have all been completed on time and on budget. Projects normally cost $1 million and involve several interdependent development teams, each with its own testers, project manager and documenter. There is an independent test team that continually tests the products of the development teams, a documentation team, a project board and an overall project manager. These projects typically take 12-24 months.
In summary, group one projects cost $100,000 and take 3-6 months, but are not fully controlled and have a 15% failure rate. Group two projects of the SAME SIZE and COMPLEXITY cost 10 times more and take much longer, but always work as expected.
The software development manager moves to a new position where he advises blue chip multinationals about introducing in-house software development. None of these blue chip companies use software that could kill anyone or bankrupt the company. Which process should he suggest:
Group 1
or Group 2
Well…
For $10 million you can have
Group 1:
(10,000,000 / 100,000) * 85% and some hassle because projects don’t run to plan.
=> 85 successful projects.
Group 2:
(10,000,000 / 1,000,000) * 100% and some hassle because the development process is so much work.
=> 10 successful projects.
And even though group 1 projects overrun, they still take less time than group 2 projects, because most of the overrun is unforeseen work, not mistakes.
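The arithmetic above can be sketched in a few lines of Python. The function and variable names are made up for illustration, and the figures are the thought experiment's invented ones, not real data.

```python
def expected_successes(budget, cost_per_project, success_percent):
    """Expected number of successful projects a fixed budget buys.

    All inputs are the thought experiment's illustrative figures.
    """
    projects_funded = budget // cost_per_project
    return projects_funded * success_percent / 100


BUDGET = 10_000_000  # annual budget in dollars

# Group 1: $100,000 per project, 15% cancelled outright
group_one = expected_successes(BUDGET, 100_000, 85)
# Group 2: $1 million per project, none cancelled so far
group_two = expected_successes(BUDGET, 1_000_000, 100)

# Effective cost of each project that actually delivers something
cost_per_success_one = BUDGET / group_one
cost_per_success_two = BUDGET / group_two

print(f"Group 1: {group_one:.0f} successes at ${cost_per_success_one:,.0f} each")
print(f"Group 2: {group_two:.0f} successes at ${cost_per_success_two:,.0f} each")
```

Even after paying for the cancelled projects, each group 1 success costs roughly $118,000 against $1 million for group 2.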
Conclusion
Group 1, every time.
In the real world
The European Information Technology Observatory (http://www.eito.com/) estimates total IT spending for 2007 will exceed 2 trillion US$. That's a whole lot, about one Microsoft every 8 weeks. Clearly some of this is basic office equipment and other zero risk expenses, but using almost any percentage you care to name (I seem to remember 40% from somewhere) there is still an awful lot being spent on new developments and other higher risk projects.
This is important because the numbers are large enough that statistical effects are overwhelming.
Best practice is not to work out every detail in advance, nor to test everything to absolute certainty; best practice is to work out the fine detail of a project during the project. In best practice, we find things out after the project has started; in best practice, projects often run late, overrun budgets and occasionally get cancelled.
Best practice is not an excuse to do a bad job; good staff, concerted effort at all stages and sound practices are still needed. However, shit happens; live with it.
For some reasons why this might be, see “What is obvious to me now”. The very brilliant Facts and Fallacies of Software Engineering will be useful as well.
This is best practice, it is not a problem, it only looks like a problem.

Great essay. This principle applies to many things in life as well. If you try to control things to be perfect, it will come at a great cost. Trying to completely control your children, spouse, or students works, but you break their creative spirit. Trying to control society through fascism, tyrannies, whatever centralized state power works, but at the cost of deaths of millions and castration of that state’s economic potential.
When Ford wanted to save a few pennies on a plastic collar for their Pinto, which tended to explode in rear-end collisions, they made up a number for the value of a human life to justify those expected deaths, so that the cost of the plastic collars would come out higher than the cost of the deaths. They wanted to justify skipping those collars, and they did. You’ve done the same thing with your arbitrary numbers of $100,000 and $1 million and the number of projects involved. You set it up so you can’t lose, but, now, if you actually had some statistical data…that would be informative…
Barry,
Clearly the “thought experiment” uses highly constrained examples in order to illuminate its point. However, the qualitative relationships described are very real. By far the simplest way to decide what is needed to complete a project is to actually complete the project.
But how do you then test it when it's a nuclear weapon or a life support machine? Hence one of the reasons for the different approach.
If, however, it turns out you have introduced a lethal problem and you become aware of it, it must be fixed. This is not methodology; it is morals, ethics and humanity, and if you don’t care about other people, it's the law as well.
As we know, the judge in this famous case saw it that way as well and ruled accordingly. Unfortunately it is not normally so; large companies have expensive lawyers. The Ford Mondeo pulled hard left under extreme braking for many years. Ford's engine management system had a very serious (probably fatal) design decision that meant it was catastrophically sensitive to certain types of electrical interference, resulting in runaway acceleration, and that is only one company.
Assuming you take the group one development approach, you have to test thoroughly using repeatable testing, should use formal reviews (which also give better results than testing) and should provide customer support. The purpose of all of these is to find problems that you then fix. OK, if the footnote is light green instead of olive green on a Mac, then that's probably OK, so it’s not quite so black and white, but what is?
At the moment I do some work for an insurance company. The “thought experiment” reflects what is going on. However, there are projects, especially of the legislation type, that cannot afford to be among the 15% that fail, nor to overrun in time, nor to cause disruptions in linked processes.
Also, what I tend to read between the lines is that Group 1 projects are not necessarily must-do projects. Given that more and more organizations have less ‘change’ capacity than ‘change’ demand (they would like to do more than they have the capacity for), one can wonder if these not-must-do projects aren’t clogging up the effective capacity.
I assume that the thought experiment includes the same behavior that is found all over: people tend to work on more than one project. This behavior, certainly if left unmanaged, results in significantly longer lead-times (easily up to 2-3 times longer) for tasks, projects and programmes.
I do agree with the suggestion to change something in the approach of Group 2. The question is how to increase the return on investment of those projects, preferably by decreasing the time and money needed. Regarding Group 1, do they really impact the bottom line enough to provide a good business case?