Myth Buster: Do Engineers Trust Parametric Models Over Their Own Intuition?

Ricardo Valerdi
MIT
Cambridge, MA

Abstract

This paper explores the abilities of engineers to estimate everyday tasks and their reliance on their own intuition when performing cost estimates. The approach to answering these questions is similar to that of the popular television show MythBusters, which aims to separate truth from urban legend using controlled experiments. In MythBusters, methods for testing myths and urban legends are usually planned and executed to produce the most visually dramatic results possible, which generally involves explosions, fires, or vehicle crashes. While the question of parametric models versus intuition is not as exciting, we provide an interesting result that demonstrates the difference between what is real and what is fiction in the world of cost estimation.

Two heuristics, representativeness and anchoring, are explored in two experiments involving psychology students, engineering students, and engineering practitioners. The first experiment, designed to determine whether there is a difference in the ability to estimate everyday quantities, demonstrates that the three groups estimate with roughly equal accuracy. The results shed light on the distribution of estimates and the process of subjective judgment. The second experiment, designed to explore the ability to estimate the cost of software-intensive systems given incomplete information, shows that predictions by engineering students and practitioners are within 3-12% of each other. Results also show that engineers rely more on their intuition than on parametric models to make decisions.

The value of this work lies in helping to better understand how software engineers make decisions based on limited information. Implications for the development of software cost estimation models are discussed in light of the findings from the two experiments.

1. Introduction

The process of estimating the cost of software has been of interest to researchers for decades. Some have developed sophisticated algorithms calibrated with historical data to improve the estimation process (Bailey & Basili 1981; Boehm et al. 2000; Putnam & Myers 2003). Others have found ways to combine different estimation methods, such as bottom-up and analogy, to arrive at estimates with a high degree of confidence (Jorgensen et al. 2003; Jorgensen 2004). While this research has helped shift software cost estimation from an art to more of a science, the process of estimation remains prone to human errors and biases. These can be especially problematic when little information is available about the people, technologies, development environment, and process used to develop the software.

Even in the face of missing information, humans make assumptions that help them develop software cost estimates. While these assumptions are not always justified, they have a strong influence on the outcome and accuracy of software cost estimates. The fields of human decision making and cognitive science help to further inform this issue.

Tversky and Kahneman (1974) proposed that many human decisions are based on beliefs concerning the likelihood of uncertain events. Occasionally, these beliefs are expressed in numerical form as odds or subjective probabilities. Their work showed that people rely on a limited number of heuristic principles, which reduce the complex tasks of assessing probabilities and predicting values to simpler judgmental operations. Many heuristics exist in software engineering (Endres & Rombach 2003); arguably the most popular one in software cost estimation is the cube root law (Cook & Leishman 2004), which holds that software development time in calendar months is roughly three times the cube root of the estimated effort in person-months provided by a model such as COCOMO II.
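The cube root law reduces to one line of arithmetic. The sketch below is only an illustration of that rule of thumb: the function name is hypothetical, and the effort value is assumed to come from a parametric model such as COCOMO II rather than being computed here.

```python
def cube_root_schedule(effort_person_months: float) -> float:
    """Rule-of-thumb schedule: calendar months ~= 3 * (effort in person-months) ** (1/3).

    Hypothetical helper; the effort input is assumed to come from a parametric
    model such as COCOMO II.
    """
    if effort_person_months <= 0:
        raise ValueError("effort must be positive")
    return 3.0 * effort_person_months ** (1.0 / 3.0)


# Print the implied schedule and average staffing level for a few effort values.
for effort in (27, 125, 1000):
    months = cube_root_schedule(effort)
    print(f"{effort:>5} PM -> {months:5.1f} months, avg staff {effort / months:5.1f}")
```

For instance, 1,000 person-months maps to about 30 calendar months and an average staff of roughly 33 people. The rule serves as a quick sanity check on a schedule, not a replacement for the schedule equation inside COCOMO II itself.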
This paper does not focus on technology-based heuristics, but rather on decision-making heuristics that rely heavily on subjective assessments by software engineers.

The subjective assessment of probabilities resembles the subjective assessment of physical quantities such as distance or size. For example, the apparent distance of an object is determined in part by its clarity: the more sharply the object is seen, the closer it appears to be. Similarly, in software engineering, the estimated cost of developing software often depends on the intuitive judgments of the stakeholders involved, relative to their point of view.

It is proposed that two heuristics proposed by Tversky and Kahneman (1974) apply to software cost estimation. The first is representativeness, which is based on the concept that people judge by the degree to which A is representative of B. Here, A could represent a completed software project and B a new project being estimated. The experiments described in this paper explore this heuristic in the context of predicting everyday values and software-intensive systems.

The second heuristic proposed by Tversky and Kahneman is anchoring, in which people make an estimate by starting from an initial value that is then adjusted to yield the final answer. The initial value, or starting point, may be suggested by the formulation of the problem, or it may be the result of a partial computation. In this paper, the initial value is related to the progress of a software-intensive project as it approaches completion. The second experiment described in this paper explores the application of this heuristic in the context of software cost estimation.
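Tversky and Kahneman (1974) reported that adjustments away from an anchor are typically insufficient. A toy model makes this concrete: the final estimate moves only a fraction alpha of the way from the anchor toward the estimator's unanchored judgment. The function name, the adjustment factor alpha, and the numbers below are hypothetical illustration values, not quantities measured in this paper.

```python
def anchored_estimate(anchor: float, unanchored_judgment: float, alpha: float) -> float:
    """Toy anchoring-and-adjustment model (hypothetical, for illustration only).

    alpha = 1.0 means full adjustment (the anchor has no effect);
    alpha < 1.0 means insufficient adjustment, so the estimate stays biased toward the anchor.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return anchor + alpha * (unanchored_judgment - anchor)


# Two hypothetical estimators share the same unanchored judgment (600 person-months)
# but start from different anchors; with alpha = 0.5 their answers diverge.
low = anchored_estimate(anchor=300, unanchored_judgment=600, alpha=0.5)    # 450.0
high = anchored_estimate(anchor=1200, unanchored_judgment=600, alpha=0.5)  # 900.0
print(low, high)
```

The gap between the two answers comes entirely from the anchors. In a setting like the second experiment, the anchor would be a value related to the project's progress, such as the effort already expended.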
1.1 Research Questions

In light of current theories of human cognition and decision making, this paper explores how software engineers make decisions on the basis of limited information. The research questions of interest are:

- How accurately can software engineers estimate future events given limited information?
- How much do engineers rely on their intuition to perform cost estimates?

Exploring these questions can help inform the field of software cost estimation on several fronts. First, it provides empirical evidence to help better understand the way software engineers make decisions based on limited information. Second, it sheds light on the cognitive limits of software engineers under controlled scenarios, which allows comparison with other populations: technical vs. non-technical, as well as student vs. practitioner. This helps determine whether software engineers are necessarily better or worse at estimating certain phenomena. Third, it helps determine to what degree software engineers rely on the representativeness and anchoring heuristics for purposes of decision making.

2. Method

Following the lead of the popular television show MythBusters, which aims to separate truth from urban legend, two experiments were conducted to address the research questions. In this case, the urban legend is that engineers trust parametric models more than they trust their own intuition. The two experiments assessed the ability of participants to estimate common quantities as well as the total development effort of a software-intensive system given the effort expended so far. The first experiment was inspired by previous work on optimal predictions in everyday cognition (Griffiths & Tenenbaum 2006) but was extended to the area of cost estimation by applying the idea of cognitive estimation limits.
The original set of questions remained the same so that data from previous studies could be compared with newly obtained data. Results for this experiment were obtained through the survey instrument provided in Appendix A. The second experiment involved only engineering students and practitioners, since it was intended to assess the ability of participants to estimate the total development effort, in person-months, of a software-intensive system, as well as their reliance on intuition over a parametric model.

2.1 Participants

Participants were tested in three groups, with each group making predictions about different phenomena. The first group, made up of 142 undergraduate students, participated in the experiment as part of a psychology class and is referred to as psychology students throughout the paper. The second group, made up of 36 graduate-level engineering students, participated as part of a lecture in a project management class and is referred to as engineering students. The third group, made up of 49 software and systems cost estimation professionals, participated as part of a day-long workshop on cost estimation and is referred to as practitioners. The engineering students had between 0 and 2 years of work experience in cost estimation, whereas the practitioners had an average of 12 years of experience and were familiar with advanced cost estimation principles.

2.2 Description of Experiment #1

The first experiment was conducted by giving individual pieces of information to each participant and asking them to draw a general conclusion. For example, many of the participants were told the amount of money that a film had supposedly earned since its release and were asked to estimate what its total “gross” would be, even though they were not told how long it had been playing. In other words, participants were asked to predict t_total given t_past.
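One way to formalize this prediction task is the Bayesian analysis in Griffiths & Tenenbaum (2006), on which this experiment builds: if t_past is treated as a random point in [0, t_total], the likelihood of t_past is 1/t_total, and a good guess for t_total is the median of the resulting posterior. The sketch below computes that median numerically. The function names, the upper cutoff t_max, and the prior parameters (a power law with exponent 1, a Gaussian with mean 75 and standard deviation 16) are illustrative assumptions, not values estimated in this study.

```python
import math


def predict_total(t_past, prior_pdf, t_max, steps=100_000):
    """Posterior-median estimate of t_total from one observed t_past.

    Assumption: t_past is a uniform random point in [0, t_total], so the
    likelihood is 1 / t_total for t_total >= t_past. On a grid uniform in
    u = log(t), (prior(t) / t) dt becomes prior(t) du, so each cell's weight
    is just the prior density. t_max is a hypothetical cutoff for the tail.
    """
    lo, hi = math.log(t_past), math.log(t_max)
    du = (hi - lo) / steps
    grid = [math.exp(lo + (i + 0.5) * du) for i in range(steps)]
    weights = [prior_pdf(t) for t in grid]  # unnormalized posterior mass per cell
    half = 0.5 * sum(weights)
    running = 0.0
    for t, w in zip(grid, weights):
        running += w
        if running >= half:
            return t
    return grid[-1]


def power_law(gamma):
    """Unnormalized power-law prior p(t) ~ t**(-gamma), e.g. for movie grosses."""
    return lambda t: t ** (-gamma)


def gaussian(mu, sigma):
    """Unnormalized Gaussian prior, e.g. for life spans."""
    return lambda t: math.exp(-0.5 * ((t - mu) / sigma) ** 2)


# Illustrative (hypothetical) priors, not fitted to this study's data.
print(predict_total(10.0, power_law(1.0), t_max=1e6))          # close to 20: doubles t_past
print(predict_total(18.0, gaussian(75.0, 16.0), t_max=150.0))  # a little below the prior mean
print(predict_total(90.0, gaussian(75.0, 16.0), t_max=150.0))  # somewhat beyond t_past
```

With a power-law prior p(t) proportional to t^(-gamma), the posterior median works out to 2^(1/gamma) × t_past, so gamma = 1 gives the "double it" rule seen in the first printed value. A Gaussian prior instead pulls predictions toward the prior mean until t_past exceeds it.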
No additional information was given about the film, such as its genre, country of origin, actors, or production studio.

In addition to film grosses, participants were asked about things as diverse as the number of lines in a poem (given how far into the poem a single line is), an individual’s life span (given his current age), the duration of a Pharaoh’s reign (given that he had reigned for a certain time), the run-time of a film (given an already elapsed time), the total length of the term that would be served by an American congressman (given how long he had already been in the House of Representatives), the time it takes to bake a cake (given how long it had already been in the oven), and the amount of time spent on hold in a telephone queuing system (given an already elapsed time). All of these items have known values and well-established probability distributions. The intent of the experiment was to determine whether individual participants were able to provide an estimate from a lone piece of data and whether, as a group, their answers reproduced the expected distribution for each item. The eight questions are provided in Appendix A, Part I.

2.3 Description of Experiment #2

The second experiment was conducted in a similar fashion, except that it involved only the engineering students and practitioners because of its technical content. The focus was to capture the estimation limits of participants given a limited amount of information, as well as their reliance on intuition when performing cost estimates. The first part of the experiment contained questions about the expected total effort of a software-intensive project given the effort already expended. Participants were given four system life cycle phases to use as their mental model: Conceptualize, Develop, Operational Test & Evaluation, and Transition to Operation. As in Experiment #1, no additional information was given about the project, such as application domain, development organization, or historical performance.
Participants were asked to predict the total effort needed for a project, t_total, given that a certain amount of effort, t_past, had already been expended on one or more life cycle phases. In the first question, t_past = 300 person-months for the Conceptualize phase. In the second question, t_past = 300 person-months for the Conceptualize and Develop phases. In the third question, t_past = 300 person-months for the Conceptualize, Develop, and Operational Test & Evaluation phases. The three questions are provided in Appendix A, Part II.

The second part of the experiment asked participants to predict the total systems engineering effort for a software-intensive system, t_total, given the predicted effort from a cost model, t_predicted, and a historical data point, t_historical, from a similar system of equivalent scope and complexity. A relatively new cost model, COSYSMO, was selected for this experiment to avoid any imbalance in expertise favoring the practitioners. Moreover, both the engineering students and the practitioners received an initial tutorial on the use of COSYSMO and its definitions to ensure that there were no misinterpretations of the questions. In the first question, t_predicted = 100 person-months and t_historical = 110 person-months. In the second question, t_predicted = 1,000 person-months and t_historical = 1,100 person-months. The two questions are provided in Appendix A, Part III.

3. Results

People’s predictions about everyday events were, on the whole, extremely accurate. The responses from the psychology students are provided in Figure 1.

Figure 1. Relative Probabilities of t Values for Psychology Students, n = 142 (Griffiths & Tenenbaum 2006)

The distributions for movie grosses and poems are approximately power-law, which accurately reflects that the majority of movies gross very little money while a few become blockbuster hits.
For example, of the more than 7,300 films released worldwide from 1900 to 2006, only three grossed over $1B. Similarly, the majority of poems are very short, but a few are very long.

The distribution of life spans approximately follows a Gaussian distribution, which accurately reflects that, at least for males in the U.S., life spans are centered around the average life expectancy of 75. Half of the male population dies before reaching the age of 75 and half dies after, but at a much sharper rate. Movie runtimes also follow a quasi-Gaussian distribution, since most movies run at least 90 minutes and some run longer. The distribution of representatives’ term lengths is approximately Erlang, which accurately reflects that most representatives serve a small number of two-year terms; comparatively few remain in office for many terms, even though they are eligible for re-election an unlimited number of times. The distribution of cake baking times is complex and irregular but can be described as bimodal: Gaussian-like around 45 minutes, with a spike at 60 minutes. This is consistent with recipes indicating that most cakes take either 45 or 60 minutes, depending on the type of cake, the ingredients, and the altitude, among other factors. The complete list of data sources for the eight questions is provided in Appendix B.

Of particular interest is the similarity in the distributions of answers across the three population types and the proximity of their mean values for t_total. As shown in Table 1, the psychology students and engineering students were just as accurate as the practitioners in estimating t_total for the eight questions in the first experiment.