CHAPTER 6
EXPERIMENTAL RESULTS
In this chapter, the results obtained from the controlled experiment are discussed. This experiment was carried out to evaluate the effectiveness of ADAT as a teaching tool when used by novice programmers. The following hypotheses concerning the effectiveness of ADAT were tested:
a) The use of ADAT improves the performance of the debugging activity.
b) The use of ADAT improves the understanding of concurrency.
The subjects of this test consisted of forty undergraduate and graduate students of the School of Engineering and Applied Science (SEAS) at The George Washington University.
6.2. Analysis - Demographic Data
Table 6.1 shows the demographic data summary by level of education. The complete experimental data is shown in Appendix G.
Level | Number of Subjects | Mean GPA | Mean Age | Mean Programming Experience (months) | Mean No. of Programming Languages | Number of Females | Number of Males | CS131 | CS148
Undergraduate | 21 | 3.07 | 21.5 | 23.6 | 4.76 | 6 | 16 | 11 | 10
Graduate | 19 | 3.24 | 29.3 | 24.3 | 3.79 | 4 | 14 | 19 | -
Total | 40 | 3.15 | 25.2 | 24.5 | 4.19 | 10 | 30 | 30 | 10
Table 6.1 - Demographics Summary of all Subjects
Table 6.2 shows the demographic data summary by group. The values in the columns for GPA, Age, Months, and Number of Programming Languages are mean values. Although the GPA of the control group was higher than that of the experimental group, this did not help the control group subjects outperform the latter group. None of these variables had a significant effect on the findings discussed below. However, a t-test grouping the subjects by sex showed that the females obtained better results with the use of ADAT than the males.
Group | Subjects | GPA | Age | Months | Number of Programming Languages | Females | Males | CS131 | CS148
Control | 20 | 3.20 | 25.1 | 21.7 | 4.1 | 7 | 13 | 18 | 2
Experimental | 20 | 3.10 | 25.4 | 26.2 | 4.5 | 3 | 17 | 12 | 8
Total | 40 | 3.15 | 25.2 | 23.9 | 4.3 | 10 | 30 | 30 | 10
Table 6.2 - Demographics Summary by Group
The means and standard deviations for the independent variables are shown in Table 6.3. The first two lines refer to the control group. The inner two lines refer to the experimental group. The last two lines refer to the means and standard deviations of all subjects. The column labeled Months refers to computer related work experience in terms of months. The column labeled Ada Experience refers to Ada programming experience in terms of months. The column labeled Number of Programming Languages refers to the number of programming languages worked with. The column labeled Self Rate Ada refers to how the subjects rated themselves in terms of experience with Ada. Three categories were listed: novice, intermediate, and expert, with values of 1, 2, and 3 respectively. The maximum value found was two.
The column labeled Self Rate Concurrency refers to how the subjects rated themselves in terms of experience with concurrency. Three categories were also listed: novice, intermediate, and expert, with values of 1, 2, and 3 respectively. The maximum value found was also two. No significant difference was found between the groups regarding this variable (t = 0, df = 38, p > 0.05).
Statistic | GPA | Age | Months | Ada Experience | Number of Programming Languages | Self Rate Ada | Self Rate Concurrency
C Mean | 3.20 | 25.05 | 21.65 | 2.80 | 4.10 | 1.30 | 1.10
C SD | 0.46 | 6.17 | 21.09 | 2.35 | 1.74 | 0.47 | 0.31
E Mean | 3.10 | 25.35 | 26.20 | 2.90 | 4.50 | 1.45 | 1.10
E SD | 0.41 | 4.65 | 32.16 | 2.45 | 1.54 | 0.51 | 0.31
T Mean | 3.15 | 25.2 | 23.93 | 2.85 | 4.30 | 1.38 | 1.10
T SD | 0.43 | 5.39 | 26.95 | 2.37 | 1.64 | 0.49 | 0.30
Table 6.3 - Independent Variables - Statistics
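The group comparison behind Table 6.3 can be approximated from the summary statistics alone. The following is a minimal sketch, not the analysis actually run for this study: it assumes a pooled-variance (Student) two-sample t-test, 20 subjects per group as given in Table 6.2, and a two-tailed critical value of about 2.024 for df = 38 taken from a standard t table. The names `pooled_t`, `TABLE_6_3`, and `T_CRITICAL` are illustrative only.

```python
import math

N_PER_GROUP = 20  # subjects per group, from Table 6.2

# (control mean, control SD, experimental mean, experimental SD), from Table 6.3
TABLE_6_3 = {
    "GPA": (3.20, 0.46, 3.10, 0.41),
    "Age": (25.05, 6.17, 25.35, 4.65),
    "Months": (21.65, 21.09, 26.20, 32.16),
    "Ada Experience": (2.80, 2.35, 2.90, 2.45),
    "Number of Programming Languages": (4.10, 1.74, 4.50, 1.54),
    "Self Rate Ada": (1.30, 0.47, 1.45, 0.51),
    "Self Rate Concurrency": (1.10, 0.31, 1.10, 0.31),
}

# Assumption: approximate two-tailed critical value of Student's t
# for df = 38 at the 0.05 level, as found in standard t tables.
T_CRITICAL = 2.024


def pooled_t(mean1, sd1, mean2, sd2, n1, n2):
    """Return Student's two-sample t statistic (pooled variance) and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# One (t, df) pair per column of Table 6.3, control versus experimental.
results = {
    name: pooled_t(m_c, sd_c, m_e, sd_e, N_PER_GROUP, N_PER_GROUP)
    for name, (m_c, sd_c, m_e, sd_e) in TABLE_6_3.items()
}

for name, (t, df) in results.items():
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{name}: t = {t:.2f}, df = {df}, {verdict} at the 0.05 level")
```

For Self Rate Concurrency the two groups have identical summary statistics, so the statistic is 0 with 38 degrees of freedom, which agrees with the values reported in the text. Because the table values are rounded, the other statistics are only approximations.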
Table 6.4 shows the percentage of the subjects who were familiar with Small-Ada and with concurrency. It also shows which program (Pgm1, Pgm2) was assigned first within Stage One.
Group | Small-Ada | Concurrency | Pgm1 Assigned First | Pgm2 Assigned First
Control | 10% | 10% | 45% | 55%
Experimental | 10% | 20% | 60% | 40%
Table 6.4 - Familiarity with Small-Ada and Concurrency
Within the control group, subjects S24 and S28 acknowledged familiarity with Small-Ada and subjects S11 and S28 acknowledged familiarity with concurrency. Within the experimental group, subjects S15, S16, S17 and S39 acknowledged familiarity with Small-Ada and subjects S16 and S36 acknowledged familiarity with concurrency. Table 6.5 shows the performance (time score) of these subjects. It shows that the reported familiarity with Small-Ada and concurrency had no major impact on performance. As shown in Figures 6.6 and 6.7, many other subjects from both groups performed better than these subjects. This conclusion is confirmed by the results of a t-test, which showed no statistically significant effect of familiarity with Small-Ada or concurrency.
Subject | Time Score Pgm1 | Time Score Pgm2 | Time Score Pgm3
S11 | 60 | 60 | 60
S24 | 60 | 40 | 14
S28 | 60 | 60 | 60
S15 | 6 | 40 | 11
S16 | 6 | 40 | 60
S25 | 12 | 20 | 18
S39 | 6 | 11 | 40
Table 6.5 - Familiarity with Small-Ada and Concurrency and Performance
None of these variables was found to have an impact on the results discussed below.
6.3. The Results with the Programs
This experiment was performed to confirm or reject the expectations on how novice concurrent programmers might respond to the use of ADAT. These expectations are formally stated by the following hypotheses:
a) The use of ADAT does improve the performance of the debugging activity
Null hypothesis: no difference exists between the two groups in the first stage.
b) The use of ADAT does provide an improvement in the understanding of concurrency
Null hypothesis: no difference exists between the two groups in the second stage.
The independent variable is the use of ADAT. The dependent variable is the time score of each subject, explained below.
Each subject was allowed to work no more than 20 minutes per program. Each subject was given a score of 1 for fixing, 2 for almost fixing, and 3 for failing to fix each program. In addition, the time that each subject spent working with each program was recorded. The time score is the product of the time that a subject spent working with a program and the score obtained for that program. Therefore, if a subject took 20 minutes to fix a program, his or her time score would be 20, and if a subject took 20 minutes and could not fix the program, his or her time score would be 60.
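The time score defined above can be written as a short function. This is a minimal sketch; the names `Outcome`, `MAX_MINUTES`, and `time_score` are illustrative and do not come from the experimental materials.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Outcome score as defined in the text; lower is better."""
    FIXED = 1
    ALMOST_FIXED = 2
    FAILED = 3


MAX_MINUTES = 20  # time limit per program stated in the text


def time_score(minutes: float, outcome: Outcome) -> float:
    """Return the time score: minutes worked multiplied by the outcome score."""
    if not 0 <= minutes <= MAX_MINUTES:
        raise ValueError(f"minutes must be between 0 and {MAX_MINUTES}")
    return minutes * int(outcome)


# The two cases worked through in the text.
print(time_score(20, Outcome.FIXED))   # 20 minutes, program fixed
print(time_score(20, Outcome.FAILED))  # 20 minutes, program not fixed
```

Under this definition the time score ranges from just above 0 to 60, and a subject who fails always scores at least as high as one who fixes the program in the same time.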
To test hypothesis a, a two-sample t-test analysis was performed with the time score and also with just the score of Stage One. To test hypothesis b, a two-sample t-test analysis was also performed, using the time score of Stage Two.
The experimental group showed improved performance over the control group. Considering the time score, the t-test analysis showed a statistically significant difference between the groups (t = 5.8, df = 38, p < 0.01), which favored the experimental group. Figure 6.1 shows the individual results chart with Pgm1. The right axis shows the scores. Smaller scores indicate better performance. In the control group, two subjects fixed program Pgm1, two almost fixed it, and 16 failed to perform the fix. In the experimental group, 16 subjects fixed program Pgm1 and four failed to perform the fix. The shortest time score for fixing Pgm1 was obtained by subject S01 (3 minutes).

Figure 6.1 - Individual Results with Pgm1
Figure 6.2 shows the average and total time scores chart obtained with Pgm1. The left vertical axis refers to the time score values and the right vertical axis refers to the total time score (the summation of all time scores within each group). The total time score can be seen as one measure of the cost of debugging. In this case, the use of ADAT resulted in a savings of 716 units for Pgm1.
Figure 6.2 - Average and Total Time Scores of Pgm1
Figure 6.3 shows the individual time scores and the time of the first change chart for Pgm1. Vertical bars represent the time of the first change. The right axis represents the time scores. This chart shows one of the many interesting observations that can be drawn from the experimental data obtained. The time of first change, concerning Pgm1, had no influence on the subjects' performance.
Figure 6.3 - Individual Time Scores and Time of the First Change with Pgm1
The results obtained with Pgm2 confirmed the result obtained with Pgm1. A t-test analysis with the time score of Pgm2 showed a statistically significant difference between the groups (t = 3.4, df = 38, p < 0.01). Figure 6.4 shows the individual results chart with Pgm2. The right axis shows the scores. In the control group, four subjects fixed program Pgm2, four almost fixed it, and 12 failed to perform the fix. In the experimental group, 11 subjects fixed Pgm2, eight almost fixed it, and one failed to perform the fix. The shortest time for fixing Pgm2 was obtained by subjects S07 and S40 (six minutes).
Figure 6.4 - Individual Results with Pgm2
Figure 6.5 shows the average and total time score obtained with Pgm2. The left vertical axis refers to the time score values and the right vertical axis refers to the total time score (the summation of all time scores within each group). The total time score can be seen, for instance, as a measure of the cost of debugging. In this case, the use of ADAT resulted in a savings of 409 units with Pgm2.
Figure 6.5 - Average and Total Time Scores of Pgm2
6.3.3. The Results with Stage One
Considering the combined time score, the t-test analysis confirmed the results obtained with programs Pgm1 and Pgm2 (t = 5.2, df = 38, p < 0.0001). Figure 6.6 shows the individual results chart with Stage One. From the statistical point of view, the debugging activity carried out by novice concurrent programmers was improved by ADAT. This result rejects the null hypothesis for hypothesis a.

Figure 6.6 - Individual Results with Stage One
An interesting finding came from the answers given by the subjects to the question "Which program was the easiest to correct?" (see Follow Up Questionnaire, Appendix E). The value of one was given for the answer Pgm1 and the value of two was given to the answer Pgm2.
A t-test analysis employing this variable showed a significant difference between the groups (t = 3.1, df = 38, p < 0.01). The use of ADAT affected how the subjects perceived the complexity of a program. Figure 6.9, which shows the overall time scores with each program, illustrates this finding.
6.3.4. The Results with Stage Two
The results obtained with Pgm3 suggested that the subjects who used ADAT to debug Pgm1 and Pgm2 were positively influenced when they were asked to extend a concurrent program. A t-test was performed with the time score of Stage Two. The control group obtained a mean of 34.95 (SD = 19.86) and the experimental group obtained a mean of 28.45 (SD = 20.51). However, the difference was not statistically significant. Thus, the null hypothesis for hypothesis b could not be rejected by using the time score alone. Possible explanations for this result include an insufficient number of subjects and the limited time that the subjects had to consolidate the knowledge acquired in Stage One. Also, if more programs had been used in Stage One, the result obtained in Stage Two might have been different. The percentage of transfer found is 20.86%. Figure 6.7 shows the individual results chart with Pgm3. In the control group, nine subjects extended program Pgm3, four almost extended it, and six failed to perform the extension. In the experimental group, 13 subjects extended Pgm3, two almost extended it, and five did not. The shortest time to extend Pgm3 was obtained by subject S42 (six minutes).
Figure 6.7 - Individual results with Pgm3
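The Stage Two comparison can be roughly reproduced from the reported means and standard deviations. The sketch below is an approximation under stated assumptions, not the original analysis: it assumes 20 subjects in each group, a pooled-variance t-test, and a two-tailed critical value of about 2.024 for df = 38 from a standard t table. It also computes Cohen's d as a rough effect-size indicator, which the text itself does not report. All names are illustrative.

```python
import math

# Stage Two summary statistics reported in the text.
# Assumption: 20 subjects in each group, as in Table 6.2.
CONTROL = {"mean": 34.95, "sd": 19.86, "n": 20}
EXPERIMENTAL = {"mean": 28.45, "sd": 20.51, "n": 20}

# Assumption: approximate two-tailed critical t for df = 38, alpha = 0.05.
T_CRITICAL = 2.024


def pooled_sd(a, b):
    """Pooled standard deviation of two groups given their summary statistics."""
    df = a["n"] + b["n"] - 2
    pooled_var = ((a["n"] - 1) * a["sd"] ** 2 + (b["n"] - 1) * b["sd"] ** 2) / df
    return math.sqrt(pooled_var)


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    std_err = pooled_sd(a, b) * math.sqrt(1 / a["n"] + 1 / b["n"])
    return (a["mean"] - b["mean"]) / std_err


def cohens_d(a, b):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    return (a["mean"] - b["mean"]) / pooled_sd(a, b)


t = t_statistic(CONTROL, EXPERIMENTAL)
d = cohens_d(CONTROL, EXPERIMENTAL)
print(f"t = {t:.2f}, Cohen's d = {d:.2f}")
print("significant" if abs(t) > T_CRITICAL else "not significant", "at the 0.05 level")
```

Under these assumptions the statistic falls well below the critical value, which is consistent with the lack of significance reported above. The modest standardized difference is also consistent with the suggestion that a larger number of subjects might be needed to detect an effect in Stage Two.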
Figure 6.8 shows the overall time scores chart for the performances discussed above.

Smaller time scores indicate better performances. The time score for stage 1 is the average between the time scores of Pgm1 and Pgm2.
Figure 6.8 - Overall Time Scores
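The Stage One time score used in Figure 6.8 is described as the average of the Pgm1 and Pgm2 time scores. As a small illustration, the sketch below applies that rule to two subjects from Table 6.5. Applying it per subject is an assumption, since the note only describes the aggregate chart, and the function name is illustrative.

```python
def stage_one_time_score(pgm1: float, pgm2: float) -> float:
    """Average of the Pgm1 and Pgm2 time scores."""
    return (pgm1 + pgm2) / 2


# (Pgm1, Pgm2) time scores of two subjects listed in Table 6.5.
TABLE_6_5 = {"S24": (60, 40), "S15": (6, 40)}

for subject, (pgm1, pgm2) in TABLE_6_5.items():
    print(subject, stage_one_time_score(pgm1, pgm2))
```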
Figure 6.9 shows overall time scores with each program.

The symbols H, M, and L refer to High, Medium, and Low time scores obtained with each program, respectively.
Figure 6.9 - Overall Time Scores with Each Program
As shown in Figure 6.9, the subjects of the control group exhibited lower performance than the subjects of the experimental group on all programs. An interesting observation is the inversion of the level of time scores for each program between the two groups. For the subjects of the control group, the time score is higher with Pgm1 and lower with Pgm2. For the subjects of the experimental group, the time score is lower with Pgm1 and higher with Pgm2. For Pgm3, the same inversion is found between the two groups. As mentioned in Step 5 of the Procedure section, the order in which the programs Pgm1 and Pgm2 were given to the subjects was randomly assigned. Table 6.6 summarizes the discussion above. The use of ADAT produced a symmetrically reversed effect on performance.
Group | Pgm1 | Pgm2 | Pgm3
Control | H | M | L
Experimental | L | M | H
Table 6.6 - Effectiveness of Groups
One possible explanation for this result lies in the different ways that ADAT presented the recommendation for guiding the correction of each of the first two programs. With Pgm1, ADAT used a simple screen page to guide the correction process. With Pgm2, ADAT used several screen pages, which seems to indicate that a more detailed explanation had a positive effect on how the subjects reacted to ADAT. However, in the Follow-Up Questionnaire (Appendix E), some subjects suggested that the recommendations issued by ADAT were too long. Further study of this issue may provide more data. The analysis of such data may help explain why ADAT affected the perceived difficulty of understanding the programs used in the experiment.
Figure 6.10 - Time of First Change
Figure 6.10 shows the time of the first change chart. It shows the average time before the first change was initiated. The change was detected by a change in the source file being edited.
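The text does not describe how changes to the source file were detected. One plausible approach, sketched below purely as an illustration, is to compare periodic snapshots of the file against its original contents and report the time of the first snapshot that differs. The snapshot data and the name `time_of_first_change` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (minutes since the session started, source text at that moment)
Snapshot = Tuple[float, str]


def time_of_first_change(snapshots: Sequence[Snapshot]) -> Optional[float]:
    """Return the time of the first snapshot that differs from the initial one.

    Returns None when there are no snapshots or the file never changed.
    """
    if not snapshots:
        return None
    _, original = snapshots[0]
    for minutes, text in snapshots[1:]:
        if text != original:
            return minutes
    return None


# Hypothetical session: the file is first edited at the 4-minute snapshot.
session = [
    (0.0, "task body A is begin null; end A;"),
    (2.0, "task body A is begin null; end A;"),
    (4.0, "task body A is begin accept Go; end A;"),
    (6.0, "task body A is begin accept Go; end A;"),
]
print(time_of_first_change(session))
```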
6.4. Subject Preferences and Suggestions
From the Follow-Up Questionnaire (Appendix E), a set of responses was selected to represent the overall impressions obtained. These impressions show common strategies for locating errors. Many subjects described the gains obtained in terms of less time to perform the required tasks, as well as how ADAT made the job of locating and correcting non-syntactic errors easier. A few subjects provided suggestions for improving ADAT's user interface, such as reducing the number of words and placing the recommendations in a text file. In response to item number 2, "Please describe your general strategy or plan of attack for debugging a program," the answers given by the subjects of the experimental group fell into three categories:
a) Run, identify error, correct, run;
"Run a program after writing it. On location of the errors, correct the program accordingly and run it again..."
b) Use hints given by error messages or the debugger; and
"First try out hints given by error statements or debuggers, then check for typos, then try to reason what else might be wrong."
c) Attempt to understand the program and follow its logic.
"The strategy is to understand what the program does and then to follow it."
Regarding the same item, the subjects of the control group stated:
"First understand the function of the program, then target possible places that cause the error. Develop small changes and examine their effect. Sometime small change may lead to an answer."
"Look for the termination time of each task and try to correct it."
As can be seen, the subjects who used ADAT had a much clearer view of the debugging process. The subjects in the control group showed a lower degree of understanding and confidence regarding the debugging process.
For item number 3 ("Do you believe that having available debugging features would have helped you when debugging the first two programs? Please explain."), the subjects of the experimental group gave overwhelmingly positive responses. Typical of these responses were the comments that ADAT was "exceptional," "specific" and had "no confusing error messages." They also stated that without ADAT the programs "would be difficult to debug," that it would "take much longer to trace the exact problem," and that it "would have taken 4 or 5 times the time to solve the problem." By using ADAT, a novice Ada programmer can perform like an expert Ada programmer. It is important to note that no subject mentioned any difficulty in using ADAT, as opposed to conventional debugging approaches, in which a specification language is also used and needs to be learned. Below is a representative sample of their answers:
"Yes. The debugging features of ADAT is exceptional. It gives specific changes required for the program which no other debugging program offers."
"Yes, This debugger lets you know more specifically where the program fails - no confusing error messages that could lead to more errors in debugging attempt. This debugger helps you learn from your mistakes."
"Yes, concurrent programs are difficult to debug with just print statements."
"Yes. When debugging the second program, it is helpful if we can know the locations of "SELECT ... END."
"Yes. Because the debugging features are similar to Pascal 6.0 which is convenient."
"Yes, having tentative fixes suggested by the debugger is very helpful. Especially for a novice Ada programmer."
"I am sure of that. Without the debugger, it would have taken much longer for me to trace the exact problem."
"Yes, of course. The more advanced debugger the better. That only leaves me the task of running the program, not having to know the source code really well."
"Yes it did helped me a lot. If not for it, it would have taken me 4 or 5 times the time to solve the problem."
Regarding the same item, the subjects of the control group, unaware of the existence of ADAT, also gave positive responses, but these concerned the use of SAPM. They mentioned particularly the ability to see the execution of statements within each task and the program output in a well-defined view. Some subjects seemed confused about what could have helped them. Below is a representative sample of their answers.
"Yes, because the 1st two programs are concurrent I need to know where its error is, and what segment of program and see how the other tasks are being effected."
"Yes! It is easy for me to compile the program and have the debugging features in the same screen & program codes, then I don't have to switch the screen (better than what I am using in school)."
"No, the output made it pretty clean what was going on."
"I don't know. I don't think so because debuggers can only help syntax errors, not logic errors."
Concerning item number 4 ("What changes would you make in available debugging aids to make them more helpful/effective?"), some of the subjects in the experimental group mentioned that direct instructions for the correction would be beneficial. Others mentioned the addition of more conventional debugging features such as breakpoints and data inspection.
"There is a lot of other material given apart from the needed instructions. Though the program is user-friendly, it becomes over friendly."
"The ability to set breakpoints before starting the execution would be most helpful. Often the buggy part of the program could execute before I could pause the program. Also testing the value of variables, examining the entry queues, and high-lighting the quadrant when the task operating in it is swapped out."
"The debugging aid is great as it is because it not only identifies the problem but also gives possible solutions. The output format of the text could be easier to read."
Regarding the same item, the subjects of the control group gave mixed responses. Here, again, one can conclude that the use of ADAT did not cause any subject to feel the need for on-line help, in contrast to the need felt by the subjects in the control group.
"More explanatory."
"Maybe a tool to show when global variables were being changed."
Concerning item number 5 ("Any other comments you care to contribute about the experimental experience, which you think may be relevant will be appreciated and may be provided below."), while many were happy with ADAT, some subjects of the experimental group mentioned "the need for less information" and suggested that "the role of line numbers could be improved."
"The information provided during the debugging with ADAT is a lot. Perhaps, a little less "talk" may clarify the instructions needed to change the program."
"I had print out of the debugger, so when I was doing the correction in the program, I was not sure that the line# that the debugger mentioned it was after the correction or not."
"The Ada debugging process that told you where your mistake was and how to fix it was very helpful."
"The environment of Small Ada is very nice and easy to use. As mentioned before, the debugger was useful and very descriptive."
"This is the best debugger I have ever used."
"I am actually surprised that the debugger works so well and gives suggestions in plain English."
"I just want to add that by using the debugger, it was easy to trace the problem, less time of understanding and following the structure . . . ."
"I can see the potential of a tool like this right away. It is a great idea. With more user interface and features refinements it could be a great product. Does it have to be run with (or inside) Small Ada? It would be nice to have a general purpose tool for all Ada compilers. Also it is nice to be able to save all messages in a text file, instead of having to print screen."
"The windows with the different tasks is a great tool . . . ."
Regarding the same item, the subjects of the control group gave responses in which two main themes emerged. In one, the students felt confident with SAPM:
"The way the debugging program is organized is great, but it takes a little time to get familiar with."
"I hope that I can use this feature on some programs that I am familiar with, and understand that will help me see how the tool is effective."
"To see a practical code and running of a concurrent program was a great happiness to me . . . ."
"The editor was extremely easy to use: the help on the screens was very neat and concise."
"This is a well structured and menu driven program. I believe debugging could be improved by locating the error statements more visibly to the user."
Others felt the need for on-line help:
"It was really frustrating not having anything to help me with the debugging."
The Follow-Up Questionnaire (Appendix E) produced the overall impression that the subjects who used ADAT were very excited to have the opportunity to use such a tool. Their suggestions, when asked about possible improvements, were mostly concerned with the ADAT user interface. Appendix F shows some of the recommendations issued by ADAT for Pgm2. If the hard copy of the programs given to the subjects had included a line number printed to the left of each line of source code, then perhaps the experimental group would have fixed the programs much faster. The lack of line numbers caused several subjects to compile and run the programs several times. Many subjects from the control group felt the need for low-level debugging features such as data inspection and breakpoints. Whether the availability of such debugging features would have helped the subjects of the control group is a subject for future research.