This study compared student learning outcomes and student perceptions of and satisfaction with the learning process between two sections of the same course: an online section and a traditional face-to-face section. In a quasi-experimental design, students were randomly assigned to the two course sections. Conclusions suggest that the face-to-face encounter motivates students to a higher degree and also provides students with another layer of information concerning the instructor that is absent in the online course.





Recent research in online education has focused upon whether web based courses provide students with the same degree of personalized learning and content mastery that students experience in face-to-face (f2f) classes. Few studies, however, utilized experimental design across several variables including student learning as well as satisfaction with the learning experience.

Progress and innovative use of technology in education have greatly improved the quality of web based courses (Schott et al., 2003). To determine whether web based courses indeed provide students with a comparable if not superior learning experience, researchers over the past 5 years have conducted a plethora of studies comparing aspects of traditionally delivered instruction with online instruction (Rivera, McAlister, & Rice, 2003 [Editor’s note: not in ref list]). Findings are mixed [Editor’s note: in the Rivera study or in other studies?], but the general consensus is that students learn just as well using web based instruction but are less satisfied with the learning experience. Miller, Rainer, & Corley (2003) noted that the more negative aspects of web based instruction include student procrastination, poor attendance, and the students’ sense of isolation. Other studies noted that online courses are more effective with particular personality types (Daughbenbaugh et al., 2002 [Editor’s note: check spelling; this is not in ref list under this spelling]). Few if any studies have utilized random assignment to determine whether the “average” student might fare just as well in an online course as in an f2f course. Rather than comparing two potentially unequal groups, this study utilized random assignment in order to compare equivalent groups, thereby controlling for predispositions towards one type of learning style over another. [Editor’s note: Big leap in this paragraph. A transition is needed between the bulk of the paragraph and the last 3 sentences. Explaining the inherent self-selection bias will ease the lead into the last 3 sentences.]

[Editor’s note: As a literature review (LR), this is too brief. There are a lot of issues and concerns: self-selection, why universities encourage online learning, various definitions of online learning, instructor characteristics and involvement, and university support (financial and otherwise) are just a few of them. A fuller discussion of at least some of these issues would be helpful.]

The course in this study, Early Childhood Education: Philosophy and Practice, is an entry-level survey course required for early childhood majors who have just entered their pre-professional program. (200 level?) Currently there are more than 900 students enrolled in the Bachelor of Education in Early Childhood Education program, which prepares students to teach children ages 3-8 with a variety of learning styles including those at-risk, typically developing, mild to moderately disabled and gifted. (reverse order of sentences 1 and 2) The f2f sections of the course are scheduled to meet twice weekly in seminar fashion, while the online sections...... Content covered in all sections of the course includes ECE (define ECE) history, theorists, curriculum, inclusive learning environments, designing and planning themes, webbing, strategies, evaluation, and parent involvement. Central to the course is the development of reflective thinking and application to reflective practice.

To make both sections of the course “equivalent”, the instructor used duplicate syllabi, including duplicate assignment requirements. Students in the web based section were required to attend at least two “Live Chat” sessions per week, which served to replace the discussion time in the f2f section. Students in both sections were given equal credit for attendance. All students in both sections were assigned to small groups for in-class or online assignments.





Often students who enroll in web based courses have a predisposition towards this way of learning. This issue threatens the validity of findings based upon comparisons between web based and f2f courses. The groups, by nature of learning preference and computer comfort levels, are not equivalent (fuller discussion of this in the LR would be very helpful) and therefore findings cannot be generalized beyond the restrictions of the studies. To address this weakness, this study used a quasi-experimental design that combined non-random selection [Editor’s note: of whom?] with random assignment to the control (f2f) and experimental (web based) groups. Prior to registration, students were asked whether they would be amenable to allowing the department to assign them to either the f2f or the web based section of the course. While students volunteered to participate in the study, random assignment to the groups strengthened the internal validity of the study and enhanced group equivalency. (Were there students in the f2f and online sections who were not part of the study? Folks who specifically chose those particular sections?)

To validate group equivalency, all students (in the study population? all students in all sections?) completed the VARK (visual, aural, read/write, kinesthetic), a diagnostic instrument designed to determine learning preferences (Copyright Version 4.1, 2002, held by Neil D. Fleming, Christchurch, New Zealand and Charles C. Bonwell, Green Mountain Falls, Colorado 80819 U.S.A. [Editor’s note: citation]). Using the VARK, students are classified with mild, strong or very strong preferences in any of the four identified learning styles. In addition, students can show multimodal tendencies (more than one style appears to be preferred). For the purposes of this study, students were classified in one of 5 categories: visual, aural, read/write, kinesthetic, and multimodal learners.

To control other confounding variables that might result from the delivery methods of two sections of the course, the same instructor taught both sections during the same semester. The instructor took care to compare the design and delivery of both sections of the course to ensure that topics covered, work required, testing, and the classroom experience were as closely matched as possible. The syllabi of both courses were also compared by a colleague to provide content validity.

[Editor’s note: So this study is based on findings from 2 sections (out of how many sections?) of this course. How many students were in each section? Were all students in these 2 sections participants? If not, how many non-participants were in each section?]

In order to provide an unbiased measure and comparison of student-teacher interaction between groups, a modified interaction analysis instrument (IA) based upon the work of Flanders (1970) was utilized. Flanders’ IA is a systematic method of coding spontaneous verbal communication that has been used in classroom observation studies to examine teacher interaction styles. The IA instrument consists of the following 10 categories:



Teacher talks:
    Accepts feelings
    Praises or encourages
    Accepts or uses ideas of pupils
    Asks questions
    Gives directions

Student talks:
    Responds




Four categories were added to “student talks”: “validation of others’ ideas”, “praise or courtesy remarks”, “questions or asks for clarification”, and “silence due to ‘down time’”. This last category was designed to earmark extra time needed in a live chat online. Lengthy contributions in the chat room require longer time for both typing and reading. In this case, “silence/confusion” is not an appropriate label for what is occurring. The “down time” category was used only for the web based course and was not a function of comparison between groups. It was verified by rereading logs of the live chats. [Editor’s note: A better explanation of the IA is needed. Do these categories define what the teacher (or student) is doing when talking? From this very brief overview, it appears not to be “interactional” but, instead, one-sided. This may be another area to discuss in the LR: a discussion of analysis tools and the + and - of what they measure (and why a 35 yr old tool is being used in this context).]

An observer listens to the classroom interaction and identifies the IA category represented by that interaction. The observer marks a category every 3 seconds. Frequencies of categories are then tabulated and preferences or trends can be observed. In this study, comparisons were made between f2f and web based discussions to determine whether the interaction experience between the groups varied. (The “observer” sat in on the web-based discussion, viewing the chats in real time?)

Two 20-minute sessions were randomly selected and video-taped from all possible f2f classroom discussions. Two corresponding web based chat room discussions were also monitored for 20 minutes. The resulting frequencies were then compared using a chi-square test of homogeneity to observe differences between multiple variables with multiple categories.
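The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tallies are hypothetical (they are not the study's data), the function name is invented for the example, and 10 categories are assumed. At one code every 3 seconds, a 20-minute session yields 400 codes, and a table of 2 groups by 10 categories has (2 - 1)(10 - 1) = 9 degrees of freedom.

```python
# Sketch of a chi-square test of homogeneity on a table of category tallies.
# All counts below are hypothetical; they are NOT the study's data.

CODES_PER_SESSION = 20 * 60 // 3  # one code every 3 seconds for 20 minutes


def chi_square_homogeneity(table):
    """Return (statistic, df) for a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both groups shared one category distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical tallies over 10 IA categories, two sessions per group.
f2f_counts = [20, 60, 80, 120, 260, 40, 10, 110, 70, 30]
web_counts = [50, 70, 90, 110, 60, 40, 10, 180, 60, 130]

statistic, df = chi_square_homogeneity([f2f_counts, web_counts])
print(CODES_PER_SESSION, df, round(statistic, 2))
```

The statistic would then be compared with the chi-square critical value for the computed degrees of freedom at the chosen alpha.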


The examination of student learning outcomes compared group means of student test grades and overall grades using an independent t-test. Test scores (as opposed to letter grades) were used with the assumption that they reflected interval spacing [Editor’s note: meaning?]. To measure student perceptions of student-teacher interactions as well as satisfaction with the course as a whole, identical end-of-semester evaluations were completed, and an independent-samples t-test was calculated to compare mean evaluation scores for the groups.
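For readers who want to see what the independent t-test computes, here is a minimal sketch using the pooled-variance (equal variances assumed) form. The manuscript does not say which variant was used, so that choice is an assumption, as are the function name and the scores, which are hypothetical.

```python
# Sketch of a pooled-variance independent-samples t statistic.
# Scores are hypothetical; the equal-variance form is an assumption.
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, n_a + n_b - 2


f2f_scores = [88, 92, 79, 85, 90, 94]  # hypothetical test scores
web_scores = [84, 91, 77, 80, 89]      # hypothetical test scores

t, df = independent_t(f2f_scores, web_scores)
print(round(t, 3), df)
```

The resulting t is compared with the critical value of the t distribution for the returned degrees of freedom.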





Sample info: Of the total (100+) students who enrolled in all four sections of the ECE: Philosophy and Practices course, 42 agreed to participate in the study. The f2f section (control) had 24 students (3 males and 21 females) and the web based section had 18 students (1 male and 17 females). The unequal class sizes resulted when some students either added or dropped the course at a late date after the assignment control process was halted. All of the students in the f2f course were considered traditional students in that they enrolled in college right out of high school. There were two non-traditional students (returning for licensure) enrolled in the web based course.

Statistical issues:

1.  Should the non-traditional students be included in this analysis? Support is needed one way or the other.

2.  Question asked earlier: were there any students in these 2 sections who were not part of the study?

3.  An analysis is needed of the students who dropped the course after selection, to confirm that there wasn’t differential “dropping.” In other words, did the same types of study participants drop both courses?

Group equivalency: The VARK survey of learning preferences was completed by 18 students in the f2f group and 15 students in the web based group. [Editor’s note: Why not all students?] Learning preferences in each group were evenly distributed across the learning styles. A chi square goodness of fit test was administered using the control group as expected frequencies and the experimental group as the observed frequencies. Results showed no statistically significant difference between group learning preferences (χ² = 3.36; df = 4; α = 0.05). Therefore it was assumed that the groups were equivalent.
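The goodness of fit procedure described here can be sketched as follows. The category counts are hypothetical and the function name is invented for the example. Because the two groups differ in size (18 and 15), the sketch rescales the control-group counts to the experimental group's total before using them as expected frequencies, which is an assumption about how the test was set up. The critical value 9.488 is the standard chi-square table value for df = 4 at α = 0.05.

```python
# Sketch of a chi-square goodness of fit test against a reference group.
# Counts are hypothetical; rescaling the reference is an assumption.

CRITICAL_DF4_ALPHA05 = 9.488  # chi-square table value for df = 4, alpha = 0.05


def goodness_of_fit(observed, reference):
    """Return (statistic, df), scaling reference counts to the observed total."""
    total_observed = sum(observed)
    total_reference = sum(reference)
    statistic = 0.0
    for obs, ref in zip(observed, reference):
        expected = ref * total_observed / total_reference
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


# Hypothetical counts: visual, aural, read/write, kinesthetic, multimodal.
f2f_counts = [3, 4, 3, 4, 4]  # control group, n = 18
web_counts = [2, 3, 3, 3, 4]  # experimental group, n = 15

statistic, df = goodness_of_fit(web_counts, f2f_counts)
print(round(statistic, 3), df, statistic < CRITICAL_DF4_ALPHA05)

# The reported statistic of 3.36 is below the critical value, which is
# consistent with the manuscript's "no significant difference" reading.
print(3.36 < CRITICAL_DF4_ALPHA05)
```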


Interaction Analysis: Results of the chi square test of homogeneity revealed that a statistically significant difference did indeed exist between the nature of teacher/student interaction in the two groups (χ² = 900.035; df = 9; α = 0.05). [Editor’s note: confusing. Was there 1 interactional score (as this implies) or several subscores?] An examination of the standardized residuals revealed the interaction categories contributing to the differences. Areas where the observed frequency was significantly higher (H) than expected for the web-based course included student responds, student supports others in class, student silence/confusion, and teacher accepts feelings. Lecturing was the only area lower than expected. [Editor’s note: Not clear how/why “lecturing” was included in the online course. How was “lecturing” defined there?] For the f2f course, student responds, student asks questions, and student initiates an idea were all higher than expected and silence/confusion was lower. Instructor lecturing was also higher than expected. [Editor’s note: Because the categories were imperfectly defined/described earlier, it is difficult to understand this.]

The instructor spent less time lecturing in the chat room than in the classroom. In a web based course, lecturing often takes the form of a web page and is not a typical use of the chat room. On the other hand, the f2f classroom does not allow for the clear-cut compartmentalization of lecture versus discussion. Because only two samples from each group were observed, it is possible that other f2f sessions may have shown less time spent lecturing. The general trend, however, is that lecturing did not dominate the web based course discussions. [Editor’s note: It is also possible that this is not an appropriate measure to discuss at all and perhaps should not be included here, unless there is a video component to the online course that allows for more “traditional” lecturing. Otherwise, it appears that what is being compared is apples and oranges. There are enough other measures to consider.]

The instructor also allowed for more and longer periods of silence in the chat room than in the classroom, most likely due to the expectant [Editor’s note: ?] nature of chat room discussions. The instructor, without the aid of visual contact with the students, was unable to determine whether students were simply thinking and formulating questions and answers or whether they indeed had nothing to add. [Editor’s note: How long were the silences in f2f vs online?] It was observed that a period of silence was followed by several contributions from students popping on the screen almost simultaneously. In an f2f setting, students and the instructor can tell exactly when the discussion floor is open. The chat room discussions smudge this demarcation into fluctuations of silence and activity.

A unique [Editor’s note: not a good use of “unique”] difference between the two groups was illustrated in the first web based session, where students showed support for one another to a higher degree than expected. [Editor’s note: Example? Evidence?] Chat room discussions may put students and the instructor on an even footing [Editor’s note: meaning?], thereby encouraging students not only to support one another but to take on a more empowered role in the class discussion. [Editor’s note: Not clear what is meant by “support.”]


Student Evaluations:  Students in both classes completed identical course evaluations before their final exam. The evaluation included items that explored student perceptions of both the instructor and the course. Instructor items focused upon perceived teacher effectiveness. Course items included those dealing with the general organization, the value of the course as it related to their major area of study, the textbooks, exams, and general assignment workload. All evaluations were anonymous.

Students in the f2f class rated the instructor and the course significantly higher than those students in the web based course (with p = 0 [Editor’s note: ?? p < .001?]). Mean scores for the f2f and web based classes were 1.22 and 1.82 respectively on a 5 point scale where a “1” indicated the highest ranking (outstanding) and a “5” the lowest (poor). In both cases the instructor received very good scores, yet the students in the f2f course felt the quality of the instructor and the course to be better. T-tests were then conducted on individual questions to locate where the classes differed significantly. The alpha level was lowered to 0.01 to control for Type I error (awkwardly phrased but good choice. Why .01? Was this calculated, based on # of measures to be compared, or simply chosen?) and the analysis revealed statistically significant differences on each of the 22 questions, suggesting that students collect extra information concerning an instructor based upon direct observation. For example, in the web based course, students have limited access to instructor interaction with other students. A student in the web based class will ask about personal difficulties using private email. However, it is common for students to ask questions of this type before, during, and after an f2f class where other students may observe the exchange. It is logical, therefore, that an instructor might receive a lower rating on an item like offering assistance to students with problems connected to the course in a web based course where this interaction is less evident. [Editor’s note: Which raises the question whether this is a valid comparative measure to be included here.]
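The question of whether .01 was calculated or simply chosen can be made concrete with a short sketch. It shows one common calculation, the Bonferroni correction, applied to the 22 item-level tests; nothing in the manuscript says this method was used, the function names are invented for the example, and the family-wise figure assumes the 22 tests are independent, which is a simplification.

```python
# Sketch: per-test alpha under a Bonferroni correction, and the family-wise
# Type I error rate if a flat alpha of 0.01 is used instead.
# Independence of the 22 tests is an assumption made for illustration.


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha that keeps the family-wise rate at or below family_alpha."""
    return family_alpha / n_tests


def familywise_error(per_test_alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


N_ITEMS = 22  # evaluation questions tested individually

print(round(bonferroni_alpha(0.05, N_ITEMS), 4))  # 0.0023
print(round(familywise_error(0.01, N_ITEMS), 3))
```

Under these assumptions a Bonferroni-derived threshold would be stricter than .01, which is why the basis for the chosen alpha matters.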

Overall, the web based students gave the instructor a high rating and the f2f students gave a stellar rating. In neither case did the students indicate a negative experience but rather a slightly less positive experience. Interesting comparisons indicated that the students in the f2f course expected an average grade of A- while those in the web based course expected a B-. As far as grading the instructor, f2f students assigned an average grade of A and the web based students assigned a grade of B+. [Editor’s note: 1. Where did the final grade data come from? 2. How did the instructor evaluate students? Was there some area of evaluation that could not be adequately “proven” by online students? (“Quieter” students who learn by listening and maintaining an internal dialogue may not be “seen” online while the same student may be “obvious” in an f2f class.)] There have been many studies conducted showing the high correlation between student expected grade and student evaluation of the instructor (discuss in LR). To determine whether students in one section of the course actually did perform better than those in the other, exam grades and overall grades were compared.

Three indicators of student success were examined: midterm examination, final examination, and overall points earned for the semester (which included other assignments). Of the three comparisons, only the mean score for overall grade differed at a significant level (p = 0.02). Students in the f2f course averaged an A- and those in the web based course averaged a B. Students seemed to predict their final grade with accuracy, indicating that the grading process for both sections was clear-cut. The main difference between test scores and overall points earned for the semester was the other assignments required throughout the semester. A closer look at student records revealed that students in the web based course did not earn lower grades on these assignments but merely failed to submit some of them, suggesting that learning outcomes were similar but that the personal contact of an f2f course positively influenced and motivated students to turn in assignments.





General findings of this study showed that two equivalent groups, randomly assigned to either an f2f or web based course, do not have equal experiences in the area of student perceptions [Editor’s note: defined as?]. Learning outcomes can be considered to be equal based upon test scores. Because the instructor was the same for both courses, it can be concluded that the course delivery (f2f versus online), rather than instructor differences, may have some effect on the variables examined. (How much experience did the instructor have teaching online sections?) The interaction analysis showed that the instructor tended to lecture less in the web based course [Editor’s note: See comments earlier]. Because only two pairs of discussion sessions were scrutinized, findings in other areas of interaction, and specially [Editor’s note: especially?] student interaction, may not generalize. Student evaluations of the course and the instructor also differed. Students in the web based course tended to rate both the course and the instructor lower than students in the f2f course, although ratings for both groups were considered to be above average. Finally, student achievement differed only in the area of course assignments. Test scores showed no statistically significant difference, indicating that student mastery levels were essentially the same, yet students in the web based course were more likely to omit submitting one or more assignments (did these missed assignments cause their lower grades?). So, students in the web based course may be less motivated to complete assignments. [Editor’s note: because they are not counted towards the final grade?]

Limitations of this study include a small sample size and a restricted population. [Editor’s note: How might that have impacted the results?] Future research might apply this model to other content areas and explore the specific differences in course delivery methods that account for student perceptions. Some of the differences between the f2f and web based courses in this study were due to the random assignment of students to the groups. Students who may not be familiar or comfortable with web based courses were in the experimental group. [Editor’s note: Isn’t that exactly one of the goals of the study? This should not be discussed in a paragraph on limitations.] Their perceptions and experiences, therefore, were more indicative of those of the “average” student as opposed to those students who generally enroll in web based courses.




1.  Article titles are not surrounded by quotation marks.

2.  Review correct “retrieval statement” format.

3.  All references must be cited; all citations must be referenced.


Cohen, [Editor’s note: First Initial?] (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153. [Editor’s note: not cited]


Daughenbaugh, R., Daughenbaugh, L., Surry, D., & Islam, M. (2002). Personality type and online versus in-class course satisfaction. Educause Quarterly, 25 (3), 71-72. [Editor’s note: not cited]


Flanders, N.A. (1970). Analyzing Teaching Behavior. Reading, MA: Addison-Wesley. [Editor’s note: not cited]


Miller, M.D., Rainer Jr., P.K., & Corley, J.K. (2003). Predictors of engagement and participation in an on-line course. Online Journal of Distance Learning Administration, 6 (1). Retrieved from http://www.westga.edu/%7Edistance/ojdla/summer62/schott62.html, December 12, 2003.

Rivera, J.C., McAlister, M.K., & Rice, M.L. (2002). Comparison of Student Outcomes & Satisfaction Between Traditional & Web Based Course Offerings. Online Journal of Distance Learning Administration, 5 (3). Retrieved from http://www.westga.edu/%7Edistance/ojdla/fall53/fall53.html, December 3, 2003. [Editor’s note: not cited]

[Editor’s note: Rivera, McAlister, & Rice (2003) ??]

Schott, M., Chernish, W., Dooley, K.E., & Lindar, J.R. (2003). Innovations in distance learning program development and delivery. Online Journal of Distance Learning Administration, 5 (2). Retrieved from http://www.westga.edu/%7Edistance/ojdla/summer62/schott62.html, September 9, 2003.



Rating Table
Submission Number. . . . .
Submission title . . . . . . . . . . . . . . . . . . . . .

Quality Statements





A:  The manuscript deals with a significant problem.

B:  The manuscript is creative or deals with the subject in a new or novel way.

C:  The author included the appropriate background or literature review.

D:  The author's writing style is appropriate, academic, and clear.

E:  The study is conceptually based and theoretically

F:  The analyses are sound and appropriate.

G:  The conclusions and/or policy implications flow from the study's findings.

H:  Readers of AEQ will find this article of interest.





COMMENTS:  See Editor’s Notes, above and below

REVIEWER'S NAME:    Do not sign your name. Instead, write your 3-letter identity.


This manuscript asks a very important question in higher education: how viable are online courses for the typical (non-self-selected) student?  As the author suggests, very few studies have truly addressed this question.  Thus, this work is important in that it is an initial attempt to address the concerns and the realities of online courses in a face-to-face environment.


The manuscript shows promise but there are a number of unresolved issues that need to be more clearly addressed before acceptance.  Thus, I would recommend that the work not be accepted now but that the author be asked to resolve those difficulties and resubmit in the very near future.


There are three major areas to address:

1.  Statistical

    a.  An analysis of “group leavers” would resolve some of these concerns.

    b.  It is not clear if the 2 groups include members who are not participants in the study. If the two groups include different numbers of “non-participants,” this could bias the results. This needs to be addressed.

    c.  Because the author performed multiple analyses, (s)he lowered the significance level from .05 to .01. It is not clear whether this was based on the number of post hoc analyses being performed or whether this was an arbitrary choice.

2.  Lit Review

    a.  The LR was suggestive but not complete. The author raised a number of concerns regarding online vs f2f teaching in the body of the text that should have been clarified in the LR.

    b.  The choice of analysis instrument was not clear. A specific discussion of methodological tools was warranted in the LR.

3.  Methods/Results

    a.  A better description of the specific measures on the IA is needed. Some seemed superfluous to this study: why were they included?

    b.  A richer discussion of the value of some of the findings is needed.


All in all, this work discusses an important topic that needs a fuller airing in the academic community.  I encourage the author to rework this paper and resubmit.