The Value of Sampling in Assessment

I just returned from the biennial ABA Associate Deans’ Conference, which is a fun and rewarding gathering of associate deans of academics, student affairs, research, administration, and other similar roles.  (Interestingly, more and more associate deans seem to have assessment in their titles.)

I spoke on a plenary panel about assessment, and I discussed the value of sampling in conducting programmatic assessment.  I want to elaborate here on some of my thoughts on the subject.

Let’s say a school wants to assess the extent to which students are meeting the learning outcome of writing.  One way to do so would be to conduct what is called a “census,” in which every student’s writing in a course or sequence is evaluated by an assessment committee.  In a small LL.M. or Juris Master’s program of 10 or 20 students, this might be feasible.  But in a school of, say, 900 J.D. students, it is not workable.

A more feasible approach is to use a “sample” — a subset of the larger group.  So instead of reviewing 900 papers, perhaps the committee might look at 50 or 100.  If the sample is properly constructed, it is permissible to extrapolate the results and draw conclusions about the larger population.

Sometimes using a census is workable, even for a large group.  For example, if the faculty who teach a subject all agree to embed 10 of the same multiple-choice questions in their final exams, those results could be analyzed to see how students performed on the material being tested.
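As a rough illustration of how simple that analysis can be, here is a short Python sketch that tallies the percentage of students answering each embedded question correctly.  The data format (one list of right/wrong marks per student) is just an assumption for the example, not a prescribed way of recording results.

```python
# Hypothetical sketch: summarizing results from embedded multiple-choice
# questions.  Assumes each student's answers are recorded as a list of
# True/False values (correct/incorrect) -- the format is illustrative.

def percent_correct_by_question(responses):
    """responses: list of lists, one inner list of booleans per student."""
    num_students = len(responses)
    num_questions = len(responses[0])
    summary = []
    for q in range(num_questions):
        correct = sum(1 for student in responses if student[q])
        summary.append(round(100 * correct / num_students, 1))
    return summary

# Example: three students, three embedded questions
print(percent_correct_by_question([
    [True, True, False],
    [True, False, False],
    [True, True, True],
]))  # -> [100.0, 66.7, 33.3]
```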

Frequently, though, we are assessing something, like writing, that does not lend itself easily to embedded multiple-choice questions or other easy-to-administer forms of assessment.  That’s where sampling comes in.  The key is to construct a representative sample of the larger population.  Here are some tips for doing so:

  • Consider, first, what you will be assessing.  Are you reviewing two-page papers?  Ten-page memos?  Thirty-page appellate briefs?  Fifteen-minute oral arguments in a moot court exercise?  Each of these calls for a different time commitment from your reviewers.  Next, take into account how many reviewers you will have: the more reviewers, the more documents you’ll be able to assess.  Keep in mind that you’ll likely need multiple reviewers per item, and that time should be allotted for the reviewers to “calibrate” expectations.  All of this will give you an idea of how much time each reviewer will need per document or other item under review.
  • In general, the larger the sample size, the better.  Statistically, this has to do with the “margin of error” and the “confidence interval.”  For more on picking a sample size, check out this very helpful article from Washington State University.  But a quick rule of thumb is a minimum of 10 students or 10% of the population, whichever is greater.  (Both approaches are sketched in code after this list.)
  • It is preferable for those doing the assessment not to be involved in picking the sample itself.  Here’s where having an assessment or data coordinator can be helpful.  Most of the time, the sample can be drawn at random; online random number generators can help here.  There are suggestions for simplifying this process in the document I linked to above.
  • Once you have selected your sample size and identified who will be in the sample, make sure the sample is representative.  For example, if your population is 60% women and 40% men, the sample should approximate that breakdown as well.  I like to look, too, at the average LSAT and UGPA of the groups, as well as Law School GPA, to make sure we’ll be assessing a sample that is academically representative of the larger population.  (The second sketch after this list shows one way to run that kind of check.)
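For those who like to see the arithmetic behind the second bullet, here is a minimal Python sketch of both approaches: the 10-or-10% rule of thumb and the standard margin-of-error calculation for estimating a proportion, with a finite-population correction.  The 95% confidence level and ±10% margin of error are illustrative defaults, not requirements.

```python
import math

# Two ways to arrive at a sample size for a population of N students.
# The defaults below (95% confidence, +/-10% margin of error, p = 0.5)
# are illustrative assumptions, not prescribed values.

def rule_of_thumb(population_size):
    """Minimum of 10 students or 10% of the population, whichever is greater."""
    return max(10, math.ceil(0.10 * population_size))

def sample_size(population_size, margin_of_error=0.10, z=1.96, p=0.5):
    """Sample size for estimating a proportion at ~95% confidence (z = 1.96),
    adjusted with a finite-population correction."""
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n_adjusted = n / (1 + (n - 1) / population_size)
    return math.ceil(n_adjusted)

print(rule_of_thumb(900))   # 90 papers under the 10-or-10% rule
print(sample_size(900))     # about 87 papers for a +/-10% margin of error
```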
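And here is a companion sketch of the last two bullets: drawing a simple random sample and spot-checking that its average credentials roughly mirror the population’s.  The student records and field names below are entirely hypothetical; in practice you would pull the same figures from your own records.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the draw is reproducible and auditable

def compare(population, sample, field):
    """Print population vs. sample averages for one numeric field."""
    pop_mean = statistics.mean(s[field] for s in population)
    samp_mean = statistics.mean(s[field] for s in sample)
    print(f"{field}: population average {pop_mean:.2f} vs. sample average {samp_mean:.2f}")

# Made-up population of 900 students with hypothetical credentials
students = [{"lsat": random.randint(148, 170),
             "lgpa": round(random.uniform(2.3, 4.0), 2)}
            for _ in range(900)]

sample = random.sample(students, 90)   # simple random sample of 90 students

compare(students, sample, "lsat")
compare(students, sample, "lgpa")
```

If the sample’s averages drift noticeably from the population’s, it is reasonable to redraw or to stratify the draw (for example, by section or by gender) before the reviewers begin their work.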

In the assessment projects I have worked on, I have found sampling to be an effective way to make assessment easier for faculty who have a lot of competing demands on their time.  Some additional resources for sampling are:

Dean Vikram Amar on Constructing Exams

Dean Vikram Amar (Illinois) has an excellent post on Above the Law about exam writing.  He offers four thoughts based on his experience as a professor, associate dean, and dean.  First, Dean Amar talks about the benefits of interim assessments:

Regardless of how much weight I attach to midterm performance in the final course grade, and even if I use question types that are faster to grade than traditional issue spotting/analyzing questions — e.g., short answer, multiple-choice questions, modified true/false questions in which I tell students that particular statements are false but ask them to explain in a few sentences precisely how so — the feedback I get, and the feedback the students get, is invaluable.

Second, Dean Amar articulates an argument in favor of closed book exams.