“May inadvertently penalize students for positive behaviors”: The trouble with Perusall’s algorithmic grading

Note: This post is based on a Twitter thread that I posted back in August 2021. Since there has recently been a lot of further discussion about the drawbacks of how edtech products track and surveil students, including to give them computer-generated grades, I thought I would turn the thread into a blog post to keep it as a more permanent resource. The text of this post is edited and expanded from the original thread.

Most often when you hear about Perusall, you hear about it as a “social annotation tool.” According to a succinct definition from Vanier College in Quebec, social annotation “involves commenting on a discourse in an online collaborative environment.” Many of us are familiar with this collaborative style from Google documents that allow multiple users to comment, but Perusall is one of several tools made specifically for social annotation rather than word processing (others include Hypothesis and Diigo). Perusall is also designed specifically for educational contexts, and provides instructors with an algorithmic grading tool. This means that the software, rather than the teacher, gives students a grade, using factors that the instructor may modulate.

I learned about the way Perusall assigns grades to students while attempting to help a faculty member set the tool up for class assignments. She, like many other instructors, was attempting to create an engaging, collaborative activity for her students to help them work through difficult texts. I can’t recall if she had planned to give students “participation points” or no grade at all, but we started looking at the instructor settings to see how she could view basic information like how many annotations each student had contributed. What we found was a surprisingly complex and not very forthright scoring algorithm provided by the company. My first impression was that many of the scoring features might actually penalize students for neutral or positive habits, such as printing out a reading and coming back later to comment (this would reduce the total time spent viewing the document, one potential scoring metric). I found that other educators had noted the same problems. In a guide from Boston College’s Center for Teaching Excellence, the author notes of the default grading settings that they “may inadvertently penalize students for positive behaviors.” It is troubling, to say the least, that we might use a product with such a capacity (under the auspices of “time-saving”). The purpose of this post is to explain how algorithmic grading works in Perusall and to make a larger point about how edtech companies encourage us to withhold information about our expectations from students so that they will not learn how to “game” their product. This stance actively interrupts any efforts to create a classroom based on trust and transparency, let alone one where students can engage with materials in the way that works best for them. Let’s take a look at how scoring works in Perusall.

How does scoring work in Perusall?

When you open up the “settings” for an assignment in Perusall, you see seven different metrics that can be modulated to make up a student’s grade. They are “annotation content” (which is scored by Perusall’s “quality algorithm”), “opening assignment,” “reading to the end,” “active engagement time,” “getting responses,” “upvoting” (whether a student’s comments are upvoted by their peers), and “quizzes” (meaning that the instructor can embed quiz questions within the social annotation assignment). In case you could not already tell, students’ activity is heavily tracked within this software. These options are indeed customizable (you can change the percentage weighting of each of them and bring it down to “0%” if desired). But Perusall itself has recommendations about how to use the grading feature. They suggest using the “holistic” setting, which awards 60% to “annotation quality” (again, using their undefined algorithm), and 10-20% to each of the other metrics. This adds up to more than 100%, so students have lots of ways to get full credit. While this is a nice touch, it does not change three basic facts: student behavior on Perusall is being tracked quite intensively; Perusall has developed algorithms for “quality” that they want to offer to instructors in lieu of their own assessments of student work; and Perusall’s clear preference is for instructors to use this algorithmic system, because it is the default.
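To make the arithmetic concrete, here is a minimal sketch of how a weighted score that “adds up to more than 100%” might behave. This is not Perusall’s actual algorithm, which is not published in this form. The specific weights, the metric names used as dictionary keys, the `overall_score` function, and the assumption that the total is simply capped at 100% are all hypothetical choices made for illustration.

```python
# Hypothetical weights loosely modeled on the "holistic" description above:
# 60% for annotation quality and 10-20% for each of the other metrics.
# These exact values are assumptions, not Perusall's real defaults.
WEIGHTS = {
    "annotation_content": 60,
    "opening_assignment": 10,
    "reading_to_end": 10,
    "active_engagement_time": 20,
    "getting_responses": 10,
    "upvoting": 10,
}


def overall_score(completion, weights=WEIGHTS, cap=100.0):
    """Return a percentage score from per-metric completion fractions.

    `completion` maps a metric name to a fraction between 0 and 1.
    Missing metrics count as 0. The cap at 100% is an assumption.
    """
    raw = 0.0
    for metric, weight in weights.items():
        fraction = min(max(completion.get(metric, 0.0), 0.0), 1.0)
        raw += weight * fraction
    return min(raw, cap)


# A student who does everything on screen can exceed 100% before the cap.
on_screen = {metric: 1.0 for metric in WEIGHTS}

# A student who prints the reading and returns only to annotate: excellent
# annotations, but no tracked reading time and no "reading to the end".
on_paper = {
    "annotation_content": 1.0,
    "opening_assignment": 1.0,
    "getting_responses": 0.5,
    "upvoting": 0.5,
}

print(overall_score(on_screen))
print(overall_score(on_paper))
```

Under these assumed weights, the two students could submit annotations of identical quality and still receive different grades, purely because one of them read on paper. The surplus above 100% softens that gap but does not remove it.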

Dishonesty as a default

In addition to the default settings, which track students’ reading time, number of page openings, etc., the company’s document on scoring explains their stance on transparency: “We suggest providing students with general guidelines about scoring, without going into specifics of the metrics you have selected.” Furthermore, the company explains that they “firmly believe that defining too precisely how students’ levels of engagement are assessed sends the wrong message to students and encourages them to try to ‘game’ the grading algorithm.” While Perusall tries to frame this problem as an impediment to student learning (they want students to be “intrinsically motivated”), it is also clear that they believe the instructor should maintain some control and privileged information about grading. It is possible to be transparent with students about how the grading works to a certain degree, but you have to manually un-check a box in the backend to show them. Why is making expectations clear to students viewed so suspiciously? It seems as though the relevance of Perusall as a tool (including the algorithms) reigns supreme here, rather than student learning (which has been shown to be supported by transparency).

Ethical use is more work (or at least not time-saving)

A general theme for algorithmic technologies in education is that there’s a tension between being transparent with students and making the product susceptible to “gaming.” That’s why these products are often not as useful as advertised when used in an ethical way, if you believe the ethical way to use them is to be transparent with students about how they work and avoid penalties for variations in use that do not conform to the expectations of the algorithm.

If you take the example of Perusall, an instructor might turn off “active reading” credit if they didn’t want to penalize students for printing out their readings. If they didn’t want to penalize students who are less popular, they might turn off the “upvote” points. One might very well evaluate all of the algorithmic grading options, determine that they are potentially discriminatory, and attempt to turn off the scoring feature altogether! The likely result of this will be either assigning no grades for annotations or manual scoring by the instructor. This will take time to set up in the back end, and more time to grade if you want to assign grades. That is ironic, as Perusall markets itself explicitly as a time-saver for instructors (it is all over their site).

Whenever it is pointed out that some feature has unintended negative consequences (like the holistic grading feature) or, worse, legal ones (like the “room scan” feature of some remote proctoring tools, which was recently ruled a violation of the Fourth Amendment right to privacy), the response of edtech companies is often some version of “well, just turn that feature off if you don’t like it or it is a liability.” If we believe their story, they provide optionality, and do not push surveillance or privacy invasion. I think this is a very weak argument, because the companies seem to see these features as key selling points. For example, with Perusall, the autograding feature is likely their main advantage over competitors, as I am not aware of a competitor with a similar feature. Their statement about keeping the settings secret from students also demonstrates that they believe this secretive tracking is actually a benefit and the ideal way to use the product. A takeaway from this analysis is that while there are certainly the “big bad” academic surveillance tools (remote proctoring and plagiarism checkers), surveillance is beginning to be normalized in other technologies that are superficially neutral or helpful, like social annotation. This is something to watch and protest.

Sarah E. Silverman