#cwcon: Beyond Afterthought

This is the presentation I’m giving at #cwcon this year (part of i2, “caption, performance, code: composing accessibility across multiple interfaces”).


Given the limited time available in the composition classroom, simply teaching the basics of technologies can feel overwhelming even before we consider the rhetorical production and accessibility of digital texts. It’s perhaps unsurprising, then, that accessibility is typically included as an additional task, if it is acknowledged at all.

When accessibility is addressed, it’s frequently framed within the context of disability and positioned as an accommodation to pre-existing technologies and practices. In this way, accessibility becomes a retrofit rather than a central practice. Jay Dolmage has defined a retrofit as the act of adding a component to an already-built space (20).[1] A classic example is adding a ramp to the back or side of a building, but retrofitting easily transfers beyond physical spaces. For example, if someone publishes a podcast and it’s pointed out to the creator that the podcast is inaccessible to deaf or hard of hearing users, the creator might produce a transcript. This is an example of a retrofit because the accessible component was added on. Stephanie Kerschbaum refers to this phenomenon as “multimodal inhospitality,” which “occurs when the design and production of multimodal texts and environments persistently ignore access except as a retrofit.”[2] We can also imagine this as an accommodation because the transcript now accommodates disabled audiences. Melanie Yergeau argues that retrofitting occurs when we’ve assumed normative bodies as our default audiences and designed our digital texts with those bodies in mind. The production of inaccessible digital texts reproduces ableist assumptions about who we include—and exclude—as our audiences.

Today I want to focus on how practices that are typically used to accommodate digital texts help us make rhetorical choices about how to represent content, voices, and sounds and think critically about how those choices affect real audiences. And though captioning is in the title of our panel, I’m going to focus on transcription, which I think tends to get positioned as the least sexy, most mundane accommodating practice.

There are a lot of benefits to incorporating audio into our digital texts. It allows composers to experiment with voice, sounds, music; to incorporate multiple voices and emphasize dialogue; and to convey certain moods and tones that written text can’t. As Heidi McKee notes in “Sound Matters,” although we’re immersed in sound and it’s a critical communicative mode, we tend to privilege visual design when teaching multimodal assignments (336).[3] If we exclude sound from these discussions, we also exclude critical discussions about the accessibility of sound. So while we should encourage students to think critically about visual design, we should also make sure that sound receives the same careful attention. Kerschbaum warns, “Many multimodal texts exclude disabled audiences because they are not commensurable across multiple modes, thus rendering the text inaccessible.” For example, a text that relies solely on audio—like a podcast—necessitates an accommodating component to be accessible. In most cases, this accommodation is a transcript. Transcripts can be more than an afterthought, though.

I’m going to focus on podcasts, although most of what I’ll discuss can be applied to any texts with audio components. Computers and writing scholars have contributed greatly to scholarship about the potentials of using podcasts in the classroom,[4] and writing center scholars have written about their pedagogical potentials.[5] Although podcasts have been of interest to scholars for almost a decade, there’s a dearth of scholarship on transcription.[6] But if we ask students to produce projects that involve audio, like podcasts, we must also emphasize the accessibility of producing these texts, which may involve transcription.

Podcasts are accessible in the sense that they’re readily available for people to download or stream but are not necessarily accessible in terms of users easily or equally accessing that content. In “Accessible Podcasting,” Sean Zdenek notes that access in podcast discourse is commonly equated with availability.[7] A podcast we can easily access is not necessarily accessible to disabled users who may benefit from transcripts, text descriptions, or captions.

A number of tools allow users, even those without much technical experience, to record and publish podcasts immediately. With this quick, do-it-yourself appeal, taking the time to write transcripts may seem unappealing. However, as Jennifer Bowie notes, requiring students to submit transcripts with their podcasts acknowledges and includes non-hearing audiences or low-vision or blind users who use accessibility software.[8] And in terms of taking too much time, she argues, “Usually students write out transcripts before they start recording the more formal assignments anyway, so it is often little extra work for them to submit these along with podcasts.” At the very least, students often already have something written that they can use to form a richer transcript.

As someone who’s not deaf or hard of hearing, I find podcasts super inaccessible. This may be ironic because I work on This Rhetorical Life, but I don’t listen to podcasts because I don’t process auditory information well and struggle to focus on audio. That’s why I’m grateful at conferences when presenters offer full-text copies or even handouts for their presentations. It’s also why I actively live-tweet at conferences to focus my attention. Making audio accessible, then, is important not only for deaf and hard of hearing audiences but also for people who don’t process or focus on audio well, for people in a time crunch for whom skimming a transcript is more useful, and for people with poor internet connections who may have an easier time downloading a PDF than streaming audio. When we imagine that disabled users are the only people who benefit from accessible practices, we conflate accessibility with accommodation.

And while we should pay attention to how transcripts benefit users, it’s also important to consider how transcription can benefit the composers.

When we position transcribing as part of composition—a process of making choices about what content is included and excluded—we shift its purpose from an accommodation to a rhetorical and creative act. Transcription allows us to critically examine the audio elements we value and those we tend to ignore. For example, fillers are not typically included in a bare-bones transcript intended only to indicate verbal data. The same is true for silence. As Cheryl Glenn has argued, not all silences have rhetorical meaning, but we too often dismiss silence as passive rather than as a “tactical strategy” (xi).[9] She notes that even though the delivery of silence is always the same, “the function of specific acts, states, phenomena of silence—that is, the interpretation by and effect upon other people—varies according to the social-rhetorical context in which it occurs” (9). The indication of silence as [pause] may look the same across transcripts but may have very different discursive meaning. In the podcast I’m going to discuss next, Elaine Richardson’s pauses are filled with many types of meaning. The silence of thinking is much different than the long pauses when reflecting on the violence of black and brown bodies or her memories of being a sex worker. If we transcribe only to communicate dialogue, though, these non-verbal “fillers” are silenced.

“A One-Woman Show with Elaine Richardson”

So I want to focus now on an activity that I used in a composition pedagogy graduate class to highlight some of the many rhetorical choices involved with transcription.

We began by playing the first 1.5 minutes of the Elaine Richardson podcast. When I asked what people had observed as important, they referenced many of the things I expected: the spoken dialogue, particular words that were emphasized, and who was speaking. Students also reported that they had used italics, caps, and asterisks to try to indicate shifts in tone. Perhaps more surprising—or less standard in a transcript—were suggestions about how to indicate temporal shifts of voices layered on top of each other, which gave us an opportunity to discuss the limitations of trying to preserve content across different media.

When asked what was left out of their observations, one student remarked that she was unsure whether or not to represent the Black Vernacular English of the speakers. I chose this clip because it’s playful and performative; it includes long pauses, snaps, and hands hitting the table; and the speakers dip in and out of BVE as they engage with each other. This created space to discuss audience (How would different audiences read this representation of voice?), the politics of how we represent voices that are not our own (Is this accurate or ethical representation?), and rhetorical considerations about style.

After discussing these observations, we practiced some actual transcription. Although I think this is a stressful and generally inaccessible in-class activity—unless students can type, use headphones, or listen on their own—I wanted to illustrate the choices and challenges that factor into transcribing even just 30 seconds of audio.

Once I played the clip multiple times, we discussed the similarities and differences across people’s transcripts. I was surprised to hear that no one felt that they had accurately transcribed the audio. I knew that this would be a difficult clip, but I had assumed—maybe because they were graduate students, some of whom transcribe for This Rhetorical Life—that it would be easier for them. However, there were a number of challenges trying to represent particular sounds through text: snaps, claps, “filler” sounds, the emphasis of particular words, screams, time lapses, singing and background noise, tonal shifts, and even editing glitches.


This clip (0:35-1:10) is complex despite its brevity. This is an excerpt from the transcript for This Rhetorical Life (TRL), which isn’t perfect but illustrates one method for textually representing this audio:

SD: We had a chance to sit and chat about the George Zimmerman trial, the memoir itself, African American rhetoric, and even RuPaul.

SD: [Chatter] What was I going to say? I guess we can get, get st—

ER: [Chatter] We gon make it happen.

SD: [Chatter] Yeah, we gon make it happen. It’ll be all right.

ER: [Singing] Make it happen.

SD: [Chatter] So is it recording?

ER: [Singing] [claps] I know life can be so tough [snaps] and you feel like [snaps] givin up. [Directed to Seth] You remember that? [Singing] But you must be strong. [claps] Baby, [snaps] just hold on.

SD: Okay, here’s my first question. This is easy. From PHD to PhD—

ER: Oh! So we goin? We rollin?

SD: We been goin!

ER: [Screams] Oh noooo!

SD: We been goin.

ER: Oh no! This is the real beginning. [laughs]


This is an example of horizontal transcription, and we don’t have rules for how we indicate non-verbal data such as laughter or interruptions. Horizontal transcription is what we likely expect from a transcript: dialogue written like a playscript where each person speaks one at a time and, generally, only verbal content is recorded. This method can be limiting, though, because it “communicate[s] what was said, but not when or how or with what intent” (Gilewicz and Thonus 27).[10] Vertical transcription has rules for signifying temporal shifts, silences, and gestural and non-verbal data, which Gilewicz and Thonus argue is more precise for capturing the dynamic nature of a speech act:


((background music))

S: We had a chance to sit and chat to talk about the George Zimmerman trial, the memoir itself, African American rhetoric, and even RuPaul. ((to self)) what was I going to say? I guess we could get, get st…

E: We gon make it happen.

S: Yeah we gon make it happen, it’ll be aight.

E: ((sings)) Make it happen

S:                         [((to C)) so is it recording?

C:                         [((to S)) yeah

E:                                           > > I know life can be so tough ^ ^ and you feel like ^ ^ givin up. ((to S)) You remember that? ((sings)) But you must ^ ^ be strong, > > baby just ^ ^ hold on.

S: Okay here’s my first question. This is easy. From PHD >> to >> PhD

E: Oh! So we goin? We rollin?

S:                                [We been goin!

E: ((screams)) Oh no!

S:                         [We been goin! This is the, this is the…

E:                                                 [Oh no!

C:                                                  [((laughs))

E: This is the real beginning. ((laughs))

C:                                                [((laughs))


Having people transcribe even just 30 seconds of audio made clearer the number of decisions that can factor into transcription, and it raised questions about how we represent time, non-verbal utterances, and voice. For example, we discussed the different ways to represent rollin/rollin’/rolling—each subtly creating a different representation. Rollin’ creates the illusion that something is missing and, thus, isn’t the proper word, whereas rolling is the “standard” representation of the word but is not actually the word Elaine Richardson uses. Even this brief activity allowed us to talk about the different factors that can go into a task as seemingly basic as transcribing. This isn’t to say that this much effort must go into transcribing, nor should it be an activity that shames people for not considering these elements before. Rather, it’s an opportunity to highlight the functional, creative, and rhetorical choices that inform what is traditionally positioned as an accommodating process.

Concluding Thoughts

When we assign multimodal projects, we need to stress accessibility both in the assignment guidelines and in the evaluation criteria so that it’s clearly an integral part of the process. More than that, though, we need to emphasize accessibility along the way, offering more than a single “day of access” where we talk about the ethics of producing accessible texts. I try to model what I want from students: offering print handouts, posting class notes, captioning any videos that I use or create as examples. By folding accessible practices into our curricula, we emphasize the importance of accessibility as an ethical practice but also as a rhetorical practice. Accessibility can’t be ignored if we want to engage with responsible and ethical digital composing practices as scholars and teachers.



