The two-sides rule in teaching
listening and pronunciation

by Richard Cauldwell

1. Pronunciation: the misrepresentation of speech

I have long believed in the rule that listening and pronunciation work are two sides of the same coin – the coin being ‘the spoken language’. This is the two-sides rule, and it has an important implication. Every time you do a pronunciation activity, you are teaching students something about the spoken language. Therefore, like it or not, you are teaching students something of how to listen. So if pronunciation activities consist primarily of the accurate articulation of the segments of isolated words, then students are learning – like it or not – ‘facts’ about the spoken language. These ‘facts’ are, of course, not facts at all – they are misrepresentations. Unfortunately, such misrepresentations are common: vocabulary lists and dictionaries give information about the pronunciation of the citation forms of words, and promote a view of the spoken language as a sequence of citation forms – words bounded by pauses, stressed, with falling tones. Clearly, this is a misrepresentation of the spoken language as experienced by the listener. It is also (because of the two-sides rule) a misrepresentation of speech-production – in pursuit of segmental accuracy (a worthy aim in itself), students practise disfluent speech.

The listener’s experience of normal speech (however one might choose to characterise ‘normal’) is of a stream: words flow into each other in patterns (tone-units, tone-groups, breath-groups, pause-groups) in which some words retain a resemblance to the citation form and others are pulled out of shape.

2. Ying’s dilemma and the sound-shape of words

Pronunciation work, therefore, insofar as it focuses on citation forms, promotes a view of speech which is an obstacle to effective listening. Students learn ideal sound-shapes for words (citation forms), but do not learn the wide range of sound-shapes that a word can have when ‘streamed’ in normal speech. Even within the speech of one speaker, words take on different sound-shapes according to their position in the variety of patterns known as tone-units. These sound-shapes vary according to speed, volume, whether the words are prominent or not, and whether or not they are the location for tones. We don’t teach this variability in our work on speech. We should – because students are asking for it, as the words of Ying’s diary indicate:

‘I believe I need to learn what the word sounds like when it is used in the sentence. Because sometimes when a familiar word is used in a sentence, I couldn't catch it. Maybe it changes somewhere when it is used in a sentence’ (Goh 1997, p. 366)

For Ying, words that are ‘known’ become ‘unknown’ in streamed speech. This is ‘Ying’s dilemma’. Our typical response to Ying’s dilemma has been to throw our hands in the air and say that the only way around this problem is immersion or osmosis: immersion by living in countries where the language is spoken (Rost, 1990) or tested osmosis, unmediated by pedagogic intervention, in the form of extensive listening.

We throw our hands in the air because we believe that the variability is so great, so unpatterned, that it is impossible to generalise, and therefore impossible to teach. I disagree. There are patterns in the stream of speech, and we can use these patterns as a mode of presentation of speech for both listening and pronunciation materials. If we do, then it is possible to teach pronunciation in such a way that it represents normal speech more accurately, and practises accuracy and fluency simultaneously. Pronunciation teaching of this kind promotes a picture of speech which aids, rather than obstructs, the acquisition of listening skills.

3. Streaming Speech

Streaming Speech: Listening and Pronunciation for Advanced Learners of English is an electronic publication which aims to solve the problem of the misrepresentation of speech. It does this by using normal spontaneous speech as the model to be imitated and emulated; and it rejects the ‘immersion’ and ‘extensive listening’ solutions to the problems of teaching listening, by paying close attention to the fastest stretches of speech, where Ying’s dilemma is likely to be at its most acute. It makes use of the patterns of speech identified in the work of David Brazil (Brazil, 1994, 1997) – he calls them tone units, I call them speech units – to present samples of spontaneous speech for both the teaching of listening and pronunciation. A key feature of the publication is the use of web-based multi-media technology, which allows students to click on a line of transcript, and to hear it as it was originally spoken.

3.1 The speakers

Streaming Speech is a ten-chapter electronic publication which features recordings of eight speakers of English from the United Kingdom (seven speakers) and Ireland (one). All the recordings are of unscripted, spontaneous speech, and contain many features that would be edited out of the written language: pauses to give planning time, re-starts, self-corrections, repetitions. Such features provide a discourse syllabus for both listening and pronunciation which I shall return to in 3.3 below.

All of the speakers are associated in some way with the University of Birmingham in the UK: they are either lecturers (full-time or part-time), or senior figures in the administration. Their recordings are largely (but not wholly) monologic – and this makes them suitable for the identification of samples for modelling pronunciation. All but one of them consist of biographical talk: the exception is one person giving a lecture on early English grammar. There is one accent from Dublin in the Republic of Ireland (Chapter 8) but the other accents are close to Standard Educated British English – again this makes the recordings suitable for use for modelling pronunciation. There are, however, some slight differences: two speakers have certain characteristics of a London accent, and two others have characteristics of the British West Midlands.

3.2 Two layers and a filling

The first eight chapters have a common structure, which has its origin in the two-sides rule – but is best thought of as a structure akin to a cake with two layers, with a filling in between the two layers. The top layer consists of two sections devoted to listening; the bottom layer consists of two sections devoted to pronunciation. But first, a description of the filling.

3.3 The filling – ‘Discourse Features’

The ‘Discourse Features’ section is devoted to describing the patterns of normal speech. Over the eight chapters, students are introduced to a way of viewing speech, based on the work of David Brazil, which provides ‘a window on speech’. This window is a means of observing and capturing the variability of the stream of speech, and of taming the variability so that it can be learned from – it is a syllabus for listening and pronunciation derived from an analysis of normal speech.

It provides a carefully staged introduction to the patterns of speech, and the effects these patterns have on the sound-shapes of the words they contain. In other words, this section provides the tools which can lead to solving Ying’s dilemma. It does so by paying very careful attention to the relationship between the fast and the slow forms of words (see Figure 1) – and by providing a step-by-step introduction to the variability of normal speech: varying speed, varying rhythms, level and falling tones, rising tones, speech-units, stress-shift, and high and low pitch.

Figure 1 Close attention to fast and slow forms of words

3.4 The Listening layer

The ‘Discourse Features’ section is preceded by the listening layer, which consists of two sections: Listening comprehension, and Focus. The Listening comprehension section provides information about the context and topic matter of the recording, and sets questions for the students to answer on screen as they listen. These questions are normally targeted at the fastest meaning-bearing stretches of speech in the recording, or at those stretches whose patterning (rhythmic, intonational, interactional) makes them useful focuses for understanding the spoken language. Having answered the questions, students are invited to recall and write down those parts of the recording that provided evidence for the answers they have chosen. They then move to the Focus section, where they can see a transcript of that part of the recording that contains evidence for the answers. They can click on any line of the transcripts, and hear it as it was originally spoken (see Figure 2).

Figure 2 Transcripts in the Focus section

Note: each line represents a speech unit; the numbers down the left-hand side are reference numbers to the complete transcript; the numbers down the right-hand side represent the speed of speech units in words per minute.
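
As an illustration of the speed figures mentioned in the note, the sketch below shows one way a words-per-minute value for a single speech unit could be worked out. The article does not say how the speeds were actually measured, so the method here (counting space-separated words and dividing by the unit's duration) is an assumption, and the function name, the sample words and the 1.5-second duration are all hypothetical.

```python
def words_per_minute(speech_unit: str, duration_seconds: float) -> float:
    """Return the speed of one speech unit in words per minute.

    Assumption: a 'word' is any space-separated token in the transcript
    line, and the duration is the measured length of the unit in seconds.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(speech_unit.split())
    # Words per second, scaled up to a minute.
    return word_count / duration_seconds * 60.0


# Hypothetical example: a six-word speech unit spoken in 1.5 seconds.
speed = words_per_minute("this is an example speech unit", 1.5)
print(speed)  # 240.0
```

On this reckoning, a short unit spoken quickly can register a very high speed even though it lasts only a second or two, which is why per-unit figures can differ sharply from the average speed of a whole recording.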

These transcripts (two or three depending on the number of comprehension questions set) are set out in ‘speech-units’ which closely resemble the tone-units identified by David Brazil. It is the patterning of these speech units, rather than of sentences (Ying, mistakenly, talked of ‘somewhere in the sentence’) that most affects the sound-shapes of words.

The questions, and therefore the transcripts, have been targeted at those parts of the recording that are scrutinised in section 3, the Discourse Features section.

