Noun+noun Collocations in Learner Writing...
Journal of English for Academic Purposes 20 (2015) 103–113
Noun–noun collocations in learner writing
Jean Parkinson
School of Linguistics and Applied Language Studies, Victoria University of Wellington, New Zealand
Article info
Article history: Received 7 November 2014; Received in revised form 26 July 2015; Accepted 6 August 2015; Available online xxx
Keywords: Corpus analysis; Collocations; Noun–noun phrases; Influence of L1; EFL compared to ESL

Abstract
Studies of collocations to date have emphasised use and learning of noun–verb and adjective–noun collocations. This study uses three sub-corpora of the ICLE corpus to investigate use of noun–noun collocations by learners in their academic writing. The literature to date has focused on contexts where English is being learnt as a foreign language rather than as a second language. The study therefore compares the influence of ESL and EFL learning contexts on learner use of noun–noun collocations. Findings are that accuracy of noun–noun phrases is significantly greater in the writing of ESL learners. A second question considered is what influence the presence or absence of noun–noun phrases in the first language (L1) has on learner use of these phrases in English. For this purpose, production of noun–noun phrases in written English by L1 Mandarin writers (a language that permits noun–noun phrases) is compared to writing by L1 Spanish writers (a language that does not allow noun–noun phrases). Findings are that learners whose L1 permits noun–noun phrases produce significantly more of them in English than learners whose L1 does not. Problems that learners had in forming noun–noun phrases are discussed qualitatively, and implications for EAP teaching are suggested. © 2015 Elsevier Ltd. All rights reserved.
1. Introduction

This study examines use of noun–noun phrases in academic writing by learners of English. Noun phrases are of interest to English for academic purposes teachers, because of the tendency in academic writing to increase conciseness by packing information into noun phrases. Condensing a clause into a noun–noun phrase is one way that such conciseness is achieved. Examples of noun phrases in which nouns pre-modify the head noun include air pollution and electricity shortage.

Previous findings concerning noun modifiers in learner writing (Parkinson & Musgrave, 2014) indicated differences between less and more proficient student writers. Certain noun modifiers, including nouns, prepositional phrases, and appositive noun phrases, were used significantly more frequently by more proficient student writers. Less proficient writers relied more heavily on adjectives to modify nouns in their writing.

The focus in the present study is to investigate the use of noun–noun phrases by non-native writers more closely, and to consider what possible factors besides proficiency may influence their use of noun–noun phrases. To do this, I use three sub-corpora of the International Corpus of Learner English (ICLE) (Granger, Dagneaux, Meunier, & Paquot, 2009) to compare three groups of similar proficiency. In doing so I seek to consider the relevance firstly of whether English is being learnt as a foreign or second language, and secondly the influence of the nature of the noun phrase in the learners' L1.
http://dx.doi.org/10.1016/j.jeap.2015.08.003
1475-1585/© 2015 Elsevier Ltd. All rights reserved.
2. Literature review

I begin this section by reviewing studies of collocation, before looking at restrictions on formation of noun–noun phrases in English.

2.1. Collocations

Collocations have been considered from a phraseological perspective and from a frequency-based perspective. From a phraseological perspective, word combinations are considered along a continuum from fixed idioms through words that combine with a restricted set of other words (e.g. television set/programme/viewers) to words that appear to combine freely with a range of other words. Nesselhauf's (2003) phraseology-based study of verb–noun combinations in the written English of the German subcorpus of ICLE found that learners made considerably more mistakes with combinations without word for word correspondence in the German and English combinations. This is of relevance to my study in which one of the three data sets concerns a language (Mandarin) which allows noun–noun phrases. The L1 in such a case may influence use in the L2. Nesselhauf (2005) for example found that around 50% of inappropriate verb–noun collocations could be traced to the learners' L1, and Laufer and Waldman (2011), studying verb–noun collocations, also found L1 influence in about 50% of atypical verb–noun collocations.

Laufer and Waldman (2011) found that learners used fewer different collocations than native speakers. Advanced learners used more collocations than less proficient learners but, in both more advanced and less proficient learners, around one third of collocations were atypical of NS usage. However, Thewissen (2013) reported that advanced learners produced a larger number of near-hits compared to intermediate learners. This supports a suggestion by Schmitt and Carter (2004, p. 5) that learning of these chunks is not ‘all or nothing’, but that they can initially be learnt incompletely.
This notion of incomplete, gradual learning is also found in a review by Boers and Lindstromberg (2012), which suggests that uptake of collocations as a result of meaning-focused input alone is incremental, requiring many encounters with the same phrase. From the opposite perspective, Wray (2008) suggests that formulaic chunks may initially be learnt whole and only analysed into their component parts later if necessary. Evidence of the learning of formulaic language holistically is suggested by Boers and Lindstromberg (2009), who found that L2 learners doing dictation exercises hear and write unfamiliar lexical phrases as single non-words. Although my study is not about learning collocations, I review these studies here because they shed light on certain erroneous collocations in my data sets.

In contrast with a phraseological definition of collocation, a frequency-based definition identifies collocations as combinations in which two words are more likely to co-occur than would be expected based on the statistical frequency of each word. In a study of adjective–noun collocations in writing by both native and non-native speakers (NNS), Siyanova and Schmitt (2008) found in NNS writing a mixture of collocations that are frequent (>5 times) in the BNC, those infrequent in the BNC ([…] 3 is suggested by Hunston (2002) as a significant collocation threshold.
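To make the frequency-based definition concrete, the following is a minimal sketch of a mutual information (MI) score in its textbook form: the base-2 logarithm of observed co-occurrence divided by the co-occurrence expected if the two words were independent. The function name and the counts are hypothetical and are not taken from the BNC or COCA. Corpus interfaces may additionally adjust for the size of the search window, and that adjustment is omitted here.

```python
import math


def mutual_information(pair_freq, freq_w1, freq_w2, corpus_size):
    """Pointwise MI (base 2): log2(observed / expected co-occurrence).

    Textbook formulation only; this is an assumption about the general
    technique, not a reproduction of any particular corpus tool.
    """
    if min(pair_freq, freq_w1, freq_w2, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Co-occurrences expected if the two words were independent.
    expected = freq_w1 * freq_w2 / corpus_size
    return math.log2(pair_freq / expected)


# Hypothetical counts, for illustration only.
mi = mutual_information(pair_freq=50, freq_w1=2_000, freq_w2=5_000,
                        corpus_size=100_000_000)
is_collocation = mi >= 3  # threshold of 3 attributed to Hunston (2002)
print(round(mi, 2), is_collocation)
```

A pair that co-occurs exactly as often as chance predicts scores 0, so a threshold of 3 corresponds to co-occurring eight times more often than expected.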
As I show below, the data indicate that the L2 writers in the study had a number of problems in producing noun–noun phrases. To assist in describing and in explaining these problems, I briefly consider an aspect of English noun–noun phrases: the case of plural pre-modifying nouns.

2.2. Plural nouns as pre-modifiers in noun–noun phrases in English

One problem experienced by writers in this study is that of using a plural pre-modifying noun when a singular noun is appropriate. Most pre-modifying nouns in English are singular (Biber, Johansson, Leech, Conrad, & Finegan, 1999), making it difficult for learners to know when a plural pre-modifier is appropriate. In deciding this, it is possible that a writer's L1 may influence them. Languages differ as to how plural meaning is marked. Some languages mark all nouns for number using grammatical morphemes. In many such languages, including Spanish, noun modifiers agree with the head noun for number and gender (e.g. in Spanish el chico alto ‘the tall boy’; la chica alta ‘the tall girl’; las chicas altas ‘the tall girls’). Similarly, in Tswana, adjectives must show concord with nouns both in class and number (e.g. monna yo moleele ‘the man is tall’ or ‘the tall man’ but banna ba baleele ‘the men are tall’ or ‘the tall men’).

When learners whose L1 shows number agreement between noun modifiers and head noun produce phrases such as *bombs blasts, we might hypothesise that they have been influenced by their L1 to make the pre-modifier “agree” in number with the head noun. However, languages such as Mandarin have no nominal endings for singular and plural nouns and plurality is not indicated with inflection, so for L1 speakers of these languages, explaining phrases like *bombs blasts as an attempt to make the noun modifier and head noun agree in number is less convincing. Based on this difference, we might hypothesise that writers in whose L1 noun modifiers must agree with the nouns they modify (e.g. Tswana and Spanish) might try to make pre-modifying nouns ‘agree’ in number with the noun they modify.

My research design allows investigation of this possibility. A corpus by L1 speakers of Mandarin (in which there is no number agreement between nouns and their modifiers) is compared to a corpus of writing by L1 speakers of Spanish and another by L1 Tswana writers. In both of these languages, noun modifiers must agree in number with the head noun. The comparison of corpora produced by writers from Spain (an EFL context) and L1 Tswana writers from South Africa (an ESL context) is designed to indicate the influence, if any, of an EFL compared to an ESL learning context.

3. Methods

Argument essays from the ICLE corpus written by L1 speakers of Mandarin, Spanish and Tswana form the three sub-corpora for this study. This corpus consists entirely of argumentative essays in response to essay titles suggested by the Centre for English Corpus Linguistics at the Université catholique de Louvain (Paquot, 2012a). The proficiency level of writing in these corpora is reported as being higher intermediate to advanced level (Granger et al., 2009, p. 11). The Mandarin and Spanish writers acquired English as a foreign language, while the Tswana writers acquired it as a second language.

It should be mentioned that the ESL context of the Tswana writers is one that is typical in contexts such as South Africa, India, Nigeria and other ex-colonies of Britain, in which the influence of the colonial language is still very substantial. The colonial language is used in education, commerce and administration. In such contexts English arguably plays a similar role to Latin in medieval Europe. Discussing different understandings of what the term ESL means, Nayar (1997) distinguishes the above ESL context from the ESL contexts which may be more familiar to some readers, in which migrants to an English-speaking country (e.g. Australia) learn English.
In South Africa, English, the L1 of only 10% of the population (Census in brief, 2012), is the language of government, business, and most significantly, of most schools. As van Rooy (2009) notes, although the Tswana writers of this sub-corpus used English as a medium of instruction at school and university, they live in a part of South Africa where exposure to L1 speakers of English is minimal, and use of English outside of the classroom or official contexts may be limited. This is typical of Nayar's (1997) first ESL context.

To attain sub-corpora of similar sizes, the age of the writers of the Spanish sub-corpus was selected as 21 or less, while the age of the writers of the Tswana sub-corpus was selected as 22 or less. The first 400 words of one hundred essays were included in each sub-corpus, making each of the three sub-corpora 40 000 words in length. All texts were made the same length because part of what I wish to measure is lexical diversity, and this is sensitive to the length of the text; the longer the text, the more likely the writer is to repeat a particular word or phrase. All three sub-corpora drew on writing on a limited range of topics. Table 1 shows a mean per-text count of noun–noun combinations in each sub-corpus.
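The length control described above can be sketched as follows. This is an illustrative assumption about how the trimming might be done, not the procedure actually used: words are approximated by whitespace-delimited tokens, every essay is assumed to contain at least 400 words, the function names are hypothetical, and the stand-in essays are invented.

```python
def first_n_words(text, n=400):
    """Return the first n whitespace-delimited words of a text."""
    return " ".join(text.split()[:n])


def build_subcorpus(essays, n_essays=100, n_words=400):
    """Trim each of the first n_essays essays to its first n_words words.

    Taking the *first* n_essays is an assumption made for this sketch;
    the source selected essays by writer age.
    """
    return [first_n_words(essay, n_words) for essay in essays[:n_essays]]


# Invented stand-in essays of 650 words each, for illustration only.
essays = ["word " * 650 for _ in range(120)]
subcorpus = build_subcorpus(essays)
total_words = sum(len(text.split()) for text in subcorpus)
print(len(subcorpus), total_words)
```

Fixing every text at the same length keeps per-text counts of unique combinations comparable, since longer texts give more opportunity both to introduce and to repeat a phrase.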
Table 1
Mean noun–noun combinations per text in the sub-corpora.

| Mean per 400 word text | Mandarin sub-corpus, mean (SD) | Spanish sub-corpus, mean (SD) | Tswana sub-corpus, mean (SD) | t-Test M/Sp | t-Test M/T | t-Test Sp/T |
|---|---|---|---|---|---|---|
| Mean total noun–noun combinations | 6.43 (4.52) | 2.36 (2.05) | 4.52 (3.72) | p < 0.0001 | p = 0.0013 | p < 0.0001 |
| Mean unique noun–noun combinations | 5.10 (3.52) | 2.13 (1.76) | 3.64 (2.71) | p < 0.0001 | p = 0.0012 | p < 0.0001 |
| Mean unique appropriate combinations | 4.06 (3.03) | 1.57 (1.52) | 3.03 (2.45) | p < 0.0001 | p = 0.0089 | p < 0.0001 |
| Mean unique inappropriate combinations | 1.05 (1.22) | 0.56 (0.78) | 0.61 (0.71) | p = 0.0009 | p = 0.0021 | p = 0.6364 ns |
The three sub-corpora were tagged with the CLAWS5 tagset using the automatic tagging service provided by the University Centre for Computer Corpus Research on Language at the University of Lancaster (CLAWS part-of-speech tagger for English, n.d.). Instances of noun–noun phrases were identified using WordSmith 5. Proper nouns such as Animal Farm (the novel) and Richards Bay Minerals were omitted from the count.

Following Durrant and Schmitt (2009), each noun–noun combination in the three sub-corpora was compared to its use in large native-speaker corpora: the BNC and the COCA. I elected to use both corpora, as it is not clear that either British or American English has a predominant influence in the three contexts. A mutual information (MI) score of 3 or more (as calculated by the Brigham Young University Corpora webpage; Davies, n.d.) was used as a significant collocation threshold (Hunston, 2002), and all noun–noun combinations with an MI score less than three are referred to as noun–noun phrases. The noun–noun combinations were thus divided into five categories:

1. Frequent collocations. These were noun–noun phrases that have an MI score greater than 3 and are also frequent in English. For the purposes of this study ‘frequent in English’ was taken as being attested more than 5 times in the BNC or more than 25 times in the COCA.
2. Infrequent collocations. These were noun–noun phrases that have an MI score greater than 3, but are relatively less frequent (1–5 times in the BNC or 1–25 times in the COCA).
3. Noun–noun phrases found in the BNC or COCA, with an MI < 3. As a high MI score reflects a pair of words ‘for which the frequency of occurrence is a high proportion of the overall frequency of either of the pair’ (Collins wordbanks online, 2008), noun–noun pairs with MI < 3 may be pairs in which the individual words collocate with a range of other words. In this category I included all noun–noun phrases with MI < 3, no matter how frequent they were.
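The three corpus-based categories above amount to a small decision rule, sketched below. The function name is hypothetical. The handling of an MI score of exactly 3 is an assumption: the text gives the threshold as "3 or more" but describes the categories as "greater than 3" and "< 3", and this sketch treats a score of exactly 3 as meeting the threshold.

```python
def classify_attested(mi, bnc_freq, coca_freq):
    """Assign an attested noun-noun combination to category 1, 2 or 3.

    1 = frequent collocation, 2 = infrequent collocation,
    3 = attested phrase with MI below the threshold.
    Treating MI == 3 as a collocation is an assumption of this sketch.
    """
    if bnc_freq < 1 and coca_freq < 1:
        # Unattested combinations are judged by raters instead.
        raise ValueError("combination is not attested in either corpus")
    if mi < 3:
        return 3  # category 3, however frequent the phrase is
    if bnc_freq > 5 or coca_freq > 25:
        return 1  # frequent in at least one reference corpus
    return 2      # attested 1-5 times (BNC) or 1-25 times (COCA)


# Hypothetical scores and counts, for illustration only.
examples = {
    "frequent": classify_attested(mi=5.2, bnc_freq=12, coca_freq=3),
    "infrequent": classify_attested(mi=4.0, bnc_freq=2, coca_freq=10),
    "low_mi": classify_attested(mi=1.5, bnc_freq=300, coca_freq=900),
}
print(examples)
```

Checking the MI score first mirrors the statement that category 3 includes all phrases with MI < 3 regardless of frequency.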
As Siyanova and Schmitt (2008) note, however, merely because a lexical phrase is not attested in a large corpus, this does not mean that it is unacceptable. Transient formations may also be acceptable. Therefore, all noun–noun combinations not confirmed by the BNC or COCA were examined in context by eleven raters, and rated as either appropriate or inappropriate. A further two categories are:

4. Noun–noun phrases not attested in the BNC or COCA, but which were judged acceptable by 8 out of 11 raters who were L1 English speakers. Examples of lexical phrases rated as acceptable but which were unattested in the BNC or COCA are cartoon language, tourism student, and hippy ideology.
5. Noun–noun phrases not attested in the BNC or COCA and which were judged unacceptable by four or more raters out of 11.

The eleven raters (of whom the author was one) included speakers of New Zealand English (7), British English (2), Canadian English (1) and South African English (1). Raters were all either teachers of writing, or Linguistics students. Using Randolph's (2008) online kappa calculator, inter-rater reliability was calculated at kappa = 0.694, just short of the 0.7 that Randolph (2008) views as adequate agreement.

Because noun–noun combinations are used in Mandarin, translation from the L1 may influence use of noun–noun combinations in English. To quantify this influence, a Mandarin rater judged whether or not the inappropriate combinations used by the Mandarin writers are translations or at least influenced by existing Mandarin noun–noun combinations.

Log-likelihood values (LL) were calculated (using Rayson's (n.d.) online calculator) in order to test for significance of differences between the three sub-corpora (see Tables 2–5). Log-likelihood was chosen over a chi-squared test following Rayson and Garside (2000, p. 2), who report that the chi-squared value ‘becomes unreliable when the expected frequency is less than 5’, which is the case with some of the comparisons I make.

4. Quantitative results

This section presents a quantitative comparison between the three sub-corpora. Table 1 indicates variation between and within sub-corpora in the number of noun–noun combinations per 400 word text. Table 2 shows the variation in frequency of noun–noun combinations in the three sub-corpora. Table 3 considers the different (unique) noun–noun combinations in each of the three sub-corpora (lexical diversity); it categorises variation in frequency in the five categories of noun–noun phrase outlined in the previous section. Again considering the different noun–noun combinations (lexical diversity) in each of the three sub-corpora, Table 4 quantifies grammatically and lexically inappropriate noun–noun combinations in each sub-corpus. Table 5 considers a particular type of grammatically inappropriate noun–noun phrase: those that show problems of inappropriate “agreement” between noun pre-modifier and head noun.

Table 1 indicates mean number of total noun–noun combinations per 400 word text. The standard deviation (SD) indicates fairly wide differences between texts in each corpus. Some writers produced no noun–noun combinations, while a minority produced as many as 19. Table 1 also shows that the writing by L1 Mandarin writers was significantly the most lexically diverse with regard to noun–noun combinations; on average they used 5.1 unique noun–noun combinations per 400 word text compared to 3.6 by the L1 Tswana writers, who in turn used significantly more than the Spanish writers (2.1). This trend of Mandarin > Tswana > Spanish was in evidence also for use of unique appropriate noun–noun combinations. However,
Table 2
Noun–noun combinations in three sub-corpora.

| | Mandarin corpus (raw) | Spanish corpus (raw) | Tswana corpus (raw) | LL M/Sp | LL M/T | LL Sp/T |
|---|---|---|---|---|---|---|
| Words in sub-corpus | 40 000 | 40 000 | 40 000 | | | |
| Total noun–noun combinations | 633 | 233 | 445 | 191.96, p < 0.0001 | 32.95, p < 0.0001 | 67.41, p < 0.0001 |
| Lexical diversity: total unique noun–noun combinations | 397 | 178 | 289 | 85.55, p < 0.0001 | 17.07, p < 0.0001 | 26.64, p < 0.0001 |
| Unique appropriate combinations | 298 | 126 | 232 | 71.83, p < 0.0001 | 8.24, p < 0.01 | 31.86, p < 0.0001 |
| Unique inappropriate combinations | 99 | 52 | 57 | 14.83, p < 0.001 | 11.45, p < 0.001 | 0.23 ns |
Table 3
Appropriacy of unique noun–noun phrases in three corpora (lexical diversity).

| | Mandarin raw | Mandarin % | Spanish raw | Spanish % | Tswana raw | Tswana % | LL M/Sp | LL M/T | LL Sp/T |
|---|---|---|---|---|---|---|---|---|---|
| Total unique noun–noun combinations | 397 | | 178 | | 289 | | | | |
| 1. Unique appropriate combinations, of which: | 298 | 75 | 126 | 71 | 232 | 80 | 0.31 ns | 1.31 ns | 1.55 ns |
| 1a. Combinations absent from BNC/COCA but judged appropriate by L1 raters | 11 | 3 | 1 | 1 | 20 | 7 | 3.61 ns | 6.29, p < 0.05 | 13.08, p < 0.001 |
| 1b. Combinations attested in BNC or COCA, MI < 3 | 68 | 17 | 31 | 17 | 38 | 13 | 0.01 ns | 1.74 ns | 1.33 ns |
| 1c. Infrequent collocations: 1–5 times in BNC or 1–25 times in COCA; MI > 3 | 27 | 7 | 12 | 7 | 36 | 12 | 0.00 ns | 5.73, p < 0.05 | 3.72 ns |
| 1d. More frequent collocations: >5 times in BNC or >25 times in COCA; MI > 3 | 192 | 48 | 82 | 46 | 138 | 48 | 0.14 ns | 0.01 ns | 0.07 ns |
| 2. Unique inappropriate noun–noun combinations | 99 | 25 | 52 | 29 | 57 | 20 | 0.84 ns | 2.03 ns | 4.14, p < 0.05 |
Table 4
The nature of inappropriate noun–noun combinations in the three sub-corpora.

| | Mandarin | Spanish | Tswana | LL M/Sp | LL M/T | LL Sp/T | % Inter-rater agreement |
|---|---|---|---|---|---|---|---|
| Total unique noun–noun combinations | 397 | 178 | 289 | | | | |
| Unique inappropriate N–N combinations, of which: | 99 | 52 | 57 | 0.84 ns | 2.03 ns | 4.14, p < 0.05 | |
| 1. Lexically inappropriate phrases | 38 | 23 | 27 | 1.25 ns | 0.01 ns | 1.29 ns | 90% |
| 2. Grammatically problematic phrases: | 61 | 29 | 30 | 0.07 ns | 3.22 ns | 2.96 ns | |
| 2a. Adj-N would be more appropriate | 32 | 10 | 10 | 1.05 ns | 6.19, p < 0.05 | 1.16 ns | 81% |
| 2b. Possessive noun–noun phrase would be more appropriate | 15 | 5 | 10 | 0.35 ns | 0.05 ns | 0.15 ns | 85% |
| 2c. N-PP would be more appropriate | 5 | 7 | 5 | 3.82 ns | 0.25 ns | 2.00 ns | 89% |
| 2d. Inappropriately singular or plural pre-modifier or head noun (see Table 5) | 9 | 7 | 5 | 1.15 ns | 0.24 ns | 2.00 ns | 87% |
although the mean inappropriate combinations per 400 word text by the Mandarin writers was significantly greater than that of both the other groups, the mean use of inappropriate combinations by the Spanish and Tswana writers was not significantly different from each other. In sum then, the Mandarin writers produced a significantly greater mean number of appropriate and inappropriate combinations than the other two groups. The Tswana writers in turn produced more appropriate combinations per text than the Spanish writers, but did not produce more inappropriate combinations than did the Spanish writers.
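The per-text comparisons in Table 1 can be approximated from the reported means and standard deviations alone, as in the sketch below. The source does not state which t-test variant was used, so this is an assumption: the statistic is computed in the unequal-variance form, which coincides with the pooled form when both groups have 100 texts. The two-tailed p-value uses a normal approximation to the t distribution, so it will differ slightly from an exact value. Function names are hypothetical.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent groups, from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def approx_two_tailed_p(t):
    """Two-tailed p-value using the normal approximation (large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Table 1, mean total combinations: Mandarin vs Spanish (100 texts each).
t_total = t_from_summary(6.43, 4.52, 100, 2.36, 2.05, 100)

# Table 1, mean unique inappropriate combinations: Spanish vs Tswana.
t_inappropriate = t_from_summary(0.56, 0.78, 100, 0.61, 0.71, 100)
p_inappropriate = approx_two_tailed_p(t_inappropriate)

print(round(t_total, 2), round(t_inappropriate, 2), round(p_inappropriate, 2))
```

Under these assumptions the first comparison gives a large t statistic, in line with the reported p < 0.0001, and the second gives a p-value well above 0.05, in line with the non-significant Spanish/Tswana difference.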
Table 5
Number agreement between inappropriately plural or singular pre-modifying nouns and head nouns.

| | Mandarin corpus | Spanish corpus | Tswana corpus | LL M/Sp | LL M/T | LL Sp/T |
|---|---|---|---|---|---|---|
| Total unique noun–noun combinations in three 40 000 word corpora, of which: | 397 | 178 | 289 | | | |
| Plural rather than singular pre-modifying noun used | 2 | 5 | 4 | 4.83, p < 0.05 | 1.47 ns | 1.12 ns |
| Singular rather than plural pre-modifying noun used | 2 | 0 | 0 | 1.48 ns | 2.19 ns | 0.00 |
| Singular head noun is used instead of plural | 4 | 2 | 1 | 0.02 ns | 1.10 ns | 1.00 ns |
| Plural head noun is used instead of singular | 1 | 0 | 0 | 0.74 ns | 1.09 ns | 0.00 |
| Total inappropriately singular or plural pre-modifier or head noun | 9 | 7 | 5 | 1.15 ns | 0.24 ns | 2.00 ns |
| Total inappropriately singular or plural phrases in which the nouns ‘agree’ in number | 7 | 5 | 4 | 0.61 ns | 0.15 ns | 1.12 ns |
Table 2 distinguishes total number of noun–noun phrases in each corpus and also distinguishes the number of unique noun–noun phrases (lexical diversity). The second of these is a better reflection of writer usage in each sub-corpus, because the essays in the three corpora are on a limited selection of topics, and these cue multiple uses of the same noun–noun phrases. Considering diversity of noun–noun combinations, frequency of unique noun–noun phrases (compared to words in the sub-corpus) is significantly higher in the Mandarin sub-corpus than in either of the other two sub-corpora. In summary, relative frequency of noun–noun phrases is Mandarin > Tswana > Spanish.

Table 2 also shows that L1 Mandarin writers produce a greater frequency of both appropriate and inappropriate noun–noun phrases than the other two groups. In addition, appropriate noun–noun combinations were significantly more frequent in the Tswana corpus than in the Spanish corpus. In summary, for appropriate noun–noun combinations, Mandarin > Spanish; Tswana > Spanish. For inappropriate noun–noun combinations, Mandarin > Spanish; Mandarin > Tswana.

Table 3 considers the lexical diversity in each of the sub-corpora. As a proportion of the different noun–noun phrases in each corpus, 80% of the Tswana combinations were appropriate in some degree, the proportion being 75% for the Mandarin and 71% for the Spanish writers. These differences are not however significant. Table 3 also categorises these phrases according to their frequency in the BNC or COCA. Of the different noun–noun phrases they produce, all three groups produce similar frequencies of appropriate noun–noun combinations (category 1, Table 3). The L1 Tswana writers produce a significantly greater frequency than do either the Mandarin or Spanish writers of noun–noun phrases that are not in the BNC or COCA but which are judged appropriate by L1 raters (1a, Table 3).
In addition, Tswana writers produce a significantly greater frequency of appropriate combinations which are infrequent noun–noun collocations (MI > 3, 1–5 times in the BNC or 1–25 times in the COCA) than the Mandarin writers (1c, Table 3). All three groups produce similar proportions (i.e., ±47%) of noun–noun collocations that are frequent in the BNC or COCA (1d). Tswana writers produce a significantly lower frequency of inappropriate noun–noun combinations (category 2) than the Spanish writers.

In sum, the Tswana writers produced a significantly greater proportion than either of the other two groups of acceptable combinations absent from the BNC/COCA; they also produce more noun–noun collocations infrequent in the BNC/COCA than do the Mandarin writers. Tswana writers also produced a lower proportion of inappropriate combinations than the other two groups, but only in the case of the Spanish writers was this difference significant.

The inappropriate noun–noun combinations fell into a range of types, as reflected in Table 4. Some were lexically inappropriate (Table 4, category 1), and in NS English they would be replaced either with existing collocations (e.g. crimes/criminal acts rather than the inappropriate crime doings in my data) or with another word (e.g. homicides rather than blood crimes). Table 4 shows that fewer than half of the inappropriate phrases in each sub-corpus were lexically inappropriate. In coding, the lexically inappropriate phrases were identified as ones where most raters reformulated the phrase lexically (e.g. home violence / domestic violence) rather than changing it grammatically.

Other inappropriate noun–noun combinations were ones in which appropriate lexis had been selected, but they deviated grammatically from those in native English. Table 4 shows the different grammatical problems. These include use of a pre-modifying noun where most raters judged a pre-modifying adjective to be more appropriate (category 2a, Table 4) (e.g. democracy revolution), use of a pre-modifying noun where a possessive noun was more appropriate (2b, Table 4) (e.g. people needs), use of a pre-modifying noun where a post-modifying prepositional phrase was more appropriate (2c, Table 4) (e.g. imagination capacity), and inappropriately singular or plural pre-modifying noun or head noun (2d, Table 4) (e.g. traffic condition; tattoo equipments).

Most inappropriate noun–noun combinations have more than one possible appropriate realisation in English. For example, in place of the inappropriate phrase religion alienation, which was found in the data, acceptable phrases in English would be both the adjective–noun combination, religious alienation, and the noun-prepositional combination, alienation from religion. To categorise them, I relied on the 11 raters to suggest what phrase they would use to replace noun–noun phrases they viewed as inappropriate. Inter-rater percentage agreement is also reflected in Table 4.

Table 4 is based on the total unique noun–noun combinations and the frequency of different types of inappropriate noun–noun phrase within these. There is little variation in these frequencies between the 3 groups of writers when it comes to either lexical or grammatical inappropriacy, but overall the frequency of problematic combinations in the Tswana sub-corpus is significantly lower than that in the Spanish corpus. In addition, the Mandarin writers had a significantly greater tendency than did the Tswana writers to use a noun–noun combination when native raters judged an adjective–noun phrase to be more appropriate.

The issue of inappropriate use of a plural or singular pre-modifying noun (2d, Table 4) has relevance to the question of whether the presence or absence in the L1 of agreement between head nouns and pre-modifiers influences use in English. If this is the case, we would expect Tswana and Spanish writers to produce more such combinations than the Mandarin writers, whose L1 does not do this.
Table 5 considers such inappropriate noun–noun phrases that show problems of inappropriate “agreement” between noun pre-modifier and head noun. In general there were no significant differences between the three groups in the number of these phrases in which the nouns had erroneously been made to agree in number. However, as Table 5 shows, the L1 Mandarin writers were significantly less likely than the L1 Spanish writers to inappropriately make pre-modifying nouns plural. As Table 5 shows, though, this is the single piece of evidence that writers whose L1 does not allow noun–noun combinations were more likely to make the nouns inappropriately ‘agree’ in number. The number of instances of inappropriate number agreement was probably too low in this corpus to reach any conclusion on this issue.

In the next section, I consider the implications of these results. I then provide a qualitative discussion of the noun–noun usage regarded as inappropriate in English by the L1 raters.
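The log-likelihood comparisons reported in Tables 2–5 can be sketched as follows. This is a minimal implementation of the two-term log-likelihood statistic commonly used for comparing a frequency across two corpora; it is assumed, not confirmed, to match the online calculator used in the study. The function name is hypothetical. The base for each comparison is taken to be the sub-corpus size in words for Table 2 and the total number of unique combinations for Table 5, an inference that reproduces the reported values in the two cases checked here.

```python
import math


def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood for one item.

    freq1, freq2: observed frequencies in each corpus.
    size1, size2: the base each frequency is measured against.
    """
    total = freq1 + freq2
    # Expected frequencies if the item were equally likely in both corpora.
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


# Table 2: total combinations, Mandarin (633) vs Spanish (233),
# each measured against 40 000 words.
ll_total = log_likelihood(633, 40_000, 233, 40_000)

# Table 5: plural pre-modifier errors, Mandarin (2 of 397 unique
# combinations) vs Spanish (5 of 178).
ll_plural = log_likelihood(2, 397, 5, 178)

print(round(ll_total, 2), round(ll_plural, 2))
```

Values above roughly 3.84 correspond to p < 0.05 with one degree of freedom, which is why the small Table 5 difference is reported as only marginally significant while the Table 2 difference is far beyond the p < 0.0001 level.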
5. Discussion of quantitative results

This section explores possible explanations for the differences between the three sub-corpora which are reflected in Tables 1–5. In general, Table 1 (noun–noun combinations per 400 word text) demonstrates that Mandarin writers produce more appropriate and more inappropriate unique noun–noun combinations per 400 word text than do the other two groups. Tswana writers produce more appropriate noun–noun combinations than Spanish writers. I explore each of these findings below.

Comparing the two EFL groups, nouns as pre-modifiers are found in Mandarin but are absent from Spanish. Because the Mandarin writers are used to using this resource in Mandarin, they may be more sensitive to its use in what they hear and read in English; in addition their greater use (output) in English of noun–noun combinations may assist them in learning the rules associated with English use. Existence of noun–noun phrases in Mandarin may not be the only factor in this difference, but it is likely to be a relevant one.

Noun–noun phrases in the L1 may also be a factor in the significantly greater frequency of inappropriate noun–noun phrases the Mandarin group produced compared to the other two groups. Judgment by an L1 Mandarin rater indicated that 71% of inappropriate noun–noun combinations in the Mandarin sub-corpus showed influence of L1 noun–noun phrases; examples are window closet (rather than shop window), spirit stanchion (spiritual support) and meteor rain (meteor shower). This coincides with the finding of L1 influence on production of atypical collocations by Laufer and Waldman (2011).

Comparing the EFL and ESL context of learning English, Tswana, like Spanish, does not permit noun–noun phrases.
Tswana writers' greater production of appropriate noun–noun phrases compared to that of Spanish writers indicates that Tswana writers' ESL language learning context, a context in which English has been consistently used as the language of instruction in all school subjects and in which English is the dominant language of the media, business and officialdom, has given the Tswana writers greater exposure to English and thus predisposed them by comparison with Spanish writers to the use of noun–noun phrases. The production of a significantly lower overall frequency of inappropriate noun–noun phrases by the Tswana writers than by the Mandarin writers (see Tables 1 and 2) and a significantly lower proportion of noun–noun combinations that are inappropriate than the Spanish writers (see Table 3) is likely to be strongly influenced by the context of their learning of English.

Table 2 (overall production of noun–noun combinations in each sub-corpus) supports the findings of Table 1 in that both appropriate and inappropriate noun–noun combinations are significantly more frequent in the Mandarin sub-corpus as a whole than in the other two sub-corpora. In addition, the appropriate noun–noun combinations are significantly more frequent in the Tswana sub-corpus as a whole than in the Spanish sub-corpus, but the Tswana and Spanish writers produce similar frequencies of inappropriate combinations. This again supports my above claim that presence of noun–noun phrases in the L1 prompts their use in the L2, and that the greater exposure to English consequent on learning in an ESL context similarly prompts their use in the L2. Thus the finding of variation across three sub-corpora in the frequency of noun–noun phrases produced by three groups of equal proficiency suggests that both L1 and context of learning predispose writers to use of noun–noun combinations.
Of the noun–noun phrases they produce, the Tswana writers produce a greater frequency than Spanish or Mandarin writers of phrases that are not in the BNC or COCA but which are judged appropriate by L1 raters (category 1a, Table 3). The Tswana writers thus appear to be better able to invent or use flexibly their own appropriate noun–noun phrases than are the other two groups. A possible reason for this is that the Tswana writers, possibly because of their greater exposure to English than either of the other two groups, are better able to judge the appropriacy of their own invented noun–noun combinations. Tswana writers also produced a significantly greater proportion of noun–noun collocations that are infrequent in native-speaker (NS) usage than did the Mandarin writers (see category 1c, Table 3). Again this may relate to the ESL context of their learning, suggesting that they have been exposed to these less frequent noun–noun collocations more often than have the other two groups.

In sum, although both groups learnt English in an EFL context, it seems that Mandarin writers are more aware of noun–noun phrases as a possible resource in expressing meaning than are the Spanish writers. Although noun–noun combinations are absent from the L1 of both groups, the ESL Tswana writers use noun–noun combinations more than the Spanish writers (but less than the Mandarin writers). The Tswana writers are also able to judge the appropriacy of their own apparently invented combinations somewhat better than the other two groups.

Regarding noun–noun collocations (MI > 3) that are frequently produced by native speakers (i.e. >5 times in the BNC or >25 times in the COCA), Table 3 (category 1d) shows that the proportions produced by the Mandarin (48%), Spanish (46%) and Tswana writers (48%) are not significantly different. It is notable that these rates are similar to the findings of 45% by Siyanova and Schmitt (2008) for frequent adjective–noun collocations.
However, although my means of identifying collocations (appearance in the BNC or COCA) was similar to theirs (appearance in BNC), my inclusion of the COCA as a reference corpus is likely to identify a greater proportion of noun–noun phrases as collocations than theirs. It therefore seems likely that this similarity may be coincidental. This again suggests the usefulness of further research comparing appropriate usage of collocations of different types by the same group of writers.

6. Results and discussion of qualitative analysis

As shown in Table 4, of the inappropriate noun–noun combinations in my sub-corpora, some had grammatical problems while others were lexically problematic. I will consider the grammatical problems first. More than half of the inappropriate
noun–noun phrases in each sub-corpus are grammatically inappropriate. Table 4 shows the different grammatical problems. These include use of a pre-modifying noun where the raters judged that a pre-modifying adjective or a possessive noun would be more appropriate, use of a pre-modifying noun where a post-modifying prepositional phrase would be more appropriate, and an inappropriately singular or plural pre-modifying noun or head noun.

In the first category a head noun was pre-modified by a noun, when a pre-modifying adjective phrase would be used to express this meaning in English (example 1).

1. We are subject to capitalism manipulation thanks to television (Sp) (capitalist manipulation)

Another category was head nouns modified by a noun when a pre-modifying possessive noun was appropriate (example 2).

2. However it is a real life in many people view (Ma) (people's view)

Another sub-type of inappropriate combinations of the grammatical sort are noun–noun combinations where the noun modifier was inappropriately plural (3), or inappropriately singular (4). These shed some light on whether the presence or absence of number agreement in noun phrases in the L1 erroneously leads learners to expect that there should be agreement in number between the nouns in a noun–noun phrase.

3. Moreover, most of the employees who have universities degrees agree that what they learn is seldom used. (Ma) (university degrees)
4. Many companies advertise on television in order to increase the sale volume (Ma) (sales volume)

My hypothesis was that the Spanish and Tswana writers would be more inclined to make the noun modifier ‘agree’ with the head noun (the case for noun modifiers in Spanish and Tswana) than the writers whose L1 is Mandarin, where such agreement does not occur. As I note above in my discussion of Table 5, there was a significantly lower tendency for Mandarin compared to Spanish writers to inappropriately make pre-modifying nouns plural.
As I discuss above, this is the only evidence that writers whose L1 does not allow noun–noun combinations were more likely to make the nouns inappropriately ‘agree’ in number; the low incidence of this inappropriate usage in the three corpora and the moderate size of the corpora used make a conclusion on this issue tentative.

Also included in Table 5 are frequencies of noun–noun phrases in which the noun modifier was inappropriately singular. These all concerned use of nouns as pre-modifiers which are invariably plural in English. For the Mandarin writers, these phrases included sport channel and sport match. Because plural noun pre-modifiers are rare, it makes sense for writers to use a singular pre-modifier as the default choice. Table 5 also includes the small number of cases where the head noun was inappropriately plural (5) or singular (6).

5. This notion is too narrow and does no present many invaluable aspects of university lives (Ma) (university life)
6. Women were supposed to take care of the family, to take care of the ‘home fire’. (Sp) (home fires)

Moving on to lexically inappropriate combinations, it is useful firstly to consider all lexically new formulations produced by writers of the three sub-corpora, whether appropriate or inappropriate. Such new formulations are ones that do not appear in the BNC or COCA. Some were considered appropriate by the raters (category 1a, Table 3). Other new formulations were considered lexically inappropriate by raters (last row of Table 4); raters made lexical (rather than grammatical) changes to one or both nouns. Comparing these two categories of lexically new formulations, more were rated unacceptable in English than acceptable. Inventing noun–noun collocations is therefore risky for learners, because invented combinations are more likely to be inappropriate than appropriate.

In lexically inappropriate combinations, one of the nouns in an existing noun–noun collocation is replaced with a word that does not usually collocate with the other noun.
In example 7, the pre-modifying adjective in an English collocation, domestic violence, is replaced with a synonym, home, producing a phrase that is not a collocation in English; in example 8 it is the head noun in rehabilitation programme/facility that is replaced, producing rehabilitation school.

7. Nearly 30 percent women are facing home violence or quarrels with family members (Ma) (domestic violence)
8. Another way of counselling may be to attend rehabilitation school to be taught about how they should behave. (Ts) (rehabilitation programmes)

Other inappropriate phrases were formulations that either use highly infrequent words (9) or invent another phrase for an existing collocation (10). As discussed above, both of these examples, from the Mandarin sub-corpus, are likely to be the result of translation from Mandarin.

9. Royalty are still a vital part of Britain society … And most people's heart, it still the spirit stanchion of the Britain (Ma) (spiritual support)
10. Sometimes maybe you don't know what thing is when you find the new type of goods which appear on the shop's window closet. (Ma) (shop window)

Some lexically problematic combinations are interesting in their ability to shed light on the process of learning collocations. In these it appeared that the writer had gone some way towards learning an existing noun–noun chunk and partly knew the collocation. Such instances exemplify Schmitt and Carter's (2004) incompletely learnt collocations; with more exposure it is possible that writers will learn the collocation more fully. However it is also possible that such erroneously learnt collocations may be retained; Wray (2012, p. 248) gives the examples of (native speaker) confusion between streets/streaks ahead and off his own back/bat.

Some of these imperfectly learnt collocations are ones where either the modifying noun or the head noun is replaced by a word that sounds like the intended word. In example 11, numbers and members sound similar, as do diary and daily in 12, which also look similar. As discussed above, Boers and Lindstromberg (2009) argue that this is evidence for holistic learning of collocations. Although these few examples cannot be said to provide substantial evidence for this claim, they do suggest that further research with a bigger corpus might be valuable.

11. The murderer's family numbers can go to prison and look in them instead of crying for their death (Ma) (family members)
12. Television has become an important factor in our diary life (Sp) (daily life)
13. Numbers of educational programmes are small in contrast with a large number of advertisements and soup operas (Ch) (soap operas)

Soup operas in 13, although probably a spelling mistake, is included as it shows one example of the collocation soap opera having been imperfectly learnt. Another instance of imperfect learning of the same collocation by another learner is reflected in example 14.
It is possible that soda show (also ‘bubbly’/soapy) in 15 may also refer to soap operas.

14. When they watch the bubble operas they think that they are in need of relaxing (Ma) (soap operas)
15. We relax ourselves by watching the entertainment programmes on television … such as soda show, television series and movies. (Ma) (soap opera)

In contrast with family numbers, which suggests the possibility of holistic learning of collocations, bubble operas and soda show provide evidence of analytical learning. The writers of these remember that soap operas have something to do with soap/bubbles, but choose the nouns bubble and soda to represent a characteristic of soap (bubbles) instead of soap itself. Bubble operas has not been retrieved whole from memory by the writer but rather constructed analytically from the partially remembered characteristic of the phrase. Similarly, if elephant hide was intended by elephant peel in example 16, the writer has constructed this noun–noun phrase analytically by using a synonym (peel) for skin/hide (as in banana peel).

16. He changed his elephant peel for a hen without thinking if maybe the peel could have more value (Sp) (elephant hide?)

The Tswana writers also showed evidence of phrases that are found in the L2 variety of English that is widely spoken in Southern Africa. I would speculate that these are stable collocations in this variety, but this would need further comparison with a large South African corpus. These combinations included cousin sister, veld fire, ID book, and Technikon campus. These combinations were all rejected by the raters, but are known to the author as acceptable South African usage.

7. Conclusion

This study has produced statistically significant quantitative findings about use of noun–noun phrases by three groups of non-native writers of English; the findings shed light on the influence of the presence of noun–noun phrases in the L1 and also the influence of the context in which English has been learnt. The study has supplemented these quantitative findings with a qualitative discussion of the different categories of noun–noun phrase that are inappropriate in English. Before examining the main findings from these two parts of this study, I note that this study is limited to a focus on use rather than learning of noun–noun collocations. Nevertheless this focus provides a number of insights, which I summarise below. Some of these insights are of interest to EAP teachers.

The study sought to test the possibility firstly that the nature of the L1 influences production of noun–noun phrases in English, and secondly that context of learning (ESL or EFL) will have an influence on the learning of these phrases. In testing these, the frequency of production of noun–noun phrases per 400-word text, the proportion of phrases of different categories in each sub-corpus as a whole, their accuracy, and the extent to which noun–noun combinations produced are collocations, i.e. frequent in the language of native speakers, were considered.

With regard to per-text frequency, this study found significant differences in the frequencies produced by writers of different L1s. The L1 Mandarin writers, an EFL group whose L1 allows noun–noun combinations, produced the greatest frequency of noun–noun phrases, followed by the L1 Tswana writers, an ESL group whose L1 does not allow noun–noun combinations. In the writing by L1 Spanish writers, an EFL group whose L1 does not allow these phrases, their frequency was
the lowest. I have argued that the presence of noun–noun phrases in the L1 makes learners more inclined to use them in English. My findings suggest too that the greater exposure afforded by learning in an ESL context also increases use in the L2.

Although the L1 Mandarin writers produced more noun–noun phrases than the L1 Tswana writers, who produced more than the Spanish writers, the proportion of appropriate phrases was not significantly different in the three groups; however the L1 Mandarin writers produced a significantly greater proportion of inappropriate noun–noun phrases than the other two groups. Direct translation from the L1 appears to have played some role here; an L1 Mandarin rater judged L1 influence in 71% of the inappropriate combinations.

The proportion of noun–noun collocations (MI > 3) that are frequently used by native speakers (>5 times in BNC or >25 times in COCA) was about the same in the writing of the three groups. L1 Tswana writers produced a significantly greater proportion of noun–noun collocations (i.e. MI > 3) which are used less frequently by native English speakers than did the Mandarin writers. Given their English-medium education, the Tswana learners are likely to have been exposed more often to each of the noun–noun collocations, giving them an advantage in acquiring them. Notable too is the significantly greater ability of the Tswana writers to produce their own acceptable combinations absent from the BNC or COCA.

Influence of the L1 was apparent in the greater production of noun–noun combinations by the L1 Mandarin writers. More evidence of influence of the L1 was the greater tendency of L1 writers of Spanish (which marks pre-modifiers to agree with the noun in number) compared to L1 writers of Mandarin (which does not) to inappropriately make pre-modifying nouns plural. I tentatively conclude that, by comparison with L1 Mandarin writers, their L1 has led to the Spanish writers overmarking the plural.
This has relevance for EAP teachers. Another finding of relevance to EAP teachers whose students' L1 is Mandarin is their somewhat greater tendency (by comparison with the Tswana writers) to use a noun–noun phrase when an adjective–noun phrase would be more appropriate. This difficulty may stem from the fact that nouns are often used as adjectives (e.g. blueberry muffin). Also of relevance to EAP is that learners did not always signal explicitly the possessive relationship between the nouns in a noun–noun phrase. This may be because the meaning relationship between nouns in noun–noun phrases is usually implicit (Biber et al., 1999, p. 590) and not always signalled using a possessive noun (e.g. gun power; motor vehicles), making this confusing for learners.

For the EAP teacher, the fact that new noun–noun formulations by learners are more likely to be inappropriate than appropriate is difficult to respond to. Should learners be encouraged only to use noun–noun phrases that they know, rather than inventing their own? Or does using them, even if inappropriately, promote sensitivity to them and a greater level of noticing of these forms in future reading? The fact that Mandarin writers used a significantly greater number than Spanish writers, but with little increase in the proportion used accurately, suggests not. That the Tswana writers, who had most exposure to noun–noun phrases, used them significantly more accurately, suggests that repeated exposure is the best way to increase accuracy. This issue would benefit from further research.

As discussed above, further research that compares use of different types of collocations (e.g. adjective–noun compared to noun–noun) within the same population would also be of value.
My comparison of the present findings with previous studies of adjective–noun and verb–noun collocations has not been conclusive, because of differences in the ways in which collocations are identified in different studies, and variation between studies in the L1 of writers.

Acknowledgements

My thanks to Jill Musgrave, Anna Siyanova, and two anonymous reviewers for their valuable comments on an earlier version of this article.

References

Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21, 101–114.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Edinburgh: Pearson Education Limited.
Boers, F., & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language acquisition. Basingstoke, UK: Palgrave Macmillan.
Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110.
Census in brief. (2012). Statistics South Africa. Retrieved on 9 March 2014 from http://www.statssa.gov.za/Census2011/Products/Census_2011_Census_in_brief.pdf
CLAWS part-of-speech tagger for English. (n.d.). Retrieved from http://ucrel.lancs.ac.uk/claws/.
Collins Wordbanks Online. (2008). A guide to statistics: t-Score and mutual information. Retrieved on 3rd April 2014 from http://wordbanks.harpercollins.co.uk/Docs/Help/statistics.html.
Davies, M. (n.d.). Corpus.byu.edu. Retrieved on 26 July 2015 from http://corpus.byu.edu/.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics in Language Teaching, 47(2), 157–177.
Durrant, P., & Schmitt, N. (2010). Adult learners' retention of collocations from exposure. Second Language Research, 26(2), 163–188.
Granger, S., Dagneaux, E., Meunier, F., & Paquot, M. (2009). In International corpus of learner English (pp. 198–204). Presses Universitaires de Louvain.
Hunston, S. (2002). Corpora in applied linguistics. Cambridge University Press.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: a corpus analysis of learners' English. Language Learning, 61(2), 647–672.
Nayar, P. B. (1997). ESL/EFL dichotomy today: language politics or pragmatics? TESOL Quarterly, 31(1), 9–37.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24, 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: John Benjamins Publishing.
Paquot, M. (2012a). Corpus collection guidelines. Retrieved on 26th July 2015 from https://www.uclouvain.be/en-317607.html.
Paquot, M., & Granger, S. (2012b). Formulaic language in learner corpora. Annual Review of Applied Linguistics, 32, 130–149.
Parkinson, J., & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48–59.
Randolph, J. J. (2008). Online kappa calculator. Retrieved September 20, 2014, from http://justusrandolph.net/kappa/.
Rayson, P. (n.d.). Log-likelihood calculator. Retrieved May 3, 2014 from http://ucrel.lancs.ac.uk/llwizard.html.
Rayson, P., & Garside, R. (2000, October). Comparing corpora using frequency profiling. In Proceedings of the workshop on comparing corpora (pp. 1–6). Association for Computational Linguistics.
van Rooy, B. (2009). The status of English in South Africa. In S. Granger, E. Dagneaux, F. Meunier, & M. Paquot (Eds.), International corpus of learner English (pp. 198–204). Presses Universitaires de Louvain.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22). Amsterdam: John Benjamins Publishing.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: a multi-study perspective. Canadian Modern Language Review/La Revue canadienne des langues vivantes, 64(3), 429–458.
Thewissen, J. (2013). Capturing L2 accuracy developmental patterns: insights from an error-tagged EFL learner corpus. The Modern Language Journal, 97(S1), 1–25.
Wray, A. (1999). Formulaic language in learners and native speakers. Cambridge University Press.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford University Press.
Wray, A. (2012). What do we (think we) know about formulaic language? An evaluation of the current state of play. Annual Review of Applied Linguistics, 32, 231–254.

Jean Parkinson teaches Applied Linguistics and TESOL at Victoria University of Wellington in New Zealand. Her research interests are academic writing, spoken and written genres in Science, and Corpus Linguistics.