Robert Buckmaster Consulting


Download the CJS Key Word List as a .txt file

 

Corpus Linguistics

In 2004 I developed a one and a half million word corpus of texts related to the work of police officers and other criminal justice system professionals.

The Criminal Justice System Key Word List was developed from an analysis of this corpus to answer the question of which vocabulary to teach senior police officers and other criminal justice system professionals.

Vocabulary Size Research

Research into vocabulary size and coverage has established that the most common words in the language account for most of the running words in any text. The table below, cited in Nation (2001), shows that the most common two thousand words cover 81.3% of text. Beyond that point each additional thousand words adds only a few percentage points, so the learner faces the huge task of learning thousands more words to achieve significant gains in coverage.

Different words    Percent of word tokens in average text
86,741             100
43,831             99.0
12,448             95.0
5,000              89.4
4,000              87.6
3,000              85.2
2,000              81.3
1,000              71.1
10                 23.7

Research suggests that guessing word meaning from context is only effective when the reader already knows about 95% of the running words of a text. According to the table, 95% coverage requires 12,448 words, so a learner who knows the first 2,000 would have to learn roughly another 10,500 words. Clearly this is an almost impossible task for most learners.
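The diminishing returns in the table can be made concrete with a little arithmetic. The sketch below is only an illustration: it hard-codes the figures from the table above, and the names COVERAGE, marginal_gains and words_needed are my own, not part of any published tool.

```python
# Vocabulary size vs. percent of tokens covered, copied from the table
# above (cited in Nation 2001), smallest vocabulary first.
COVERAGE = [
    (10, 23.7),
    (1_000, 71.1),
    (2_000, 81.3),
    (3_000, 85.2),
    (4_000, 87.6),
    (5_000, 89.4),
    (12_448, 95.0),
    (43_831, 99.0),
    (86_741, 100.0),
]


def marginal_gains(table):
    """Return (extra_words, extra_coverage, gain_per_1000_words) for each step."""
    steps = []
    for (w0, c0), (w1, c1) in zip(table, table[1:]):
        extra_words = w1 - w0
        extra_cov = c1 - c0
        steps.append((
            extra_words,
            round(extra_cov, 1),
            round(extra_cov / extra_words * 1000, 2),
        ))
    return steps


def words_needed(table, known_words, target_percent):
    """Extra words needed to reach target_percent coverage.

    Only works for coverage values that appear as rows in the table.
    """
    size_for = {cov: words for words, cov in table}
    return size_for[target_percent] - known_words


# Going from 2,000 to 3,000 words adds 3.9 points of coverage.
print(marginal_gains(COVERAGE)[2])          # (1000, 3.9, 3.9)
# Words still to learn after the first 2,000 to reach 95% coverage.
print(words_needed(COVERAGE, 2_000, 95.0))  # 10448
```

The first thousand words buy about 71 points of coverage, while the step from 5,000 to 12,448 words buys fewer than six. That is the argument for a restricted, genre-specific list.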

The alternative is to choose a more restricted set of words from a particular genre of English. This will give high coverage within that specific genre. One attempt to do this is the Academic Word List.

The Academic Word List

The AWL was developed by Averil Coxhead at the School of Linguistics and Applied Language Studies at Victoria University of Wellington. It is a set of words found in a wide variety of academic texts. The key factors for their selection were their coverage (as % of text) and their range. The AWL consists of 570 word families. Combined with the first 2000 words, the AWL provides about 87% coverage of academic text (73.5% + 4.6% + 8.5% in the coverage table on this page).
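To show what coverage and range mean in practice, here is a minimal sketch that computes both for every word in a small corpus split into subcorpora. It is not the procedure used to build the AWL; the function name coverage_and_range and the toy data are hypothetical.

```python
from collections import Counter


def coverage_and_range(subcorpora):
    """Compute coverage and range for every word.

    subcorpora maps a subcorpus name to its list of word tokens.
    Returns {word: (coverage_percent, range_count)}, where coverage is the
    word's share of all tokens and range is the number of subcorpora in
    which the word occurs at least once.
    """
    total_tokens = sum(len(tokens) for tokens in subcorpora.values())
    freq = Counter()
    rng = Counter()
    for tokens in subcorpora.values():
        freq.update(tokens)        # every occurrence counts towards coverage
        rng.update(set(tokens))    # each subcorpus counts once towards range
    return {w: (100 * freq[w] / total_tokens, rng[w]) for w in freq}


# Toy data, invented purely for illustration.
toy = {
    "law": ["analyse", "the", "data", "the"],
    "science": ["analyse", "the", "method"],
    "arts": ["the", "theory", "data"],
}
stats = coverage_and_range(toy)
print(stats["analyse"])  # (20.0, 2)
print(stats["method"])   # (10.0, 1)
```

A word with high coverage but a range of 1 is probably specific to one subject area, which is why both measures are needed.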

The CJS Key Word List

The CJS list was created by a process similar to that used for the AWL: the corpus was analysed, and words found in the first 2000 words, proper nouns and words with very limited coverage were eliminated. The result is a list of 850 word families (2716 words in total) which provides 10-15% coverage of texts of interest to criminal justice professionals.
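The elimination steps described above can be sketched as a simple filter. This is an illustration under my own assumptions, not the actual procedure or software used for the CJS list: build_key_word_list, its parameters, the toy sentence and the 10% threshold are all hypothetical.

```python
from collections import Counter


def build_key_word_list(corpus_tokens, general_words, proper_nouns, min_coverage):
    """Return {word: coverage_percent} for the words that survive three filters.

    A word is dropped if it is in general_words (e.g. the first 2000 words),
    if it is in proper_nouns, or if its coverage is below min_coverage.
    Both sets are assumed to hold lowercase words.
    """
    counts = Counter(token.lower() for token in corpus_tokens)
    total = sum(counts.values())
    kept = {}
    for word, n in counts.items():
        coverage = 100 * n / total
        if word in general_words or word in proper_nouns:
            continue
        if coverage < min_coverage:
            continue
        kept[word] = coverage
    # Highest-coverage words first.
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


# Toy corpus of 13 tokens; the threshold is far higher than any real one
# would be, only so that the filter has a visible effect on such a tiny text.
tokens = ("the suspect was arrested in Tallinn and "
          "the suspect was charged with fraud").split()
key_words = build_key_word_list(
    tokens,
    general_words={"the", "was", "in", "and", "with"},
    proper_nouns={"tallinn"},
    min_coverage=10.0,
)
print(list(key_words))  # ['suspect']
```

In a real analysis the surviving word forms would then be grouped into word families before the list is finalised.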

Word Families

By a word family I mean a headword together with the words which can be derived from it by the addition of prefixes, suffixes and so on. The CJS list includes this example:

COOPERATE
CO-OPERATE
COOPERATED
CO-OPERATED
COOPERATES
CO-OPERATES
COOPERATING
CO-OPERATING
COOPERATION
CO-OPERATION
COOPERATIVE
CO-OPERATIVE

As the list shows, variant spellings are also included as part of the word family.
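When counting, every member of a family is treated as an occurrence of the headword. A minimal sketch of that idea follows, using the COOPERATE family listed above; the data structure and the function names build_lookup and family_counts are my own assumptions, not the format of the downloadable list.

```python
from collections import Counter

# The COOPERATE family from the CJS list: headword as key, other members as values.
FAMILIES = {
    "COOPERATE": [
        "CO-OPERATE", "COOPERATED", "CO-OPERATED", "COOPERATES",
        "CO-OPERATES", "COOPERATING", "CO-OPERATING", "COOPERATION",
        "CO-OPERATION", "COOPERATIVE", "CO-OPERATIVE",
    ],
}


def build_lookup(families):
    """Map every family member (lowercased) to its headword."""
    lookup = {}
    for headword, members in families.items():
        lookup[headword.lower()] = headword
        for member in members:
            lookup[member.lower()] = headword
    return lookup


def family_counts(tokens, lookup):
    """Count tokens per headword, ignoring tokens that belong to no family."""
    return Counter(
        lookup[token.lower()] for token in tokens if token.lower() in lookup
    )


lookup = build_lookup(FAMILIES)
sample = ["Co-operation", "between", "forces", "requires", "cooperating", "officers"]
print(family_counts(sample, lookup))  # Counter({'COOPERATE': 2})
```

Both the hyphenated and unhyphenated spellings count towards the same family, which is why the variants are listed explicitly.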

CJS List Coverage

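The table below shows how much of the text in different genres is covered by the first 1000 words, the second 1000 words and the AWL. The final two columns show the same CJS text analysed twice: once with the AWL and once with the CJS list in its place.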

Levels      Conv.    Fiction  News     Acad     CJS Text  CJS Text
1st 1000    84.3%    82.3%    75.6%    73.5%    72.3%     72.3%
2nd 1000    6%       5.1%     4.7%     4.6%     7.6%      7.6%
Acad        1.9%     1.7%     3.9%     8.5%     9.3%      14.4% (CJS List)
Other       7.8%     10.9%    15.7%    13.3%    10.0%     5.7%

Sources: Nation (2001) and Buckmaster (2004)

The Paul Nation Program

Paul Nation of Victoria University of Wellington has developed a text analysis program which analyses texts using the first and second 1000 words and the AWL. The program is very easy to use, and the AWL can be replaced with the CJS Key Word List.

The program can be downloaded from here.

The CJS list can be downloaded (link at the top of this page) as a .txt file ready to be used with the Paul Nation program. Save it as a .txt file.

With this program you can analyse texts and see how much of the text is covered by the first and second thousand words and how much by the AWL and CJS list. The program also tells you which words do not appear on any list. This can help you with such things as estimating the level of difficulty of the text and its usefulness for your students.
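The kind of analysis described above can be sketched in a few lines. This is not Paul Nation's program and does not read its file formats; it is a rough illustration of the same idea, with a simple tokeniser, tiny made-up word lists and a hypothetical function name, profile.

```python
import re
from collections import Counter

# Simple tokeniser: runs of letters, allowing internal hyphens and apostrophes.
TOKEN = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def profile(text, word_lists):
    """Report the percentage of tokens covered by each word list.

    word_lists maps a list name to a set of lowercase words. Each token is
    assigned to the first list that contains it, otherwise to "Other".
    Returns (percent_by_list, off_list_word_counts).
    """
    tokens = TOKEN.findall(text.lower())
    tally = Counter()
    off_list = Counter()
    for token in tokens:
        for name, words in word_lists.items():
            if token in words:
                tally[name] += 1
                break
        else:
            # No list contained the token.
            tally["Other"] += 1
            off_list[token] += 1
    total = len(tokens)
    if total == 0:
        return {}, off_list
    names = list(word_lists) + ["Other"]
    percents = {name: round(100 * tally[name] / total, 1) for name in names}
    return percents, off_list


# Tiny illustrative lists; real lists hold thousands of word forms.
lists = {
    "1st 1000": {"the", "was", "by", "a"},
    "2nd 1000": {"police"},
    "CJS": {"suspect", "detained", "officer"},
}
percents, off_list = profile(
    "The suspect was detained by a police officer in Tallinn.", lists
)
print(percents)          # {'1st 1000': 40.0, '2nd 1000': 10.0, 'CJS': 30.0, 'Other': 20.0}
print(sorted(off_list))  # ['in', 'tallinn']
```

The "Other" figure and the off-list words are the part most useful for judging difficulty: a high "Other" percentage suggests the text will be hard going for students who know only the listed words.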

Using the CJS Key Word List

The CJS list can be used with the Paul Nation program as mentioned above. It can also be used to develop vocabulary learning exercises and tests which focus on the important words that your students really need to learn.

References and Links

AWL Site

Nation, P. (2001). Learning vocabulary in another language. New York: Cambridge University Press.

© Robert Buckmaster 2016