The Academic Corpus

A written corpus of academic English was developed to find out which words occurred across a wide range of academic texts from a variety of subject areas. The Academic Corpus contained approximately 3,500,000 running words. It was divided into four faculty sections: Arts, Commerce, Law and Science. Each faculty section contained approximately 875,000 running words and was divided into seven subject areas of approximately 125,000 running words each.

Subject areas in the Faculty Sections of the Academic Corpus

  • Arts
    • Education
    • History
    • Linguistics
    • Philosophy
    • Politics
    • Psychology
    • Sociology
  • Commerce
    • Accounting
    • Economics
    • Finance
    • Industrial Relations
    • Management
    • Marketing
    • Public Policy
  • Law
    • Constitutional Law
    • Criminal Law
    • Family Law and Medico-Legal
    • International Law
    • Pure Commercial Law
    • Quasi-Commercial Law
    • Rights and Remedies
  • Science
    • Biology
    • Chemistry
    • Computer Science
    • Geography
    • Geology
    • Mathematics
    • Physics
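
The sizes quoted for the corpus are internally consistent, and a short calculation can confirm this. The sketch below is illustrative only: the names `SUBJECT_AREAS`, `WORDS_PER_SUBJECT_AREA`, `faculty_size` and `corpus_size` are assumptions introduced here, and the word counts are the approximate targets given in the text, not measured values.

```python
# Approximate design of the Academic Corpus, as described in the text.
# All names are illustrative; the figure below is the approximate target
# per subject area, not a measured count.
WORDS_PER_SUBJECT_AREA = 125_000

SUBJECT_AREAS = {
    "Arts": ["Education", "History", "Linguistics", "Philosophy",
             "Politics", "Psychology", "Sociology"],
    "Commerce": ["Accounting", "Economics", "Finance", "Industrial Relations",
                 "Management", "Marketing", "Public Policy"],
    "Law": ["Constitutional Law", "Criminal Law", "Family Law and Medico-Legal",
            "International Law", "Pure Commercial Law", "Quasi-Commercial Law",
            "Rights and Remedies"],
    "Science": ["Biology", "Chemistry", "Computer Science", "Geography",
                "Geology", "Mathematics", "Physics"],
}


def faculty_size(faculty: str) -> int:
    """Approximate running words in one faculty section."""
    return len(SUBJECT_AREAS[faculty]) * WORDS_PER_SUBJECT_AREA


def corpus_size() -> int:
    """Approximate running words in the whole corpus."""
    return sum(faculty_size(faculty) for faculty in SUBJECT_AREAS)


if __name__ == "__main__":
    for faculty in SUBJECT_AREAS:
        print(f"{faculty}: {faculty_size(faculty):,} running words")
    print(f"Total: {corpus_size():,} running words")
```

Seven subject areas of about 125,000 running words give about 875,000 per faculty section, and four such sections give about 3,500,000 for the corpus as a whole.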

The Academic Corpus contained journal articles, book chapters, course workbooks, laboratory manuals, and course notes. Texts were selected if they were of suitable length (over 2,000 running words) and were representative of the academic genre, in that they were written for an academic audience. Texts that did not meet these selection criteria were excluded from the Academic Corpus.

There were 414 texts in the Academic Corpus. Where possible, a balance was maintained between the number of texts in the four faculty sections.

The texts were kept at their original length wherever possible, although their bibliographies were removed. Whole texts provide greater opportunities for words to recur, and longer texts allow for greater frequency of occurrence as well as greater variety of vocabulary. As far as possible, the numbers of short texts (2,000–5,000 running words), medium-length texts (5,000–10,000 running words) and long texts (over 10,000 running words) were balanced across the four faculty sections, so that the sections were as similar in make-up as possible.
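
The length bands above can be expressed as a simple classification, which makes it easy to check how evenly the bands are spread across faculty sections. This is a minimal sketch under stated assumptions, not the procedure used to build the corpus: the function names are hypothetical, the sample lengths are invented, and because the text does not say how boundary values are handled, the sketch assumes that 2,000 and 5,000 count as short and 10,000 counts as medium.

```python
from collections import Counter


def length_band(running_words: int) -> str:
    """Classify a text by length, using the bands described in the text.

    Boundary handling is an assumption: a text of exactly 2,000 or 5,000
    words is treated as short, and one of exactly 10,000 words as medium.
    """
    if running_words < 2_000:
        return "excluded"  # too short to meet the selection criteria
    if running_words <= 5_000:
        return "short"
    if running_words <= 10_000:
        return "medium"
    return "long"


def band_counts(text_lengths: dict[str, list[int]]) -> dict[str, Counter]:
    """Count texts per length band for each faculty section."""
    return {
        faculty: Counter(length_band(n) for n in lengths)
        for faculty, lengths in text_lengths.items()
    }


if __name__ == "__main__":
    # Invented lengths for illustration only; not data from the Academic Corpus.
    sample = {
        "Arts": [2_500, 7_200, 12_000],
        "Law": [4_800, 9_900, 15_300],
    }
    for faculty, counts in band_counts(sample).items():
        print(faculty, dict(counts))
```

Comparing the resulting counts across the four faculty sections shows at a glance whether one section leans more heavily on short or long texts than the others.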

The following page from an academic text shows how many of the words in such texts are from the Academic Word List.

Words from the Academic Word List are shown in bold in the text.

Dating New Zealand business cycles


Dating the turning points and duration of business cycles has long been associated with the construction of aggregate reference cycle indexes, and their associated leading, coincident and lagging indicators.

This was along lines originally developed by Burns and Mitchell (1946), and subsequently by colleagues at the National Bureau of Economic Research (NBER), e.g. Klein (1990). More recently, identifying the turning points and duration of business cycles has been an important aspect of two further areas of business cycle research: the evaluation of theoretical and associated empirical business cycle models, e.g. King and Plosser (1994), Simkins (1994); and the analysis of the time varying characteristics of business cycles, e.g. Diebold and Rudebusch (1992), Watson (1994).

The Burns and Mitchell technique of dating business cycles relied primarily on two sorts of information: the descriptive evidence from business publications and general business conditions indices, and the "specific cycles" found in many individual series and the tendency for turning points to sometimes cluster at certain dates.

Based on this information, a set of reference cycle dates were selected that specified the turning points in "aggregate economic activity". A key feature of the Burns and Mitchell approach was to focus on the amount of cyclical co-movement or coherence among a large number of economic variables.

This co-movement is the prime characteristic of their definition of the business cycle: "...a cycle consists of expansions occurring at about the same time in many economic activities, followed by similarly general recessions, contractions, and revivals which merge into the expansion phase of the next cycle; duration business cycles vary from more than one year to ten or twelve years..." (Burns and Mitchell, 1946, p 3).

The NBER approach is based on the view that there is no unique way of combining all these activities, and accordingly the business cycle cannot be fully depicted by a single measure, e.g. Burns (1969, p 13).

Burns and Mitchell, and subsequent NBER researchers, intended therefore, before the computer age, to provide a standard technique with a set of decision rules for deriving business cycle turning points based on these two sorts of information.

In practice, this involved the application of a standard format of filtering procedures to extract the turning points in each data series, and then combining this information in a judgemental way to determine a single turning point date.

Other procedures, notably reference cycle indexes and coincident indexes, subsequently emerged as supplementary procedures for combining a large number of data series including various measures of output, production inputs, price series, monetary aggregates, etc, into a single composite index which have also been used to identify turning points.

Source: Buckle, R., Kim, K. and Hall, V. B. (1994). 'Dating New Zealand business cycles'. GSBGM Working Paper 6. Wellington: Victoria University of Wellington.