This post was published on the rolia.net forum.
The Corpus of Contemporary American English (not to be confused with the American National Corpus) is the first large, balanced corpus of contemporary American English. It is freely available online, and it is related to other large corpora that we have created.
The corpus contains more than 385 million words of text, including 20 million words for each year from 1990 to 2008, and it is equally divided among spoken, fiction, popular magazine, newspaper, and academic texts. The corpus will also be updated every six to nine months from this point on, and will therefore serve as a unique record of linguistic changes in American English.
The interface allows you to search for exact words or phrases, wildcards, lemmas, parts of speech, or any combination of these. You can search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near chain, all adjectives near woman, or all verbs near key).
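To make the idea of a collocate window concrete, here is a minimal sketch in Python. It is not how the corpus interface is implemented; the function name `collocates`, the sample sentence, and the choice to count `window` tokens on each side of the node word are all assumptions made for illustration. Filtering by part of speech (e.g. only nouns) would additionally need a tagger, which is left out here.

```python
from collections import Counter

def collocates(tokens, node, window=5):
    """Count the words found within `window` tokens on either side of `node`.

    Hypothetical helper for illustration; the corpus interface may define
    its window differently.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)                # left edge of the window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Made-up sample text, not taken from the corpus.
text = "she wore a gold chain and he locked the bike with a heavy chain"
tokens = text.lower().split()
near_chain = collocates(tokens, "chain", window=2)
```

With a window of two tokens per side, `near_chain` holds the counts for the words immediately around each occurrence of "chain"; a larger window simply widens the slice.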
The corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions, in at least two main ways:
* By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic texts, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorials, or scientific journals
* Over time: comparisons between different years from 1990 to the present
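Comparing frequencies across genres or years is usually done with a rate normalized by section size rather than with raw counts. The sketch below shows that arithmetic. The section size of 77 million words assumes the 385 million words are split evenly over five genres, as described above; the hit counts and the function name `per_million` are hypothetical and do not come from the corpus.

```python
def per_million(hits, section_size):
    """Normalize a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / section_size

# Assumed section size: 385 million words / 5 genres = 77 million each.
SECTION_SIZE = 77_000_000

# Hypothetical raw hit counts for one word, for illustration only.
raw_hits = {"spoken": 1540, "academic": 385}

rates = {genre: per_million(hits, SECTION_SIZE)
         for genre, hits in raw_hits.items()}
```

Because the sections here are the same size, the raw counts would rank the genres the same way, but the normalized rate is what lets you compare sections of different sizes, such as a sub-genre against a whole genre.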
You can also easily carry out semantically based queries of the corpus. For example, you can compare the collocates of two related words (little/small, democrats/republicans, men/women) to determine the difference in meaning or use between them. You can find the frequency and distribution of synonyms for nearly 60,000 words, compare their frequency in different registers, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically related words, and then use them directly as part of a query.
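One simple way to contrast the collocates of two related words is to rank each collocate by the ratio of its counts with the two words. The following is only a sketch under stated assumptions: the function name `contrast`, the add-one smoothing, and all of the counts are invented for illustration, and the corpus interface may use a different measure.

```python
from collections import Counter

def contrast(coll_a, coll_b, smoothing=1.0):
    """Rank collocates by how strongly they prefer word A over word B.

    The smoothing constant avoids division by zero when a collocate
    appears with only one of the two words.
    """
    words = set(coll_a) | set(coll_b)
    scores = {
        w: (coll_a[w] + smoothing) / (coll_b[w] + smoothing)
        for w in words
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical collocate counts, not real corpus data.
little = Counter({"bit": 40, "girl": 25, "town": 5})
small = Counter({"town": 30, "business": 20, "girl": 3})

ranking = contrast(little, small)
```

Collocates at the top of `ranking` lean toward the first word and those at the bottom lean toward the second, which gives a rough view of where the two words differ in use.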
Please feel free to take a five-minute guided tour, which shows the major features of the corpus. A single click on each query automatically fills in the form for you, searches the 385 million words of text, and displays the results.