American National Corpus
From Wikipedia, the free encyclopedia
The American National Corpus (ANC) is a paid, membership-based collaboratory that aims to create an electronic collection of American English. The collection will include texts and transcripts of spoken data produced since 1990, with the goal of reaching a corpus of 100 million words.
ANC Consortium members include publishers, software companies, and academic members. Consortium members have exclusive access to the corpus throughout the development period and for five years after the first installment is released. The first release of the ANC was made available in mid-fall 2003. It contains approximately 11 million words of American English, covering both written and spoken data across a variety of text types, annotated for part of speech and lemma. The corpus is provided in XML format conforming to the XML Corpus Encoding Standard (XCES).
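To illustrate what token-level part-of-speech and lemma annotation in an XML corpus can look like, the following minimal Python sketch reads such annotations from a small hypothetical document. The element names (`chunkList`, `chunk`, `tok`, `orth`, `base`, `ctag`), the sample words, and the Penn Treebank-style tags are assumptions chosen for illustration. They are loosely modeled on XCES-style annotation and have not been checked against the actual ANC release files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document. The element names are assumptions loosely
# modeled on XCES-style token annotation, not copied from the ANC files.
SAMPLE = """<chunkList>
  <chunk>
    <tok><orth>The</orth><base>the</base><ctag>DT</ctag></tok>
    <tok><orth>corpora</orth><base>corpus</base><ctag>NNS</ctag></tok>
    <tok><orth>were</orth><base>be</base><ctag>VBD</ctag></tok>
    <tok><orth>released</orth><base>release</base><ctag>VBN</ctag></tok>
  </chunk>
</chunkList>"""


def read_tokens(xml_text):
    """Return (word, lemma, part-of-speech tag) triples for every <tok> element."""
    root = ET.fromstring(xml_text)
    tokens = []
    # iter() visits every <tok> descendant in document order.
    for tok in root.iter("tok"):
        word = tok.findtext("orth")   # surface form as it appears in the text
        lemma = tok.findtext("base")  # dictionary (base) form
        pos = tok.findtext("ctag")    # part-of-speech tag
        tokens.append((word, lemma, pos))
    return tokens


if __name__ == "__main__":
    tokens = read_tokens(SAMPLE)
    for word, lemma, pos in tokens:
        print(f"{word}\t{lemma}\t{pos}")
    # Frequencies grouped by lemma rather than by surface form.
    print(Counter(lemma for _, lemma, _ in tokens))
```

Because each token carries its lemma, inflected forms such as *corpora* can be counted under their base form (*corpus*), which is one reason lemma annotation is useful for frequency studies.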