Hi, everyone! Welcome to Pa'Kwan's Web
Blog again. Today, I will talk about "Corpus". I am very excited to
share this with you because it is a good tool to help us enhance our
English language skills. I can say that this is my favorite program. So,
let's see what a corpus is.
A corpus is a collection of written (or
spoken) texts presented in electronic form. It provides evidence of how
language is used in real situations, from which lexicographers can write
accurate and meaningful dictionary entries. The plural form of "corpus" is
"corpora". There are many sources of corpora. Here is
an example link to a corpus:
http://www.arts.chula.ac.th/~ling/TNCII/corp.php (general corpus; Thai)
Moreover, we can build our own corpus and analyze it with AntConc, a
freeware corpus analysis toolkit for concordancing and text analysis.
Sources of specialized texts include printed
materials, Word documents, CD-ROMs, texts on the web, and online
databases. When designing a specialized corpus, there are a few things
to consider: size, text (extracts or full texts), number of texts, medium,
subject and text type, authorship, language, and publication date. Let's
go through them item by item.
Firstly, there are no fixed rules for corpus
size. It depends on the research purpose and on the availability of data and
time. A large, general corpus may be less useful than a small, specialized
one. However, a corpus that is too small has limitations, for instance not
enough hits to make decent generalizations, or not covering enough of the
concepts, terms, or patterns under investigation.
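To get a feel for what "hits" means here, below is a minimal Python sketch of a keyword-in-context (KWIC) search, which is the kind of concordance a tool like AntConc displays. This is not AntConc's own code. The function name `kwic`, the simple tokenizer, the window size, and the sample sentences are all my own assumptions for illustration.

```python
import re

def kwic(text, keyword, window=3):
    """Return keyword-in-context lines: `window` words on each side of every hit."""
    # Simple tokenizer (an assumption): runs of letters, digits and apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Made-up sample text, only for demonstration.
sample = (
    "A corpus is a collection of texts. "
    "A small corpus may give too few hits. "
    "Large corpora give more hits."
)

hits = kwic(sample, "corpus")
for line in hits:
    print(line)
print(len(hits), "hits")
```

With a real corpus, a very low number of hits for the words you care about is a sign that the corpus may be too small for your purpose.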
Secondly, the texts used for a corpus can be
extracts or full texts. It depends on the aim of the corpus compilation.
However, whole texts offer more coverage because the words or terms to be
looked at may be randomly distributed throughout the text. On the other hand,
specific sections may be helpful if we are looking for words or phrases in
particular content areas or want to create purposeful sub-corpora.
Thirdly, the number of texts depends on your
research focus. As for the medium, the corpus can contain spoken texts,
written texts, or a mix of both. Subject and text type should mainly focus on
the specialized texts under investigation, although this is less clear-cut in
multidisciplinary subjects.
Fourthly, authorship matters: texts
written by experts in a field tend to present more reliable and authentic
examples of specialized language. Language means that specialized texts can be
stored and retrieved in the form of monolingual, comparable, or parallel
corpora. The last one is publication date. Texts should come
from recent publications unless the queries relate to particular
periods of time.
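The design criteria above can be kept as a small metadata record for each text, so that a sub-corpus can be selected later. Here is a rough Python sketch of that idea. The class name `TextRecord`, its field names, the `select` helper, and the sample entries are all hypothetical. They do not come from AntConc or from any real corpus.

```python
from dataclasses import dataclass

@dataclass
class TextRecord:
    title: str
    medium: str          # "written" or "spoken"
    subject: str
    text_type: str
    expert_author: bool  # True if written by an expert in the field
    language: str
    year: int            # publication year
    full_text: bool      # True for a whole text, False for an extract

def select(records, subject, language, min_year):
    """Keep expert-written texts on one subject, in one language, published since min_year."""
    return [
        r for r in records
        if r.subject == subject
        and r.language == language
        and r.expert_author
        and r.year >= min_year
    ]

# Made-up entries, only for demonstration.
records = [
    TextRecord("Sample article A", "written", "linguistics", "journal article", True, "English", 2012, True),
    TextRecord("Sample lecture B", "spoken", "linguistics", "lecture", True, "English", 1995, False),
    TextRecord("Sample forum post C", "written", "linguistics", "forum post", False, "English", 2013, True),
]

chosen = select(records, "linguistics", "English", 2005)
print([r.title for r in chosen])
```

Here only the first entry is kept, because the second one is too old and the third one is not written by an expert. If your queries are about a particular period of time, you would change the year condition instead of always asking for recent texts.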
I hope you enjoy reading my blog. See you
soon...
P.S. If you have any questions, please leave them in the comments
below. I will answer all your questions every Friday.