Wednesday, August 15, 2018

Week#5 Using Corpus Analysis Software to Analyse Specialised Texts

Hi, everyone! Welcome back to Pa'Kwan's Web Blog. Today, I will talk about the "corpus". I am very excited to share this with you because a corpus is a good tool for improving our English language skills. I can say that it is my favorite tool. So, let's see what a corpus is.
A corpus is a collection of written (or spoken) texts stored in electronic form. It provides evidence of how language is used in real situations, from which lexicographers can write accurate and meaningful dictionary entries. The plural form is "corpora". There are many sources of corpora. Here are two example links:
    https://corpus.byu.edu/coca/     
    http://www.arts.chula.ac.th/~ling/TNCII/corp.php   (general corpus; Thai)

Moreover, we can build our own corpus and analyse it with AntConc, a freeware corpus analysis toolkit for concordancing and text analysis. Sources of specialized texts include printed materials, Word documents, CD-ROMs, texts on the web, and online databases. When designing a specialized corpus, there are a few things to consider: size, text (extracts or full texts), number of texts, medium, subject and text type, authorship, language, and publication date. Let's go through them one by one.
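Before going through the items, here is a small sketch of what "concordancing" means. It is not how AntConc itself works inside; it is only a minimal Python illustration of a keyword-in-context (KWIC) view. The function name kwic, the width parameter, and the sample sentences are all made up for this example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line for every whole-word match."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Made-up sample text, only for illustration.
sample = (
    "A corpus is a collection of texts. "
    "A specialized corpus covers one subject field. "
    "Researchers search the corpus for terms and patterns."
)

concordance = kwic(sample, "corpus")
for line in concordance:
    print(line)
```

Each printed line shows the search word in brackets with some context on both sides, which is roughly the kind of view a concordancer gives you.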

Firstly, there are no fixed rules for corpus size. It depends on the research purpose and on the availability of data and time. A large, general corpus may be less useful than a small, specialized one. However, a corpus that is too small has limitations: for instance, it may not give enough hits to support decent generalizations, and it may not cover enough of the concepts, terms, or patterns under investigation.
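To make the "not enough hits" problem more concrete, here is a rough Python sketch that counts how often some terms appear in a very small corpus. The three sentences, the term list, and the MIN_HITS threshold are all assumptions invented for this example; there is no standard minimum number of hits. It only handles single-word terms.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def count_hits(texts, terms):
    """Count how many times each single-word term occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {term: counts[term.lower()] for term in terms}

# Hypothetical threshold, chosen only for this illustration.
MIN_HITS = 2

# Made-up mini corpus, far too small for real research.
tiny_corpus = [
    "The patient received a dose of insulin before the meal.",
    "Insulin resistance is common in type 2 diabetes.",
    "The dose was adjusted after the glucose test.",
]

hits = count_hits(tiny_corpus, ["insulin", "dose", "glucose"])
too_few = [term for term, n in hits.items() if n < MIN_HITS]

print(hits)
print("Too few hits:", too_few)
```

With a corpus this small, some terms appear only once, so we cannot say much about how they are really used.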

Secondly, the texts used for a corpus can be extracts or full texts, depending on the aim of the corpus compilation. Whole texts offer more coverage because the words or terms of interest may be distributed randomly throughout the text. On the other hand, specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purposeful sub-corpora.

Thirdly, the number of texts depends on your research focus. As for medium, a corpus can contain spoken texts, written texts, or a mix of both. Subject and text type should mainly focus on the specialized texts under investigation, although this is less clear-cut in multidisciplinary subjects.

Fourthly, authorship: texts written by experts in a field tend to present more reliable and authentic examples of specialized language. Language means that specialized texts can be stored and retrieved as monolingual, comparable, or parallel corpora. The last item is publication date: texts should come from recent publications unless the queries relate to particular periods of time.
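The design criteria above can also be written down as a simple checklist in code. This is only a sketch under my own assumptions: the TextRecord fields, the select_texts function, and the candidate texts are hypothetical and are not part of AntConc or any other tool.

```python
from dataclasses import dataclass

@dataclass
class TextRecord:
    """Metadata for one candidate text (all fields are illustrative)."""
    title: str
    medium: str          # "written" or "spoken"
    subject: str
    expert_author: bool  # True if written by an expert in the field
    language: str
    year: int            # publication year

def select_texts(records, subject, language, earliest_year):
    """Keep expert-written texts on the subject, in the language, and recent enough."""
    return [
        r for r in records
        if r.subject == subject
        and r.language == language
        and r.expert_author
        and r.year >= earliest_year
    ]

# Made-up candidates, only for illustration.
candidates = [
    TextRecord("Text A", "written", "medicine", True, "en", 2017),
    TextRecord("Text B", "written", "medicine", False, "en", 2018),
    TextRecord("Text C", "spoken", "medicine", True, "en", 2005),
    TextRecord("Text D", "written", "law", True, "en", 2016),
]

selected = select_texts(candidates, "medicine", "en", 2010)
print([r.title for r in selected])
```

Here only "Text A" passes every check. If your queries are about a particular period of time, you would change earliest_year (or drop that check) instead of always preferring recent texts.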

I hope you enjoy reading my blog. See you soon...

P.S. If you have any questions, please leave them in the comments below. I will answer all your questions every Friday.
