
Bachelor thesis:Knowledge-Oriented Gene Expression Data Processing ( PDF )
Author:Sixta Tomáš
Supervisor:doc. Ing. Jiří Kléma Ph.D.
Abstract:Microarray technology is a very helpful tool for geneticists, as it allows them to measure the expression levels of thousands of genes simultaneously. However, the resulting data are usually noisy and hard to classify, because they contain too many attributes (probes) and usually too few samples. For these reasons, it is useful to reduce the dimensionality of the data. This can be done in various ways, e.g. by clustering. This work describes the fuzzy clustering algorithm currently implemented in DAVID, a web-accessible program of the National Institute of Allergy and Infectious Diseases, which clusters gene lists by their associated annotation terms rather than by their expression levels. The resulting clusters are therefore readily interpretable biologically. Our goal was to reimplement the algorithm in R in order to overcome the restrictions of the web interface (a limited gene list length and non-adjustable annotation data). Using this code, we studied the influence of various annotation data on the clustering results. We clustered genes from the Motol and ALL/AML data sets and created metagenes based on their gene expression matrices. The metagenes were then analysed by two different classifiers.
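As a rough illustration of the metagene step mentioned in the abstract, the sketch below collapses each gene cluster into a single metagene profile. It is only a minimal sketch under stated assumptions: the abstract does not say how metagenes were computed, so averaging the expression of the member genes per sample is an assumption, and the function name `build_metagenes`, the gene identifiers and the values are hypothetical. Python is used here for illustration, although the thesis code itself was written in R.

```python
from statistics import mean


def build_metagenes(expression, clusters):
    """Collapse each gene cluster into one metagene profile.

    expression: dict mapping gene id -> list of expression values (one per sample).
    clusters:   dict mapping cluster name -> list of gene ids. Clusters may
                overlap, since fuzzy clustering lets a gene belong to several.

    ASSUMPTION: a metagene is the per-sample mean of its member genes;
    the source does not specify the aggregation that was actually used.
    """
    metagenes = {}
    for name, genes in clusters.items():
        # Keep only genes that actually have expression measurements.
        rows = [expression[g] for g in genes if g in expression]
        if not rows:
            continue  # skip clusters with no measured genes
        # zip(*rows) iterates over samples (columns of the expression matrix).
        metagenes[name] = [mean(column) for column in zip(*rows)]
    return metagenes


# Hypothetical toy data: three genes measured on three samples.
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 4.0, 5.0],
    "geneC": [10.0, 10.0, 10.0],
}
# geneB belongs to both clusters, as fuzzy clustering allows.
clusters = {
    "cluster1": ["geneA", "geneB"],
    "cluster2": ["geneB", "geneC"],
}

metagenes = build_metagenes(expression, clusters)
```

The resulting `metagenes` dictionary has one row per cluster instead of one row per probe, which is the kind of dimensionality reduction the abstract describes before classification.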
Submitted:Jun 2009