Module 9: Gene Expression Analysis
Script
Exercise 1:
Cluster the yeast gene expression data. The data are available on the textbook website in Matlab format. The file yeast2.txt is a text file that can be read with Python.
Group the data into 10, 16, and 20 partitions. Report the number of samples in each partition (cluster). Make a graph similar to Figure 9.10 (p. 157) of the textbook for each partition.
Observations:
● The table containing the yeast data has missing values, represented by NA. Delete the rows that contain them.
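As a starting point, the steps of Exercise 1 (drop rows with NA, cluster, count samples per cluster) might be sketched as below. This is only a minimal sketch under assumptions: it assumes yeast2.txt is whitespace-separated with purely numeric fields apart from NA (adjust the parser if the file has a header or a gene-name column), and it uses a plain k-means written with the standard library, which is one possible clustering method and not necessarily the one used in the textbook. The function names and the demo values are hypothetical.

```python
import random
from collections import Counter


def read_complete_rows(lines, missing="NA"):
    """Parse whitespace-separated numeric rows, skipping rows with a missing value."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or missing in fields:
            continue  # blank line, or a row with at least one NA
        rows.append([float(f) for f in fields])
    return rows


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as initial centers
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(r, centers[j]))
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_sizes(labels, k):
    """Number of samples assigned to each of the k clusters."""
    counts = Counter(labels)
    return [counts.get(j, 0) for j in range(k)]


# Tiny in-memory stand-in for yeast2.txt (invented values, not real data).
demo_lines = [
    "0.1 0.2 0.1",
    "0.2 NA 0.3",
    "0.0 0.1 0.2",
    "5.0 5.1 4.9",
    "5.2 5.0 5.1",
]
demo_rows = read_complete_rows(demo_lines)
demo_labels = kmeans(demo_rows, k=2)
print(cluster_sizes(demo_labels, 2))

# With the real file, the same functions could be used roughly like this:
# with open("yeast2.txt") as fh:
#     rows = read_complete_rows(fh)
# for k in (10, 16, 20):
#     print(k, cluster_sizes(kmeans(rows, k), k))
```

k-means depends on its random initialization, so the cluster sizes can change with the seed; it may be worth running each value of k a few times before reporting the counts.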
Exercise 2:
Golub et al. (1999) created a database of different types of leukemias from oligonucleotide microarrays. The available file leukemia.txt contains this database. Cluster this database into 2 and 5 partitions and analyze the results.
Observations:
● The last column of leukemia.txt holds the class label of each sample (the class to which each example in the database belongs). This column must be removed before clustering;
● Use the class-label column to compare each sample's class with the partition to which it was allocated. Was there any difference between the results obtained?
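The two observations above (remove the label column before clustering, then compare labels with partitions) could be sketched as follows. This is a sketch under assumptions: it assumes leukemia.txt is whitespace-separated with the class label as the last field of each row, and the label strings, demo values, and function names are hypothetical. The cluster assignments in the demo are hand-written placeholders standing in for the output of whatever clustering method is used.

```python
from collections import Counter


def split_features_and_labels(lines):
    """Split each whitespace-separated row into numeric features and a trailing class label."""
    features, classes = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        features.append([float(f) for f in fields[:-1]])
        classes.append(fields[-1])  # label kept as a string, whatever its encoding
    return features, classes


def contingency(classes, clusters):
    """Count how many samples of each class landed in each cluster."""
    return Counter(zip(classes, clusters))


def purity(classes, clusters):
    """Fraction of samples that belong to the majority class of their cluster."""
    best = {}
    for (_cls, clu), n in contingency(classes, clusters).items():
        best[clu] = max(best.get(clu, 0), n)
    return sum(best.values()) / len(classes)


# Tiny in-memory stand-in for leukemia.txt (invented values, not real data).
demo_lines = [
    "1.0 2.0 ALL",
    "1.1 2.1 ALL",
    "9.0 8.0 AML",
    "9.2 8.1 AML",
]
features, classes = split_features_and_labels(demo_lines)

# Placeholder cluster assignments; in the exercise these would come from
# clustering `features` (never the label column) into 2 or 5 partitions.
clusters = [0, 0, 1, 0]

table = contingency(classes, clusters)
for (cls, clu), n in sorted(table.items()):
    print(f"class={cls} cluster={clu} n={n}")
print("purity:", purity(classes, clusters))
```

A table of class against cluster shows directly which classes were split or merged. With 5 partitions and fewer classes, several clusters will necessarily share a class, so the table is usually more informative than a single score.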