Open the following files:
1. Lab2_handout_52525.pdf
2. Lab2.html
3. Shiny app for K-means & PCA on RNA sequencing.
Lab 2 has 3 parts.
Part 1 Simulation study:
Explore the behavior of running K-means with or without first extracting the top 3 PCs, using a simulated dataset.
We would like to see whether clustering can accurately recover the original
distribution that each point was drawn from.
The goal is to compare the accuracy and the speed of the two approaches
(K-means on the raw data and K-means on the top 3 PCs) for different dimensions.
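The comparison in Part 1 can be sketched roughly as below. This is only an illustrative Python sketch, not the lab's own code: the three-Gaussian simulation, the `spread` value, the farthest-first initialization of K-means, and all function names (`simulate`, `top_pcs`, `kmeans`, `accuracy`, `compare`) are assumptions made for the example. It relies on NumPy.

```python
import itertools
import time

import numpy as np


def simulate(n_per_cluster, dim, rng, spread=10.0):
    """Draw points from 3 unit-variance Gaussians in `dim` >= 3 dimensions."""
    means = np.zeros((3, dim))
    means[np.arange(3), np.arange(3)] = spread  # mean j is shifted along axis j
    labels = np.repeat(np.arange(3), n_per_cluster)
    X = means[labels] + rng.standard_normal((labels.size, dim))
    return X, labels


def top_pcs(X, n_components=3):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k=3, n_iter=100):
    """Plain Lloyd iterations; farthest-first init is a simplifying assumption."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def accuracy(assign, labels, k=3):
    """Best agreement with the true labels over all relabelings of the clusters."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[assign]
        best = max(best, float((mapped == labels).mean()))
    return best


def compare(dim, n_per_cluster=100, seed=0):
    """Return {name: (accuracy, seconds)} for raw data and for the top 3 PCs."""
    rng = np.random.default_rng(seed)
    X, labels = simulate(n_per_cluster, dim, rng)
    results = {}
    for name, prep in [("raw", lambda A: A), ("top3pcs", top_pcs)]:
        start = time.perf_counter()
        assign = kmeans(prep(X))  # timing includes the PCA step, if any
        elapsed = time.perf_counter() - start
        results[name] = (accuracy(assign, labels), elapsed)
    return results


if __name__ == "__main__":
    for dim in (10, 50, 100):
        print(dim, compare(dim))
```

Accuracy is measured up to a relabeling of the clusters, since K-means has no notion of which cluster is "first". The timing for the PCA variant deliberately includes the projection step, so the two numbers compare the full cost of each approach.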
Part 2 Comparing demographic and election data:
In this part, we will explore how socio-economic similarity between cities relates to similarity in voting
patterns, based on hierarchical clustering and dendrograms.
Finally, we create and compare two dendrograms: a hierarchical tree for the elections data and a hierarchical tree for the demographic data.
We will use Baker's Gamma as the similarity score between the two trees.
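To make the tree comparison concrete, here is a minimal standard-library Python sketch. It assumes one common formulation of Baker's Gamma: for every pair of items, record the largest number of clusters k at which the pair still shares a cluster, then take the Spearman rank correlation of these values between the two trees. Whether the lab uses this exact variant, the choice of average linkage, the four toy "cities", and all function names are assumptions made for illustration.

```python
from itertools import combinations


def merge_levels(dist):
    """Average-linkage clustering on a full distance matrix (list of lists).

    Returns {(i, j): k}, where k is the largest number of clusters at which
    items i and j are still in the same cluster.
    """
    clusters = [[i] for i in range(len(dist))]
    level = {}
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        k_after = len(clusters) - 1  # number of clusters once a and b merge
        for i in clusters[a]:
            for j in clusters[b]:
                level[(min(i, j), max(i, j))] = k_after
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a
    return level


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            result[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def bakers_gamma(dist1, dist2):
    """Spearman correlation of the per-pair merge levels of the two trees."""
    l1, l2 = merge_levels(dist1), merge_levels(dist2)
    pairs = sorted(l1)
    return pearson(ranks([l1[p] for p in pairs]), ranks([l2[p] for p in pairs]))


def distance_matrix(values):
    """Absolute differences between one-dimensional toy features."""
    return [[abs(a - b) for b in values] for a in values]


if __name__ == "__main__":
    demographic = distance_matrix([0, 1, 10, 11])  # hypothetical toy cities
    elections = distance_matrix([0, 10, 1, 11])    # same cities, other grouping
    print(bakers_gamma(demographic, demographic))
    print(bakers_gamma(demographic, elections))
```

A value near 1 means the two trees group the cities in nearly the same order, while values near 0 or below indicate that pairs merged early in one tree tend to be merged late in the other.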
Part 3 Exploratory analysis of RNA-seq data with Shiny apps:
Apply PCA and K-means to the data set 'gtex', which contains gene expression estimates collected by the Genotype-Tissue Expression (GTEx) project (gtexportal.org).
Each row represents a gene, and each column a tissue type (e.g. Heart, Exposed Skin, Unexposed Skin).
Each value measures the median expression level of the gene across multiple samples of the same tissue.
Values are non-negative, with zero meaning there is no indication of the gene in the tissue.