Microarray Data Analysis

Document Type

Abstract

Publication Date

Spring 5-1-2019

Abstract

The human genome, the complete set of genetic information encoded in human DNA, contains approximately 21,000 protein-coding genes. At any moment, each cell in the human body has some combination of these genes turned on or off, depending on the cell's structure and function. By determining which genes are turned on or off in particular cells or biological samples, scientists can draw conclusions about those samples and about the importance of particular genes. Microarrays are one experimental method for measuring gene expression levels, that is, the degree to which each gene is turned on or off, and they can measure thousands of genes in a sample simultaneously. A microarray is a solid glass chip to which a collection of unique, microscopic DNA spots is attached. DNA from a sample of interest is labeled with fluorescent markers and then allowed to hybridize, or bind, to complementary DNA on the chip. A laser scanner measures the fluorescence at each DNA spot, and these readings are converted to numbers that represent the intensity of the fluorescent markers. The resulting data form a table in which each row represents a sample and each of the thousands of columns represents one measured gene. Data mining techniques are often applied to microarray datasets for tasks such as differential gene expression analysis, cluster analysis, and classification. In a well-known early study, these techniques were applied to a dataset of human leukemia samples to distinguish acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL) (Golub et al.). This project seeks to replicate the Golub study by using the same leukemia dataset to classify tumors as AML or ALL.
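Golub et al. ranked genes with a signal-to-noise statistic, P(g) = (mu_ALL - mu_AML) / (sigma_ALL + sigma_AML), and classified new samples by a weighted vote over the top-ranked genes. The Python sketch below illustrates that general approach on a samples-by-genes table. It is not this project's code. The gene names, the toy values, the `n_genes` parameter, and the use of the population standard deviation (`statistics.pstdev`) are all assumptions made for illustration. It also simplifies gene selection by taking the genes with the largest |P(g)| rather than an equal number correlated with each class.

```python
from statistics import mean, pstdev

# Hypothetical toy table in the layout the abstract describes: one row per
# sample, one column per gene. The numbers are invented for illustration and
# are NOT taken from the Golub et al. leukemia dataset.
GENES = ["gene_A", "gene_B", "gene_C"]
SAMPLES = [
    ("ALL", [9.0, 2.0, 5.0]),
    ("ALL", [8.0, 3.0, 5.5]),
    ("ALL", [8.5, 2.5, 4.5]),
    ("AML", [3.0, 7.0, 5.0]),
    ("AML", [2.0, 8.0, 5.5]),
    ("AML", [2.5, 7.5, 4.5]),
]


def signal_to_noise(all_vals, aml_vals):
    """P(g) = (mu_ALL - mu_AML) / (sigma_ALL + sigma_AML); positive means higher in ALL."""
    denom = pstdev(all_vals) + pstdev(aml_vals)
    if denom == 0:
        raise ValueError("gene has zero variance in both classes")
    return (mean(all_vals) - mean(aml_vals)) / denom


def train(samples, n_genes):
    """Score every gene and keep the n_genes with the largest |P(g)|.

    Returns (column index, weight a_g, boundary b_g) triples. Ranking by |P(g)|
    is a simplification of Golub et al.'s equal split between the two classes.
    """
    params = []
    for j in range(len(samples[0][1])):
        all_vals = [x[j] for label, x in samples if label == "ALL"]
        aml_vals = [x[j] for label, x in samples if label == "AML"]
        a = signal_to_noise(all_vals, aml_vals)
        b = (mean(all_vals) + mean(aml_vals)) / 2  # midpoint between class means
        params.append((j, a, b))
    params.sort(key=lambda p: abs(p[1]), reverse=True)
    return params[:n_genes]


def predict(model, x):
    """Weighted vote v_g = a_g * (x_g - b_g); positive votes count for ALL."""
    v_all = v_aml = 0.0
    for j, a, b in model:
        v = a * (x[j] - b)
        if v > 0:
            v_all += v
        else:
            v_aml -= v  # store AML votes as positive magnitudes
    winner = "ALL" if v_all > v_aml else "AML"
    v_win, v_lose = max(v_all, v_aml), min(v_all, v_aml)
    total = v_win + v_lose
    # Prediction strength: 1.0 means every selected gene voted for the winner.
    strength = (v_win - v_lose) / total if total else 0.0
    return winner, strength


if __name__ == "__main__":
    model = train(SAMPLES, n_genes=2)
    print([GENES[j] for j, _, _ in model])   # → ['gene_A', 'gene_B']
    print(predict(model, [8.8, 2.2, 5.0]))   # → ('ALL', 1.0)
    print(predict(model, [2.2, 7.8, 5.0]))   # → ('AML', 1.0)
```

In this toy table `gene_C` has the same mean in both classes, so its score is zero and it is never selected. A real replication would also need preprocessing of the raw intensities and held-out samples to estimate accuracy, which this sketch omits.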

Comments

Completed as part of the Computer Science Senior Capstone Project.

This document is currently not available here.
