Title: A method of discovering influential variables
Speaker: Prof. Shaw-Hwa Lo
Time: April 26 (Thursday), 10:00-11:00 am
A trend in all scientific disciplines, driven by advances in technology, is the increasing availability of high-dimensional data in which important information is buried. A current and urgent challenge for statisticians is to develop effective methods for extracting useful information from the vast amounts of messy, noisy data available, most of which are noninformative. We present a general, computer-intensive approach, based on a method proposed by Lo and Zheng, for detecting which of many potential explanatory variables influence a dependent variable Y. The approach is suited to detecting influential variables whose causal effects depend on the confluence of their values with those of other variables. It avoids a difficult direct analysis involving possibly thousands of variables by working instead with many randomly selected small subsets. We review the special case of family-trio data under several disease models, followed by a demonstration of the method applied to IBD data. The results suggest the potential of this approach for extracting substantial joint information in high-dimensional problems. The main objective is to discover the influential variables rather than to measure their effects. Once they are detected, the much smaller group of influential variables should be amenable to appropriate analysis. In a sense, we are confining our attention to locating a few needles in a haystack.
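The abstract does not spell out the algorithm. What follows is a minimal Python sketch, under our own assumptions, of the general idea it describes: score many randomly drawn small subsets of variables and keep the subsets whose joint values best separate Y, even when no single variable does. The partition-based score, the greedy backward-dropping step, the function names `influence_score`, `backward_drop` and `screen`, and the parameters `k` and `draws` are all hypothetical illustrations. They are not the Lo and Zheng method as presented in the talk.

```python
import random
from collections import defaultdict


def influence_score(X, y, cols):
    """Hypothetical partition-based score (an illustrative choice).

    Rows are grouped into cells by their joint values on `cols`; the score is
    sum_j n_j^2 * (mean_j - mean)^2 / n. It is large when the joint values of
    `cols` separate the response well, even if no single column does.
    """
    n = len(y)
    ybar = sum(y) / n
    cells = defaultdict(list)
    for row, yi in zip(X, y):
        cells[tuple(row[c] for c in cols)].append(yi)
    total = 0.0
    for vals in cells.values():
        nj = len(vals)
        total += nj * nj * (sum(vals) / nj - ybar) ** 2
    return total / n


def backward_drop(X, y, cols):
    """Greedily drop the column whose removal leaves the highest score;
    return the best-scoring subset seen along the way and its score."""
    current = list(cols)
    best_set, best_score = list(current), influence_score(X, y, current)
    while len(current) > 1:
        score, drop = max(
            (influence_score(X, y, [c for c in current if c != d]), d) for d in current
        )
        current.remove(drop)
        if score > best_score:
            best_set, best_score = list(current), score
    return tuple(sorted(best_set)), best_score


def screen(X, y, k=3, draws=200, seed=0):
    """Run backward dropping from many random k-subsets of the p columns and
    return (score, subset) pairs, best first, with duplicate subsets merged."""
    rng = random.Random(seed)
    p = len(X[0])
    results = {}
    for _ in range(draws):
        subset, score = backward_drop(X, y, rng.sample(range(p), k))
        results[subset] = score
    return sorted(((s, sub) for sub, s in results.items()), reverse=True)


# Hypothetical toy data: 10 binary variables; y depends only on the
# interaction x0 XOR x1, so neither x0 nor x1 has a marginal effect.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(10)] for _ in range(400)]
y = [row[0] ^ row[1] for row in X]
top = screen(X, y)
print(top[:3])
```

On this toy data the pair (0, 1) is expected to rank first by a wide margin, because only that pair's joint values determine y. Each draw inspects just three variables, which mirrors the abstract's point that many small random subsets can stand in for a direct analysis of all variables at once.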