Vita of Susmita Datta
(last updated July 2008)
Address:
Department of Bioinformatics
& Biostatistics
School of Public Health and
Information Sciences
University of Louisville
Louisville, KY 40292
(502) 852 0081 (phone)
(502) 852 3294 (fax)
E-mail: susmita.datta@louisville.edu
Education:
- Ph.D. Statistics, 1995, University of
Georgia, Athens, USA.
Dissertation Title: Dynamics of Cytonuclear
Disequilibria and Related Statistical Tests for The Neutrality of Mitochondrial
DNA markers for Hybrid
Zone Data (under the direction of Prof. Jonathan Arnold, Department
of Genetics, University of Georgia, Athens)
- M.S. Statistics, University of Georgia,
Athens, USA.
- B.S. Physics major, University of Calcutta,
India.
Positions Held:
- 2005 - present, Associate Professor (tenured),
Department of Bioinformatics & Biostatistics, University of
Louisville, Louisville.
- 2002 - 2005, Associate Professor (tenured),
Department of Mathematics and Statistics and Department of Biology,
Georgia State University, Atlanta.
- 1997 - 2002, Assistant Professor, Department of
Mathematics and Statistics, Georgia State University, Atlanta.
- 1995 - 1997, NRSA Post Doctoral Fellow,
Department of Biostatistics, Emory University, Atlanta.
- Fall 2000-Summer 2001: Visiting Assistant
Professor, Department of Genetics, University of Georgia, Athens.
Research
Interests:
Bioinformatics, Proteomics, Infectious Disease
Modeling, Statistical Genetics, Statistical Issues in Population Biology,
Survival Analysis.
Professional/Editorial:
Honor:
- Elected member of International Statistical
Institute, 2007 -
Member:
- International Society for Computational Biology.
- American Statistical Association.
- Institute of Mathematical Statistics.
- International Biometric Society (ENAR).
- International Indian Statistical Association.
- American
Association for the Advancement of Science.
Editorial Services:
- Associate Editor, BMC Research Notes, 2008 -
- Associate Editor, Bioinformation,
2007-
- Editorial Board Member, Bioinformation,
2006-
- Special Issue Editor (Gene Expression Analysis), Bioinformation, 2007
- Associate Editor, Statistical
Methodology, 2007-
- Associate Editor, Statistics
& Probability Letters, 2007-
Reviewer:
- National Science Foundation, Biology Program,
May 2008.
- National Institutes of Health, Bio-defense Study
Section, April 2003.
- (Invited) Emtech Bio
Scientific Advisory Board members and Seed Grant Reviewers: Georgia Tech,
Atlanta, October 2002.
- Member, Advisory Panel for MRI Program, National
Science Foundation, 2001-2002.
- National Institutes of Health proposal review.
- Referee: Journal of Proteome Research, Journal of Applied Statistics, Computational
Statistics & Data Analysis, Journal of Multivariate Statistics, Scandinavian Journal of Statistics, Bioinformatics,
BMC Bioinformatics, Biometrics, Biotechnology, Genomics, Mathematical
Biosciences, Communications in Statistics, Pattern
Recognition, Proceedings of National Academy of
Sciences, Statistics in Medicine, Nucleic Acids
Research, International Journal of Data Mining and
Bioinformatics, Journal
of Statistical Planning and Inference etc.
- Mathematical Review.
- Book review for Statistics in Medicine, 2000.
- National Science Foundation proposal review.
Other:
- CAMDA 2008
Conference Scientific Committee, Vienna, Austria, December 2008.
- Invited session organizer at JSM 2008, A New
Paradigm of Statistical Data Analysis: Omics
Data, Denver, August 2008.
- Program Committee member, Frontiers of
Probability and Statistical Science, Connecticut-Storrs, May 2008.
- CAMDA 2007
Conference Scientific Committee, Valencia, Spain, December 2007.
- Program Committee Member, ISMB 2007, Vienna,
Austria, July 2007.
- Chair, Invited session at JSM 2007, Inference
for Multistate Data under Complex Censoring
Structures, Salt Lake City, July - August, 2007.
- Invited session organizer, Statistics in Genomics and Proteomics, International Biometric
Society Conference IBC 2006, Montreal, Canada, July, 2006.
- Program Committee Member, ISMB 2005, Michigan,
July 2005.
- Invited session organizer, Statistics in
Genomics, JSM Toronto, August 2004.
- Invited session organizer, Genetic Data Analysis,
International Conference on Statistics in Health Sciences, Nantes, France,
June 2004.
- Co-organizer, student paper competition for the
IISA conference, Athens, GA, May 2004.
- Organized (and chaired) an invited session
titled "Recent Contributions in Bioinformatics" at JSM San
Francisco, August, 2003.
- Executive Board Member and President of Young
Professional Statisticians, IISA.
- Organized an invited session on Bioinformatics
at SCRA 2002-FIM IX: Ninth International Conference of Forum for
Interdisciplinary Mathematics on Statistics Combinatorics
and Related Areas, Department of Statistics and Department of Mathematics:
University of Allahabad, Allahabad, UP 211 002, India, December 21-23,
2002.
- Organized an invited session titled Survival Skills for Young Statisticians at
the IISA International Conference on Statistics, Probability and Related
Areas, Dekalb, Illinois, June 2002.
- Organized an invited session on Statistics in Bioinformatics at the
International Conference on Statistics, Combinatorics
and Related Areas and the Eighth International Conference of the
Forum for Interdisciplinary Mathematics, Wollongong, Australia, December
2001.
- Chair (invited session), Bioinformatics:
Statistical Perspectives and Controversies, at International Conference on
Statistics, Combinatorics and Related Areas and the Eighth International Conference of the Forum for
Interdisciplinary Mathematics December 2001.
- Invited Session Organizer, ENAR, 2001 Joint
Statistical Meeting, 2001, Atlanta, Georgia.
- Session Chair, Statistical Genetics, ENAR Spring
Meeting, 1999, Atlanta, Georgia.
- Local Organizing Committee, ENAR Spring Meeting,
1999, Atlanta, Georgia.
- Session Chair, Applications of State-Space
Modeling in the Science, Special Contributed Session, Joint
Statistical Meeting, 1999, Baltimore, Maryland.
Publications:
Refereed Publications:
- Datta, S., Fu, Y. X., Arnold, J. (1996). Dynamics and equilibrium behavior
of cytonuclear disequilibria under genetic
drift, mutation, and migration, Theoretical Population Biology,
50, 298-324.
- Datta, S. and Arnold, J. (1996). Diagnostics and a statistical test of
neutrality hypothesis using the dynamics of cytonuclear
disequilibria, Biometrics, 52, 1042-1054.
- Datta, S., Rand, D. M., and Arnold, J. (1996). A statistical test of a
neutral model using the dynamics of cytonuclear
disequilibria, Genetics, 144, 1985-1992.
- Longini, I. M., Datta, S., and Halloran, E. (1996). Measuring vaccine efficacy for
both susceptibility to infection and reduction in infectiousness for prophylactic HIV-1
vaccines, Journal of Acquired Immune Deficiency Syndromes and Human Retrovirology, 13, 440-447.
- Datta, S., Longini, I. M., and Halloran,
E. (1997). Measuring vaccine efficacy for different HIV vaccine trials, Statistics
in Medicine, 17, 185-200.
- Datta, S. and Arnold, J. (1998). Dynamics of cytonuclear
disequilibria in subdivided populations, Journal of Theoretical
Biology, 192, 99-111.
- Scribner, K. T., Datta, S., Arnold, J.,
and Avise, J. C. (1999). Empirical evaluation of
cytonuclear models incorporating genetic drift
and tests for neutrality of mtDNA variants: data
from experimental Gambusia hybrid zones, Genetica, 105, 101-108.
- Datta, S., Halloran, E. M. and Longini,
I. M. (1999). Efficiency of estimating vaccine efficacy for susceptibility
and infectiousness: randomization by individual versus household, Biometrics,
55, 792-798.
- Datta, S. (2000). Some statistical aspects of cytonuclear
disequilibria. In Statistics in Molecular Biology and Genetics,
Ed: Francoise Seillier-Moiseiwitsch, IMS Lecture
Notes-Monograph Series, 33, 21-37.
- Datta, S., Satten, G.
A. and Datta, S. (2000). Nonparametric estimation for the three
stage irreversible illness-death model, Biometrics, 56,
841-847.
- Datta, S. (2000). Some statistical issues involving multi-generation cytonuclear data, In Advances on Methodological and
Applied Aspects of Probability and Statistics, N. Balakrishnan, Ed., Gordon and Breach, 525-546.
- Datta, S., Satten, G.
A. and Datta, S. (2000). Estimation of stage occupation
probabilities in multistage models, In Advances on Theoretical and
Methodological Aspects of Probability and Statistics, N. Balakrishnan, Ed., Gordon and Breach, 493-506.
- Datta, S. (2001). Estimation of selection parameters using multi-generation cytonuclear data, Biometrical Journal,
43, 219-233.
- Datta, S. (2001). Exploring relationships in gene expressions: A partial
least squares approach, Gene
Expression, 9, 257-264.
- Datta, S. (2001). Testing neutrality of mtDNA
using multigeneration cytonuclear
data, Selected Proceedings of the Symposium on Inference for
Stochastic Processes, Eds.: I. V. Basawa,
C. C. Heyde and R. L. Taylor, IMS Lecture
Notes - Monograph Series, 37, 173-184, IMS, Beachwood, OH.
- Datta, S. and Arnold, J. (2002). Some comparisons of clustering and
classification techniques applied to transcriptional profiling data. In Advances in Statistics, Combinatorics
and Related Areas, Eds.: C. Gulati,
Y-X. Lin, S. Mishra, and J. Rayner,
World Scientific, 63-74.
- Datta, S. (2003). Statistical techniques for microarray
data: A partial overview, Communications in Statistics-Theory and
Methods, 32, 263-280.
- Datta, S. and Datta, S. (2003). Comparisons and validation of statistical
clustering techniques for microarray gene
expression data, Bioinformatics, 19, 459-466. Web
Supplement.
- Arnold, J., Schuttler,
H.-B., Logan, D., Griffith, J., Arpinar, B., Datta, S., Kochut,
K. J., Kraemer, E., Miller, J. A., Sheth, A., Aleman-Meza, B., Doss, J., Harris, L. and Nyong, A. (2003). Metabolomics,
In Handbook of Industrial Mycology, Chapter 23.
Marcel Dekker, New York, NY.
- Schut, G. J., Brehm, S., Datta,
S., and Adams, M. W. W. (2003). Whole Genome DNA microarray
of a hyperthermophile and an archaeon:
Pyrococcus furiosus grown on peptides and
carbohydrate, Journal of Bacteriology, 185, 3935-3947.
- Datta, S., Satten, G. A., Benos, D. J., Xia, J.,
Heslin, M., and Datta, S. (2004). An empirical Bayes adjustment to increase the sensitivity of
detecting differentially expressed genes in microarray
experiments, Bioinformatics, 20, 235-242.
- Datta, S. and Datta, S. (2004). An
empirical Bayes adjustment to multiple p-values
for the detection of differentially expressed genes in microarray
experiments. In Bioinformatics 2004, Conferences in Research
and Practice in Information Technology - Second Asia-Pacific
Bioinformatics Conference, 29, Y-P. P. Chen, Ed., 155-159,
Australian Computer Society, Sydney.
- Warrenfeltz, Z., Pavlik, S., Datta,
S., Kraemer, E., Benedict, B., and McDonald, J.
F. (2004). Gene expression
profiling of epithelial ovarian tumors correlated
with malignant potential. Molecular
Cancer, 3:27.
- Datta, S. and
Datta, S. (2005). Empirical Bayes screening (EBS) of many p-values with
applications to microarray studies, Bioinformatics, 21, 1987-1994.
- Weinberg, M. V., Schut, G. J., Brehm, S., Datta,
S., and Adams, M. W. W.
(2005). A hyperthermophilic cold shock response: the archaeon Pyrococcus
furiosus synthesizes novel membrane-bound glycoproteins at a sub-optimal growth temperature. Journal
of Bacteriology, 187, 336-348.
- Datta, S. and de Padilla, L.M. (2006). Feature selection and machine
learning with mass spectrometry data for distinguishing cancer and
non-cancer samples, Statistical Methodology (Special Issue on Bioinformatics), 3, 79-92.
- Datta, S. and Datta, S. (2006). Validation
measures for clustering algorithms incorporating biological information, IEEE
Proceedings of International Multi-Symposiums on Computer and Computational Sciences (IMSCCS|06), (J.
Ni, J. Dongarra, Y. Zheng,
G. Gu, G. Wolfgang and H. Jin, Eds.), 1, 131-135.
- Datta, S. and Datta, S. (2006). Evaluation of clustering algorithms for gene
expression data, BMC Bioinformatics, 7 (Suppl
4): S17.
- Datta, S. and Datta, S. (2006). Methods for evaluating clustering algorithms for
gene expression data using a reference set of functional classes,
BMC Bioinformatics, 7, 397.
- Boratyn, G. M., Datta,
S. and Datta, S. (2006). Biologically supervised hierarchical
clustering algorithms for gene expression data, Proceedings of the 28th IEEE
EMBS Annual International Conference, New York City, USA,
5515-5518.
- Datta, S., Le-Rademacher, J.
and Datta, S. (2007). Predicting patient survival from microarray
data by accelerated failure time modeling using partial least squares and
LASSO, Biometrics, 63,
259-271.
- Datta, S., Datta, S., Parrish, R. S. and Thompson, C. M. (2007). Microarray data analysis. In Computational Methods in Biomedical Research, R. Khatree and D. Naik, eds., Chapman & Hall/CRC Biostatistics Series, Volume 24, 1-43.
- Boratyn, G. M., Datta, S.
and Datta, S. (2007). Incorporation of biological
knowledge into distance for clustering genes. Bioinformation,
1, 396-405.
- Pihur, V., Datta, S.
and Datta, S. (2007). Weighted rank aggregation of cluster
validation measures: A Monte Carlo cross-entropy approach. Bioinformatics,
23, 1607-1615.
- Pihur, V., Datta, S.
and Datta, S. (2008). Finding cancer genes through meta-analysis of microarray experiments: Rank aggregation via the cross
entropy algorithm. Genomics, to appear.
doi:10.1016/j.ygeno.2008.05.003
- Pihur, V., Datta, S. and Datta, S. (2007). Understanding Chronic Fatigue Syndrome
(CFS) from CAMDA data: A systems biology approach. Proceedings
of CAMDA 2007, full paper, online at
http://camda.bioinfo.cipf.es/camda07/agenda/detailed.html.
- Pihur, V., Brock, G., Datta, S. and Datta, S. (2008). Cluster
validation for microarray data: An appraisal. In Multivariate Statistical Methods, (A. SenGupta, ed.), ISI Platinum Jubilee series, Vol 5, World
Scientific Press, to appear.
- Brock, G., Pihur, V., Datta, S. and Datta,
S. (2008). clValid, an R package for cluster validation.
Journal of Statistical Software,
25, 4.
- Pihur, V., Datta,
S. and Datta,
S. (2008). Reconstruction
of genetic association networks from microarray
data: A partial least squares approach. Bioinformatics, 24, 561-568.
- Datta,
S., Turner, D., Singh, R., Ruset, B., Pierce,
W. M., and Knudsen, T. B. (2008).
Fetal alcohol syndrome in mice detected through proteomics screening of
the amniotic fluid. Birth
Defects Research Part A: Clinical and Molecular Teratology, 82, 177-186.
- Datta, S. and Pihur, V. (2008). Feature selection and machine
learning with mass spectrometry data, R. Matthiesen,
ed., In Clinical Proteomics: Methods, Applications and Tools,
Humana Press, to appear.
Other Publications:
- Datta, S. (1999). Hypotheses testing for different
selection models using multi-generation cytonuclear
data, Proceedings of American
Statistical Association, Biometrics Section, 157-161,
Alexandria, USA.
- Datta, S. (2000). Book Review: Statistics in Human
Genetics by Pak Sham. Statistics
in Medicine, 19, 1384-1385.
- Datta, S. (2005). Statistics in Genetics, In Encyclopedia of Statistical Sciences,
Second edition, Wiley, New York.
- Datta, S. (2005). Statistics in Microarray
Analysis, In Encyclopedia of
Statistical Sciences, Second edition, Wiley, New York.
- Datta, S. (2005). Statistics in Vaccine Studies, In Encyclopedia of Statistical Sciences,
Second edition, Wiley, New York.
- Datta,
S. and Datta, S. (2006).
Validation of statistical clustering using biological information,
Proceedings of INTERFACE 2005 (CD-ROM).
Grants:
- PI 17%
effort, National Science Foundation, Statistics Program (DMS), Standard
Grant, Statistical peak detection, adaptive classification and
protein-protein network construction using mass spectra, DMS-0805559, 2008-2011.
- Biostatistics
Group Leader, Bioinformatics, Biostatistics and Computational
Biology Core, Center for Environmental Genomics and Integrative Biology
(K. Ramos, PI, Louisville), 10% effort, NIEHS-NIH, 2007-2011.
- PI 30% effort, Proteomics Based Approach for Early
Detection of Fetal Alcohol Syndrome, P20-RR/DE17702, NIH COBRE (PI, R.
Green), 2006-2007.
- Co-I (M. J.
Kennedy, PI, Louisville) 5% effort, Aminoglycoside
Urinary Proteomics, 2007-2009.
- Biostatistician (J. Klein, PI, Louisville) 10% effort,
Pediatric Clinical Proteomics Center, Department of Energy, 2005-2008.
- PI U of L
subcontract (E. Voit, PI, Georgia Tech.) 10% effort, The Trehalose Cycle as Paradigm, National Science
Foundation, 2005-2008.
- Co-I (P. Epstein, PI, Louisville) 10% effort, NIH
R01, Podocytes and Oxidative stress in diabetic
Kidney, 2006-2007.
- Co-PI
(K. B. Grant, PI) Brains and Behavior Seed Grant, GSU, $25414, 2005-2006.
- Co-PI
(I. Weber, PI) Research Program Enhancement Award, GSU, student support,
$36000, 2004-2007.
- Investigator, Student & Travel support for five years,
$75000, Georgia Cancer Coalition (Michael Eriksen
PI), 2004-2009.
- Statistician (2 months of summer salary), BimCore, Emory University, summer 2004.
- Co-PI (M.
Brinton, PI) Biomedical Computing Center Seed
Grant, GSU, $13467, summer 2004.
- Consultant on a NSF funded project in Structural Biology
(B. C. Wang, PI), University of Georgia, $8087, summer 2003.
- Co-PI
(J. Arnold, PI) Genomics and Computational Biology: A REU Site, National
Science Foundation, Joint Program between UGA, GA State and Clark Atlanta
University, $210,000, 2003-2005.
- Co-PI (G.
Chen, PI) Tech Fee Grant, GSU, $58002, 2003.
- PI
(no co-PI) Statistical Analysis of Microarray
Gene Expression Data, National Science Foundation, $127,671, 2000-2002.
- PI
(no co-PI), Research Experience for Undergraduates in Fungal Genomics and
Computational Biology: GSU VPRSP grant, $18,666, Summer 2001.
- PI
(no co-PI) A Pilot Project for Developing Statistical Tools for
Bioinformatics. GSU faculty initiation grant, $5000, 2000-2001.
- Co-PI
(D. Vidacovic, PI) Instructional Improvement Grant,
GSU Center for Teaching & Learning, $5000, 2000-2001.
- Co-PI
(E. Dubinsky, PI) IPCURT Project Course and
Curriculum Development, National Science Foundation, $100,000,
1998 - 1999.
Honors/Awards/Press:
- Elected Member of International Statistical
Institute, October 2007.
- Nominated for Provost's Award for exemplary
advising, May 2007.
- Appeared in Fox News Atlanta, May 2004.
- Featured Research faculty in College of Arts and
Sciences, Feb., 2003.
- Co-recipient of the CURO Excellence in
Undergraduate Research Mentoring Award from University of Georgia, April
2002.
- Press coverage Atlanta Business Chronicle, April, 2002.
- Press coverage Georgia State University Magazine, Fall, 2002.
- NCI travel award for "Workshops for Junior
Biostatisticians, 2001 ENAR", Charlotte, N. Carolina.
- Phi Kappa Phi honor society, April 2000.
- Outstanding Junior Faculty Award nomination,
Georgia State University, Atlanta, Georgia, April 2000.
- NSF Travel Award for IBC98, Cape Town, South
Africa, December 1998.
- NSF Travel Award for Pathways to the Future
workshop, Dallas, Texas, August 1998.
- Student paper award in SRCOS/ASA summer
conference, Melbourne, Florida, June 1995.
- Best Theoretical Student Award, Department of
Statistics, University of Georgia, Athens, Georgia, 1994.
Presentations:
Invited Talks at Professional/Research
Meetings:
- UT-ORNL-KBRIN Bioinformatics Summit 2008, “Determination of optimal clustering algorithm
by weighted rank aggregation: Cross entropy algorithm”, March 28, 2008, Cadiz, KY.
- International Conference on Statistics,
Probability and Related Areas by IISA, January 2-5, 2007, Cochin, India.
- International Conference on Multivariate
Statistical Methods, Dec 28-29, 2006, Kolkata, India.
- International Multi-Symposiums on Computer and
Computational Sciences (IMSCCS|06), “Combining functional information in
validation of statistical clustering”, June 20-24, 2006, Zhejiang University, Hangzhou,
China.
- UT-ORNL-KBRIN Bioinformatics Summit 2006,
“Clustering Microarray Data”, April 21-23, 2006, Cadiz, Kentucky.
- SCMA 2005 / FIM XII, International Conference on
Statistics, Combinatorics, Mathematics and
Applications: 12th Annual Conference of the Forum for Interdisciplinary
Mathematics, “Feature Selection in Mass Spectrometry Data for Cancer
Classification”, December 2-4, 2005, Auburn University, AL, USA.
- Joint Annual Meeting of the Interface and the
Classification Society of North America, “Selecting an appropriate
clustering algorithm for analyzing microarray
data”, June 8, 2005 - June 12, 2005, Washington University School of
Medicine, St. Louis, Missouri.
- International
Conference on Future of Statistical Theory, Practice and Education,
December 29, 2004 - January 1, 2005, Hyderabad, India.
- Eleventh International Conference on Interdisciplinary
Mathematical and Statistical Techniques, SCRA 2004, December 27-29, 2004, Lucknow, India.
- Joint Statistical Meeting, “Parametric and
Nonparametric Empirical Bayes Adjustments to
Multiple P-values for the Detection of Differentially Expressed Genes in Microarray Experiments”, August 7-12, 2004, Toronto,
Canada.
- Microarray Data Analysis Conference arranged by Infocast Inc., “Empirical Bayes Screening of Many P-values with Applications to Microarray Studies”, June 28-29, 2004, Rockville,
MD, USA.
- International Conference on Statistics in Health
Sciences, “Empirical Bayes Screening of Many
P-values with Applications to Microarray
Studies”, June 23-25, 2004, Nantes, France.
- IISA 2004 Meeting, “Empirical Bayes Analyses of Multiple p-values For the Detection
of Differentially Expressed Genes in Microarray
Experiments”, May 7-9, 2004, Athens, Georgia, USA.