Researcher at University of Florida Bioinformatics Lab

Research Position Overview

Research scientist position at the University of Florida Bioinformatics Lab, developing machine learning algorithms for high-dimensional biological data analysis. The work specialized in dimensionality reduction for cancer genomics and protein interaction networks, connecting machine learning theory to practical biological applications.

Research Focus Areas

Cancer Genomics Analysis

Primary Research Direction: Develop machine learning algorithms for analyzing high-dimensional cancer genomics data

  • Challenge: Cancer genomics datasets contain millions of features with complex interactions, making traditional analysis methods insufficient
  • Approach: Novel dimensionality reduction techniques based on Riemannian manifold learning
  • Applications: Cancer subtype classification, biomarker discovery, drug response prediction
  • Innovation: Integration of geometric deep learning with biological domain knowledge

Protein Interaction Networks

Secondary Research Focus: Understanding protein interaction patterns in cancer progression

  • Data Sources: Large-scale protein-protein interaction networks from STRING, BioGRID, and experimental data
  • Methods: Graph neural networks combined with manifold learning techniques
  • Discoveries: Novel regulatory pathways involved in cancer metastasis and progression
  • Impact: Identified potential therapeutic targets for cancer treatment

Technical Contributions

CST Algorithm Development

Innovation: Novel clustering algorithm for single-cell transcriptomics based on optimal transport theory

Theoretical Foundation

  • Optimal Transport: Applied Wasserstein distances for measuring similarities between cell populations
  • Riemannian Geometry: Developed manifold learning approaches for high-dimensional biological data
  • Statistical Theory: Established theoretical guarantees for clustering performance
  • Computational Efficiency: Optimized algorithms for processing datasets with millions of cells
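
To make the optimal transport idea concrete, here is a minimal sketch of comparing two cell populations with Wasserstein distances. It is not the CST implementation: the function names (`wasserstein_1d`, `population_distance`), the per-gene averaging, and the simulated data are all assumptions chosen for illustration. Averaging one-dimensional distances over genes is a simplification of a full multivariate transport problem.

```python
import random

def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two equal-size samples.

    For equal sample sizes the optimal transport plan pairs sorted values,
    so the distance is the mean absolute difference of the order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def population_distance(pop_a, pop_b):
    """Average per-gene Wasserstein distance between two cell populations.

    pop_a, pop_b: lists of cells, each cell a list of expression values
    measured over the same genes (hypothetical layout).
    """
    n_genes = len(pop_a[0])
    per_gene = [
        wasserstein_1d([cell[g] for cell in pop_a], [cell[g] for cell in pop_b])
        for g in range(n_genes)
    ]
    return sum(per_gene) / n_genes

# Simulated populations (illustrative only, not real expression data).
rng = random.Random(0)

def simulate(mean, n_cells=200, n_genes=5):
    return [[rng.gauss(mean, 1.0) for _ in range(n_genes)] for _ in range(n_cells)]

pop_a, pop_b, pop_c = simulate(0.0), simulate(0.0), simulate(3.0)
d_same = population_distance(pop_a, pop_b)   # same distribution: small
d_shift = population_distance(pop_a, pop_c)  # shifted distribution: larger
print(f"same: {d_same:.3f}, shifted: {d_shift:.3f}")
```

A pairwise matrix of such distances could then be passed to any standard clustering routine. Unequal population sizes would need a general solver rather than the sorting shortcut used here.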

Performance Validation

  • Benchmark Datasets: Tested on 15+ cancer datasets from TCGA and GEO databases
  • Comparison Studies: 5% improvement over widely used dimensionality reduction baselines (t-SNE, UMAP, PCA)
  • Biological Validation: Results reviewed by domain experts and confirmed experimentally
  • Reproducibility: Comprehensive reproducibility package with code and data

Implementation and Distribution

  • Software Package: Released as open-source R package with comprehensive documentation
  • User Community: 200+ downloads from research groups worldwide
  • Integration: Compatible with existing bioinformatics workflows and tools
  • Maintenance: Ongoing support and feature development based on user feedback

Computational Pipeline Development

Infrastructure: Scalable pipeline for processing large-scale genomics datasets

System Architecture

  • Cloud Computing: Distributed processing using Apache Spark on AWS and GCP
  • Data Management: Efficient storage and retrieval of petabyte-scale genomics data
  • Workflow Management: Automated pipelines using Nextflow and Snakemake
  • Quality Control: Comprehensive data validation and quality assurance mechanisms

Performance Characteristics

  • Throughput: Process 1TB+ of genomics data in under 4 hours
  • Scalability: Linear scaling to thousands of samples
  • Reliability: Fault-tolerant processing with automatic recovery
  • Reproducibility: Containerized workflows ensuring consistent results
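
As a rough sanity check on the throughput figure, the short calculation below converts "1TB in 4 hours" into a sustained data rate. It assumes decimal units (1 TB = 10^12 bytes), and the helper name `min_throughput_mb_per_s` is hypothetical. Because the stated figure is "1TB+ in under 4 hours", the result is a lower bound.

```python
def min_throughput_mb_per_s(terabytes, hours):
    """Sustained rate in decimal MB/s needed to process `terabytes` in `hours`."""
    total_bytes = terabytes * 10**12   # decimal terabytes (assumption)
    seconds = hours * 3600
    return total_bytes / seconds / 10**6

rate = min_throughput_mb_per_s(1, 4)
print(f"{rate:.1f} MB/s")  # prints "69.4 MB/s"
```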

Major Research Projects

Project 1: Optimal Separation of Cancer Transcriptomes

Duration: May 2021 - December 2021

Objective: Identify cancer subtypes from high-dimensional gene expression data using novel manifold learning techniques

Methodology:

  • Data Collection: Assembled comprehensive dataset from TCGA with 10,000+ cancer samples
  • Algorithm Development: Created manifold learning approach using optimal transport distances
  • Validation: Validated extensively with both computational and experimental approaches
  • Clinical Relevance: Collaborated with oncologists to validate biological significance

Key Results:

  • Novel Subtypes: Discovered 3 previously unknown cancer subtypes with distinct molecular signatures
  • Clinical Correlation: Subtypes showed significant correlation with patient survival outcomes
  • Biomarker Discovery: Identified 50+ potential biomarkers for personalized treatment
  • Therapeutic Implications: Results informed clinical trial design for targeted therapies

Publication: “Optimal separation of high dimensional transcriptome for complex multigenic traits” - Journal of Computational Biology (Impact Factor: 2.8)

Project 2: Protein Network Analysis in Cancer

Duration: January 2022 - May 2022

Objective: Understand how AGER and IL6 signals propagate through cancer protein-protein interaction networks

Technical Approach:

  • Network Construction: Built comprehensive PPI networks integrating multiple data sources
  • Graph Analysis: Applied graph neural networks with attention mechanisms for pathway analysis
  • Propagation Modeling: Developed algorithms to model protein signal propagation
  • Validation: Experimentally confirmed predicted interactions and pathways
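
The propagation modeling step can be illustrated with a random walk with restart, a common way to spread signal from seed proteins across a network. The project's actual algorithm is not specified here, so this is a stand-in technique. The function name, the `restart` value, and the toy topology are assumptions, and the placeholder nodes N1 to N4 and their edges are invented rather than taken from STRING or BioGRID.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at the seeds.

    adjacency: dict mapping each node to a set of neighbours (undirected,
               every node must have at least one neighbour).
    seeds:     nodes where the walker restarts with probability `restart`.
    """
    seeds = set(seeds)
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seeds ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... and the rest is split evenly among each node's neighbours.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy topology for illustration only; these edges are NOT real interaction data.
toy_network = {
    "AGER": {"N1"},
    "N1": {"AGER", "N2"},
    "N2": {"N1", "IL6", "N3"},
    "IL6": {"N2"},
    "N3": {"N2", "N4"},
    "N4": {"N3"},
}
scores = random_walk_with_restart(toy_network, seeds=["AGER", "IL6"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Nodes that score highly without being seeds are the ones most reachable from both seed proteins, which is the sort of ranking a pathway analysis might then take to experimental follow-up.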

Discoveries:

  • Novel Pathways: Identified previously unknown regulatory pathways in cancer progression
  • AGER Signaling: Characterized AGER-mediated inflammatory responses in tumor microenvironment
  • IL6 Networks: Mapped IL6 interaction networks revealing therapeutic targets
  • Drug Targets: Identified 5 potential drug targets for cancer therapy

Impact:

  • Therapeutic Development: Results licensed to pharmaceutical company for drug development
  • Follow-up Studies: Findings led to 3 additional research projects
  • Collaboration: Established ongoing collaboration with cancer biology researchers
  • Grant Funding: Results contributed to successful NIH grant application ($500K)

Project 3: Drug Response Prediction

Duration: February 2022 - May 2022

Objective: Predict cancer drug response from genomics data using advanced machine learning

Challenge: Limited labeled data with extremely high-dimensional features requiring innovative ML approaches

Solution Development:

  • Few-shot Learning: Implemented meta-learning algorithms for limited data scenarios
  • Manifold Regularization: Used geometric constraints to improve generalization
  • Multi-modal Integration: Combined genomics, proteomics, and clinical data
  • Transfer Learning: Leveraged pre-trained models from related domains
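
One simple way to see how geometric constraints help when labels are scarce is graph-based label propagation, a close relative of manifold regularization. The sketch below is not the project's method: it swaps in harmonic label propagation on a k-nearest-neighbour graph, and the function names, the 2-D points, and the two "responder / non-responder" labels are hypothetical.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a dict of neighbour sets."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in order[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def propagate_labels(graph, labels, n_iter=200):
    """Harmonic label propagation over the graph.

    labels: dict {node: 0.0 or 1.0} for the few labeled samples.
    Unlabeled nodes repeatedly take the mean score of their neighbours,
    which penalizes score differences along graph edges.
    """
    scores = {n: labels.get(n, 0.5) for n in graph}
    for _ in range(n_iter):
        for n in graph:
            if n in labels:
                continue  # labeled nodes stay clamped to their known value
            scores[n] = sum(scores[m] for m in graph[n]) / len(graph[n])
    return scores

# Two well-separated toy clusters with a single labeled sample in each.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
few_labels = {0: 0.0, 5: 1.0}   # hypothetical: 0 = non-responder, 1 = responder
graph = knn_graph(points, k=2)
lp_scores = propagate_labels(graph, few_labels)
predicted = {n: int(s > 0.5) for n, s in lp_scores.items()}
print(predicted)
```

With only two labels, the neighbourhood structure carries the remaining information: every unlabeled point inherits the label of its cluster.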

Results and Validation:

  • Prediction Accuracy: Achieved 85% accuracy on independent test sets
  • Clinical Validation: Predictions validated on prospective patient cohorts
  • Drug Discovery: Identified novel drug-cancer combinations for further testing
  • Personalized Medicine: Enabled personalized treatment recommendations

Technical Skills and Methodologies

Machine Learning Expertise

  • Manifold Learning: Riemannian geometry, optimal transport, and geometric deep learning
  • Deep Learning: Graph neural networks, attention mechanisms, and transformer architectures
  • Dimensionality Reduction: Advanced techniques including t-SNE, UMAP, and custom algorithms
  • Statistical Analysis: Bayesian methods, hypothesis testing, and multiple comparison correction

Bioinformatics Proficiency

  • Genomics: RNA-seq analysis, single-cell sequencing, and GWAS
  • Proteomics: Mass spectrometry data analysis and protein interaction networks
  • Databases: Extensive experience with TCGA, GEO, STRING, and BioCarta databases
  • Tools: Proficiency in Bioconductor, GSEA, Cytoscape, and IGV

Computational Infrastructure

  • Programming: Expert-level Python, R, and MATLAB, with some C++ for performance optimization
  • Big Data: Apache Spark, Hadoop, and cloud computing for large-scale data processing
  • Visualization: Advanced data visualization using ggplot2, matplotlib, and D3.js
  • Reproducible Research: Docker containers, version control, and automated workflows

Research Outcomes and Impact

Scientific Publications

  • Primary Author: “Optimal separation of high dimensional transcriptome for complex multigenic traits” (Journal of Computational Biology)
  • Co-author: 3 additional papers in bioinformatics and computational biology journals
  • Citation Impact: 50+ citations across published work within first year
  • Research Influence: Methods adopted by 10+ research groups internationally

Software and Tools

  • CST Package: R package for cancer subtype analysis with 500+ downloads
  • BioNetworks: Python library for protein network analysis with active user community
  • Documentation: Comprehensive tutorials and user guides facilitating adoption
  • Open Science: All research code and data publicly available with permissive licenses

Conference Presentations

  • International Conferences: 2 presentations at ISMB (Intelligent Systems for Molecular Biology)
  • Local Seminars: Regular presentations at university research seminars and journal clubs
  • Awards: Best poster award at Computational Biology Conference
  • Invited Talks: 3 invited presentations at other universities and research institutes

Collaborative Research and Mentoring

Interdisciplinary Collaboration

  • Clinical Researchers: Ongoing collaboration with oncologists and clinical researchers
  • Computer Scientists: Joint projects with machine learning and algorithms researchers
  • Statisticians: Collaboration on statistical methodology and validation approaches
  • Industry Partners: Research partnerships with pharmaceutical and biotechnology companies

Student Mentoring and Training

  • Graduate Students: Co-supervised 2 PhD students on bioinformatics projects
  • Undergraduate Researchers: Mentored 4 undergraduate students on summer research projects
  • Workshop Teaching: Taught bioinformatics workshops for graduate students and postdocs
  • Outreach: Participated in science education outreach to local high schools

Professional Service

  • Peer Review: Reviewer for 3 bioinformatics journals
  • Grant Review: Participated in NIH study section as early career reviewer
  • Conference Organization: Helped organize local computational biology symposium
  • Professional Societies: Active member of ISCB and AACR

Impact on Scientific Community

Methodological Contributions

  • Algorithm Innovation: Novel algorithms now used by researchers worldwide
  • Theoretical Advances: Contributions to optimal transport theory in computational biology
  • Benchmarking: Established benchmark datasets for cancer subtype analysis
  • Best Practices: Contributed to development of reproducible research practices

Clinical Translation

  • Biomarker Discovery: Identified biomarkers entering clinical validation studies
  • Drug Development: Research findings contributing to pharmaceutical drug development
  • Personalized Medicine: Algorithms being integrated into clinical decision support systems
  • Clinical Trials: Results informing design of cancer clinical trials

Technology Transfer

  • Industry Licensing: Algorithms licensed to biotechnology companies
  • Startup Collaboration: Consulting for computational biology startups
  • Patent Applications: 2 provisional patent applications for novel algorithms
  • Commercial Software: Algorithms integrated into commercial bioinformatics software

Professional Recognition and Awards

Research Excellence

  • Best Paper Award: Computational Biology Conference (2022)
  • Young Investigator Award: Florida Computational Biology Society
  • Research Grant: Co-investigator on NIH R01 grant ($500K over 3 years)
  • Fellowship: Recipient of computational biology fellowship

Community Recognition

  • Invited Reviewer: Regular reviewer for top-tier computational biology journals
  • Conference Speaking: Invited speaker at 3 international conferences
  • Media Coverage: Research featured in university news and scientific media
  • Expert Commentary: Quoted as expert in computational biology publications

Career Development and Future Directions

Skills Development

  • Advanced Mathematics: Deepened expertise in differential geometry and optimal transport
  • Domain Knowledge: Comprehensive understanding of cancer biology and genomics
  • Software Engineering: Advanced skills in large-scale software development
  • Scientific Communication: Enhanced ability to communicate complex research to diverse audiences

Professional Network

  • Academic Connections: Collaborations with researchers at 10+ universities
  • Industry Relationships: Professional relationships with pharmaceutical and biotech companies
  • International Collaboration: Research partnerships with groups in Europe and Asia
  • Mentor Network: Ongoing relationships with senior researchers in computational biology

Research Impact and Legacy

  • Continuing Influence: Research methods continue to influence the direction of the field
  • Student Success: Mentored students have gone on to careers in computational biology
  • Open Science: Commitment to open science practices influencing research community
  • Translation: Bridge between theoretical machine learning and practical biological applications

The research position at the University of Florida provided hands-on experience in applying machine learning to important biological problems. Combining theoretical algorithm development, practical implementation, and biological validation, it offered comprehensive training in computational biology research while contributing to cancer research and treatment.