Data Scientist - Meta
New York, NY · Jan 2022 - Present
Established and owned the data science team's infra cost translation initiative to produce high-fidelity hardware translation factors, and presented the results at the director’s review session. Developed an end-to-end ROI framework for ML pipelines across their lifecycle states, in alignment with engineering teams, to normalize previously inflated heuristic ROI estimates. Built an estimator that predicts the storage cost of new ML pipelines from their configuration files; it was integrated into the infra team’s automated admission control system. Formulated an ROI-based admission plan to onboard new ML pipelines based on storage demand and supply across multiple pillars (Facebook, Instagram, Ads Ranking).
Data Science Intern - Capital One
McLean, VA · Jun 2021 - Aug 2021
As a summer intern in the Credit Risk organization, I built a review toolkit for credit risk GBM models, focused on feature importance drift, to determine necessary credit policy adjustments for new customer acquisitions. These models predict the risk of client default from application data and FICO report attributes. During development, I used the toolkit to evaluate GBM models trained on 22M+ card applications on AWS EC2 clusters. This uncovered drift in the importance of certain input features that warranted adjustments to existing credit policy, and I presented these insights to the Senior VP of Credit Review.
Data Scientist - Stealth Startup
Part-Time & Remote · Aug 2020 - Feb 2021
As an early employee on the backend team, I used Flask and AWS Elastic Beanstalk to deploy our deep learning model as a REST API that interfaces with the front-end mobile application and handles NBA game prediction queries. Our startup was accepted into Velocity Accelerator’s 2021 cohort.
Research Analyst - Columbia University Medical Center
New York, NY · Sep 2018 - Jul 2020
Under Dr. Chin Hur’s direction, I used Python to (a) train residual networks in PyTorch and assess their performance with scikit-learn to identify esophageal precancerous tissue in volumetric laser endomicroscopy images (with technical supervision from Dr. Nicholas Tatonetti); (b) develop a Markov model for cost-effectiveness analysis of surgical treatments in early-stage colorectal cancer that became a JAMA Open publication; (c) deploy a GoFundMe web crawler to automatically collect campaign information from cancer patients and explore geospatial and socioeconomic trends; and (d) incorporate confidential hospital performance data into a simulation model to optimize prehospital triage of stroke patients.
Summer Intern - Lawrence Livermore National Laboratory
San Francisco Bay Area, CA · Jun 2018 - Aug 2018
Under Dr. Jay Salmonson’s guidance, I built from scratch a PyTorch-based reinforcement learning module to optimize hyperparameters for automated mesh management in HYDRA, a multiphysics fusion ignition simulation package used by many research scientists at the lab. I designed the module so that the neural network’s architecture could be modified easily. I used the module to train and test neural networks and concluded the internship with an exit presentation. The module is still actively used and extended by Dr. Salmonson and his new intern, with plans to release it as open source.
The abstract was accepted to APS 2018.
Research Assistant - College of William & Mary
Williamsburg, VA · Jun 2015 - May 2018
Under Dr. John Delos’ oversight, I worked on three main projects in MATLAB: (a) refactoring and adding new tools to an outdated graphical user interface used to plot bedside signals; (b) redesigning the storage structure of our ICU bedside signals and streamlining the data query process, which cut read and write times by 50% or more depending on the query; and (c) my senior thesis, presented to pediatricians and biostatisticians at the University of Virginia School of Medicine, in which I developed a parallel-processing algorithm to collect physiological events associated with sleep apnea in preterm infants and used logistic regression to predict the infants' hospitalization outcomes.