Han Truong

Data Scientist

About Me

I'm currently a Data Scientist at Meta and hold a Master's in Data Science from Harvard University. Over two years of working closely with engineers in Meta's Ads Infra, I have extracted actionable insights from ML systems data to boost the efficiency of ads ranking models and maximize revenue potential.

Work Experience

Data Scientist - Meta

New York, NY · Jan 2022 - Present

Established and owned the infra cost translation initiative within the data science team to produce high-fidelity hardware translation factors, and presented the results at the director’s review session. Developed an end-to-end ROI framework for ML pipelines across the pipelines’ lifecycle states, and aligned with engineering teams on it, to normalize previously overinflated heuristic ROI estimates. Built an estimator that predicts the storage cost of new ML pipelines from their configuration files; it was integrated into the infra team’s automated admission control system. Formulated an ROI-based admission plan to onboard new ML pipelines based on storage demand and supply across multiple pillars (Facebook, Instagram, Ads Ranking).

Data Science Intern - Capital One

McLean, VA · Jun 2021 - Aug 2021

As a summer intern in the Credit Risk organization, I built a review toolkit for credit risk GBM models with a focus on feature importance drift to determine necessary credit policy adjustments for new customer acquisitions. These models are built to predict the risk of client default based on application data and FICO report attributes. During development, I used the toolkit to evaluate GBM models that were trained on 22M+ card applications via AWS EC2 clusters. Through this, I uncovered trends in the importance drift of certain input features that warranted adjustments to existing credit policy and presented these insights to the Senior VP of Credit Review.

Data Scientist - Stealth Startup

Part-Time & Remote · Aug 2020 - Feb 2021

As an early employee on the backend team, I used Flask and AWS Elastic Beanstalk to deploy our deep learning model as a REST API that interfaces with the front-end mobile application and handles NBA game prediction queries. Our startup was recently accepted into Velocity Accelerator’s 2021 cohort.

Research Analyst - Columbia University Medical Center

New York, NY · Sep 2018 - Jul 2020

Under Dr. Chin Hur’s direction, I used Python to (a) train residual nets in PyTorch and assess their performance with scikit-learn to identify esophageal precancerous tissues on volumetric laser endomicroscopic images (with Dr. Nicholas Tatonetti’s technical supervision); (b) develop a Markov model for cost-effectiveness analysis of surgical treatments in early-stage colorectal cancer, which became a JAMA Open publication; (c) deploy a GoFundMe web crawler to automatically collect campaign info from cancer patients to explore geospatial and socioeconomic trends; and (d) incorporate confidential hospital performance data into a simulation model to optimize prehospital triage of stroke patients.

Summer Intern - Lawrence Livermore National Laboratory

San Francisco Bay Area, CA · Jun 2018 - Aug 2018

Under Dr. Jay Salmonson’s guidance, I built from scratch a PyTorch-based reinforcement learning module to optimize hyperparameters for automated mesh management in HYDRA, a multiphysics fusion ignition simulation package used by many research scientists at the lab. The module was designed so that the neural network’s architecture could be modified with ease. I used the module to train and test neural networks and concluded the internship with an exit presentation. The module is still actively used and contributed to by Dr. Salmonson and his new intern, with plans to release it as open source in the future. An abstract on this work was accepted to APS 2018.

Research Assistant - College of William & Mary

Williamsburg, VA · Jun 2015 - May 2018

Under Dr. John Delos’ oversight, I worked on three main projects in MATLAB: (a) refactoring an outdated graphical user interface used to plot bedside signals and adding new tools to it; (b) redesigning the storage structure of our ICU bedside signals and streamlining the data query process to cut read and write times by 50% or more, depending on the query; and (c) my senior thesis, presented to pediatricians and biostatisticians at the University of Virginia School of Medicine, in which I developed a parallel-processing algorithm to collect physiological events associated with sleep apnea in preterm infants and used logistic regression to predict the infants' hospitalization outcomes.

Latest Projects

GoFundMe Webscraper

GoFundMe webscraper and analysis tools for examining the associations of Internet literacy and socioeconomic status with the amount of money raised from online cancer crowdfunding. Our results indicate that campaigns in high-SES counties raised significantly more money than those in low-SES counties, and that campaigns with certain keywords in their narratives raised significantly more than those without them. I wrote the scraper and cleaned the parsed campaign stories and metadata. Together with my co-worker at Columbia University, I analyzed the final dataset, which consisted of 100,000+ campaigns, and drafted the manuscript.

Libraries utilized: BeautifulSoup, Requests, Google Maps API, US Census API, Pandas, NumPy, Jupyter, Scikit-learn, Seaborn

Publication in JAMA Open (Co-First-Author)

Code

Stroke Hospital Triage Recommendation

A simulation-based Markov-chain model used to determine the optimal hospital destination based on patients’ characteristics and stroke severity. I worked on this project during my time at Columbia University, and a manuscript is being drafted for submission to an academic journal.

Libraries utilized: Pandas, NumPy, JSON, Google Maps API, Plotly, Mapbox API, Bash, Git

Abstract for International Stroke Conference 2020

Code