Yu Yang
Ph.D. in Statistics
I hold a Ph.D. in Statistics and am always hungry to learn. My interests include causality, financial modeling, and natural language processing.
My motto is: Respect Life!
Featured Projects
Here are some selected projects. More can be found on my GitHub.
FlowSUM: Boosting Summarization with Normalizing Flows and Aggressive Training
This is one of my thesis projects. It focuses on improving summarization with normalizing flows. In this work, we proposed FlowSUM as the model structure and CAAT as the training strategy. The paper was accepted at EMNLP 2023.
GitHub repo EMNLP 2023 Slides Poster
A hierarchical ensemble causal structure learning approach for wafer manufacturing
This project was a collaboration with Seagate Technology. We proposed a hierarchical ensemble method to uncover the causal structure of the wafer manufacturing assembly line.
J Intell Manuf (2023)
Topic-Aware Text Summarization
This is a demo project investigating how effectively a text RBM can inject topic information into summarization models.
GitHub repo Final report Slides
Retro-BiDAF: A Retrospective Reader Over BiDAF
For the SQuAD 2.0 Challenge, I combined the idea of retrospective reading with BiDAF and proposed the Retro-BiDAF model, which improved both the EM and F1 scores in the non-PCE scenario.
GitHub repo Final report Slides
Kaggle: Lyft Motion Prediction for Autonomous Vehicles
This Kaggle competition was supported by Lyft, and the goal was to build a motion prediction model for self-driving vehicles. We built an ensemble model with ResNet, DenseNet, and EfficientNet, and ranked in the top 6%.
GitHub repo
Wells Fargo Campus Analytics Challenge 2020
This challenge was a binary classification problem. The highlight of our solution was a novel method called Sparse Grouping Pursuit, which discovers the sparsity and grouping structure among features and led to a substantial dimension reduction. Our solution was selected as one of the Grand Prize Winners of the year.
GitHub repo Final report
MinneMUDAC 2019 Student Data Science Challenge
The objective of this challenge was to predict soybean prices in the commodity market. Our work was highly regarded by judges from both academia and industry, and we won the Analytic Acumen Award.
GitHub repo Blog post More about the project
Kaggle: Travelers Claim Fraud Detection
This was an in-class project supported by Travelers. The goal was to detect claim fraud. Our team won 2nd place.
GitHub repo Blog post
Experience
Applied AI ML Associate Sr
JPMorgan Chase & Co.
Jun. 2023 - present
I work as an AI & ML research scientist at the Machine Learning Center of Excellence.
Teaching Assistant
University of Minnesota
Jan. 2022 - May 2023
I worked as a teaching assistant for the course STAT 3021H Introduction to Probability and Statistics Honors.
Graduate Instructor
University of Minnesota
Sep. 2022 - Jan. 2023
I worked as a graduate instructor for the course STAT 3011 Introduction to Statistical Analysis.
AI & Data Science Summer Associate
JPMorgan Chase & Co.
Jun. 2022 - Sep. 2022
I worked as an intern on machine learning projects.
Research Assistant
Seagate Technology
Sep. 2019 - Apr. 2022
I worked as a research assistant on collaborative projects with Seagate Technology.
Teaching Assistant
University of Minnesota
Sep. 2018 - May 2019
I worked as a teaching assistant for the course STAT 3011 Introduction to Statistical Analysis for two semesters.
Education
University of Minnesota - Minneapolis, USA
Ph.D. in Statistics, Aug 2018 - June 2023
My research focuses on text summarization and causal discovery.
Shanghai University of Finance and Economics - Shanghai, China
B.S. in Statistics, Sep 2014 - June 2018
I had a great time during my undergraduate years. I played softball in college, and I miss my teammates and our training time on the field so much!