Shuai Li

Assistant Professor
John Hopcroft Center
Shanghai Jiao Tong University

CV GoogleScholar ResearchGate DBLP LinkedIn GitHub

About me

I am a tenure-track assistant professor at the John Hopcroft Center of Shanghai Jiao Tong University.

I received my PhD degree from the Chinese University of Hong Kong under the supervision of Prof. Kwong-Sak Leung. During my PhD, I received the 2018 Google PhD Fellowship in the field of machine learning. Before that, I obtained my bachelor's degree in Mathematics from Zhejiang University and my master's degree in Mathematics from the University of the Chinese Academy of Sciences.

My research interests lie in reinforcement learning and multi-armed bandits, with a focus on algorithm design and regret analysis. My current major research topics include reinforcement learning algorithms, online matching markets, online learning to rank, bandits with graph feedback, and online clustering of bandits. I am also interested in deep learning theory, general theoretical learning problems, and the applications of these algorithms in recommender systems. I look forward to research collaborations with industry and academia. Please contact me if you are interested.


  • 04/2022: We have a new paper accepted at ICML 2022, which is the first work to simultaneously learn bandits with general graph feedback in both stochastic and adversarial environments.

  • 04/2022: We have a new paper accepted at IJCAI 2022, which analyzes the convergence of the Thompson sampling algorithm for bandit learning in matching markets.

  • 12/2021: We have a new paper accepted at AAAI 2022, which is the first work to study the best-of-both-worlds problem in online learning to rank under the position-based model.

  • 09/2021: Two papers were accepted at NeurIPS. One introduces a new graph quantity that could be tighter for graph bandits with weakly observable graphs. The other gives the first positive result for the Thompson sampling algorithm in combinatorial bandits with an approximation (greedy) oracle.

  • 04/2021: We have a new paper accepted at SIGIR 2021. The work improves the current bandit algorithm for conversational recommender systems by considering comparison feedback.

  • 09/2020: We have a new paper accepted at NeurIPS 2020. The work provides the first regret bound for online influence maximization under the linear threshold model.