Description: Arguably, every entity in this universe is networked in one way or another. With the prevalence of collected network data, such as social media and biological networks, learning from networks has become an essential task in many applications. It is well recognized that network data is intricate and large-scale, and analytic tasks on network data are becoming more and more sophisticated. In this tutorial, we systematically review the area of learning from networks, including algorithms, theoretical analysis, and illustrative applications. Starting with a quick recollection of the exciting history of the area, we formulate the core technical problems. Then, we introduce the fundamental approaches, that is, the feature selection based approaches and the network embedding based approaches. Next, we extend our discussion to attributed networks, which are popular in practice. Last, we cover the latest hot topic, graph neural network based approaches. For each group of approaches, we also survey the associated theoretical analysis and real-world application examples. Our tutorial also highlights a series of open problems and challenges that may lead to future breakthroughs. The authors are productive and seasoned researchers active in this area who represent a nice combination of academia and industry.
Tutorial Outline & Slides
- Motivations
  - Ubiquity and importance of networks
  - Challenges of analyzing and learning from networks
  - Traditional ways of network analysis and limitations
  - Recent trends
- Learn from Networks with Feature Selection
  - High-dimensional node features and curse of dimensionality
  - Feature selection and its categorization
  - Conventional feature selection w/o explicit node features
  - Linked feature selection w/ explicit node features
- Network Embedding
  - Structure-preserved network embedding
  - Property-preserved network embedding
  - Dynamic network embedding
  - Robustness, explainability and applicability
  - Network embedding for biomedical applications
- Attributed Network Embedding
  - Motivations & challenges
  - Mining attributed networks with shallow embedding
  - Mining attributed networks with deep embedding
  - Human-centric network analysis
- Graph Neural Networks
  - Graph neural networks
  - Dynamic & heterogeneous GNN
  - Billion-scale systems, applications, & challenges
Tutors & Contributors
Xiao Huang is an assistant professor at the Department of Computing, the Hong Kong Polytechnic University. He received his Ph.D. from Texas A&M University in 2020, M.S. from Illinois Institute of Technology in 2015, and B.S. from Shanghai Jiao Tong University in 2012. He has actively published in several prestigious conferences and journals including KDD, WSDM, AAAI, SDM, IJCAI, ICDM, and TKDD. His publications are well recognized by the community and have received about 600 citations with an h-index of 8 according to Google Scholar. He was a 2019 INFORMS QSR Best Student Paper finalist and won the Doctoral Forum Best Poster Runner-Up Award at SDM 2017. He is a program committee member of WSDM 2021, IJCAI 2020, KDD 2019/2020, ICKG 2020, and CIKM 2019/2020. He is a reviewer for many international journals.
Peng Cui is an Associate Professor with tenure at Tsinghua University. He received his Ph.D. degree from Tsinghua University in 2010. His research interests include social dynamic modeling, network representation learning, as well as causal inference and stable prediction. He has published more than 100 papers in prestigious conferences and journals in data mining and multimedia. His recent research won 5 paper awards from top-level international conferences and journals. He is an Associate Editor of IEEE TKDE, IEEE TBD, ACM TIST, and ACM TOMM, among others. He was the recipient of the CCF-IEEE CS Young Scientist Award and the ACM China Rising Star Award.
Yuxiao Dong is a Senior Applied Scientist at Microsoft Research, Redmond. He received his Ph.D. from University of Notre Dame and has been a visiting scholar at Tsinghua University, U.S. Army Research Lab, and AMiner.org. His research focuses on data mining, network science, and applied machine learning, with an emphasis on applying computational models to address problems in large-scale graph systems. His papers on network representation learning are the most cited papers in KDD’17 and WSDM’18, respectively (as of March 2019). In addition, his research won the 2017 ACM SIGKDD Doctoral Dissertation Award Honorable Mention. He served as co-chair of the inaugural KDD 2018 Deep Learning Day (attracting over 1000 attendees) as well as its KDD 2019 edition, and co-organized the network representation learning workshops at WWW 2019, 2018, and 2017.
Jundong Li is an Assistant Professor in the Department of Electrical and Computer Engineering, with joint appointments in the Department of Computer Science and the School of Data Science. He received his Ph.D. degree in Computer Science at Arizona State University in 2019, M.Sc. degree in Computer Science at University of Alberta in 2014, and B.Eng. degree in Software Engineering at Zhejiang University in 2012. His research interests are in data mining, machine learning, and social computing. He has published more than 50 papers in high-impact venues (including KDD, WWW, IJCAI, AAAI, WSDM, CIKM, ICDM, SDM, ECML-PKDD, CSUR, TKDD, TIST, etc.), with over 1,000 citations. He also leads the development of an open-source feature selection repository (scikit-feature), which has been covered by several news articles and blogs. He regularly serves on program committees for major international conferences and reviews for multiple journals.
Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science at University of Southern California and B.Eng. in Computer Science and Electrical Engineering at Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at National University of Singapore. He was recognized for excellence in teaching and research in Computer Science and Engineering at Arizona State University. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms such as social media. His well-cited publications include books, book chapters, encyclopedia entries as well as conference and journal papers. He serves on journal editorial boards and numerous conference program committees, and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction. He is an ACM Fellow, an AAAI Fellow, an AAAS Fellow, and an IEEE Fellow.
Jian Pei is a Professor in the School of Computing Science, Simon Fraser University, Canada. He is renowned for his creative and productive research in the general areas of data science, big data, data mining, and database systems. He has published prolifically and his publications have been cited over 84 thousand times. He is an ACM Fellow and an IEEE Fellow, and has received several prestigious awards. During his recent leave of absence from the university, he acted as a Vice President of JD.com and a Technical VP of Huawei Technology. He has extensive experience in industry R&D, strategy consulting, and business operation, particularly in the areas of enterprise data platform, health-informatics, supply chain, and fintech.
Le Song is a Principal Engineer of Ant Financial, an Associate Professor in the College of Computing, and an Associate Director of the Center for Machine Learning, Georgia Institute of Technology. Before he joined Georgia Institute of Technology in 2011, he was a postdoc in the Department of Machine Learning, Carnegie Mellon University, and a research scientist at Google. His principal research direction is machine learning, especially nonlinear models, such as kernel methods and deep learning, probabilistic graphical models, and optimization. He is the recipient of the NSF CAREER Award’14, and many best paper awards, including the NIPS’17 Materials Science Workshop Best Paper Award, the Recsys’16 Deep Learning Workshop Best Paper Award, AISTATS’16 Best Student Paper Award, IPDPS’15 Best Paper Award, NIPS’13 Outstanding Paper Award, and ICML’10 Best Paper Award. He has served as an area chair or senior program committee member for many leading machine learning and AI conferences such as ICML, NIPS, AISTATS, AAAI and IJCAI, and as an action editor for JMLR and IEEE TPAMI.
Jie Tang is a Full Professor and the vice chair of the Department of Computer Science and Technology at Tsinghua University. He was also a visiting scholar at Cornell University, Hong Kong University of Science and Technology, and Southampton University. His interests include social network analysis, data mining, and machine learning. He has published more than 200 journal/conference papers and holds 20 patents, attracting more than 10,000 citations. He served as PC Co-chair of CIKM’16 and WSDM’15, Associate General Chair of KDD 2018, and acting Editor-in-Chief of ACM TKDD. He leads the project AMiner.org for academic social network analysis and mining, which has attracted more than 8 million independent IP accesses from 220 countries/regions in the world. He was honored with the UK Royal Society-Newton Advanced Fellowship Award, the NSFC Distinguished Young Scholar award, and the 2018 SIGKDD Service Award.
Fei Wang is an Associate Professor in the Division of Health Informatics, Department of Healthcare Policy and Research, Weill Cornell Medicine, Cornell University. His major research interests are data mining, machine learning, and their applications in health data science. He has published more than 200 papers in the top journals and conferences of related areas. His papers have received over 8,400 citations so far, with an h-index of 48. His (or his students’) papers won 6 best paper awards at international academic conferences. He also won the NIPS/Kaggle Challenge on Classification of Clinically Actionable Genetic Mutations in 2017 and the Parkinson's Progression Markers' Initiative data challenge organized by the Michael J. Fox Foundation in 2016. He is the recipient of the NSF CAREER Award. He is also the chair of the KDDM working group in AMIA. He is the general co-chair of ICHI 2018, track chair for Medinfo 2017, and program co-chair for CHASE 2018 and ICHI 2015. His research has been supported by NSF, NIH, ONR, NMRC, PCORI, MJFF, AHA and industries such as Amazon. He is an action editor of the journal Data Mining and Knowledge Discovery, and an associate editor of IEEE Transactions on Neural Networks and Learning Systems, Journal of Health Informatics Research, Smart Health, Pattern Recognition, and Knowledge and Information Systems. He has applied for more than 40 US patents, of which 15 have been granted.
Hongxia Yang is a Senior Staff Data Scientist and Director at Alibaba Group. Her interests span the areas of Bayesian statistics, time series analysis, spatial-temporal modeling, survival analysis, machine learning, data mining and their applications to problems in business analytics and big data. Ongoing projects in her team include a huge dynamic multi-level heterogeneous graphical model for a user profiling system, a large-scale distributed knowledge graph and its efficient inference for a data enabling platform, and a general ensemble prediction framework for various revenue and cost forecasting tasks, among several others. She previously worked as a Principal Data Scientist at Yahoo! Inc and as a Research Staff Member at IBM T.J. Watson Research Center, and received her Ph.D. degree in Statistics from Duke University in 2010. She has published over 40 top conference and journal papers, holds 9 filed/to be filed US patents, and is serving as an associate editor for Applied Stochastic Models in Business and Industry. She was elected as an Elected Member of the International Statistical Institute in 2017.
Wenwu Zhu is with the Computer Science Department of Tsinghua University as Professor of "1000 People Plan" of China. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and the Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as Member of Technical Staff during 1996-1999. He is an IEEE Fellow, SPIE Fellow and ACM Distinguished Scientist. He has published over 200 refereed papers in the areas of multimedia computing, communications and networking. He is inventor or co-inventor of over 40 patents. His current research interests are in the area of social media computing and multimedia communications and networking. He has served on various editorial boards, such as Guest Editor for the Proceedings of the IEEE, IEEE T-CSVT, and IEEE JSAC; Associate Editor for IEEE Transactions on Mobile Computing, IEEE Transactions on Multimedia, and IEEE Transactions on Circuits and Systems for Video Technology. He served as TPC co-chair of IEEE ISCAS 2013 and serves as TPC Co-chair for ACM Multimedia 2014.