Tutorials


TUTORIAL 1: Fairness in Rankings and Recommenders

Abstract

This tutorial pays special attention to the concept of fairness in rankings and recommender systems. By fairness, we typically mean lack of bias. It is not correct to assume that insights achieved via computations on data are unbiased simply because the data was collected automatically or the processing was performed algorithmically. Bias may come from the algorithm, reflecting, for example, commercial or other preferences of its designers, or from the data itself, for example, if a survey contains biased questions. In this tutorial, we review a number of definitions of fairness that aim at addressing discrimination and bias amplification, and at ensuring transparency. We organize these definitions around the notions of individual and group fairness. We also present methods for achieving fairness in rankings and recommendations, taking a cross-type view that distinguishes between pre-processing, in-processing, and post-processing approaches. We conclude with a discussion of the new research directions that arise.
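To make the group-fairness notion mentioned above concrete, the following sketch (our own toy illustration, not material from the tutorial; all names and the tolerance value are assumptions) checks a demographic-parity-style condition on a ranking: the protected group's share of the top-k positions should roughly match its share of the whole candidate pool.

```python
def group_share(ranking, protected, k):
    """Fraction of the top-k items that belong to the protected group."""
    top_k = ranking[:k]
    return sum(1 for item in top_k if item in protected) / k

def is_fair_top_k(ranking, protected, k, tolerance=0.1):
    """Demographic-parity-style test: the protected group's share of the
    top-k should be within `tolerance` of its share of all candidates."""
    overall = len(protected) / len(ranking)
    return abs(group_share(ranking, protected, k) - overall) <= tolerance

candidates = ["a", "b", "c", "d", "e", "f", "g", "h"]
protected = {"b", "d", "f", "h"}               # 50% of the candidate pool
print(group_share(candidates, protected, 4))   # 0.5 ("b" and "d" in top-4)
print(is_fair_top_k(candidates, protected, 4)) # True
```

A post-processing approach, in this framing, would re-rank the list until such a test passes; pre- and in-processing approaches instead change the training data or the learning objective.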

Presenters

Evaggelia Pitoura is a Professor at the Univ. of Ioannina, Greece, where she also leads the Distributed Management of Data Laboratory. She received her PhD degree from Purdue Univ., USA. Her research interests are in the area of data management systems with a recent emphasis on social networks and responsible data management. Her publications include more than 150 articles in international journals (including TODS, TKDE, PVLDB) and conferences (including SIGMOD, ICDE) and a highly-cited book on mobile computing. She has served or serves on the editorial board of ACM TODS, VLDBJ, TKDE and as a group leader, senior PC member, or co-chair of many international conferences.

Georgia Koutrika is Research Director at Athena Research Center in Greece. She has more than 15 years of experience in multiple roles at HP Labs, IBM Almaden, and Stanford, building innovative solutions for recommendations, data analytics and exploration. Her work has been incorporated in commercial products, described in 9 granted patents and 18 patent applications in the US and worldwide, and published in more than 80 papers in top-tier conferences and journals. She is an ACM Distinguished Speaker and associate editor for TKDE and PVLDB. She has served or serves as PC member or co-chair of many conferences.

Kostas Stefanidis is an Associate Professor of Data Science at Tampere University, Finland. He received his PhD in personalized data management from the Univ. of Ioannina, Greece. His research interests lie at the intersection of databases, information retrieval and the Web, and include personalization and recommender systems, and large-scale entity resolution and information integration. His publications include more than 80 papers in peer-reviewed conferences and journals, including SIGMOD, ICDE, and ACM TODS, and a book on entity resolution in the Web of data.

Further information and tutorial materials


TUTORIAL 2: Entity Resolution: Past, Present and Yet-to-Come - From Structured to Heterogeneous, to Crowd-sourced, to Deep Learned

Abstract

Entity Resolution (ER) lies at the core of data integration, with the bulk of research focusing on its effectiveness and its time efficiency. Most past works were crafted for addressing Veracity over structured (relational) data. They typically rely on schema, expert and external knowledge to maximize accuracy. Some of these methods have recently been extended to process large volumes of data through massive parallelization techniques, such as the MapReduce paradigm. With the advent of Big Web Data, the scope has moved towards Variety, aiming to handle semi-structured data collections with noisy and highly heterogeneous information. Relevant works adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on Velocity, i.e., processing data collections of continuously increasing volume.

In this tutorial, we present the ER generations by discussing past, present, and yet-to-come mechanisms. For each generation, we outline the corresponding ER workflow along with the state-of-the-art methods per workflow step. Thus, we provide the participants with a deep understanding of the broad field of ER, highlighting the recent advances in crowd-sourcing and deep learning applications in this active research domain. We also equip them with practical skills in applying ER workflows through a hands-on session that involves our publicly available ER toolbox and data.
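The workflow steps mentioned above can be illustrated with a minimal two-step ER sketch (our own toy code, not the tutorial's toolbox; the blocking key and similarity threshold are assumptions): cheap blocking to prune the comparison space, followed by pairwise matching with a token-overlap (Jaccard) similarity.

```python
from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def resolve(records, threshold=0.5):
    """Return pairs of record ids judged to refer to the same entity."""
    # Blocking: only compare records that share the same first token,
    # so the quadratic comparison cost is paid per block, not globally.
    blocks = defaultdict(list)
    for rid, name in records.items():
        blocks[name.lower().split()[0]].append(rid)
    matches = []
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if jaccard(records[r1], records[r2]) >= threshold:
                matches.append((r1, r2))
    return matches

records = {1: "John Smith", 2: "John A. Smith", 3: "Jane Doe"}
print(resolve(records))  # [(1, 2)]
```

Real ER workflows refine every step shown here: learned blocking keys, meta-blocking to prune comparisons further, and matchers based on crowd-sourced labels or deep learning instead of a fixed string similarity.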

Presenters

George Papadakis is an internal auditor of information systems and a research fellow at the University of Athens. He also worked at the NCSR "Demokritos", National Technical University of Athens (NTUA), L3S Research Center and "Athena" Research Center. He holds a PhD in Computer Science from the University of Hanover and a Diploma in Computer Engineering from NTUA. His research focuses on web data mining.

Ekaterini Ioannou is an Assistant Professor at Tilburg University, the Netherlands. Previously, she worked as an Assistant Professor at Eindhoven University of Technology, as a Lecturer at the Open University of Cyprus, as an adjunct faculty member at EPFL in Switzerland, as a research collaborator at the Technical University of Crete, and as an Independent Expert for the European Commission. Her research focuses on information integration with an emphasis on the challenges of managing data with uncertainties, heterogeneity or correlations, and, more recently, on achieving a deeper integration of information extraction tasks within databases, and on efficiently retrieving analytics over graphs/hypergraphs with evolving data.

Themis Palpanas is a Senior Member of the French University Institute (IUF) and Professor of computer science at the University of Paris (France), where he is the director of diNo, the data management group. He is the author of nine US patents, three of which have been implemented in world-leading commercial data management products. He is the recipient of three Best Paper awards and the IBM Shared University Research (SUR) Award. He serves as Editor-in-Chief for the BDR Journal, Associate Editor for PVLDB 2019 and the TKDE journal, and Editorial Advisory Board member for the IS journal.

Further information and tutorial materials


TUTORIAL 3: Declarative Languages for Big Streaming Data

Abstract

The Big Data movement proposes data streaming systems to tame velocity and to enable reactive decision making. However, adopting such systems remains complex due to the paradigm shift they require, i.e., moving from scalable batch processing to continuous data analysis and pattern detection. Recently, declarative languages have been playing a crucial role in fostering the adoption of stream processing solutions; in particular, several key players have introduced SQL extensions for stream processing. In this tutorial, we give an overview of the various declarative querying interfaces for big streaming data. To this end, we discuss how the different Big Stream Processing Engines (BigSPE) interpret, execute, and optimize continuous queries expressed in SQL-like languages such as KSQL, Flink-SQL, and Spark SQL. Finally, we present the open research challenges in the domain.
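The core of such continuous queries is windowed aggregation. The toy sketch below (our own illustration, not the KSQL/Flink/Spark API) shows what a streaming "count per key, grouped by tumbling window" query computes: per-key counts over fixed, non-overlapping time windows.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """events: (timestamp, key) pairs.
    Returns {(window_start, key): count} for non-overlapping windows."""
    counts = defaultdict(int)
    for ts, key in events:
        # Each event falls into exactly one window of length window_size.
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "click"), (3, "click"), (7, "view"), (12, "click")]
print(tumbling_window_counts(events, 10))
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1}
```

A streaming SQL engine evaluates the same logic incrementally over an unbounded stream, emitting or updating a window's result as events arrive, rather than batching all events first as this sketch does.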

Presenters

Riccardo Tommasini is a research fellow at the University of Tartu, Estonia. Riccardo did his PhD at the Department of Electronics and Information of the Politecnico di Milano. His thesis, titled "Velocity on the Web", investigates the velocity aspects that concern the Web environment. His research interests span Stream Processing, Knowledge Graphs, Logics, and Programming Languages. Riccardo's tutorial activities comprise Stream Reasoning tutorials at ISWC 2017, ICWE 2018, ESWC 2019, TheWebConf 2019, and DEBS 2019.

Sherif Sakr is the Head of the Data Systems Group at the Institute of Computer Science, University of Tartu, Estonia. He received his PhD degree in Computer and Information Science from Konstanz University, Germany, in 2007. He is currently the Editor-in-Chief of the Springer Encyclopedia of Big Data Technologies. His research interests include data and information management, big data processing systems, big data analytics, and data science. Prof. Sakr has published more than 150 research papers in international journals and conferences. He has delivered several tutorials at various conferences, including WWW'12, IC2E'14, CAiSE'14, the EDBT Summer School 2015, the 2nd ScaDS International Summer School on Big Data 2016, the 3rd Keystone Training School on Keyword Search in Big Linked Data 2017, DEBS 2019, and ISWC 2019.

Emanuele Della Valle is an Assistant Professor at the Department of Electronics and Information of the Politecnico di Milano. His research interests cover Big Data, Stream Processing, Semantic Technologies, Data Science, Web Information Retrieval, and Service-Oriented Architectures. His work in the Stream Reasoning research field has been applied to analysing Social Media, Mobile Telecom, and IoT data streams in collaboration with Telecom Italia, IBM, Siemens, Oracle, Indra, and Statoil. Emanuele has presented several Stream Reasoning tutorials at SemTech 2011, ESWC 2011, ISWC 2013, ESWC 2014, ISWC 2014, ISWC 2015, ISWC 2016, DEBS 2016, ISWC 2017, and KR 2018.

Hojjat Jafarpour is a Software Engineer and the creator of KSQL at Confluent. Before joining Confluent, he worked at NEC Labs, Informatica, Quantcast, and Tidemark on various big data management projects. Hojjat earned his PhD in Computer Science from UC Irvine, where he worked on scalable stream processing and publish/subscribe systems.

Further information and tutorial materials


TUTORIAL 4: NoSQL Schema Evolution and Data Migration: State-of-the-Art and Opportunities

Abstract

NoSQL database systems are very popular in agile software development. Naturally, agile deployment goes hand-in-hand with database schema evolution. The main aim of this tutorial is to present to the audience the current state-of-the-art in continuous NoSQL schema evolution and data migration. We present case studies on schema evolution in NoSQL databases and survey existing approaches to schema management and schema inference, as implemented in popular NoSQL database products and as proposed in academic research. Further, we discuss approaches for extracting schema versions and analyze different methods for efficient NoSQL data migration. Finally, open research questions and further research opportunities are presented.
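One common migration strategy in schema-less stores is lazy (on-read) migration: documents written under old schema versions stay untouched until they are accessed, at which point pending evolution operations are replayed. A minimal sketch (our own illustration; field names, version numbers, and the migration operations are assumptions):

```python
def rename_name(doc):
    """v1 -> v2: the 'name' field was renamed to 'full_name'."""
    doc["full_name"] = doc.pop("name")
    return doc

def add_active_flag(doc):
    """v2 -> v3: a new 'active' field with a default value was added."""
    doc["active"] = True
    return doc

# Each entry upgrades a document by exactly one schema version.
MIGRATIONS = {1: rename_name, 2: add_active_flag}
LATEST = 3

def read(doc):
    """Upgrade a stored document to the latest schema version on access."""
    doc = dict(doc)  # work on a copy; the store still holds the old version
    version = doc.get("_schema", 1)
    while version < LATEST:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc["_schema"] = LATEST
    return doc

legacy = {"_schema": 1, "name": "Ada"}
print(read(legacy))  # full_name='Ada', active=True, _schema=3
```

The trade-off against eager migration is the usual one: lazy migration avoids a costly one-shot rewrite of the whole collection, but every read pays the upgrade cost and multiple schema versions coexist in the store.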

Presenters

Uta Störl is a professor at Darmstadt University of Applied Sciences. Her research focuses on database technologies for Big Data and Data Science. Before, she worked for Dresdner Bank.

Meike Klettke is a professor for Data Science at the University of Rostock. She works on database evolution and reverse engineering of databases.

Stefanie Scherzinger is a professor at the University of Passau. Her research is influenced by her experience as a software engineer at IBM and Google.

Further information and tutorial materials