Keynote Speakers:

Timos Sellis, Professor, RMIT University, Australia
DATA ECOSYSTEMS: FROM VERY LARGE DATA BASES TO BIG DATA INFRASTRUCTURES
9am Tuesday 15 July
Abstract: Data ecosystems involve the coexistence of one or more data collections, typically databases, and their surrounding applications for data entry and retrieval. For decades, both data and ecosystem management have failed to address significant, costly and labor-intensive challenges, which include (a) the departure from databases focusing on alphanumeric data only, (b) their inability to be integrated and to provide transparent access and composition facilities for heterogeneous data, (c) their static querying nature, which is deprived of personal, context-aware or interactive characteristics, (d) the enforcement of DBMS operation over monolithic servers, and (e) the complete indifference to problems of evolution and adaptation over time. In this talk we address the methodologies, the theoretical and modeling foundations, the algorithmic techniques, and the necessary software architectures that will provide personalization, integration, and evolution management facilities for data ecosystems that operate over a decentralized infrastructure for a large variety of data types.
About the Speaker: Timos Sellis received his diploma degree in Electrical Engineering in 1982 from the National Technical University of Athens (NTUA), Greece. In 1983 he received the M.Sc. degree from Harvard University and in 1986 the Ph.D. degree from the University of California at Berkeley, both in Computer Science. In 1986, he joined the Department of Computer Science of the University of Maryland, College Park as an Assistant Professor, and became an Associate Professor in 1992. Between 1992 and 1996 he was an Associate Professor at NTUA, where he then served as a Professor until January 2013. He is currently a Professor at the School of Computer Science and Information Technology of RMIT University in Australia. Prof. Sellis was also the Director of a research institute he founded in Greece, the Institute for the Management of Information Systems (IMIS) of the "Athena" Research Center (www.imis.athena-innovation.gr), between 2007 and 2012. His research interests include big data, data streams, personalization, data integration, and spatio-temporal database systems. He has published over 200 articles in refereed journals and international conferences in these areas, has over 10,000 citations to his work, and has been an invited speaker at major international events. He has also participated in and coordinated several national and European research projects. Prof. Sellis is a recipient of the prestigious Presidential Young Investigator (PYI) award, given by the President of the USA to the most talented new researchers (1990), and of the VLDB 10 Year Paper Award in 1997 (awarded to the paper in the proceedings of the VLDB 1987 conference that had the biggest impact on the field of database systems in the decade 1987-97). He was the president of the National Council for Research and Technology of Greece (2001-2003). In November 2009 he was awarded the status of IEEE Fellow for his contributions to database query optimization and spatial data management, and in November 2013 the status of ACM Fellow for his contributions to database query optimization, spatial data management, and data warehousing.

Divesh Srivastava, AT&T Labs-Research, USA
SELECTING SOURCES WISELY FOR INTEGRATION
9am Wednesday 16 July
Abstract: Data integration is a challenging task due to the large number of autonomous data sources, which necessitates the development of techniques to reason about the costs and benefits of acquiring and integrating data. Too many sources can result in a huge integration cost, and low-quality sources can be detrimental to the benefit of integration. In this talk, Dr Srivastava presents the problem of source selection, that is, identifying before integration the subset of sources that maximizes the profit (benefit minus cost) of integration, for static and dynamic sources. To address this problem, he proposes techniques that, inspired by the marginalism principle in economic theory, integrate a source only if its marginal benefit is higher than its marginal cost. He quantifies the integration benefit in terms of the quality of the integrated data, which is characterized using a set of data quality metrics, including coverage, freshness and accuracy, and he develops statistical models for estimating these metrics. Although source selection is NP-complete, he shows that for many practical cases solutions to this problem can be found in polynomial time with approximation guarantees. Finally, he empirically establishes the effectiveness and scalability of his techniques on real-world and synthetic data.
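To make the marginalism idea concrete, here is a minimal, hypothetical Python sketch of greedy source selection. It assumes the integration benefit is simply proportional to the items the selected sources jointly cover; the source names, costs, and item sets are invented for illustration and are not taken from the talk.

# Hypothetical sketch: greedily add a source only while its marginal benefit
# (the value of the new items it covers) exceeds its marginal cost.
def select_sources(sources, value_per_item=1.0):
    """sources: dict mapping source name -> (set_of_covered_items, cost)."""
    selected, covered = [], set()
    remaining = dict(sources)
    while remaining:
        def marginal_profit(name):
            items, cost = remaining[name]
            return value_per_item * len(items - covered) - cost
        best = max(remaining, key=marginal_profit)
        if marginal_profit(best) <= 0:
            break  # no remaining source is worth its cost
        items, _ = remaining.pop(best)
        selected.append(best)
        covered |= items
    return selected, covered

demo = {"A": ({1, 2, 3, 4}, 1.0),   # broad coverage, cheap: selected first
        "B": ({3, 4, 5}, 0.8),      # overlaps A, but its one new item still pays off
        "C": ({6}, 3.0)}            # covers little and costs a lot: skipped
print(select_sources(demo))  # -> (['A', 'B'], {1, 2, 3, 4, 5})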
About the Speaker: Divesh Srivastava is the head of the Database Research Department at AT&T Labs-Research. He received his Ph.D. from the University of Wisconsin, Madison, and his B.Tech from the Indian Institute of Technology, Bombay. He is a Fellow of the ACM, on the board of trustees of the VLDB Endowment, and an associate editor of the ACM Transactions on Database Systems. He has served as the associate Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering, and the program committee co-chair of many conferences, including VLDB 2007. He has presented keynote talks at several conferences, including VLDB 2010. His research interests span a variety of topics in data management.

Seminar Speakers:

ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA
STATISTICAL METHODS FOR MINING BIG TEXT DATA
9am Monday 14 July
Abstract: Text data, broadly including all kinds of natural language text produced by humans (e.g., web pages, social media, email messages, news articles, government documents, and scientific literature), have been growing dramatically recently. This creates great opportunities for applying computational methods to mine large amounts of text data to discover all kinds of useful knowledge, especially knowledge about people's opinions, preferences, and behavior. Due to the difficulty in precisely understanding natural language by computers, scalable text mining algorithms tend to be based on statistical analysis and probabilistic reasoning. In this tutorial, I will systematically review the major statistical methods developed for mining text data, with a focus on covering probabilistic topic models for mining topics and topical patterns in text data, and statistical methods for integrating and analyzing scattered online opinions.
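As a flavour of the kind of statistical method the tutorial covers, here is a minimal, self-contained Python sketch of a probabilistic topic model: EM estimation for a k-topic mixture of unigram language models. It is a toy illustration written for this program, not code from the tutorial, and the tiny document collection at the end is invented.

import math
import random
from collections import Counter

def mixture_of_unigrams(docs, k=2, iters=50, smooth=1e-2, seed=0):
    """Toy EM for a k-topic mixture of unigram language models."""
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    rng = random.Random(seed)
    pi = [1.0 / k] * k                                      # topic priors
    pw = [{w: 1.0 + rng.random() for w in vocab} for _ in range(k)]
    pw = [{w: v / sum(t.values()) for w, v in t.items()} for t in pw]
    for _ in range(iters):
        # E-step: posterior probability that each document came from each topic.
        gamma = []
        for c in counts:
            logp = [math.log(pi[t]) +
                    sum(n * math.log(pw[t][w]) for w, n in c.items())
                    for t in range(k)]
            m = max(logp)
            p = [math.exp(lp - m) for lp in logp]
            gamma.append([x / sum(p) for x in p])
        # M-step: re-estimate topic priors and per-topic word distributions.
        pi = [sum(g[t] for g in gamma) / len(docs) for t in range(k)]
        for t in range(k):
            wc = {w: smooth for w in vocab}
            for g, c in zip(gamma, counts):
                for w, n in c.items():
                    wc[w] += g[t] * n
            z = sum(wc.values())
            pw[t] = {w: v / z for w, v in wc.items()}
    return pi, pw, gamma

docs = [["apple", "fruit", "pie"], ["fruit", "banana", "apple"],
        ["stock", "market", "price"], ["price", "stock", "trade"]]
pi, pw, gamma = mixture_of_unigrams(docs)
print([max(g) for g in gamma])  # each document ends up assigned mostly to one topic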
About the Speaker: ChengXiang Zhai is a Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he also holds joint appointments at the Institute for Genomic Biology, the Department of Statistics, and the Graduate School of Library and Information Science. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and a Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, and bioinformatics. He is an Associate Editor of ACM Transactions on Information Systems and of Information Processing and Management, and serves on the editorial board of the Information Retrieval Journal. He served as a program co-chair of ACM CIKM 2004, NAACL HLT 2007, and ACM SIGIR 2009. He is an ACM Distinguished Scientist, and received the 2004 Presidential Early Career Award for Scientists and Engineers (PECASE), the ACM SIGIR 2004 Best Paper Award, an Alfred P. Sloan Research Fellowship in 2008, and an IBM Faculty Award in 2009.

Lei Chen, Hong Kong University of Science & Technology
CROWD SOURCING OVER BIG DATA, ARE WE THERE YET?
1.30pm Monday 14 July
Abstract: Recently, the popularity of crowdsourcing has brought a new opportunity to engage human intelligence in various data analysis tasks. Compared with computer systems, crowds are good at handling items with human-intrinsic values or features. Existing approaches develop sophisticated methods that utilize the crowd as a new type of processor, a.k.a. an HPU (Human Processing Unit); tasks executed on an HPU are accordingly called HPU-based tasks. Now that we are in the Big Data era, a natural question arises: how about crowdsourcing over Big Data, are we there yet? In this talk, I will first briefly review the history of crowdsourcing and discuss the key issues related to it. Then, I will demonstrate the power of crowdsourcing in solving schema matching, a well-known and very hard data integration problem, and discuss how to migrate the power of crowdsourcing to a social media platform whose users can serve as a huge reservoir of workers. Finally, I will highlight some research challenges for crowdsourcing over Big Data.
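One basic building block behind crowdsourced schema matching is answer aggregation: many workers vote on candidate attribute correspondences and the votes are combined. The following Python sketch is written for illustration only; the worker IDs, attribute names, and the simple weighted-voting scheme are invented and are not taken from the talk.

from collections import defaultdict

def aggregate_matches(answers, worker_weight=None, threshold=0.5):
    """answers: list of (worker_id, (attr_a, attr_b), vote) with vote True/False.
    Returns the candidate pairs whose weighted 'yes' fraction exceeds threshold."""
    worker_weight = worker_weight or {}
    yes, total = defaultdict(float), defaultdict(float)
    for worker, pair, vote in answers:
        w = worker_weight.get(worker, 1.0)  # default: every worker counts equally
        total[pair] += w
        if vote:
            yes[pair] += w
    return {pair for pair in total if yes[pair] / total[pair] > threshold}

answers = [
    ("w1", ("emp.name", "staff.full_name"), True),
    ("w2", ("emp.name", "staff.full_name"), True),
    ("w3", ("emp.name", "staff.full_name"), False),
    ("w1", ("emp.dob", "staff.salary"), False),
    ("w2", ("emp.dob", "staff.salary"), False),
]
print(aggregate_matches(answers))  # {('emp.name', 'staff.full_name')}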
About the Speaker: Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Waterloo, Ontario, Canada, in 2005. He is currently an associate professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include crowdsourcing on social media, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. So far, he has published nearly 200 conference and journal papers. He received best paper awards at DASFAA 2009 and 2010. He has been a PC track chair for VLDB 2014, ICDE 2012, CIKM 2012, and SIGMM 2011, and has served as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. Currently, he serves as an associate editor for IEEE Transactions on Knowledge and Data Engineering and for Distributed and Parallel Databases. He is a member of the ACM and the chairman of the ACM Hong Kong Chapter and the ACM SIGMOD Hong Kong Chapter.

Jeffrey Xu Yu, Chinese University of Hong Kong
LARGE GRAPH PROCESSING
3.30pm Monday 14 July
Abstract: Applications that need graph processing techniques to handle large graphs arise in many domains, including online social networks, biological networks, ontologies, transportation networks, etc. In this talk, we will discuss some selected research topics on graph mining and graph query processing over large graphs. For graph mining, we will focus on ranking nodes in a large graph, covering ranking over trust networks, random-walk domination, and diversified ranking. For ranking nodes over trust networks, we discuss how to take trust scores into consideration while ranking. For random-walk domination, we discuss techniques for handling item placement in online social networks and ad placement in advertisement networks. For diversified ranking, we discuss how to find top-k nodes that match the user query and are very different from each other. For graph query processing, we will discuss top-k structural diversity search, finding maximal cliques in massive networks, and I/O-efficient computing techniques that make a large directed graph small and simple. Other related topics may also be addressed in this talk.
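To give a flavour of the diversified ranking problem mentioned above, here is a small hypothetical Python sketch of a greedy selection that trades off a node's relevance score against its similarity to already-chosen nodes (an MMR-style heuristic, not the specific algorithms of the talk); the community memberships, scores, and Jaccard similarity below are invented for illustration.

def diversified_top_k(scores, similarity, k, lam=0.7):
    """Greedy top-k: each step picks the node with the best relevance-minus-redundancy gain."""
    selected, candidates = [], set(scores)
    while candidates and len(selected) < k:
        def gain(u):
            redundancy = max((similarity(u, v) for v in selected), default=0.0)
            return lam * scores[u] - (1 - lam) * redundancy
        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: similarity = Jaccard overlap of the communities each node belongs to.
communities = {"a": {"c1"}, "b": {"c1"}, "c": {"c1"}, "d": {"c2"}, "e": {"c2"}}
scores = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.5, "e": 0.4}
jaccard = lambda u, v: len(communities[u] & communities[v]) / len(communities[u] | communities[v])
print(diversified_top_k(scores, jaccard, k=2))  # ['a', 'd']: the second pick favours a different community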
About the Speaker: Dr Jeffrey Xu Yu is a Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His current main research interests include keyword search in relational databases, graph mining, graph query processing, and graph pattern matching. Dr. Yu has served on over 280 organization and program committees of international conferences/workshops, including as PC Co-chair of APWeb’04, WAIM’06, APWeb/WAIM’07, WISE’09, PAKDD’10, DASFAA’11, ICDM’12, and NDBC'13, and as Conference Co-chair of APWeb'13. He served as an Information Director and a member of the ACM SIGMOD executive committee (2007-2011), and as an associate editor of IEEE Transactions on Knowledge and Data Engineering (2004-2008) and the VLDB Journal (2007-2013). Currently he serves as an associate editor of the WWW Journal, the International Journal of Cooperative Information Systems, the Journal of Information Processing, and the Journal on Health Information Science and Systems. Jeffrey Xu Yu is a member of ACM, a senior member of IEEE, and a member of the IEEE Computer Society.