Conference Program:

The ADC-14 program-at-a-glance is available. Detailed program below:

Monday 14 July - SEMINARS
Location: St Leo’s College, The University of Queensland


8 – 9am: Welcome Tea

9am – 12.30pm
Seminar 1 [3 hrs]: Statistical Methods for Mining Big Text Data
ChengXiang Zhai, University of Illinois at Urbana-Champaign, USA

Text data, broadly including all kinds of natural language text produced by humans (e.g., web pages, social media, email messages, news articles, government documents, and scientific literature), have been growing dramatically in recent years. This creates great opportunities for applying computational methods to mine large amounts of text data and discover all kinds of useful knowledge, especially knowledge about people's opinions, preferences, and behavior. Because it is difficult for computers to understand natural language precisely, scalable text mining algorithms tend to be based on statistical analysis and probabilistic reasoning. In this tutorial, I will systematically review the major statistical methods developed for mining text data, with a focus on probabilistic topic models for mining topics and topical patterns in text data, and statistical methods for integrating and analyzing scattered online opinions.

1.30 – 3pm
Seminar 2 [1.5 hrs]: Crowd Sourcing Over Big Data, Are We There Yet?
Lei Chen, Hong Kong University of Science & Technology

Recently, the popularity of crowdsourcing has brought new opportunities to engage human intelligence in various data analysis tasks. Compared with computer systems, crowds are good at handling items with human-intrinsic values or features. Existing approaches develop sophisticated methods by using the crowd as a new type of processor, the HPU (Human Processing Unit), and tasks executed on HPUs are called HPU-based tasks. Now that we are in the Big Data era, a natural question arises: how about crowdsourcing over Big Data; are we there yet? In this talk, I will first briefly review the history of crowdsourcing and discuss its key issues. Then, I will demonstrate the power of crowdsourcing in solving schema matching, a well-known and very hard data integration problem, and discuss how to bring the power of crowdsourcing to a social media platform whose users can serve as a huge reservoir of workers. Finally, I will highlight some research challenges in crowdsourcing over Big Data.

3.30 – 5pm
Seminar 3 [1.5 hrs]: Large Graph Processing
Jeffrey Xu Yu, Chinese University of Hong Kong

Techniques for processing large graphs are needed in many real applications, including online social networks, biological networks, ontologies and transportation networks. In this talk, we will discuss selected research topics in graph mining and graph query processing over large graphs. For graph mining, we will focus on ranking nodes in a large graph, covering ranking over trust networks, random-walk domination, and diversified ranking. For ranking nodes over trust networks, we discuss how to take trust scores into consideration when ranking. For random-walk domination, we discuss techniques for handling item placement in online social networks and ad placement in advertisement networks. For diversified ranking, we discuss how to find the top-k nodes that match a user query while being very different from each other. For graph query processing, we will discuss top-k structural diversity search, finding maximal cliques in massive networks, and I/O-efficient techniques that make a large directed graph small and simple. Other related topics may also be addressed in this talk.

5 – 7pm RECEPTION
When: 5 – 7pm Monday 14 July
Where: St Leo’s College, College Road, The University of Queensland, St Lucia Campus

Tuesday 15 July – CONFERENCE DAY 1
Location: St Leo’s College, The University of Queensland


8 – 8.30am: Welcome Tea
8.30 – 9am: Opening Address & Best Paper Awards
9 – 10.30am
Keynote 1: Data Ecosystems: From Very Large Data Bases to Big Data Infrastructure
Timos Sellis, RMIT University, Australia

Data ecosystems involve the coexistence of one or more data collections, typically databases, and their surrounding applications for data entry and retrieval. For decades, both data and ecosystem management have failed to address significant, costly and labor-consuming challenges which involve (a) the departure from databases focusing on alphanumeric data only, (b) their inability to be integrated and provide transparent access and composition facilities for heterogeneous data, (c) their static querying nature, which is deprived of personal, context-aware or interactive characteristics, (d) the enforcement of DBMS operation over monolithic servers, and, (e) the complete indifference to problems of evolution and adaptation over time. In this talk we address issues around the methodologies, the theoretical and modeling foundations as well as the algorithmic techniques and the necessary software architectures that will facilitate the personalization, integration, and evolution management facilities for data ecosystems that operate over a decentralized infrastructure for a large variety of data types.

RESEARCH PAPER PRESENTATIONS

*Denotes Short Papers
11am – 12.30pm
Session 1: Data Analytics

Dynamic Sorted Neighborhood Indexing for Real-Time Entity Resolution
Banda Ramadan, Peter Christen and Huizhi Liang

OSSM: the OLAP Security Specification Model
Ahmad Altamimi and Todd Eavis

Real-Time Exploration of Multimedia Collections *
Juraj Moško, Tomáš Skopal, Tomáš Bartoš and Jakub Lokoč

A Functional Database Representation of Large Sets of Objects *
John Pfaltz, Ratko Orlandic and Christopher Taylor

1.30 – 3pm
Session 2: Spatiotemporal Databases

Efficient Aggregate Farthest Neighbor Query Processing on Road Networks
Haozhou Wang, Kai Zheng, Han Su, Jiping Wang, Shazia Sadiq and Xiaofang Zhou

Efficiently Retrieving Top-k Trajectories by Locations via Traveling Time
Yuxing Han, Lijun Chang, Xuemin Lin and Liping Wang

A Study on the Applications of Emerging Sequential Patterns
Vincent Mwintieru Nofong, Jixue Liu and Jiuyong Li

3.30 – 5pm
Session 3: Data Mining Applications

Scalable Gaussian Process Regression for Prediction of Material Properties
Eve Belisle, Zi Huang and Aimen Gheribi

Discovering Collective Group Relationships
S. M. Masud Karim, Lin Liu and Jiuyong Li

XEdge: An Efficient Method for Returning Meaningful Clustered Results Using Multi-granularity Features for XML Keyword Search *
Wenxin Liang, Yuanyuan Gan and Xianchao Zhang

Mining the Association of Multiple Virtual Identities Based on Multi-Agent Interaction *
Le Li, Xiao Weidong, Junyi Xu, Changhua Dai and Ge Bin

6 – 10pm BANQUET
Bus departs St Leo’s College: 5.30pm
Banquet: 6 – 10pm Tuesday 15 July
Where: Summit Restaurant, Sir Samuel Griffith Drive, Brisbane Lookout, Mt Coot-tha
Bus departs Summit Restaurant: 10pm

Wednesday 16 July – CONFERENCE DAY 2
Location: St Leo’s College, The University of Queensland


9 – 10.30am
Keynote 2: Selecting Sources Wisely for Integration
Divesh Srivastava, AT&T Labs-Research, USA

Data integration is a challenging task due to the large number of autonomous data sources, which necessitates techniques to reason about the costs and benefits of acquiring and integrating data. Too many sources can result in a huge integration cost, and low-quality sources can reduce the benefit of integration. In this talk, Dr Srivastava presents the problem of source selection, that is, identifying, before integration, the subset of sources that maximizes the profit (benefit minus cost) of integration, for both static and dynamic sources. To address this problem, he proposes techniques that, inspired by the marginalism principle in economic theory, integrate a source only if its marginal benefit is higher than its marginal cost. He quantifies the integration benefit in terms of the quality of the integrated data, characterized by a set of data quality metrics including coverage, freshness and accuracy, and develops statistical models for estimating these metrics. Although source selection is NP-complete, he shows that for many practical cases solutions can be found in polynomial time with approximation guarantees. Finally, he empirically establishes the effectiveness and scalability of his techniques on real-world and synthetic data.

RESEARCH PAPER PRESENTATIONS

*Denotes Short Papers
11am – 12.30pm
Session 4: Social Networks

A Negative-Aware and Rating-Integrated Recommendation Algorithm Based on Bipartite Network Projection
Fengjing Yin, Xiang Zhao, Guangxin Zhou, Xin Zhang and Xuemin Lin

Sentiment Analysis on Twitter through Domain-specific Lexicon Expansion
Zhixin Zhou, Xiuzhen Zhang and Mark Sanderson

Semi-Supervised Learning for Detection of Cyberbullying in Social Networks
Vinita Nahar, Sanad Al-Maskari, Xue Li and Chaoyi Pang

1.30 – 3pm
Session 5: Data Processing on Modern Hardware

Efficient Subgraph Matching Using GPUs
Xiaojie Lin, Rui Zhang, Zeyi Wen, Hongzhi Wang and Jianzhong Qi

Comprehensive Analytics of Large Data Query Processing on Relational Database with SSDs
Keisuke Suzuki, Yuto Hayamizu, Daisaku Yokoyama, Miyuki Nakano and Masaru Kitsuregawa

Split Dictionaries for In-Memory Column Stores in Mixed Workload Environments *
David Schwalb, Markus Dreseler, Martin Faust, Johannes Wust and Hasso Plattner

An Effective Approach to Handling Noise and Drift in Electronic Noses *
Sanad Al-Maskari, Xue Li and Qihe Liu

3.30 – 5pm
Session 6: Data Mining Algorithms

Mining Differential Dependencies: A Subspace Clustering Approach
Selasi Kwashie, Jixue Liu, Jiuyong Li and Feiyue Ye

Fast Information-Theoretic Agglomerative Co-Clustering
Tiantian Gao and Leman Akoglu

Logics for Representing Data Mining Tasks in Inductive Databases *
Hong-Cheu Liu, Millist Vincent, Jixue Liu and Jiuyong Li