2014 ADC PhD School in Big Data
12-14 July 2014
St Leo’s College, St Lucia Campus, The University of Queensland, Brisbane

The 2014 ADC PhD School in Big Data is scheduled for 12-14 July, between the ACM SIGIR conference (Gold Coast) and the Australasian Database Conference (Brisbane).

The ADC PhD School is targeted at research students interested in exploring current research topics in the area of Big Data. It also provides a foundational overview of writing and research skills required to undertake a PhD, an industry-hosted technology session, and a student research poster session. Leading computer science professors will attend the PhD School to share strategies on how to approach PhD study, as well as provide concrete advice on the direction of individual delegate research.

The ADC PhD School aims to bring research students together with experienced national and international researchers to share and discuss their research, and to start developing ideas and concepts into research papers for submission to major international journals and conferences. The School will provide opportunities for students to receive high-quality feedback, and to exchange ideas and explore possible cooperation with different research groups.

Program:

The PhD School program runs for 3 days, commencing at 9am Saturday 12 July and closing at 5pm Monday 14 July 2014.
Follow this link to the final program.

Eligibility:

Current research students, as well as interested researchers and academic staff, are eligible to attend.

Confirmed Speakers and Topics:

  • 1- Big Data Integration - Dr Divesh Srivastava, AT&T Labs-Research, USA
  • Abstract: The Big Data era is upon us: data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of Big Data. BDI differs from traditional data integration in many dimensions: (i) the number of data sources, even for a single domain, has grown to be in the tens of thousands, (ii) many of the data sources are very dynamic, as a huge amount of newly collected data is continuously made available, (iii) the data sources are extremely heterogeneous in their structure, with considerable variety even for substantially similar entities, and (iv) the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This tutorial explores the progress that has been made by the data integration community on the topics of schema mapping, record linkage and data fusion in addressing these novel challenges faced by big data integration, and identifies a range of open problems for the community.

  • 2- Analysing Big Trajectory Data: Theory, Algorithms and Applications – Dr Kai Zheng, The University of Queensland
  • Abstract: The prevalence of GPS sensors and mobile devices has enabled tracking the movements of almost any kind of moving object, such as vehicles, humans and animals. As a result, in the past decade we have witnessed an unprecedented increase in trajectory data, both in volume and variety. With attributes such as variable length, uncontrolled quality, high redundancy and uncertainty, trajectory data challenge the traditional methodologies and practices in many research areas, including data storage and indexing, data mining and analytics, and information retrieval. Trajectory data management has been attracting considerable research interest from both academia and industry due to its tremendous value and benefits in a variety of critical applications such as traffic analysis, fleet management, trip planning and location-based recommendation. In this tutorial, we will talk about the challenges, techniques and open problems, with a focus on similarity-based analytics, the foundation of trajectory management, covering a range of topics from fundamental theory and algorithms to advanced applications.

  • 3- Boosting Methods in Machine Learning – Dr Chunhua Shen, University of Adelaide
  • Abstract: Many machine learning and data mining tasks favour fast and yet accurate classification methods. The classification speed is not only a matter of time-efficiency but is often crucial to achieving good accuracy. Standard kernel machines such as the Support Vector Machine (SVM) are slow, and methods for rapid classification have been pursued. Boosting classifiers have been successful owing to their fast computation and accuracy that is comparable to, or sometimes better than, that of kernel methods, and have become a standard method in many areas. Boosting, a representative ensemble learning method that aggregates simple weak learners, can be seen as a flat tree structure when each learner is a decision stump. When trees are used as weak learners, boosting methods learn a linearly weighted decision forest. We will overview the fundamental theory of boosting in the first part of this course.
    Recently, structured learning has found many applications in text analysis and computer vision. Thus far it has not been clear how one can train a boosting model that is directly optimised for predicting multivariate or structured outputs. To bridge this gap, inspired by structured support vector machines, a boosting algorithm for structured output prediction is introduced, which we refer to as StructBoost. StructBoost supports nonlinear structured learning by combining a set of weak structured learners. As structured SVM generalises SVM, StructBoost generalises standard boosting approaches such as AdaBoost or LPBoost to structured learning. The resulting optimization problem of StructBoost is more challenging than that of structured SVM in the sense that it may involve exponentially many variables and constraints. In contrast, for structured SVM one usually has an exponential number of constraints and a cutting-plane method is used. In order to solve StructBoost efficiently, we formulate an equivalent 1-slack formulation and solve it using a combination of cutting planes and column generation. We show the versatility and usefulness of StructBoost on a range of problems.

  • 4- Big Data Mining on SAP HANA - Dr Asadul Islam, SAP
  • Abstract: This talk will cover how the state-of-the-art in-memory computing technology SAP HANA makes Big Data mining a reality. The talk will cover how the theories in data mining and machine learning are implemented and used in SAP HANA. In particular, Dr. Asadul will share his extensive experience in various data mining scenarios, including predictive analysis on big business data, and discuss the hidden insights that emerge when applying complex algorithms to extremely large data.

  • 5- Statistical Methods for Mining Big Text Data - Prof Chengxiang Zhai, University of Illinois Urbana-Champaign, USA
  • Abstract: Text data, broadly including all kinds of natural language text produced by humans (e.g., web pages, social media, email messages, news articles, government documents, and scientific literature), have been growing dramatically in recent years. This creates great opportunities for applying computational methods to mine large amounts of text data to discover all kinds of useful knowledge, especially knowledge about people's opinions, preferences, and behavior. Because computers have difficulty understanding natural language precisely, scalable text mining algorithms tend to be based on statistical analysis and probabilistic reasoning. In this tutorial, I will systematically review the major statistical methods developed for mining text data, with a focus on probabilistic topic models for mining topics and topical patterns in text data, and statistical methods for integrating and analyzing scattered online opinions.

  • 6- Crowdsourcing over Big Data, are we there yet? – Dr Lei Chen, Hong Kong University of Science & Technology
  • Abstract: Recently, the popularity of crowdsourcing has brought a new opportunity to engage human intelligence in various data analysis tasks. Compared with computer systems, crowds are good at handling items with human-intrinsic values or features. Existing approaches develop sophisticated methods by utilizing the crowd as a new type of processor, a.k.a. HPU (Human Processing Unit). As a consequence, tasks executed on HPUs are called HPU-based tasks. Now that we are in the Big Data era, a natural question arises: how about crowdsourcing over Big Data, are we there yet? In this talk, I will first briefly review the history of crowdsourcing and discuss the key issues related to crowdsourcing. Then, I will demonstrate the power of crowdsourcing in solving the well-known and very hard data integration problem, schema matching, and discuss how to migrate the power of crowdsourcing to a social media platform whose users can serve as a huge reservoir of workers. Finally, I will highlight some research challenges in crowdsourcing over Big Data.

  • 7- Large Graph Processing – Prof Jeffrey Yu, Chinese University of Hong Kong
  • Abstract: Applications that need graph processing techniques to handle large graphs arise in many domains, including online social networks, biological networks, ontologies and transportation networks. In this talk, we will discuss some selected research topics on graph mining and graph query processing over large graphs. For graph mining, we will focus on ranking nodes in a large graph. We will discuss ranking over trust networks, random-walk domination, and diversified ranking. For ranking nodes over a trust network, we discuss how to take the trust score into consideration while ranking. For random-walk domination, we discuss the techniques for handling item placement in online social networks and ad placement in advertisement networks. For diversified ranking, we discuss how to find top-k nodes that match the user query and are very different from each other. For graph query processing, we will discuss top-k structural diversity search, finding the maximal cliques in massive networks, and I/O-efficient computing techniques that make a large directed graph small and simple. Other related topics may also be addressed in this talk.


Research Poster Session:

The ADC PhD School program will also include a Research Poster Session where delegates may discuss their research with other students, as well as with leading computer science professors who will provide concrete advice on the direction of individual student research.
Delegates are requested to display their current research in either an A2 or A3 size poster during the event. Either portrait or landscape orientation is acceptable. Please bring your research poster with you to the School.

Important Dates:

  • 21 April 2014: Final Program Available.
  • 28 April 2014: Registration Opens.
  • 12-14 July 2014: ADC PhD School.

Registration:

Prices below include GST.

Category                       Early (on/before 30 June)   Late (after 30 June)
Student – ADC + PhD School     $550                        $660
Student – PhD School only      $275                        $330
Academic – ADC + PhD School    $715                        $770
Academic – PhD School only     $330                        $385

  • PhD School only: Includes morning/afternoon teas, lunch and banquet.
  • PhD School + ADC Conference: Includes conference proceedings, morning/afternoon teas, lunch, reception and banquet.

Online registration is now open. Attendees may register using the following link: ONLINE REGISTRATION SYSTEM
For any registration-related issues, please send an email to the following address: bigk@itee.uq.edu.au


Accommodation and Travel:

Delegates are required to organize their own accommodation and travel.

Accommodation at St Leo’s College:

The PhD School will be held at St Leo’s College. Please contact St Leo’s directly regarding available accommodation:
Phone: +61 7 3878 0600
Fax: +61 7 3878 0620
Email: enquiries@stleos.uq.edu.au

Links to alternative accommodation below:
On Campus Accommodation
Off Campus Accommodation

PhD School Convenor:

Prof Heng Tao Shen
Data & Knowledge Engineering Research Group
School of Information Technology & Electrical Engineering
The University of Queensland | Brisbane QLD 4072
Tel: 07 3365 8359 | Fax: 07 3365 3248
Email: shenht@itee.uq.edu.au | Web: staff.itee.uq.edu.au/shenht

PhD School Administration:

Kathleen Williamson
Data & Knowledge Engineering Research Group
School of Information Technology & Electrical Engineering
The University of Queensland | Brisbane QLD 4072
Tel: 07 3365 1649 | Mobile: 0401 477 509 | Fax: 07 3365 3248
Email: bigk@itee.uq.edu.au | Web: itee.uq.edu.au/dke
Hours: Mon - Wed 8.00am - 4.30pm