Published: 2017-11-02 17:02
Title: Networking for Big Data: Traffic-aware Geo-distributed Big Data Analytics
Abstract: Big data, generated by everything around us at unprecedented velocity, volume, and variety, is changing the way we sense the world. Big data analytics has shown great potential for decision making, optimizing operations, preventing threats, and capitalizing on new sources of revenue in fields such as manufacturing, healthcare, insurance, and retail. To harness the power of big data, many research efforts have developed new data programming models, e.g., MapReduce, and enhanced the data processing infrastructure in terms of computation, storage, and networking. This talk covers recent research results that address the challenges of networking for big data. First, a traffic-aware aggregation architecture is studied for a single cluster. In the traditional MapReduce framework, the all-to-all forwarding of data from map tasks to reduce tasks generates a large amount of network traffic. The aggregation architecture is designed within the existing MapReduce framework, with the objective of minimizing data traffic during the shuffle phase. Second, for multiple clusters, a novel data-centric architecture is studied that relies on three key techniques: cross-cloud virtual clusters, data-centric job placement, and network-coding-based traffic routing. This design leads to an optimization framework whose objective is to minimize both the computation and the transmission cost of running a set of MapReduce jobs in geo-distributed clouds.
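To make the shuffle-traffic argument concrete, here is a minimal sketch of why aggregating map output before it crosses the network reduces the number of records forwarded to reduce tasks. It is not the architecture presented in the talk. It only illustrates the general idea with a word-count-style job, and it assumes the reduce function is a sum, so pairs with the same key can be merged early. The function names and the sample data are hypothetical.

```python
from collections import Counter


def records_without_aggregation(map_outputs):
    """Count records shuffled when every emitted (key, value) pair is forwarded."""
    return sum(len(pairs) for pairs in map_outputs)


def aggregate_locally(pairs):
    """Merge pairs sharing a key by summing values (assumes a sum-like reduce)."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def records_with_aggregation(map_outputs):
    """Count records shuffled when each map task's output is merged per key first."""
    return sum(len(aggregate_locally(pairs)) for pairs in map_outputs)


# Hypothetical output of two map tasks in a word-count job.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("c", 1)],
]

before = records_without_aggregation(map_outputs)
after = records_with_aggregation(map_outputs)
print(f"records shuffled: {before} without aggregation, {after} with aggregation")
```

The saving depends on how often keys repeat within the data being merged, which is why where the aggregation happens matters for the traffic that reaches the reduce tasks.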