This workshop announcement will be of interest to the CNI community. Note that as of today registration has not yet opened.
Clifford Lynch
Director, CNI
-----------------------------------------
Large Scale Networking (LSN) Workshop on Huge Data:
A Computing, Networking and Distributed Systems Perspective
April 13-14, 2020
Sponsored by the National Science Foundation (NSF)
Location: Chicago, IL, co-located with the FABRIC Community Visioning Workshop
Website: http://www.netlab.uky.edu/hugedata2020
There is an ever-increasing demand in science and engineering, and
arguably all areas of research, for the creation, analysis, archival and
sharing of extremely large data sets - often referred to as “huge data”.
For example, the first black hole image was produced from 5 petabytes of
data collected by the Event Horizon Telescope over a period of 7 days.
Scientific instruments such as confocal and multiphoton microscopes
generate huge images on the order of 10 GB each, and the total data
volume grows quickly as the number of images increases. The Large Hadron
Collider generates 2000 petabytes of data over a typical 12-hour run.
These data sets reside at the high end of the “big data” spectrum and
can include data sets that grow continuously without bound. They are
often collected from distributed devices (e.g., sensors), potentially
processed on-site or in distributed clouds, and may be intentionally
placed or duplicated at distributed sites for reliability, scalability
and/or availability. Data creation resulting from measurement,
generation, and transformation across distributed locations is stressing
the contemporary computing paradigm. Efficient processing, persistent
availability and timely delivery (especially over wide-area networks) of
huge data have become critically important to the success of scientific
research.
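To make the wide-area delivery challenge concrete, here is a
back-of-envelope calculation (a rough sketch; the link speeds are
assumed for illustration and are not figures from the announcement),
using the 5-petabyte EHT data set mentioned above:

    # Rough estimate: time to move a 5 PB data set over a wide-area link.
    # The link speeds are illustrative assumptions, not workshop figures.
    DATASET_BYTES = 5 * 10**15                      # 5 petabytes (decimal)

    for gbps in (10, 100, 400):                     # assumed link capacities
        seconds = DATASET_BYTES * 8 / (gbps * 10**9)
        print(f"{gbps:3d} Gbps -> {seconds / 86400:5.1f} days")

Even at 100 Gbps with no protocol overhead, the transfer takes roughly
4.6 days, which is why such data sets are sometimes still shipped on
physical media.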
While distributed systems and networking research has thoroughly
explored the fundamental challenges and solution space for a broad
spectrum of distributed computing models operating on large data sets,
the sheer size of the data in question today far exceeds what was
assumed in prior research. To date, the majority of computing systems
and applications operate based on a clear delineation between data
movement and data computation. Data is moved from one or more data
stores to a computing system, and then processed “locally” on that
system. This paradigm consumes significant storage capacity at each
computing system to hold the transferred data and the data generated by
the computation, as well as significant time for data transfer before
and after the computation. Looking forward, researchers have begun to
discuss the potential benefits of a completely new computing paradigm
that more efficiently supports “in situ” computation of extremely large
data at unprecedented scales across distributed computing systems
interconnected by high-speed networks, with high-performance data
transfer functions more closely integrated into software (e.g.,
operating systems) and hardware infrastructure than they have been to
date. Such a new paradigm has the potential to remove bottlenecks to
scientific discovery and engineering innovation through much faster,
more efficient, and more scalable computation across a globally
distributed, highly interconnected and vast collection of data and
computation infrastructure.
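The contrast between the two paradigms can be sketched as follows (a
minimal illustration; the interfaces and names here are hypothetical,
not drawn from any specific system):

    # Hypothetical sketch of the two paradigms described above.
    # "site" stands in for a remote data store/compute facility;
    # all names here are illustrative, not a real API.

    def transfer_then_compute(site, analysis):
        # Conventional paradigm: bulk-transfer the data, then compute
        # locally. Requires local storage for the whole data set and
        # pays the wide-area transfer cost up front.
        data = site.fetch_all()
        return analysis(data)

    def compute_in_situ(site, analysis):
        # "In situ" paradigm: ship the (small) computation to the data
        # and return only the (small) result over the network.
        return site.run(analysis)

The asymmetry is the point: analysis code and results are typically
orders of magnitude smaller than the raw data, so moving the computation
avoids both the transfer delay and the duplicated storage described
above.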
This workshop aims to bring together domain scientists, network and
systems researchers, and infrastructure providers to understand the
challenges and requirements of “huge data” science and engineering
research, and to explore new paradigms for processing, storing, and
transferring huge data. Topics of interest include, but are not limited
to:
● huge data applications, requirements and challenges
● challenges of designing and working with devices for huge data generation
● storage systems for huge data
● software systems and network protocols for huge data
● in-network computing/storage for huge data
● software-defined networking and infrastructure for huge data
● infrastructure support for huge data
● debugging and troubleshooting of huge data infrastructure
● AI/ML technologies for huge data
● measuring huge data transfer and computation
● scientific workflows for huge data
● access to (portions of) huge data sets
● protecting/securing (portions of) huge data sets
Organizing Committee
Kuang-Ching Wang, Clemson University
James Griffioen, University of Kentucky
Ronald Hutchins, University of Virginia
Zongming Fei, University of Kentucky
Acknowledgment: The workshop is supported in part by the National
Science Foundation (NSF) under grant CNS-1747856 and by the NITRD Large
Scale Networking (LSN) Interagency Working Group.