This workshop announcement will be of interest to the CNI community. Note that as of today registration has not yet opened.
Clifford Lynch
Director, CNI
-----------------------------------------
Large Scale Networking (LSN) Workshop on Huge Data:
A Computing, Networking and Distributed Systems Perspective
April 13-14, 2020
Sponsored by the National Science Foundation (NSF)
Location: Chicago, IL, co-located with the FABRIC Community Visioning Workshop
Website: http://www.netlab.uky.edu/hugedata2020
There is an ever-increasing demand in science and engineering, and
arguably all areas of research, for the creation, analysis, archival and
sharing of extremely large data sets - often referred to as “huge data”.
For example, the first black hole image was produced from 5 petabytes of
data collected by the Event Horizon Telescope over a period of 7 days.
Scientific instruments such as confocal and multiphoton microscopes
generate huge images on the order of 10 GB each, and the total data
volume grows quickly as the number of images increases. The Large Hadron
Collider generates 2000 petabytes of data over a typical 12-hour run.
These data sets reside at the high end of the “big data” spectrum and
can include data sets that grow continuously without bound. They are
often collected from distributed devices (e.g., sensors), potentially
processed on-site or in distributed clouds, and may be intentionally
placed or duplicated at distributed sites for reliability, scalability
and/or availability. Data creation resulting from measurement,
generation, and transformation across distributed locations is stressing
the contemporary computing paradigm. Efficient processing, persistent
availability and timely delivery (especially over wide-area networks) of
huge data have become critically important to the success of scientific
research.
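To make the wide-area delivery challenge concrete, here is a
back-of-envelope calculation (a rough sketch; the link speeds are
assumed for illustration and are not figures from the announcement),
using the 5-petabyte EHT data set mentioned above:

    # Rough estimate: time to move a 5 PB data set over a wide-area link.
    # The link speeds are illustrative assumptions, not workshop figures.
    DATASET_BYTES = 5 * 10**15                      # 5 petabytes (decimal)

    for gbps in (10, 100, 400):                     # assumed link capacities
        seconds = DATASET_BYTES * 8 / (gbps * 10**9)
        print(f"{gbps:3d} Gbps -> {seconds / 86400:5.1f} days")

Even at 100 Gbps with no protocol overhead, the transfer takes roughly
4.6 days, which is why such data sets are sometimes still shipped on
physical media.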
While distributed systems and networking research has thoroughly
explored the fundamental challenges and solution space for a broad
spectrum of distributed computing models operating on large data sets,
the sheer size of the data in question today far exceeds what was
assumed in prior research. To date, the majority of computing systems
and applications operate based on a clear delineation between data
movement and data computation. Data is moved from one or more data
stores to a computing system, and then processed “locally” on that
system. This paradigm consumes significant storage capacity at each
computing system to hold the transferred data and the data generated by
the computation, as well as significant time for data transfer before
and after the computation. Looking forward, researchers have begun to
discuss the potential benefits of a completely new computing paradigm
that more efficiently supports “in situ” computation of extremely large
data at unprecedented scales across distributed computing systems
interconnected by high-speed networks, with high-performance data
transfer functions more closely integrated into software (e.g.,
operating systems) and hardware infrastructure than they have been to
date. Such a new paradigm has the potential to remove bottlenecks to
scientific discovery and engineering innovation through much faster,
more efficient, and more scalable computation across a globally
distributed, highly interconnected and vast collection of data and
computation infrastructure.
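The contrast between the two paradigms can be sketched as follows (a
minimal illustration; the interfaces and names here are hypothetical,
not drawn from any specific system):

    # Hypothetical sketch of the two paradigms described above.
    # "site" stands in for a remote data store/compute facility;
    # all names here are illustrative, not a real API.

    def transfer_then_compute(site, analysis):
        # Conventional paradigm: bulk-transfer the data, then compute
        # locally. Requires local storage for the whole data set and
        # pays the wide-area transfer cost up front.
        data = site.fetch_all()
        return analysis(data)

    def compute_in_situ(site, analysis):
        # "In situ" paradigm: ship the (small) computation to the data
        # and return only the (small) result over the network.
        return site.run(analysis)

The asymmetry is the point: analysis code and results are typically
orders of magnitude smaller than the raw data, so moving the computation
avoids both the transfer delay and the duplicated storage described
above.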
This workshop aims to bring together domain scientists, network and
systems researchers, and infrastructure providers to understand the
challenges and requirements of “huge data” science and engineering
research, and to explore new paradigms for processing, storing, and
transferring huge data. Topics of interest include, but are not limited
to:
● huge data applications, requirements and challenges
● challenges of designing and working with devices for huge data generation
● storage systems for huge data
● software systems and network protocols for huge data
● in-network computing/storage for huge data
● software-defined networking and infrastructure for huge data
● infrastructure support for huge data
● debugging and troubleshooting of huge data infrastructure
● AI/ML technologies for huge data
● measuring huge data transfer and computation
● scientific workflows for huge data
● access to (portions of) huge data sets
● protecting/securing (portions of) huge data sets
Organizing Committee
Kuang-Ching Wang, Clemson University
James Griffioen, University of Kentucky
Ronald Hutchins, University of Virginia
Zongming Fei, University of Kentucky
Acknowledgment: The workshop is supported in part by the National
Science Foundation (NSF) under grant CNS-1747856 and by the NITRD Large
Scale Networking (LSN) Interagency Working Group.