Globus at SC13

November 17, 2013 at 2:30 PM CST – November 21, 2013 at 9:00 PM CST

 

SC is the year’s biggest gathering of the HPC community, and Globus is always a part of the action. If you’re going to be in Denver for this year’s conference, come and see us!

Here’s where to find Globus at SC13:

 

1. Globus booth in the Exhibit Hall

Date: Tuesday, Nov. 19 to Thursday, Nov. 21.

We’ll be in the trade show in Booth 437. Stop by to meet the Globus team and talk with us about your research data management issues. The exhibit hall is open 10 a.m. to 6 p.m. Tuesday and Wednesday, and 10 a.m. to 3 p.m. on Thursday.

 

2. Tutorial: Globus Online and the Science DMZ as Scalable Research Data Management Infrastructure for HPC Facilities

Date: Sunday, Nov. 17

Time: 8:30AM – 5:00PM

Presenters: Rajkumar Kettimuthu, Vas Vasiliadis, Steve Tuecke, Eli Dart

Abstract: The rapid growth of data in scientific research endeavors is placing massive demands on campus computing centers and high-performance computing (HPC) facilities. Computing facilities must provide robust data services built on high-performance infrastructure, while continuing to scale as needs increase. Traditional research data management (RDM) solutions are typically difficult to use and error-prone, and the underlying networking and security infrastructure is often complex and inflexible, resulting in user frustration and sub-optimal use of resources. An increasingly common solution in HPC facilities is Globus Online deployed in a network environment built on the Science DMZ model. Globus Online is software-as-a-service for moving, syncing, and sharing large data sets. The Science DMZ model is a set of design patterns for network equipment, configuration, and security policy for high-performance scientific infrastructure. The combination of user-friendly, high-performance data transfer tools and optimally configured underlying infrastructure results in enhanced RDM services that increase user productivity and lower support overhead. Guided by two case studies from national supercomputing centers (NERSC and NCSA), attendees will explore the challenges such facilities face in delivering scalable RDM solutions. Attendees will be introduced to Globus Online and the Science DMZ, and will learn how to deploy and manage these systems.

3. Birds-of-a-Feather session: Campus Bridging with XSEDE and Globus Online

Date: Tuesday, Nov. 19

Time: 5:30PM – 7:00PM

Session Leaders: Steve Tuecke, Rachana Ananthakrishnan

Room: 601/603

Abstract: As science becomes more computation- and data-intensive, computing needs often exceed campus capacity. Thus campuses desire to scale from the local environment to other campuses, to national cyberinfrastructure providers such as XSEDE, and/or to cloud providers. But given the realities of limited resources, time, and expertise, campus bridging methods must be exceedingly easy to use. This BOF will explore Globus Online's transfer and sharing tools in the XSEDE setting, which address the important campus bridging use case of moving, sharing, and synchronizing data across institutional boundaries, achieving ease of use for researchers and ease of administration for campus IT staff.

 

4. Paper: SDQuery DSI: Integrating Data Management Support with a Wide Area Data Transfer Protocol

Session: Data Management in the Cloud

Time: 11:00AM – 11:30AM

Session Chair: Erwin Laure

Authors: Yu Su, Yi Wang, Gagan Agrawal, Rajkumar Kettimuthu

Room: 205/207

Abstract: In many science areas where datasets need to be transferred or shared, rapid growth in dataset size, coupled with a lack of increase in wide area data transfer bandwidth, is making it extremely hard for scientists to analyze the data. This paper addresses the current limitations by developing SDQuery DSI, a GridFTP plug-in that supports flexible server-side data subsetting over HDF5 and NetCDF data formats. The GridFTP server is able to dynamically load this tool to download any data subset. Different query types (query over dimensions, coordinates and values) are supported by our tool. A number of optimizations for improving indexing (parallel indexing), data subsetting (performance model) and data transfer (parallel streaming) are also applied. We have extensively evaluated our implementation. We compared our GridFTP SDQuery DSI with the default GridFTP File DSI and showed that in different network environments, our method can achieve better efficiency in almost all cases.