G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

G10 integrates the host memory, GPU memory, and flash memory into a unified memory space, to scale the GPU memory capacity while enabling transparent data migrations. Based on this unified GPU memory and storage architecture, G10 utilizes compiler techniques to characterize the tensor behaviors in deep learning workloads and schedule data migrations in advance.

Learning to Drive Software-Defined SSDs

We present an automated learning-based SSD hardware configuration framework, named AutoBlox, that utilizes both supervised and unsupervised machine learning (ML) techniques to drive the tuning of hardware configurations for SSDs. AutoBlox automatically extracts the unique access patterns of a new workload using its block I/O traces, maps the workload to previous workloads for utilizing the learned experiences, and recommends an optimized SSD configuration based on the validated storage performance.

RackBlox: A Software-Defined Rack-Scale Storage System with Network/Storage Co-design

We co-design the SDN and SDF stack by re-defining the functions of their control plane and data plane, and splitting up them within a new architecture named RackBlox. RackBlox decouples the storage management functions of flash-based solid-state drives (SSDs), and allow the SDN to track and manage the states of SSDs in a rack. Therefore, we can enable the state sharing between SDN and SDF, and facilitate global storage resource management.

The First Workshop on Hot Topics in System Infrastructure

The Workshop on Hot Topics in System Infrastructure (HotInfra'23) provides a unique forum for cutting-edge research on system infrastructure and platforms. Researchers and engineers can share their recent research results and experiences and discuss new challenges and opportunities in building next-generation system infrastructures, such as AI infrastructure, software-defined data centers, and edge/cloud computing infrastructure. The topics span across the full system stack with a focus on the design and implementation of system infrastructures. Relevant topics include hardware architecture, operating systems, runtime systems, and emerging applications.

System Virtualization for Neural Processing Units

Modern cloud platforms have been employing hardware accelerators such as neural processing units (NPUs) to meet the increasing demand for computing resources for AI-based application services. However, due to the lack of system virtualization support, the current way of using NPUs in cloud platforms suffers from either low resource utilization or poor isolation between multi-tenant application services. In this paper, we investigate the system virtualization techniques for NPUs across the entire software and hardware stack, and present our NPU virtualization solution. We propose a flexible NPU abstraction named vNPU that allows fine-grained NPU virtualization and resource management.

V10: Hardware-Assisted NPU Multi-tenancy for Improved Resource Utilization and Fairness

We present V10, a hardware-assisted NPU multi-tenancy framework for improving resource utilization, while ensuring fairness for different ML services. We rethink the NPU architecture for supporting multi-tenancy. V10 employs an operator scheduler for enabling concurrent operator executions on the systolic array and the vector unit, and offers flexibility for enforcing different priority-based resource-sharing mechanisms. V10 also enables fine-grained operator preemption and lightweight context switch.

LeaFTL: A Learning-based Flash Translation Layer for Solid-State Drives

we present a learning-based flash translation layer (FTL), named LeaFTL, which learns the address mapping to tolerate dynamic data access patterns via linear regression at runtime. By grouping a large set of mapping entries into a learned segment, it significantly reduces the memory footprint of the address mapping table, which further benefits the data caching in SSD controllers.