IEEE Transactions on Computers - new TOC: TOC Alert for Publication# 12
- A Machine Learning-Empowered Cache Management Scheme for High-Performance SSDs (May 22, 2024 at 1:16 pm)
NAND flash-based solid-state drives (SSDs) have gained widespread usage in data storage thanks to their exceptional performance and low power consumption. The computational capability of SSDs has been elevated to tackle complex algorithms. Inside an SSD, a DRAM cache for frequently accessed requests reduces response time and write amplification (WA), thereby improving SSD performance and lifetime. Existing caching schemes based on temporal locality overlook variations in that locality, which can reduce cache hit rates. Some caching schemes improve performance via flash-aware techniques, but at the expense of the cache hit rate. To address these issues, we propose a random forest machine learning Classifier-empowered Cache scheme named CCache, in which I/O requests are classified as critical, intermediate, or non-critical according to their access status. After designing a machine learning model to predict these three types of requests, we implement a trie-level linked list to manage cache placement and replacement. CCache keeps critical requests in the cache to the greatest extent possible, while giving the highest eviction priority to data accessed by non-critical requests. CCache, which considers chip state when processing non-critical requests, is implemented in an SSD simulator (SSDSim). CCache outperforms the alternative caching schemes, including LRU, CFLRU, LCR, NCache, ML_WP, and CCache_ANN, in terms of response time, WA, erase count, and hit ratio. The performance discrepancy between CCache and the OPT scheme is marginal. For example, CCache reduces response time relative to the competing schemes by up to 41.9%, with an average of 16.1%. CCache reduces erase counts by a maximum of 67.4%, with an average of 21.3%. The performance gap between CCache and OPT is merely 2.0%-3.0%.
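The eviction priority described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ClassAwareCache`, the three label strings, and the use of one LRU list per class are assumptions made for illustration, and the label passed to `access` stands in for the output of the random forest classifier. Chip state is not modeled.

```python
from collections import OrderedDict

# Eviction order: lowest-priority class first (hypothetical label names).
EVICTION_ORDER = ("non_critical", "intermediate", "critical")


class ClassAwareCache:
    """Toy cache with one LRU list per request class (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lists = {label: OrderedDict() for label in EVICTION_ORDER}

    def __len__(self):
        return sum(len(lru) for lru in self.lists.values())

    def _find(self, page):
        """Return the class list currently holding the page, or None."""
        for label, lru in self.lists.items():
            if page in lru:
                return label
        return None

    def _evict(self):
        """Evict the least recently used page of the lowest non-empty class."""
        for label in EVICTION_ORDER:
            if self.lists[label]:
                victim, _ = self.lists[label].popitem(last=False)
                return victim
        return None

    def access(self, page, label):
        """Record an access with a predicted label; return True on a hit."""
        current = self._find(page)
        hit = current is not None
        if hit:
            del self.lists[current][page]  # re-inserted below under the new label
        elif len(self) >= self.capacity:
            self._evict()
        self.lists[label][page] = True  # most recently used sits at the end
        return hit
```

In this sketch a critical page is evicted only when no non-critical or intermediate page remains, which mirrors the stated priority but says nothing about how the paper's linked list is actually organized.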
- DPU-Direct: Unleashing Remote Accelerators via Enhanced RDMA for Disaggregated Datacenters (May 22, 2024 at 1:16 pm)
This paper presents DPU-Direct, an accelerator disaggregation system that connects accelerator nodes (ANs) and CPU nodes (CNs) over a standard Remote Direct Memory Access (RDMA) network. DPU-Direct eliminates the latency introduced by the CPU-based network stack and by the PCIe interconnects between network I/O and the accelerator. The DPU-Direct system architecture includes a DPU Wrapper hardware architecture, an RDMA-based Accelerator Access Pattern (RAAP), and a CN-side programming model. The DPU Wrapper connects accelerators directly with the RDMA engine, turning ANs into disaggregation-native devices. The RAAP provides the CN with low-latency, high-throughput accelerator semantics based on standard RDMA operations. Our FPGA prototype demonstrates DPU-Direct's efficacy with two proof-of-concept applications: AES encryption and key-value cache, which are computationally intensive and latency-sensitive. DPU-Direct yields a 400x speedup in AES encryption over the CPU baseline and matches the performance of the locally integrated AES accelerator. For key-value cache, DPU-Direct reduces the average end-to-end latency by 1.66x for GETs and 1.30x for SETs over the CPU-RDMA-Polling baseline, and reduces latency jitter by over 10x for both operations.
- BSR-FL: An Efficient Byzantine-Robust Privacy-Preserving Federated Learning Framework (May 22, 2024 at 1:16 pm)
Federated learning (FL) is a technique that enables clients to collaboratively train a model by sharing local models instead of raw private data. However, existing reconstruction attacks can recover sensitive training samples from the shared models, and emerging poisoning attacks pose severe threats to the security of FL. Meanwhile, most existing Byzantine-robust privacy-preserving federated learning solutions either reduce the accuracy of aggregated models or introduce significant computation and communication overheads. In this paper, we propose a novel Blockchain-based Secure and Robust Federated Learning (BSR-FL) framework to mitigate reconstruction attacks and poisoning attacks. BSR-FL avoids accuracy loss while ensuring efficient privacy protection and Byzantine robustness. Specifically, we first construct a lightweight non-interactive functional encryption (NIFE) scheme to protect the privacy of local models while maintaining high communication performance. Then, we propose a privacy-preserving defensive aggregation strategy based on NIFE, which resists encrypted poisoning attacks without compromising model privacy through secure cosine similarity and incentive-based Byzantine-tolerance aggregation. Finally, we use the blockchain system to facilitate the federated learning process and the implementation of the protocols. Extensive theoretical analysis and experiments demonstrate that BSR-FL achieves enhanced privacy protection, robustness, and high efficiency.
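To make the idea of cosine-similarity-based defensive aggregation concrete, here is a plaintext Python sketch. It is a stand-in, not BSR-FL itself: there is no encryption, no NIFE, no incentive mechanism, and no blockchain. The function names, the `reference` direction (for example, an update the aggregator trusts), and the rule of clipping negative similarities to zero are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def robust_aggregate(updates, reference):
    """Average client updates, weighted by clipped similarity to a reference.

    Updates pointing away from the reference (similarity <= 0) get weight 0.
    This weighting rule is an assumption, not the scheme from the abstract.
    """
    weights = [max(0.0, cosine_similarity(u, reference)) for u in updates]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(reference)
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(len(reference))
    ]


# Two benign updates and one update pointing in the opposite direction.
updates = [[1.0, 0.0], [1.0, 0.0], [-5.0, 0.0]]
aggregated = robust_aggregate(updates, reference=[1.0, 0.0])
```

Here the third update has similarity -1 to the reference, so it receives zero weight and the aggregate equals the benign updates. The framework in the abstract performs the similarity check over encrypted models, which this sketch does not attempt.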
- BlockCompass: A Benchmarking Platform for Blockchain Performance (May 22, 2024 at 1:16 pm)
Blockchain technology has gained momentum due to its immutability and transparency. Several blockchain platforms, each with different consensus protocols, have been proposed. However, choosing and configuring such a platform is a non-trivial task. Numerous benchmarking tools have been introduced to test the performance of blockchain solutions. Yet, these tools are often limited to specific blockchain platforms or require complex configurations. Moreover, they tend to focus on one-off batch evaluation models, which may not be ideal for longer-running instances under continuous workloads. In this work, we present BlockCompass, an all-inclusive blockchain benchmarking tool that can be easily configured and extended. We demonstrate how BlockCompass can evaluate the performance of various blockchain platforms and configurations, including Ethereum Proof-of-Authority, Ethereum Proof-of-Work, Hyperledger Fabric Raft, Hyperledger Sawtooth with Proof-of-Elapsed-Time, Practical Byzantine Fault Tolerance, and Raft consensus algorithms, against workloads that continuously fluctuate over time. We show how continuous transactional workloads may be more appropriate than batch workloads in capturing certain stressful events for the system. Finally, we present the results of a usability study about the convenience and effectiveness offered by BlockCompass in blockchain benchmarking.
- Relieving Write Disturbance for Phase Change Memory With RESET-Aware Data Encoding (May 21, 2024 at 1:16 pm)
The write disturbance (WD) problem is becoming increasingly severe in phase change memory (PCM) due to the continuous scaling down of memory technology. Previous studies have attempted to transform WD-vulnerable data patterns in the new data to alleviate the WD problem. However, through a wide spectrum of real-world benchmarks, we have discovered that simply transforming WD-vulnerable data patterns does not proportionally reduce (and may even increase) WD errors. To address this issue, we present ResEnc, a RESET-aware data encoding scheme that reduces RESET operations to mitigate the WD problem in both wordlines and bitlines for PCM. It dynamically establishes a mask word for each block for data encoding and adaptively selects an appropriate encoding granularity based on the diverse write patterns. Finally, ResEnc reassigns the mask words of unchanged blocks to changed blocks to further reduce WD errors. Extensive experiments show that ResEnc can reduce WD errors by 16.8-87.0%, shorten write latency by 5.6-39.6%, and save 7.0-43.1% of write energy for PCM.
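The general idea of choosing an encoding that minimizes RESET operations can be sketched in Python as follows. This is a simplified stand-in in the spirit of Flip-N-Write, not the ResEnc algorithm: the abstract does not say how mask words are built or how granularity is chosen. The 8-bit word width, the candidate mask set, the function names, and the convention that RESET programs a cell to 0 are all assumptions.

```python
WORD_BITS = 8  # toy word width (assumption)
WORD_MASK = (1 << WORD_BITS) - 1


def reset_count(old, new):
    """Count cells that go 1 -> 0, assuming RESET programs a cell to 0."""
    return bin(old & ~new & WORD_MASK).count("1")


def encode_word(old_stored, data, candidate_masks):
    """Pick the mask whose XOR-encoded word needs the fewest RESETs.

    Returns (mask, encoded_word); the mask must be kept to decode later.
    """
    best_mask = min(
        candidate_masks,
        key=lambda m: reset_count(old_stored, (data ^ m) & WORD_MASK),
    )
    return best_mask, (data ^ best_mask) & WORD_MASK


def decode_word(stored, mask):
    """Recover the original data from the stored word and its mask."""
    return (stored ^ mask) & WORD_MASK


# Old cell contents 11110000, new data 00001111: writing the data as-is
# would RESET four cells, while the inverted form needs none.
mask, stored = encode_word(0b11110000, 0b00001111, [0x00, 0xFF])
```

With these inputs the all-ones mask is selected and the stored word equals the old contents, so no cell is RESET. A real scheme would also account for the cost of storing the mask and for disturbance between neighboring cells, which this sketch ignores.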