# dm-lz4crypt

**Repository Path**: mirrors_microsoft/dm-lz4crypt

## Basic Information

- **Project Name**: dm-lz4crypt
- **Description**: Dev-mapper target (driver) to compress/encrypt block data to facilitate better compression by cloud storage backends.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-20
- **Last Updated**: 2025-10-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# dm-lz4crypt

In cloud computing environments, VM disk images are often compressed and stored by the cloud storage backend service. But if the file systems on the disk image are encrypted by the guest (e.g., by dm-crypt), the storage backend cannot compress the data effectively, requiring substantially more cloud disk storage. Encrypted data has extremely high entropy and therefore compresses poorly. Worse, compressing encrypted data may actually make it slightly larger due to compression headers. This project introduces a device-mapper target that overcomes this problem, as described below.

Dm-lz4crypt is a device-mapper target for Linux that implements two-pass compression: the VM guest performs the first pass, and the cloud storage backend performs the second pass (see figure below). Two-pass compression is transparent to the storage backend; no changes are needed there to support this process.

![Diagram 1](doc/dm-lz4crypt-diagram1.png)

This driver sits below any file system driver and receives block-level read and write requests. For each block write, the driver attempts to compress a 4096-byte block into the allotted space. If successful, a new 4096-byte block is formed that contains (1) a compression header, (2) the compressed bytes, and (3) trailing zeros.
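The write path above can be sketched in user-space Python. This is a model of the idea, not the kernel driver: `zlib` stands in for LZ4 (which is not in the Python standard library), encryption is omitted, and the header layout, field sizes, and magic value are all assumptions, since the repository does not document them.

```python
import hashlib
import struct
import zlib

BLOCK_SIZE = 4096
MAGIC = 0x4C5A3443                 # hypothetical magic value; the real constant is undocumented
HEADER_FMT = "<II32sH"             # magic, CRC-32, SHA-256, compressed length (assumed layout)
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def write_block(plain: bytes) -> bytes:
    """First pass: try to fit a header plus compressed bytes into one
    4096-byte block, padding with trailing zeros; fall back to the
    original block if it does not fit (AES-XTS encryption omitted)."""
    assert len(plain) == BLOCK_SIZE
    comp = zlib.compress(plain)    # zlib stands in for LZ4
    if HEADER_SIZE + len(comp) >= BLOCK_SIZE:
        return plain               # incompressible: pass through unchanged
    # Assumed checksum coverage: CRC over magic + payload, SHA-256 over CRC + payload.
    crc = zlib.crc32(struct.pack("<I", MAGIC) + comp)
    sha = hashlib.sha256(struct.pack("<I", crc) + comp).digest()
    header = struct.pack(HEADER_FMT, MAGIC, crc, sha, len(comp))
    return header + comp + b"\x00" * (BLOCK_SIZE - HEADER_SIZE - len(comp))
```

Note that the output block is always exactly 4096 bytes: the first pass never changes the block size, it only rearranges the content so that the tail is zeros.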
The compression header and the compressed bytes are then encrypted using AES-XTS, leaving the trailing zeros untouched. The new block is passed down to the underlying device. Compressed blocks have the following format.

![Diagram 2](doc/dm-lz4crypt-diagram2.png)

If compression is unsuccessful, the block is encrypted and passed down to the underlying device as is.

When a block is read, the potential compression header is decrypted. If the magic number is correct, the block is considered "potentially compressed". If so, the encrypted portion is decrypted and the CRC is checked. The CRC covers the magic number through the end of the compressed content (the CRC may prove unnecessary and may later be removed from the design). If the CRC is correct, the block remains "potentially compressed", and the SHA-256 is checked next; it covers the CRC-32 through the end of the compressed content. If the SHA-256 is correct, the block has been positively identified as a compressed block. It is then decompressed into a new 4096-byte block and passed to the upper layer. If the block is not compressed, it is decrypted and passed as is to the upper layer.

The driver handles only the first pass of compression. The storage backend handles the second pass by compressing any trailing zeros it encounters (which compress by 99% on average). The storage backend behaves as it does today, attempting to compress entire regions that contain mixed strings of encrypted data and zero data. The overall compression rate approaches what could be attained by compressing the original plain-text data.

The driver has a simple design since it manages no auxiliary metadata. Block reads and writes are 1-to-1 mappings, so each read/write operation involves only one block. Accordingly, the theoretical performance of an optimized dm-lz4crypt driver should be comparable to dm-crypt.
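Under the same assumptions as before (zlib in place of LZ4, an assumed header layout and magic value, and decryption omitted), the read-side validation chain of magic → CRC → SHA-256 can be sketched as:

```python
import hashlib
import struct
import zlib

BLOCK_SIZE = 4096
MAGIC = 0x4C5A3443                 # hypothetical magic value
HEADER_FMT = "<II32sH"             # magic, CRC-32, SHA-256, compressed length (assumed layout)
HEADER_SIZE = struct.calcsize(HEADER_FMT)

def read_block(block: bytes) -> bytes:
    """Read path: check magic, then CRC, then SHA-256; any failed check
    means the block was never compressed, so it is passed through
    unchanged (AES-XTS decryption omitted)."""
    assert len(block) == BLOCK_SIZE
    magic, crc, sha, clen = struct.unpack_from(HEADER_FMT, block)
    if magic != MAGIC:
        return block               # no magic: not compressed
    comp = block[HEADER_SIZE:HEADER_SIZE + clen]
    if zlib.crc32(struct.pack("<I", MAGIC) + comp) != crc:
        return block               # CRC mismatch: magic was a coincidence
    if hashlib.sha256(struct.pack("<I", crc) + comp).digest() != sha:
        return block               # SHA-256 mismatch: still not compressed
    return zlib.decompress(comp)   # zlib stands in for LZ4
```

The escalating checks exist because a plain (uncompressed, encrypted) block could by chance begin with the magic number; each successive check makes a false positive astronomically less likely.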
The performance observed so far already appears to be in line with dm-crypt, although further performance work would be welcome. An advantage of first-pass compression is that it only makes the data compressible; it does not actually reduce the overall size of the disk. Instead, it leaves the realization of compression to the storage backend.
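This effect, where the first pass leaves the block size unchanged but lets the backend's second pass reclaim the space, can be illustrated in user space. Here `zlib` stands in for both the guest's LZ4 pass and the backend's compressor, and random bytes stand in for AES-XTS ciphertext (neither is in the Python standard library):

```python
import os
import zlib

BLOCK_SIZE = 4096
plain = (b"the quick brown fox jumps over the lazy dog\n" * 100)[:BLOCK_SIZE]

# First pass: compressed-then-"encrypted" payload plus trailing zeros.
payload = zlib.compress(plain)           # zlib stands in for LZ4
ciphertext = os.urandom(len(payload))    # random bytes stand in for AES-XTS output
block = ciphertext + b"\x00" * (BLOCK_SIZE - len(ciphertext))
assert len(block) == BLOCK_SIZE          # the first pass does not shrink the disk

# Second pass (storage backend): the trailing zeros compress away,
# so the region shrinks to roughly the size of the encrypted payload.
backend = zlib.compress(block)
assert len(backend) < BLOCK_SIZE // 4

# A fully encrypted block, by contrast, does not shrink at all.
assert len(zlib.compress(os.urandom(BLOCK_SIZE))) >= BLOCK_SIZE
```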