Commit aaaa26f

Use native implementation of LZMA and XZ (#405)
* deps: Use native lzma-rust2 instead of liblzma

  This is a native port of XZ's liblzma. The main advantage is that the crate is native Rust and is also used by the 7z crate.

  Performance should be equal to the liblzma crate, since I spent quite a bit of time improving it.

  I only used the single-threaded version of the reader/writer, since parallelization was not enabled for liblzma. lzma-rust2 does have a multithreaded reader/writer, though.

  I had to remove the old bug report fix, since lzma_rust2 doesn't have the reported behavior. The test case was also obviously AI-generated and useless.

* fix: Properly implement LZMA decoding

  The old implementation did not actually work; the test file uses STORE as its compression method. This change makes sure that LZMA is properly implemented.

  I remembered that I had to defer reading the properties when implementing PPMd, so I did the same here.

* fix: Fix linter issue
* fix: Formatting of README.md
* fix: Reduce IO reads
* fix: Move XZ decoder and encoder to the heap
* fix: Fix code review issues
* fix: Fix UnwindSafe issue by using latest lzma-rust2 version
* Rephrase a comment in src/compression.rs

  Signed-off-by: Chris Hennick <[email protected]>

---------

Signed-off-by: Chris Hennick <[email protected]>
Co-authored-by: Chris Hennick <[email protected]>
1 parent 56fa5f6 commit aaaa26f
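
The commit message describes deferring the read of the LZMA properties until data is first requested. The following is a minimal sketch of that pattern using only the standard library. `LazyReader` and `HEADER_LEN` are hypothetical names, and the payload is passed through unchanged rather than decoded, so this illustrates the state transition only, not the crate's implementation.

```rust
use std::io::{self, Read};

/// Hypothetical header length for this sketch (not taken from the commit).
const HEADER_LEN: usize = 4;

/// A reader that defers consuming its header until the first `read` call.
enum LazyReader<R: Read> {
    /// Nothing has been read yet, so construction cannot fail.
    Uninitialized(Option<R>),
    /// The header has been consumed; `inner` now yields the payload.
    Initialized { header: [u8; HEADER_LEN], inner: R },
}

impl<R: Read> LazyReader<R> {
    fn new(reader: R) -> Self {
        LazyReader::Uninitialized(Some(reader))
    }

    /// Returns the header once the first read has consumed it.
    fn header(&self) -> Option<&[u8; HEADER_LEN]> {
        match self {
            LazyReader::Uninitialized(_) => None,
            LazyReader::Initialized { header, .. } => Some(header),
        }
    }
}

impl<R: Read> Read for LazyReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            LazyReader::Uninitialized(reader) => {
                // Take ownership of the underlying reader out of the Option.
                let mut inner = reader
                    .take()
                    .ok_or_else(|| io::Error::other("reader was not set"))?;
                // Consume the header now that data is actually requested.
                let mut header = [0u8; HEADER_LEN];
                inner.read_exact(&mut header)?;
                let n = inner.read(buf)?;
                // Switch state so later reads skip the header handling.
                *self = LazyReader::Initialized { header, inner };
                Ok(n)
            }
            LazyReader::Initialized { inner, .. } => inner.read(buf),
        }
    }
}
```

One consequence of this design is that a malformed header surfaces as an I/O error on the first `read` rather than at construction time.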

10 files changed, +134 -71 lines changed

Cargo.toml

Lines changed: 6 additions & 6 deletions
@@ -49,7 +49,7 @@ zeroize = { version = "1.8", optional = true, features = ["zeroize_derive"] }
 zstd = { version = "0.13", optional = true, default-features = false }
 zopfli = { version = "0.8", optional = true }
 deflate64 = { version = "0.1.9", optional = true }
-liblzma = { version = "0.4.1", optional = true }
+lzma-rust2 = { version = "0.13", optional = true, default-features = false, features = ["std", "encoder", "optimization", "xz"] }

 [target.'cfg(fuzzing)'.dependencies]
 arbitrary = { version = "1.4.1", features = ["derive"] }
@@ -78,18 +78,18 @@ deflate-flate2-zlib = ["deflate-flate2", "flate2/zlib"]
 deflate-zopfli = ["dep:zopfli", "_deflate-any"]
 jiff-02 = ["dep:jiff"]
 nt-time = ["dep:nt-time"]
-lzma = ["dep:liblzma"]
-lzma-static = ["lzma", "liblzma/static"]
+lzma = ["dep:lzma-rust2"]
+lzma-static = ["lzma"]
 ppmd = ["dep:ppmd-rust"]
 unreserved = []
-xz = ["dep:liblzma"]
-xz-static = ["xz", "liblzma/static"]
+xz = ["dep:lzma-rust2"]
+xz-static = ["lzma"]
 default = [
     "aes-crypto",
     "bzip2",
     "deflate64",
     "deflate",
-    "lzma-static",
+    "lzma",
     "ppmd",
     "time",
     "zstd",

README.md

Lines changed: 13 additions & 11 deletions
@@ -10,7 +10,7 @@ Info
 ----


-A zip library for rust which supports reading and writing of simple ZIP files. Formerly hosted at
+A zip library for rust which supports reading and writing of simple ZIP files. Formerly hosted at
 https://github.com/zip-rs/zip2.

 Supported compression formats:
@@ -21,7 +21,7 @@ Supported compression formats:
 * bzip2
 * zstd
 * lzma (decompression only)
-* xz (decompression only)
+* xz
 * ppmd

 Currently unsupported zip extensions:
@@ -35,8 +35,8 @@ The features available are:

 * `aes-crypto`: Enables decryption of files which were encrypted with AES. Supports AE-1 and AE-2 methods.
 * `deflate`: Enables compressing and decompressing an unspecified implementation (that may change in future versions) of
-  the deflate compression algorithm, which is the default for zip files. Supports compression quality 1..=264.
-* `deflate-flate2`: Combine this with any `flate2` feature flag that enables a back-end, to support deflate compression
+  the deflate compression algorithm, which is the default for zip files. Supports compression quality 1..=264.
+* `deflate-flate2`: Combine this with any `flate2` feature flag that enables a back-end, to support deflate compression
   at quality 1..=9.
 * `deflate-zopfli`: Enables deflating files with the `zopfli` library (used when compression quality is 10..=264). This
   is the most effective `deflate` implementation available, but also among the slowest.
@@ -48,9 +48,10 @@ The features available are:
 * `chrono`: Enables converting last-modified `zip::DateTime` to and from `chrono::NaiveDateTime`.
 * `jiff-02`: Enables converting last-modified `zip::DateTime` to and from `jiff::civil::DateTime`.
 * `nt-time`: Enables returning timestamps stored in the NTFS extra field as `nt_time::FileTime`.
+* `xz`: Enables the XZ compression algorithm.
 * `zstd`: Enables the Zstandard compression algorithm.

-By default `aes-crypto`, `bzip2`, `deflate`, `deflate64`, `lzma`, `ppmd`, `time` and `zstd` are enabled.
+By default `aes-crypto`, `bzip2`, `deflate`, `deflate64`, `lzma`, `ppmd`, `time`, `xz` and `zstd` are enabled.

 MSRV
 ----
@@ -65,12 +66,13 @@ Examples
 --------

 See the [examples directory](examples) for:
-* How to write a file to a zip.
-* How to write a directory of files to a zip (using [walkdir](https://github.com/BurntSushi/walkdir)).
-* How to extract a zip file.
-* How to extract a single file from a zip.
-* How to read a zip from the standard input.
-* How to append a directory to an existing archive
+
+* How to write a file to a zip.
+* How to write a directory of files to a zip (using [walkdir](https://github.com/BurntSushi/walkdir)).
+* How to extract a zip file.
+* How to extract a single file from a zip.
+* How to read a zip from the standard input.
+* How to append a directory to an existing archive

 Fuzzing
 -------

examples/write_dir.rs

Lines changed: 10 additions & 0 deletions
@@ -26,6 +26,7 @@ enum CompressionMethod {
     Stored,
     Deflated,
     Bzip2,
+    Xz,
     Zstd,
 }

@@ -57,6 +58,15 @@ fn real_main() -> i32 {
             #[cfg(feature = "bzip2")]
             zip::CompressionMethod::Bzip2
         }
+        CompressionMethod::Xz => {
+            #[cfg(not(feature = "xz"))]
+            {
+                println!("The `xz` feature is not enabled");
+                return 1;
+            }
+            #[cfg(feature = "xz")]
+            zip::CompressionMethod::Xz
+        }
         CompressionMethod::Zstd => {
             #[cfg(not(feature = "zstd"))]
             {

src/compression.rs

Lines changed: 74 additions & 13 deletions
@@ -214,13 +214,22 @@ pub(crate) enum Decompressor<R: io::BufRead> {
     #[cfg(feature = "zstd")]
     Zstd(zstd::Decoder<'static, R>),
     #[cfg(feature = "lzma")]
-    Lzma(liblzma::bufread::XzDecoder<R>),
+    Lzma(Lzma<R>),
     #[cfg(feature = "xz")]
-    Xz(liblzma::bufread::XzDecoder<R>),
+    Xz(Box<lzma_rust2::XzReader<R>>),
     #[cfg(feature = "ppmd")]
     Ppmd(Ppmd<R>),
 }

+#[cfg(feature = "lzma")]
+pub(crate) enum Lzma<R: io::BufRead> {
+    Uninitialized {
+        reader: Option<R>,
+        uncompressed_size: u64,
+    },
+    Initialized(Box<lzma_rust2::LzmaReader<R>>),
+}
+
 #[cfg(feature = "ppmd")]
 pub(crate) enum Ppmd<R: io::BufRead> {
     Uninitialized(Option<R>),
@@ -240,7 +249,50 @@ impl<R: io::BufRead> io::Read for Decompressor<R> {
             #[cfg(feature = "zstd")]
             Decompressor::Zstd(r) => r.read(buf),
             #[cfg(feature = "lzma")]
-            Decompressor::Lzma(r) => r.read(buf),
+            Decompressor::Lzma(r) => match r {
+                Lzma::Uninitialized {
+                    reader,
+                    uncompressed_size,
+                } => {
+                    let mut reader = reader.take().ok_or_else(|| {
+                        io::Error::other("Reader was not set while reading LZMA data")
+                    })?;
+
+                    // 5.8.8.1 LZMA Version Information & 5.8.8.2 LZMA Properties Size
+                    let mut header = [0; 4];
+                    reader.read_exact(&mut header)?;
+                    let _version_information = u16::from_le_bytes(header[0..2].try_into().unwrap());
+                    let properties_size = u16::from_le_bytes(header[2..4].try_into().unwrap());
+                    if properties_size != 5 {
+                        return Err(io::Error::new(
+                            io::ErrorKind::InvalidInput,
+                            format!("unexpected LZMA properties size of {properties_size}"),
+                        ));
+                    }
+
+                    let mut props_data = [0; 5];
+                    reader.read_exact(&mut props_data)?;
+                    let props = props_data[0];
+                    let dict_size = u32::from_le_bytes(props_data[1..5].try_into().unwrap());
+
+                    // We don't need to handle the end-of-stream marker here, since the LZMA reader
+                    // stops at the end-of-stream marker OR when it has decoded uncompressed_size bytes, whichever comes first.
+                    let mut decompressor = lzma_rust2::LzmaReader::new_with_props(
+                        reader,
+                        *uncompressed_size,
+                        props,
+                        dict_size,
+                        None,
+                    )?;
+
+                    let read = decompressor.read(buf)?;
+
+                    *r = Lzma::Initialized(Box::new(decompressor));
+
+                    Ok(read)
+                }
+                Lzma::Initialized(decompressor) => decompressor.read(buf),
+            },
             #[cfg(feature = "xz")]
             Decompressor::Xz(r) => r.read(buf),
             #[cfg(feature = "ppmd")]
@@ -291,7 +343,12 @@ impl<R: io::BufRead> io::Read for Decompressor<R> {
 }

 impl<R: io::BufRead> Decompressor<R> {
-    pub fn new(reader: R, compression_method: CompressionMethod) -> crate::result::ZipResult<Self> {
+    pub fn new(
+        reader: R,
+        compression_method: CompressionMethod,
+        #[cfg(feature = "lzma")] uncompressed_size: u64,
+        #[cfg(not(feature = "lzma"))] _uncompressed_size: u64,
+    ) -> crate::result::ZipResult<Self> {
         Ok(match compression_method {
             CompressionMethod::Stored => Decompressor::Stored(reader),
             #[cfg(feature = "deflate-flate2")]
@@ -307,15 +364,14 @@ impl<R: io::BufRead> Decompressor<R> {
             #[cfg(feature = "zstd")]
             CompressionMethod::Zstd => Decompressor::Zstd(zstd::Decoder::with_buffer(reader)?),
             #[cfg(feature = "lzma")]
-            CompressionMethod::Lzma => Decompressor::Lzma(liblzma::bufread::XzDecoder::new_stream(
-                reader,
-                // Use u64::MAX for unlimited memory usage, matching the previous behavior
-                // from lzma-rs. Using 0 would set the smallest memory limit, which is
-                // problematic in ancient liblzma versions (5.2.3 and earlier).
-                liblzma::stream::Stream::new_lzma_decoder(u64::MAX).unwrap(),
-            )),
+            CompressionMethod::Lzma => Decompressor::Lzma(Lzma::Uninitialized {
+                reader: Some(reader),
+                uncompressed_size,
+            }),
             #[cfg(feature = "xz")]
-            CompressionMethod::Xz => Decompressor::Xz(liblzma::bufread::XzDecoder::new(reader)),
+            CompressionMethod::Xz => {
+                Decompressor::Xz(Box::new(lzma_rust2::XzReader::new(reader, false)))
+            }
             #[cfg(feature = "ppmd")]
             CompressionMethod::Ppmd => Decompressor::Ppmd(Ppmd::Uninitialized(Some(reader))),
             _ => {
@@ -340,7 +396,12 @@ impl<R: io::BufRead> Decompressor<R> {
             #[cfg(feature = "zstd")]
             Decompressor::Zstd(r) => r.finish(),
             #[cfg(feature = "lzma")]
-            Decompressor::Lzma(r) => r.into_inner(),
+            Decompressor::Lzma(r) => match r {
+                Lzma::Uninitialized { mut reader, .. } => reader
+                    .take()
+                    .ok_or_else(|| io::Error::other("Reader was not set"))?,
+                Lzma::Initialized(decoder) => decoder.into_inner(),
+            },
             #[cfg(feature = "xz")]
             Decompressor::Xz(r) => r.into_inner(),
             #[cfg(feature = "ppmd")]
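
The LZMA branch in this diff reads a 4-byte header (version information and properties size) followed by 5 bytes of properties before constructing the decoder. The following standalone sketch covers that parsing step using only the standard library. `ZipLzmaHeader`, `read_zip_lzma_header`, and `lc_lp_pb` are hypothetical names. The `lc`/`lp`/`pb` split uses the usual LZMA packing `props = (pb * 5 + lp) * 9 + lc`, which is an addition here and not something the commit does itself.

```rust
use std::io::{self, Read};

/// Fields of the header that precedes LZMA data inside a ZIP entry.
#[derive(Debug, PartialEq, Eq)]
struct ZipLzmaHeader {
    version: u16,
    props: u8,
    dict_size: u32,
}

impl ZipLzmaHeader {
    /// Splits the packed properties byte into (lc, lp, pb).
    /// No range validation is performed in this sketch.
    fn lc_lp_pb(&self) -> (u8, u8, u8) {
        let lc = self.props % 9;
        let rest = self.props / 9;
        (lc, rest % 5, rest / 5)
    }
}

/// Reads the 9 header bytes and leaves `reader` positioned at the LZMA data.
fn read_zip_lzma_header<R: Read>(reader: &mut R) -> io::Result<ZipLzmaHeader> {
    // 2 bytes of version information, then the 2-byte properties size.
    let mut header = [0u8; 4];
    reader.read_exact(&mut header)?;
    let version = u16::from_le_bytes([header[0], header[1]]);
    let properties_size = u16::from_le_bytes([header[2], header[3]]);
    if properties_size != 5 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("unexpected LZMA properties size of {properties_size}"),
        ));
    }

    // 1 packed properties byte, then the little-endian dictionary size.
    let mut props_data = [0u8; 5];
    reader.read_exact(&mut props_data)?;
    Ok(ZipLzmaHeader {
        version,
        props: props_data[0],
        dict_size: u32::from_le_bytes([
            props_data[1],
            props_data[2],
            props_data[3],
            props_data[4],
        ]),
    })
}
```

For the common properties byte `0x5D`, this yields `lc = 3`, `lp = 0`, `pb = 2`.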

src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 //! | Bzip2 | ✅ | ✅ |
 //! | ZStandard | ✅ | ✅ |
 //! | LZMA | ✅ | |
-//! | XZ | ✅ | |
+//! | XZ | ✅ | |
 //! | PPMd | ✅ | ✅ |
 //! | AES encryption | ✅ | ✅ |
 //! | ZipCrypto deprecated encryption | ✅ | ✅ |

src/read.rs

Lines changed: 19 additions & 5 deletions
@@ -426,13 +426,18 @@ pub(crate) fn make_crypto_reader<'a, R: Read>(

 pub(crate) fn make_reader<R: Read>(
     compression_method: CompressionMethod,
+    uncompressed_size: u64,
     crc32: u32,
     reader: CryptoReader<R>,
 ) -> ZipResult<ZipFileReader<R>> {
     let ae2_encrypted = reader.is_ae2_encrypted();

     Ok(ZipFileReader::Compressed(Box::new(Crc32Reader::new(
-        Decompressor::new(io::BufReader::new(reader), compression_method)?,
+        Decompressor::new(
+            io::BufReader::new(reader),
+            compression_method,
+            uncompressed_size,
+        )?,
         crc32,
         ae2_encrypted,
     ))))
@@ -1134,7 +1139,12 @@ impl<R: Read + Seek> ZipArchive<R> {

         Ok(ZipFile {
             data: Cow::Borrowed(data),
-            reader: make_reader(data.compression_method, data.crc32, crypto_reader)?,
+            reader: make_reader(
+                data.compression_method,
+                data.uncompressed_size,
+                data.crc32,
+                crypto_reader,
+            )?,
         })
     }

@@ -1900,13 +1910,17 @@ pub fn read_zipfile_from_stream<R: Read>(reader: &mut R) -> ZipResult<Option<Zip

     let limit_reader = reader.take(result.compressed_size);

-    let result_crc32 = result.crc32;
-    let result_compression_method = result.compression_method;
     let crypto_reader = make_crypto_reader(&result, limit_reader, None, None)?;
+    let ZipFileData {
+        crc32,
+        uncompressed_size,
+        compression_method,
+        ..
+    } = result;

     Ok(Some(ZipFile {
         data: Cow::Owned(result),
-        reader: make_reader(result_compression_method, result_crc32, crypto_reader)?,
+        reader: make_reader(compression_method, uncompressed_size, crc32, crypto_reader)?,
     }))
 }
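
The last hunk above replaces two temporary variables with a destructuring `let` and then still moves `result` into `Cow::Owned`. That compiles because binding `Copy` fields by value copies them and leaves the struct intact. A small sketch of the same idea follows, with hypothetical stand-ins (`EntryData` for the metadata struct, `describe_reader` for `make_reader`) and an assumed `u16` compression method.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the entry metadata; only a few fields are modeled.
#[derive(Clone, Debug)]
struct EntryData {
    name: String,
    crc32: u32,
    uncompressed_size: u64,
    compression_method: u16,
}

/// Hypothetical stand-in for `make_reader`: it only records its arguments.
fn describe_reader(compression_method: u16, uncompressed_size: u64, crc32: u32) -> String {
    format!("method={compression_method} size={uncompressed_size} crc32={crc32:#010x}")
}

fn build(entry: EntryData) -> (Cow<'static, EntryData>, String) {
    // Binding `Copy` fields by value copies them, so `entry` is not moved here.
    let EntryData {
        crc32,
        uncompressed_size,
        compression_method,
        ..
    } = entry;

    // `entry` is still whole (including the non-`Copy` `name`) and can be moved.
    let data = Cow::Owned(entry);
    (data, describe_reader(compression_method, uncompressed_size, crc32))
}
```

Had the pattern bound `name` by value as well, `entry` would be partially moved and the later `Cow::Owned(entry)` would no longer compile.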

src/write.rs

Lines changed: 8 additions & 4 deletions
@@ -98,7 +98,7 @@ enum GenericZipWriter<W: Write + Seek> {
     #[cfg(feature = "zstd")]
     Zstd(ZstdEncoder<'static, MaybeEncrypted<W>>),
     #[cfg(feature = "xz")]
-    Xz(liblzma::write::XzEncoder<MaybeEncrypted<W>>),
+    Xz(Box<lzma_rust2::XzWriter<MaybeEncrypted<W>>>),
     #[cfg(feature = "ppmd")]
     Ppmd(Box<ppmd_rust::Ppmd8Encoder<MaybeEncrypted<W>>>),
 }
@@ -121,7 +121,7 @@ impl<W: Write + Seek> Debug for GenericZipWriter<W> {
             #[cfg(feature = "zstd")]
             GenericZipWriter::Zstd(w) => f.write_fmt(format_args!("Zstd({:?})", w.get_ref())),
             #[cfg(feature = "xz")]
-            GenericZipWriter::Xz(w) => f.write_fmt(format_args!("Xz({:?})", w.get_ref())),
+            GenericZipWriter::Xz(w) => f.write_fmt(format_args!("Xz({:?})", w.inner())),
             #[cfg(feature = "ppmd")]
             GenericZipWriter::Ppmd(_) => f.write_fmt(format_args!("Ppmd8Encoder")),
         }
@@ -1801,8 +1801,12 @@ impl<W: Write + Seek> GenericZipWriter<W> {
                         .ok_or(UnsupportedArchive("Unsupported compression level"))?
                         as u32;
                     Ok(Box::new(move |bare| {
-                        Ok(GenericZipWriter::Xz(liblzma::write::XzEncoder::new(
-                            bare, level,
+                        Ok(GenericZipWriter::Xz(Box::new(
+                            lzma_rust2::XzWriter::new(
+                                bare,
+                                lzma_rust2::XzOptions::with_preset(level),
+                            )
+                            .map_err(ZipError::Io)?,
                         )))
                     }))
                 }
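
Both the reader and writer enums in this commit wrap the XZ types in `Box` ("Move XZ decoder and encoder to the heap"). The usual motivation for this is that an enum is as large as its largest variant, so one large inline codec inflates every value of the enum. The sketch below illustrates the effect with a hypothetical `BigCodec`. The actual sizes of `lzma_rust2::XzReader` and `XzWriter` are not stated in the commit, so the 4096-byte state is purely an assumption for demonstration.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a codec that keeps a large state inline.
#[allow(dead_code)]
struct BigCodec {
    state: [u8; 4096],
}

/// Every value of this enum is at least as large as `BigCodec`.
#[allow(dead_code)]
enum InlineWriter {
    Stored(u8),
    Big(BigCodec),
}

/// Boxing the large variant keeps the enum itself roughly pointer-sized.
#[allow(dead_code)]
enum BoxedWriter {
    Stored(u8),
    Big(Box<BigCodec>),
}

/// Returns (inline size, boxed size) in bytes for the two layouts.
fn sizes() -> (usize, usize) {
    (size_of::<InlineWriter>(), size_of::<BoxedWriter>())
}
```

The trade-off is one extra heap allocation and pointer indirection per boxed value, in exchange for cheaper moves of the enum and a smaller footprint for the other variants.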

tests/bug398.rs

Lines changed: 0 additions & 28 deletions
This file was deleted.

tests/data/lzma.zip

2.55 MB
Binary file not shown.

tests/lzma.rs

Lines changed: 3 additions & 3 deletions
@@ -10,12 +10,12 @@ fn decompress_lzma() {
     let mut archive = ZipArchive::new(io::Cursor::new(v)).expect("couldn't open test zip file");

     let mut file = archive
-        .by_name("hello.txt")
+        .by_name("binary.wmv")
         .expect("couldn't find file in archive");
-    assert_eq!("hello.txt", file.name());
+    assert_eq!("binary.wmv", file.name());

     let mut content = Vec::new();
     file.read_to_end(&mut content)
         .expect("couldn't read encrypted and compressed file");
-    assert_eq!("Hello world\n", String::from_utf8(content).unwrap());
+    assert_eq!(include_bytes!("data/folder/binary.wmv"), &content[..]);
 }
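
The commit message notes that the previous test archive stored its entry with STORE, so the test passed without exercising LZMA at all. One way to guard against that kind of fixture mistake is to check the compression method recorded in the entry's local file header. The sketch below is standard-library only. `first_entry_method` is a hypothetical helper, and it inspects only the first local header at offset 0 rather than walking the central directory.

```rust
/// Compression method IDs as defined by the ZIP specification.
const METHOD_STORED: u16 = 0;
const METHOD_LZMA: u16 = 14;

/// Reads the compression method of the first entry from its local file header.
fn first_entry_method(zip_bytes: &[u8]) -> Option<u16> {
    // Local file header layout: 4-byte signature "PK\x03\x04", version needed
    // (2 bytes), general purpose flags (2 bytes), then the compression method
    // (2 bytes, little-endian) at offset 8.
    if zip_bytes.len() < 10 || !zip_bytes.starts_with(b"PK\x03\x04") {
        return None;
    }
    Some(u16::from_le_bytes([zip_bytes[8], zip_bytes[9]]))
}

/// Returns true only if the first entry is really LZMA-compressed.
fn fixture_uses_lzma(zip_bytes: &[u8]) -> bool {
    first_entry_method(zip_bytes) == Some(METHOD_LZMA)
}
```

A test could call `fixture_uses_lzma` on the bytes of its archive before decompressing, so that an entry with `METHOD_STORED` fails loudly instead of passing by accident.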
