Hardwood

Source: https://hardwood.dev/1.0.0.Final/

_A lightweight Java reader for the Apache Parquet file format. Available as a Java library and a command-line tool._

Hardwood 1.0 is out!

Hardwood 1.0 is released — read the announcement blog post for the story behind the project and what it can do.

Why Hardwood

Hardwood gives applications Parquet read support without pulling in Hadoop, Avro, or the wider parquet-java dependency tree:

- **Light-weight** — zero transitive dependencies beyond optional compression libraries (Snappy, ZSTD, LZ4, Brotli).

- **Fast** — matches or exceeds `parquet-java`'s read throughput; competitive in native-image builds and short-lived JVMs.

- **Concurrent** — multi-threaded at the core: pages decode in parallel on a shared thread pool, with cross-file prefetching for multi-file reads.

- **Compatible** — reads every file that `parquet-java` reads, with documented divergences where Hardwood applies stricter semantics (e.g. SQL three-valued `notEq`).

- **Embeddable** — usable from native CLIs, S3-only pipelines (without `hadoop-aws`), and Avro / Spark consumers via thin shim modules, including a drop-in `parquet-java` replacement.
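The three-valued `notEq` mentioned under Compatible can be illustrated with a small sketch of general SQL semantics. This is not Hardwood's filter API: the class, enum, and method names below are hypothetical, and the two-valued variant only stands in for a Java-style `Objects.equals` comparison. Under SQL rules, comparing anything with NULL yields UNKNOWN, and a filter keeps a row only when the predicate is TRUE, so rows with a null value drop out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustrative only: none of these names come from Hardwood or parquet-java.
class ThreeValuedNotEq {

    enum Truth { TRUE, FALSE, UNKNOWN }

    // SQL-style comparison: any null operand makes the result UNKNOWN.
    static Truth sqlNotEq(Long columnValue, Long literal) {
        if (columnValue == null || literal == null) {
            return Truth.UNKNOWN;
        }
        return columnValue.equals(literal) ? Truth.FALSE : Truth.TRUE;
    }

    // Java-style two-valued comparison: null is treated as just another value.
    static boolean javaNotEq(Long columnValue, Long literal) {
        return !Objects.equals(columnValue, literal);
    }

    // A SQL filter keeps a row only when the predicate evaluates to TRUE.
    static List<Long> filterSql(List<Long> column, Long literal) {
        List<Long> kept = new ArrayList<>();
        for (Long value : column) {
            if (sqlNotEq(value, literal) == Truth.TRUE) {
                kept.add(value);
            }
        }
        return kept;
    }

    // The two-valued filter keeps every row whose value differs, nulls included.
    static List<Long> filterJava(List<Long> column, Long literal) {
        List<Long> kept = new ArrayList<>();
        for (Long value : column) {
            if (javaNotEq(value, literal)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> column = Arrays.asList(7L, null, 9L);
        System.out.println(filterSql(column, 7L));  // prints [9]
        System.out.println(filterJava(column, 7L)); // prints [null, 9]
    }
}
```

The only difference between the two filters is the null row, which is the kind of divergence worth checking for when migrating existing predicates.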

Quick Example

```java
import java.time.Instant;
import java.time.LocalDate;

import dev.hardwood.InputFile;
import dev.hardwood.reader.ParquetFileReader;
import dev.hardwood.reader.RowReader;

// `path` refers to the Parquet file to read; both readers are closed automatically.
try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
     RowReader rowReader = fileReader.rowReader()) {

    while (rowReader.hasNext()) {
        // Advance to the next row, then read its fields by column name.
        rowReader.next();

        long id = rowReader.getLong("id");
        String name = rowReader.getString("name");
        LocalDate birthDate = rowReader.getDate("birth_date");
        Instant createdAt = rowReader.getTimestamp("created_at");
    }
}
```

Ready? Install Hardwood, then read your first file end-to-end.

Prefer to learn by running code? The hardwood-examples repository collects small, self-contained examples — one per concept — that you can clone and run with a single command.

Status and Limitations

Hardwood 1.0 is released and ready for production use.

The Hardwood library supports reading arbitrarily large Parquet files, provided individual column chunks are not larger than 2 GB (see Parquet file layout). The interactive `dive` TUI currently caps S3 files at 2 GB.
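The limit applies to each column chunk, not to the file as a whole. The sketch below makes that distinction concrete with plain arithmetic; it does not use any Hardwood API. The class name, the method name, and the reading of "2 GB" as 2 GiB with an inclusive boundary are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: a plain size check, not part of Hardwood.
class ColumnChunkLimitSketch {

    // Assumption: "2 GB" is read as 2 GiB; the exact boundary is not stated in the text.
    static final long ASSUMED_LIMIT_BYTES = 2L * 1024 * 1024 * 1024;

    // True when every column chunk fits the assumed limit, whatever the total file size.
    static boolean allChunksWithinLimit(long[] chunkSizesInBytes) {
        return Arrays.stream(chunkSizesInBytes)
                .allMatch(size -> size <= ASSUMED_LIMIT_BYTES);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;

        // A 12 GiB file made of eight 1.5 GiB column chunks.
        long[] manySmallChunks = new long[8];
        Arrays.fill(manySmallChunks, 3 * gib / 2);

        // A 3 GiB file stored as one single column chunk.
        long[] oneHugeChunk = {3 * gib};

        System.out.println(allChunksWithinLimit(manySmallChunks)); // prints true
        System.out.println(allChunksWithinLimit(oneHugeChunk));    // prints false
    }
}
```

In this model the larger file is readable while the smaller one is not, because only the per-chunk size matters.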

Roadmap

Forward-looking items tracked for post-1.0. None are committed to a specific release.

- **Finalize `ColumnReader` API** — stabilize the API for columnar access and move it out of "Experimental" state. (#522)

- **Writer support** — write Parquet files in addition to reading; today Hardwood is reader-only. (#9)

- **Bloom filter predicate pushdown** — use per-chunk bloom filters for equality-predicate skipping on high-cardinality columns, where min/max statistics can't help. (#105)

- **Parquet Modular Encryption** — read files encrypted under the Parquet Modular Encryption spec: encrypted footer, per-column keys, AES-GCM and AES-GCM-CTR. (#128)

- **Apache Arrow interop** — `ColumnReader` output as Arrow `FieldVector` / `VectorSchemaRoot` for zero-copy handoff to DuckDB, DataFusion, Pandas-via-JNI, and other Arrow-native consumers. (#153)
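To see why bloom filters help where min/max statistics cannot, consider a toy model of a single column chunk. This is a sketch under stated assumptions, not Hardwood code and not the Parquet bloom filter layout: the class name, the 64-bit filter size, and the two arithmetic hash functions are hypothetical and chosen only so the result can be checked by hand. Min/max can rule out a value only when it falls outside the chunk's range, whereas a bloom filter can prove absence for a value inside the range whenever one of its bits is clear.

```java
import java.util.BitSet;

// Toy model of chunk-level skipping. Names and hash functions are illustrative;
// this is neither Hardwood's API nor the Parquet bloom filter format.
class ChunkSkippingSketch {

    static final int BITS = 64;

    final long min;
    final long max;
    final BitSet bloom = new BitSet(BITS);

    ChunkSkippingSketch(long[] values) {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (long v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            // Record the value in the filter by setting one bit per hash function.
            bloom.set(h1(v));
            bloom.set(h2(v));
        }
        this.min = lo;
        this.max = hi;
    }

    // Two deliberately simple hash functions mapping a value to a bit position.
    static int h1(long v) {
        return (int) Math.floorMod(v, (long) BITS);
    }

    static int h2(long v) {
        return (int) Math.floorMod(v * 31 + 17, (long) BITS);
    }

    // Min/max statistics can only exclude values outside [min, max].
    boolean canSkipByMinMax(long needle) {
        return needle < min || needle > max;
    }

    // A clear bit proves the value was never added; all bits set means "maybe present".
    boolean canSkipByBloom(long needle) {
        return !bloom.get(h1(needle)) || !bloom.get(h2(needle));
    }

    public static void main(String[] args) {
        ChunkSkippingSketch chunk = new ChunkSkippingSketch(new long[] {1001, 5042, 9900});
        System.out.println(chunk.canSkipByMinMax(4711)); // prints false: 4711 lies within [1001, 9900]
        System.out.println(chunk.canSkipByBloom(4711));  // prints true: bit 39 is clear
        System.out.println(chunk.canSkipByBloom(5042));  // prints false: the value is present
    }
}
```

A bloom filter never reports a present value as absent, but it can report an absent value as "maybe present", so a reader still has to scan any chunk it cannot skip.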

Getting help

- **Questions, ideas, design discussion** — GitHub Discussions. The best first stop for "how do I…", "is X possible…", or "what's the right way to…".

- **Bug reports and feature requests** — the GitHub issue tracker. Please check whether a similar issue already exists.

Talks & posts

- Hardwood: A New Parser for Apache Parquet — project announcement.

- Open Source Friday with Gunnar Morling — GitHub Open Source Friday.

- Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively — InfoQ podcast on building Hardwood.