Hardwood
_A lightweight Java reader for the Apache Parquet file format. Available as a Java library and a command-line tool._
Hardwood 1.0 is out!
Hardwood 1.0 is released — read the announcement blog post for the story behind the project and what it can do.
Why Hardwood
Hardwood gives applications Parquet read support without pulling in Hadoop, Avro, or the wider parquet-java dependency tree:
- **Light-weight** — zero transitive dependencies beyond optional compression libraries (Snappy, ZSTD, LZ4, Brotli).
- **Fast** — matches or exceeds `parquet-java`'s read throughput; competitive in native-image builds and short-lived JVMs.
- **Concurrent** — multi-threaded at the core: pages decode in parallel on a shared thread pool, with cross-file prefetching for multi-file reads.
- **Compatible** — reads every file that `parquet-java` reads, with documented divergences where Hardwood applies stricter semantics (e.g. SQL three-valued `notEq`).
- **Embeddable** — usable from native CLIs, S3-only pipelines (without `hadoop-aws`), and Avro / Spark consumers via thin shim modules, including a drop-in `parquet-java` replacement.
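The `notEq` divergence mentioned under **Compatible** is easier to see with a small example. The sketch below is plain Java and does not use Hardwood's or `parquet-java`'s filter API; the class and method names (`NotEqSemantics`, `sqlNotEq`, `filterSql`, `filterTwoValued`) are hypothetical and exist only for illustration. It contrasts SQL three-valued logic, where comparing a null with a literal yields UNKNOWN and the row is dropped, with a two-valued reading in which a null simply counts as "not equal".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NotEqSemantics {

    /** The three truth values of SQL predicates. */
    enum Truth { TRUE, FALSE, UNKNOWN }

    /** SQL-style `value <> literal`: any comparison involving null is UNKNOWN. */
    static Truth sqlNotEq(Integer value, Integer literal) {
        if (value == null || literal == null) {
            return Truth.UNKNOWN;
        }
        return value.equals(literal) ? Truth.FALSE : Truth.TRUE;
    }

    /** A filter keeps a row only when the predicate is TRUE, so UNKNOWN rows are dropped. */
    static List<Integer> filterSql(List<Integer> column, Integer literal) {
        List<Integer> kept = new ArrayList<>();
        for (Integer value : column) {
            if (sqlNotEq(value, literal) == Truth.TRUE) {
                kept.add(value);
            }
        }
        return kept;
    }

    /** Two-valued reading, shown for contrast: null is "not equal" to a non-null literal. */
    static List<Integer> filterTwoValued(List<Integer> column, Integer literal) {
        List<Integer> kept = new ArrayList<>();
        for (Integer value : column) {
            if (!Objects.equals(value, literal)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(7, null, 3);
        System.out.println(filterSql(column, 7));       // [3]
        System.out.println(filterTwoValued(column, 7)); // [null, 3]
    }
}
```

Under the three-valued reading, rows with a null in the filtered column never pass a `notEq` filter, so a query that relies on nulls passing through would need an explicit null check alongside it.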
Quick Example
```java
import java.time.Instant;
import java.time.LocalDate;

import dev.hardwood.InputFile;
import dev.hardwood.reader.ParquetFileReader;
import dev.hardwood.reader.RowReader;

// `path` is assumed to be defined by the caller and to point at a Parquet file.
try (ParquetFileReader fileReader = ParquetFileReader.open(InputFile.of(path));
     RowReader rowReader = fileReader.rowReader()) {

    while (rowReader.hasNext()) {
        // Advance to the next row before reading its fields.
        rowReader.next();

        long id = rowReader.getLong("id");
        String name = rowReader.getString("name");
        LocalDate birthDate = rowReader.getDate("birth_date");
        Instant createdAt = rowReader.getTimestamp("created_at");
    }
}
```
Ready? Install Hardwood, then read your first file end-to-end.
Prefer to learn by running code? The hardwood-examples repository collects small, self-contained examples — one per concept — that you can clone and run with a single command.
Status and Limitations
Hardwood 1.0 is released and ready for production use.
The Hardwood library supports reading arbitrarily large Parquet files, provided individual column chunks are not larger than 2 GB (see Parquet file layout). The interactive `dive` TUI currently caps S3 files at 2 GB.
Roadmap
Forward-looking items tracked for post-1.0. None are committed to a specific release.
- **Finalize `ColumnReader` API** — stabilize the API for columnar access and move it out of "Experimental" state. (#522)
- **Writer support** — write Parquet files in addition to reading; today Hardwood is reader-only. (#9)
- **Bloom filter predicate pushdown** — use per-chunk bloom filters for equality-predicate skipping on high-cardinality columns, where min/max statistics can't help. (#105)
- **Parquet Modular Encryption** — read files encrypted under the Parquet Modular Encryption spec: encrypted footer, per-column keys, AES-GCM and AES-GCM-CTR. (#128)
- **Apache Arrow interop** — `ColumnReader` output as Arrow `FieldVector` / `VectorSchemaRoot` for zero-copy handoff to DuckDB, DataFusion, Pandas-via-JNI, and other Arrow-native consumers. (#153)
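To make the Bloom filter roadmap item above more concrete, here is a minimal, self-contained sketch of why a Bloom filter can skip a column chunk that min/max statistics cannot. It is not Hardwood code and does not reflect a planned API: `ChunkSkipDemo`, `SimpleBloomFilter`, `filterFor`, and `minMaxCanSkip` are hypothetical names, the sample values are made up, and the filter is a textbook Bloom filter with two toy hash functions rather than the split-block format Parquet defines.

```java
import java.util.BitSet;

public class ChunkSkipDemo {

    /** Textbook Bloom filter with two toy hash functions (not Parquet's split-block format). */
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int size;

        SimpleBloomFilter(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        // Toy hashes chosen for readability; a real filter would use a strong hash.
        private int hash1(long value) {
            return (int) Math.floorMod(value, (long) size);
        }

        private int hash2(long value) {
            return (int) Math.floorMod(value * 31L + 7L, (long) size);
        }

        void add(long value) {
            bits.set(hash1(value));
            bits.set(hash2(value));
        }

        /** false: definitely absent; true: possibly present (false positives can occur). */
        boolean mightContain(long value) {
            return bits.get(hash1(value)) && bits.get(hash2(value));
        }
    }

    /** Min/max statistics can rule out a value only if it lies outside [min, max]. */
    static boolean minMaxCanSkip(long min, long max, long needle) {
        return needle < min || needle > max;
    }

    /** Builds a filter over the values of one (hypothetical) column chunk. */
    static SimpleBloomFilter filterFor(long[] chunkValues) {
        SimpleBloomFilter filter = new SimpleBloomFilter(64);
        for (long v : chunkValues) {
            filter.add(v);
        }
        return filter;
    }

    public static void main(String[] args) {
        long[] chunkValues = {10, 42, 73, 500, 999}; // sparse values over a wide range
        long needle = 250;                            // inside [10, 999] but absent

        boolean skipByMinMax = minMaxCanSkip(10, 999, needle);
        boolean skipByBloom = !filterFor(chunkValues).mightContain(needle);

        System.out.println("min/max can skip: " + skipByMinMax); // false
        System.out.println("bloom can skip:   " + skipByBloom);  // true
    }
}
```

The value 250 falls inside the chunk's min/max range, so statistics alone cannot exclude the chunk, while the Bloom filter reports it as definitely absent. A Bloom filter can return false positives but never false negatives, so a "might contain" answer still requires reading the chunk.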
Getting help
- **Questions, ideas, design discussion** — GitHub Discussions. The best first stop for "how do I…", "is X possible…", or "what's the right way to…".
- **Bug reports and feature requests** — the GitHub issue tracker. Please check whether a similar issue already exists.
Talks & posts
- Hardwood: A New Parser for Apache Parquet — project announcement.
- Open Source Friday with Gunnar Morling — GitHub Open Source Friday.
- Chasing Efficient Java Development: From 1BRC to Developing Hardwood AI Natively — InfoQ podcast on building Hardwood.