DuckDB + DuckLake: Building a Lightweight Data Lakehouse Without Heavy Infrastructure

In this AI Tech Experts Webinar, Grzegorz Rybak (Senior Data Engineer) and Cezary Gorczyński (Data Engineer) explore whether DuckDB + DuckLake can serve as a practical lightweight data platform for modern analytics workloads.

They walk through common consulting scenarios where teams must balance fast delivery with long-term scalability, and explain how a DuckDB-based stack can provide strong analytical performance without committing to a full warehouse platform.

Topics covered

  • DuckDB as an embedded OLAP engine
  • querying Parquet directly with a zero-ingest workflow
  • DuckLake as a lakehouse management layer
  • the “Holy Trinity” architecture: compute, storage, metadata
  • real trade-offs: concurrency limits, streaming gaps, file fragmentation

Timeline

00:00 DuckDB intro

00:57 Client problems: greenfield vs legacy data stacks

06:42 DuckDB architecture and in-process analytics engine

12:38 DuckLake lakehouse layer and the “Holy Trinity” architecture

17:23 Trade-offs: concurrency, scaling and streaming limits

21:37 Conclusions: portable lakehouse strategy

Speaker