DuckDB is an in-process SQL OLAP database management system
Why DuckDB?
Simple and portable
- In-process, serverless
- C++11, no dependencies, single-file build
- APIs for Python, R, Java, Julia, Swift, …
- Runs on Windows, Linux, macOS, OpenBSD, …
Feature-rich
- Transactions, persistence
- Extensive SQL support
- Direct Parquet, CSV, and JSON querying
- Joins, aggregates, window functions
Fast
- Optimized for analytics
- Vectorized and parallel engine
- Larger-than-memory processing
- Parallel Parquet, CSV, and NDJSON loaders
All the benefits of a database, none of the hassle.
Installation
Choose the environment in which you want to use DuckDB:
- Command Line
- Python
- R
- Java
- node.js
- Julia
- C++
- ODBC
Latest release: DuckDB 0.8.1
- Python: pip install duckdb==0.8.1
- R: install.packages("duckdb")
- Java (Maven):
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>0.8.1</version>
</dependency>
- node.js: npm install duckdb
- Julia:
using Pkg
Pkg.add("DuckDB")
- Command Line (Homebrew): brew install duckdb
Direct download: https://github.com
When to use DuckDB
- Processing and storing tabular datasets, e.g. from CSV or Parquet files
- Interactive data analysis, e.g. joining and aggregating multiple large tables
- Concurrent large changes to multiple large tables, e.g. appending rows, adding/removing/updating columns
- Large result set transfer to client
When not to use DuckDB
- High-volume transactional use cases (e.g. tracking orders in a webshop)
- Large client/server installations for centralized enterprise data warehousing
- Writing to a single database from multiple concurrent processes
- Multiple concurrent processes reading from a single writable database
Blog
DuckDB ADBC - Zero-Copy data transfer via Arrow Database Connectivity
TLDR: DuckDB has added support for Arrow Database Connectivity (ADBC), an API standard that enables efficient data ingestion and retrieval from database systems, similar to the Open Database Connectivity (ODBC) interface. However, unlike ODBC, ADBC specifically caters to the columnar storage model, facilitating fast data transfers between a columnar database and […]
From Waddle to Flying: Quickly expanding DuckDB's functionality with Scalar Python UDFs
TLDR: DuckDB now supports vectorized Scalar Python User Defined Functions (UDFs). By implementing Python UDFs, users can easily expand the functionality of DuckDB while taking advantage of DuckDB’s fast execution model, SQL and data safety. User Defined Functions (UDFs) enable users to extend the functionality of a Database Management System […]
Correlated Subqueries in SQL
Subqueries in SQL are a powerful abstraction that allows simple queries to be used as composable building blocks. They allow you to break down complex problems into smaller parts, and subsequently make it easier to write, understand and maintain large and complex queries. DuckDB uses a state-of-the-art subquery decorrelation optimizer […]