DuckDB is an embeddable SQL OLAP Database Management System

duckdb.threadsafety bool

Indicates that this package is threadsafe

duckdb.apilevel int

Indicates which Python DBAPI version this package implements

duckdb.paramstyle str

Indicates which parameter style duckdb supports

duckdb.default_connection duckdb.DuckDBPyConnection

The connection that is used by default if you don’t explicitly pass one to the root methods in this module

exception duckdb.BinderException

Bases: ProgrammingError

exception duckdb.CastException

Bases: DataError

exception duckdb.CatalogException

Bases: ProgrammingError

exception duckdb.ConnectionException

Bases: OperationalError

exception duckdb.ConstraintException

Bases: IntegrityError

exception duckdb.ConversionException

Bases: DataError

exception duckdb.DataError

Bases: Error

class duckdb.DuckDBPyConnection

Bases: pybind11_object

append(self: duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame) duckdb.DuckDBPyConnection

Append the passed pandas DataFrame to the named table

arrow(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch a result as an Arrow table following execute()

begin(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Start a new transaction

close(self: duckdb.DuckDBPyConnection) None

Close the connection

commit(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Commit changes performed within a transaction

cursor(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

property description

Get result set attributes, mainly column names

df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as a pandas DataFrame following execute()

duplicate(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

execute(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None, multiple_parameter_sets: bool = False) duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set

executemany(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None) duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters

fetch_arrow_table(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch a result as an Arrow table following execute()

fetch_df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as a pandas DataFrame following execute()

fetch_df_chunk(self: duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) pandas.DataFrame

Fetch a chunk of the result as a pandas DataFrame following execute()

fetch_record_batch(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

fetchall(self: duckdb.DuckDBPyConnection) list

Fetch all rows from a result following execute

fetchdf(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as a pandas DataFrame following execute()

fetchmany(self: duckdb.DuckDBPyConnection, size: int = 1) list

Fetch the next set of rows from a result following execute

fetchnumpy(self: duckdb.DuckDBPyConnection) dict

Fetch a result as a dict mapping each column to a NumPy array following execute()

fetchone(self: duckdb.DuckDBPyConnection) object

Fetch a single row from a result following execute

from_arrow(self: duckdb.DuckDBPyConnection, arrow_object: object) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

from_csv_auto(self: duckdb.DuckDBPyConnection, file_name: str) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in file_name

from_df(self: duckdb.DuckDBPyConnection, df: pandas.DataFrame = None) duckdb.DuckDBPyRelation

Create a relation object from the pandas DataFrame in df

from_parquet(self: duckdb.DuckDBPyConnection, file_name: str, binary_as_string: bool = False) duckdb.DuckDBPyRelation

Create a relation object from the Parquet file in file_name

from_query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Create a relation object from the given SQL query

from_substrait(self: duckdb.DuckDBPyConnection, proto: bytes) duckdb.DuckDBPyRelation

Create a query object from protobuf plan

get_substrait(self: duckdb.DuckDBPyConnection, query: str) duckdb.DuckDBPyRelation

Serialize a query to protobuf

get_substrait_json(self: duckdb.DuckDBPyConnection, query: str) duckdb.DuckDBPyRelation

Serialize a query to protobuf in JSON format

get_table_names(self: duckdb.DuckDBPyConnection, query: str) Set[str]

Extract the required table names from a query

install_extension(self: duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False) None

Install an extension by name

load_extension(self: duckdb.DuckDBPyConnection, extension: str) None

Load an installed extension

query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

register(self: duckdb.DuckDBPyConnection, view_name: str, python_object: object) duckdb.DuckDBPyConnection

Register the passed Python Object value for querying with a view

rollback(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

table(self: duckdb.DuckDBPyConnection, table_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named table

table_function(self: duckdb.DuckDBPyConnection, name: str, parameters: object = None) duckdb.DuckDBPyRelation

Create a relation object from the named table function with the given parameters

unregister(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyConnection

Unregister the view name

values(self: duckdb.DuckDBPyConnection, values: object) duckdb.DuckDBPyRelation

Create a relation object from the passed values

view(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named view

class duckdb.DuckDBPyRelation

Bases: pybind11_object

abs(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the absolute value for the specified columns.

aggregate(self: duckdb.DuckDBPyRelation, aggr_expr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on the relation

property alias

Get the name of the current alias

apply(self: duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') duckdb.DuckDBPyRelation

Compute the function of a single column or a list of columns by the optional groups on the relation

arrow(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

property columns

Get the names of the columns of this relation.

count(self: duckdb.DuckDBPyRelation, count_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate count of a single column or a list of columns by the optional groups on the relation

create(self: duckdb.DuckDBPyRelation, table_name: str) None

Creates a new table named table_name with the contents of the relation object

create_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation

Creates a view named view_name that refers to the relation object

cummax(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative maximum of the aggregate column.

cummin(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative minimum of the aggregate column.

cumprod(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative product of the aggregate column.

cumsum(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative sum of the aggregate column.

describe(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Gives basic statistics (e.g., min, max) for each column of the relation and reports whether the column contains NULL values.

df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

distinct(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Retrieve distinct rows from this relation object

property dtypes

Get the column types of the result.

except_(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set except of this relation object with another relation object in other_rel

execute(self: duckdb.DuckDBPyRelation) duckdb::DuckDBPyResult

Transform the relation into a result set

explain(self: duckdb.DuckDBPyRelation) str

fetchall(self: duckdb.DuckDBPyRelation) object

Execute and fetch all rows as a list of tuples

fetchmany(self: duckdb.DuckDBPyRelation, size: int = 1) object

Execute and fetch the next set of rows as a list of tuples

fetchnumpy(self: duckdb.DuckDBPyRelation) dict

Execute and fetch all rows as a Python dict mapping each column to a NumPy array

fetchone(self: duckdb.DuckDBPyRelation) object

Execute and fetch a single row as a tuple

filter(self: duckdb.DuckDBPyRelation, filter_expr: str) duckdb.DuckDBPyRelation

Filter the relation object by the filter in filter_expr

insert(self: duckdb.DuckDBPyRelation, values: object) None

Inserts the given values into the relation

insert_into(self: duckdb.DuckDBPyRelation, table_name: str) None

Inserts the relation object into an existing table named table_name

intersect(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set intersection of this relation object with another relation object in other_rel

join(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation, condition: str, how: str = 'inner') duckdb.DuckDBPyRelation

Join the relation object with another relation object in other_rel using the join condition expression in condition. The join types supported for how are ‘inner’ and ‘left’

kurt(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the excess kurtosis of the aggregate column.

limit(self: duckdb.DuckDBPyRelation, n: int, offset: int = 0) duckdb.DuckDBPyRelation

Only retrieve the first n rows from this relation object, starting at offset

mad(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the median absolute deviation for the aggregate columns. NULL values are ignored. Temporal types return a positive INTERVAL.

map(self: duckdb.DuckDBPyRelation, map_function: function) duckdb.DuckDBPyRelation

Calls the passed function on the relation

max(self: duckdb.DuckDBPyRelation, max_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate max of a single column or a list of columns by the optional groups on the relation

mean(self: duckdb.DuckDBPyRelation, mean_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate mean of a single column or a list of columns by the optional groups on the relation

median(self: duckdb.DuckDBPyRelation, median_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate median of a single column or a list of columns by the optional groups on the relation

min(self: duckdb.DuckDBPyRelation, min_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate min of a single column or a list of columns by the optional groups on the relation

mode(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the most frequent value for the aggregate columns. NULL values are ignored.

order(self: duckdb.DuckDBPyRelation, order_expr: str) duckdb.DuckDBPyRelation

Reorder the relation object by order_expr

prod(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Calculates the product of the aggregate column.

project(self: duckdb.DuckDBPyRelation, project_expr: str) duckdb.DuckDBPyRelation

Project the relation object by the projection in project_expr

quantile(self: duckdb.DuckDBPyRelation, q: str, quantile_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the quantile of a single column or a list of columns by the optional groups on the relation

query(self: duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) duckdb.DuckDBPyRelation

Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object

record_batch(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader

Execute and return an Arrow Record Batch Reader that yields all rows

sem(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the standard error of the mean of the aggregate column.

set_alias(self: duckdb.DuckDBPyRelation, alias: str) duckdb.DuckDBPyRelation

Rename the relation object to new alias

property shape

Tuple of (number of rows, number of columns) in the relation.

skew(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the skewness of the aggregate column.

std(self: duckdb.DuckDBPyRelation, std_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the standard deviation of a single column or a list of columns by the optional groups on the relation

sum(self: duckdb.DuckDBPyRelation, sum_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate sum of a single column or a list of columns by the optional groups on the relation

to_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

to_df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

property type

Get the type of the relation.

property types

Get the column types of the result.

union(self: duckdb.DuckDBPyRelation, union_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set union of this relation object with another relation object in union_rel

unique(self: duckdb.DuckDBPyRelation, unique_aggr: str) duckdb.DuckDBPyRelation

Number of distinct values in a column.

value_counts(self: duckdb.DuckDBPyRelation, value_counts_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Count the number of rows for each unique value of the column

var(self: duckdb.DuckDBPyRelation, var_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the variance of a single column or a list of columns by the optional groups on the relation

write_csv(self: duckdb.DuckDBPyRelation, file_name: str) None

Write the relation object to a CSV file in file_name

class duckdb.DuckDBPyResult

Bases: pybind11_object

arrow(self: duckdb.DuckDBPyResult, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch all rows as an Arrow Table

close(self: duckdb.DuckDBPyResult) None

description(self: duckdb.DuckDBPyResult) list

df(self: duckdb.DuckDBPyResult, *, date_as_object: bool = False) pandas.DataFrame

Fetch all rows as a pandas DataFrame

fetch_arrow_reader(self: duckdb.DuckDBPyResult, approx_batch_size: int) pyarrow.lib.RecordBatchReader

Fetch all rows as an Arrow Record Batch Reader

fetch_arrow_table(self: duckdb.DuckDBPyResult, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch all rows as an Arrow Table

fetch_df(self: duckdb.DuckDBPyResult, *, date_as_object: bool = False) pandas.DataFrame

Fetch all rows as a pandas DataFrame

fetch_df_chunk(self: duckdb.DuckDBPyResult, num_of_vectors: int = 1, *, date_as_object: bool = False) pandas.DataFrame

Fetch a chunk of rows as a pandas DataFrame

fetchall(self: duckdb.DuckDBPyResult) list

Fetch all rows as a list of tuples

fetchdf(self: duckdb.DuckDBPyResult, *, date_as_object: bool = False) pandas.DataFrame

Fetch all rows as a pandas DataFrame

fetchmany(self: duckdb.DuckDBPyResult, size: int = 1) list

Fetch the next set of rows as a list of tuples

fetchnumpy(self: duckdb.DuckDBPyResult) dict

Fetch all rows as a Python dict mapping each column to a NumPy array

fetchone(self: duckdb.DuckDBPyResult) object

Fetch a single row as a tuple

exception duckdb.Error

Bases: Exception

exception duckdb.FatalException

Bases: Error

exception duckdb.IOException

Bases: OperationalError

exception duckdb.IntegrityError

Bases: Error

exception duckdb.InternalError

Bases: Error

exception duckdb.InternalException

Bases: InternalError

exception duckdb.InterruptException

Bases: Error

exception duckdb.InvalidInputException

Bases: ProgrammingError

exception duckdb.InvalidTypeException

Bases: ProgrammingError

exception duckdb.NotImplementedException

Bases: NotSupportedError

exception duckdb.NotSupportedError

Bases: Error

exception duckdb.OperationalError

Bases: Error

exception duckdb.OutOfMemoryException

Bases: OperationalError

exception duckdb.OutOfRangeException

Bases: DataError

exception duckdb.ParserException

Bases: ProgrammingError

exception duckdb.PermissionException

Bases: Error

exception duckdb.ProgrammingError

Bases: Error

exception duckdb.SequenceException

Bases: Error

exception duckdb.SerializationException

Bases: OperationalError

exception duckdb.StandardException

Bases: Error

exception duckdb.SyntaxException

Bases: ProgrammingError

exception duckdb.TransactionException

Bases: OperationalError

exception duckdb.TypeMismatchException

Bases: DataError

exception duckdb.ValueOutOfRangeException

Bases: DataError

exception duckdb.Warning

Bases: Exception

duckdb.aggregate(df: pandas.DataFrame, aggr_expr: str, group_expr: str = '', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on the pandas DataFrame df

duckdb.alias(df: pandas.DataFrame, alias: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation from the pandas DataFrame df with the passed alias

duckdb.arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.connect(database: str = ':memory:', read_only: bool = False, config: object = None) duckdb.DuckDBPyConnection

Create a DuckDB database instance. Can take a database file name to read/write persistent data and a read_only flag if no changes are desired

duckdb.df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the pandas DataFrame df

duckdb.distinct(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the distinct rows from the pandas DataFrame df

duckdb.filter(df: pandas.DataFrame, filter_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Filter the pandas DataFrame df by the filter in filter_expr

duckdb.from_arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.from_csv_auto(file_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Creates a relation object from the CSV file in file_name

duckdb.from_df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the pandas DataFrame df

duckdb.from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(file_name: str, binary_as_string: bool, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Creates a relation object from the Parquet file in file_name

  2. from_parquet(file_name: str, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Creates a relation object from the Parquet file in file_name

duckdb.from_query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the given SQL query

duckdb.from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Creates a query object from the substrait plan

duckdb.get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Serialize a query object to protobuf

duckdb.get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Serialize a query object to protobuf in JSON format

duckdb.limit(df: pandas.DataFrame, n: int, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Retrieve the first n rows from the pandas DataFrame df

duckdb.order(df: pandas.DataFrame, order_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Reorder the pandas DataFrame df by order_expr

duckdb.project(df: pandas.DataFrame, project_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Project the pandas DataFrame df by the projection in project_expr

duckdb.query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

duckdb.query_df(df: pandas.DataFrame, virtual_table_name: str, sql_query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyResult

Run the given SQL query in sql_query on the view named virtual_table_name that contains the content of the pandas DataFrame df

class duckdb.token_type

Bases: pybind11_object

Members:

identifier

numeric_const

string_const

operator

keyword

comment

comment = <token_type.comment: 5>

identifier = <token_type.identifier: 0>

keyword = <token_type.keyword: 4>

property name

numeric_const = <token_type.numeric_const: 1>

operator = <token_type.operator: 3>

string_const = <token_type.string_const: 2>

property value

duckdb.tokenize(query: str) object

Tokenizes a SQL string, returning a list of (position, type) tuples that can be used for e.g. syntax highlighting

duckdb.values(values: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the passed values

duckdb.write_csv(df: pandas.DataFrame, file_name: str, connection: duckdb.DuckDBPyConnection = None) None

Write the pandas DataFrame df to a CSV file in file_name