DuckDB is an embeddable SQL OLAP Database Management System

duckdb.threadsafety bool

Indicates that this package is threadsafe

duckdb.apilevel int

Indicates which Python DBAPI version this package implements

duckdb.paramstyle str

Indicates which parameter style duckdb supports

duckdb.default_connection duckdb.DuckDBPyConnection

The connection that is used by default if you don’t explicitly pass one to the root methods in this module
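
For illustration, a minimal sketch of how the module-level helpers share this default connection (the table name tbl is made up for this example):

    import duckdb

    # Module-level functions run against duckdb.default_connection,
    # so state such as tables created here is visible to later calls.
    duckdb.sql("CREATE TABLE tbl AS SELECT 42 AS answer")   # hypothetical example table
    print(duckdb.sql("SELECT answer FROM tbl").fetchall())  # [(42,)]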

exception duckdb.BinderException

Bases: ProgrammingError

exception duckdb.CastException

Bases: DataError

exception duckdb.CatalogException

Bases: ProgrammingError

exception duckdb.ConnectionException

Bases: OperationalError

exception duckdb.ConstraintException

Bases: IntegrityError

exception duckdb.ConversionException

Bases: DataError

exception duckdb.DataError

Bases: Error

class duckdb.DuckDBPyConnection

Bases: pybind11_object

append(self: duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame, *, by_name: bool = False) duckdb.DuckDBPyConnection

Append the passed DataFrame to the named table
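
A minimal sketch of append(), assuming a target table with matching columns already exists (the table name people and the data are made up):

    import duckdb
    import pandas as pd

    con = duckdb.connect()
    con.execute("CREATE TABLE people (name VARCHAR, age INTEGER)")  # hypothetical table
    df = pd.DataFrame({"name": ["Ada", "Bob"], "age": [36, 41]})

    # Appends by column position by default; by_name=True matches DataFrame
    # columns to table columns by name instead.
    con.append("people", df)
    print(con.table("people").fetchall())  # [('Ada', 36), ('Bob', 41)]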

array_type(self: duckdb.DuckDBPyConnection, type: duckdb.typing.DuckDBPyType) duckdb.typing.DuckDBPyType

Create an array type object of ‘type’

arrow(self: duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

begin(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Start a new transaction

close(self: duckdb.DuckDBPyConnection) None

Close the connection

commit(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Commit changes performed within a transaction

create_function(self: duckdb.DuckDBPyConnection, name: str, function: function, return_type: object = None, parameters: duckdb.typing.DuckDBPyType = None, *, type: duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0>, null_handling: duckdb.functional.FunctionNullHandling = 0, exception_handling: duckdb.PythonExceptionHandling = 0, side_effects: bool = False) duckdb.DuckDBPyConnection

Create a DuckDB function out of the passed-in Python function so it can be used in queries
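
A minimal sketch of registering a scalar Python UDF (the function plus_one is made up; keyword arguments are used for parameters and return_type to avoid relying on their positional order):

    import duckdb
    from duckdb.typing import BIGINT

    con = duckdb.connect()

    def plus_one(x: int) -> int:
        return x + 1

    # Register the Python function as a scalar SQL function, then call it from SQL.
    con.create_function("plus_one", plus_one, parameters=[BIGINT], return_type=BIGINT)
    print(con.sql("SELECT plus_one(41)").fetchall())  # [(42,)]
    con.remove_function("plus_one")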

cursor(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

decimal_type(self: duckdb.DuckDBPyConnection, width: int, scale: int) duckdb.typing.DuckDBPyType

Create a decimal type with ‘width’ and ‘scale’

property description

Get result set attributes, mainly column names

df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

dtype(self: duckdb.DuckDBPyConnection, type_str: str) duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

duplicate(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

enum_type(self: duckdb.DuckDBPyConnection, name: str, type: duckdb.typing.DuckDBPyType, values: list) duckdb.typing.DuckDBPyType

Create an enum type of underlying ‘type’, consisting of the list of ‘values’

execute(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None, multiple_parameter_sets: bool = False) duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set
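
A minimal sketch of execute() with a prepared statement (the table items and its data are made up):

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE items (name VARCHAR, price DOUBLE)")  # hypothetical table

    # Positional parameters are bound to the '?' placeholders.
    con.execute("INSERT INTO items VALUES (?, ?)", ["hammer", 9.95])
    print(con.execute("SELECT * FROM items WHERE price < ?", [20]).fetchall())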

executemany(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None) duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters
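
A minimal sketch of executemany(), which binds each parameter set in turn to the same prepared statement (table and data are made up):

    import duckdb

    con = duckdb.connect()
    con.execute("CREATE TABLE items (name VARCHAR, price DOUBLE)")  # hypothetical table

    con.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("hammer", 9.95), ("wrench", 12.00)],
    )
    print(con.execute("SELECT count(*) FROM items").fetchone())  # (2,)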

fetch_arrow_table(self: duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

fetch_df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetch_df_chunk(self: duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) pandas.DataFrame

Fetch a chunk of the result as DataFrame following execute()

fetch_record_batch(self: duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

fetchall(self: duckdb.DuckDBPyConnection) list

Fetch all rows from a result following execute

fetchdf(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetchmany(self: duckdb.DuckDBPyConnection, size: int = 1) list

Fetch the next set of rows from a result following execute

fetchnumpy(self: duckdb.DuckDBPyConnection) dict

Fetch a result as a dict of NumPy arrays following execute

fetchone(self: duckdb.DuckDBPyConnection) Optional[tuple]

Fetch a single row from a result following execute

filesystem_is_registered(self: duckdb.DuckDBPyConnection, name: str) bool

Check if a filesystem with the provided name is currently registered

from_arrow(self: duckdb.DuckDBPyConnection, arrow_object: object) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

from_csv_auto(self: duckdb.DuckDBPyConnection, name: object, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None, null_padding: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

from_df(self: duckdb.DuckDBPyConnection, df: pandas.DataFrame = None) duckdb.DuckDBPyRelation

Create a relation object from the DataFrame in df

from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. from_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

from_query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

from_substrait(self: duckdb.DuckDBPyConnection, proto: bytes) duckdb.DuckDBPyRelation

Create a query object from protobuf plan

from_substrait_json(self: duckdb.DuckDBPyConnection, json: str) duckdb.DuckDBPyRelation

Create a query object from a JSON protobuf plan

get_substrait(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf

get_substrait_json(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf in JSON format

get_table_names(self: duckdb.DuckDBPyConnection, query: str) Set[str]

Extract the required table names from a query

install_extension(self: duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False) None

Install an extension by name

list_filesystems(self: duckdb.DuckDBPyConnection) list

List registered filesystems, including builtin ones

list_type(self: duckdb.DuckDBPyConnection, type: duckdb.typing.DuckDBPyType) duckdb.typing.DuckDBPyType

Create a list type object of ‘type’

load_extension(self: duckdb.DuckDBPyConnection, extension: str) None

Load an installed extension

map_type(self: duckdb.DuckDBPyConnection, key: duckdb.typing.DuckDBPyType, value: duckdb.typing.DuckDBPyType) duckdb.typing.DuckDBPyType

Create a map type object from ‘key_type’ and ‘value_type’

pl(self: duckdb.DuckDBPyConnection, rows_per_batch: int = 1000000) duckdb::PolarsDataFrame

Fetch a result as Polars DataFrame following execute()

query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

read_csv(self: duckdb.DuckDBPyConnection, name: object, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None, null_padding: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’
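
A minimal sketch of read_csv() (the path data.csv is a placeholder; the keyword options are only needed when the automatic detection does not fit the file):

    import duckdb

    con = duckdb.connect()
    rel = con.read_csv("data.csv", header=True, sep=",")  # hypothetical file
    print(rel.limit(5).fetchall())  # first five rows as tuples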

read_json(self: duckdb.DuckDBPyConnection, name: str, *, columns: Optional[object] = None, sample_size: Optional[object] = None, maximum_depth: Optional[object] = None, records: Optional[str] = None, format: Optional[str] = None) duckdb.DuckDBPyRelation

Create a relation object from the JSON file in ‘name’

read_parquet(*args, **kwargs)

Overloaded function.

  1. read_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. read_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs
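
A minimal sketch of read_parquet() with a glob (the path is a placeholder; hive_partitioning only matters for partitioned directory layouts):

    import duckdb

    con = duckdb.connect()
    rel = con.read_parquet("data/*.parquet", hive_partitioning=True)  # hypothetical files
    print(rel.aggregate("count(*)").fetchall())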

register(self: duckdb.DuckDBPyConnection, view_name: str, python_object: object) duckdb.DuckDBPyConnection

Register the passed Python object for querying as a view
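
A minimal sketch of register(), exposing a pandas DataFrame to SQL under a view name (the view name and data are made up):

    import duckdb
    import pandas as pd

    con = duckdb.connect()
    df = pd.DataFrame({"x": [1, 2, 3]})

    con.register("df_view", df)                              # hypothetical view name
    print(con.sql("SELECT sum(x) FROM df_view").fetchall())  # [(6,)]
    con.unregister("df_view")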

register_filesystem(self: duckdb.DuckDBPyConnection, filesystem: fsspec.AbstractFileSystem) None

Register a fsspec compliant filesystem

remove_function(self: duckdb.DuckDBPyConnection, name: str) duckdb.DuckDBPyConnection

Remove a previously created function

rollback(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

row_type(self: duckdb.DuckDBPyConnection, fields: object) duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

sql(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
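
A minimal sketch of sql(): a SELECT produces a lazy relation that is only executed when its results are fetched, while other statements run immediately:

    import duckdb

    con = duckdb.connect()
    rel = con.sql("SELECT 21 * 2 AS answer")
    print(rel.fetchall())  # [(42,)]

    # Non-SELECT statements are executed as-is and return no relation.
    con.sql("CREATE TABLE t1 AS SELECT 1 AS x")  # hypothetical table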

sqltype(self: duckdb.DuckDBPyConnection, type_str: str) duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

string_type(self: duckdb.DuckDBPyConnection, collation: str = '') duckdb.typing.DuckDBPyType

Create a string type with an optional collation

struct_type(self: duckdb.DuckDBPyConnection, fields: object) duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

table(self: duckdb.DuckDBPyConnection, table_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named table

table_function(self: duckdb.DuckDBPyConnection, name: str, parameters: object = None) duckdb.DuckDBPyRelation

Create a relation object from the named table function with the given parameters

tf(self: duckdb.DuckDBPyConnection) dict

Fetch a result as dict of TensorFlow Tensors following execute()

torch(self: duckdb.DuckDBPyConnection) dict

Fetch a result as dict of PyTorch Tensors following execute()

type(self: duckdb.DuckDBPyConnection, type_str: str) duckdb.typing.DuckDBPyType

Create a type object by parsing the ‘type_str’ string

union_type(self: duckdb.DuckDBPyConnection, members: object) duckdb.typing.DuckDBPyType

Create a union type object from ‘members’

unregister(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyConnection

Unregister the view name

unregister_filesystem(self: duckdb.DuckDBPyConnection, name: str) None

Unregister a filesystem

values(self: duckdb.DuckDBPyConnection, values: object) duckdb.DuckDBPyRelation

Create a relation object from the passed values

view(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named view

class duckdb.DuckDBPyRelation

Bases: pybind11_object

abs(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the absolute value for the specified columns.

aggregate(self: duckdb.DuckDBPyRelation, aggr_expr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on the relation
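
A minimal sketch of aggregate() with an optional grouping expression (the inline VALUES data is made up):

    import duckdb

    con = duckdb.connect()
    rel = con.sql("SELECT * FROM (VALUES ('a', 1), ('a', 2), ('b', 5)) AS t(grp, val)")

    # Aggregate expression first, optional GROUP BY expression second.
    print(rel.aggregate("grp, sum(val)", "grp").fetchall())  # e.g. [('a', 3), ('b', 5)], order not guaranteed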

property alias

Get the name of the current alias

apply(self: duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') duckdb.DuckDBPyRelation

Compute the function of a single column or a list of columns by the optional groups on the relation

arrow(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

close(self: duckdb.DuckDBPyRelation) None

Closes the result

property columns

Return a list containing the names of the columns of the relation.

count(self: duckdb.DuckDBPyRelation, count_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate count of a single column or a list of columns by the optional groups on the relation

create(self: duckdb.DuckDBPyRelation, table_name: str) None

Creates a new table named table_name with the contents of the relation object

create_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation

Creates a view named view_name that refers to the relation object

cummax(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative maximum of the aggregate column.

cummin(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative minimum of the aggregate column.

cumprod(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative product of the aggregate column.

cumsum(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative sum of the aggregate column.

describe(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Gives basic statistics (e.g., min, max) and whether NULL values exist for each column of the relation.

property description

Return the description of the result

df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

distinct(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Retrieve distinct rows from this relation object

property dtypes

Return a list containing the types of the columns of the relation.

except_(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set except of this relation object with another relation object in other_rel

execute(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Transform the relation into a result set

explain(self: duckdb.DuckDBPyRelation, type: duckdb.ExplainType = 'standard') str

fetch_arrow_reader(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader

Execute and return an Arrow Record Batch Reader that yields all rows

fetch_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

fetchall(self: duckdb.DuckDBPyRelation) list

Execute and fetch all rows as a list of tuples

fetchdf(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

fetchmany(self: duckdb.DuckDBPyRelation, size: int = 1) list

Execute and fetch the next set of rows as a list of tuples

fetchnumpy(self: duckdb.DuckDBPyRelation) dict

Execute and fetch all rows as a Python dict mapping each column name to a NumPy array

fetchone(self: duckdb.DuckDBPyRelation) Optional[tuple]

Execute and fetch a single row as a tuple

filter(self: duckdb.DuckDBPyRelation, filter_expr: str) duckdb.DuckDBPyRelation

Filter the relation object by the filter in filter_expr
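
A minimal sketch of composing filter() with other relational operators; nothing runs until rows are fetched (the range() source is just example data):

    import duckdb

    con = duckdb.connect()
    rel = con.sql("SELECT * FROM range(100) AS t(i)")

    result = (
        rel.filter("i % 2 = 0")
           .project("i, i * i AS sq")
           .order("i DESC")
           .limit(3)
    )
    print(result.fetchall())  # [(98, 9604), (96, 9216), (94, 8836)]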

insert(self: duckdb.DuckDBPyRelation, values: object) None

Inserts the given values into the relation

insert_into(self: duckdb.DuckDBPyRelation, table_name: str) None

Inserts the relation object into an existing table named table_name

intersect(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set intersection of this relation object with another relation object in other_rel

join(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation, condition: str, how: str = 'inner') duckdb.DuckDBPyRelation

Join the relation object with another relation object in other_rel using the join condition expression in condition. Types supported are ‘inner’ and ‘left’

kurt(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the excess kurtosis of the aggregate column.

limit(self: duckdb.DuckDBPyRelation, n: int, offset: int = 0) duckdb.DuckDBPyRelation

Only retrieve the first n rows from this relation object, starting at offset

mad(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the median absolute deviation for the aggregate columns. NULL values are ignored. Temporal types return a positive INTERVAL.

map(self: duckdb.DuckDBPyRelation, map_function: function, *, schema: Optional[object] = None) duckdb.DuckDBPyRelation

Calls the passed function on the relation

max(self: duckdb.DuckDBPyRelation, max_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate max of a single column or a list of columns by the optional groups on the relation

mean(self: duckdb.DuckDBPyRelation, mean_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate mean of a single column or a list of columns by the optional groups on the relation

median(self: duckdb.DuckDBPyRelation, median_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate median of a single column or a list of columns by the optional groups on the relation

min(self: duckdb.DuckDBPyRelation, min_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate min of a single column or a list of columns by the optional groups on the relation

mode(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the most frequent value for the aggregate columns. NULL values are ignored.

order(self: duckdb.DuckDBPyRelation, order_expr: str) duckdb.DuckDBPyRelation

Reorder the relation object by order_expr

pl(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) duckdb::PolarsDataFrame

Execute and fetch all rows as a Polars DataFrame

prod(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Calculates the product of the aggregate column.

project(self: duckdb.DuckDBPyRelation, project_expr: str) duckdb.DuckDBPyRelation

Project the relation object by the projection in project_expr

quantile(self: duckdb.DuckDBPyRelation, q: str, quantile_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the quantile of a single column or a list of columns by the optional groups on the relation

query(self: duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) duckdb.DuckDBPyRelation

Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object

record_batch(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader

Execute and return an Arrow Record Batch Reader that yields all rows

select_dtypes(self: duckdb.DuckDBPyRelation, types: object) duckdb.DuckDBPyRelation

Select columns from the relation, by filtering based on type(s)

select_types(self: duckdb.DuckDBPyRelation, types: object) duckdb.DuckDBPyRelation

Select columns from the relation, by filtering based on type(s)

sem(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the standard error of the mean of the aggregate column.

set_alias(self: duckdb.DuckDBPyRelation, alias: str) duckdb.DuckDBPyRelation

Rename the relation object to new alias

property shape

Tuple of (number of rows, number of columns) in the relation.

show(self: duckdb.DuckDBPyRelation) None

Display a summary of the data

skew(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the skewness of the aggregate column.

sql_query(self: duckdb.DuckDBPyRelation) str

Get the SQL query that is equivalent to the relation

std(self: duckdb.DuckDBPyRelation, std_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the standard deviation of a single column or a list of columns by the optional groups on the relation

sum(self: duckdb.DuckDBPyRelation, sum_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate sum of a single column or a list of columns by the optional groups on the relation

tf(self: duckdb.DuckDBPyRelation) dict

Fetch a result as dict of TensorFlow Tensors

to_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

to_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None

Write the relation object to a CSV file in ‘file_name’

to_df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

to_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None

Write the relation object to a Parquet file in ‘file_name’
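
A minimal sketch of writing a relation out with to_csv() and to_parquet() (both paths are placeholders; the compression codec is optional):

    import duckdb

    con = duckdb.connect()
    rel = con.sql("SELECT * FROM range(10) AS t(i)")

    rel.to_csv("out.csv", header=True)
    rel.to_parquet("out.parquet", compression="zstd")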

to_table(self: duckdb.DuckDBPyRelation, table_name: str) None

Creates a new table named table_name with the contents of the relation object

to_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation

Creates a view named view_name that refers to the relation object

torch(self: duckdb.DuckDBPyRelation) dict

Fetch a result as dict of PyTorch Tensors

property type

Get the type of the relation.

property types

Return a list containing the types of the columns of the relation.

union(self: duckdb.DuckDBPyRelation, union_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set union of this relation object with another relation object in union_rel

unique(self: duckdb.DuckDBPyRelation, unique_aggr: str) duckdb.DuckDBPyRelation

Returns the number of distinct values in a column.

value_counts(self: duckdb.DuckDBPyRelation, value_counts_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Count the number of rows for each unique value of the given column

var(self: duckdb.DuckDBPyRelation, var_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the variance of a single column or a list of columns by the optional groups on the relation

write_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None

Write the relation object to a CSV file in ‘file_name’

write_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None

Write the relation object to a Parquet file in ‘file_name’

exception duckdb.Error

Bases: Exception
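
The exception classes above (apart from Warning) derive from duckdb.Error, so a specific subclass can be caught first and duckdb.Error used as a catch-all; a minimal sketch (the missing table name is made up):

    import duckdb

    con = duckdb.connect()
    try:
        con.sql("SELECT * FROM no_such_table")  # hypothetical missing table
    except duckdb.CatalogException as e:
        # CatalogException -> ProgrammingError -> Error -> Exception
        print("caught:", e)
    except duckdb.Error:
        # Any other DuckDB-raised error
        raise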

class duckdb.ExplainType

Bases: pybind11_object

Members:

STANDARD

ANALYZE

ANALYZE = <ExplainType.ANALYZE: 1>
STANDARD = <ExplainType.STANDARD: 0>
property name
property value
exception duckdb.FatalException

Bases: Error

exception duckdb.HTTPException

Bases: IOException

Thrown when an error occurs in the httpfs extension, or whilst downloading an extension.

body: str
headers: Dict[str, str]
reason: str
status_code: int
exception duckdb.IOException

Bases: OperationalError

exception duckdb.IntegrityError

Bases: Error

exception duckdb.InternalError

Bases: Error

exception duckdb.InternalException

Bases: InternalError

exception duckdb.InterruptException

Bases: Error

exception duckdb.InvalidInputException

Bases: ProgrammingError

exception duckdb.InvalidTypeException

Bases: ProgrammingError

exception duckdb.NotImplementedException

Bases: NotSupportedError

exception duckdb.NotSupportedError

Bases: Error

exception duckdb.OperationalError

Bases: Error

exception duckdb.OutOfMemoryException

Bases: OperationalError

exception duckdb.OutOfRangeException

Bases: DataError

exception duckdb.ParserException

Bases: ProgrammingError

exception duckdb.PermissionException

Bases: Error

exception duckdb.ProgrammingError

Bases: Error

class duckdb.PythonExceptionHandling

Bases: pybind11_object

Members:

DEFAULT

RETURN_NULL

DEFAULT = <PythonExceptionHandling.DEFAULT: 0>
RETURN_NULL = <PythonExceptionHandling.RETURN_NULL: 1>
property name
property value
exception duckdb.SequenceException

Bases: Error

exception duckdb.SerializationException

Bases: OperationalError

exception duckdb.StandardException

Bases: Error

exception duckdb.SyntaxException

Bases: ProgrammingError

exception duckdb.TransactionException

Bases: OperationalError

exception duckdb.TypeMismatchException

Bases: DataError

exception duckdb.ValueOutOfRangeException

Bases: DataError

exception duckdb.Warning

Bases: Exception

duckdb.aggregate(df: pandas.DataFrame, aggr_expr: str, group_expr: str = '', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on DataFrame df

duckdb.alias(df: pandas.DataFrame, alias: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation from DataFrame df with the passed alias

duckdb.append(table_name: str, df: pandas.DataFrame, *, by_name: bool = False, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Append the passed DataFrame to the named table

duckdb.array_type(type: duckdb.typing.DuckDBPyType, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create an array type object of ‘type’

duckdb.arrow(*args, **kwargs)

Overloaded function.

  1. arrow(rows_per_batch: int = 1000000, connection: duckdb.DuckDBPyConnection = None) -> pyarrow.lib.Table

Fetch a result as Arrow table following execute()

  2. arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.begin(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Start a new transaction

duckdb.close(connection: duckdb.DuckDBPyConnection = None) None

Close the connection

duckdb.commit(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Commit changes performed within a transaction

duckdb.connect(database: str = ':memory:', read_only: bool = False, config: dict = None) duckdb.DuckDBPyConnection

Create a DuckDB database instance. Can take a database file name to read/write persistent data and a read_only flag if no changes are desired
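
A minimal sketch of connect() for an in-memory and a persistent database (the file name is a placeholder; threads is one example config option):

    import duckdb

    # In-memory database: data is lost when the connection is closed.
    mem = duckdb.connect()

    # Persistent database file with an optional configuration dict.
    con = duckdb.connect("my.duckdb", read_only=False, config={"threads": 4})
    con.sql("CREATE TABLE IF NOT EXISTS t (i INTEGER)")  # hypothetical table
    con.close()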

duckdb.create_function(name: str, function: function, return_type: object = None, parameters: duckdb.typing.DuckDBPyType = None, *, type: duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0>, null_handling: duckdb.functional.FunctionNullHandling = 0, exception_handling: duckdb.PythonExceptionHandling = 0, side_effects: bool = False, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Create a DuckDB function out of the passed-in Python function so it can be used in queries

duckdb.cursor(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

duckdb.decimal_type(width: int, scale: int, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a decimal type with ‘width’ and ‘scale’

duckdb.description(connection: duckdb.DuckDBPyConnection = None) Optional[list]

Get result set attributes, mainly column names

duckdb.df(*args, **kwargs)

Overloaded function.

  1. df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) -> pandas.DataFrame

Fetch a result as DataFrame following execute()

  2. df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the DataFrame df

duckdb.distinct(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the distinct rows from DataFrame df

duckdb.dtype(type_str: str, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a type object from ‘type_str’

duckdb.duplicate(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

duckdb.enum_type(name: str, type: duckdb.typing.DuckDBPyType, values: list, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create an enum type of underlying ‘type’, consisting of the list of ‘values’

duckdb.execute(query: str, parameters: object = None, multiple_parameter_sets: bool = False, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set

duckdb.executemany(query: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters

duckdb.fetch_arrow_table(rows_per_batch: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

duckdb.fetch_df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a result as DataFrame following execute()

duckdb.fetch_df_chunk(vectors_per_chunk: int = 1, *, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a chunk of the result as DataFrame following execute()

duckdb.fetch_record_batch(rows_per_batch: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

duckdb.fetchall(connection: duckdb.DuckDBPyConnection = None) list

Fetch all rows from a result following execute

duckdb.fetchdf(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a result as DataFrame following execute()

duckdb.fetchmany(size: int = 1, connection: duckdb.DuckDBPyConnection = None) list

Fetch the next set of rows from a result following execute

duckdb.fetchnumpy(connection: duckdb.DuckDBPyConnection = None) dict

Fetch a result as a dict of NumPy arrays following execute

duckdb.fetchone(connection: duckdb.DuckDBPyConnection = None) Optional[tuple]

Fetch a single row from a result following execute

duckdb.filesystem_is_registered(name: str, connection: duckdb.DuckDBPyConnection = None) bool

Check if a filesystem with the provided name is currently registered

duckdb.filter(df: pandas.DataFrame, filter_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Filter the DataFrame df by the filter in filter_expr

duckdb.from_arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.from_csv_auto(name: object, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None, null_padding: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

duckdb.from_df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the DataFrame df

duckdb.from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. from_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

duckdb.from_query(*args, **kwargs)

Overloaded function.

  1. from_query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the given SQL query

  2. from_query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the given SQL query

duckdb.from_substrait(*args, **kwargs)

Overloaded function.

  1. from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Creates a query object from the substrait plan

  2. from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a query object from protobuf plan

duckdb.from_substrait_json(json: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a query object from a JSON protobuf plan

duckdb.get_substrait(*args, **kwargs)

Overloaded function.

  1. get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation

Serialize a query object to protobuf

  2. get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation

Serialize a query to protobuf

duckdb.get_substrait_json(*args, **kwargs)

Overloaded function.

  1. get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation

Serialize a query object to protobuf

  2. get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation

Serialize a query to protobuf on the JSON format

duckdb.get_table_names(query: str, connection: duckdb.DuckDBPyConnection = None) Set[str]

Extract the required table names from a query

duckdb.install_extension(extension: str, *, force_install: bool = False, connection: duckdb.DuckDBPyConnection = None) None

Install an extension by name

duckdb.limit(df: pandas.DataFrame, n: int, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Retrieve the first n rows from the DataFrame df

duckdb.list_filesystems(connection: duckdb.DuckDBPyConnection = None) list

List registered filesystems, including builtin ones

duckdb.list_type(type: duckdb.typing.DuckDBPyType, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a list type object of ‘type’

duckdb.load_extension(extension: str, connection: duckdb.DuckDBPyConnection = None) None

Load an installed extension

duckdb.map_type(key: duckdb.typing.DuckDBPyType, value: duckdb.typing.DuckDBPyType, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a map type object from ‘key_type’ and ‘value_type’

duckdb.order(df: pandas.DataFrame, order_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Reorder the DataFrame df by order_expr

duckdb.pl(rows_per_batch: int = 1000000, connection: duckdb.DuckDBPyConnection = None) duckdb::PolarsDataFrame

Fetch a result as Polars DataFrame following execute()

duckdb.project(df: pandas.DataFrame, project_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Project the DataFrame df by the projection in project_expr

duckdb.query(*args, **kwargs)

Overloaded function.

  1. query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

  2. query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

duckdb.query_df(df: pandas.DataFrame, virtual_table_name: str, sql_query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Run the given SQL query in sql_query on the view named virtual_table_name that contains the content of DataFrame df

duckdb.read_csv(name: object, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None, null_padding: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

duckdb.read_json(name: str, connection: duckdb.DuckDBPyConnection = None, columns: Optional[object] = None, sample_size: Optional[object] = None, maximum_depth: Optional[object] = None, records: Optional[str] = None, format: Optional[str] = None) duckdb.DuckDBPyRelation

Create a relation object from the JSON file in ‘name’

duckdb.read_parquet(*args, **kwargs)

Overloaded function.

  1. read_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. read_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

duckdb.register(view_name: str, python_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Register the passed Python object for querying as a view

duckdb.register_filesystem(filesystem: fsspec.AbstractFileSystem, connection: duckdb.DuckDBPyConnection = None) None

Register a fsspec compliant filesystem

duckdb.remove_function(name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Remove a previously created function

duckdb.rollback(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

duckdb.row_type(fields: object, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

duckdb.sql(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

duckdb.sqltype(type_str: str, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a type object from ‘type_str’

duckdb.string_type(collation: str = '', connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a string type with an optional collation

duckdb.struct_type(fields: object, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a struct type object from ‘fields’

duckdb.table(table_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object for the named table

duckdb.table_function(name: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the named table function with the given parameters

duckdb.tf(connection: duckdb.DuckDBPyConnection = None) dict

Fetch a result as dict of TensorFlow Tensors following execute()

class duckdb.token_type

Bases: pybind11_object

Members:

identifier

numeric_const

string_const

operator

keyword

comment

comment = <token_type.comment: 5>
identifier = <token_type.identifier: 0>
keyword = <token_type.keyword: 4>
property name
numeric_const = <token_type.numeric_const: 1>
operator = <token_type.operator: 3>
string_const = <token_type.string_const: 2>
property value
duckdb.tokenize(query: str) list

Tokenizes a SQL string, returning a list of (position, type) tuples that can be used for e.g. syntax highlighting
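
A minimal sketch of tokenize(); each tuple holds the character offset of a token and its duckdb.token_type:

    import duckdb

    query = "SELECT 42 AS answer -- the answer"
    for position, ttype in duckdb.tokenize(query):
        print(position, ttype)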

duckdb.torch(connection: duckdb.DuckDBPyConnection = None) dict

Fetch a result as dict of PyTorch Tensors following execute()

duckdb.type(type_str: str, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a type object from ‘type_str’

duckdb.union_type(members: object, connection: duckdb.DuckDBPyConnection = None) duckdb.typing.DuckDBPyType

Create a union type object from ‘members’

duckdb.unregister(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Unregister the view name

duckdb.unregister_filesystem(name: str, connection: duckdb.DuckDBPyConnection = None) None

Unregister a filesystem

duckdb.values(*args, **kwargs)

Overloaded function.

  1. values(values: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the passed values

  2. values(values: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the passed values

duckdb.view(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object for the named view

duckdb.write_csv(df: pandas.DataFrame, file_name: str, connection: duckdb.DuckDBPyConnection = None) None

Write the DataFrame df to a CSV file in file_name