DuckDB is an embeddable SQL OLAP Database Management System

duckdb.threadsafety int

Indicates that this package is threadsafe

duckdb.apilevel str

Indicates which Python DBAPI version this package implements

duckdb.paramstyle str

Indicates which parameter style duckdb supports

duckdb.default_connection duckdb.DuckDBPyConnection

The connection that is used by default if you don’t explicitly pass one to the root methods in this module

exception duckdb.BinderException

Bases: ProgrammingError

exception duckdb.CastException

Bases: DataError

exception duckdb.CatalogException

Bases: ProgrammingError

exception duckdb.ConnectionException

Bases: OperationalError

exception duckdb.ConstraintException

Bases: IntegrityError

exception duckdb.ConversionException

Bases: DataError

exception duckdb.DataError

Bases: Error

class duckdb.DuckDBPyConnection

Bases: pybind11_object

append(self: duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame) duckdb.DuckDBPyConnection

Append the passed DataFrame to the named table

arrow(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

begin(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Start a new transaction

close(self: duckdb.DuckDBPyConnection) None

Close the connection

commit(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Commit changes performed within a transaction

cursor(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

property description

Get result set attributes, mainly column names

df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

duplicate(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

execute(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None, multiple_parameter_sets: bool = False) duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set

executemany(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None) duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters

fetch_arrow_table(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

fetch_df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetch_df_chunk(self: duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) pandas.DataFrame

Fetch a chunk of the result as DataFrame following execute()

fetch_record_batch(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

fetchall(self: duckdb.DuckDBPyConnection) list

Fetch all rows from a result following execute

fetchdf(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame

Fetch a result as DataFrame following execute()

fetchmany(self: duckdb.DuckDBPyConnection, size: int = 1) list

Fetch the next set of rows from a result following execute

fetchnumpy(self: duckdb.DuckDBPyConnection) dict

Fetch a result as a dict of NumPy arrays following execute()

fetchone(self: duckdb.DuckDBPyConnection) Optional[tuple]

Fetch a single row from a result following execute

from_arrow(self: duckdb.DuckDBPyConnection, arrow_object: object) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

from_csv_auto(self: duckdb.DuckDBPyConnection, name: str, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

from_df(self: duckdb.DuckDBPyConnection, df: pandas.DataFrame = None) duckdb.DuckDBPyRelation

Create a relation object from the DataFrame in df

from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. from_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

from_query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

from_substrait(self: duckdb.DuckDBPyConnection, proto: bytes) duckdb.DuckDBPyRelation

Create a query object from protobuf plan

from_substrait_json(self: duckdb.DuckDBPyConnection, json: str) duckdb.DuckDBPyRelation

Create a query object from a JSON protobuf plan

get_substrait(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf

get_substrait_json(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf on the JSON format

get_table_names(self: duckdb.DuckDBPyConnection, query: str) Set[str]

Extract the required table names from a query

install_extension(self: duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False) None

Install an extension by name

list_filesystems(self: duckdb.DuckDBPyConnection) list

List registered filesystems, including builtin ones

load_extension(self: duckdb.DuckDBPyConnection, extension: str) None

Load an installed extension

pl(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) polars.DataFrame

Fetch a result as Polars DataFrame following execute()

query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

read_csv(self: duckdb.DuckDBPyConnection, name: str, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

read_json(self: duckdb.DuckDBPyConnection, name: str, *, columns: object = None, sample_size: object = None, maximum_depth: object = None) duckdb.DuckDBPyRelation

Create a relation object from the JSON file in ‘name’

read_parquet(*args, **kwargs)

Overloaded function.

  1. read_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. read_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

register(self: duckdb.DuckDBPyConnection, view_name: str, python_object: object) duckdb.DuckDBPyConnection

Register the passed Python Object value for querying with a view

register_filesystem(self: duckdb.DuckDBPyConnection, filesystem: fsspec.AbstractFileSystem) None

Register a fsspec compliant filesystem

rollback(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

sql(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

table(self: duckdb.DuckDBPyConnection, table_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named table

table_function(self: duckdb.DuckDBPyConnection, name: str, parameters: object = None) duckdb.DuckDBPyRelation

Create a relation object from the named table function with given parameters

unregister(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyConnection

Unregister the view name

unregister_filesystem(self: duckdb.DuckDBPyConnection, name: str) None

Unregister a filesystem

values(self: duckdb.DuckDBPyConnection, values: object) duckdb.DuckDBPyRelation

Create a relation object from the passed values

view(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyRelation

Create a relation object for the named view

class duckdb.DuckDBPyRelation

Bases: pybind11_object

abs(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the absolute value for the specified columns.

aggregate(self: duckdb.DuckDBPyRelation, aggr_expr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on the relation

property alias

Get the name of the current alias

apply(self: duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') duckdb.DuckDBPyRelation

Compute the function of a single column or a list of columns by the optional groups on the relation

arrow(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

close(self: duckdb.DuckDBPyRelation) None

Closes the result

property columns

Return a list containing the names of the columns of the relation.

count(self: duckdb.DuckDBPyRelation, count_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate count of a single column or a list of columns by the optional groups on the relation

create(self: duckdb.DuckDBPyRelation, table_name: str) None

Creates a new table named table_name with the contents of the relation object

create_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation

Creates a view named view_name that refers to the relation object

cummax(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative maximum of the aggregate column.

cummin(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative minimum of the aggregate column.

cumprod(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative product of the aggregate column.

cumsum(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation

Returns the cumulative sum of the aggregate column.

describe(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Gives basic statistics (e.g., min, max) and whether NULL values exist for each column of the relation.

property description

Return the description of the result

df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

distinct(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Retrieve distinct rows from this relation object

property dtypes

Return a list containing the types of the columns of the relation.

except_(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set difference (EXCEPT) of this relation object and the relation object in other_rel

execute(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Transform the relation into a result set

explain(self: duckdb.DuckDBPyRelation) str

fetch_arrow_reader(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader

Execute and return an Arrow Record Batch Reader that yields all rows

fetch_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

fetchall(self: duckdb.DuckDBPyRelation) list

Execute and fetch all rows as a list of tuples

fetchdf(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

fetchmany(self: duckdb.DuckDBPyRelation, size: int = 1) list

Execute and fetch the next set of rows as a list of tuples

fetchnumpy(self: duckdb.DuckDBPyRelation) dict

Execute and fetch all rows as a Python dict mapping each column to a NumPy array

fetchone(self: duckdb.DuckDBPyRelation) Optional[tuple]

Execute and fetch a single row as a tuple

filter(self: duckdb.DuckDBPyRelation, filter_expr: str) duckdb.DuckDBPyRelation

Filter the relation object by the filter in filter_expr

insert(self: duckdb.DuckDBPyRelation, values: object) None

Inserts the given values into the relation

insert_into(self: duckdb.DuckDBPyRelation, table_name: str) None

Inserts the relation object into an existing table named table_name

intersect(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set intersection of this relation object with another relation object in other_rel

join(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation, condition: str, how: str = 'inner') duckdb.DuckDBPyRelation

Join the relation object with another relation object in other_rel using the join condition expression in condition. Types supported are ‘inner’ and ‘left’

kurt(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the excess kurtosis of the aggregate column.

limit(self: duckdb.DuckDBPyRelation, n: int, offset: int = 0) duckdb.DuckDBPyRelation

Only retrieve the first n rows from this relation object, starting at offset

mad(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the median absolute deviation for the aggregate columns. NULL values are ignored. Temporal types return a positive INTERVAL.

map(self: duckdb.DuckDBPyRelation, map_function: function) duckdb.DuckDBPyRelation

Calls the passed function on the relation

max(self: duckdb.DuckDBPyRelation, max_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate max of a single column or a list of columns by the optional groups on the relation

mean(self: duckdb.DuckDBPyRelation, mean_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate mean of a single column or a list of columns by the optional groups on the relation

median(self: duckdb.DuckDBPyRelation, median_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate median of a single column or a list of columns by the optional groups on the relation

min(self: duckdb.DuckDBPyRelation, min_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate min of a single column or a list of columns by the optional groups on the relation

mode(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the most frequent value for the aggregate columns. NULL values are ignored.

order(self: duckdb.DuckDBPyRelation, order_expr: str) duckdb.DuckDBPyRelation

Reorder the relation object by order_expr

pl(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) polars.DataFrame

Execute and fetch all rows as a Polars DataFrame

prod(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Calculates the product of the aggregate column.

project(self: duckdb.DuckDBPyRelation, project_expr: str) duckdb.DuckDBPyRelation

Project the relation object by the projection in project_expr

quantile(self: duckdb.DuckDBPyRelation, q: str, quantile_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the quantile of a single column or a list of columns by the optional groups on the relation

query(self: duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) duckdb.DuckDBPyRelation

Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object

record_batch(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader

Execute and return an Arrow Record Batch Reader that yields all rows

sem(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the standard error of the mean of the aggregate column.

set_alias(self: duckdb.DuckDBPyRelation, alias: str) duckdb.DuckDBPyRelation

Rename the relation object to new alias

property shape

Tuple of (number of rows, number of columns) in the relation.

show(self: duckdb.DuckDBPyRelation) None

Display a summary of the data

skew(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation

Returns the skewness of the aggregate column.

sql_query(self: duckdb.DuckDBPyRelation) str

Get the SQL query that is equivalent to the relation

std(self: duckdb.DuckDBPyRelation, std_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the standard deviation of a single column or a list of columns by the optional groups on the relation

sum(self: duckdb.DuckDBPyRelation, sum_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the aggregate sum of a single column or a list of columns by the optional groups on the relation

to_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table

Execute and fetch all rows as an Arrow Table

to_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None

Write the relation object to a CSV file in ‘file_name’

to_df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame

Execute and fetch all rows as a pandas DataFrame

to_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None

Write the relation object to a Parquet file in ‘file_name’

to_table(self: duckdb.DuckDBPyRelation, table_name: str) None

Creates a new table named table_name with the contents of the relation object

to_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation

Creates a view named view_name that refers to the relation object

property type

Get the type of the relation.

property types

Return a list containing the types of the columns of the relation.

union(self: duckdb.DuckDBPyRelation, union_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation

Create the set union of this relation object with another relation object in union_rel

unique(self: duckdb.DuckDBPyRelation, unique_aggr: str) duckdb.DuckDBPyRelation

Number of distinct values in a column.

value_counts(self: duckdb.DuckDBPyRelation, value_counts_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Count the number of rows for each unique value of the column

var(self: duckdb.DuckDBPyRelation, var_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation

Compute the variance of a single column or a list of columns by the optional groups on the relation

write_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None

Write the relation object to a CSV file in ‘file_name’

write_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None

Write the relation object to a Parquet file in ‘file_name’

exception duckdb.Error

Bases: Exception

exception duckdb.FatalException

Bases: Error

exception duckdb.IOException

Bases: OperationalError

exception duckdb.IntegrityError

Bases: Error

exception duckdb.InternalError

Bases: Error

exception duckdb.InternalException

Bases: InternalError

exception duckdb.InterruptException

Bases: Error

exception duckdb.InvalidInputException

Bases: ProgrammingError

exception duckdb.InvalidTypeException

Bases: ProgrammingError

exception duckdb.NotImplementedException

Bases: NotSupportedError

exception duckdb.NotSupportedError

Bases: Error

exception duckdb.OperationalError

Bases: Error

exception duckdb.OutOfMemoryException

Bases: OperationalError

exception duckdb.OutOfRangeException

Bases: DataError

exception duckdb.ParserException

Bases: ProgrammingError

exception duckdb.PermissionException

Bases: Error

exception duckdb.ProgrammingError

Bases: Error

exception duckdb.SequenceException

Bases: Error

exception duckdb.SerializationException

Bases: OperationalError

exception duckdb.StandardException

Bases: Error

exception duckdb.SyntaxException

Bases: ProgrammingError

exception duckdb.TransactionException

Bases: OperationalError

exception duckdb.TypeMismatchException

Bases: DataError

exception duckdb.ValueOutOfRangeException

Bases: DataError

exception duckdb.Warning

Bases: Exception

duckdb.aggregate(df: pandas.DataFrame, aggr_expr: str, group_expr: str = '', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the aggregate aggr_expr by the optional groups group_expr on DataFrame df

duckdb.alias(df: pandas.DataFrame, alias: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation from DataFrame df with the passed alias

duckdb.append(table_name: str, df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Append the passed DataFrame to the named table

duckdb.arrow(*args, **kwargs)

Overloaded function.

  1. arrow(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) -> pyarrow.lib.Table

Fetch a result as Arrow table following execute()

  2. arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.begin(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Start a new transaction

duckdb.close(connection: duckdb.DuckDBPyConnection = None) None

Close the connection

duckdb.commit(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Commit changes performed within a transaction

duckdb.connect(database: str = ':memory:', read_only: bool = False, config: dict = None) duckdb.DuckDBPyConnection

Create a DuckDB database instance. Can take a database file name to read/write persistent data and a read_only flag if no changes are desired

duckdb.cursor(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

duckdb.description(connection: duckdb.DuckDBPyConnection = None) Optional[list]

Get result set attributes, mainly column names

duckdb.df(*args, **kwargs)

Overloaded function.

  1. df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) -> pandas.DataFrame

Fetch a result as DataFrame following execute()

  2. df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the DataFrame df

duckdb.distinct(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Compute the distinct rows from DataFrame df

duckdb.duplicate(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Create a duplicate of the current connection

duckdb.execute(query: str, parameters: object = None, multiple_parameter_sets: bool = False, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Execute the given SQL query, optionally using prepared statements with parameters set

duckdb.executemany(query: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Execute the given prepared statement multiple times using the list of parameter sets in parameters

duckdb.fetch_arrow_table(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.Table

Fetch a result as Arrow table following execute()

duckdb.fetch_df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a result as DataFrame following execute()

duckdb.fetch_df_chunk(vectors_per_chunk: int = 1, *, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a chunk of the result as DataFrame following execute()

duckdb.fetch_record_batch(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.RecordBatchReader

Fetch an Arrow RecordBatchReader following execute()

duckdb.fetchall(connection: duckdb.DuckDBPyConnection = None) list

Fetch all rows from a result following execute

duckdb.fetchdf(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame

Fetch a result as DataFrame following execute()

duckdb.fetchmany(size: int = 1, connection: duckdb.DuckDBPyConnection = None) list

Fetch the next set of rows from a result following execute

duckdb.fetchnumpy(connection: duckdb.DuckDBPyConnection = None) dict

Fetch a result as a dict of NumPy arrays following execute()

duckdb.fetchone(connection: duckdb.DuckDBPyConnection = None) Optional[tuple]

Fetch a single row from a result following execute

duckdb.filter(df: pandas.DataFrame, filter_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Filter the DataFrame df by the filter in filter_expr

duckdb.from_arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from an Arrow object

duckdb.from_csv_auto(name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

duckdb.from_df(*args, **kwargs)

Overloaded function.

  1. from_df(df: pandas.DataFrame = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the DataFrame in df

  2. from_df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the DataFrame df

duckdb.from_parquet(*args, **kwargs)

Overloaded function.

  1. from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. from_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

  3. from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  4. from_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

duckdb.from_query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the given SQL query

duckdb.from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a query object from protobuf plan

duckdb.from_substrait_json(json: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a query object from a JSON protobuf plan

duckdb.get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf

duckdb.get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation

Serialize a query to protobuf on the JSON format

duckdb.get_table_names(query: str, connection: duckdb.DuckDBPyConnection = None) Set[str]

Extract the required table names from a query

duckdb.install_extension(extension: str, *, force_install: bool = False, connection: duckdb.DuckDBPyConnection = None) None

Install an extension by name

duckdb.limit(df: pandas.DataFrame, n: int, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Retrieve the first n rows from the DataFrame df

duckdb.list_filesystems(connection: duckdb.DuckDBPyConnection = None) list

List registered filesystems, including builtin ones

duckdb.load_extension(extension: str, connection: duckdb.DuckDBPyConnection = None) None

Load an installed extension

duckdb.order(df: pandas.DataFrame, order_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Reorder the DataFrame df by order_expr

duckdb.pl(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) duckdb::PolarsDataFrame

Fetch a result as Polars DataFrame following execute()

duckdb.project(df: pandas.DataFrame, project_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Project the DataFrame df by the projection in project_expr

duckdb.query(*args, **kwargs)

Overloaded function.

  1. query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

duckdb.query_df(df: pandas.DataFrame, virtual_table_name: str, sql_query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Run the given SQL query in sql_query on the view named virtual_table_name that contains the content of DataFrame df

duckdb.read_csv(name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation

Create a relation object from the CSV file in ‘name’

duckdb.read_json(name: str, connection: duckdb.DuckDBPyConnection = None, columns: object = None, sample_size: object = None, maximum_depth: object = None) duckdb.DuckDBPyRelation

Create a relation object from the JSON file in ‘name’

duckdb.read_parquet(*args, **kwargs)

Overloaded function.

  1. read_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_glob

  2. read_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the Parquet files in file_globs

duckdb.register(view_name: str, python_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Register the passed Python object as a view so that it can be queried by name

duckdb.register_filesystem(filesystem: fsspec.AbstractFileSystem, connection: duckdb.DuckDBPyConnection = None) None

Register a fsspec compliant filesystem

duckdb.rollback(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Roll back changes performed within a transaction

duckdb.sql(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.

duckdb.table(table_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object for the named table

duckdb.table_function(name: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object from the named table function with the given parameters

class duckdb.token_type

Bases: pybind11_object

Members:

identifier

numeric_const

string_const

operator

keyword

comment

comment = <token_type.comment: 5>

identifier = <token_type.identifier: 0>

keyword = <token_type.keyword: 4>

property name

numeric_const = <token_type.numeric_const: 1>

operator = <token_type.operator: 3>

string_const = <token_type.string_const: 2>

property value

duckdb.tokenize(query: str) list

Tokenizes a SQL string, returning a list of (position, type) tuples that can be used for e.g. syntax highlighting

duckdb.unregister(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection

Unregister the view with the given name

duckdb.unregister_filesystem(name: str, connection: duckdb.DuckDBPyConnection = None) None

Unregister a filesystem

duckdb.values(*args, **kwargs)

Overloaded function.

  1. values(values: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation

Create a relation object from the passed values

duckdb.view(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation

Create a relation object for the named view

duckdb.write_csv(df: pandas.DataFrame, file_name: str, connection: duckdb.DuckDBPyConnection = None) None

Write the DataFrame df to a CSV file in file_name