DuckDB is an embeddable SQL OLAP Database Management System
- duckdb.threadsafety bool¶
-
Indicates that this package is threadsafe
- duckdb.apilevel int¶
-
Indicates which Python DBAPI version this package implements
- duckdb.paramstyle str¶
-
Indicates which parameter style duckdb supports
- duckdb.default_connection duckdb.DuckDBPyConnection¶
-
The connection that is used by default if you don’t explicitly pass one to the root methods in this module
- exception duckdb.BinderException¶
-
Bases:
ProgrammingError
- exception duckdb.CatalogException¶
-
Bases:
ProgrammingError
- exception duckdb.ConnectionException¶
-
Bases:
OperationalError
- exception duckdb.ConstraintException¶
-
Bases:
IntegrityError
- class duckdb.DuckDBPyConnection¶
-
Bases:
pybind11_object
- append(self: duckdb.DuckDBPyConnection, table_name: str, df: pandas.DataFrame) duckdb.DuckDBPyConnection ¶
-
Append the passed DataFrame to the named table
- arrow(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table ¶
-
Fetch a result as Arrow table following execute()
- begin(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection ¶
-
Start a new transaction
- close(self: duckdb.DuckDBPyConnection) None ¶
-
Close the connection
- commit(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection ¶
-
Commit changes performed within a transaction
- cursor(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection ¶
-
Create a duplicate of the current connection
- property description¶
-
Get result set attributes, mainly column names
- df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Fetch a result as DataFrame following execute()
- duplicate(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection ¶
-
Create a duplicate of the current connection
- execute(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None, multiple_parameter_sets: bool = False) duckdb.DuckDBPyConnection ¶
-
Execute the given SQL query, optionally using prepared statements with parameters set
- executemany(self: duckdb.DuckDBPyConnection, query: str, parameters: object = None) duckdb.DuckDBPyConnection ¶
-
Execute the given prepared statement multiple times using the list of parameter sets in parameters
- fetch_arrow_table(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.Table ¶
-
Fetch a result as Arrow table following execute()
- fetch_df(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Fetch a result as DataFrame following execute()
- fetch_df_chunk(self: duckdb.DuckDBPyConnection, vectors_per_chunk: int = 1, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Fetch a chunk of the result as DataFrame following execute()
- fetch_record_batch(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) pyarrow.lib.RecordBatchReader ¶
-
Fetch an Arrow RecordBatchReader following execute()
- fetchall(self: duckdb.DuckDBPyConnection) list ¶
-
Fetch all rows from a result following execute
- fetchdf(self: duckdb.DuckDBPyConnection, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Fetch a result as DataFrame following execute()
- fetchmany(self: duckdb.DuckDBPyConnection, size: int = 1) list ¶
-
Fetch the next set of rows from a result following execute
- fetchnumpy(self: duckdb.DuckDBPyConnection) dict ¶
-
Fetch a result as a dict of NumPy arrays following execute
- fetchone(self: duckdb.DuckDBPyConnection) Optional[tuple] ¶
-
Fetch a single row from a result following execute
- from_arrow(self: duckdb.DuckDBPyConnection, arrow_object: object) duckdb.DuckDBPyRelation ¶
-
Create a relation object from an Arrow object
- from_csv_auto(self: duckdb.DuckDBPyConnection, name: str, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the CSV file in ‘name’
- from_df(self: duckdb.DuckDBPyConnection, df: pandas.DataFrame = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the DataFrame in df
- from_parquet(*args, **kwargs)¶
-
Overloaded function.
from_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
from_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
- from_query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation ¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- from_substrait(self: duckdb.DuckDBPyConnection, proto: bytes) duckdb.DuckDBPyRelation ¶
-
Create a query object from protobuf plan
- from_substrait_json(self: duckdb.DuckDBPyConnection, json: str) duckdb.DuckDBPyRelation ¶
-
Create a query object from a JSON protobuf plan
- get_substrait(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation ¶
-
Serialize a query to protobuf
- get_substrait_json(self: duckdb.DuckDBPyConnection, query: str, *, enable_optimizer: bool = True) duckdb.DuckDBPyRelation ¶
-
Serialize a query to protobuf in JSON format
- get_table_names(self: duckdb.DuckDBPyConnection, query: str) Set[str] ¶
-
Extract the required table names from a query
- install_extension(self: duckdb.DuckDBPyConnection, extension: str, *, force_install: bool = False) None ¶
-
Install an extension by name
- list_filesystems(self: duckdb.DuckDBPyConnection) list ¶
-
List registered filesystems, including builtin ones
- load_extension(self: duckdb.DuckDBPyConnection, extension: str) None ¶
-
Load an installed extension
- pl(self: duckdb.DuckDBPyConnection, chunk_size: int = 1000000) duckdb::PolarsDataFrame ¶
-
Fetch a result as Polars DataFrame following execute()
- query(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation ¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- read_csv(self: duckdb.DuckDBPyConnection, name: str, *, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the CSV file in ‘name’
- read_json(self: duckdb.DuckDBPyConnection, name: str, *, columns: object = None, sample_size: object = None, maximum_depth: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the JSON file in ‘name’
- read_parquet(*args, **kwargs)¶
-
Overloaded function.
read_parquet(self: duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
read_parquet(self: duckdb.DuckDBPyConnection, file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
- register(self: duckdb.DuckDBPyConnection, view_name: str, python_object: object) duckdb.DuckDBPyConnection ¶
-
Register the passed Python Object value for querying with a view
- register_filesystem(self: duckdb.DuckDBPyConnection, filesystem: fsspec.AbstractFileSystem) None ¶
-
Register a fsspec compliant filesystem
- rollback(self: duckdb.DuckDBPyConnection) duckdb.DuckDBPyConnection ¶
-
Roll back changes performed within a transaction
- sql(self: duckdb.DuckDBPyConnection, query: str, alias: str = 'query_relation') duckdb.DuckDBPyRelation ¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- table(self: duckdb.DuckDBPyConnection, table_name: str) duckdb.DuckDBPyRelation ¶
-
Create a relation object for the named table
- table_function(self: duckdb.DuckDBPyConnection, name: str, parameters: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the named table function with given parameters
- unregister(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyConnection ¶
-
Unregister the view name
- unregister_filesystem(self: duckdb.DuckDBPyConnection, name: str) None ¶
-
Unregister a filesystem
- values(self: duckdb.DuckDBPyConnection, values: object) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the passed values
- view(self: duckdb.DuckDBPyConnection, view_name: str) duckdb.DuckDBPyRelation ¶
-
Create a relation object for the named view
- class duckdb.DuckDBPyRelation¶
-
Bases:
pybind11_object
- abs(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation ¶
-
Returns the absolute value for the specified columns.
- aggregate(self: duckdb.DuckDBPyRelation, aggr_expr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
- property alias¶
-
Get the name of the current alias
- apply(self: duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the function of a single column or a list of columns by the optional groups on the relation
- arrow(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table ¶
-
Execute and fetch all rows as an Arrow Table
- close(self: duckdb.DuckDBPyRelation) None ¶
-
Closes the result
- property columns¶
-
Return a list containing the names of the columns of the relation.
- count(self: duckdb.DuckDBPyRelation, count_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate count of a single column or a list of columns by the optional groups on the relation
- create(self: duckdb.DuckDBPyRelation, table_name: str) None ¶
-
Creates a new table named table_name with the contents of the relation object
- create_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation ¶
-
Creates a view named view_name that refers to the relation object
- cummax(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation ¶
-
Returns the cumulative maximum of the aggregate column.
- cummin(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation ¶
-
Returns the cumulative minimum of the aggregate column.
- cumprod(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation ¶
-
Returns the cumulative product of the aggregate column.
- cumsum(self: duckdb.DuckDBPyRelation, aggregation_columns: str) duckdb.DuckDBPyRelation ¶
-
Returns the cumulative sum of the aggregate column.
- describe(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Gives basic statistics (e.g., min, max) for each column of the relation, and reports whether the column contains NULL values.
- property description¶
-
Return the description of the result
- df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Execute and fetch all rows as a pandas DataFrame
- distinct(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Retrieve distinct rows from this relation object
- property dtypes¶
-
Return a list containing the types of the columns of the relation.
- except_(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Create the set difference (EXCEPT) of this relation object with another relation object in other_rel
- execute(self: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Transform the relation into a result set
- explain(self: duckdb.DuckDBPyRelation) str ¶
- fetch_arrow_reader(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader ¶
-
Execute and return an Arrow Record Batch Reader that yields all rows
- fetch_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table ¶
-
Execute and fetch all rows as an Arrow Table
- fetchall(self: duckdb.DuckDBPyRelation) list ¶
-
Execute and fetch all rows as a list of tuples
- fetchdf(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Execute and fetch all rows as a pandas DataFrame
- fetchmany(self: duckdb.DuckDBPyRelation, size: int = 1) list ¶
-
Execute and fetch the next set of rows as a list of tuples
- fetchnumpy(self: duckdb.DuckDBPyRelation) dict ¶
-
Execute and fetch all rows as a Python dict mapping each column to one NumPy array
- fetchone(self: duckdb.DuckDBPyRelation) Optional[tuple] ¶
-
Execute and fetch a single row as a tuple
- filter(self: duckdb.DuckDBPyRelation, filter_expr: str) duckdb.DuckDBPyRelation ¶
-
Filter the relation object by the filter in filter_expr
- insert(self: duckdb.DuckDBPyRelation, values: object) None ¶
-
Inserts the given values into the relation
- insert_into(self: duckdb.DuckDBPyRelation, table_name: str) None ¶
-
Inserts the relation object into an existing table named table_name
- intersect(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Create the set intersection of this relation object with another relation object in other_rel
- join(self: duckdb.DuckDBPyRelation, other_rel: duckdb.DuckDBPyRelation, condition: str, how: str = 'inner') duckdb.DuckDBPyRelation ¶
-
Join the relation object with another relation object in other_rel using the join condition expression in condition. Types supported are ‘inner’ and ‘left’
- kurt(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Returns the excess kurtosis of the aggregate column.
- limit(self: duckdb.DuckDBPyRelation, n: int, offset: int = 0) duckdb.DuckDBPyRelation ¶
-
Only retrieve the first n rows from this relation object, starting at offset
- mad(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Returns the median absolute deviation for the aggregate columns. NULL values are ignored. Temporal types return a positive INTERVAL.
- map(self: duckdb.DuckDBPyRelation, map_function: function) duckdb.DuckDBPyRelation ¶
-
Calls the passed function on the relation
- max(self: duckdb.DuckDBPyRelation, max_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate max of a single column or a list of columns by the optional groups on the relation
- mean(self: duckdb.DuckDBPyRelation, mean_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate mean of a single column or a list of columns by the optional groups on the relation
- median(self: duckdb.DuckDBPyRelation, median_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate median of a single column or a list of columns by the optional groups on the relation
- min(self: duckdb.DuckDBPyRelation, min_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate min of a single column or a list of columns by the optional groups on the relation
- mode(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Returns the most frequent value for the aggregate columns. NULL values are ignored.
- order(self: duckdb.DuckDBPyRelation, order_expr: str) duckdb.DuckDBPyRelation ¶
-
Reorder the relation object by order_expr
- pl(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) duckdb::PolarsDataFrame ¶
-
Execute and fetch all rows as a Polars DataFrame
- prod(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Calculates the product of the aggregate column.
- project(self: duckdb.DuckDBPyRelation, project_expr: str) duckdb.DuckDBPyRelation ¶
-
Project the relation object by the projection in project_expr
- quantile(self: duckdb.DuckDBPyRelation, q: str, quantile_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the quantile of a single column or a list of columns by the optional groups on the relation
- query(self: duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) duckdb.DuckDBPyRelation ¶
-
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object
- record_batch(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.RecordBatchReader ¶
-
Execute and return an Arrow Record Batch Reader that yields all rows
- sem(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Returns the standard error of the mean of the aggregate column.
- set_alias(self: duckdb.DuckDBPyRelation, alias: str) duckdb.DuckDBPyRelation ¶
-
Rename the relation object to new alias
- property shape¶
-
Tuple of (number of rows, number of columns) in the relation.
- show(self: duckdb.DuckDBPyRelation) None ¶
-
Display a summary of the data
- skew(self: duckdb.DuckDBPyRelation, aggregation_columns: str, group_columns: str = '') duckdb.DuckDBPyRelation ¶
-
Returns the skewness of the aggregate column.
- sql_query(self: duckdb.DuckDBPyRelation) str ¶
-
Get the SQL query that is equivalent to the relation
- std(self: duckdb.DuckDBPyRelation, std_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the standard deviation of a single column or a list of columns by the optional groups on the relation
- sum(self: duckdb.DuckDBPyRelation, sum_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the aggregate sum of a single column or a list of columns by the optional groups on the relation
- to_arrow_table(self: duckdb.DuckDBPyRelation, batch_size: int = 1000000) pyarrow.lib.Table ¶
-
Execute and fetch all rows as an Arrow Table
- to_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None ¶
-
Write the relation object to a CSV file in ‘file_name’
- to_df(self: duckdb.DuckDBPyRelation, *, date_as_object: bool = False) pandas.DataFrame ¶
-
Execute and fetch all rows as a pandas DataFrame
- to_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None ¶
-
Write the relation object to a Parquet file in ‘file_name’
- to_table(self: duckdb.DuckDBPyRelation, table_name: str) None ¶
-
Creates a new table named table_name with the contents of the relation object
- to_view(self: duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) duckdb.DuckDBPyRelation ¶
-
Creates a view named view_name that refers to the relation object
- property type¶
-
Get the type of the relation.
- property types¶
-
Return a list containing the types of the columns of the relation.
- union(self: duckdb.DuckDBPyRelation, union_rel: duckdb.DuckDBPyRelation) duckdb.DuckDBPyRelation ¶
-
Create the set union of this relation object with another relation object in union_rel
- unique(self: duckdb.DuckDBPyRelation, unique_aggr: str) duckdb.DuckDBPyRelation ¶
-
Number of distinct values in a column.
- value_counts(self: duckdb.DuckDBPyRelation, value_counts_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Count number of rows with each unique value of variable
- var(self: duckdb.DuckDBPyRelation, var_aggr: str, group_expr: str = '') duckdb.DuckDBPyRelation ¶
-
Compute the variance of a single column or a list of columns by the optional groups on the relation
- write_csv(self: duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None) None ¶
-
Write the relation object to a CSV file in ‘file_name’
- write_parquet(self: duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None) None ¶
-
Write the relation object to a Parquet file in ‘file_name’
- exception duckdb.Error¶
-
Bases:
Exception
- exception duckdb.IOException¶
-
Bases:
OperationalError
- exception duckdb.InternalException¶
-
Bases:
InternalError
- exception duckdb.InvalidInputException¶
-
Bases:
ProgrammingError
- exception duckdb.InvalidTypeException¶
-
Bases:
ProgrammingError
- exception duckdb.NotImplementedException¶
-
Bases:
NotSupportedError
- exception duckdb.OutOfMemoryException¶
-
Bases:
OperationalError
- exception duckdb.ParserException¶
-
Bases:
ProgrammingError
- exception duckdb.SerializationException¶
-
Bases:
OperationalError
- exception duckdb.SyntaxException¶
-
Bases:
ProgrammingError
- exception duckdb.TransactionException¶
-
Bases:
OperationalError
- exception duckdb.Warning¶
-
Bases:
Exception
- duckdb.aggregate(df: pandas.DataFrame, aggr_expr: str, group_expr: str = '', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Compute the aggregate aggr_expr by the optional groups group_expr on DataFrame df
- duckdb.alias(df: pandas.DataFrame, alias: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Create a relation from DataFrame df with the passed alias
- duckdb.append(table_name: str, df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Append the passed DataFrame to the named table
- duckdb.arrow(*args, **kwargs)¶
-
Overloaded function.
arrow(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) -> pyarrow.lib.Table
Fetch a result as Arrow table following execute()
arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from an Arrow object
- duckdb.begin(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Start a new transaction
- duckdb.close(connection: duckdb.DuckDBPyConnection = None) None ¶
-
Close the connection
- duckdb.commit(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Commit changes performed within a transaction
- duckdb.connect(database: str = ':memory:', read_only: bool = False, config: dict = None) duckdb.DuckDBPyConnection ¶
-
Create a DuckDB database instance. Can take a database file name to read/write persistent data and a read_only flag if no changes are desired
- duckdb.cursor(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Create a duplicate of the current connection
- duckdb.description(connection: duckdb.DuckDBPyConnection = None) Optional[list] ¶
-
Get result set attributes, mainly column names
- duckdb.df(*args, **kwargs)¶
-
Overloaded function.
df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) -> pandas.DataFrame
Fetch a result as DataFrame following execute()
df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the DataFrame df
- duckdb.distinct(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Compute the distinct rows from DataFrame df
- duckdb.duplicate(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Create a duplicate of the current connection
- duckdb.execute(query: str, parameters: object = None, multiple_parameter_sets: bool = False, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Execute the given SQL query, optionally using prepared statements with parameters set
- duckdb.executemany(query: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Execute the given prepared statement multiple times using the list of parameter sets in parameters
- duckdb.fetch_arrow_table(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.Table ¶
-
Fetch a result as Arrow table following execute()
- duckdb.fetch_df(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame ¶
-
Fetch a result as DataFrame following execute()
- duckdb.fetch_df_chunk(vectors_per_chunk: int = 1, *, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame ¶
-
Fetch a chunk of the result as DataFrame following execute()
- duckdb.fetch_record_batch(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) pyarrow.lib.RecordBatchReader ¶
-
Fetch an Arrow RecordBatchReader following execute()
- duckdb.fetchall(connection: duckdb.DuckDBPyConnection = None) list ¶
-
Fetch all rows from a result following execute
- duckdb.fetchdf(*, date_as_object: bool = False, connection: duckdb.DuckDBPyConnection = None) pandas.DataFrame ¶
-
Fetch a result as DataFrame following execute()
- duckdb.fetchmany(size: int = 1, connection: duckdb.DuckDBPyConnection = None) list ¶
-
Fetch the next set of rows from a result following execute
- duckdb.fetchnumpy(connection: duckdb.DuckDBPyConnection = None) dict ¶
-
Fetch a result as a dict of NumPy arrays following execute
- duckdb.fetchone(connection: duckdb.DuckDBPyConnection = None) Optional[tuple] ¶
-
Fetch a single row from a result following execute
- duckdb.filter(df: pandas.DataFrame, filter_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Filter the DataFrame df by the filter in filter_expr
- duckdb.from_arrow(*args, **kwargs)¶
-
Overloaded function.
from_arrow(arrow_object: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from an Arrow object
- duckdb.from_csv_auto(name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the CSV file in ‘name’
- duckdb.from_df(*args, **kwargs)¶
-
Overloaded function.
from_df(df: pandas.DataFrame = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the DataFrame in df
from_df(df: pandas.DataFrame, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the DataFrame df
- duckdb.from_parquet(*args, **kwargs)¶
-
Overloaded function.
from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
from_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
from_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Creates a relation object from the Parquet files in file_glob
from_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Creates a relation object from the Parquet files in file_globs
- duckdb.from_query(*args, **kwargs)¶
-
Overloaded function.
from_query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the given SQL query
- duckdb.from_substrait(*args, **kwargs)¶
-
Overloaded function.
from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a query object from protobuf plan
from_substrait(proto: bytes, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Creates a query object from the substrait plan
- duckdb.from_substrait_json(json: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Create a query object from a JSON protobuf plan
- duckdb.get_substrait(*args, **kwargs)¶
-
Overloaded function.
get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation
Serialize a query to protobuf
get_substrait(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation
Serialize a query object to protobuf
- duckdb.get_substrait_json(*args, **kwargs)¶
-
Overloaded function.
get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation
Serialize a query to protobuf in JSON format
get_substrait_json(query: str, connection: duckdb.DuckDBPyConnection = None, *, enable_optimizer: bool = True) -> duckdb.DuckDBPyRelation
Serialize a query object to protobuf
- duckdb.get_table_names(query: str, connection: duckdb.DuckDBPyConnection = None) Set[str] ¶
-
Extract the required table names from a query
- duckdb.install_extension(extension: str, *, force_install: bool = False, connection: duckdb.DuckDBPyConnection = None) None ¶
-
Install an extension by name
- duckdb.limit(df: pandas.DataFrame, n: int, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Retrieve the first n rows from the DataFrame df
- duckdb.list_filesystems(connection: duckdb.DuckDBPyConnection = None) list ¶
-
List registered filesystems, including builtin ones
- duckdb.load_extension(extension: str, connection: duckdb.DuckDBPyConnection = None) None ¶
-
Load an installed extension
- duckdb.order(df: pandas.DataFrame, order_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Reorder the DataFrame df by order_expr
- duckdb.pl(chunk_size: int = 1000000, connection: duckdb.DuckDBPyConnection = None) duckdb::PolarsDataFrame ¶
-
Fetch a result as Polars DataFrame following execute()
- duckdb.project(df: pandas.DataFrame, project_expr: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Project the DataFrame df by the projection in project_expr
- duckdb.query(*args, **kwargs)¶
-
Overloaded function.
query(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- duckdb.query_df(df: pandas.DataFrame, virtual_table_name: str, sql_query: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Run the given SQL query in sql_query on the view named virtual_table_name that contains the content of DataFrame df
- duckdb.read_csv(name: str, connection: duckdb.DuckDBPyConnection = None, header: object = None, compression: object = None, sep: object = None, delimiter: object = None, dtype: object = None, na_values: object = None, skiprows: object = None, quotechar: object = None, escapechar: object = None, encoding: object = None, parallel: object = None, date_format: object = None, timestamp_format: object = None, sample_size: object = None, all_varchar: object = None, normalize_names: object = None, filename: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the CSV file in ‘name’
- duckdb.read_json(name: str, connection: duckdb.DuckDBPyConnection = None, columns: object = None, sample_size: object = None, maximum_depth: object = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the JSON file in ‘name’
- duckdb.read_parquet(*args, **kwargs)¶
-
Overloaded function.
read_parquet(file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
read_parquet(file_globs: List[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
- duckdb.register(view_name: str, python_object: object, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Register the passed Python object so that it can be queried as a view
- duckdb.register_filesystem(filesystem: fsspec.AbstractFileSystem, connection: duckdb.DuckDBPyConnection = None) None ¶
-
Register a fsspec compliant filesystem
- duckdb.rollback(connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Roll back changes performed within a transaction
- duckdb.sql(query: str, alias: str = 'query_relation', connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
- duckdb.table(table_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object for the named table
- duckdb.table_function(name: str, parameters: object = None, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object from the named table function with the given parameters
- class duckdb.token_type¶
-
Bases:
pybind11_object
Members:
identifier
numeric_const
string_const
operator
keyword
comment
- comment = <token_type.comment: 5>¶
- identifier = <token_type.identifier: 0>¶
- keyword = <token_type.keyword: 4>¶
- property name¶
- numeric_const = <token_type.numeric_const: 1>¶
- operator = <token_type.operator: 3>¶
- string_const = <token_type.string_const: 2>¶
- property value¶
- duckdb.tokenize(query: str) list ¶
-
Tokenizes a SQL string, returning a list of (position, type) tuples that can be used for e.g. syntax highlighting
- duckdb.unregister(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyConnection ¶
-
Unregister the view with the given name
- duckdb.unregister_filesystem(name: str, connection: duckdb.DuckDBPyConnection = None) None ¶
-
Unregister a filesystem
- duckdb.values(*args, **kwargs)¶
-
Overloaded function.
values(values: object, connection: duckdb.DuckDBPyConnection = None) -> duckdb.DuckDBPyRelation
Create a relation object from the passed values
- duckdb.view(view_name: str, connection: duckdb.DuckDBPyConnection = None) duckdb.DuckDBPyRelation ¶
-
Create a relation object for the named view
- duckdb.write_csv(df: pandas.DataFrame, file_name: str, connection: duckdb.DuckDBPyConnection = None) None ¶
-
Write the DataFrame df to a CSV file in file_name