DuckDB ASOF Join
Version dev
Version:

Problem: we have a time-based price table; Traditional joins against this table get NULL results if there is a time which does not exactly match.

Solution: “ASOF JOIN” picks a good value for “in the gap” values.

First, we create a price table and sales table.

CREATE TABLE prices AS (
    SELECT '2001-01-01 00:16:00'::TIMESTAMP + INTERVAL (v) MINUTE AS ticker_time,
        v AS unit_price
    FROM range(0,5) vals(v)
);

create table sales(item text, sale_time timestamp, quantity int);
insert into sales values('a', '2001-01-01 00:18:00', 10);
insert into sales values('b', '2001-01-01 00:18:30', 20);
insert into sales values('c', '2001-01-01 00:19:00', 30);

We can see that we have a unit_price defined for each hour, but not for half hours.

SELECT * FROM prices;
┌─────────────────────┬────────────┐
│     ticker_time     │ unit_price │
│      timestamp      │   int64    │
├─────────────────────┼────────────┤
│ 2001-01-01 00:16:00 │          0 │
│ 2001-01-01 00:17:00 │          1 │
│ 2001-01-01 00:18:00 │          2 │ No unit_price for 18:30!
│ 2001-01-01 00:19:00 │          3 │
│ 2001-01-01 00:20:00 │          4 │
└─────────────────────┴────────────┘
SELECT * FROM sales;
┌─────────┬─────────────────────┬──────────┐
│  item   │      sale_time      │ quantity │
│ varchar │      timestamp      │  int32   │
├─────────┼─────────────────────┼──────────┤
│ a       │ 2001-01-01 00:18:00 │       10 │
│ b       │ 2001-01-01 00:18:30 │       20 │ A sale time of 18:30!
│ c       │ 2001-01-01 00:19:00 │       30 │
└─────────┴─────────────────────┴──────────┘

With a normal LEFT JOIN, there is a problem for the 18:30 sale. Since there is not a sale_time of 18:30, a join against that time will be NULL.

-- no price value for 18:30, so item b's unit_price and total are NULL!

SELECT s.*, p.unit_price, s.quantity * p.unit_price AS total
 FROM sales s LEFT JOIN prices p
   ON s.sale_time = p.ticker_time;
┌─────────┬─────────────────────┬──────────┬────────────┬───────┐
│  item   │      sale_time      │ quantity │ unit_price │ total │
│ varchar │      timestamp      │  int32   │   int64    │ int64 │
├─────────┼─────────────────────┼──────────┼────────────┼───────┤
│ a       │ 2001-01-01 00:18:00 │       10 │          2 │    20 │
│ c       │ 2001-01-01 00:19:00 │       30 │          3 │    90 │
│ b       │ 2001-01-01 00:18:30 │       20 │       NULL │  NULL │  NULL result!
└─────────┴─────────────────────┴──────────┴────────────┴───────┘

The ASOF JOIN picks a good price for the 18:30 sale. the ON s.sale_time >= pp.ticker_time will cause the nearest lower value (in this case, for 18:00) to be used.

-- using ASOF, 18:30 "rounds down" to use the 18:00 unit_price

SELECT s.*, p.unit_price, s.quantity * p.unit_price AS total_cost
  FROM sales s ASOF LEFT JOIN prices p
    ON s.sale_time >= p.ticker_time;
┌─────────┬─────────────────────┬──────────┬────────────┬────────────┐
│  item   │      sale_time      │ quantity │ unit_price │ total_cost │
│ varchar │      timestamp      │  int32   │   int64    │   int64    │
├─────────┼─────────────────────┼──────────┼────────────┼────────────┤
│ a       │ 2001-01-01 00:18:00 │       10 │          2 │         20 │
│ b       │ 2001-01-01 00:18:30 │       20 │          2 │         40 │ Good result!
│ c       │ 2001-01-01 00:19:00 │       30 │          3 │         90 │
└─────────┴─────────────────────┴──────────┴────────────┴────────────┘
Search Shortcut cmd + k | ctrl + k