Skip to main content
Available on request — The Autumn Lakehouse is provisioned per customer. Contact us at hey@useautumn.com to get access.
Throughout this page, replace <catalog> and <namespace> with the names Autumn assigned you.

Find your catalog and namespace

Autumn sends you both names when your Lakehouse is provisioned (see Connecting). If you need to rediscover them in ClickHouse Cloud, the catalog mounts as a database — list databases to find it:
SHOW DATABASES;
The catalog appears as one of the listed databases. Your tables then live under `<catalog>`.`<namespace>.v2_3_<object>` (the <namespace>.v2_3_<object> part is a single literal table name — see below).

Identifier syntax

The Glue catalog mounts as a database. The whole <namespace>.v2_3_<object> is the table name — the dot is literal, so backtick both parts:
SELECT * FROM `<catalog>`.`<namespace>.v2_3_features` LIMIT 10;
A common mistake is `<catalog>.<namespace>`.`v2_3_features` — that won’t resolve, because <namespace>.v2_3_features is a single identifier, not database.table. The single-quoted '<namespace>.v2_3_features' form does not resolve in ClickHouse Cloud either — use backticks on both parts.

Timestamps

All number (epoch ms) columns (created_at, started_at, expires_at, current_period_*, *_resets_at, …) are epoch-milliseconds. Convert before use. The only native timestamp is events.timestamp (the unversioned events table — see Schema → Events).
SELECT
  customer_id,
  toDateTime(toInt64(created_at) / 1000) AS created
FROM `<catalog>`.`<namespace>.v2_3_customers`;
fromUnixTimestamp64Milli(toInt64(created_at)) also works and preserves millisecond precision. events.timestamp is already a DateTime — use it directly.

JSON columns

metadata, config, processors, properties, deductions, display, and the discounts arrays are stored as JSON strings.
SELECT JSONExtractString(properties, 'subtype') AS subtype
FROM `<catalog>`.`<namespace>.events`;

Nullable columns: don’t use IS NULL

On the Iceberg tables, WHERE col IS NULL raises NOT_FOUND_COLUMN_IN_BLOCK — the engine reaches for an unmaterialized col.null subcolumn that doesn’t exist. This bites hardest on entity_id (the pooled-vs-entity discriminator). Use coalesce(col, '') = '' for “is null” and coalesce(col, '') != '' for “is not null”.
-- ✗ throws NOT_FOUND_COLUMN_IN_BLOCK
WHERE entity_id IS NULL
-- ✓ pooled (customer-level) rows
WHERE coalesce(entity_id, '') = ''

Examples

Fetch one customer

SELECT internal_id, customer_id, name, email
FROM `<catalog>`.`<namespace>.v2_3_customers`
WHERE customer_id = 'cus_123';

Active subscriptions joined to their plan

Join on the internal ids, not the external ones (see below).
SELECT
  s.customer_id,
  p.name AS plan_name,
  p.price_amount,
  toDateTime(toInt64(s.current_period_end) / 1000) AS renews_at
FROM `<catalog>`.`<namespace>.v2_3_subscriptions` AS s
INNER JOIN `<catalog>`.`<namespace>.v2_3_plans` AS p
  ON p.internal_id = s.internal_product_id
WHERE s.status = 'active';

A customer’s balances for a feature

Use the pooled (customer-level) rows where coalesce(entity_id, '') = ''.
This returns the raw aggregated row, which sums over every entitlement — including superseded versions and past cycles — so granted/remaining/usage will not match the API for customers with history. To reproduce the API/dashboard figure, apply the active + current-cycle filter from Working with balances. For overage specifically, use the reference query below.
SELECT feature_id, granted, remaining, usage
FROM `<catalog>`.`<namespace>.v2_3_balances`
WHERE customer_id = 'cus_123'
  AND feature_id = 'AI_CREDITS'
  AND coalesce(entity_id, '') = '';

Current-period overage

Overage is derived, not stored, and there are two figures — pick deliberately (see Working with balances → Overage):
  • Displayed (max(0, Σusage − Σgranted) — sum first, floor once): what the balance header / dashboard shows. This query computes this one, so it reproduces the dashboard to within sync-lag drift.
  • Billable (Σ max(0, usage − granted) — floor per row, then sum): what Autumn invoices (cusEntToInvoiceOverage). To get it, swap the overage expression as noted in the query.
Both reconstruct from v2_3_breakdowns over the same row set: rows whose product is an active subscription (plus always-live one_off rows), restricted to each entitlement’s current cycle (earliest upcoming reset).
Swap in your <catalog> / <namespace> and the feature id. Known simplifications: it omits rollover / unused terms (zero for most customers, non-zero in general) and is pooled-only (coalesce(entity_id,'') = ''), so per-entity overage under config.disable_pooled_balance is not counted. For those customers, sum the per-entity rows instead.

Invoice totals by status

SELECT status, count() AS invoices, sum(total) AS total
FROM `<catalog>`.`<namespace>.v2_3_invoices`
GROUP BY status
ORDER BY total DESC;
To break down by plan, expand the plan_ids array with ARRAY JOIN:
SELECT plan_id, count() AS invoices
FROM `<catalog>`.`<namespace>.v2_3_invoices`
ARRAY JOIN plan_ids AS plan_id
GROUP BY plan_id;

Event volume per day by subtype

SELECT
  toStartOfDay(timestamp) AS day,
  JSONExtractString(properties, 'subtype') AS subtype,
  count() AS events
FROM `<catalog>`.`<namespace>.events`
WHERE timestamp > now() - INTERVAL 30 DAY
GROUP BY day, subtype
ORDER BY day;

Reconstruct a full customer (API shape)

This rebuilds the entire GET /customers/:id response — scalars, subscriptions, purchases, balances (with per-plan breakdowns), flags, and invoices — from the warehouse in a single query, returned as one JSON object. It’s the most useful query if you want the API’s customer view without calling the API. ClickHouse-specific (uses groupArray, Tuple/Map casts, and the JSON type — ClickHouse 24.8+). Customer-level (pooled) balances and flags use entity_id = ''; per-entity rows are excluded from the customer envelope. FORMAT PrettyJSONEachRow emits one pretty-printed JSON object.
This mirrors the shape of the API response, but the balances / breakdown values are the raw aggregated rows — they sum over superseded versions and past cycles, so they will not match GET /customers/:id for customers with billing history. To match the API, restrict the balances/breakdowns subqueries to active, current-cycle rows per Working with balances → The active + current-cycle filter.

Cross-database joins

You can join your Lakehouse tables against your own data living elsewhere in the same engine.
Qualify each side fully — the Iceberg catalog table and your own ClickHouse table:
SELECT c.customer_id, c.email, u.signup_source
FROM `<catalog>`.`<namespace>.v2_3_customers` AS c
INNER JOIN my_db.users AS u
  ON u.autumn_customer_id = c.customer_id;
The catalog connection is best for ad-hoc and bounded queries. For very large scans, filter early (by env, time range, or id) or materialize a subset into a native table first.

Important notes

Use internal ids, not external ids

This is the single most important rule for reliable queries.
  • Each plan version is its own row in v2_3_plans, with its own internal_id. The only identifier shared across versions is the external plan_id.
  • External ids (plan_id, customer_id, feature_id, entity_id, …) are mutable — you can rename them in Autumn at any time — and a single external id can map to multiple versioned rows.
So:
  • Join and filter on internal_id and the internal_* foreign keys (internal_customer_id, internal_feature_id, internal_product_id, internal_entity_id). These are immutable and globally unique.
  • Treat external ids as display-only — great for human-readable output, unreliable as join or lookup keys.
Filtering by an external plan_id can silently match multiple plan versions (and breaks entirely if the id was renamed). Reach for internal_id whenever you need a stable, exact reference.

Pooled vs per-entity rows

v2_3_balances, v2_3_breakdowns, and v2_3_flags contain two kinds of row:
  • Pooled (customer-level) — entity_id is null.
  • Per-entityentity_id is set.
For customer-level totals, keep the pooled rows with coalesce(entity_id, '') = '' to avoid double counting. To analyze a specific entity, filter on its entity_id (or internal_entity_id).
Write coalesce(entity_id, '') = '', not entity_id IS NULLIS NULL throws on these Iceberg columns (see Nullable columns).
Customers with config.disable_pooled_balance track per-entity rather than pooled — for them, sum the per-entity rows instead of reading the pooled row. See Working with balances for what the aggregated values mean and the active + current-cycle filter you need before trusting them.

Freshness

State tables sync within ~5 minutes under normal load; events have a variable lead time and backfill on first connection. See Overview → Data freshness.