§ The asset register
Ten live data sources feed the Cadence pipeline: one music corpus, two more music charts, a screen layer, three attention/discourse layers, a macro layer, a markets layer and a weather control. This book is the cold inventory: what each source is, how we get it, its status, and where it reaches; then the per-source taxonomy, the insights each drives, the new sources worth adding, and a hard look at how to monetise the whole. Every figure here is read from the live tables, not estimated.
The sources span eight signal families. Spotify comes via NPILABS Athena; the rest via the KARLA harness, SearchApi and public APIs. The Trade family is now live via in-house auction data (Barnebys / Salle).
Across the SQLite databases — plus the 8-year, 150M-song Spotify corpus in Athena. Grew this quarter with search-intent (8k), GDELT themes (38k) and the brand panel.
Seven years of daily/weekly/monthly non-music signal for France; 8+ years of worldwide charts in Spotify.
Since the brand expansion, the gaps have closed and three new layers are live:
1. Search-intent: solved. Google Trends is now collected reliably via SearchApi (paid, keyed): all 36 brands, 8,028 weekly rows. The datacenter-IP throttle is gone (the production-runner path we flagged).
2. GDELT: deepened. Brand and national tone are complete (23 series, 40k rows), plus a new thematic-weather layer: GDELT GKG theme coverage-volume (15/19 themes, 38k daily rows: inflation salience, safety, jobs, strikes…), so we now read not just how positive the news is but what it's about.
3. Genre: added, commercial-safe. Netflix titles are tagged with genre + origin via Wikidata (CC0), covering ~89% of viewing-weeks.

Brand activity via the TikTok Ads Library and auction signals via our in-house Barnebys/Salle data round out the stack.
The complete live stack. Status reflects the data as it sits today: green = current and deep, amber = working but shallow or stale, with the reason named. Coverage is the honest count — most non-music signal is France-only today (the France PoC scope), and that is the single biggest expansion lever.
| Source | Signal family | How we get it | Cadence | Depth | Rows | Geographic coverage | Status |
|---|---|---|---|---|---|---|---|
| Spotify: Top 50 + Viral 50 + audio features | Mood (music) | NPILABS Athena france_poc_v1.combined (licensed) | Daily | 8+ yrs | 150M songs | Worldwide (70+ markets in corpus; France in active analysis) | Live · features frozen Nov 2024 |
| Apple Music: Top 100 songs | Mood (music) | Apple RSS endpoint | Daily | From Apr 2026 | 3,400 | 34 markets (EU-20 + Americas + APAC + MEA) | Live · snapshot-shallow |
| Amazon Music: Retail digital bestsellers | Commerce (music) | Amazon storefront scrape | Daily | From Apr 2026 | 120 | 5 markets (US, GB, DE, FR, JP) | Beta · artist enrich pending |
| Netflix Top 10: Films & TV | Narrative (screen) | Netflix Tudum public TSV | Weekly | 2021–26 | 45,000 | 9 markets (BR, DE, ES, FR, GB, IT, JP, KR, US) | Live |
| Google Trends: Brand search index | Intent | SearchApi (paid, keyed) | Weekly | 2021–25 | 8,028 | France (36 brands, anchored) | Live · throttle solved via SearchApi |
| Wikipedia: Pageviews per article | Attention | Wikimedia REST API | Daily | 2019–25 | 67,017 | France (fr.wikipedia, 36 brands + culture) | Live · brand set collected 13 Jun |
| GDELT: News tone + GKG themes | Discourse | GDELT 2.0 Doc API (tone + timelinevol) | Daily | 2019–25 | 79,000+ | France (tone: 23 series; themes: 15) | Live · tone complete + thematic weather |
| TikTok Ads Library: Brand ad activity | Brand activity | SearchApi (TikTok Ads) | On-demand | rolling | n/a | France (validated; per-brand) | Validated |
| Auctions: Hammer prices / lots | Trade | In-house (Barnebys / Salle) | Ongoing | multi-year | large | Global houses (Christie's, Sotheby's, Phillips…) | In-house · luxury tier |
| Eurostat: CCI + sub-indices, retail, jobs, inflation | Macro | Eurostat dissemination API | Monthly | 2019–26 | 516 | France (EU-wide capable) | Live |
| Yahoo Finance: CAC 40, EUR/USD | Trade (markets) | yfinance library | Daily | 2019–25 | 3,613 | France / EU | Live |
| Open-Meteo: Temp, rain, sun, wind | Control | Open-Meteo archive API | Daily | 2019–25 | 15,342 | France (Paris proxy) | Live · control only |
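
To make one row of the register concrete, here is a minimal sketch of how the Wikipedia layer could be fetched. It assumes the public Wikimedia per-article pageviews route and its `items` / `timestamp` / `views` response fields; the function names, the `all-access` and `user` filters, and the example article are illustrative choices, not the actual kwl_* fetcher.

```python
import json
from datetime import date
from urllib.parse import quote
from urllib.request import Request, urlopen

# Public Wikimedia REST route for per-article pageviews.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="fr.wikipedia"):
    """Build the daily pageviews URL for one article (hypothetical helper)."""
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{BASE}/{project}/all-access/user/{title}/daily/"
            f"{start:%Y%m%d}/{end:%Y%m%d}")


def parse_pageviews(payload):
    """Turn the JSON payload into (ISO date, views) pairs."""
    out = []
    for item in payload.get("items", []):
        ts = item["timestamp"]  # formatted as YYYYMMDDHH
        out.append((f"{ts[:4]}-{ts[4:6]}-{ts[6:8]}", int(item["views"])))
    return out


def fetch_pageviews(article, start, end, user_agent):
    """Network call; Wikimedia asks clients to send a descriptive User-Agent."""
    req = Request(pageviews_url(article, start, end),
                  headers={"User-Agent": user_agent})
    with urlopen(req, timeout=30) as resp:
        return parse_pageviews(json.load(resp))
```

Keeping URL construction and parsing separate from the network call means both can be checked offline, which matters for a fetcher that runs unattended every day.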
The sources aren't a random scrape pile — they map to eight ways a market reveals itself. The product's whole claim is triangulation: any one family is noise, but where families agree (or one diverges) is the read. This is the frame the catalogue is organised around.
Mood. Spotify (anchor) · Apple Music. Audio features (tempo, key, valence) are the validated mood proxy. The hardest signal to fake and the one no incumbent owns.
Narrative. Netflix Top 10, now genre- and origin-tagged via Wikidata (CC0). Films vs TV, comfort vs prestige, local vs international: what stories a market is choosing.
Attention. Wikipedia pageviews. Where curiosity concentrates (artists, brands, events) in near-real time.
Intent. Google Trends (now reliable via SearchApi) · Amazon Music. Brand search-intent and purchase-side commerce: the closest layer to demand.
Discourse. GDELT tone + thematic weather (GKG themes: what coverage is about, and how it's shifting) · Google News headlines. Separates felt mood from reported mood.
Brand activity. TikTok Ads Library. The supply side: who's advertising, how hard, with what creative, paired against demand-side attention and search.
Trade. Now live: auction signals via our in-house Barnebys / Salle data (Christie's, Sotheby's, Phillips) for the elite tier, plus CAC 40 / EUR-USD context. The hero's "trade" verb, kept.
Macro and control. Eurostat · OECD (ground truth + commercial weather) · Open-Meteo (confound control, never published). What anchors and validates everything else.
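
The triangulation idea (one family alone is noise; agreement or divergence across families is the read) can be sketched roughly as below. This is an illustration under stated assumptions, not the product's scoring method: the z-score against prior history, the threshold of 1.0, the rule names and the toy series are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_latest(series):
    """Z-score of the last observation against the earlier history."""
    history, latest = series[:-1], series[-1]
    sd = pstdev(history)
    return 0.0 if sd == 0 else (latest - mean(history)) / sd


def triangulate(families, threshold=1.0):
    """Classify the latest period across signal families.

    families maps a family name to its time series (oldest first).
    A single family moving alone is treated as noise ("no read").
    """
    z = {name: zscore_latest(vals) for name, vals in families.items()}
    up = [n for n, v in z.items() if v >= threshold]
    down = [n for n, v in z.items() if v <= -threshold]
    if up and down:
        read = "divergence"   # families pull in opposite directions
    elif len(up) >= 2 or len(down) >= 2:
        read = "agreement"    # two or more families move the same way
    else:
        read = "no read"
    return {"z": z, "up": up, "down": down, "read": read}


# Toy numbers, for illustration only.
example = triangulate({
    "mood": [0, 1, 0, 1, 0, 1, 5],
    "discourse": [2, 3, 2, 3, 2, 3, 8],
    "intent": [5, 5, 6, 5, 6, 5, 5],
})
```

In the toy input, mood and discourse both spike while intent stays inside its usual range, so the sketch reports agreement between two families rather than a signal from any one of them.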
The brand promise is "We read markets by what they play, watch, read, trade, buy and search" — six verbs. The live stack now delivers all six: play (Spotify/Apple), watch (Netflix + Wikidata genre), read (Wikipedia / GDELT tone & themes / Google News), trade (auction signals via Barnebys/Salle + markets context), buy (Amazon), search (Google Trends via SearchApi). The remaining build-out on Trade is promoting markets from control to signal (sector rotation + brand equity for listed names). See the catalogue and roadmap.
Nine of the ten sources run through the KARLA scraping harness as discrete fetchers
(kwl_*.py), each writing to a long-format, country-partitioned, source-attributed SQLite
table. Spotify is the exception — licensed via NPILABS and queried from AWS Athena. The contract that holds them
together: every row carries a source and fetched_at, so every figure in a report can name
its origin and date. That discipline is the receipts product.
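
A minimal sketch of that contract, assuming a single long-format table: the table name `signal`, the column names other than `source` and `fetched_at`, the use of a `country` column for partitioning, and the sample value are all illustrative, not the harness's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical long-format layout: one row per (country, source, series, date).
SCHEMA = """
CREATE TABLE IF NOT EXISTS signal (
    country    TEXT NOT NULL,
    source     TEXT NOT NULL,
    series     TEXT NOT NULL,
    obs_date   TEXT NOT NULL,
    value      REAL,
    fetched_at TEXT NOT NULL,
    PRIMARY KEY (country, source, series, obs_date)
)
"""


def write_rows(conn, source, country, rows):
    """Insert (series, obs_date, value) rows, stamping source and fetched_at."""
    fetched_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.executemany(
        # OR REPLACE makes re-runs idempotent; a re-fetch refreshes fetched_at.
        "INSERT OR REPLACE INTO signal VALUES (?, ?, ?, ?, ?, ?)",
        [(country, source, series, obs_date, value, fetched_at)
         for series, obs_date, value in rows],
    )
    conn.commit()


def receipt(conn, country, series, obs_date):
    """Return (value, source, fetched_at) so a reported figure can cite its origin."""
    return conn.execute(
        "SELECT value, source, fetched_at FROM signal "
        "WHERE country = ? AND series = ? AND obs_date = ?",
        (country, series, obs_date),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
write_rows(conn, "eurostat", "FR", [("cci", "2025-01-01", -12.5)])  # toy value
```

Because `source` and `fetched_at` are `NOT NULL`, a fetcher cannot write a row that a report would later be unable to attribute.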
| Mechanism | Sources | Reliability | Note |
|---|---|---|---|
| Official API | Eurostat, Wikipedia, GDELT, Open-Meteo, Yahoo Finance | High | Stable contracts, no parser rot — the backbone |
| RSS / public feed | Apple Music, Netflix Tudum | High | Decade-stable endpoints; Apple serves current-only (no backfill) |
| Library wrapper | Google Trends (pytrends) | Medium | Rate-limited; exponential backoff in place |
| HTML scrape | Amazon Music | Medium | Brittle to layout change; artist enrichment pending |
| Licensed warehouse | Spotify (NPILABS Athena) | High | The deep asset; audio-features endpoint frozen by Spotify Nov 2024 |
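
The table notes that exponential backoff is in place for the rate-limited path. As a hedged sketch of what such a wrapper typically looks like (the `RateLimited` exception, the function name and the default delays are assumptions, not the harness's code):

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a fetcher raises when the upstream throttles it."""


def with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep):
    """Call fetch(), retrying on RateLimited with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            # Jitter of up to half the delay avoids synchronised retries.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; only the throttle error is retried, so parser or schema failures still fail fast.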
1. France-centric. Seven of ten sources are France-only today. The schemas are country-partitioned from day one, so widening is config not rebuild — but the multi-market story is a roadmap item, not a current fact.
2. The pipeline must run. The deep asset is accumulated history — charts are ephemeral and cannot be backfilled once missed. Every day the fetchers run, the moat deepens; the cron is the most strategic line of code in the company. (See Monetisation for why this is the real asset.)
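
Since missed chart days cannot be backfilled, a simple gap check over the dates already collected is one way to notice a silent cron failure early. The sketch below is an illustration only; the function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_days(fetched, start, end):
    """Return every calendar day in [start, end] with no snapshot in `fetched`."""
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in fetched]


# Toy example: the fetcher ran on the 1st, 2nd and 4th but not the 3rd.
have = {date(2026, 4, 1), date(2026, 4, 2), date(2026, 4, 4)}
gaps = missing_days(have, date(2026, 4, 1), date(2026, 4, 4))
```

For sources that serve current-only data, a non-empty result is a permanent hole in the history, so a check like this is best run daily alongside the fetchers rather than after the fact.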