KWL · Data Book Vol. I · Jun 2026 · Internal

§ The expansion roadmap

What to add, and why.

Every candidate below is public-domain or scrapable — no licence cost. They are ranked by (value to the read) × (ease of access), each with a status, a priority, and the reason it earns a place. The rule that governs the whole list: a source only ships if it can be named and dated in the report's receipts table. Four strategic threads run through it — close the France-only gap, harden the mood anchor, complete the trade family (stocks + auctions), and open the unclaimed city-level altitude.
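As a rough illustration of the ranking rule, the sketch below scores each candidate as value times ease and sorts the list. The 1-5 scales, the `Candidate` class, and the example scores are assumptions made for illustration only; the Data Book does not publish the underlying numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    value: int  # value to the read, on an assumed 1-5 scale
    ease: int   # ease of access, on an assumed 1-5 scale

    @property
    def score(self) -> int:
        # The roadmap's ranking rule: (value to the read) x (ease of access).
        return self.value * self.ease


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates sorted by score, highest first (ties keep input order)."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Placeholder scores, not the team's actual assessments.
examples = [
    Candidate("Lyst Index", value=2, ease=1),
    Candidate("MusicBrainz", value=4, ease=5),
    Candidate("Steam / SteamDB", value=3, ease=4),
]
ranked = rank(examples)
```

A multiplicative score means a source that is very hard to access sinks regardless of its value, which is one way to read the tiering that follows.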

① Close the geography gap

Seven of ten live sources are France-only. OECD (macro), per-language Wikipedia, and per-market Trends/GDELT queries make multi-market coverage real; this is the biggest single lever on enterprise value.

② Harden the mood anchor

Spotify froze its audio features in Nov 2024. Spotify Charts CSV, YouTube Charts, Deezer, and in-house Essentia features together remove the single point of failure.

③ Complete the "trade" family

The hero promises six verbs; the data keeps five. Promote stocks to a signal (sector rotation + brand equity) and integrate the ART/Salle auction data — the wealth tier. See the Data Book catalogue, sources 11 & 12.

④ Claim the city altitude

Shazam cities + Ticketmaster venues + radio playlists let Cadence publish sub-national reads. "Lyon vs Paris" is a sentence no incumbent can write.

A · Build next: high value, low friction

High value, low friction, mostly official APIs or RSS that reuse code we already have. Two of these — MusicBrainz and TMDB — are not new signal but enrichment that unblocks insights already half-built (origin classification, Netflix genre/origin).

Source | Family | What it adds | How | Priority | Status | Why now
Spotify Charts CSV | Mood | Official daily/weekly Top 200 + Viral 50, 70+ markets | CSV download | P1 | Not started | Removes single-point dependency on Athena; own the anchor directly
MusicBrainz | Enrichment | Artist origin/area | Free API | P1 | Not started | Unblocks origin classifier v3; fixes the weakest receipt in the France report, no IAM dependency
TMDB | Enrichment | Genre + origin for Netflix titles | Free key | P1 | Pending in fetcher | Turns Netflix from films/TV into local-vs-international: the cross-medium corroboration the thesis needs
OECD data API | Macro | Consumer confidence for non-EU markets (JP, KR, US…) | Free SDMX | P1 | Not started | Extends the Eurostat ground-truth pattern worldwide: the macro half of multi-market
Shazam charts | Discovery | Top 200 per country and per city | Public JSON | P1 | Not started | The only city-level music demand signal: the unclaimed sub-national altitude
Deezer API | Mood | Public chart API, strong French depth | Free REST | P2 | Not started | Anchor redundancy + a France-flagship corroborator
YouTube Charts | Mood | Weekly top songs/artists; strong in EM markets | Scrape (JSON) | P2 | Not started | Covers markets where Spotify is weak (India, Brazil depth)
Apple Podcasts | Narrative | Top podcasts per country/genre: what markets think about | RSS (reuse) | P2 | Not started | Reuses the Apple fetcher pattern; a new "discourse" angle near-free
Apple App Store charts | Intent | Top apps per country: finance/dating/game surges | RSS (reuse) | P2 | Not started | App-category surges are behavioural mood data; same fetcher family
Ticketmaster Discovery | Live culture | Events, on-sales, price ranges per market | Free key | P2 | Not started | Revealed discretionary-spend appetite + the venue half of city-level
TikTok Creative Center | Virality | Trending sounds/hashtags: where culture starts | Scrape (JSON) | P2 | Not started | The youth-culture leading edge incumbents charge a fortune for
Spotify Podcast Charts | Narrative | Top/trending podcasts per market | Public JSON | P3 | Not started | Pairs with Apple Podcasts for a full attention-of-listening layer
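
To make the MusicBrainz enrichment row concrete, here is a minimal sketch of how an artist record might be turned into a local-vs-international label for one market. It assumes the record is an already-fetched artist object with a top-level `country` field holding an ISO code; that field name, and the function names `artist_country` and `classify_origin`, are assumptions to verify against the live API response, and this is not the origin classifier v3 itself.

```python
from typing import Any, Optional


def artist_country(record: dict[str, Any]) -> Optional[str]:
    """Pull an ISO country code from an artist record, if one is present.

    Assumes a top-level "country" field; check the real response shape
    before relying on this.
    """
    code = record.get("country")
    return code.upper() if isinstance(code, str) and code else None


def classify_origin(record: dict[str, Any], market: str) -> str:
    """Label an artist "local", "international", or "unknown" for a market."""
    code = artist_country(record)
    if code is None:
        # No country on the record: do not guess, so the gap stays visible.
        return "unknown"
    return "local" if code == market.upper() else "international"
```

Returning "unknown" rather than defaulting to one side keeps the gap visible, which matters if the label is meant to carry a receipt.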

B · Per edition / market: add when a report needs it

Strong, specific additions that earn their place when a particular edition or market calls. Several open whole new signal families the current stack lacks — gaming, reading, citizen discourse.

Source | Family | What it adds | How | Priority | Why
National box office (CNC, KOFIC) | Narrative | Cinema attendance: paid attention Netflix can't give | Public CSV/pages | P2 | France's CNC is clean and weekly; the paid-screen complement
Steam / SteamDB | Gaming | Concurrent players, top sellers per region | Public JSON | P2 | Gaming is the biggest cultural blind spot in every incumbent report
Twitch | Gaming | Top categories/streamers, viewer counts | Free API | P3 | Pairs with Steam for the gaming-culture layer
Reddit (country subs) | Discourse | Citizen topic + sentiment, r/france etc. | Free OAuth | P2 | Complements GDELT's news tone with bottom-up citizen tone
Bluesky / Mastodon | Discourse | Open social streams, post-Twitter-API | Free firehose | P3 | Genuinely open; good for event attribution
Onlineradiobox | Mood (45+) | Radio airplay per station/country | Scrape | P2 | The older cohort streaming charts miss; answers the Spotify-skew critique
Eurobarometer / ECB survey | Validation | Population-mood ground truth beyond CCI | Public download | P2 | Strengthens the proxy-validation story; another receipt for the method
Football / rugby results | Mood events | National-team results: datable mood shocks | Free API | P3 | Cheap narrative receipts ("the Six Nations weekend moved the charts")
NYT Books / bestsellers | Reading | Bestseller lists; FNAC/Amazon books | API + scrape | P3 | Opens the "reading" family; reuses the Amazon harness
Google Play charts | Intent | Android app charts (EM markets) | Scrape | P3 | Pairs with Apple App Store for the full mobile picture
IMDb datasets | Enrichment | Title metadata dumps | Free TSV | P3 | Joins box office ↔ Netflix; supporting layer
YouTube Trending | Attention | Daily trending videos: memes, news, sport | Free quota | P3 | Broader than music; a general attention pulse

C · Opportunistic / heavier: partnership or real engineering

Higher effort, brittle, or licence-sensitive — pursued only when a deal or an edition justifies the cost. One of these, in-house audio features, is strategically important (it's the answer to the Spotify freeze) but is real engineering, not a fetcher.

Source | Signal | Priority | Effort / risk | Why it's here
Essentia / Musicnn (in-house features) | Mood | P1* | 3–4 wks eng | The post-Nov-2024 audio-features replacement: strategically the most important item on any list, but it's a build, not a scrape
Amazon Music streaming charts | Music | P3 | Playwright, brittle | v2 of the live purchase fetcher; SPA-gated, 3–5 days, or pursue partnership
Genius lyrics + NLP | Mood (semantic) | P3 | In-house NLP | Adds what they're singing about on top of how it sounds: a new semantic mood layer
Setlist.fm / Bandsintown | Live culture | P3 | Partner keys | Tour density per city: the live-music half of the city altitude
OpenTable / TheFork | Dining out | P3 | Fiddly scrape | Bookable-slot density as a discretionary-spend proxy; original, but awkward
Vinted / eBay trending | Resale | P3 | ToS-sensitive | Secondhand category heat = value-mood signal; legally careful
Lyst Index | Fashion | P3 | PDF-bound | Cite rather than ingest; quarterly, not a feed

The one rule

A source ships only if every figure derived from it can be named and dated in the report's methodology table. The pitch is traceability — a source that can't carry a receipt doesn't go in, however interesting. That discipline is why the list is short and why the new sources are mostly official APIs, not grey scrapes.
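
The rule lends itself to an automated gate. The sketch below is one possible shape for it, not an existing check: the `Figure` record and the helpers `missing_receipts` and `source_can_ship` are hypothetical names, and "named and dated" is interpreted here as a non-empty source name plus a retrieval date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Figure:
    label: str
    source: Optional[str]         # named source, e.g. "OECD data API"
    retrieved_on: Optional[date]  # date the figure was pulled


def missing_receipts(figures: list[Figure]) -> list[str]:
    """Return the labels of figures that cannot be both named and dated."""
    return [
        f.label
        for f in figures
        if not (f.source and f.source.strip()) or f.retrieved_on is None
    ]


def source_can_ship(figures: list[Figure]) -> bool:
    # A source ships only if every figure derived from it carries a receipt.
    return not missing_receipts(figures)
```

Listing the failing labels, rather than returning only a boolean, makes it easy to see which figure is blocking a source.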