In most of the world, the people you most want to lend to are invisible to a traditional bureau. Thin file, no file, first-time borrowers — the engine of digital and emerging-market lending. A bureau score either doesn’t exist or says “insufficient history,” and a naïve lender reads that as “decline.” The opportunity is scoring those customers well with the data you actually have. Here’s how to do it without quietly building a model that blows up later.
First, what “alternative data” actually means. It’s a spectrum, and the signal isn’t evenly spread:
• Cash-flow / bank-transaction data: income regularity, balance volatility, days at zero before payday, essential-vs-discretionary spend. The highest-signal source, by some distance.
• Telco, utility, and rent payments: useful proof of payment discipline.
• Device, behavioral, and digital-footprint signals: weaker, noisier, and more sensitive on privacy and fairness grounds.
• Platform / e-commerce history: strong if you own the platform.
If you take one thing from this issue: start with cash-flow data. For a digital lender it often beats the thin bureau file outright, because it shows actual ability-to-pay in near-real-time, not a stale summary.
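To make the cash-flow signals above concrete, here is a minimal Python sketch that derives a few of them from one applicant’s transaction list. The `Txn` record, the category labels in `INCOME` and `ESSENTIAL`, and the feature names are all hypothetical; a real pipeline would sit on its own transaction categoriser and deal with pending items, transfers between the customer’s own accounts, and multiple accounts, none of which this does.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, pstdev

# Hypothetical category labels; a real pipeline would use its own categoriser.
INCOME = {"salary", "gig_income"}
ESSENTIAL = {"rent", "utilities", "groceries", "telco"}


@dataclass
class Txn:
    day: date
    amount: float   # positive = inflow, negative = outflow
    category: str


def cash_flow_features(txns, opening_balance):
    """Illustrative cash-flow features from one applicant's transaction history."""
    if not txns:
        raise ValueError("need at least one transaction")
    txns = sorted(txns, key=lambda t: t.day)

    # Income regularity: coefficient of variation of monthly income totals.
    # Months with no income are simply absent here; a stricter version would count them as zero.
    monthly = {}
    for t in txns:
        if t.category in INCOME and t.amount > 0:
            key = (t.day.year, t.day.month)
            monthly[key] = monthly.get(key, 0.0) + t.amount
    incomes = list(monthly.values())
    avg_income = mean(incomes) if incomes else 0.0
    income_cv = pstdev(incomes) / avg_income if avg_income else float("inf")

    # End-of-day balances for every calendar day, carried forward over quiet days.
    net_by_day = {}
    for t in txns:
        net_by_day[t.day] = net_by_day.get(t.day, 0.0) + t.amount
    balances, balance, d = [], opening_balance, txns[0].day
    while d <= txns[-1].day:
        balance += net_by_day.get(d, 0.0)
        balances.append(balance)
        d += timedelta(days=1)

    # Essential vs discretionary: share of total outflows that goes to essentials.
    outflows = [(-t.amount, t.category) for t in txns if t.amount < 0]
    total_out = sum(a for a, _ in outflows)
    essential_out = sum(a for a, c in outflows if c in ESSENTIAL)

    return {
        "avg_monthly_income": avg_income,
        "income_cv": income_cv,
        "balance_volatility": pstdev(balances),
        "days_at_or_below_zero": sum(1 for b in balances if b <= 0),
        "essential_spend_ratio": essential_out / total_out if total_out else 0.0,
    }


txns = [
    Txn(date(2024, 1, 1), 1000, "salary"),
    Txn(date(2024, 1, 2), -600, "rent"),
    Txn(date(2024, 1, 3), -450, "dining"),
    Txn(date(2024, 1, 5), -100, "groceries"),
    Txn(date(2024, 2, 1), 1200, "salary"),
]
f = cash_flow_features(txns, opening_balance=100)
print(f["days_at_or_below_zero"])  # 27 (balance sits at -50 from Jan 5 to Jan 31)
```

The daily carry-forward matters: counting only days with transactions would badly understate how long the account sat at or below zero.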
How to build it (without the usual traps)
1. Nail the target and the sample. A crisp “bad” definition and a sensible performance window come before any modeling. Expect a cold-start problem: with no performance history, you’ll need a manual or rules-based champion phase, or proxy targets, to gather outcomes before a full model is possible.
2. Engineer features that mean something. From cash flow: net-income trend, income volatility, overdraft frequency, balance-to-zero days, essential-spend ratio. Signal beats volume: a handful of strong cash-flow features will usually outperform a hundred exotic ones.
3. Hunt for leakage and selection bias. Drop any feature that won’t be available at decision time. And remember that you only observe outcomes for applicants your old policy approved; without reject inference, you’re modeling a biased slice and won’t know it.
4. Keep it explainable. For credit you must produce a decline reason and pass fairness testing. A scorecard or a monotonic, explainable model beats a black box you can’t defend, especially now that regulators classify credit scoring as “high-risk” (the EU AI Act, for one, lists creditworthiness assessment as a high-risk use).
5. Validate hard, then govern it. Out-of-time sample, performance by segment, disparate-impact testing, and stability monitoring. Then treat it as a governed model: documentation, independent validation, ongoing monitoring, champion/challenger.
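For steps 1 and 5, here is a minimal sketch of a target definition and an out-of-time split. The “bad” definition (ever 90+ days past due within 12 months on book), the `Loan` record, and the cutoff date are assumptions for illustration, not a recommendation; choose the definition and window that fit your product’s term and loss behaviour.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical definitions: "bad" = ever 90+ days past due within 12 months on book.
BAD_DPD = 90
WINDOW_MONTHS = 12


@dataclass
class Loan:
    loan_id: str
    origination: date
    max_dpd_by_month: list  # worst days-past-due in each month on book, month 1 first


def label(loan):
    """1 = bad, 0 = good, None = performance window not yet complete."""
    if len(loan.max_dpd_by_month) < WINDOW_MONTHS:
        # Exclude immature loans entirely: keeping their early bads while
        # dropping their not-yet-proven goods would inflate the bad rate.
        return None
    window = loan.max_dpd_by_month[:WINDOW_MONTHS]
    return int(any(dpd >= BAD_DPD for dpd in window))


def out_of_time_split(loans, cutoff):
    """Develop on loans originated before `cutoff`; validate on later vintages."""
    labelled = [(loan, label(loan)) for loan in loans]
    labelled = [(loan, y) for loan, y in labelled if y is not None]
    develop = [(loan, y) for loan, y in labelled if loan.origination < cutoff]
    validate = [(loan, y) for loan, y in labelled if loan.origination >= cutoff]
    return develop, validate


loans = [
    Loan("a", date(2022, 1, 10), [0] * 12),
    Loan("b", date(2022, 6, 3), [0, 30, 60, 90] + [120] * 8),
    Loan("c", date(2023, 3, 15), [0] * 12),
    Loan("d", date(2023, 9, 1), [0] * 5),  # immature: excluded from both sets
]
develop, validate = out_of_time_split(loans, cutoff=date(2023, 1, 1))
```

Splitting by origination date rather than at random is the point: a random split lets the model see the same vintages, and the same economic conditions, in both sets.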
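Step 3’s “available at decision time” rule is easiest to enforce mechanically with a point-in-time (as-of) lookup rather than by inspection. A minimal sketch, assuming features are stored as timestamped snapshots; the `as_of` helper and the feature names are hypothetical.

```python
from datetime import datetime


def as_of(snapshots, decision_time):
    """Return each feature's latest value recorded at or before decision_time.

    snapshots: iterable of (feature_name, value, recorded_at) tuples.
    Features with nothing recorded by decision_time are left out, which is
    what the model would actually see in production.
    """
    latest = {}
    for name, value, recorded_at in snapshots:
        if recorded_at > decision_time:
            continue  # recorded after the decision: using it would be leakage
        if name not in latest or recorded_at > latest[name][1]:
            latest[name] = (value, recorded_at)
    return {name: value for name, (value, _) in latest.items()}


snapshots = [
    ("avg_balance_90d", 500.0, datetime(2024, 1, 10)),
    ("avg_balance_90d", 450.0, datetime(2024, 1, 20)),
    ("avg_balance_90d", 300.0, datetime(2024, 2, 5)),  # after the decision
    ("in_collections", 1, datetime(2024, 3, 1)),       # a post-decision outcome
]
features = as_of(snapshots, decision_time=datetime(2024, 2, 1))
print(features)
```

Building training rows through the same as-of lookup used at scoring time is what keeps development and production features consistent.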
The traps that catch people
• “More data = better model.” It isn’t. Signal, not volume.
• Using data you don’t have clean consent for — a fast track to a regulatory problem.
• Ignoring drift. Behavioral and alt-data signals move fast; a model that’s stable today degrades quietly.
• Shipping a black box. If you can’t explain a decline, you can’t ship it.
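Drift is commonly tracked with the Population Stability Index (PSI), which compares a feature’s or score’s current distribution against the development sample: the sum over bins of (actual share − expected share) × ln(actual share / expected share). Below is a minimal, stdlib-only sketch that bins on the baseline’s quantiles. A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as a material shift; treat those cut-offs as convention, not regulation.

```python
import bisect
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against a baseline `expected` sample."""
    # Bin edges at the baseline's quantiles (deciles when bins == 10).
    ordered = sorted(expected)
    edges = [ordered[len(ordered) * i // bins] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


baseline = list(range(1000))             # e.g. a feature's values at development
recent = [x + 500 for x in range(1000)]  # the same feature, some months later
print(psi(baseline, baseline), psi(baseline, recent))
```

Identical samples give a PSI of zero; the shifted sample lands far above 0.25. Running this per feature, not just on the final score, shows which alt-data input is moving.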
If you only do one thing
Don’t wait to build a full alt-data model. Add a few cash-flow features as overlays and reason codes to your existing scorecard first. It’s the fastest path to measurable lift, and it builds the data and the muscle for the full model later.
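A minimal sketch of the overlay approach: keep the existing scorecard score, add or subtract points when a cash-flow rule fires, and turn the negative adjustments into reason codes on a decline. The rules, thresholds, point values, and feature names below are hypothetical; in practice you would set them from performance data and combine these reasons with the base scorecard’s own decline reasons.

```python
# Hypothetical overlay rules: (feature, condition, score points, reason text).
# Thresholds and points are illustrative only; calibrate them on real performance data.
OVERLAYS = [
    ("days_at_or_below_zero", lambda v: v > 10, -25, "Frequent zero or negative balances"),
    ("income_cv", lambda v: v > 0.5, -20, "Irregular income"),
    ("essential_spend_ratio", lambda v: v > 0.8, -10, "Little discretionary headroom in spending"),
    ("income_cv", lambda v: v < 0.2, +15, "Regular income"),
]


def apply_overlays(base_score, features, cutoff, max_reasons=2):
    """Adjust an existing scorecard score with cash-flow overlays; return reason codes on decline."""
    hits = [(pts, why) for name, rule, pts, why in OVERLAYS
            if name in features and rule(features[name])]
    score = base_score + sum(pts for pts, _ in hits)
    decision = "approve" if score >= cutoff else "decline"
    reasons = []
    if decision == "decline":
        # Most damaging negative adjustments first.
        negatives = sorted((h for h in hits if h[0] < 0), key=lambda h: h[0])
        reasons = [why for _, why in negatives[:max_reasons]]
    return score, decision, reasons


features = {"days_at_or_below_zero": 27, "income_cv": 0.09, "essential_spend_ratio": 0.61}
print(apply_overlays(base_score=600, features=features, cutoff=600))
```

Keeping each overlay as a separate, named rule is what makes every decline reason traceable to a single feature, and it gives you clean champion/challenger comparisons against the unadjusted score.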
Next week: IFRS 9 / ECL, explained for the people who actually have to run it.
If this was useful, forward it to someone building an underwriting model.
Views are my own and don’t represent my employer.
