
What LoanTape Parses and Why It Matters

LoanTape parses public loan data from two sources: SEC filings for auto ABS deals and SBA FOIA releases for small business lending. We normalize everything into clean, queryable datasets and update them on a recurring schedule: monthly for the SEC-sourced datasets and quarterly for the SBA ones.

Here is what we cover, how big each dataset is, and what fields you get.

Dataset | Source | Records | Fields | Coverage | Updates
ABS-EE loan-level | SEC EDGAR XML | 9.5M+ loans | 11+ | 2016-present | Monthly
Form 10-D distributions | SEC EDGAR | 20+ issuers | 9+ | 2016-present | Monthly
ABS remittance | SEC 10-D filings | 20+ issuers | 8+ | 2016-present | Monthly
SBA 7(a) loans | data.sba.gov FOIA | 1.2M+ loans | 14+ | FY1991-present | Quarterly
SBA 504 loans | data.sba.gov FOIA | 300K+ loans | 12+ | FY1991-present | Quarterly

ABS-EE asset-level data

Every publicly registered auto ABS trust files ABS-EE asset-level XML with the SEC under Regulation AB II. Each loan in the pool gets its own record. We parse these XML files into a flat relational schema with 11+ normalized fields per loan:

Field | What it tells you
FICO at origination | Borrower credit quality when the loan was underwritten
Original balance | Loan size at funding
Current balance | How much is still owed
Loan-to-value (LTV) | Collateral coverage at origination
Payment-to-income (PTI) | Monthly payment as a share of borrower income
Interest rate | Coupon on the loan
Remaining term | Months left until maturity
Payment status | Current, 30 DPD, 60 DPD, 90+ DPD, etc.
Geographic state | Borrower location for regional risk analysis
Charged-off flag | Whether the loan has defaulted
Repaid/prepaid flag | Whether the borrower paid off early

As of March 2026, that adds up to more than 9.5 million individual loan records. Here is how the volume breaks down for 11 of the issuers, with a total row covering all of them:

Issuer | Loans | Approx. balance
Santander | 1,945,000 | $39.2B
CarMax | 785,000 | $11.7B
Honda | 780,000 | $12.4B
World Omni | 770,000 | $16.5B
Toyota | 718,000 | $13.2B
Hyundai | 661,000 | $11.6B
Exeter | 649,000 | $12.3B
Nissan | 594,000 | $9.1B
GM Financial | 577,000 | $11.8B
Ford | 339,000 | $9.0B
Carvana | 300,000 | $5.2B
Total (all issuers) | 9,561,000+ | $179B+
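
To make the XML-to-flat-table step concrete, here is a minimal Python sketch. The tag names (`loan`, `fico`, `originalBalance`, and so on), the sample document, and the `flatten_loans` helper are all hypothetical stand-ins. Real ABS-EE files use the SEC's own element names and an XML namespace, and this is not the production parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. Real ABS-EE XML is namespaced and
# carries many more elements per asset.
SAMPLE_XML = """<pool>
  <loan>
    <id>A1</id><fico>712</fico>
    <originalBalance>25000.00</originalBalance>
    <currentBalance>18250.75</currentBalance>
    <state>TX</state>
  </loan>
  <loan>
    <id>A2</id><fico></fico>
    <originalBalance>14000.00</originalBalance>
    <currentBalance>0.00</currentBalance>
    <state>fl</state>
  </loan>
</pool>"""

# Hypothetical XML tag -> (flat column name, converter)
FIELD_MAP = {
    "id": ("loan_id", str),
    "fico": ("fico_at_origination", int),
    "originalBalance": ("original_balance", float),
    "currentBalance": ("current_balance", float),
    "state": ("state", str.upper),
}

def flatten_loans(xml_text):
    """Turn each <loan> element into one flat dict; blank values become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for loan in root.iter("loan"):
        row = {}
        for tag, (column, convert) in FIELD_MAP.items():
            text = (loan.findtext(tag) or "").strip()
            row[column] = convert(text) if text else None
        rows.append(row)
    return rows

rows = flatten_loans(SAMPLE_XML)
```

Each dict in `rows` corresponds to one row of a flat loan table. Missing values stay as `None` rather than being coerced to zero, so a blank FICO is not mistaken for a score.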

Form 10-D distribution reports

Form 10-D is the monthly distribution report that every auto ABS trust files with the SEC. Within 15 days of each payment date, the trust reports what happened in the pool that month: how much was collected from borrowers, how many loans went delinquent, how losses flowed through the waterfall, and how much each tranche of bondholders received.

We have normalized 10-D data across 20+ issuers going back to 2016, updated monthly. Every issuer reports slightly differently, so we map all fields to a common schema. You can compare Ally to Toyota to Exeter without reconciliation work.
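
One way to picture the common-schema mapping is a per-issuer lookup from report labels to shared field names. The sketch below is illustrative only: the issuer keys, the report labels, and the `normalize_report` helper are invented for the example and do not reflect any real issuer's 10-D layout.

```python
# Shared field names every issuer's report is mapped onto (illustrative subset).
COMMON_FIELDS = ("collections", "net_losses", "ending_pool_balance")

# Hypothetical report labels; real issuers each use their own wording.
ISSUER_LABELS = {
    "issuer_a": {
        "Total Collections": "collections",
        "Net Losses": "net_losses",
        "Ending Pool Balance": "ending_pool_balance",
    },
    "issuer_b": {
        "Available Collections": "collections",
        "Net Charge-Offs": "net_losses",
        "Pool Balance - End of Period": "ending_pool_balance",
    },
}

def normalize_report(issuer, raw):
    """Map one issuer's raw label/value pairs onto the common schema."""
    mapping = ISSUER_LABELS[issuer]
    out = {field: None for field in COMMON_FIELDS}
    for label, value in raw.items():
        field = mapping.get(label)
        if field is not None:  # labels without a mapping are ignored
            out[field] = float(str(value).replace("$", "").replace(",", ""))
    return out

report = normalize_report(
    "issuer_b",
    {"Available Collections": "$1,250,000.50", "Net Charge-Offs": "42,000"},
)
```

Fields an issuer does not report come back as `None`, which keeps "not reported" distinct from a reported zero when comparing issuers side by side.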


Remittance data

We extract remittance-level data from the 10-D filings. That is 8+ fields per reporting period:

Field | What it tells you
Beginning/ending pool balance | How fast the pool is paying down
Collections | Cash received from borrowers that month
Net losses | Charged-off balances minus recoveries
Cumulative net loss (CNL) | Total losses since the deal closed
Prepayment speed (CPR) | How fast borrowers are paying off early
Delinquency buckets | Counts and balances by 30/60/90+ days past due
Servicer advances | Amounts the servicer fronted to cover shortfalls
Excess spread | Interest collected minus interest owed to bondholders

Broken out by issuer and reporting period, this gives you a monthly time series of trust economics going back to 2016.
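
Two of the remittance metrics above follow standard market conventions, sketched here in Python. The function names and input numbers are made up for illustration. The sketch assumes CNL is expressed as a share of the original pool balance and that CPR is annualized from a single monthly mortality rate (SMM); neither convention is spelled out by the filings themselves.

```python
def cumulative_net_loss_pct(monthly_net_losses, original_pool_balance):
    """Running CNL as a fraction of the original pool balance, one value per month."""
    running = 0.0
    curve = []
    for loss in monthly_net_losses:
        running += loss
        curve.append(running / original_pool_balance)
    return curve

def cpr_from_smm(smm):
    """Annualize a monthly prepayment rate: CPR = 1 - (1 - SMM)^12."""
    return 1.0 - (1.0 - smm) ** 12

# Made-up numbers: three months of net losses on a 10,000 original balance.
cnl_curve = cumulative_net_loss_pct([100.0, 200.0, 100.0], 10_000.0)
annual_cpr = cpr_from_smm(0.01)  # 1% of the pool prepaying per month
```

Under these assumptions a 1% monthly prepayment rate annualizes to roughly 11.4% CPR, slightly less than 12% because each month's prepayments shrink the balance the next month's rate applies to.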


SBA 7(a) loan data

We parse the SBA 7(a) loan performance data that the SBA publishes as FOIA datasets on data.sba.gov. The 7(a) program is the SBA's primary lending program: general-purpose business loans up to $5 million. Our dataset contains every 7(a) loan approved since fiscal year 1991. That is over 1.2 million loans, including $31 billion approved in FY2024 alone.

SBA 504 loan data

The 504/CDC program covers long-term fixed-rate financing for commercial real estate and heavy equipment. We have 300,000+ loans in this dataset. The 504 program uses a three-party structure: a Certified Development Company provides the SBA-backed portion, a third-party lender covers the senior debt, and the borrower contributes equity.

Fields across both SBA programs

Field | 7(a) | 504 | What it tells you
Approval amount | Yes | Yes | Loan size at approval
Gross chargeoff | Yes | Yes | Amount written off on default
Interest rate | Yes | Yes | Coupon at origination
Loan term (months) | Yes | Yes | Maturity length
NAICS code | Yes | Yes | Industry classification (mapped to 20 sectors)
Lender name | Yes | Yes | Originating bank or CDC
Borrower state | Yes | Yes | Geographic location
Business type | Yes | Yes | Corporation, LLC, sole proprietor, etc.
Jobs supported | Yes | Yes | Jobs reported at approval
SBA guarantee % | Yes | No | Portion backed by the government
Revolving flag | Yes | No | Line of credit vs. term loan
CDC name | No | Yes | Certified Development Company
Third-party lender | No | Yes | Senior debt holder in the 3-party structure
Project county | No | Yes | County-level geography

Chargeoff rates by industry

The data gets interesting when you cut it by industry. NAICS sectors have very different chargeoff profiles in the 7(a) program:

NAICS sector | Example businesses | Relative chargeoff risk
Accommodation & Food Services (72) | Restaurants, hotels | High
Retail Trade (44-45) | Stores, dealerships | Above average
Construction (23) | Contractors, builders | Above average
Healthcare (62) | Clinics, dental offices | Below average
Professional Services (54) | Law firms, consultants | Low
Finance & Insurance (52) | Brokerages, agencies | Low

That kind of breakdown is hard to get from the raw CSVs without cleaning up inconsistent NAICS codes and normalizing vintage-level cohorts.
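
As a rough sketch of that cleanup, the Python below reduces messy NAICS values to a two-digit sector and computes a dollar-weighted chargeoff rate (gross chargeoff divided by approval amount) per sector. The dictionary keys, the sample loans, and both helper functions are hypothetical, and the dollar-weighted definition is an assumption; a count-based rate or a vintage-level cut would need different inputs.

```python
from collections import defaultdict

# Two-digit NAICS prefixes for the sectors in the table above.
# Retail Trade spans two prefixes (44 and 45).
SECTOR_BY_PREFIX = {
    "72": "Accommodation & Food Services",
    "44": "Retail Trade",
    "45": "Retail Trade",
    "23": "Construction",
    "62": "Healthcare",
    "54": "Professional Services",
    "52": "Finance & Insurance",
}

def sector_for(naics):
    """Strip non-digits from a raw NAICS value and look up its 2-digit sector."""
    digits = "".join(ch for ch in str(naics) if ch.isdigit())
    return SECTOR_BY_PREFIX.get(digits[:2], "Other/Unknown")

def chargeoff_rate_by_sector(loans):
    """Dollar-weighted rate: total gross chargeoff / total approved, per sector."""
    approved = defaultdict(float)
    charged = defaultdict(float)
    for loan in loans:
        sector = sector_for(loan["naics"])
        approved[sector] += loan["approval_amount"]
        charged[sector] += loan["gross_chargeoff"]
    return {s: charged[s] / approved[s] for s in approved if approved[s] > 0}

# Made-up loans, including an integer code and a code with stray characters.
SAMPLE_LOANS = [
    {"naics": "722511", "approval_amount": 100_000, "gross_chargeoff": 20_000},
    {"naics": 722513, "approval_amount": 100_000, "gross_chargeoff": 0},
    {"naics": " 54-1110", "approval_amount": 200_000, "gross_chargeoff": 2_000},
]
rates = chargeoff_rate_by_sector(SAMPLE_LOANS)
```

Codes that are blank or do not match a known prefix land in an "Other/Unknown" bucket instead of being dropped, so the sector totals still reconcile to the full loan set.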


What we do with it

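The pipeline runs daily for auto ABS and quarterly for SBA data. Each trust files monthly, so the daily run picks up new filings as they hit EDGAR, which is why the ABS datasets refresh on a monthly cycle. Whenever a new SEC filing appears or the SBA updates its FOIA files, we parse, validate, and load the data.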

On the analytics side, we build visualizations from these datasets. Vintage loss curves show how different origination years are aging. Markov chain transition matrices model how loans move between the Current, 1-29 DPD, 30 DPD, 60 DPD, 90+ DPD, Charged Off, and Cash Collected states. We also build roll-rate heatmaps, issuer scorecards with sparklines, and FICO distribution breakdowns by issuer. You can see examples in our dashboard gallery.
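
For readers unfamiliar with transition matrices, here is a minimal Python sketch of how one can be estimated from month-by-month loan statuses. The state list comes from the paragraph above; the `transition_matrix` helper and the sample histories are hypothetical, and the simple count-based estimate is an assumption, not a description of the production model.

```python
from collections import Counter, defaultdict

STATES = [
    "Current", "1-29 DPD", "30 DPD", "60 DPD", "90+ DPD",
    "Charged Off", "Cash Collected",
]

def transition_matrix(histories):
    """Estimate P(next state | current state) from observed monthly sequences."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair each month's status with the following month's status.
        for prev, nxt in zip(history, history[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for state in STATES:
        total = sum(counts[state].values())
        matrix[state] = {
            nxt: (counts[state][nxt] / total if total else 0.0)
            for nxt in STATES
        }
    return matrix

# Made-up status histories for two loans, one entry per month.
histories = [
    ["Current", "Current", "1-29 DPD"],
    ["Current", "1-29 DPD", "30 DPD"],
]
matrix = transition_matrix(histories)
```

Each row of `matrix` sums to 1 when that state was observed with a following month. States never observed as a starting point, such as the terminal states in this tiny sample, come back as all-zero rows.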

This blog is where we write about what the data shows. Delinquency trends by issuer. How subprime pools compare to prime. Which SBA lenders have the best track records. What vintage curves tell you about credit tightening. Every number in every article comes from the pipeline.

Who this is for

We built LoanTape because we needed a clean, normalized version of these public datasets and couldn't find one. If you work in structured finance, credit risk, or SBA lending and have spent time wrestling with raw EDGAR XML or inconsistent SBA CSVs, you already know why this exists.

Browse the datasets or subscribe to the RSS feed for new articles.