What LoanTape Parses and Why It Matters
LoanTape parses public loan data from two sources: SEC filings for auto ABS securitizations and SBA FOIA releases for small business lending. We normalize everything into clean, queryable datasets and refresh them as new data is published: monthly for the SEC datasets and quarterly for the SBA datasets.
Here is what we cover, how big each dataset is, and what fields you get.
| Dataset | Source | Records / scope | Fields | Coverage | Updates |
|---|---|---|---|---|---|
| ABS-EE loan-level | SEC EDGAR XML | 9.5M+ loans | 11+ | 2016-present | Monthly |
| Form 10-D distributions | SEC EDGAR | 20+ issuers | 9+ | 2016-present | Monthly |
| ABS remittance | SEC 10-D filings | 20+ issuers | 8+ | 2016-present | Monthly |
| SBA 7(a) loans | data.sba.gov FOIA | 1.2M+ loans | 14+ | FY1991-present | Quarterly |
| SBA 504 loans | data.sba.gov FOIA | 300K+ loans | 12+ | FY1991-present | Quarterly |
ABS-EE asset-level data
Every public (SEC-registered) auto ABS trust files asset-level data as ABS-EE XML under Regulation AB-II. Each loan in the pool gets its own record. We parse these XML files into a flat relational schema with 11+ normalized fields per loan:
| Field | What it tells you |
|---|---|
| FICO at origination | Borrower credit quality when the loan was underwritten |
| Original balance | Loan size at funding |
| Current balance | How much is still owed |
| Loan-to-value (LTV) | Collateral coverage at origination |
| Payment-to-income (PTI) | Monthly payment as a share of borrower income |
| Interest rate | Coupon on the loan |
| Remaining term | Months left until maturity |
| Payment status | Delinquency bucket: Current, 1-29 DPD, 30 DPD, 60 DPD, or 90+ DPD (days past due) |
| Geographic state | Borrower location for regional risk analysis |
| Charged-off flag | Whether the loan has been charged off (written off as a loss) |
| Repaid/prepaid flag | Whether the loan has been paid off, at maturity or early |
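To make the flattening step concrete, here is a minimal sketch of turning ABS-EE XML into one flat record per loan. The XML tag names (`obligorCreditScore`, `vehicleValueAmount`, `currentDelinquencyStatus`, and so on) are our assumption about the SEC auto-loan schema and should be checked against the SEC's ABS-EE technical specification; `LoanRecord`, `parse_abs_ee`, and `dpd_bucket` are hypothetical names. It covers only a subset of the fields in the table, and it derives LTV as original loan amount over vehicle value.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: tag names reflect our reading of the SEC auto-loan ABS-EE
# schema; verify them against the SEC technical specification.
TAGS = {
    "asset_number": "assetNumber",
    "fico": "obligorCreditScore",
    "original_balance": "originalLoanAmount",
    "current_balance": "reportingPeriodActualEndBalanceAmount",
    "vehicle_value": "vehicleValueAmount",
    "pti": "paymentToIncomePercentage",
    "interest_rate": "originalInterestRatePercentage",
    "remaining_term": "remainingTermToMaturityNumber",
    "days_delinquent": "currentDelinquencyStatus",
    "state": "obligorGeographicLocation",
}


@dataclass
class LoanRecord:
    asset_number: Optional[str]
    fico: Optional[float]
    original_balance: Optional[float]
    current_balance: Optional[float]
    ltv: Optional[float]
    pti: Optional[float]
    interest_rate: Optional[float]
    remaining_term: Optional[float]
    payment_status: Optional[str]
    state: Optional[str]


def _text(asset: ET.Element, tag: str) -> Optional[str]:
    # "{*}" matches the tag in any (or no) XML namespace (Python 3.8+).
    node = asset.find("{*}" + tag)
    if node is None or node.text is None or not node.text.strip():
        return None
    return node.text.strip()


def _num(asset: ET.Element, tag: str) -> Optional[float]:
    raw = _text(asset, tag)
    return float(raw) if raw is not None else None


def dpd_bucket(days: Optional[float]) -> Optional[str]:
    """Map days past due to the payment-status buckets in the table."""
    if days is None:
        return None
    if days < 1:
        return "Current"
    if days < 30:
        return "1-29 DPD"
    if days < 60:
        return "30 DPD"
    if days < 90:
        return "60 DPD"
    return "90+ DPD"


def parse_abs_ee(xml_text: str) -> list[LoanRecord]:
    root = ET.fromstring(xml_text)
    records = []
    for asset in root.findall("{*}assets"):  # assumed: one <assets> element per loan
        orig = _num(asset, TAGS["original_balance"])
        value = _num(asset, TAGS["vehicle_value"])
        # LTV is derived here, not read directly: loan amount / vehicle value.
        ltv = orig / value if orig is not None and value else None
        records.append(LoanRecord(
            asset_number=_text(asset, TAGS["asset_number"]),
            fico=_num(asset, TAGS["fico"]),
            original_balance=orig,
            current_balance=_num(asset, TAGS["current_balance"]),
            ltv=ltv,
            pti=_num(asset, TAGS["pti"]),
            interest_rate=_num(asset, TAGS["interest_rate"]),
            remaining_term=_num(asset, TAGS["remaining_term"]),
            payment_status=dpd_bucket(_num(asset, TAGS["days_delinquent"])),
            state=_text(asset, TAGS["state"]),
        ))
    return records
```

Missing elements come back as `None` rather than zero, so a loan with no reported PTI is not mistaken for one with a PTI of 0%.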
As of March 2026, that comes to roughly 9.56 million individual loan records. Here is how the volume breaks down across the largest issuers:
| Issuer | Loans | Approx. balance |
|---|---|---|
| Santander | 1,945,000 | $39.2B |
| CarMax | 785,000 | $11.7B |
| Honda | 780,000 | $12.4B |
| World Omni | 770,000 | $16.5B |
| Toyota | 718,000 | $13.2B |
| Hyundai | 661,000 | $11.6B |
| Exeter | 649,000 | $12.3B |
| Nissan | 594,000 | $9.1B |
| GM Financial | 577,000 | $11.8B |
| Ford | 339,000 | $9.0B |
| Carvana | 300,000 | $5.2B |
| Total (all issuers, including those not listed) | 9,561,000+ | $179B+ |
Form 10-D distribution reports
Form 10-D is the monthly distribution report that every public auto ABS trust files with the SEC. Within 15 days after each distribution date, the trust reports what happened in the pool that month: how much was collected from borrowers, how many loans went delinquent, how losses flowed through the payment waterfall, and how much each tranche of bondholders received.
We have normalized 10-D data across 20+ issuers going back to 2016, updated monthly. Every issuer reports slightly differently, so we map all fields to a common schema. You can compare Ally to Toyota to Exeter without reconciliation work.
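The mapping step can be pictured as one lookup table per issuer that renames raw line items to common field names. This is a minimal sketch: the issuer keys and raw labels are hypothetical, invented for illustration, and `to_common_schema` / `parse_amount` are not LoanTape's actual code. Returning unmapped labels, instead of silently dropping them, is one way to notice when an issuer changes its report format.

```python
from __future__ import annotations

# HYPOTHETICAL issuer keys and raw labels, invented for illustration.
# Real 10-D exhibits word these line items differently per issuer.
ISSUER_FIELD_MAPS: dict[str, dict[str, str]] = {
    "issuer_a": {
        "Pool Balance, Beginning of Period": "beginning_pool_balance",
        "Pool Balance, End of Period": "ending_pool_balance",
        "Net Credit Losses": "net_losses",
    },
    "issuer_b": {
        "Aggregate Principal Balance (BOP)": "beginning_pool_balance",
        "Aggregate Principal Balance (EOP)": "ending_pool_balance",
        "Net Losses": "net_losses",
    },
}


def parse_amount(text: str) -> float:
    """Parse '$1,234.50' or accounting-style negatives like '(250.00)'."""
    s = text.strip().replace("$", "").replace(",", "")
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    value = float(s)
    return -value if negative else value


def to_common_schema(issuer: str, raw: dict[str, str]) -> tuple[dict[str, float], list[str]]:
    """Rename one issuer's line items to common field names.

    Returns the mapped values plus any labels with no mapping, so new or
    renamed line items surface for review instead of disappearing.
    """
    mapping = ISSUER_FIELD_MAPS[issuer]
    mapped: dict[str, float] = {}
    unmapped: list[str] = []
    for label, value in raw.items():
        field = mapping.get(label.strip())
        if field is None:
            unmapped.append(label)
        else:
            mapped[field] = parse_amount(value)
    return mapped, unmapped
```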
Remittance data
We extract remittance-level data from the 10-D filings, with 8+ fields per reporting period:
| Field | What it tells you |
|---|---|
| Beginning/ending pool balance | How fast the pool is paying down |
| Collections | Cash received from borrowers that month |
| Net losses | Charged-off balances minus recoveries |
| Cumulative net loss (CNL) | Total net losses since the deal closed, usually expressed as a percentage of the original pool balance |
| Prepayment speed (CPR) | Annualized rate at which borrowers pay off early |
| Delinquency buckets | Counts and balances by 30/60/90+ days past due |
| Servicer advances | Amounts the servicer fronted to cover shortfalls |
| Excess spread | Interest collected minus servicing fees and interest owed to bondholders |
Broken out by issuer and reporting period, this gives you a monthly time series of trust economics going back to 2016.
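Several of these fields are derived rather than reported directly. The sketch below shows standard textbook definitions, assuming a filing breaks out scheduled versus prepaid principal and charge-offs versus recoveries (not every 10-D does). The CPR calculation annualizes single monthly mortality (SMM) as 1 − (1 − SMM)^12, and the CNL curve divides running net losses by the original pool balance. `RemittancePeriod` and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import accumulate


@dataclass
class RemittancePeriod:
    beginning_balance: float     # pool balance at start of the period
    scheduled_principal: float   # principal due under the amortization schedule
    prepaid_principal: float     # principal paid ahead of schedule
    charge_offs: float           # balances charged off this period
    recoveries: float            # cash recovered on prior charge-offs


def smm(p: RemittancePeriod) -> float:
    """Single monthly mortality: share of the scheduled-down balance prepaid."""
    base = p.beginning_balance - p.scheduled_principal
    return p.prepaid_principal / base if base > 0 else 0.0


def cpr(p: RemittancePeriod) -> float:
    """Annualize SMM: CPR = 1 - (1 - SMM) ** 12."""
    return 1 - (1 - smm(p)) ** 12


def cnl_curve(periods: list[RemittancePeriod], original_pool_balance: float) -> list[float]:
    """Cumulative net loss after each period, as a share of the original pool."""
    monthly_net = (p.charge_offs - p.recoveries for p in periods)
    return [total / original_pool_balance for total in accumulate(monthly_net)]


def annualized_excess_spread(interest_collected: float, servicing_fee: float,
                             note_interest: float, beginning_balance: float) -> float:
    """Monthly interest left after fees and bond coupons, annualized over the pool."""
    return 12 * (interest_collected - servicing_fee - note_interest) / beginning_balance
```

Plotting `cnl_curve` output against months since closing, one line per deal, gives the kind of loss curve the analytics use to compare vintages.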
SBA 7(a) loan data
We parse SBA 7(a) loan performance data from the FOIA datasets the SBA publishes on data.sba.gov. The 7(a) program is the SBA's primary lending program, offering general-purpose business loans up to $5 million. Our dataset contains every 7(a) loan approved since fiscal year 1991: over 1.2 million loans in total, with $31 billion approved in FY2024 alone.
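Figures like "approved in FY2024" follow the federal fiscal year, which starts October 1, so FY2024 runs from October 2023 through September 2024. The sketch below shows that rule and a per-year rollup of a FOIA-style CSV. The column names `ApprovalFiscalYear` and `GrossApproval` are assumptions based on recent 7(a) FOIA files and should be checked against the data dictionary; the function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from datetime import date


def federal_fiscal_year(d: date) -> int:
    """US federal FY starts Oct 1: Oct 1, 2023 through Sep 30, 2024 is FY2024."""
    return d.year + 1 if d.month >= 10 else d.year


def approvals_by_fiscal_year(csv_text: str) -> dict[int, tuple[int, float]]:
    """Loan count and gross approved dollars per fiscal year.

    ASSUMPTION: columns are named ApprovalFiscalYear and GrossApproval.
    """
    counts: dict[int, int] = defaultdict(int)
    dollars: dict[int, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        fy = int(row["ApprovalFiscalYear"])
        amount = (row["GrossApproval"] or "0").replace("$", "").replace(",", "")
        counts[fy] += 1
        dollars[fy] += float(amount)
    return {fy: (counts[fy], dollars[fy]) for fy in sorted(counts)}
```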
SBA 504 loan data
The 504/CDC program covers long-term fixed-rate financing for commercial real estate and heavy equipment. We have 300,000+ loans in this dataset. The 504 program uses a three-party structure: a Certified Development Company (CDC) provides the SBA-backed portion, a third-party lender covers the senior debt, and the borrower contributes equity.
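A worked split makes the three-party structure easier to picture. This sketch assumes the commonly cited 50/40/10 arrangement (50% senior third-party loan, 40% CDC debenture, 10% borrower equity); borrower contributions can be higher for some projects, and the sketch ignores program caps and fees. `split_504_project` is a hypothetical helper, and it treats the CDC portion as whatever the lender and borrower leave uncovered.

```python
from __future__ import annotations


def split_504_project(project_cost: float, borrower_share: float = 0.10,
                      lender_share: float = 0.50) -> dict[str, float]:
    """Split a 504 project cost across its three funding sources.

    Defaults follow the typical 50/40/10 structure (an assumption, not a
    program rule for every loan); the CDC gets the remainder.
    """
    if not (0 < borrower_share < 1 and 0 < lender_share < 1
            and borrower_share + lender_share < 1):
        raise ValueError("shares must leave room for the CDC portion")
    lender = round(project_cost * lender_share, 2)
    borrower = round(project_cost * borrower_share, 2)
    cdc = round(project_cost - lender - borrower, 2)  # SBA-backed debenture
    return {"third_party_lender": lender, "cdc_sba": cdc, "borrower": borrower}
```

For a $2 million building, that works out to $1,000,000 from the bank, $800,000 from the CDC, and $200,000 from the borrower.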
Fields across both SBA programs
| Field | 7(a) | 504 | What it tells you |
|---|---|---|---|
| Approval amount | Yes | Yes | Loan size at approval |
| Gross chargeoff | Yes | Yes | Amount written off on default |
| Interest rate | Yes | Yes | Coupon at origination |
| Loan term (months) | Yes | Yes | Maturity length |
| NAICS code | Yes | Yes | Industry classification (mapped to 20 sectors) |
| Lender name | Yes | Yes | Originating bank or CDC |
| Borrower state | Yes | Yes | Geographic location |
| Business type | Yes | Yes | Corporation, LLC, sole proprietor, etc. |
| Jobs supported | Yes | Yes | Jobs reported at approval |
| SBA guarantee % | Yes | No | Portion backed by the government |
| Revolving flag | Yes | No | Line of credit vs. term loan |
| CDC name | No | Yes | Certified Development Company |
| Third-party lender | No | Yes | Senior debt holder in the 3-party structure |
| Project county | No | Yes | County-level geography |
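One way to hold both programs in a single table is a shared record type where program-specific fields stay empty for the other program, mirroring the Yes/No columns above. This is a sketch under assumptions: the FOIA column names (`GrossApproval`, `SBAGuaranteedApproval`, `GrossChargeOffAmount`, `BorrState`, `NaicsCode`, `CDC_Name`, `ThirdPartyLender_Name`) are our best reading of recent files and should be verified, and `SbaLoan` / `from_foia_row` are hypothetical. The guarantee percentage is derived as guaranteed dollars over gross approval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SbaLoan:
    program: str                                 # "7a" or "504"
    approval_amount: float
    gross_chargeoff: float
    borrower_state: str
    naics_code: str
    sba_guarantee_pct: Optional[float] = None    # 7(a) only
    cdc_name: Optional[str] = None               # 504 only
    third_party_lender: Optional[str] = None     # 504 only


def _money(raw: Optional[str]) -> float:
    """Blank amounts (e.g. a loan with no chargeoff) become 0.0."""
    if raw is None or not raw.strip():
        return 0.0
    return float(raw.replace("$", "").replace(",", ""))


def from_foia_row(program: str, row: dict[str, str]) -> SbaLoan:
    """ASSUMPTION: column names follow recent SBA FOIA files."""
    approval = _money(row.get("GrossApproval"))
    loan = SbaLoan(
        program=program,
        approval_amount=approval,
        gross_chargeoff=_money(row.get("GrossChargeOffAmount")),
        borrower_state=row.get("BorrState", "").strip(),
        naics_code=row.get("NaicsCode", "").strip(),
    )
    if program == "7a":
        guaranteed = _money(row.get("SBAGuaranteedApproval"))
        loan.sba_guarantee_pct = guaranteed / approval if approval else None
    elif program == "504":
        loan.cdc_name = (row.get("CDC_Name") or "").strip() or None
        loan.third_party_lender = (row.get("ThirdPartyLender_Name") or "").strip() or None
    return loan
```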
Chargeoff rates by industry
The data gets interesting when you cut it by industry. NAICS sectors have very different chargeoff profiles in the 7(a) program:
| NAICS sector | Example businesses | Relative chargeoff risk |
|---|---|---|
| Accommodation & Food Services (72) | Restaurants, hotels | High |
| Retail Trade (44-45) | Stores, dealerships | Above average |
| Construction (23) | Contractors, builders | Above average |
| Healthcare (62) | Clinics, dental offices | Below average |
| Professional Services (54) | Law firms, consultants | Low |
| Finance & Insurance (52) | Brokerages, agencies | Low |
That kind of breakdown is hard to get from the raw CSVs without cleaning up inconsistent NAICS codes and normalizing vintage-level cohorts.
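A minimal sketch of that cleanup: map each raw NAICS code to one of the 20 two-digit sectors (three of which span several prefixes: 31-33, 44-45, 48-49), then compute a chargeoff rate per sector and approval-year cohort. The cleaning rules are assumptions about typical problems (blanks, codes read as floats like "722511.0", too-short codes), and the rate here is dollar-weighted (gross chargeoff over gross approval); a count-based rate is an equally valid choice. Function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional

# The 20 NAICS sectors, keyed by two-digit prefix.
SECTOR_BY_PREFIX = {
    "11": "11", "21": "21", "22": "22", "23": "23",
    "31": "31-33", "32": "31-33", "33": "31-33",
    "42": "42", "44": "44-45", "45": "44-45",
    "48": "48-49", "49": "48-49",
    "51": "51", "52": "52", "53": "53", "54": "54", "55": "55", "56": "56",
    "61": "61", "62": "62", "71": "71", "72": "72", "81": "81", "92": "92",
}


def naics_sector(raw: Optional[str]) -> Optional[str]:
    """Return the sector for a raw code, or None if it can't be mapped."""
    if raw is None:
        return None
    code = str(raw).strip()
    if code.endswith(".0"):          # codes that passed through a float column
        code = code[:-2]
    if not code.isdigit() or len(code) < 2:
        return None
    return SECTOR_BY_PREFIX.get(code[:2])


def chargeoff_rates(
    loans: Iterable[tuple[Optional[str], int, float, float]],
) -> dict[tuple[str, int], float]:
    """Dollar-weighted chargeoff rate per (sector, approval fiscal year).

    Each loan is (naics_code, fiscal_year, gross_approval, gross_chargeoff).
    """
    approved: dict[tuple[str, int], float] = defaultdict(float)
    charged: dict[tuple[str, int], float] = defaultdict(float)
    for naics, fy, approval, chargeoff in loans:
        sector = naics_sector(naics)
        if sector is None:
            continue  # unmappable codes are excluded rather than guessed
        approved[(sector, fy)] += approval
        charged[(sector, fy)] += chargeoff
    return {k: charged[k] / approved[k] for k in approved if approved[k] > 0}
```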
What we do with it
The auto ABS pipeline runs daily, so each issuer's monthly filings are picked up soon after they appear on EDGAR; the SBA pipeline runs quarterly. Every time a new SEC filing hits EDGAR or the SBA updates its FOIA files, we parse, validate, and load the data.
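The validate step sits between parsing and loading. Here is a minimal sketch of what row-level checks can look like; the specific rules and thresholds (FICO between 300 and 850, non-negative balances, rates stored as decimals, two-letter state codes) are illustrative assumptions, not LoanTape's actual rule set, and `validate_loan` / `split_valid` are hypothetical names. Failing rows are held back with their reasons instead of being loaded.

```python
from __future__ import annotations

from typing import Any, Iterable

# HYPOTHETICAL rule set for illustration; thresholds are assumptions.


def validate_loan(row: dict[str, Any]) -> list[str]:
    """Return a list of problems with one parsed loan row (empty if clean)."""
    problems = []
    fico = row.get("fico")
    if fico is not None and not 300 <= fico <= 850:
        problems.append(f"fico outside 300-850: {fico}")
    for key in ("original_balance", "current_balance"):
        value = row.get(key)
        if value is not None and value < 0:
            problems.append(f"{key} is negative: {value}")
    rate = row.get("interest_rate")
    # Assumes rates are stored as decimals (0.059 means 5.9%).
    if rate is not None and not 0 <= rate < 1:
        problems.append(f"interest_rate not a decimal fraction: {rate}")
    state = row.get("state")
    if state is not None and not (len(state) == 2 and state.isalpha()):
        problems.append(f"state is not a two-letter code: {state!r}")
    return problems


def split_valid(rows: Iterable[dict[str, Any]]):
    """Separate loadable rows from rows held back for review."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_loan(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```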
On the analytics side, we build visualizations from these datasets: vintage loss curves that show how different origination years are aging; Markov chain transition matrices that model how loans move between Current, 1-29 DPD, 30 DPD, 60 DPD, 90+ DPD, Charged Off, and Cash Collected states; roll-rate heatmaps; issuer scorecards with sparklines; and FICO distribution breakdowns by issuer. You can see examples in our dashboard gallery.
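A transition matrix of this kind can be estimated by counting month-over-month status changes across loans and normalizing each row. The sketch below assumes a simple time-homogeneous, first-order Markov model pooled across all loans, with Charged Off and Cash Collected treated as absorbing; LoanTape's actual estimation may differ, and `transition_matrix` / `step` are hypothetical names. In the result, the 30 DPD to 60 DPD entry is the 30-to-60 roll rate that a roll-rate heatmap displays.

```python
from __future__ import annotations

STATES = ["Current", "1-29 DPD", "30 DPD", "60 DPD", "90+ DPD",
          "Charged Off", "Cash Collected"]
ABSORBING = {"Charged Off", "Cash Collected"}


def transition_matrix(histories: dict[str, list[str]]) -> list[list[float]]:
    """Estimate P[i][j] = share of month-over-month moves from state i to j.

    `histories` maps a loan ID to its monthly statuses in date order.
    """
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    counts = [[0] * n for _ in range(n)]
    for path in histories.values():
        for prev, nxt in zip(path, path[1:]):
            counts[idx[prev]][idx[nxt]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if STATES[i] in ABSORBING:
            # Terminal states: a loan that charges off or pays off stays there.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        elif total == 0:
            matrix.append([0.0] * n)  # no observed moves out of this state
        else:
            matrix.append([c / total for c in row])
    return matrix


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """Project a distribution over states forward one month: dist @ P."""
    n = len(STATES)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
```

Applying `step` repeatedly to a pool that starts 100% Current gives a model-implied path toward the absorbing Charged Off and Cash Collected states.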
This blog is where we write about what the data shows. Delinquency trends by issuer. How subprime pools compare to prime. Which SBA lenders have the best track records. What vintage curves tell you about credit tightening. Every number in every article comes from the pipeline.
Who this is for
We built LoanTape because we needed a clean, normalized version of these public datasets and couldn't find one. If you work in structured finance, credit risk, or SBA lending and have spent time wrestling with raw EDGAR XML or inconsistent SBA CSVs, you already know why this exists.
Browse the datasets or subscribe to the RSS feed for new articles.