What's in Our Auto ABS Loan-Level Dataset
The SEC adopted loan-level disclosure rules for auto ABS in 2014, and issuers have been filing under them since 2016. Every loan. Every month. 65 standardized fields per loan per period: FICO at origination, vehicle year, delinquency status, actual cash collected, chargeoff amount, modification flags. All of it, machine-readable, posted publicly on EDGAR.
The data exists. Getting it into a form you can actually use is another matter entirely.
That work — across 18 issuers — took years to get right. The result is 45.8 million unique loans, 1.14 billion monthly performance rows, and roughly 74 billion individual data points, cleaned, resolved, and ready to query.
What Regulation AB-II actually requires
The SEC's Regulation AB-II, finalized in 2014, was designed to fix a specific problem: investors in ABS couldn't see what they owned. They got pool-level summaries (weighted-average FICO, aggregate delinquency rate, monthly loss) but no visibility into individual loans or the composition of the collateral they were exposed to.
The regulation solved this by pairing two filings for every monthly reporting period:
- Form 10-D: covers the distribution period at the pool level (cash flows, credit enhancement triggers, collateral performance)
- ABS-EE: the loan-level exhibit, filed alongside the 10-D, with one row per loan per reporting period
The ABS-EE is where the data lives. Every issuer on a public ABS shelf is required to file it monthly in XML format, to a standardized schema the SEC publishes. The schema covers roughly 65 fields per loan: origination attributes, current performance, cash collected, losses, hardship flags.
The SEC's intent was transparency. What it produced, from a data engineering standpoint, was a distributed archive: hundreds of XML files per month, one per trust, across dozens of issuers, going back over a decade.
What "just downloading it from EDGAR" actually looks like
This is where most institutional data projects quietly stall.
Each issuer has multiple trusts. Toyota alone has issued over 30 auto trust series since 2016. Santander has issued more. Every trust files its own ABS-EE XML every month, independently. To get a complete picture of one issuer's portfolio, you're looking at dozens of XML files per month. For 18 issuers over 9+ years, that's thousands of files just to get started.
The XML isn't trivial to parse. The Reg AB-II schema is well-defined, but real-world filings aren't always clean. Fields are sometimes missing, sometimes null where they shouldn't be, and the spec allows issuers to amend prior filings, which means the same loan can appear with different values across different accession numbers.
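As a rough illustration of the defensive parsing this implies, here is a minimal sketch using Python's standard library. The XML snippet is a toy stand-in: the wrapper elements (`filing`, `loan`) and the `asset_number` tag are hypothetical placeholders, and a real exhibit may also declare an XML namespace that has to be handled. The point is only that absent and empty elements both need to come back as explicit nulls.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a loan-level exhibit. Tag names are placeholders,
# not guaranteed to match what a given filing actually uses.
SAMPLE_XML = """
<filing>
  <loan>
    <asset_number>A-001</asset_number>
    <obligor_credit_score>712</obligor_credit_score>
    <current_loan_balance_amount>15234.50</current_loan_balance_amount>
  </loan>
  <loan>
    <asset_number>A-002</asset_number>
    <obligor_credit_score></obligor_credit_score>
    <current_loan_balance_amount>8100.00</current_loan_balance_amount>
  </loan>
</filing>
"""

def parse_loans(xml_text, fields, row_tag="loan"):
    """Return one dict per loan; absent or empty elements become None."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for loan in root.iter(row_tag):
        row = {}
        for name in fields:
            text = loan.findtext(name)  # None when the element is absent
            row[name] = text.strip() if text and text.strip() else None
        rows.append(row)
    return rows

FIELDS = [
    "asset_number",
    "obligor_credit_score",
    "current_loan_balance_amount",
    "vehicle_value_amount",  # deliberately missing from the sample
]
loans = parse_loans(SAMPLE_XML, FIELDS)
```

Values are kept as strings here; type conversion is a separate step, since a field that fails to parse as a number is itself worth logging.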
Then there's the cross-issuer normalization problem. Even within the same schema, field semantics drift. One issuer reports vehicle_value_source_code consistently; another leaves it blank for half their trusts. The current_delinquency_status bucket definitions are standardized, but how issuers apply them varies slightly at the edges. None of this is obvious until you're deep in the data.
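One way to surface this kind of drift early is a per-issuer population rate for each field. The sketch below assumes rows are already parsed into dicts with an `issuer` key (a hypothetical column added during loading); the issuers and values are made up.

```python
from collections import defaultdict

def coverage_by_issuer(rows, field):
    """Share of rows per issuer where `field` is populated (not None or empty)."""
    seen = defaultdict(lambda: [0, 0])  # issuer -> [populated, total]
    for row in rows:
        counts = seen[row["issuer"]]
        counts[1] += 1
        if row.get(field) not in (None, ""):
            counts[0] += 1
    return {issuer: populated / total for issuer, (populated, total) in seen.items()}

# Made-up rows for two placeholder issuers
rows = [
    {"issuer": "Issuer A", "vehicle_value_source_code": "1"},
    {"issuer": "Issuer A", "vehicle_value_source_code": "2"},
    {"issuer": "Issuer B", "vehicle_value_source_code": None},
    {"issuer": "Issuer B", "vehicle_value_source_code": "1"},
]
coverage = coverage_by_issuer(rows, "vehicle_value_source_code")
```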
And then there's infrastructure: you need somewhere to store a billion rows of monthly data, tooling to keep it current as new filings drop, and a process for detecting when a prior-period amendment changes historical values.
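The amendment-detection piece can be sketched as a field-by-field diff between two versions of the same reporting period. This is an illustrative sketch, not a description of the production pipeline; `asset_number` is a hypothetical loan key and the values are invented.

```python
def diff_filings(original, amended, key="asset_number"):
    """Compare two versions of one reporting period.

    Returns a list of (loan_key, field, old_value, new_value) tuples for
    every field whose value differs between the two filings.
    """
    before = {row[key]: row for row in original}
    changes = []
    for row in amended:
        old = before.get(row[key])
        if old is None:
            continue  # loan absent from the original filing; handle separately
        for field, new_value in row.items():
            if field != key and old.get(field) != new_value:
                changes.append((row[key], field, old.get(field), new_value))
    return changes

original = [
    {"asset_number": "A-001", "obligor_credit_score": 705, "current_loan_balance_amount": 15234.50},
    {"asset_number": "A-002", "obligor_credit_score": 640, "current_loan_balance_amount": 8100.00},
]
amended = [
    {"asset_number": "A-001", "obligor_credit_score": 698, "current_loan_balance_amount": 15234.50},
    {"asset_number": "A-002", "obligor_credit_score": 640, "current_loan_balance_amount": 8100.00},
]
changes = diff_filings(original, amended)
```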
We built all of that. It took years and it's still running.
The issuers
We cover 18 issuers currently filing under Regulation AB-II:
| Issuer | Shelf | Segment |
|---|---|---|
| Ally | AMCAR | Prime |
| BMW | BBART | Prime |
| Bridgecrest | DRIVE | Near-prime / Subprime |
| Capital One | COPAR | Prime |
| CarMax | CARMX | Prime |
| Carvana | CZABS | Near-prime |
| Drive (Westlake) | DRIVE | Subprime |
| Exeter | EART | Subprime |
| Ford Motor Credit | FORDO | Prime |
| Fifth Third | FTABS | Prime |
| GM Financial | AMCAR/GMCAR | Prime |
| Harley-Davidson | HDMOT | Specialty |
| Honda | HAROT | Prime |
| Hyundai | HAOT | Prime |
| Mercedes-Benz | MBALT | Prime |
| Nissan | NAROT | Prime |
| Santander | SDART | Subprime |
| Toyota | TAOT | Prime |
| Volkswagen | VWALT | Prime |
| World Omni | WOART | Prime |
Between the captive finance arms (Toyota, Ford, Honda, Hyundai, BMW, Mercedes, Nissan, VW, Harley), the bank issuers (Ally, Capital One, Fifth Third, World Omni), and the non-prime lenders (Santander, Exeter, Drive, Bridgecrest, Carvana), you get a full cross-section of the market. Two additional issuers, California Republic and USAA, are in our config but disabled after their shelves wound down.
The field inventory
Reg AB-II splits auto ABS fields into two categories: static attributes set at origination, and performance attributes reported each month for the life of the loan.
What the SEC requires at origination
These 48 fields describe the borrower, the vehicle, and the loan terms as of the origination date. They shouldn't change after the fact. (More on when they do below.)
On the borrower side: obligor_credit_score (FICO at origination) and obligor_credit_score_type (the scoring model: FICO 8, VantageScore, etc.), obligor_employment_verification_code, obligor_income_verification_level (stated, verified, documented), co_obligor_indicator, payment_to_income_percentage, estimated_monthly_income, and geographic_location (borrower state).
For the vehicle: vehicle_make, vehicle_model, vehicle_model_year, vehicle_type_code (car, truck, SUV, motorcycle), vehicle_new_used_code, vehicle_value_amount, and vehicle_value_source_code (the appraisal method).
Loan terms: original_loan_amount, original_loan_term (months), scheduled_payment_amount, original_interest_rate_percentage, original_interest_rate_type_code (fixed vs. variable), loan_to_value_ratio, original_loan_to_value_ratio, original_down_payment_amount, and subvented (manufacturer rate subsidy flag).
Origination metadata: origination_date, maturity_date, originator_name, dealer_name, underwriting_program_name, lease_indicator, prepayment_penalty_indicator, balloon_indicator, interest_only_indicator, residual_value_amount.
What the SEC requires each month
Performance fields are reported once per loan per reporting period. A loan originated in 2020 with a 60-month term has up to 60 rows of monthly data, one for every month it was active; a loan that prepays or charges off early has fewer. Across 45.8 million loans, that's where the 1.14 billion rows come from.
Each month starts with balance and delinquency status: reporting_period_beginning_loan_balance_amount, current_loan_balance_amount, current_delinquency_status (bucketed as current, 30-59, 60-89, 90-119, 120+), remaining_term_to_maturity_number, current_interest_rate_percentage.
One of the more useful parts of the spec is actual cash collected, reported separately from scheduled: total_actual_paid_amount, actual_principal_collected_amount, actual_interest_collected_amount, actual_other_collected_amount, scheduled_principal_amount, scheduled_interest_amount, periodic_rate_percentage. The gap between scheduled and actual tells you a lot about a pool before the delinquency buckets even move.
For losses and recoveries: chargedoff_principal_amount, chargeoff_date, recovered_amount, cumulative_recoveries_amount, liquidation_amount, deficiency_balance_amount, repossession_indicator, repossession_date, bankruptcy_indicator.
The hardship and modification flags are where a lot of analysts miss something important: modification_indicator, modification_type_code, forbearance_indicator, deferment_indicator, extension_indicator, skip_payment_indicator, payment_extended_number. A loan flagged as modified in the same month it shows 30+ DPD is not the same credit event as an unmodified loan at the same bucket. Most pool-level reports don't make that distinction. This data does.
Lifecycle: zero_balance_code (the reason the loan closed: paid off, charged off, repurchased), zero_balance_effective_date, paid_in_full_date, next_payment_due_date.
Servicing: primary_loan_servicer_name, servicing_fee_percentage, servicer_advancing_method, grace_period_number.
That's 63 fields per loan per month.
The restated-field problem
The Reg AB-II spec permits issuers to amend static fields after origination: a restated FICO, a corrected vehicle value, an updated origination date. Over a trust's life, the same loan can appear with different values across different accession numbers, all for what should be a fixed attribute.
This is not a rare edge case. We see it regularly, particularly for obligor_credit_score, vehicle_value_amount, and origination_date. The raw data has no built-in resolution mechanism. If you're joining on the latest filing, you might be using an amended value from a filing that also amended other fields inconsistently.
For every field with documented resolution logic, we track the resolved value, the resolution mode (earliest non-null, last reported, majority, etc.), the source accession number, and the filing date it came from. Every number in the dataset traces to a specific SEC filing. When someone asks where a number came from, we can answer.
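To make the resolution modes concrete, here is a minimal sketch of how earliest non-null, last reported, and majority could be implemented while keeping the provenance attached. It is an assumption-laden illustration: the tuple layout, the accession strings, the choice to ignore nulls in "last reported", and the tie-breaking in "majority" (earliest filing wins) are all choices made for this example.

```python
from collections import Counter

def resolve(observations, mode):
    """Pick one value for a static field from several filings.

    observations: list of (filing_date, accession, value) tuples, with
    filing_date as an ISO string so that sorting is chronological.
    Returns the chosen value plus its provenance, or None if every value is null.
    """
    ordered = sorted((o for o in observations if o[2] is not None), key=lambda o: o[0])
    if not ordered:
        return None
    if mode == "earliest_non_null":
        chosen = ordered[0]
    elif mode == "last_reported":
        chosen = ordered[-1]
    elif mode == "majority":
        top_value = Counter(o[2] for o in ordered).most_common(1)[0][0]
        # Provenance points at the first filing that carried the winning value
        chosen = next(o for o in ordered if o[2] == top_value)
    else:
        raise ValueError(f"unknown resolution mode: {mode}")
    filing_date, accession, value = chosen
    return {"value": value, "mode": mode, "accession": accession, "filing_date": filing_date}

# Hypothetical history of obligor_credit_score for one loan
history = [
    ("2020-03-15", "acc-2", 698),
    ("2020-02-15", "acc-1", 705),
    ("2020-04-15", "acc-3", 698),
    ("2020-05-15", "acc-4", None),
]
earliest = resolve(history, "earliest_non_null")
latest = resolve(history, "last_reported")
majority = resolve(history, "majority")
```

The three modes disagree on this toy history, which is the reason the mode is stored alongside the value rather than implied.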
The numbers
| What | Count |
|---|---|
| Lifetime unique loans | 45.8 million |
| Issuers | 18 active |
| Monthly performance rows | 1.14 billion |
| Total data points | ~74 billion |
| Coverage start | 2016 |
| Max depth per loan | 108 months |
| FICO bands | 7 (sub-560 through 720+, Unscored, Missing) |
Some loans pay off in 24 months. Subprime paper regularly runs to 72 or 84 months. Every monthly filing cycle adds tens of millions of new rows.
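As a quick sanity check on how the headline figures relate, the arithmetic below assumes the data-point total is simply monthly rows times the 65 fields per loan per period quoted earlier. That relationship is an inference from the published numbers, not a stated definition.

```python
loans = 45_800_000
monthly_rows = 1_140_000_000
fields_per_row = 65  # the "65 standardized fields per loan per period" figure

data_points = monthly_rows * fields_per_row  # rows x fields
avg_rows_per_loan = monthly_rows / loans     # average observed depth per loan

print(f"{data_points / 1e9:.1f} billion data points")
print(f"{avg_rows_per_loan:.1f} monthly rows per loan on average")
```

Under that assumption the product lands at about 74.1 billion, in line with the rounded total, and the average loan has roughly 25 monthly rows, well short of the 108-month maximum.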
What you can do with it
Vintage curves are the starting point for most credit work. Origination date, original balance, and monthly chargeoff amounts are all there, so you can build cumulative loss curves by issuer, vintage year, FICO band, vehicle segment, or whatever cut you need. We pre-aggregate this so you're not writing the SQL from scratch.
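For readers who do want to see the mechanics, a cumulative gross loss curve can be sketched as follows. The input shapes (`loan_id`, a `reporting_period` string in YYYY-MM form) are assumptions for this example, the figures are invented, and recoveries are ignored, so this is gross rather than net loss.

```python
from collections import defaultdict

def months_between(start, end):
    """Whole months from start to end, both given as 'YYYY-MM...' strings."""
    return (int(end[:4]) - int(start[:4])) * 12 + (int(end[5:7]) - int(start[5:7]))

def cumulative_loss_curves(loans, performance):
    """Return {vintage_year: [(months_on_book, cumulative_loss_share), ...]}.

    loans: {loan_id: {"origination_date": ..., "original_loan_amount": ...}}
    performance: iterable of monthly rows with chargedoff_principal_amount.
    """
    pool_balance = defaultdict(float)
    for loan in loans.values():
        pool_balance[loan["origination_date"][:4]] += loan["original_loan_amount"]

    losses = defaultdict(lambda: defaultdict(float))  # vintage -> age -> loss
    for row in performance:
        loan = loans[row["loan_id"]]
        vintage = loan["origination_date"][:4]
        age = months_between(loan["origination_date"], row["reporting_period"])
        losses[vintage][age] += row["chargedoff_principal_amount"] or 0.0

    curves = {}
    for vintage, by_age in losses.items():
        running, curve = 0.0, []
        for age in range(max(by_age) + 1):
            running += by_age.get(age, 0.0)
            curve.append((age, running / pool_balance[vintage]))
        curves[vintage] = curve
    return curves

loans = {
    "A": {"origination_date": "2020-01-15", "original_loan_amount": 10000.0},
    "B": {"origination_date": "2020-03-01", "original_loan_amount": 10000.0},
}
performance = [
    {"loan_id": "A", "reporting_period": "2020-07", "chargedoff_principal_amount": 2000.0},
    {"loan_id": "B", "reporting_period": "2020-06", "chargedoff_principal_amount": None},
]
curves = cumulative_loss_curves(loans, performance)
```

Grouping by issuer, FICO band, or vehicle segment is the same calculation with a different key in place of the vintage year.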
Roll rates and cure rates come from joining consecutive months on current_delinquency_status. You get the full Markov transition matrix: what percentage of current loans stay current, how many 30-day loans cure, how many roll to 60. The below-prime version of that analysis is published on the dataset page.
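A minimal sketch of that month-over-month join, by loan count, is below. It assumes each row carries an integer `period` index that increases by one per reporting month (a hypothetical convenience column), and the loan histories are invented.

```python
from collections import Counter, defaultdict

def transition_matrix(rows):
    """Estimate one-month transition probabilities between delinquency states.

    rows: (loan_id, period, status) tuples, where period is a consecutive
    integer month index. Only pairs of adjacent months are counted.
    Returns {from_status: {to_status: probability}}.
    """
    by_loan = defaultdict(dict)
    for loan_id, period, status in rows:
        by_loan[loan_id][period] = status

    counts = defaultdict(Counter)
    for history in by_loan.values():
        for period, status in history.items():
            next_status = history.get(period + 1)
            if next_status is not None:
                counts[status][next_status] += 1

    return {
        status: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for status, c in counts.items()
    }

rows = [
    ("L1", 0, "current"), ("L1", 1, "current"), ("L1", 2, "30-59"), ("L1", 3, "current"),
    ("L2", 0, "current"), ("L2", 1, "30-59"), ("L2", 2, "60-89"),
    ("L3", 0, "30-59"), ("L3", 1, "current"),
]
matrix = transition_matrix(rows)
cure_rate_30 = matrix["30-59"]["current"]  # share of 30-59 loans that cure
roll_rate_30 = matrix["30-59"]["60-89"]    # share that roll to 60-89
```

Balance-weighted roll rates follow the same pattern with the loan balance added to the counter instead of 1.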
The modification flags are where a lot gets missed. 2020 made this obvious: reported delinquency rates during forbearance looked nothing like organic delinquency rates once you stripped out loans on active hardship treatment. You can isolate that here. Pool-level data gives you one number. This gives you both.
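A sketch of that split, by loan count, might look like the following. It assumes the indicator fields have already been converted to booleans and that delinquency status uses the bucket labels listed above; the rows are invented.

```python
HARDSHIP_FLAGS = (
    "modification_indicator",
    "forbearance_indicator",
    "deferment_indicator",
    "extension_indicator",
    "skip_payment_indicator",
)
DELINQUENT = {"30-59", "60-89", "90-119", "120+"}

def on_hardship(row):
    """True if any hardship or modification flag is set for this month."""
    return any(row.get(flag) for flag in HARDSHIP_FLAGS)

def delinquency_rates(rows):
    """Return (all_loans_rate, ex_hardship_rate), both by loan count."""
    def rate(subset):
        if not subset:
            return None
        return sum(r["current_delinquency_status"] in DELINQUENT for r in subset) / len(subset)
    return rate(rows), rate([r for r in rows if not on_hardship(r)])

rows = [
    {"current_delinquency_status": "30-59"},
    {"current_delinquency_status": "current", "forbearance_indicator": True},
    {"current_delinquency_status": "current", "deferment_indicator": True},
    {"current_delinquency_status": "current"},
]
all_loans_rate, ex_hardship_rate = delinquency_rates(rows)
```

In this toy pool, half the loans are held current by hardship treatment, so the rate on the untreated population is double the headline rate.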
Collateral cuts use vehicle_make, vehicle_model_year, vehicle_value_amount, and loan_to_value_ratio. Used-vehicle loans above 120% LTV have a different loss profile than new-vehicle loans below 90%. You can verify it and segment on it.
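Here is one hedged way to set up that segmentation. It assumes LTV is stored as a ratio (1.25 for 125%), that the new/used code has been decoded to the strings "new" and "used", and that a `lifetime_chargeoff` amount has been pre-summed per loan; all three are assumptions for this sketch, and the loans are made up rather than drawn from the dataset.

```python
from collections import defaultdict

def ltv_bucket(ltv):
    """Bucket an LTV ratio using the 90% and 120% cut points."""
    if ltv < 0.90:
        return "<90%"
    if ltv <= 1.20:
        return "90-120%"
    return ">120%"

def loss_rate_by_segment(loans):
    """Lifetime chargeoff as a share of original balance, per (new/used, LTV bucket)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # segment -> [original balance, chargeoff]
    for loan in loans:
        key = (loan["vehicle_new_used_code"], ltv_bucket(loan["original_loan_to_value_ratio"]))
        totals[key][0] += loan["original_loan_amount"]
        totals[key][1] += loan["lifetime_chargeoff"]
    return {key: chargeoff / balance for key, (balance, chargeoff) in totals.items()}

# Invented loans, for illustration only
loans = [
    {"vehicle_new_used_code": "used", "original_loan_to_value_ratio": 1.30,
     "original_loan_amount": 20000.0, "lifetime_chargeoff": 3000.0},
    {"vehicle_new_used_code": "used", "original_loan_to_value_ratio": 1.25,
     "original_loan_amount": 10000.0, "lifetime_chargeoff": 0.0},
    {"vehicle_new_used_code": "new", "original_loan_to_value_ratio": 0.85,
     "original_loan_amount": 30000.0, "lifetime_chargeoff": 300.0},
]
segments = loss_rate_by_segment(loans)
```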
For cash flow verification: actual principal and interest collected at the loan level are reported each month. Sum by pool, reconcile against the 10-D distribution figures. When they don't match, the loan-level data tells you where to look.
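The reconciliation step can be sketched as a sum-and-compare. The `trust` key and the shape of the 10-D totals are hypothetical (in practice those figures come from a separate document), and the one-cent tolerance is an arbitrary choice for this example.

```python
from collections import defaultdict

def reconcile(loan_rows, reported_10d, tolerance=0.01):
    """Compare summed loan-level collections with pool-level totals.

    reported_10d: {trust: {"principal": x, "interest": y}}, a hypothetical
    shape for figures taken from the distribution report.
    Returns {trust: {"principal": diff, "interest": diff}} for breaks only,
    where diff = loan-level sum minus reported figure.
    """
    sums = defaultdict(lambda: {"principal": 0.0, "interest": 0.0})
    for row in loan_rows:
        sums[row["trust"]]["principal"] += row["actual_principal_collected_amount"] or 0.0
        sums[row["trust"]]["interest"] += row["actual_interest_collected_amount"] or 0.0

    breaks = {}
    for trust, reported in reported_10d.items():
        diffs = {k: round(sums[trust][k] - reported[k], 2) for k in ("principal", "interest")}
        if any(abs(d) > tolerance for d in diffs.values()):
            breaks[trust] = diffs
    return breaks

loan_rows = [
    {"trust": "T-1", "actual_principal_collected_amount": 300.00, "actual_interest_collected_amount": 40.25},
    {"trust": "T-1", "actual_principal_collected_amount": 250.50, "actual_interest_collected_amount": 35.75},
    {"trust": "T-2", "actual_principal_collected_amount": 100.00, "actual_interest_collected_amount": 10.00},
]
reported_10d = {
    "T-1": {"principal": 550.50, "interest": 76.00},
    "T-2": {"principal": 125.00, "interest": 10.00},
}
breaks = reconcile(loan_rows, reported_10d)
```

For production use, `decimal.Decimal` would be a safer choice than floats for currency sums.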
Full field reference
Field names below match the standardized XML element names from the Reg AB-II ABS-EE schema, the same names you'll find in the raw EDGAR filings.
Origination / static fields
| Field | Description |
|---|---|
| obligor_credit_score | FICO or credit score at origination |
| obligor_credit_score_type | Scoring model (FICO 8, VantageScore, etc.) |
| obligor_employment_verification_code | Employment verification method |
| obligor_income_verification_level | Income verification level (stated, verified, documented) |
| co_obligor_indicator | Co-borrower present |
| payment_to_income_percentage | Payment-to-income ratio at origination |
| estimated_monthly_income | Borrower monthly income estimate |
| geographic_location | Borrower state |
| vehicle_make | Vehicle manufacturer |
| vehicle_model | Vehicle model |
| vehicle_model_year | Model year |
| vehicle_type_code | Car, truck, SUV, motorcycle, etc. |
| vehicle_new_used_code | New or used at origination |
| vehicle_value_amount | Collateral value at origination |
| vehicle_value_source_code | Appraisal method |
| original_loan_amount | Original principal balance |
| original_loan_term | Loan term in months |
| scheduled_payment_amount | Scheduled monthly payment |
| original_interest_rate_percentage | Interest rate at origination |
| original_interest_rate_type_code | Fixed or variable |
| loan_to_value_ratio | LTV using current collateral value |
| original_loan_to_value_ratio | LTV using original collateral value |
| original_down_payment_amount | Down payment at origination |
| subvented | Manufacturer rate subsidy flag |
| origination_date | Loan origination date |
| maturity_date | Scheduled maturity date |
| originator_name | Loan originator |
| dealer_name | Originating dealer |
| underwriting_program_name | Underwriting program |
| lease_indicator | Lease vs. loan |
| prepayment_penalty_indicator | Prepayment penalty present |
| balloon_indicator | Balloon payment structure |
| interest_only_indicator | Interest-only period flag |
| residual_value_amount | Residual value (leases) |
Monthly performance fields
| Field | Description |
|---|---|
| reporting_period_beginning_loan_balance_amount | Opening balance for the period |
| current_loan_balance_amount | Closing balance for the period |
| current_delinquency_status | Days delinquent (current, 30-59, 60-89, 90-119, 120+) |
| remaining_term_to_maturity_number | Remaining term in months |
| current_interest_rate_percentage | Current note rate |
| current_interest_rate_type_code | Fixed or variable |
| total_actual_paid_amount | Total cash received |
| actual_principal_collected_amount | Actual principal collected |
| actual_interest_collected_amount | Actual interest collected |
| actual_other_collected_amount | Other actual collections |
| actual_payment_collected_amount | Actual payment received |
| scheduled_principal_amount | Scheduled principal due |
| scheduled_interest_amount | Scheduled interest due |
| periodic_rate_percentage | Periodic interest rate |
| other_principal_adjustment_amount | Principal adjustments |
| other_assessed_uncollected_servicer_fee_amount | Uncollected servicer fees |
| chargedoff_principal_amount | Principal charged off this period |
| chargeoff_date | Date of chargeoff |
| chargeoff_amount | Total chargeoff amount |
| recovered_amount | Recoveries received this period |
| cumulative_recoveries_amount | Cumulative recoveries to date |
| liquidation_amount | Liquidation proceeds |
| deficiency_balance_amount | Balance remaining after liquidation |
| repossession_indicator | Vehicle repossessed |
| repossession_date | Date of repossession |
| repurchase_indicator | Loan repurchased |
| repurchase_amount | Repurchase amount |
| repurchase_date | Date of repurchase |
| bankruptcy_indicator | Obligor bankruptcy filed |
| modification_indicator | Loan was modified this period |
| modification_type_code | Type of modification |
| forbearance_indicator | Forbearance granted |
| deferment_indicator | Payment deferment granted |
| extension_indicator | Term extension granted |
| skip_payment_indicator | Skip-payment granted |
| payment_extended_number | Number of payments extended |
| zero_balance_code | Reason loan reached zero balance |
| zero_balance_effective_date | Date loan reached zero balance |
| paid_in_full_date | Paid-in-full date |
| next_payment_due_date | Next scheduled payment date |
| primary_loan_servicer_name | Current servicer |
| servicing_fee_percentage | Servicing fee rate |
| servicer_advancing_method | Advancing method |
| grace_period_number | Grace period in days |
Access
The origination dataset (all 48 static fields, back to 2016) is available through LoanTape's ABS-EE subscription. The monthly performance series (63 fields, 1.14 billion rows) is through the ABS remittance data product. Both are queryable via API or Databricks-compatible exports.
If you've looked at building this yourself and decided the engineering cost isn't worth it, that's probably the right call. The raw filing infrastructure exists, the XML is public, and the schema is documented. But the pipeline to turn it into something you can actually query takes years to build and needs to keep running every month. That's what you're paying for.
Pricing is at /#pricing. The dataset page has the full field catalog with coverage rates by issuer.