
What's in Our Auto ABS Loan-Level Dataset

The SEC adopted rules in 2014 requiring auto ABS issuers to publish loan-level data, and the filings have been arriving since 2016. Every loan. Every month. Roughly 65 standardized fields per loan per period: FICO at origination, vehicle year, delinquency status, actual cash collected, chargeoff amount, modification flags. All of it, machine-readable, posted publicly on EDGAR.

The data exists. Getting it into a form you can actually use is another matter entirely.

That work, across 18 issuers, took years to get right. The result is 45.8 million unique loans, 1.14 billion monthly performance rows, and roughly 74 billion individual data points, cleaned, resolved, and ready to query.


What Regulation AB-II actually requires

The SEC's Regulation AB-II, finalized in 2014, was designed to fix a specific problem: investors in ABS couldn't see what they owned. They got pool-level summaries (weighted-average FICO, aggregate delinquency rate, monthly loss) but no visibility into individual loans or the composition of the collateral they were exposed to.

The regulation solved this by pairing two filings for every monthly reporting period:

  • Form 10-D: covers the distribution period at the pool level (cash flows, credit enhancement triggers, collateral performance)
  • ABS-EE: the loan-level exhibit, filed alongside the 10-D, with one row per loan per reporting period

The ABS-EE is where the data lives. Every issuer on a public ABS shelf is required to file it monthly in XML format, to a standardized schema the SEC publishes. The schema covers roughly 65 fields per loan: origination attributes, current performance, cash collected, losses, hardship flags.

The SEC's intent was transparency. What it produced, from a data engineering standpoint, was a distributed archive: hundreds of XML files per month, one per trust, across dozens of issuers, going back to 2016.


What "just downloading it from EDGAR" actually looks like

This is where most institutional data projects quietly stall.

Each issuer has multiple trusts. Toyota alone has issued over 30 auto trust series since 2016. Santander has issued more. Every trust files its own ABS-EE XML every month, independently. To get a complete picture of one issuer's portfolio, you're looking at dozens of XML files per month. For 18 issuers over 9+ years, that's thousands of files just to get started.

The XML isn't trivial to parse. The Reg AB-II schema is well-defined, but real-world filings aren't always clean. Fields are sometimes missing, sometimes null where they shouldn't be, and the spec allows issuers to amend prior filings, which means the same loan can appear with different values across different accession numbers.
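To make the missing-versus-null problem concrete, here is a minimal parsing sketch using Python's standard library. The element names and the sample document are illustrative assumptions rather than a copy of a real filing, and real ABS-EE files carry an XML namespace that this sketch ignores.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical, simplified sample. Real filings are namespaced and far larger.
SAMPLE = """
<assetData>
  <assets>
    <assetNumber>A-001</assetNumber>
    <obligorCreditScore>712</obligorCreditScore>
    <originalLoanAmount>25000.00</originalLoanAmount>
  </assets>
  <assets>
    <assetNumber>A-002</assetNumber>
    <obligorCreditScore></obligorCreditScore>
  </assets>
</assetData>
"""

def text_or_none(parent, tag):
    """Return stripped element text, or None if the tag is absent or empty."""
    node = parent.find(tag)
    if node is None or node.text is None:
        return None
    value = node.text.strip()
    return value or None

def to_decimal(value):
    """Parse a numeric string, returning None instead of raising on bad input."""
    if value is None:
        return None
    try:
        return Decimal(value)
    except InvalidOperation:
        return None

def parse_assets(xml_text):
    """Yield one dict per loan, treating absent and empty elements the same way."""
    root = ET.fromstring(xml_text)
    rows = []
    for asset in root.iter("assets"):
        rows.append({
            "asset_number": text_or_none(asset, "assetNumber"),
            "obligor_credit_score": to_decimal(text_or_none(asset, "obligorCreditScore")),
            "original_loan_amount": to_decimal(text_or_none(asset, "originalLoanAmount")),
        })
    return rows

rows = parse_assets(SAMPLE.strip())
```

The second loan comes back with `None` for both the empty score and the absent loan amount. Whether those two cases should be collapsed is a design choice; a production pipeline may want to record them separately.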

Then there's the cross-issuer normalization problem. Even within the same schema, field semantics drift. One issuer reports vehicle_value_source_code consistently; another leaves it blank for half their trusts. The current_delinquency_status bucket definitions are standardized, but how issuers apply them varies slightly at the edges. None of this is obvious until you're deep in the data.

And then there's infrastructure: you need somewhere to store a billion rows of monthly data, tooling to keep it current as new filings drop, and a process for detecting when a prior-period amendment changes historical values.
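As a rough illustration of amendment detection, the sketch below groups rows by loan and reporting period, orders them by filing date, and diffs consecutive versions. The row layout, the `ACC-1`/`ACC-2` accession labels, and the sample values are hypothetical; it is not a description of our production pipeline.

```python
def changed_fields(original, amended, ignore=("accession_number", "filing_date")):
    """Return {field: (old, new)} for every field whose value differs."""
    keys = (set(original) | set(amended)) - set(ignore)
    return {
        k: (original.get(k), amended.get(k))
        for k in sorted(keys)
        if original.get(k) != amended.get(k)
    }

def detect_amendments(filings):
    """Diff successive versions of the same (loan, period) row."""
    by_key = {}
    for row in filings:
        key = (row["asset_number"], row["reporting_period"])
        by_key.setdefault(key, []).append(row)

    report = {}
    for key, versions in by_key.items():
        # ISO dates sort correctly as strings.
        versions.sort(key=lambda r: r["filing_date"])
        for prev, curr in zip(versions, versions[1:]):
            diff = changed_fields(prev, curr)
            if diff:
                report.setdefault(key, []).append((curr["accession_number"], diff))
    return report

# Hypothetical rows: the same loan-period filed twice with a changed balance.
filings = [
    {"asset_number": "A-001", "reporting_period": "2021-03",
     "accession_number": "ACC-1", "filing_date": "2021-04-15",
     "current_loan_balance_amount": 1000.0, "obligor_credit_score": 680},
    {"asset_number": "A-001", "reporting_period": "2021-03",
     "accession_number": "ACC-2", "filing_date": "2021-06-01",
     "current_loan_balance_amount": 950.0, "obligor_credit_score": 680},
    {"asset_number": "A-002", "reporting_period": "2021-03",
     "accession_number": "ACC-1", "filing_date": "2021-04-15",
     "current_loan_balance_amount": 500.0, "obligor_credit_score": 600},
]

report = detect_amendments(filings)
```

Only the amended loan shows up in `report`, tagged with the accession number that introduced the change.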

We built all of that. It took years and it's still running.


The issuers

We cover 18 issuers currently filing under Regulation AB-II:

| Issuer | Shelf | Segment |
|---|---|---|
| Ally | AMCAR | Prime |
| BMW | BBART | Prime |
| Bridgecrest | DRIVE | Near-prime / Subprime |
| Capital One | COPAR | Prime |
| CarMax | CARMX | Prime |
| Carvana | CZABS | Near-prime |
| Drive (Westlake) | DRIVE | Subprime |
| Exeter | EART | Subprime |
| Ford Motor Credit | FORDO | Prime |
| Fifth Third | FTABS | Prime |
| GM Financial | AMCAR/GMCAR | Prime |
| Harley-Davidson | HDMOT | Specialty |
| Honda | HAROT | Prime |
| Hyundai | HAOT | Prime |
| Mercedes-Benz | MBALT | Prime |
| Nissan | NAROT | Prime |
| Santander | SDART | Subprime |
| Toyota | TAOT | Prime |
| Volkswagen | VWALT | Prime |
| World Omni | WOART | Prime |

Between the captive finance arms (Toyota, Ford, Honda, Hyundai, BMW, Mercedes, Nissan, VW, Harley), the bank issuers (Ally, Capital One, Fifth Third, World Omni), and the non-prime lenders (Santander, Exeter, Drive, Bridgecrest, Carvana), you get a full cross-section of the market. Two additional issuers, California Republic and USAA, are in our config but disabled after their shelves wound down.


The field inventory

Reg AB-II splits auto ABS fields into two categories: static attributes set at origination, and performance attributes reported each month for the life of the loan.

What the SEC requires at origination

These 48 fields describe the borrower, the vehicle, and the loan terms as of the origination date. They shouldn't change after the fact. (More on when they do below.)

On the borrower side: obligor_credit_score (FICO at origination) and obligor_credit_score_type (the scoring model: FICO 8, VantageScore, etc.), obligor_employment_verification_code, obligor_income_verification_level (stated, verified, documented), co_obligor_indicator, payment_to_income_percentage, estimated_monthly_income, and geographic_location (borrower state).

For the vehicle: vehicle_make, vehicle_model, vehicle_model_year, vehicle_type_code (car, truck, SUV, motorcycle), vehicle_new_used_code, vehicle_value_amount, and vehicle_value_source_code (the appraisal method).

Loan terms: original_loan_amount, original_loan_term (months), scheduled_payment_amount, original_interest_rate_percentage, original_interest_rate_type_code (fixed vs. variable), loan_to_value_ratio, original_loan_to_value_ratio, original_down_payment_amount, and subvented (manufacturer rate subsidy flag).

Origination metadata: origination_date, maturity_date, originator_name, dealer_name, underwriting_program_name, lease_indicator, prepayment_penalty_indicator, balloon_indicator, interest_only_indicator, residual_value_amount.

What the SEC requires each month

Performance fields are reported once per loan per reporting period. A loan originated in 2020 with a 60-month term has up to 60 rows of monthly data, one for every month it was active in a reporting trust. Loans that pay off or charge off early have fewer. Across 45.8 million loans, that's where the 1.14 billion rows come from.

Each month starts with balance and delinquency status: reporting_period_beginning_loan_balance_amount, current_loan_balance_amount, current_delinquency_status (bucketed as current, 30-59, 60-89, 90-119, 120+), remaining_term_to_maturity_number, current_interest_rate_percentage.

One of the more useful parts of the spec is actual cash collected, reported separately from scheduled: total_actual_paid_amount, actual_principal_collected_amount, actual_interest_collected_amount, actual_other_collected_amount, scheduled_principal_amount, scheduled_interest_amount, periodic_rate_percentage. The gap between scheduled and actual tells you a lot about a pool before the delinquency buckets even move.
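One way to quantify that gap is a simple collection ratio: actual principal and interest collected, divided by scheduled principal and interest, summed over a pool. The sketch below assumes each row is a dict keyed by the field names above, with numeric values already parsed. The sample rows are made up.

```python
def collection_ratio(rows):
    """Actual P&I collected divided by scheduled P&I, or None if nothing was scheduled."""
    scheduled = sum(
        r["scheduled_principal_amount"] + r["scheduled_interest_amount"] for r in rows
    )
    actual = sum(
        r["actual_principal_collected_amount"] + r["actual_interest_collected_amount"]
        for r in rows
    )
    if scheduled == 0:
        return None
    return actual / scheduled

# Hypothetical pool of two loans: one paid as scheduled, one paid nothing.
pool = [
    {"scheduled_principal_amount": 300.0, "scheduled_interest_amount": 100.0,
     "actual_principal_collected_amount": 300.0, "actual_interest_collected_amount": 100.0},
    {"scheduled_principal_amount": 300.0, "scheduled_interest_amount": 100.0,
     "actual_principal_collected_amount": 0.0, "actual_interest_collected_amount": 0.0},
]

ratio = collection_ratio(pool)  # 0.5 for this sample
```

Prepayments push the ratio above 1, so in practice you would likely want to cap per-loan collections at the scheduled amount or track prepayments separately before reading a falling ratio as stress.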

For losses and recoveries: chargedoff_principal_amount, chargeoff_date, recovered_amount, cumulative_recoveries_amount, liquidation_amount, deficiency_balance_amount, repossession_indicator, repossession_date, bankruptcy_indicator.

The hardship and modification flags are where a lot of analysts miss something important: modification_indicator, modification_type_code, forbearance_indicator, deferment_indicator, extension_indicator, skip_payment_indicator, payment_extended_number. A loan flagged as modified in the same month it shows 30+ days past due (DPD) is not the same credit event as an unmodified loan in the same bucket. Most pool-level reports don't make that distinction. This data does.

Lifecycle: zero_balance_code (the reason the loan closed: paid off, charged off, repurchased), zero_balance_effective_date, paid_in_full_date, next_payment_due_date.

Servicing: primary_loan_servicer_name, servicing_fee_percentage, servicer_advancing_method, grace_period_number.

That's 63 fields per loan per month.

The restated-field problem

The Reg AB-II spec permits issuers to amend static fields after origination: a restated FICO, a corrected vehicle value, an updated origination date. Over a trust's life, the same loan can appear with different values across different accession numbers, all for what should be a fixed attribute.

This is not a rare edge case. We see it regularly, particularly for obligor_credit_score, vehicle_value_amount, and origination_date. The raw data has no built-in resolution mechanism. If you're joining on the latest filing, you might be using an amended value from a filing that also amended other fields inconsistently.

For every field with documented resolution logic, we track the resolved value, the resolution mode (earliest non-null, last reported, majority, etc.), the source accession number, and the filing date it came from. Every number in the dataset traces to a specific SEC filing. When someone asks where a number came from, we can answer.
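The resolution modes named above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production logic: the observation layout is hypothetical, "last reported" is taken to mean the latest non-null value, and "majority" ties go to the value seen first.

```python
from collections import Counter

def resolve(observations, mode):
    """Pick one value for a static field and keep its provenance.

    observations: list of dicts with "value", "accession_number", "filing_date".
    """
    ordered = sorted(observations, key=lambda o: o["filing_date"])
    non_null = [o for o in ordered if o["value"] is not None]
    if not non_null:
        return None

    if mode == "earliest_non_null":
        chosen = non_null[0]
    elif mode == "last_reported":
        chosen = non_null[-1]
    elif mode == "majority":
        top_value, _ = Counter(o["value"] for o in non_null).most_common(1)[0]
        # Attribute the value to the first filing that carried it.
        chosen = next(o for o in non_null if o["value"] == top_value)
    else:
        raise ValueError(f"unknown resolution mode: {mode}")

    return {
        "resolved_value": chosen["value"],
        "resolution_mode": mode,
        "source_accession_number": chosen["accession_number"],
        "source_filing_date": chosen["filing_date"],
    }

# Hypothetical history of obligor_credit_score for one loan.
history = [
    {"value": None, "accession_number": "ACC-1", "filing_date": "2017-01-15"},
    {"value": 680, "accession_number": "ACC-2", "filing_date": "2017-02-15"},
    {"value": 700, "accession_number": "ACC-3", "filing_date": "2017-03-15"},
    {"value": 700, "accession_number": "ACC-4", "filing_date": "2017-04-15"},
]

earliest = resolve(history, "earliest_non_null")
latest = resolve(history, "last_reported")
majority = resolve(history, "majority")
```

The three modes give 680, 700, and 700 here, each pointing at a different source filing. That is why the mode and the accession number are stored alongside the value.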


The numbers

| What | Count |
|---|---|
| Lifetime unique loans | 45.8 million |
| Issuers | 18 active |
| Monthly performance rows | 1.14 billion |
| Total data points | ~74 billion |
| Coverage start | 2016 |
| Max depth per loan | 108 months |
| FICO bands | 7 (sub-560 through 720+, Unscored, Missing) |

Some loans pay off in 24 months. Subprime paper regularly runs to 72 or 84. Every monthly filing cycle adds tens of millions of new rows.
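The headline figures hang together arithmetically. A quick check, using the rounded numbers from the table and the roughly 65 fields per row cited at the top of this post:

```python
# Rounded headline figures from the table above.
unique_loans = 45_800_000
performance_rows = 1_140_000_000
fields_per_row = 65  # approximate, per the schema description

data_points = performance_rows * fields_per_row
avg_rows_per_loan = performance_rows / unique_loans

print(f"data points: {data_points / 1e9:.1f} billion")
print(f"average monthly rows per loan: {avg_rows_per_loan:.1f}")
```

That works out to about 74 billion data points and an average of roughly 25 monthly rows per loan, well short of the 108-month maximum depth because most loans enter a trust after origination and leave before maturity.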


What you can do with it

Vintage curves are the starting point for most credit work. Origination date, original balance, and monthly chargeoff amounts are all there, so you can build cumulative loss curves by issuer, vintage year, FICO band, vehicle segment, or whatever cut you need. We pre-aggregate this so you're not writing the SQL from scratch.
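For readers who do want to see the mechanics, here is a minimal sketch of a gross cumulative loss curve by vintage year: cumulative chargedoff_principal_amount by loan age, divided by the vintage's original balance. The input layout (dates as "YYYY-MM" strings, loans keyed by a hypothetical `asset_number`) and the sample numbers are assumptions, and recoveries are ignored.

```python
from collections import defaultdict

def months_between(start, end):
    """Whole months from start to end, both given as 'YYYY-MM' strings."""
    sy, sm = int(start[:4]), int(start[5:7])
    ey, em = int(end[:4]), int(end[5:7])
    return (ey - sy) * 12 + (em - sm)

def cumulative_loss_curves(loans, performance):
    """Return {vintage_year: [(age_in_months, cumulative_loss_rate), ...]}."""
    orig_balance = defaultdict(float)
    for loan in loans.values():
        orig_balance[loan["origination_date"][:4]] += loan["original_loan_amount"]

    # Sum chargeoffs by vintage and loan age.
    losses = defaultdict(lambda: defaultdict(float))
    for row in performance:
        loan = loans[row["asset_number"]]
        vintage = loan["origination_date"][:4]
        age = months_between(loan["origination_date"], row["reporting_period"])
        losses[vintage][age] += row["chargedoff_principal_amount"] or 0.0

    curves = {}
    for vintage, by_age in losses.items():
        running = 0.0
        curve = []
        for age in range(max(by_age) + 1):
            running += by_age.get(age, 0.0)
            curve.append((age, running / orig_balance[vintage]))
        curves[vintage] = curve
    return curves

# Hypothetical two-loan vintage.
loans = {
    "A": {"origination_date": "2020-01", "original_loan_amount": 10000.0},
    "B": {"origination_date": "2020-06", "original_loan_amount": 10000.0},
}
performance = [
    {"asset_number": "A", "reporting_period": "2020-07", "chargedoff_principal_amount": 2000.0},
    {"asset_number": "B", "reporting_period": "2021-06", "chargedoff_principal_amount": 1000.0},
]

curves = cumulative_loss_curves(loans, performance)
```

Swapping the vintage key for FICO band, issuer, or vehicle segment gives the other cuts mentioned above.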

Roll rates and cure rates come from joining consecutive months on current_delinquency_status. You get the full Markov transition matrix: what percentage of current loans stay current, how many 30-day loans cure, how many roll to 60. The below-prime version of that analysis is published on the dataset page.
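A bare-bones version of that join, as a sketch: sort each loan's rows by period, pair each month with the next, and normalize the counts. It assumes rows are dicts with a sortable `reporting_period`, a hypothetical `asset_number` key, and no gaps between a loan's consecutive rows. The sample history is invented.

```python
from collections import Counter, defaultdict

def transition_matrix(performance):
    """Return {from_status: {to_status: probability}} from month-over-month pairs."""
    by_loan = defaultdict(list)
    for row in performance:
        by_loan[row["asset_number"]].append(row)

    counts = defaultdict(Counter)
    for rows in by_loan.values():
        rows.sort(key=lambda r: r["reporting_period"])
        for prev, curr in zip(rows, rows[1:]):
            counts[prev["current_delinquency_status"]][curr["current_delinquency_status"]] += 1

    matrix = {}
    for from_status, to_counts in counts.items():
        total = sum(to_counts.values())
        matrix[from_status] = {to: n / total for to, n in to_counts.items()}
    return matrix

# Hypothetical three-month history for two loans.
performance = [
    {"asset_number": "A", "reporting_period": "2021-01", "current_delinquency_status": "current"},
    {"asset_number": "A", "reporting_period": "2021-02", "current_delinquency_status": "30-59"},
    {"asset_number": "A", "reporting_period": "2021-03", "current_delinquency_status": "current"},
    {"asset_number": "B", "reporting_period": "2021-01", "current_delinquency_status": "current"},
    {"asset_number": "B", "reporting_period": "2021-02", "current_delinquency_status": "current"},
    {"asset_number": "B", "reporting_period": "2021-03", "current_delinquency_status": "30-59"},
]

matrix = transition_matrix(performance)
```

In this sample the only 30-59 loan cures, and two of the three transitions out of current roll to 30-59. On real data you would also want to treat payoff and chargeoff as absorbing states rather than dropping the final month.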

The modification flags are where a lot gets missed. 2020 made this obvious: reported delinquency rates during forbearance looked nothing like organic delinquency rates once you stripped out loans on active hardship treatment. You can isolate that here. Pool-level data gives you one number. This gives you both.
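A sketch of that split: compute the delinquency rate across all loans, then again after removing loans with any hardship flag set. It assumes the indicator fields have been parsed to booleans and that any status other than "current" counts as delinquent. Both are simplifications, and the sample rows are invented.

```python
HARDSHIP_FLAGS = (
    "modification_indicator",
    "forbearance_indicator",
    "deferment_indicator",
    "extension_indicator",
    "skip_payment_indicator",
)

def is_delinquent(row):
    return row["current_delinquency_status"] != "current"

def on_hardship(row):
    return any(row.get(flag) for flag in HARDSHIP_FLAGS)

def rate(rows):
    """Share of rows that are delinquent, or None for an empty pool."""
    if not rows:
        return None
    return sum(is_delinquent(r) for r in rows) / len(rows)

def delinquency_rates(rows):
    """Reported rate over all loans versus the rate excluding hardship loans."""
    organic = [r for r in rows if not on_hardship(r)]
    return {"reported": rate(rows), "ex_hardship": rate(organic)}

# Hypothetical single-month snapshot of five loans.
snapshot = [
    {"current_delinquency_status": "30-59"},
    {"current_delinquency_status": "current", "forbearance_indicator": True},
    {"current_delinquency_status": "current"},
    {"current_delinquency_status": "current"},
    {"current_delinquency_status": "60-89", "modification_indicator": True},
]

rates = delinquency_rates(snapshot)
```

The two numbers diverge as soon as hardship loans are either held current by forbearance or counted delinquent while modified, which is the distortion described above.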

Collateral cuts use vehicle_make, vehicle_model_year, vehicle_value_amount, and loan_to_value_ratio. Used-vehicle loans above 120% LTV have a different loss profile than new-vehicle loans below 90%. You can verify it and segment on it.
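A sketch of that segmentation, bucketing loans by new/used and LTV and computing a loss rate per segment. Several things here are assumptions for illustration: LTV is taken as a decimal ratio (1.20 means 120%), the new/used code is a plain "new"/"used" string, and `lifetime_chargeoff_amount` is a hypothetical pre-aggregated column. The sample loans are made up and say nothing about real loss levels.

```python
from collections import defaultdict

def ltv_bucket(ltv):
    """Bucket a decimal LTV ratio using the 90% and 120% cut points."""
    if ltv is None:
        return "missing"
    if ltv < 0.90:
        return "<90%"
    if ltv <= 1.20:
        return "90-120%"
    return ">120%"

def loss_rate_by_segment(loans):
    """Return {(new_used, ltv_bucket): chargeoffs / original balance}."""
    balance = defaultdict(float)
    losses = defaultdict(float)
    for loan in loans:
        key = (loan["vehicle_new_used_code"], ltv_bucket(loan["loan_to_value_ratio"]))
        balance[key] += loan["original_loan_amount"]
        losses[key] += loan["lifetime_chargeoff_amount"]
    return {key: losses[key] / balance[key] for key in balance if balance[key] > 0}

# Hypothetical loans.
loans = [
    {"vehicle_new_used_code": "used", "loan_to_value_ratio": 1.25,
     "original_loan_amount": 20000.0, "lifetime_chargeoff_amount": 2000.0},
    {"vehicle_new_used_code": "used", "loan_to_value_ratio": 1.30,
     "original_loan_amount": 20000.0, "lifetime_chargeoff_amount": 0.0},
    {"vehicle_new_used_code": "new", "loan_to_value_ratio": 0.85,
     "original_loan_amount": 30000.0, "lifetime_chargeoff_amount": 300.0},
]

segments = loss_rate_by_segment(loans)
```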

For cash flow verification: actual principal and interest collected at the loan level are reported each month. Sum by pool, reconcile against the 10-D distribution figures. When they don't match, the loan-level data tells you where to look.
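The reconciliation itself is a sum and a comparison. In this sketch the loan rows are for one trust and one period, the `reported` dict stands in for figures you would read off the 10-D (its keys are hypothetical), and the one-cent tolerance is an arbitrary choice.

```python
from decimal import Decimal

def reconcile(loan_rows, reported, tolerance=Decimal("0.01")):
    """Compare loan-level collections against pool-level reported totals."""
    totals = {
        "principal": sum(
            (r["actual_principal_collected_amount"] for r in loan_rows), Decimal("0")
        ),
        "interest": sum(
            (r["actual_interest_collected_amount"] for r in loan_rows), Decimal("0")
        ),
    }
    result = {}
    for key, loan_level in totals.items():
        diff = loan_level - reported[key]
        result[key] = {
            "loan_level": loan_level,
            "reported": reported[key],
            "diff": diff,
            "matches": abs(diff) <= tolerance,
        }
    return result

# Hypothetical loan rows and pool-level figures.
loan_rows = [
    {"actual_principal_collected_amount": Decimal("300.00"),
     "actual_interest_collected_amount": Decimal("40.10")},
    {"actual_principal_collected_amount": Decimal("250.50"),
     "actual_interest_collected_amount": Decimal("35.00")},
]
reported = {"principal": Decimal("550.50"), "interest": Decimal("80.10")}

check = reconcile(loan_rows, reported)
```

Here principal ties out and interest is short by 5.00, which is the cue to go looking at individual loans. `Decimal` is used so that cent-level comparisons are not thrown off by floating-point rounding.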


Full field reference

Field names below are snake_case renderings of the standardized elements in the Reg AB-II ABS-EE schema, so each one maps back to what you'll find in the raw EDGAR filings.

Origination / static fields

| Field | Description |
|---|---|
| obligor_credit_score | FICO or credit score at origination |
| obligor_credit_score_type | Scoring model (FICO 8, VantageScore, etc.) |
| obligor_employment_verification_code | Employment verification method |
| obligor_income_verification_level | Income verification level (stated, verified, documented) |
| co_obligor_indicator | Co-borrower present |
| payment_to_income_percentage | Payment-to-income ratio at origination |
| estimated_monthly_income | Borrower monthly income estimate |
| geographic_location | Borrower state |
| vehicle_make | Vehicle manufacturer |
| vehicle_model | Vehicle model |
| vehicle_model_year | Model year |
| vehicle_type_code | Car, truck, SUV, motorcycle, etc. |
| vehicle_new_used_code | New or used at origination |
| vehicle_value_amount | Collateral value at origination |
| vehicle_value_source_code | Appraisal method |
| original_loan_amount | Original principal balance |
| original_loan_term | Loan term in months |
| scheduled_payment_amount | Scheduled monthly payment |
| original_interest_rate_percentage | Interest rate at origination |
| original_interest_rate_type_code | Fixed or variable |
| loan_to_value_ratio | LTV using current collateral value |
| original_loan_to_value_ratio | LTV using original collateral value |
| original_down_payment_amount | Down payment at origination |
| subvented | Manufacturer rate subsidy flag |
| origination_date | Loan origination date |
| maturity_date | Scheduled maturity date |
| originator_name | Loan originator |
| dealer_name | Originating dealer |
| underwriting_program_name | Underwriting program |
| lease_indicator | Lease vs. loan |
| prepayment_penalty_indicator | Prepayment penalty present |
| balloon_indicator | Balloon payment structure |
| interest_only_indicator | Interest-only period flag |
| residual_value_amount | Residual value (leases) |

Monthly performance fields

| Field | Description |
|---|---|
| reporting_period_beginning_loan_balance_amount | Opening balance for the period |
| current_loan_balance_amount | Closing balance for the period |
| current_delinquency_status | Days delinquent (current, 30-59, 60-89, 90-119, 120+) |
| remaining_term_to_maturity_number | Remaining term in months |
| current_interest_rate_percentage | Current note rate |
| current_interest_rate_type_code | Fixed or variable |
| total_actual_paid_amount | Total cash received |
| actual_principal_collected_amount | Actual principal collected |
| actual_interest_collected_amount | Actual interest collected |
| actual_other_collected_amount | Other actual collections |
| actual_payment_collected_amount | Actual payment received |
| scheduled_principal_amount | Scheduled principal due |
| scheduled_interest_amount | Scheduled interest due |
| periodic_rate_percentage | Periodic interest rate |
| other_principal_adjustment_amount | Principal adjustments |
| other_assessed_uncollected_servicer_fee_amount | Uncollected servicer fees |
| chargedoff_principal_amount | Principal charged off this period |
| chargeoff_date | Date of chargeoff |
| chargeoff_amount | Total chargeoff amount |
| recovered_amount | Recoveries received this period |
| cumulative_recoveries_amount | Cumulative recoveries to date |
| liquidation_amount | Liquidation proceeds |
| deficiency_balance_amount | Balance remaining after liquidation |
| repossession_indicator | Vehicle repossessed |
| repossession_date | Date of repossession |
| repurchase_indicator | Loan repurchased |
| repurchase_amount | Repurchase amount |
| repurchase_date | Date of repurchase |
| bankruptcy_indicator | Obligor bankruptcy filed |
| modification_indicator | Loan was modified this period |
| modification_type_code | Type of modification |
| forbearance_indicator | Forbearance granted |
| deferment_indicator | Payment deferment granted |
| extension_indicator | Term extension granted |
| skip_payment_indicator | Skip-payment granted |
| payment_extended_number | Number of payments extended |
| zero_balance_code | Reason loan reached zero balance |
| zero_balance_effective_date | Date loan reached zero balance |
| paid_in_full_date | Paid-in-full date |
| next_payment_due_date | Next scheduled payment date |
| primary_loan_servicer_name | Current servicer |
| servicing_fee_percentage | Servicing fee rate |
| servicer_advancing_method | Advancing method |
| grace_period_number | Grace period in days |

Access

The origination dataset (all 48 static fields, back to 2016) is available through LoanTape's ABS-EE subscription. The monthly performance series (63 fields, 1.14 billion rows) is through the ABS remittance data product. Both are queryable via API or Databricks-compatible exports.

If you've looked at building this yourself and decided the engineering cost isn't worth it, that's probably the right call. The raw filing infrastructure exists, the XML is public, and the schema is documented. But the pipeline to turn it into something you can actually query takes years to build and needs to keep running every month. That's what you're paying for.

Pricing is at /#pricing. The dataset page has the full field catalog with coverage rates by issuer.