Select Page

Invoice Amount Analysis

Exploratory Regression (Excel)

 

Project Context

This project explores whether simple operational variables in Accounts Payable can help explain or predict invoice amounts. Using a simulated but realistic AP data (14,756 invoices), I built a regression model in Excel’s Analysis ToolPak and visualised the results in Power BI.
This was my first modelling project, created as part of my data analytics learning journey.

Dataset

Variables used in the model:

Variable Description
InvoiceAmount Target variable
ProcessingDays Number of days required to process the invoice
DelayedFlag 0 = on time, 1 = delayed
Category Department category (visualisation only)

Regression Model

I used Excel’s Analysis ToolPak to run a simple linear regression with two predictors:

  • ProcessingDays
  • DelayedFlag

Regression Output (Excel)

  • R²: 0.06
  • Adjusted R²: 0.06
  • Standard Error: 1558.95
  • Observations: 14,756

Coefficients

  • Intercept: 530.66
  • ProcessingDays: –0.734
  • DelayedFlag: +874.788

Predicted Invoice Amount = 530.66 – 0.734 × ProcessingDays + 874.788 × DelayedFlag

Interpretation

  • Start at £530.66
    Baseline invoice amount for a non‑delayed invoice with minimal processing time.
  • ProcessingDays: –£0.73 per day
    A very small negative effect. Statistically significant due to the large dataset, but not practically meaningful.
  • DelayedFlag: +£875
    Delayed invoices tend to be around £875 higher on average.

Example Prediction

For an invoice with:

  • 12 processing days
  • DelayedFlag = 1

[ 530.66 – (0.734 * 12) + 874.788 * 1 ]

This is the model’s predicted amount.

Model Performance

Metric Value
0.06
Observations 14,756
Standard Error 1558.95

The model is statistically significant, but the predictive power is low:

  • R² = 0.06 → The model explains only 6% of the variation in invoice amounts.
  • Residuals show a wide vertical spread → Predictions are often far from actual values.
  • Predicted amounts cluster between £500–£1,400 → Because the model only uses two simple variables.

This is expected for a simple model applied to complex financial data.

Visual Analysis

  1. Processing Days vs Invoice Amount

Shows no strong linear relationship — invoice amounts vary widely regardless of processing time.

  1. Predicted Amount vs Residual

Each dot = one invoice.

  • X‑axis: model’s predicted amount
  • Y‑axis: residual (error)
  • Colours: invoice categories

The residuals are centred around zero (good), but spread out vertically (large errors).
This confirms the model is unbiased but not highly accurate.

Key Insights

Key Insights

• Delayed invoices tend to have higher average values
• Processing time has minimal practical impact on invoice value
• Invoice values vary widely across departments
• Additional features (supplier, contract type, purchase order status) could significantly improve predictions