Cost Estimation Methods

Level: Intermediate | Module: Cost Terms & Cost Behavior | 7 min read | Lesson 7 of 67

Overview

What you’ll learn: The high-low method for estimating cost functions, simple regression analysis, interpreting R-squared, choosing between estimation methods, and the data requirements for reliable cost estimation.
Prerequisites: Lesson 6 — Determining How Costs Behave
Estimated reading time: 20 minutes

Introduction

The Grand Historian records: In Lesson 6, we surveyed the landscape of cost estimation — the industrial engineering method, account analysis, and the scatter plot. Each has its merits, but each also has a fatal flaw: subjectivity. Two analysts examining the same scatter plot may draw different lines and arrive at different cost functions. The management accountant who presents subjective estimates to the CFO had better have an excellent explanation — or an updated resume.

This lesson introduces two quantitative methods that replace human judgment with mathematical precision. The high-low method offers quick-and-dirty estimation using only two data points. Regression analysis offers rigorous, statistically defensible estimation using all available data points. Together, they form the quantitative backbone of cost estimation. The high-low method is the cavalry scout — fast but imprecise. Regression is the siege artillery — slower to deploy but devastatingly accurate.

The High-Low Method

The high-low method estimates the variable and fixed components of a mixed cost using only the highest and lowest activity observations in the dataset. It draws a straight line between the two extreme points.

Step-by-Step Process

Given the following monthly maintenance cost data:

Month	Machine Hours (X)	Maintenance Cost (Y)
January	1,500	$10,200
February	2,000	$12,400
March	2,500	$13,800
April	3,000	$16,200
May	3,500	$17,600
June	4,000	$19,800

Step 1: Identify the highest and lowest activity levels (select on machine hours, the cost driver, not on cost):

Highest: June — 4,000 machine hours, $19,800
Lowest: January — 1,500 machine hours, $10,200

Step 2: Calculate the variable cost per unit of activity (the slope):

b = (Y_high – Y_low) / (X_high – X_low) = ($19,800 – $10,200) / (4,000 – 1,500) = $9,600 / 2,500 = $3.84 per machine hour

Step 3: Calculate the fixed cost component (the y-intercept):

a = Y_high – b(X_high) = $19,800 – $3.84(4,000) = $19,800 – $15,360 = $4,440

Step 4: Write the cost function:

Y = $4,440 + $3.84X
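The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `high_low` and the variable names are assumptions, and the data are the six monthly observations from the table.

```python
# Monthly observations from the table above: (machine hours, maintenance cost in dollars).
observations = [
    (1500, 10200),
    (2000, 12400),
    (2500, 13800),
    (3000, 16200),
    (3500, 17600),
    (4000, 19800),
]


def high_low(data):
    """Return (fixed_cost, variable_cost_per_unit) from the two extreme activity levels."""
    # Step 1: pick the extremes by activity (X), not by cost (Y).
    x_low, y_low = min(data, key=lambda point: point[0])
    x_high, y_high = max(data, key=lambda point: point[0])
    if x_high == x_low:
        raise ValueError("activity levels must differ to compute a slope")
    # Step 2: slope = change in cost / change in activity.
    slope = (y_high - y_low) / (x_high - x_low)
    # Step 3: intercept = cost at the high point minus its variable portion.
    intercept = y_high - slope * x_high
    return intercept, slope


# Step 4: write the cost function.
fixed_cost, variable_rate = high_low(observations)
print(f"Y = {fixed_cost:,.0f} + {variable_rate:.2f}X")  # prints: Y = 4,440 + 3.84X
```

Substituting the low point instead of the high point in Step 3 gives the same intercept ($10,200 – $3.84 × 1,500 = $4,440), which is a quick way to check the arithmetic.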

Advantages of the High-Low Method

Simple, fast, requires no special software — a calculator suffices
Easy to understand and explain to non-accountants
Useful for quick preliminary estimates

Limitations of the High-Low Method

Uses only two data points: All other observations are ignored, so the estimate rests on a small fraction of the available information.
Sensitive to outliers: If either extreme observation is an unusual month, both the slope and the intercept are distorted.
No measure of goodness of fit: Unlike regression, there is no R-squared or statistical test to evaluate accuracy.
Ignores the pattern: The relationship suggested by the middle observations plays no part in the estimate, even when it clearly departs from the line through the two extremes.

The high-low method is acceptable for quick estimates and exam problems, but in professional practice, regression analysis is almost always preferred.

Regression Analysis

Regression analysis is a statistical method that fits a line to the data by minimizing the sum of squared differences between the actual and predicted values. It uses all data points, not just two, and provides statistical measures of the quality of the fit.

Simple Linear Regression

Simple linear regression estimates the equation Y = a + bX where:

Y = dependent variable (total cost)
a = intercept (estimated fixed cost)
b = slope (estimated variable cost per unit of activity)
X = independent variable (cost driver / activity level)

The regression algorithm finds the values of a and b that minimize the sum of squared residuals — the vertical distances between each data point and the regression line. This is called the least-squares method, and it produces the mathematically “best” line through the data.
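The least-squares line has a closed-form solution, so the fit can be sketched without a statistics package. The snippet below is a minimal illustration rather than the output of any particular tool; the function name `least_squares` is an assumption, and the data are the six maintenance observations from the high-low example.

```python
def least_squares(xs, ys):
    """Fit Y = a + bX by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx: variation in the cost driver. Sxy: co-variation of driver and cost.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("the cost driver must vary across observations")
    slope = sxy / sxx                      # b: variable cost per unit of activity
    intercept = mean_y - slope * mean_x    # a: the line passes through the means
    return intercept, slope


machine_hours = [1500, 2000, 2500, 3000, 3500, 4000]
maintenance_cost = [10200, 12400, 13800, 16200, 17600, 19800]

a, b = least_squares(machine_hours, maintenance_cost)
print(f"Y = {a:,.2f} + {b:.4f}X")  # prints: Y = 4,628.57 + 3.7714X
```

Spreadsheet functions and statistical packages carry out the same calculation and add the diagnostic statistics discussed next.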

Interpreting Regression Output

Most spreadsheet programs (Excel, Google Sheets) and statistical software produce regression output that includes:

Statistic	What It Tells You
Intercept (a)	Estimated fixed cost component
Slope (b)	Estimated variable cost per unit of the cost driver
R-squared (R²)	Proportion of cost variation explained by the cost driver (0 to 1)
t-statistic	Tests whether the slope is statistically different from zero
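p-value	Probability of seeing a slope at least this far from zero if cost and the driver were actually unrelated (lower is better; < 0.05 is the conventional threshold)
Standard error	Typical size of the prediction errors (residuals), in the same units as cost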
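To make the table concrete, the sketch below computes the standard error of the estimate, the standard error of the slope, and the slope's t-statistic for the six maintenance observations. It is an illustration under simple-regression assumptions; the function name `slope_statistics` and the dictionary keys are hypothetical. The p-value is left out because it requires the t-distribution, which a spreadsheet or statistics package supplies.

```python
import math


def slope_statistics(xs, ys):
    """Return the slope, standard error of the estimate, standard error of the slope, and t-statistic."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals: actual cost minus the cost predicted by the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    # n - 2 degrees of freedom, because two parameters (a and b) were estimated.
    se_estimate = math.sqrt(sse / (n - 2))
    se_slope = se_estimate / math.sqrt(sxx)
    return {
        "slope": slope,
        "se_estimate": se_estimate,
        "se_slope": se_slope,
        "t_stat": slope / se_slope,
    }


machine_hours = [1500, 2000, 2500, 3000, 3500, 4000]
maintenance_cost = [10200, 12400, 13800, 16200, 17600, 19800]

stats = slope_statistics(machine_hours, maintenance_cost)
print(f"standard error of estimate: {stats['se_estimate']:.2f}")
print(f"t-statistic for slope: {stats['t_stat']:.2f}")
```

With six observations there are four degrees of freedom, for which the two-tailed 5% critical value of t is about 2.776. A t-statistic well above that value indicates a slope that is statistically different from zero.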

R-Squared: The Coefficient of Determination

R-squared (R²) is the single most important statistic for evaluating a cost function. It measures the percentage of variation in the dependent variable (cost) that is explained by the independent variable (cost driver):

R² = 0.90 means 90% of the variation in cost is explained by the cost driver — excellent fit.
R² = 0.50 means only 50% is explained — moderate fit, consider other drivers.
R² = 0.10 means only 10% is explained, a poor fit; the driver is probably the wrong one, or several drivers are at work.

An R-squared close to 1.0 is ideal, but context matters. In some industries, R² = 0.70 may be the best achievable. The key question is always: “Is this the best cost driver available?”
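R-squared can be computed directly from its definition: one minus the ratio of unexplained variation (the sum of squared residuals) to total variation in cost. The sketch below applies that definition to the six maintenance observations; the function name `r_squared` is an assumption made for illustration.

```python
def r_squared(xs, ys):
    """Return R-squared for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared gaps between actual and predicted cost.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared gaps between actual cost and average cost.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


machine_hours = [1500, 2000, 2500, 3000, 3500, 4000]
maintenance_cost = [10200, 12400, 13800, 16200, 17600, 19800]

r2 = r_squared(machine_hours, maintenance_cost)
print(f"R-squared = {r2:.3f}")  # prints: R-squared = 0.996
```

Machine hours explain nearly all of the month-to-month variation in maintenance cost in this small dataset.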

Comparing High-Low and Regression

Running a least-squares regression on our six months of maintenance cost data gives:

Intercept (a) ≈ $4,629 (vs. $4,440 from high-low)
Slope (b) ≈ $3.77/hour (vs. $3.84 from high-low)
R² ≈ 0.996 (excellent fit)

The regression estimates differ from the high-low estimates because regression uses all six data points, not just two. In this case, the differences are modest, but with noisier data or outliers, the gap can be enormous.

Choosing the Right Method

Method	Data Required	Speed	Accuracy	Best For
Industrial engineering	Physical specs, time studies	Slow	Varies	New products, standard costing
Account analysis	One period’s ledger data	Fast	Low-moderate	Preliminary estimates, simple cost structures
High-low	Highest and lowest observations	Very fast	Moderate	Quick estimates, classroom exercises
Regression	All historical observations	Moderate	High	Professional practice, important decisions

In professional practice, regression analysis is the gold standard. It uses all available data, provides statistical measures of fit (R², t-statistics, p-values), and can be extended to multiple drivers (multiple regression, covered in Lesson 8).

Data Requirements for Reliable Estimation

Even regression analysis cannot rescue bad data. For reliable results:

Sufficient observations: At least 15–20 data points for simple regression; more for multiple regression. Fewer observations reduce statistical reliability.
Adequate variation: The cost driver must vary meaningfully across observations. If machine hours are nearly the same every month, the data cannot reveal the cost-activity relationship.
Consistent environment: All observations should come from the same operating environment. If the company switched to a new production process midway through the data period, use only post-switch data.
Matched periods: Costs and activity must be measured in the same time period. If maintenance is billed with a one-month lag, adjust the data so costs align with the activity that caused them.
Outlier investigation: Do not mechanically exclude outliers. Investigate each one — it may be a data error (exclude it), an unusual but real event (consider excluding), or a signal that the cost structure has changed (segment the data).
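Two of these checks lend themselves to a quick script before any regression is run. The sketch below is a hypothetical helper, not a standard procedure: `screen_driver_data` flags a short sample or a nearly constant driver, and `align_lagged_costs` re-pairs costs that are billed after the activity that caused them. The 15-observation floor follows the guideline above; the 0.10 coefficient-of-variation threshold is an assumption chosen only for illustration.

```python
import statistics


def screen_driver_data(xs, min_observations=15, min_cv=0.10):
    """Return warnings about sample size and cost-driver variation."""
    warnings = []
    if len(xs) < min_observations:
        warnings.append(
            f"only {len(xs)} observations; guideline is at least {min_observations}"
        )
    mean_x = statistics.mean(xs)
    # Coefficient of variation: sample standard deviation relative to the mean.
    cv = statistics.stdev(xs) / mean_x if mean_x else 0.0
    if cv < min_cv:  # min_cv is an illustrative threshold, not a standard
        warnings.append(f"driver varies little (coefficient of variation {cv:.2f})")
    return warnings


def align_lagged_costs(activity, billed_cost, lag=1):
    """Pair each period's activity with the cost billed `lag` periods later."""
    if lag < 0 or len(activity) != len(billed_cost):
        raise ValueError("lag must be non-negative and series must match in length")
    if lag == 0:
        return list(zip(activity, billed_cost))
    # Drop the trailing periods whose bills have not arrived yet.
    return list(zip(activity[:-lag], billed_cost[lag:]))


machine_hours = [1500, 2000, 2500, 3000, 3500, 4000]
for message in screen_driver_data(machine_hours):
    print("warning:", message)
```

Run on the six-month example from this lesson, the screen reports the short sample: six observations are enough to illustrate the mechanics but fall well below the 15–20 recommended for reliable estimates.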

Key Takeaways

The high-low method estimates variable and fixed costs using only the highest and lowest activity observations — fast but ignores all other data and has no measure of fit.
Regression analysis uses all data points and the least-squares method to find the best-fit line, providing R-squared, t-statistics, and p-values for evaluation.
R-squared measures the proportion of cost variation explained by the cost driver — closer to 1.0 is better.
Regression is the professional standard for cost estimation; high-low is a quick preliminary tool.
Reliable estimation requires sufficient observations, adequate variation, a consistent environment, matched periods, and careful outlier investigation.

What’s Next

In Lesson 8, we venture into advanced territory: multiple regression (using two or more cost drivers simultaneously), nonlinear cost functions, learning curves, and the criteria for choosing cost drivers in complex environments. This is where cost estimation becomes both an art and a science.

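Traditional Chinese Edition

Overview

What you’ll learn: The high-low method, regression analysis, interpreting R-squared, choosing between estimation methods, and the data requirements for reliable cost estimation.
Prerequisites: Lesson 6, Determining How Costs Behave
Estimated reading time: 20 minutes

Introduction

The Grand Historian records: Lesson 6 surveyed the landscape of cost estimation: the industrial engineering method, account analysis, and the scatter plot. Each has its merits, and each has a fatal flaw: subjectivity. Two analysts facing the same scatter plot may draw different lines and arrive at different cost functions. The management accountant who presents subjective estimates to the CFO had better have an excellent explanation, or an updated resume.

This lesson introduces two quantitative methods that replace human judgment with mathematical precision. The high-low method offers a quick estimate using only two data points. Regression analysis offers a rigorous, statistically defensible estimate using all data points.

The High-Low Method

Estimates the variable and fixed components of a mixed cost using the observations with the highest and lowest activity in the dataset.

Steps:

Identify the highest and lowest activity levels
Calculate the slope b = (Y_high – Y_low) / (X_high – X_low)
Calculate the intercept a = Y_high – b × X_high
Write the cost function Y = a + bX

Advantages: simple and fast, no special software required. Disadvantages: uses only two data points, sensitive to outliers, no measure of goodness of fit.

Regression Analysis

A statistical method that uses all data points and fits a line by minimizing the sum of squared differences between predicted and actual values (the least-squares method).

Interpreting Regression Output

Statistic	Meaning
Intercept (a)	Estimated fixed cost
Slope (b)	Estimated variable cost per unit
R²	Proportion of cost variation explained by the cost driver (0 to 1)
t-statistic	Tests whether the slope is significantly different from zero
p-value	Probability of seeing a slope at least this far from zero if cost and the driver were actually unrelated (< 0.05 is preferred)

R-Squared: The Coefficient of Determination

R² = 0.90 means 90% of the variation in cost is explained by the cost driver, an excellent fit. R² = 0.10 means only 10% is explained, a poor fit that points to the wrong driver.

Choosing the Method

Method	Speed	Accuracy	Best For
Industrial engineering	Slow	Varies	New products, standard costing
Account analysis	Fast	Low to moderate	Preliminary estimates
High-low	Very fast	Moderate	Quick estimates, classroom exercises
Regression	Moderate	High	Professional practice, important decisions

Data Requirements for Reliable Estimation

Sufficient observations (at least 15–20 for simple regression)
The cost driver must vary enough
Observations come from a consistent operating environment
Cost and activity periods must match
Investigate outliers; do not exclude them mechanically

Key Takeaways

The high-low method uses only the highest and lowest activity observations: fast, but it ignores all other data.
Regression analysis uses all data points and provides R², t-statistics, and p-values.
R² measures the proportion of cost variation explained by the cost driver.
Regression is the professional standard; high-low is a quick preliminary tool.
Reliable estimation requires sufficient observations, adequate variation, a consistent environment, and matched periods.

What’s Next

Lesson 8 moves into advanced territory: multiple regression, nonlinear cost functions, learning curves, and the criteria for choosing cost drivers in complex environments.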

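Japanese Edition

Overview

What you’ll learn: The high-low method, regression analysis, interpreting R², choosing between estimation methods, and the data requirements for reliable cost estimation.
Prerequisites: Lesson 6, Determining How Costs Behave
Estimated reading time: 20 minutes

Introduction

The Grand Historian says: Lesson 6 surveyed the landscape of cost estimation: the industrial engineering method, account analysis, and the scatter plot. Each has its merits, but each has a fatal flaw: subjectivity. Two analysts looking at the same scatter plot can draw different lines and arrive at different cost functions. This lesson introduces two quantitative methods that replace human judgment with mathematical precision. The high-low method is a quick estimate that uses only two points. Regression analysis is a rigorous, statistically defensible estimate that uses all data points.

The High-Low Method

Estimates the variable and fixed components of a mixed cost using only the observations with the highest and lowest activity in the dataset.

Steps:

Identify the highest and lowest activity levels
Calculate the slope b = (Y_high – Y_low) / (X_high – X_low)
Calculate the intercept a = Y_high – b × X_high
Write the cost function Y = a + bX

Advantages: simple and fast, no special software required. Disadvantages: uses only two points, sensitive to outliers, no measure of goodness of fit.

Regression Analysis

A statistical method that uses all data points and fits a line by minimizing the sum of squared differences between predicted and actual values (the least-squares method).

Interpreting Regression Output

Statistic	Meaning
Intercept (a)	Estimated fixed cost
Slope (b)	Estimated variable cost per unit
R²	Proportion of cost variation explained by the cost driver (0 to 1)
t-statistic	Tests whether the slope is statistically different from zero
p-value	Probability of seeing a slope at least this far from zero if cost and the driver were actually unrelated (below 0.05 is preferred)

R²: The Coefficient of Determination

R² = 0.90 means 90% of the variation in cost is explained by the driver, an excellent fit. R² = 0.10 means only 10% is explained, a poor fit.

Choosing the Method

Method	Speed	Accuracy	Best For
Industrial engineering	Slow	Varies	New products, standard costing
Account analysis	Fast	Low to moderate	Preliminary estimates
High-low	Very fast	Moderate	Quick estimates, classroom exercises
Regression	Moderate	High	Professional practice, important decisions

Data Requirements for Reliable Estimation

Sufficient observations (at least 15–20 for simple regression)
Adequate variation in the cost driver
Observations from a consistent operating environment
Cost and activity periods matched
Investigate outliers; do not exclude them mechanically

Key Takeaways

The high-low method uses only the highest and lowest activity observations: fast, but it ignores the other data.
Regression analysis uses all data points and provides R², t-statistics, and p-values.
R² measures the proportion of cost variation explained by the cost driver.
Regression is the standard in practice; high-low is a quick preliminary tool.
Reliability requires sufficient observations, adequate variation, a consistent environment, and matched periods.

What’s Next

Lesson 8 moves into advanced territory: multiple regression, nonlinear cost functions, learning curves, and the criteria for choosing cost drivers.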
