On optimal and data-based histograms

DW Scott - Biometrika, 1979 - academic.oup.com
Abstract
This paper derives the formula for the histogram bin width that asymptotically minimizes the integrated mean squared error. Monte Carlo methods are used to verify the usefulness of this formula for small samples. A data-based procedure for choosing the bin width parameter is proposed; it assumes a Gaussian reference standard and requires only the sample size and an estimate of the standard deviation. The sensitivity of the procedure is investigated using several probability models that violate the Gaussian assumption.
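
The data-based procedure described in the abstract needs only the sample size n and an estimate s of the standard deviation. The sketch below is a minimal illustration, assuming the form in which the Gaussian reference rule is usually quoted, h = 3.49 s n^(-1/3). The abstract itself does not state the constant, and the function names scott_bin_width and histogram_counts are hypothetical, chosen here for illustration only.

```python
import math
import statistics


def scott_bin_width(sample):
    """Bin width from the Gaussian reference rule, h = 3.49 * s * n**(-1/3).

    The constant 3.49 is an assumption taken from the commonly quoted form
    of the rule; it approximates (24 * sqrt(pi)) ** (1/3).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return 3.49 * s * n ** (-1.0 / 3.0)


def histogram_counts(sample, width):
    """Count observations in equal-width bins starting at the sample minimum."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    lo, hi = min(sample), max(sample)
    bins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * bins
    for x in sample:
        # Clamp so the maximum lands in the last bin rather than past it.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    h = scott_bin_width(data)
    print(h, histogram_counts(data, h))
```

Because the width shrinks like n^(-1/3), multiplying the sample size by eight only halves the bin width. For data that are far from Gaussian, the width produced this way may be too wide or too narrow, which is the sensitivity question the abstract raises.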
Oxford University Press