Given a Gaussian function that depicts the well-known bell-shaped curve.

We can compute the integral.

Dividing the Gaussian function by this factor scales the area under the curve to one which gives us the probability density function of the standard normal distribution.

One can see that including the factor of \(\frac{1}{2}\) in the Gaussian function gives us the established definition of the standard normal distribution having \(\mu = 0\) and \(\sigma = 1\).

If one wonders why the normalization constant includes \(\pi\) in the denominator, the answer is a bit more involved but the short version is that when computing the Gaussian Integral the variables are transformed to polar coordinates.

I stumbled over this derivation while peeking into The Principles of Deep Learning Theory.

The text is rather dense, but appears to be a wortwhile read.