Global AI and Data Science

Theory of Errors, the Birth of Normal Distribution

By Moloy De posted Thu March 05, 2020 09:43 PM

  
The histogram of errors is found to be bell shaped.

The bell-shaped curve was discovered by Carl Friedrich Gauss (1777-1855), whom many mathematical historians consider to have been the greatest mathematician of all time. Gauss directed the geodetic survey of the Kingdom of Hanover. Surveyors measure distances. For instance, a survey crew may measure a distance to be 135.674 m. To tell whether that is the correct distance, they would check their work by measuring it again. The second time, they might get 135.677 m. So is it 135.674 m or 135.677 m? They would have to measure it again. The next time, they might get 135.675 m. Which one is it? Each time they measure, they get a different answer. Gauss would have them measure it about 15 times, and they would get
135.674, 135.677, 135.675, 135.675, 135.676, 135.672, 135.675, 135.674, 135.676, 135.675,
135.676, 135.674, 135.675, 135.676, 135.675

If we make a histogram of these data, we get

[Figure: histogram of the 15 measurements]

At this point we would take the true value to be the average, 135.675 m, accurate to the nearest millimeter.
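The tally and the average can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and the readings are converted to whole millimetres (an assumption of convenience) so that equal readings compare exactly.

```python
from collections import Counter
from statistics import mean

# The 15 repeated measurements from the text, in metres.
measurements = [
    135.674, 135.677, 135.675, 135.675, 135.676,
    135.672, 135.675, 135.674, 135.676, 135.675,
    135.676, 135.674, 135.675, 135.676, 135.675,
]

# Work in whole millimetres so that equal readings compare exactly.
millimetres = [round(m * 1000) for m in measurements]

# Frequency of each distinct reading.
counts = Counter(millimetres)

# Average reading, converted back to metres.
average_m = mean(millimetres) / 1000

# Text histogram: one row per millimetre between the smallest and largest reading.
for mm in range(min(millimetres), max(millimetres) + 1):
    print(f"{mm / 1000:.3f} m | {'#' * counts[mm]}")
print(f"average = {average_m:.3f} m")
```

The readings 135.672, 135.674, 135.675, 135.676 and 135.677 occur 1, 3, 6, 4 and 1 times respectively, which is the roughly bell-shaped pattern the text describes.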

These data are approximately normally distributed. We get a normal distribution when there is a true value for the distance but, since to err is human, each attempt to hit it is likely to miss. We are most likely to land on or near the target, and as we get farther from the true value, the chance of landing there gets smaller and smaller. We can express this by saying that the rate at which the frequencies fall off is proportional to the distance from the true value.

If this were the end of the story, the histogram would be a parabola, and as you got farther and farther from the true value, the frequencies would eventually become negative, which is impossible. We can make the frequencies level off, approaching zero asymptotically, by further requiring that the rate at which they fall off is also proportional to the frequencies themselves. Then, as the frequencies approach zero, the slope of the histogram also approaches zero, and the curve levels off in the tails. This gives us the following definition.
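To see why the first condition alone produces a parabola, the calculation can be written out as below. The symbols are assumptions made for illustration, not notation from the original post: y is the frequency at the value x, μ is the true value, k > 0 is the constant of proportionality, and y_0 is the frequency at x = μ.

```latex
\frac{dy}{dx} = -k\,(x-\mu)
\quad\Longrightarrow\quad
y = y_0 - \frac{k}{2}\,(x-\mu)^2 ,
\qquad
y < 0 \ \text{ whenever } \ |x-\mu| > \sqrt{2y_0/k}.
```

Under these assumptions the frequencies turn negative beyond a finite distance from the true value, which is exactly the defect the second condition is meant to remove.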

Definition: Data are said to be normally distributed if the rate at which the frequencies fall off is proportional to the distance of the score from the mean, and to the frequencies themselves.

In practice, the value of the bell-shaped curve is that we can find the proportion of the scores that lie in a given interval. For a probability distribution, this is the area under the curve over that interval: a typical calculus problem. However, to use calculus to find these areas, we need a formula for the curve. We can find such a formula because our definition gives us the following differential equation, in which y is the frequency at the score x and μ is the mean:

$$\frac{dy}{dx} = -k\,(x - \mu)\,y$$

where k is a positive constant. Note that to the right of the mean the curve is decreasing, and to the left it is increasing. We can separate the variables,

$$\frac{dy}{y} = -k\,(x - \mu)\,dx$$

and integrate both sides:

$$\ln y = -\frac{k}{2}\,(x - \mu)^2 + C$$

Taking the exponential of both sides, with $A = e^{C}$:

$$y = A\,e^{-k (x - \mu)^2 / 2}$$
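As a sanity check, a curve of the form y = A·exp(−k(x − μ)²/2) should satisfy the differential equation dy/dx = −k(x − μ)y implied by the definition. The sketch below compares a finite-difference slope with the right-hand side of that equation at a few points. The parameter values and the function names `bell`, `slope` and `ode_rhs` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
MU = 135.675   # centre of the curve (the mean)
K = 4.0        # positive constant of proportionality
A = 1.0        # constant produced by the integration step


def bell(x: float) -> float:
    """Candidate solution y = A * exp(-k * (x - mu)^2 / 2)."""
    return A * math.exp(-K * (x - MU) ** 2 / 2)


def slope(x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dy/dx at x."""
    return (bell(x + h) - bell(x - h)) / (2 * h)


def ode_rhs(x: float) -> float:
    """Right-hand side of dy/dx = -k * (x - mu) * y."""
    return -K * (x - MU) * bell(x)


# Compare the two sides at points on both sides of the mean.
offsets = [-2.0, -0.5, 0.0, 0.5, 2.0]
max_gap = max(abs(slope(MU + d) - ode_rhs(MU + d)) for d in offsets)
print(f"largest gap between the two sides: {max_gap:.2e}")
```

The gap should be at the level of finite-difference error, and the estimated slope is positive to the left of the mean and negative to the right, as the text notes.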


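Setting the total probability equal to 1 fixes the constant in front of the exponential. Writing $k = 1/\sigma^2$, where μ is the mean and σ is the standard deviation, we obtain the density function of the normal distribution:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2 / (2\sigma^2)}$$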
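The normalization and the "area over an interval" idea can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper names and the values μ = 135.675 m and σ = 1 mm are assumptions for illustration, not figures from the original post. The area is computed two ways: with a simple trapezoidal sum, and with the closed form based on `math.erf`.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Area under the density to the left of x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def area(a: float, b: float, mu: float, sigma: float, steps: int = 20_000) -> float:
    """Trapezoidal estimate of the area under the density on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h


# Hypothetical values for illustration: mean 135.675 m, sigma 1 mm.
MU, SIGMA = 135.675, 0.001

# Total probability, integrating far enough out that the tails are negligible.
total_area = area(MU - 10 * SIGMA, MU + 10 * SIGMA, MU, SIGMA)

# Proportion of scores within one standard deviation of the mean.
within_one_sigma = normal_cdf(MU + SIGMA, MU, SIGMA) - normal_cdf(MU - SIGMA, MU, SIGMA)

print(f"total area           = {total_area:.6f}")
print(f"share within 1 sigma = {within_one_sigma:.4f}")
```

The total area should come out very close to 1, and the share within one standard deviation close to the familiar figure of about 68%.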

Reference: A Derivation of the Normal Distribution