Normal Data Distribution:
In the previous chapter we learned how to create a completely random array of a given size, with values between two given limits.
In this chapter we will learn how to create an array where the values are concentrated around a given value.
In probability theory this kind of data distribution is known as the normal data distribution, or the Gaussian data distribution. It is named after the mathematician Carl Friedrich Gauss, who is credited with the formula for this distribution.
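The formula itself is not stated above. The sketch below writes out the standard normal probability density in plain Python, assuming a mean `mean` and a positive standard deviation `std`. The helper name `normal_pdf` is hypothetical and chosen for illustration only. It is not a NumPy function.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Hypothetical helper: density of the normal (Gaussian) distribution at x.

    f(x) = 1 / (std * sqrt(2*pi)) * exp(-(x - mean)**2 / (2 * std**2))
    """
    if std <= 0:
        raise ValueError("std must be positive")
    z = (x - mean) / std  # distance from the mean, measured in standard deviations
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# The curve is highest at the mean and symmetric around it
print(round(normal_pdf(5.0, mean=5.0, std=1.0), 4))  # prints 0.3989
print(normal_pdf(4.0, 5.0, 1.0) == normal_pdf(6.0, 5.0, 1.0))  # prints True
```

The symmetry check shows why the histogram below falls off equally on both sides of 5.0.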
Example
A typical normal data distribution:
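import numpy
import matplotlib.pyplot as plt
# 100000 values drawn from a normal distribution with mean 5.0 and standard deviation 1.0
x = numpy.random.normal(5.0, 1.0, 100000)
# Draw a histogram of the values using 100 bars
plt.hist(x, 100)
plt.show()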
Result:
Note: A normal distribution graph is also known as the bell curve because of its characteristic bell shape.
Histogram Explained
We use the array returned by numpy.random.normal(), which contains 100000 values, to draw a histogram with 100 bars.
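To see the numbers behind those 100 bars without drawing anything, you can count them with numpy.histogram(). The sketch below is an illustration under two assumptions that are not in the original example: it uses a seeded generator from numpy.random.default_rng() so the run is reproducible, and the seed value 42 is arbitrary.

```python
import numpy

# Seeded generator for reproducibility (the original example is unseeded)
rng = numpy.random.default_rng(42)
x = rng.normal(5.0, 1.0, 100000)

# Split the data range into 100 equal-width bins and count the values in each
counts, edges = numpy.histogram(x, bins=100)

# Find the tallest bar and the interval it covers
peak = numpy.argmax(counts)
print(f"tallest bar: {edges[peak]:.2f} to {edges[peak + 1]:.2f}, {counts[peak]} values")
print("total values counted:", counts.sum())
```

Every one of the 100000 values lands in some bin, and the tallest bar should sit close to the mean of 5.0.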
We specify that the mean value is 5.0, and the standard deviation is 1.0.
This means that the values are concentrated around 5.0: roughly 68% of them lie within 1.0 (one standard deviation) of the mean, and values more than 3.0 away from the mean are very rare.
As you can see from the histogram, most values lie between 4.0 and 6.0, with a peak at approximately 5.0.
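You can check these claims numerically. The sketch below compares the sample mean and standard deviation with the requested 5.0 and 1.0. It also measures the share of values within 1, 2 and 3 standard deviations of the mean, which for a normal distribution is roughly 68%, 95% and 99.7%. The seeded generator and the seed value 0 are assumptions added for reproducibility.

```python
import numpy

rng = numpy.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
x = rng.normal(5.0, 1.0, 100000)

# Sample statistics should be close to the requested mean and standard deviation
print("mean:", round(x.mean(), 2))
print("std: ", round(x.std(), 2))

# Share of values within k standard deviations (k * 1.0) of the mean 5.0
for k in (1, 2, 3):
    share = numpy.mean(numpy.abs(x - 5.0) <= k * 1.0)
    print(f"within {k} std: {share:.1%}")
```

The exact percentages vary slightly from run to run with an unseeded generator, but they stay close to 68%, 95% and 99.7% for an array this large.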