Data Sets — The Fuel of ML - Python Machine Learning Lesson

Car	Colour	Age	Speed	AutoPass
BMW	red	5	99	Y
Volvo	black	7	86	Y
VW	gray	8	87	N
Ford	white	2	111	Y

The real power of data sets is not what you can see at a glance — it is the questions you can ask and the answers hiding inside.

A data set is like an unread diary: the stories are in there, but you have to ask the right questions. Can we predict a car's AutoPass status from its age and speed? Only the data can tell us.
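One way to start on that question is to split the cars by AutoPass and compare the groups. The sketch below assumes the four rows of the table above are available as tab-separated text. The names `raw`, `rows`, and `groups` are illustrative only, and four rows are far too few to support a real prediction.

```python
import csv
import io
from collections import defaultdict

# The four rows from the table above, as tab-separated text (assumed layout).
raw = (
    "Car\tColour\tAge\tSpeed\tAutoPass\n"
    "BMW\tred\t5\t99\tY\n"
    "Volvo\tblack\t7\t86\tY\n"
    "VW\tgray\t8\t87\tN\n"
    "Ford\twhite\t2\t111\tY\n"
)

# Parse each row into a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

# Collect (age, speed) pairs for each AutoPass value.
groups = defaultdict(list)
for row in rows:
    groups[row["AutoPass"]].append((int(row["Age"]), int(row["Speed"])))

# Compare the average age and speed of each group.
for status, cars in sorted(groups.items()):
    avg_age = sum(age for age, _ in cars) / len(cars)
    avg_speed = sum(speed for _, speed in cars) / len(cars)
    print(f"AutoPass={status}: n={len(cars)}, "
          f"avg age={avg_age:.1f}, avg speed={avg_speed:.1f}")
```

In this tiny sample the passing cars happen to be younger and faster on average, but with a single failing car that is a hint to investigate, not a conclusion.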

Asking questions with Python:

speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
# Q: What is the average speed?
avg = sum(speeds) / len(speeds)
print(round(avg, 2))
# Output: 89.77
# Q: How many cars are faster than 90?
fast = [s for s in speeds if s > 90]
print(len(fast))
# Output: 4
# Q: What percentage are above average?
above = [s for s in speeds if s > avg]
print(f"{len(above)}/{len(speeds)} ({len(above) / len(speeds):.1%})")
# Output: 4/13 (30.8%)
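The same list can answer a few more questions with Python's standard `statistics` module. This is a minimal sketch that reuses the `speeds` values from the lesson; the variable names are illustrative.

```python
import statistics

speeds = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Q: What is the middle value when the speeds are sorted?
median_speed = statistics.median(speeds)
print(median_speed)
# Output: 87

# Q: Which speed occurs most often?
mode_speed = statistics.mode(speeds)
print(mode_speed)
# Output: 86

# Q: How far apart are the slowest and fastest cars?
speed_range = max(speeds) - min(speeds)
print(speed_range)
# Output: 34
```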

With Pandas — even more powerful:

import pandas as pd
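# Age and Speed columns from the table at the top of the lesson
df = pd.DataFrame({"Age": [5, 7, 8, 2], "Speed": [99, 86, 87, 111]})
# Summary statistics (count, mean, std, min, quartiles, max) per column
print(df.describe())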
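Pandas can also answer the earlier questions directly. The sketch below assumes the four table rows from the top of the lesson are loaded into a DataFrame named `cars` (an illustrative name), then filters by speed and groups by AutoPass.

```python
import pandas as pd

# Rows copied from the table at the top of the lesson.
cars = pd.DataFrame({
    "Car": ["BMW", "Volvo", "VW", "Ford"],
    "Colour": ["red", "black", "gray", "white"],
    "Age": [5, 7, 8, 2],
    "Speed": [99, 86, 87, 111],
    "AutoPass": ["Y", "Y", "N", "Y"],
})

# Q: Which cars are faster than 90?
fast = cars[cars["Speed"] > 90]
print(fast["Car"].tolist())

# Q: What is the average age and speed for each AutoPass value?
by_status = cars.groupby("AutoPass")[["Age", "Speed"]].mean()
print(by_status)
```

A boolean mask such as `cars["Speed"] > 90` plays the same role as the list comprehension in the plain-Python version, and `groupby` replaces the manual loop over groups.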

DevLoom

Topics

What Is a Data Set?

Reading Data Sets in Python

Exploring and Asking Questions of Your Data