Visualizing Data with Plots
Data visualization is a powerful tool to explore and understand data. It helps us see patterns, outliers, and distributions that might not be obvious when looking at raw numbers. The most common plots include histograms, line plots, and box plots.
Simple Example: Let’s visualize the student scores using a simple histogram. A histogram shows the distribution of values in a dataset:
    import matplotlib.pyplot as plt

    plt.hist(student_scores['Math'], bins=3, color='blue', edgecolor='black')
    plt.title('Distribution of Math Scores')
    plt.xlabel('Scores')
    plt.ylabel('Frequency')
    plt.show()
This plot shows how many students scored within each range in Math. More bins give a finer-grained view of the score distribution, although with a small dataset too many bins leave most bars nearly empty.
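To see what changing the bin count does, it can help to look at the numbers behind the bars. The sketch below uses np.histogram, which computes the same counts and bin edges that plt.hist draws. The math_scores values are invented for illustration and are not the actual student data.

```python
import numpy as np

# Hypothetical Math scores, invented for illustration only.
math_scores = np.array([55, 62, 68, 71, 74, 78, 81, 85, 90, 96])

# np.histogram returns the per-bin counts and the bin edges,
# i.e. the numbers that plt.hist turns into bars.
counts_3, edges_3 = np.histogram(math_scores, bins=3)
counts_5, edges_5 = np.histogram(math_scores, bins=5)

print("3 bins:", counts_3, "edges:", np.round(edges_3, 1))
print("5 bins:", counts_5, "edges:", np.round(edges_5, 1))
```

With n bins there are always n + 1 edges, and the counts always add up to the number of scores; only the way the scores are grouped changes.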
Next, let’s create a line plot to compare scores across subjects:
    plt.plot(student_scores['Math'], label='Math')
    plt.plot(student_scores['Science'], label='Science')
    plt.plot(student_scores['English'], label='English')
    plt.legend()
    plt.title('Student Scores in Different Subjects')
    plt.xlabel('Student Index')
    plt.ylabel('Scores')
    plt.show()
Each line traces one subject across the students, so at any student index the gap between the lines shows how that student's scores differ from subject to subject.
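Box plots were listed among the common plot types, so here is a minimal sketch of one for the same three subjects. It assumes student_scores is a pandas DataFrame with Math, Science, and English columns; the values below are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores, invented for illustration; the real student_scores may differ.
student_scores = pd.DataFrame({
    "Math": [55, 62, 68, 71, 74, 78, 81, 85, 90, 96],
    "Science": [60, 58, 72, 69, 80, 75, 77, 88, 84, 93],
    "English": [65, 70, 66, 75, 72, 80, 79, 83, 91, 88],
})

subjects = ["Math", "Science", "English"]

fig, ax = plt.subplots()
# One box per subject; boxes are drawn at x positions 1, 2, 3.
bp = ax.boxplot([student_scores[s] for s in subjects])
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(subjects)
ax.set_title("Score Spread by Subject")
ax.set_ylabel("Scores")
plt.show()
```

Each box spans the middle half of the scores, the line inside it marks the median, and points drawn beyond the whiskers are candidate outliers.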
Exoplanet Data: In the exoplanet dataset, we use histograms and line plots to explore the flux values for stars and exoplanets. For instance, we can visualize the label distribution (how many objects are exoplanets and how many are stars):
    # Bin edges at -0.5, 0.5, and 1.5 center the two bars on the tick positions 0 and 1
    plt.hist(train_data['LABEL'], bins=[-0.5, 0.5, 1.5], color='skyblue', edgecolor='black')
    plt.xticks([0, 1], ['0 (Star)', '1 (Exoplanet)'])
    plt.title('Exoplanet and Star Label Distribution')
    plt.show()
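Because the label takes only two values, the exact counts are often as useful as the picture. A minimal sketch, assuming train_data is a pandas DataFrame whose LABEL column holds 0 for stars and 1 for exoplanets; the ten labels below are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical labels, invented for illustration (0 = star, 1 = exoplanet).
train_data = pd.DataFrame({"LABEL": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]})

# Count each label; reindex keeps both classes even if one is absent.
counts = train_data["LABEL"].value_counts().reindex([0, 1], fill_value=0)
print(counts)

# A bar chart is a natural fit for a discrete label.
plt.bar(["0 (Star)", "1 (Exoplanet)"], counts.to_numpy(),
        color="skyblue", edgecolor="black")
plt.title("Exoplanet and Star Label Counts")
plt.ylabel("Count")
plt.show()
```

Printing the counts alongside the chart makes any class imbalance explicit, which matters later when judging how well a model performs.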
We can also visualize the flux values for specific stars:
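    # Column 0 is LABEL; the remaining columns are the flux measurements.
    # to_numpy() plots against a plain 0, 1, 2, ... index instead of the column names.
    plt.plot(train_data.iloc[0, 1:].to_numpy(dtype=float), label='Star 1')
    plt.plot(train_data.iloc[1, 1:].to_numpy(dtype=float), label='Star 2')
    plt.legend()
    plt.title('Flux Values for Different Stars')
    plt.xlabel('Observation Index')
    plt.ylabel('Flux')
    plt.show()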
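To compare the two classes directly, one row of each label can be plotted on stacked axes. The sketch below builds a small synthetic train_data so that it runs on its own: the FLUX.n column names, the noise, and the artificial dips are all assumptions for illustration, not the real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_points = 200

# Synthetic flux curves, invented for illustration only.
star_flux = rng.normal(0.0, 1.0, n_points)
planet_flux = rng.normal(0.0, 1.0, n_points)
planet_flux[50:60] -= 5.0    # artificial dip standing in for a transit
planet_flux[150:160] -= 5.0  # second artificial dip

# Assumed layout: a LABEL column followed by one column per flux measurement.
columns = [f"FLUX.{i + 1}" for i in range(n_points)]
train_data = pd.DataFrame([star_flux, planet_flux], columns=columns)
train_data.insert(0, "LABEL", [0, 1])

fig, axes = plt.subplots(2, 1, sharex=True)
for ax, label, name in zip(axes, [0, 1], ["Star", "Exoplanet"]):
    # Take the first row with this label and drop the LABEL column.
    row = train_data[train_data["LABEL"] == label].iloc[0, 1:]
    ax.plot(row.to_numpy(dtype=float))
    ax.set_title(f"{name} (LABEL = {label})")
    ax.set_ylabel("Flux")
axes[-1].set_xlabel("Observation Index")
fig.tight_layout()
plt.show()
```

Sharing the x-axis keeps the two curves aligned, so any dip in one curve can be checked against the same observation index in the other.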
Visualizing the data helps us spot trends and irregularities, making it easier to understand how the model will learn from the data.