Building and Evaluating a Model

Concept: Now that the data is preprocessed and ready, we can build a machine learning model. Models like K-Nearest Neighbors (KNN), Decision Trees, and Logistic Regression help us classify or predict based on the input data. We also evaluate how well the model performs using metrics like accuracy, precision, and recall.
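To illustrate the three metrics named above, the sketch below scores a set of hypothetical predictions with scikit-learn's `accuracy_score`, `precision_score` and `recall_score`. The label lists are made up for the example.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true labels and model predictions (1 = positive class, 0 = negative class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]

# Accuracy: fraction of all predictions that are correct
accuracy = accuracy_score(y_true, y_pred)
# Precision: of the examples predicted positive, how many really are positive
precision = precision_score(y_true, y_pred)
# Recall: of the truly positive examples, how many the model found
recall = recall_score(y_true, y_pred)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.625 precision=0.667 recall=0.500
```

Here 5 of the 8 predictions are correct (accuracy 0.625). Of the model's 3 "positive" calls, 2 are right (precision about 0.667). It finds 2 of the 4 real positives (recall 0.5).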

Simple Example: Let’s build a model that predicts whether a student will pass or fail based on their scores. First, we train the model with some sample data:

from sklearn.neighbors import KNeighborsClassifier
x_train = [[50, 55], [60, 65], [70, 75], [85, 80]]  # scores in two subjects
y_train = [0, 0, 1, 1]  # 0 = Fail, 1 = Pass
knn = KNeighborsClassifier(n_neighbors=3)  # vote among the 3 nearest students
knn.fit(x_train, y_train)  # KNN essentially stores the labeled training examples

Once the model is trained, we can use it to predict whether a new student will pass or fail:

x_test = [[65, 70]]  # one new student
prediction = knn.predict(x_test)  # returns an array with one label per row of x_test
print('Predicted:', 'Pass' if prediction[0] == 1 else 'Fail')
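The sketch below rebuilds the prediction step from scratch to show what happens inside it. It reproduces the majority vote KNN performs under scikit-learn's default settings (Euclidean distance, equal weight for every neighbor). It is an illustration, not scikit-learn's actual implementation. The function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(x_train, y_train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(x_train, y_train)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


x_train = [[50, 55], [60, 65], [70, 75], [85, 80]]
y_train = [0, 0, 1, 1]  # 0 = Fail, 1 = Pass
label = knn_predict(x_train, y_train, [65, 70], k=3)
print('Predicted:', 'Pass' if label == 1 else 'Fail')  # prints "Predicted: Fail"
```

For the new student at [65, 70], the three nearest students are [60, 65] (Fail), [70, 75] (Pass) and [50, 55] (Fail). The vote is 2 to 1 for Fail.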

Exoplanet Data: In the exoplanet project, we use machine learning models to classify stars and exoplanets. Let’s start with K-Nearest Neighbors (KNN):

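# train_X, train_y and test_X come from the earlier preprocessing step
knn_model = KNeighborsClassifier()  # default: n_neighbors=5
knn_model.fit(train_X, train_y)  # learn from the labeled training set
predictions = knn_model.predict(test_X)  # one predicted class per test sample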
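The snippet above depends on `train_X`, `train_y` and `test_X` from the earlier preprocessing steps, so it will not run on its own. The sketch below runs the same fit/predict flow end to end. It substitutes synthetic data from `make_classification`, which is an assumption and not the real exoplanet measurements. The split ratio and random seeds are also arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the preprocessed exoplanet features (NOT real data):
# 500 samples, 10 features, two classes
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25% of the samples for testing; stratify keeps the class proportions equal in both splits
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

knn_model = KNeighborsClassifier()  # default n_neighbors=5
knn_model.fit(train_X, train_y)
predictions = knn_model.predict(test_X)

# score() reports mean accuracy on the held-out test set
print('Test accuracy:', knn_model.score(test_X, test_y))
```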

We also evaluate the model’s performance with tools such as the confusion matrix and the ROC curve, to see how well the model distinguishes between stars and exoplanets:

from sklearn.metrics import confusion_matrix
# Rows are the true classes, columns the predicted classes
print(confusion_matrix(test_y, predictions))
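The snippet above prints only the confusion matrix. The sketch below shows how to read its four cells and how to compute the ROC curve mentioned in the text. It again uses synthetic stand-in data from `make_classification` (an assumption). It also treats label 1 as the positive "exoplanet" class, which is another assumption. ROC analysis needs a continuous score rather than a hard label, so the code uses `predict_proba` instead of `predict`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the real exoplanet measurements)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

knn_model = KNeighborsClassifier().fit(train_X, train_y)
predictions = knn_model.predict(test_X)

# For binary labels 0/1, rows are true classes and columns are predicted classes,
# so the flattened matrix reads: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(test_y, predictions).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Probability of class 1 for each test sample, used as the ROC score
scores = knn_model.predict_proba(test_X)[:, 1]
# False-positive and true-positive rates as the decision threshold is lowered
fpr, tpr, thresholds = roc_curve(test_y, scores)
print('ROC AUC:', roc_auc_score(test_y, scores))
```

The `fpr` and `tpr` arrays can be handed to a plotting library such as matplotlib to draw the curve itself. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.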

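Building and evaluating models is the final step in this machine learning pipeline. It is crucial to choose the right model and to judge it with metrics suited to the problem. For example, accuracy alone can look high on an imbalanced dataset even when the model misses most of the rare class.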
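One common way to choose a model is to compare the candidates named at the start of this section using k-fold cross-validation. The sketch below does this on synthetic data (an assumption, not the exoplanet set). Ranking by recall is also an assumption. Swap in `scoring='accuracy'` or `scoring='precision'` to rank by another metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT real exoplanet measurements)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    'KNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Logistic Regression': LogisticRegression(max_iter=1000),
}

# 5-fold cross-validation: each model is trained and scored on 5 different train/validation splits
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='recall')
    results[name] = scores.mean()
    print(f"{name}: mean recall = {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print('Best by mean recall:', best)
```

Averaging over several splits gives a steadier comparison than a single train/test split. It does not replace a final check on a held-out test set.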