张芷铭's Personal Blog

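scikit-learn is a Python machine learning library. It provides algorithms for classification, regression, clustering, and dimensionality reduction, along with tools for data preprocessing and model evaluation.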

Official documentation: https://scikit-learn.org/stable/

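Main Features

Feature                     Example algorithms
Classification              KNN, decision trees, random forests, SVM
Regression                  Linear regression, ridge regression, Lasso
Clustering                  K-means, hierarchical clustering, DBSCAN
Dimensionality reduction    PCA, t-SNE
Model selection             Cross-validation, grid search
Preprocessing               Standardization, normalization, missing-value imputation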
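
The preprocessing row can be illustrated with a small sketch. The toy array below is invented for illustration, and "normalization" is assumed here to mean min-max scaling to [0, 1]; scikit-learn also offers a per-sample `Normalizer`, which is a different operation.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data invented for illustration; np.nan marks a missing value.
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 600.0]])

# Missing-value imputation: replace NaN with the mean of its column.
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Standardization: zero mean and unit variance per column.
X_std = StandardScaler().fit_transform(X_filled)

# Normalization (assumed min-max): rescale each column to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X_filled)

print(X_filled)
print(X_std)
print(X_minmax)
```

In a real project these transformers would normally be fitted on the training split only and then applied to the test split.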

Basic Workflow

Installation

pip install scikit-learn

Complete Example

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from joblib import dump, load

# Load the iris dataset and hold out 20% of it as a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a k-nearest-neighbors classifier
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

# Save the trained model to disk
dump(model, 'model.joblib')
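
The example saves the model with `dump`; reading it back uses `load` from the same library. Below is a minimal sketch of that round trip. It retrains the same classifier so it can run on its own, and it writes to a temporary directory (an assumption made here so that no file is left behind) rather than to the working directory.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(model, path)
    restored = load(path)

# The restored estimator should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(X_test), restored.predict(X_test))
print(f"Predictions identical after reload: {same_predictions}")
```

joblib files are pickle-based, so they should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.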

Model Evaluation

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

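# Confusion matrix (rows: true classes, columns: predicted classes)
cm = confusion_matrix(y_test, y_pred)

# Classification metrics; 'macro' is the unweighted mean over classes
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')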
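
The snippet above computes the metrics without displaying them. As a rough, self-contained sketch, the following prints the confusion matrix and a per-class report, and checks that the macro average is the plain mean of the per-class scores. It repeats the data split and classifier from the complete example so that it runs standalone.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
y_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, F1 and support in one text table.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)

# average=None returns one score per class; 'macro' is their unweighted mean.
per_class = precision_score(y_test, y_pred, average=None)
macro = precision_score(y_test, y_pred, average="macro")
print(f"Per-class precision: {per_class}, macro: {macro:.3f}")
```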

Cross-Validation

cv_scores = cross_val_score(model, X, y, cv=5)
print(f"Cross-validation accuracy: {cv_scores.mean():.2f}")
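
It may help to see what `cv=5` stands for. For a classifier with an integer `cv`, scikit-learn uses stratified k-fold splitting, so the call above should match an explicit `StratifiedKFold(n_splits=5)`. The sketch below checks this and also reports the spread across folds, which the mean alone hides.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# Integer cv: 5 stratified folds are chosen automatically for a classifier.
scores_int = cross_val_score(model, X, y, cv=5)

# The same splitter written out explicitly.
scores_explicit = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5))

print(f"Per-fold accuracy: {np.round(scores_int, 3)}")
print(f"Mean: {scores_int.mean():.3f}, std: {scores_int.std():.3f}")
```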

Hyperparameter Tuning

param_grid = {'n_neighbors': [3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.2f}")
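
The best score reported by the search is a mean cross-validation score on the training data, not a test-set result. A possible follow-up, sketched here with the same grid and split as above, is to score the refitted best model on the held-out test set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"n_neighbors": [3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# With the default refit=True, best_estimator_ is retrained on all of X_train.
best_model = grid_search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Held-out test accuracy: {test_accuracy:.2f}")
```

Calling `grid_search.score(X_test, y_test)` directly should give the same number, since the search object delegates to the refitted best estimator.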

Summary

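scikit-learn exposes a consistent API across its estimators:

  • fit(): train a model, or learn transformation parameters, from data
  • predict(): predict targets for new samples
  • transform(): transform data (preprocessors and dimensionality reducers)
  • score(): return an evaluation score (mean accuracy for classifiers, R² for regressors)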

This makes it well suited to building and evaluating machine learning models quickly.
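
The four methods listed in this summary can be seen side by side in a short sketch. It assumes a `StandardScaler` as the transformer and chains it with the KNN classifier via `make_pipeline`; both choices are illustrative rather than something the examples above require.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Transformer: fit() learns per-feature mean and variance, transform() applies them.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

# Pipeline: fit() trains every step, predict() labels new samples,
# score() returns mean accuracy for a classifier.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)

print(f"Pipeline accuracy: {accuracy:.2f}")
```

Because the pipeline itself follows the same interface, it can be passed to `cross_val_score` or `GridSearchCV` like any single estimator.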
