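Learned Perceptual Image Patch Similarity (LPIPS): Toward Image Quality Assessment Aligned with Human Visual Perception

LPIPS (Learned Perceptual Image Patch Similarity) measures perceptual similarity by comparing features extracted with a pretrained neural network, and it agrees with human visual judgments better than traditional metrics such as PSNR and SSIM.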

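Definition and Background

LPIPS was introduced in the CVPR 2018 paper "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric". It uses a deep CNN to extract image features and evaluates image similarity as a distance in that feature space.

Key properties: a lower value means the two images are more similar, and the scores track human perception more closely than PSNR/SSIM.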

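Mathematical Formulation

$d(x, x_{0}) = \sum_{l} \frac{1}{H_{l} W_{l}} \sum_{h, w} \left\| w_{l} \odot \left( \hat{y}_{hw}^{l} - \hat{y}_{0hw}^{l} \right) \right\|_{2}^{2}$

Here $\hat{y}^{l}$ and $\hat{y}_{0}^{l}$ are the activations extracted at layer $l$ for $x$ and $x_{0}$ (unit-normalized along the channel dimension in the paper), $H_{l}$ and $W_{l}$ are the spatial height and width of that layer, and $w_{l}$ is a per-channel scaling weight vector.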
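
To make the formula concrete, here is a minimal NumPy sketch that evaluates it on activations that have already been extracted. It follows the equation as written rather than the internals of any library; the function names `unit_normalize` and `lpips_style_distance`, the `eps` constant, and the list-of-arrays layout are assumptions made for illustration.

```python
import numpy as np

def unit_normalize(feat, eps=1e-10):
    """Scale the channel vector at every spatial position to unit L2 norm.

    feat has shape (C, H, W). eps is an assumed guard against division by zero.
    """
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True))
    return feat / (norm + eps)

def lpips_style_distance(feats_x, feats_x0, weights):
    """Evaluate d(x, x0) from per-layer activations.

    feats_x, feats_x0: lists with one (C_l, H_l, W_l) array per layer.
    weights: list with one (C_l,) array per layer (the w_l vectors).
    """
    total = 0.0
    for y, y0, w in zip(feats_x, feats_x0, weights):
        diff = unit_normalize(y) - unit_normalize(y0)
        scaled = w[:, None, None] * diff        # channel-wise product with w_l
        per_position = np.sum(scaled ** 2, axis=0)  # squared L2 norm over channels
        total += per_position.mean()            # 1 / (H_l W_l) times the sum over h, w
    return float(total)
```

In a real pipeline the activations would come from a pretrained network and the weights from the learned calibration; here they are plain arrays so that each term of the sum can be inspected in isolation.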

Network Choices

Network	Model size	Notes
AlexNet	9.1 MB	Fastest; the recommended default
VGG	58.9 MB	Higher accuracy
SqueezeNet	2.8 MB	Lightweight

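Usage

import lpips
import torch
from PIL import Image
from torchvision import transforms

# Build the metric with the AlexNet backbone
loss_fn = lpips.LPIPS(net='alex')

def preprocess_image(image_path):
    """Load an image as an RGB tensor of shape (1, 3, H, W) scaled to [-1, 1]."""
    transform = transforms.Compose([
        transforms.ToTensor(),  # uint8 [0, 255] -> float [0, 1], HWC -> CHW
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # [0, 1] -> [-1, 1]
    ])
    image = Image.open(image_path).convert('RGB')
    return transform(image).unsqueeze(0)  # add the batch dimension

# Both images must have the same height and width
img0 = preprocess_image('path/to/image0.jpg')
img1 = preprocess_image('path/to/image1.jpg')

with torch.no_grad():  # gradients are not needed for evaluation
    distance = loss_fn(img0, img1)
print(distance.item())  # lower means more similar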
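
The single-pair call above extends naturally to many pairs. The sketch below shows one way to evaluate aligned pairs in fixed-size chunks so that memory use stays bounded. The helper `batched_distances`, its `batch_size` default, and the `mse_stand_in` placeholder are hypothetical names introduced here; the placeholder is a plain mean squared error, not LPIPS, and only exists so the chunking logic can run without a model.

```python
import numpy as np

def batched_distances(metric, imgs0, imgs1, batch_size=16):
    """Apply `metric` to aligned image pairs in chunks; return one value per pair.

    imgs0, imgs1: arrays of shape (N, 3, H, W) holding N aligned pairs.
    metric: callable mapping two (B, 3, H, W) batches to B distances.
    """
    if imgs0.shape != imgs1.shape:
        raise ValueError(f"shape mismatch: {imgs0.shape} vs {imgs1.shape}")
    results = []
    for start in range(0, len(imgs0), batch_size):
        stop = start + batch_size
        out = metric(imgs0[start:stop], imgs1[start:stop])
        # Flatten so outputs shaped (B,) or (B, 1, 1, 1) are handled alike
        results.append(np.asarray(out, dtype=np.float64).reshape(-1))
    return np.concatenate(results) if results else np.empty(0)

def mse_stand_in(a, b):
    """Placeholder metric (per-pair mean squared error), NOT LPIPS."""
    return ((a - b) ** 2).mean(axis=(1, 2, 3))
```

With the `lpips` package, `metric` would presumably be a small wrapper that calls `loss_fn` on tensor batches under `torch.no_grad()` and converts the result to NumPy; that wiring is left out here because it depends on the device and data loader in use.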

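Key Considerations

Image preprocessing: RGB format, normalized to [-1, 1]
Image size: both images must have the same spatial dimensions
Interpreting results: the smaller the value, the more similar the images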
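
The first two points can be checked before any model is involved. The following sketch, written with NumPy only, converts an 8-bit RGB array to the [-1, 1] range in channel-first layout and verifies that two prepared images match in shape. The helper names `to_lpips_range` and `check_pair` are hypothetical and are not part of the `lpips` package.

```python
import numpy as np

def to_lpips_range(image_u8):
    """Convert an (H, W, 3) uint8 RGB array to a (1, 3, H, W) float32 array in [-1, 1]."""
    if image_u8.ndim != 3 or image_u8.shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB array, got {image_u8.shape}")
    scaled = image_u8.astype(np.float32) / 127.5 - 1.0  # 0 -> -1, 255 -> 1
    chw = scaled.transpose(2, 0, 1)                     # HWC -> CHW
    return np.ascontiguousarray(chw[None, ...])         # add the batch dimension

def check_pair(a, b):
    """Raise if the two prepared arrays differ in shape (batch, channels, height, width)."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
```

The scaling is the same arithmetic as dividing by 255 and then normalizing with mean 0.5 and standard deviation 0.5. If the two images differ in size, one of them has to be resized or cropped first; which of the two is appropriate depends on the evaluation protocol.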

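Applications

Image quality assessment
Perceptual loss for GAN training
Image style transfer
Evaluation of image restoration
Evaluation of NeRF novel view synthesis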

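Best Practices

Combine LPIPS with PSNR and SSIM to evaluate from multiple angles
Use net='alex' by default to balance speed and accuracy
Process images in batches for efficiency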
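
As a small illustration of reporting more than one metric, the sketch below computes PSNR with NumPy and places it next to an LPIPS value obtained elsewhere. The `summarize` helper and its dictionary keys are hypothetical, and the `data_range` default assumes 8-bit images.

```python
import numpy as np

def psnr(reference, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    if reference.shape != test.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {test.shape}")
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

def summarize(reference, test, lpips_value):
    """Collect both metrics in one record; lpips_value is computed separately."""
    return {"psnr_db": psnr(reference, test), "lpips": float(lpips_value)}
```

The two numbers move in opposite directions: a better reconstruction has a higher PSNR and a lower LPIPS, so it helps to say which direction is better whenever both appear in the same table.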
