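Getting Started with Object Tracking Using Vision in iOS 11

iOS 11 introduces the Vision framework, which provides face detection, object detection, object tracking and other techniques. This article uses a demo to show briefly how to detect and track objects with Vision. The demo for this article can be downloaded from GitHub.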

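1. About the Vision Framework

Vision is an image analysis framework introduced with iOS 11 that works together with Core ML. Using high-performance image processing and computer vision techniques, it can perform face detection, facial landmark detection, scene classification and more on images and video.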

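2. Object Detection with Vision

Environment

Xcode 9 + iOS 11

Getting the image data

This step assumes you have already started the camera and are receiving CMSampleBufferRef data. Note that the orientation of the sample buffer does not match the orientation in which the UIView displays it, so first rotate the sample buffer to the correct orientation.

You can also skip the rotation, but then every subsequent coordinate conversion must consistently account for the buffer's original orientation.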
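If you skip the rotation, Vision reports results in the unrotated buffer's coordinates, and you have to map them yourself. The sketch below shows that mapping for one case only: it assumes the displayed image is the buffer rotated 90 degrees counterclockwise, and that rectangles use Vision's normalized, lower-left-origin convention. It is plain C, so it compiles unchanged in an Objective-C file; `NormRect` and `NormRectRotate90CCW` are hypothetical names, not Vision API.

```c
#include <assert.h>

/* A normalized rectangle in Vision's convention:
 * origin at the lower-left corner, all values in 0..1. */
typedef struct {
    double x, y, width, height;
} NormRect;

/* Hypothetical helper: given a rect detected in the unrotated buffer, returns
 * where the same region lies once the image is rotated 90 degrees
 * counterclockwise. With a y-up origin, a counterclockwise quarter turn about
 * the image center maps a point (x, y) to (1 - y, x), so width and height swap. */
NormRect NormRectRotate90CCW(NormRect r)
{
    assert(r.width >= 0.0 && r.height >= 0.0);
    NormRect out;
    out.x = 1.0 - (r.y + r.height); /* old top edge becomes the new left edge */
    out.y = r.x;                    /* old left edge becomes the new bottom edge */
    out.width = r.height;
    out.height = r.width;
    return out;
}
```

Applying the function four times returns the original rectangle, which is a quick sanity check for any orientation mapping of this kind.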

#import <Accelerate/Accelerate.h>
#import <CoreMedia/CoreMedia.h>
#import <CoreVideo/CoreVideo.h>

// Release callback for CVPixelBufferCreateWithBytes: frees the memory malloc'ed for
// the vImage rotation once the rotated pixel buffer is released.
// Defined before the method that references it so the compiler knows its signature.
static void freePixelBufferDataAfterRelease(void *releaseRefCon, const void *baseAddress)
{
    free((void *)baseAddress);
}

/*
 * Rotates a sample buffer. The buffer must be 32ARGB or 32BGRA; other formats are
 * not supported and make the method return NULL.
 * rotationConstant:
 * 0 -- rotate 0 degrees (simply copy the data from src to dest)
 * 1 -- rotate 90 degrees counterclockwise
 * 2 -- rotate 180 degrees
 * 3 -- rotate 270 degrees counterclockwise
 * The returned buffer follows the Create rule: the caller must CVPixelBufferRelease it.
 */
+ (CVPixelBufferRef)rotateBuffer:(CMSampleBufferRef)sampleBuffer withConstant:(uint8_t)rotationConstant
{
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);

    OSType pixelFormatType = CVPixelBufferGetPixelFormatType(imageBuffer);
    if (pixelFormatType != kCVPixelFormatType_32ARGB && pixelFormatType != kCVPixelFormatType_32BGRA) {
        CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
        return NULL;
    }

    const size_t kAlignment = 32;     // pad each output row to a multiple of 32 pixels
    const size_t kBytesPerPixel = 4;  // ARGB8888 / BGRA8888

    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);
    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

    // A 90 or 270 degree rotation swaps width and height
    BOOL rotatePerpendicular = (rotationConstant == 1) || (rotationConstant == 3);
    const size_t outWidth = rotatePerpendicular ? height : width;
    const size_t outHeight = rotatePerpendicular ? width : height;

    // Integer round-up of the row length to a multiple of kAlignment pixels
    size_t bytesPerRowOut = kBytesPerPixel * ((outWidth + kAlignment - 1) / kAlignment) * kAlignment;
    const size_t dstSize = bytesPerRowOut * outHeight;

    void *srcBuff = CVPixelBufferGetBaseAddress(imageBuffer);
    unsigned char *dstBuff = (unsigned char *)malloc(dstSize);
    if (dstBuff == NULL) {
        CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
        return NULL;
    }

    vImage_Buffer inbuff = {srcBuff, height, width, bytesPerRow};
    vImage_Buffer outbuff = {dstBuff, outHeight, outWidth, bytesPerRowOut};

    uint8_t bgColor[4] = {0, 0, 0, 0};
    vImage_Error err = vImageRotate90_ARGB8888(&inbuff, &outbuff, rotationConstant, bgColor, kvImageNoFlags);

    CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);

    if (err != kvImageNoError) {
        NSLog(@"vImageRotate90_ARGB8888 failed: %ld", err);
        free(dstBuff);  // no pixel buffer owns this memory yet
        return NULL;
    }

    // Wrap the rotated bytes; freePixelBufferDataAfterRelease frees them when the buffer is released
    CVPixelBufferRef rotatedBuffer = NULL;
    CVPixelBufferCreateWithBytes(kCFAllocatorDefault,
                                 outWidth,
                                 outHeight,
                                 pixelFormatType,
                                 dstBuff,
                                 bytesPerRowOut,
                                 freePixelBufferDataAfterRelease,
                                 NULL,
                                 NULL,
                                 &rotatedBuffer);

    return rotatedBuffer;  // NULL if creation failed
}
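To make the vImage call less of a black box, here is a slow reference version, in plain C, of a 90-degree counterclockwise rotation of 4-byte pixels, together with the padded row-size calculation used above. It is an illustration only, assuming "counterclockwise" is judged with row 0 displayed at the top; it is not how vImage is implemented, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes per output row when the row is padded to a multiple of `alignment`
 * pixels, mirroring the bytesPerRowOut computation above. */
size_t AlignedBytesPerRow(size_t widthPixels, size_t bytesPerPixel, size_t alignment)
{
    assert(alignment > 0);
    return bytesPerPixel * ((widthPixels + alignment - 1) / alignment) * alignment;
}

/* Reference (slow) 90-degree counterclockwise rotation of a 4-byte-per-pixel
 * image stored top row first. The source is width x height; the destination is
 * height x width. As seen on screen, the source's top-right pixel ends up at the
 * destination's top-left. Each pixel's 4 bytes are copied as a unit, so the
 * channel order (ARGB or BGRA) does not matter. */
void Rotate90CCW_4ByteReference(const uint8_t *src, size_t width, size_t height, size_t srcRowBytes,
                                uint8_t *dst, size_t dstRowBytes)
{
    assert(srcRowBytes >= width * 4 && dstRowBytes >= height * 4);
    for (size_t row = 0; row < height; row++) {
        for (size_t col = 0; col < width; col++) {
            /* Source (row, col) moves to destination (width - 1 - col, row) */
            const uint8_t *from = src + row * srcRowBytes + col * 4;
            uint8_t *to = dst + (width - 1 - col) * dstRowBytes + row * 4;
            memcpy(to, from, 4);
        }
    }
}
```

For a 1920 x 1080 frame rotated to portrait, the output row is 1080 pixels, which rounds up to 1088, so AlignedBytesPerRow(1080, 4, 32) gives 4352 bytes.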


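Object detection

Once you have the image data, you can run detection. The process is simple:

  1. Create a detection request, VNDetectRectanglesRequest (which detects rectangular regions in the image)
  2. Create a VNImageRequestHandler from the image source (a CVPixelBuffer, or a CGImage such as one taken from a UIImage)
  3. Call -[VNImageRequestHandler performRequests:error:] to run the detection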

- (void)detectObjectWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();

    // VNRequestCompletionHandler is Vision's typedef for request completion blocks
    VNRequestCompletionHandler completionHandler = ^(VNRequest *request, NSError * _Nullable error)
    {
        CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
        NSLog(@"Detection time: %f s", end - start);

        if (!error && request.results.count > 0) {
            // TODO: handle the detection results (VNRectangleObservation objects) here,
            // e.g. store each one in self.lastObsercationsDic keyed by observation.uuid.UUIDString
        }
    };

    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCVPixelBuffer:pixelBuffer options:@{}];
    VNDetectRectanglesRequest *request = [[VNDetectRectanglesRequest alloc] initWithCompletionHandler:completionHandler];
    request.minimumAspectRatio = 0.1; // accept rectangles with an aspect ratio as low as 0.1
    request.maximumObservations = 0;  // 0 = no limit on the number of results

    NSError *error = nil;
    if (![handler performRequests:@[request] error:&error]) {
        NSLog(@"Detection failed: %@", error);
    }
}

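Displaying the detection results

Detection returns an array of VNDetectedObjectObservation objects (VNRectangleObservation for rectangle detection), each with three properties: confidence, uuid and boundingBox. Vision's coordinate system resembles OpenGL texture coordinates: the origin is the lower-left corner of the image, and coordinates are normalized to the range 0 to 1. Results therefore have to be converted before they are drawn on screen.

The three coordinate systems differ as follows:

| Coordinate system | Origin | Range |
| --- | --- | --- |
| UIKit | Top-left | View size, in points |
| AVFoundation | Top-left | 0 to 1 |
| Vision | Bottom-left | 0 to 1 |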
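The conversions implied by this table can be written out directly. This plain-C sketch uses hypothetical names and assumes the view shows the entire, already-rotated frame with no cropping or letterboxing, and that the top-left normalized space has the same orientation as the Vision result; an aspect-fill preview, for example, would need an extra offset and scale.

```c
#include <assert.h>

typedef struct { double x, y, width, height; } RectD;

/* Vision (normalized, origin lower-left) -> normalized with origin top-left,
 * as in the AVFoundation row of the table. Only the y axis flips: the rect's
 * top edge, at y + height in Vision, becomes its origin. */
RectD VisionToTopLeftNormalized(RectD v)
{
    RectD r = { v.x, 1.0 - v.y - v.height, v.width, v.height };
    return r;
}

/* Vision -> UIKit points for a view of size viewWidth x viewHeight. */
RectD VisionToUIKit(RectD v, double viewWidth, double viewHeight)
{
    assert(viewWidth > 0.0 && viewHeight > 0.0);
    RectD n = VisionToTopLeftNormalized(v);
    RectD r = { n.x * viewWidth, n.y * viewHeight, n.width * viewWidth, n.height * viewHeight };
    return r;
}
```

For a 400 x 800 point view, the Vision rect (0.25, 0.25, 0.5, 0.25) becomes the UIKit rect (100, 400, 200, 200).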

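The display code below uses a CGAffineTransform to convert the coordinates, draws a red border around each converted rectangle, and shows the confidence values on screen.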


- (void)overlayImageWithSize:(CGSize)size
{
    // Copy the observations so both loops below work on the same stable snapshot
    NSDictionary *observationsCopy = [NSDictionary dictionaryWithDictionary:self.lastObsercationsDic];
    NSArray *keyArr = [observationsCopy allKeys];

    UIGraphicsImageRenderer *renderer = [[UIGraphicsImageRenderer alloc] initWithSize:size];

    // UIGraphicsImageDrawingActions is UIKit's typedef for the renderer's drawing block
    UIGraphicsImageDrawingActions drawingActions = ^(UIGraphicsImageRendererContext *rendererContext)
    {
        // Convert Vision coordinates (normalized, origin at lower-left) to view coordinates:
        // x' = x * width, y' = (1 - y) * height
        CGAffineTransform transform = CGAffineTransformIdentity;
        transform = CGAffineTransformScale(transform, size.width, -size.height);
        transform = CGAffineTransformTranslate(transform, 0, -1);

        // Draw a red box around each observation
        [[UIColor redColor] setStroke];
        for (NSString *uuid in keyArr) {
            VNDetectedObjectObservation *rectangleObservation = observationsCopy[uuid];
            CGRect box = CGRectApplyAffineTransform(rectangleObservation.boundingBox, transform);
            UIBezierPath *path = [UIBezierPath bezierPathWithRect:box];
            path.lineWidth = 4.0f;
            [path stroke];
        }
    };

    UIImage *overlayImage = [renderer imageWithActions:drawingActions];

    // Collect each observation's confidence for the info label
    NSMutableString *trackInfoStr = [NSMutableString string];
    for (NSString *uuid in keyArr) {
        VNDetectedObjectObservation *rectangleObservation = observationsCopy[uuid];
        [trackInfoStr appendFormat:@"Confidence: %.2f\n", rectangleObservation.confidence];
    }

    // UIKit views must be updated on the main thread
    dispatch_async(dispatch_get_main_queue(), ^{
        self.highlightView.image = overlayImage;
        self.infoLabel.text = trackInfoStr;
    });
}
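Why does scaling by (width, -height) and then translating by (0, -1) work? CGAffineTransformScale and CGAffineTransformTranslate prepend their operation to the existing transform, so a point is first translated and then scaled: y' = -height * (y - 1) = (1 - y) * height. The plain-C code below re-implements that arithmetic with the same six-field layout as CGAffineTransform so the composition can be checked numerically; it mirrors the math and is not the CoreGraphics API, and all names are hypothetical.

```c
#include <assert.h>

/* Same six fields and row-vector convention as CGAffineTransform:
 * x' = a*x + c*y + tx,  y' = b*x + d*y + ty */
typedef struct { double a, b, c, d, tx, ty; } Affine;
typedef struct { double x, y, width, height; } AffRect;

Affine AffineIdentity(void) { Affine t = {1, 0, 0, 1, 0, 0}; return t; }

/* Like CGAffineTransformScale: scale first, then apply t */
Affine AffineScale(Affine t, double sx, double sy)
{
    Affine r = { t.a * sx, t.b * sx, t.c * sy, t.d * sy, t.tx, t.ty };
    return r;
}

/* Like CGAffineTransformTranslate: translate first, then apply t */
Affine AffineTranslate(Affine t, double dx, double dy)
{
    Affine r = t;
    r.tx = t.a * dx + t.c * dy + t.tx;
    r.ty = t.b * dx + t.d * dy + t.ty;
    return r;
}

/* Like CGRectApplyAffineTransform: bounding box of the four transformed corners */
AffRect AffineApplyToRect(Affine t, AffRect r)
{
    double xs[2] = { r.x, r.x + r.width };
    double ys[2] = { r.y, r.y + r.height };
    double minX = 0, minY = 0, maxX = 0, maxY = 0;
    for (int i = 0; i < 4; i++) {
        double px = xs[i & 1], py = ys[i >> 1];
        double qx = t.a * px + t.c * py + t.tx;
        double qy = t.b * px + t.d * py + t.ty;
        if (i == 0 || qx < minX) minX = qx;
        if (i == 0 || qx > maxX) maxX = qx;
        if (i == 0 || qy < minY) minY = qy;
        if (i == 0 || qy > maxY) maxY = qy;
    }
    AffRect out = { minX, minY, maxX - minX, maxY - minY };
    return out;
}
```

Composing AffineTranslate(AffineScale(AffineIdentity(), 400, -800), 0, -1) yields a = 400, d = -800, ty = 800, and maps the Vision rect (0.25, 0.25, 0.5, 0.25) to (100, 400, 200, 200), the same result as the closed form x * width, (1 - y - h) * height.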


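3. Object Tracking

Object tracking works on consecutive video frames, so it needs a VNSequenceRequestHandler, which processes a sequence of images. It also needs a VNDetectedObjectObservation to serve as the reference, that is, the object to track. You can use a result from object detection, or specify a rectangle yourself (for example with +[VNDetectedObjectObservation observationWithBoundingBox:]). Because the coordinate systems differ, a rectangle you specify directly must first be converted into Vision's coordinate system.

When tracking multiple objects, use VNDetectedObjectObservation.uuid to tell them apart and handle each one accordingly.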
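Converting a rectangle you specify yourself, such as a box the user draws over the preview, is the inverse of the detection-display conversion. A plain-C sketch, assuming the view displays the full, already-rotated frame with no cropping; `BoxD` and `UIKitRectToVision` are hypothetical names.

```c
#include <assert.h>

typedef struct { double x, y, width, height; } BoxD;

/* Hypothetical helper: converts a rectangle in UIKit points (origin top-left)
 * inside a view of size viewWidth x viewHeight into Vision's normalized,
 * lower-left-origin coordinates. */
BoxD UIKitRectToVision(BoxD r, double viewWidth, double viewHeight)
{
    assert(viewWidth > 0.0 && viewHeight > 0.0);
    BoxD v;
    v.x = r.x / viewWidth;
    v.width = r.width / viewWidth;
    v.height = r.height / viewHeight;
    /* The rect's bottom edge in UIKit (r.y + r.height) becomes its origin in Vision */
    v.y = 1.0 - (r.y + r.height) / viewHeight;
    return v;
}
```

The result, as a CGRect, is what would be passed to observationWithBoundingBox: to create the reference observation.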


- (void)objectTrackWithPixelBuffer:(CVPixelBufferRef)pixelBuffer
{
    // The sequence handler keeps state across frames, so create it once and reuse it
    if (!self.sequenceHandler) {
        self.sequenceHandler = [[VNSequenceRequestHandler alloc] init];
    }

    NSArray<NSString *> *observationKeys = self.lastObsercationsDic.allKeys;
    NSMutableArray<VNTrackObjectRequest *> *trackRequests = [NSMutableArray array];

    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (NSString *key in observationKeys) {
        // The last known observation of this object is the reference for this frame
        VNDetectedObjectObservation *observation = self.lastObsercationsDic[key];

        VNTrackObjectRequest *trackObjectRequest = [[VNTrackObjectRequest alloc] initWithDetectedObjectObservation:observation completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
            CFAbsoluteTime end = CFAbsoluteTimeGetCurrent();
            NSLog(@"Tracking time: %f s", end - start);

            if (nil == error && request.results.count > 0) {
                // TODO: handle the tracking result, e.g. store request.results.firstObject
                // back under `key` so the next frame continues from the updated position
            } else {
                // TODO: handle tracking failure, e.g. stop tracking this object
            }
        }];
        trackObjectRequest.trackingLevel = VNRequestTrackingLevelAccurate;

        [trackRequests addObject:trackObjectRequest];
    }

    NSError *error = nil;
    if (![self.sequenceHandler performRequests:trackRequests onCVPixelBuffer:pixelBuffer error:&error]) {
        NSLog(@"Tracking failed: %@", error);
    }
}
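The two TODO branches are where per-object bookkeeping goes. The plain-C sketch below models that logic outside Vision: each tracked object is keyed by its uuid string, as in lastObsercationsDic, its box is replaced when a frame's tracking confidence reaches a threshold, and it is dropped otherwise. The fixed-size table, the function names and the confidence cutoff are all assumptions for illustration, not part of the Vision API; a suitable threshold would have to be found by testing.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRACKED 16

typedef struct { double x, y, width, height; } TrackBox;

typedef struct {
    char uuid[37];   /* 36-character UUID string + NUL */
    TrackBox box;    /* last known bounding box, Vision coordinates */
} TrackedObject;

typedef struct {
    TrackedObject items[MAX_TRACKED];
    int count;
} TrackerTable;

/* Returns the index of the object with this uuid, or -1 if it is not tracked. */
int TrackerFind(const TrackerTable *table, const char *uuid)
{
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->items[i].uuid, uuid) == 0) return i;
    }
    return -1;
}

/* Starts tracking a new object; returns 0 on success, -1 if the table is full. */
int TrackerAdd(TrackerTable *table, const char *uuid, TrackBox box)
{
    assert(strlen(uuid) < sizeof(table->items[0].uuid));
    if (table->count >= MAX_TRACKED) return -1;
    TrackedObject *item = &table->items[table->count++];
    strcpy(item->uuid, uuid);
    item->box = box;
    return 0;
}

/* Applies one frame's tracking result for `uuid`. Keeps the new box when the
 * confidence reaches `minConfidence`; otherwise treats the object as lost and
 * removes it. Returns 1 if the object is still tracked, 0 otherwise. */
int TrackerUpdate(TrackerTable *table, const char *uuid, TrackBox box,
                  double confidence, double minConfidence)
{
    int i = TrackerFind(table, uuid);
    if (i < 0) return 0;
    if (confidence >= minConfidence) {
        table->items[i].box = box;
        return 1;
    }
    table->items[i] = table->items[--table->count]; /* drop by moving the last entry into its slot */
    return 0;
}
```

Removing by moving the last entry into the freed slot keeps removal constant-time; the order of entries does not matter here because objects are always looked up by uuid.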

Result

[Figure: screenshot of the demo]

4. Performance

Test device

iPhone 6 Plus, iOS 11.0 (15A5318g)

Frame sampling: 1 in every 10 frames is processed
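The article does not show how frames were skipped. One simple way, sketched in plain C with hypothetical names, is to count frames in the capture callback and pass only every tenth one to Vision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame sampler: the capture callback fires for every frame, but
 * only every `interval`-th frame is handed to Vision (interval = 10 for the
 * 1-in-10 setting used in these measurements). */
typedef struct {
    uint64_t frameIndex;
    uint32_t interval;
} FrameSampler;

/* Returns true for frames 0, interval, 2 * interval, ... and advances the counter. */
bool FrameSamplerShouldProcess(FrameSampler *sampler)
{
    assert(sampler->interval > 0);
    bool process = (sampler->frameIndex % sampler->interval) == 0;
    sampler->frameIndex++;
    return process;
}
```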

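Object detection

Memory

Stable at around 40 MB

Time

About 50 ms on average

Object tracking

Memory

Around 40 MB, the same as object detection

Time

Somewhat lower, ranging from 20 to 40 ms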

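5. Summary

Vision is an easy-to-use framework with good performance. Besides object tracking, it offers image classification, face detection, facial landmark detection, face tracking, text detection and more. These features are used much like object detection, so this article does not cover them further.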
