ios 11 新出了Vision 框架,提供了人脸识别、物体检测、物体跟踪等技术。本文将通过一个Demo简单介绍如何使用Vision框架进行物体检测和物体跟踪。本文Demo可以在Github 上下载。
1. 关于Vision框架 Vision 是伴随ios 11 推出的基于CoreML的图形处理框架。运用高性能图形处理和视觉技术,可以对图像和视频进行人脸检测、特征点检测和场景识别等。
2. 使用vision 进行物体识别 环境 Xcode 9 + ios 11
获取图像数据 该步骤假设你已经调起系统相机,并获得 CMSampleBufferRef
数据。注意返回的simpleBuffer 方向和UIView 显示方向不一致,所以先对simpleBuffer 旋转到正确的方向。
当然也可以不进行旋转,但是要保证后续坐标转换的一致性。
/* * 注意旋转SampleBuffer 为argb或者bgra格式,其他格式可能不支持 * rotationConstant: * 0 -- rotate 0 degrees (simply copy the data from src to dest) * 1 -- rotate 90 degrees counterclockwise * 2 -- rotate 180 degress * 3 -- rotate 270 degrees counterclockwise */ + (CVPixelBufferRef)rotateBuffer:(CMSampleBufferRef)sampleBuffer withConstant:(uint8_t)rotationConstant { CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer); CVPixelBufferLockBaseAddress(imageBuffer, 0); OSType pixelFormatType = CVPixelBufferGetPixelFormatType(imageBuffer); // NSAssert(pixelFormatType == kCVPixelFormatType_32ARGB, @"Code works only with 32ARGB format. Test/adapt for other formats!"); const size_t kAlignment_32ARGB = 32; const size_t kBytesPerPixel_32ARGB = 4; size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer); size_t width = CVPixelBufferGetWidth(imageBuffer); size_t height = CVPixelBufferGetHeight(imageBuffer); BOOL rotatePerpendicular = (rotationConstant == 1) || (rotationConstant == 3); // Use enumeration values here const size_t outWidth = rotatePerpendicular ? height : width; const size_t outHeight = rotatePerpendicular ? width : height; size_t bytesPerRowOut = kBytesPerPixel_32ARGB * ceil(outWidth * 1.0 / kAlignment_32ARGB) * kAlignment_32ARGB; const size_t dstSize = bytesPerRowOut * outHeight * sizeof(unsigned char); void *srcBuff = CVPixelBufferGetBaseAddress(imageBuffer); unsigned char *dstBuff = (unsigned char *)malloc(dstSize); vImage_Buffer inbuff = {srcBuff, height, width, bytesPerRow}; vImage_Buffer outbuff = {dstBuff, outHeight, outWidth, bytesPerRowOut}; uint8_t bgColor[4] = {0, 0, 0, 0}; vImage_Error err = vImageRotate90_ARGB8888(&inbuff, &outbuff, rotationConstant, bgColor, 0); if (err != kvImageNoError) { NSLog(@"%ld", err); } CVPixelBufferUnlockBaseAddress(imageBuffer, 0); CVPixelBufferRef rotatedBuffer = NULL; CVPixelBufferCreateWithBytes(NULL, outWidth, outHeight, pixelFormatType, outbuff.data, bytesPerRowOut, freePixelBufferDataAfterRelease, NULL, NULL, &rotatedBuffer); return rotatedBuffer; } void freePixelBufferDataAfterRelease(void *releaseRefCon, const void *baseAddress) { // Free the memory we malloced for the vImage rotation free((void *)baseAddress); }
物体检测 拿到图像数据后就可以进行物体检测,物体检测流程很简单:
创建一个物体检测请求 VNDetectRectanglesRequest
根据数据源(pixelBuffer 或者 UIImage)创建一个 VNImageRequestHandler
调用[VNImageRequestHandler performRequests] 执行检测
- (void)detectObjectWithPixelBuffer:(CVPixelBufferRef)pixelBuffer { CFAbsoluteTime start = CFAbsoluteTimeGetCurrent(); void (^ VNRequestCompletionHandler)(VNRequest *request, NSError * _Nullable error) = ^(VNRequest *request, NSError * _Nullable error) { CFAbsoluteTime end = CFAbsoluteTimeGetCurrent(); NSLog(@"检测耗时: %f", end - start); if (!error && request.results.count > 0) { // TODO 这里处理检测结果 return ; } }; VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithCVPixelBuffer:pixelBuffer options:@{}]; VNDetectRectanglesRequest *request = [[VNDetectRectanglesRequest alloc] initWithCompletionHandler:VNRequestCompletionHandler]; request.minimumAspectRatio = 0.1; // 最小长宽比设为0.1 request.maximumObservations = 0; // 不限制检测结果 [handler performRequests:@[request] error:nil]; }
显示检测结果 物体检测返回结果是一个 VNDetectedObjectObservation
的结果集,包含confidence
, uuid
和 boundingBox
三种属性。 因为vision坐标系类似opengl的纹理坐标系,以屏幕左下角为坐标原点,并做了归一化。所以将显示结果投影到屏幕时,还需要进行坐标系的转换。
三种坐标系的区别:
坐标系
原点
长宽
UIKit坐标系
左上角
屏幕大小
AVFoundation坐标系
左上角
0 - 1
Vision坐标系
左下角
0 - 1
显示代码如下,使用CGAffineTransform
进行坐标转换,并根据转换后矩形绘制红色边框。同时打印confidence
信息到屏幕上。
- (void)overlayImageWithSize:(CGSize)size { NSDictionary *lastObsercationDicCopy = [NSDictionary dictionaryWithDictionary:self.lastObsercationsDic]; NSArray *keyArr = [lastObsercationDicCopy allKeys]; UIGraphicsImageRenderer *renderer = [[UIGraphicsImageRenderer alloc] initWithSize:CGSizeMake(size.width, size.height)]; void (^UIGraphicsImageDrawingActions)(UIGraphicsImageRendererContext *rendererContext) = ^(UIGraphicsImageRendererContext *rendererContext) { // 将vision坐标转换为屏幕坐标 CGAffineTransform transform = CGAffineTransformIdentity; transform = CGAffineTransformScale(transform, size.width, -size.height); transform = CGAffineTransformTranslate(transform, 0, -1); for (NSString *uuid in keyArr) { VNDetectedObjectObservation *rectangleObservation = lastObsercationDicCopy[uuid]; // 绘制红框 [[UIColor redColor] setStroke]; UIBezierPath *path = [UIBezierPath bezierPathWithRect:CGRectApplyAffineTransform(rectangleObservation.boundingBox, transform)]; path.lineWidth = 4.0f; [path stroke]; } }; UIImage *overlayImage = [renderer imageWithActions:UIGraphicsImageDrawingActions]; NSMutableString *trackInfoStr = [NSMutableString string]; for (NSString *uuid in keyArr) { VNDetectedObjectObservation *rectangleObservation = lastObsercationDicCopy[uuid]; [trackInfoStr appendFormat:@"置信度 : %.2f \n", rectangleObservation.confidence]; } dispatch_async(dispatch_get_main_queue(), ^{ self.highlightView.image = overlayImage; self.infoLabel.text = trackInfoStr; }); }
3. 物体跟踪 物体跟踪需要处理连续的视频帧,所以需要创建VNSequenceRequestHandler
处理多帧图像。同时还需要一个VNDetectedObjectObservation
对象 做为参考源。你可以使用物体检测的结果,或者指定一个矩形作为物体跟踪的参考源。注意因为坐标系不同,如果直接指定矩形作为参考源时,需要事先进行正确的坐标转换。
跟踪多物体时,可以使用VNDetectedObjectObservation.uuid
区分跟踪对象,并做相应处理。
- (void)objectTrackWithPixelBuffer:(CVPixelBufferRef)pixelBuffer { if (!self.sequenceHandler) { self.sequenceHandler = [[VNSequenceRequestHandler alloc] init]; } NSArray<NSString *> *obsercationKeys = self.lastObsercationsDic.allKeys; NSMutableArray<VNTrackObjectRequest *> *obsercationRequest = [NSMutableArray array]; CFAbsoluteTime start = CFAbsoluteTimeGetCurrent(); for (NSString *key in obsercationKeys) { VNDetectedObjectObservation *obsercation = self.lastObsercationsDic[key]; VNTrackObjectRequest *trackObjectRequest = [[VNTrackObjectRequest alloc] initWithDetectedObjectObservation:obsercation completionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) { CFAbsoluteTime end = CFAbsoluteTimeGetCurrent(); NSLog(@"跟踪耗时: %f", end - start); if (nil == error && request.results.count > 0) { // TODO 处理跟踪结果 } else { // 跟踪失败处理 } }]; trackObjectRequest.trackingLevel = VNRequestTrackingLevelAccurate; [obsercationRequest addObject:trackObjectRequest]; } NSError *error = nil; [self.sequenceHandler performRequests:obsercationRequest onCVPixelBuffer:pixelBuffer error:&error]; }
效果图
4. 性能 测试机型 iphone6p ios 11.0(15A5318g)
1/10 取帧率
物体检测 内存 稳定在40M左右
耗时 平均在50ms左右
物体跟踪 内存 和物体检测一样在40M左右
耗时 相对低些,20-40ms不等
5. 总结 Vision是一个比较好用的框架,性能也不错。除了物体跟踪,Vision还提供图像分类 、人脸识别 、人脸特征提取 、人脸追踪 、文字识别 等功能,使用方法和物体检测类似,本文就不再进行过多描述。
参考文档 Getting Started with Vision