In audio/video development, the first stage we run into is capture, and capture differs by platform, for example Android, iOS, and PC. In this chapter we focus on the iOS side of audio/video development.

[Figure: end-to-end audio/video pipeline]

As the diagram shows, the live-streaming pipeline starts with capture, followed by encoding, muxing, pushing, relaying, pulling, decoding, and rendering.

Today we walk through the iOS video capture workflow step by step.

1. Video Capture Workflow

The basic structure of the iOS capture stack is shown below:

[Figure: iOS capture architecture]

As the diagram shows, input is created with AVCaptureDeviceInput, output is produced by pairing the session with AVCaptureMovieFileOutput (or AVCaptureStillImageOutput), and preview is available through AVCaptureVideoPreviewLayer. This chapter gives a brief walkthrough of the whole flow.

Creating the session

// The capture session that maps inputs to outputs
AVCaptureSession* session = [[AVCaptureSession alloc] init];
Getting a handle to the capture device
// Discover the system's capture devices
AVCaptureDeviceDiscoverySession* deviceDiscoverySession = [AVCaptureDeviceDiscoverySession discoverySessionWithDeviceTypes:@[AVCaptureDeviceTypeBuiltInWideAngleCamera] mediaType:AVMediaTypeVideo position:self.config.position];
NSArray* devices = deviceDiscoverySession.devices;
for (AVCaptureDevice* device in devices) {
   if (device.position == self.config.position) {
       self.device = device;
       break;
   }
}

The relevant method prototype:

/*!
 * @brief Creates a discovery session for the requested capture devices
 * @param deviceTypes The device types to look for; see the AVCaptureDeviceType constants (covered in more detail later).
 * @param mediaType The media type to capture: audio or video.
 * @param position The camera position to use: front or back.
 * @return On success, a discovery session whose devices array contains the matching capture devices.
 */
+ (instancetype)discoverySessionWithDeviceTypes:(NSArray<AVCaptureDeviceType> *)deviceTypes mediaType:(nullable AVMediaType)mediaType position:(AVCaptureDevicePosition)position;
@end

At this point we have a pointer to the capture device, which can be used to create the input.
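Before creating an input from this device, it is also worth confirming that the app actually has camera permission. Here is a minimal sketch (not part of the original code; checkCameraPermission: is a hypothetical helper) using the standard AVCaptureDevice authorization API:

// Hypothetical helper: request camera access before configuring the session.
- (void)checkCameraPermission:(void (^)(BOOL granted))completion {
    AVAuthorizationStatus status = [AVCaptureDevice authorizationStatusForMediaType:AVMediaTypeVideo];
    if (status == AVAuthorizationStatusAuthorized) {
        completion(YES);
    } else if (status == AVAuthorizationStatusNotDetermined) {
        // The system prompt is shown only once; the callback may arrive on a background queue.
        [AVCaptureDevice requestAccessForMediaType:AVMediaTypeVideo completionHandler:^(BOOL granted) {
            dispatch_async(dispatch_get_main_queue(), ^{ completion(granted); });
        }];
    } else {
        // Denied or restricted: capture will not work until the user changes Settings.
        completion(NO);
    }
}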

Configuring the session

Next we configure the session so that it can take the input coming from the device and turn it into the output we need.

[self.session beginConfiguration];

// Create the input from the device; it will then be added to the session
NSError* error = nil;
self.videoInput = [[AVCaptureDeviceInput alloc] initWithDevice:self.device error:&error];
if (error) {
    NSLog(@"%s:%d init input error!!!", __func__, __LINE__);
    return;
}

// Add the input to the session
if ([self.session canAddInput:self.videoInput]) {
    [self.session addInput:self.videoInput];
}

// Configure the session's output
self.videoOutput = [[AVCaptureVideoDataOutput alloc] init];

// Do not discard late video frames
self.videoOutput.alwaysDiscardsLateVideoFrames = NO;

// Set the output pixel buffer format; supported values include:
// kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
// kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
// kCVPixelFormatType_32BGRA
[self.videoOutput setVideoSettings:@{(__bridge NSString*)kCVPixelBufferPixelFormatTypeKey:@(self.config.pixelBufferType)}];

// Set the output's sample buffer delegate; the delegate must implement AVCaptureVideoDataOutputSampleBufferDelegate.
dispatch_queue_t captureQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
[self.videoOutput setSampleBufferDelegate:self queue:captureQueue];
if ([self.session canAddOutput:self.videoOutput]) {
    [self.session addOutput:self.videoOutput];
}

// Configure the connection
AVCaptureConnection* connect = [self.videoOutput connectionWithMediaType:AVMediaTypeVideo];
// Set the video orientation of the source; see the AVCaptureVideoOrientation enum.
connect.videoOrientation = self.config.orientation;
if ([connect isVideoStabilizationSupported]) {
    connect.preferredVideoStabilizationMode = AVCaptureVideoStabilizationModeAuto;
}
// Set the zoom/crop factor; in practice the effect is not as good as adjusting the layer's geometry.
connect.videoScaleAndCropFactor = connect.videoMaxScaleAndCropFactor;

[self.session commitConfiguration];

Starting capture

- (void)startCapture {
    if (self.session) {
        [self.session startRunning];
    }
}

Stopping capture

- (void)stopCapture {
    if (self.session) {
        [self.session stopRunning];
    }
}
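One caveat: startRunning and stopRunning are blocking calls, so Apple recommends not invoking them on the main thread. A small sketch, assuming a dedicated serial queue of our own (sessionQueue here is not an AVFoundation API):

// A serial queue owned by the capture class, created once (e.g. in init).
dispatch_queue_t sessionQueue = dispatch_queue_create("capture.session.queue", DISPATCH_QUEUE_SERIAL);

// Start the session off the main thread; startRunning blocks until the session is running.
dispatch_async(sessionQueue, ^{
    if (!self.session.isRunning) {
        [self.session startRunning];
    }
});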

Handling the data callback

#pragma mark AVCaptureVideoDataOutputSampleBufferDelegate
- (void)captureOutput:(AVCaptureOutput *)output didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    if (self.delegate && [self.delegate respondsToSelector:@selector(onVideoWithSampleBuffer:)]) {
        [self.delegate onVideoWithSampleBuffer:sampleBuffer];
    }
}
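Inside onVideoWithSampleBuffer: (or directly in the delegate method), the raw pixel buffer and the presentation timestamp can be extracted from the CMSampleBufferRef before the frame is handed on. A minimal sketch; self.videoEncoder is a placeholder for whatever object exposes the encodeVideoData:timeStamp: method shown later in section 3.1:

- (void)onVideoWithSampleBuffer:(CMSampleBufferRef)sampleBuffer {
    // The image buffer holds the frame in the pixel format configured on the output above.
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    if (!pixelBuffer) return;

    // Presentation timestamp of the frame, converted to milliseconds.
    CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
    uint64_t timeStampMs = (uint64_t)(CMTimeGetSeconds(pts) * 1000);

    // Hand the frame to the hardware encoder (hypothetical property).
    [self.videoEncoder encodeVideoData:pixelBuffer timeStamp:timeStampMs];
}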

[Figure: capture preview result]


2. Beauty Filters

To apply beauty filters, we switch to GPUImage for obtaining the video data.

GPUImage is a well-known third-party library that can apply real-time filters to recorded video.

Roughly speaking, the framework processes each video frame's colors with OpenGL shaders, stores the result in a framebuffer, and can then process that data again. Repeating this process chains multiple filters together.

We won't go into the implementation details here; instead we briefly cover how to use GPUImage, how to apply a beauty filter, and how to get hold of the audio and video data.

Using GPUImage

The main GPUImage code lives in the AWGPUImageAVCapture class.

When initializing the AWAVCaptureManager object, set captureType to AWAVCaptureTypeGPUImage and the AWGPUImageAVCapture class will automatically be used to capture the video data.

The code is in the onInit method:

-(void)onInit{
    //Camera initialization
    // AWGPUImageVideoCamera inherits from GPUImageVideoCamera. The subclass exists to get at the audio data; in the original code the audio is sent to audioEncodingTarget by default.
    // That target is a GPUImageMovieWriter, i.e. file-writing, so we simply override processAudioSampleBuffer and handle the audio data ourselves.
    // That takes care of audio; GPUImage's main job is the video processing below.
    // Set the preview resolution: self.captureSessionPreset is derived from AWVideoConfig. Also choose the front or back camera.
    _videoCamera = [[AWGPUImageVideoCamera alloc] initWithSessionPreset:self.captureSessionPreset cameraPosition:AVCaptureDevicePositionFront];

    //Enable audio capture
    [_videoCamera addAudioInputsAndOutputs];

    //Set the output image orientation; useful for landscape streaming.
    _videoCamera.outputImageOrientation = UIInterfaceOrientationPortrait;

    //Mirroring policy; this configuration feels the most natural and matches the system camera's default.
    _videoCamera.horizontallyMirrorRearFacingCamera = NO;
    _videoCamera.horizontallyMirrorFrontFacingCamera = YES;

    //Set up the preview view
    _gpuImageView = [[GPUImageView alloc] initWithFrame:self.preview.bounds];
    [self.preview addSubview:_gpuImageView];

    //Initialize the beauty filter
    _beautifyFilter = [[GPUImageBeautifyFilter alloc] init];

    //Feed the camera's video output into the beauty filter
    [_videoCamera addTarget:_beautifyFilter];

    //Feed the filtered output into the preview
    [_beautifyFilter addTarget:_gpuImageView];

    // At this point we can already open the camera and preview it.
    // Since we need to push the stream, we also have to grab the video data in addition to previewing it. GPUImage's GPUImageRawDataOutput does this: it outputs the filtered data so we can process it and send it out.
    // AWGPUImageAVCaptureDataHandler inherits from GPUImageRawDataOutput; the filtered output can be obtained in its newFrameReadyAtTime method.
    // The output image format is BGRA.
    _dataHandler = [[AWGPUImageAVCaptureDataHandler alloc] initWithImageSize:CGSizeMake(self.videoConfig.width, self.videoConfig.height) resultsInBGRAFormat:YES capture:self];
    [_beautifyFilter addTarget:_dataHandler];

    // Make AWGPUImageAVCaptureDataHandler implement the AWGPUImageVideoCameraDelegate protocol and point the camera's awAudioDelegate at the _dataHandler object.
    // The audio data is then routed into _dataHandler, so both audio and video end up being handled there.
    _videoCamera.awAudioDelegate = _dataHandler;

    //Start capturing video
    [self.videoCamera startCameraCapture];

    //Adjust the frame rate
    [self updateFps:self.videoConfig.fps];
}
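updateFps: is not shown in the snippet above. One way to sketch it, assuming GPUImageVideoCamera exposes its underlying AVCaptureDevice as inputCamera (true for the GPUImage versions we have seen), is to clamp the device's frame-duration range:

// A sketch of updateFps:; a production version should also check
// device.activeFormat.videoSupportedFrameRateRanges before applying the value.
- (void)updateFps:(NSInteger)fps {
    AVCaptureDevice *device = _videoCamera.inputCamera;
    if (!device || fps <= 0) return;

    NSError *error = nil;
    if ([device lockForConfiguration:&error]) {
        CMTime frameDuration = CMTimeMake(1, (int32_t)fps);
        device.activeVideoMinFrameDuration = frameDuration;   // lower bound of 1/fps
        device.activeVideoMaxFrameDuration = frameDuration;   // upper bound of 1/fps
        [device unlockForConfiguration];
    } else {
        NSLog(@"updateFps: lockForConfiguration failed: %@", error);
    }
}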

3. Encoding (Video and Audio)

3.1 Video Encoding

There are two options: hardware encoding and software encoding.

The difference: software encoding runs on the CPU; hardware encoding runs on something other than the CPU, such as the GPU, a dedicated DSP, an FPGA, or an ASIC.

Comparison: software encoding is direct and simple to implement, easy to tune and upgrade, but the CPU load is heavy and performance is lower than hardware encoding; at low bitrates its quality is usually a bit better. Hardware encoding offers high performance; at low bitrates its quality is usually below a software encoder's, although some products port excellent software-encoder algorithms (such as x264) to their GPU platform and reach quality essentially equal to software encoding.

Below we focus on iOS hardware encoding, starting with configuring the encoder.

 VTCompressionSessionRef compressionSession = NULL;
    OSStatus status = VTCompressionSessionCreate(NULL,width,height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressonOutputCallback, (__bridge void *)self, &compressionSession);
    
    CGFloat videoBitRate = 800*1024;
    CGFloat videoFrameRate = 24;
    CGFloat videoMaxKeyframeInterval = 48;
 
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(videoMaxKeyframeInterval));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, (__bridge CFTypeRef)@(videoMaxKeyframeInterval/videoFrameRate));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ExpectedFrameRate, (__bridge CFTypeRef)@(videoFrameRate));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(videoBitRate));
    NSArray *limit = @[@(videoBitRate * 1.5/8), @(1)];
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)limit);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanTrue);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_H264EntropyMode, kVTH264EntropyMode_CABAC);
    VTCompressionSessionPrepareToEncodeFrames(compressionSession);

A few concepts matter here:

  • Bitrate: put simply, a parameter specified when compressing video that tells the encoder the expected size of the compressed output. It is measured in bps (bits per second), i.e. the average number of bits used per second of video.

  • Frame rate: a measure of how many frames are displayed per unit of time, expressed in frames per second (FPS) or hertz (Hz).

The code above configures a bitrate of 800 * 1024 bit/s and a frame rate of 24. There is also a maximum keyframe interval, which can be set to twice the frame rate. VideoCompressonOutputCallback is the callback invoked after hardware encoding:

static void VideoCompressonOutputCallback(void *VTref, void *VTFrameRef, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer){
    if (!sampleBuffer) return;
    CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
    if (!array) return;
    CFDictionaryRef dic = (CFDictionaryRef)CFArrayGetValueAtIndex(array, 0);
    if (!dic) return;
 
    BOOL keyframe = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);
    uint64_t timeStamp = [((__bridge_transfer NSNumber *)VTFrameRef) longLongValue];
 
    HardwareVideoEncoder *videoEncoder = (__bridge HardwareVideoEncoder *)VTref;
    if (status != noErr) {
        return;
    }
 
    if (keyframe && !videoEncoder->sps) {
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
 
        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        if (statusCode == noErr) {
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            if (statusCode == noErr) {
                videoEncoder->sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                videoEncoder->pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
            }
        }
    }
 
 
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4;
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
 
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
 
            VideoFrame *videoFrame = [VideoFrame new];
            videoFrame.timestamp = timeStamp;
            videoFrame.data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            videoFrame.isKeyFrame = keyframe;
            videoFrame.sps = videoEncoder->sps;
            videoFrame.pps = videoEncoder->pps;
 
            if (videoEncoder.h264Delegate && [videoEncoder.h264Delegate respondsToSelector:@selector(videoEncoder:videoFrame:)]) {
                [videoEncoder.h264Delegate videoEncoder:videoEncoder videoFrame:videoFrame];
            }
 
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

The program calls encodeVideoData:timeStamp: to perform hardware encoding; once a frame has been processed, it is delivered through VideoCompressonOutputCallback.

- (void)encodeVideoData:(CVPixelBufferRef)pixelBuffer timeStamp:(uint64_t)timeStamp {
    if(_isBackGround) return;
    frameCount++;
    CMTime presentationTimeStamp = CMTimeMake(frameCount, (int32_t)videoFrameRate);
    VTEncodeInfoFlags flags;
    CMTime duration = CMTimeMake(1, (int32_t)videoFrameRate);
    NSDictionary *properties = nil;
    if (frameCount % (int32_t)_configuration.videoMaxKeyframeInterval == 0) {
        properties = @{(__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame: @YES};
    }
    NSNumber *timeNumber = @(timeStamp);
 
    OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);
    if(status != noErr){
        [self resetCompressionSession];
    }
}

The whole flow: video data from the camera is hardware-encoded by the compression session, and the output is a VideoFrame ready for transmission.

One thing to watch out for with hardware encoding: when the phone is low on battery or charging, the battery inevitably heats up; in that state the iPhone's H.264 hardware encoder has a fairly high chance of degrading, and the encoded output frame rate drops sharply.

The encoder's own statistics then show an actual output frame rate below the expected one, e.g. the camera captures at 30 fps, the H.264 hardware encoder is expected to output 20 fps, but the actual output falls below 15 fps. Once the phone heats up and the hardware encoder slows down, continuing to feed it 30 fps only adds to its load.

Solution: adopt a proactive, smooth frame-dropping strategy and lower the input frame rate to just above the encoder's actual output rate; for example, if the actual output is 15 fps, reduce the input to 18 fps to relieve the encoder. Once the actual output climbs back to 18 fps, raise the input rate again until the actual output meets the design target. A rough sketch of this idea follows.
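The sketch below is illustrative only; the property names, thresholds and step sizes are assumptions, not the original implementation:

// Called periodically (e.g. once per second) with the encoder's measured output fps.
// self.inputFps is the rate we currently feed into VTCompressionSessionEncodeFrame;
// frames beyond that rate are dropped evenly in the capture callback.
- (void)adjustInputFpsWithMeasuredOutputFps:(CGFloat)outputFps {
    const CGFloat targetFps = 24.0;   // design expectation
    const CGFloat margin    = 3.0;    // keep the input slightly above the real output
    const CGFloat floorFps  = 10.0;   // never starve the encoder completely

    if (outputFps + margin < self.inputFps) {
        // Encoder is falling behind: drop the input rate to just above its real output.
        self.inputFps = MAX(outputFps + margin, floorFps);
    } else if (self.inputFps < targetFps) {
        // Encoder has caught up: raise the input rate gradually back toward the target.
        self.inputFps = MIN(self.inputFps + 1.0, targetFps);
    }
}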

3.2 Audio Encoding

  • Creating the encoder

  • AudioConverterRef m_converter;
    AudioStreamBasicDescription inputFormat = {0};
    inputFormat.mSampleRate = 44100;
    inputFormat.mFormatID = kAudioFormatLinearPCM;
    inputFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
    inputFormat.mChannelsPerFrame = (UInt32)2;
    inputFormat.mFramesPerPacket = 1;
    inputFormat.mBitsPerChannel = 16;
    inputFormat.mBytesPerFrame = inputFormat.mBitsPerChannel / 8 * inputFormat.mChannelsPerFrame;
    inputFormat.mBytesPerPacket = inputFormat.mBytesPerFrame * inputFormat.mFramesPerPacket;
     
    AudioStreamBasicDescription outputFormat; // The output audio format starts here
    memset(&outputFormat, 0, sizeof(outputFormat));
    outputFormat.mSampleRate = 44100;                         // Keep the same sample rate as the input
    outputFormat.mFormatID = kAudioFormatMPEG4AAC;            // AAC encoding: kAudioFormatMPEG4AAC or kAudioFormatMPEG4AAC_HE_V2
    outputFormat.mChannelsPerFrame = 2;
    outputFormat.mFramesPerPacket = 1024;                     // One AAC packet contains 1024 frames (samples per channel), not 1024 bytes
     
    const OSType subtype = kAudioFormatMPEG4AAC;
    AudioClassDescription requestedCodecs[2] = {
        {
            kAudioEncoderComponentType,
            subtype,
            kAppleSoftwareAudioCodecManufacturer
        },
        {
            kAudioEncoderComponentType,
            subtype,
            kAppleHardwareAudioCodecManufacturer
        }
    };
     
    OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);
    UInt32 outputBitrate = 96000;
    UInt32 propSize = sizeof(outputBitrate);
     
     
    if(result == noErr) {
        result = AudioConverterSetProperty(m_converter, kAudioConverterEncodeBitRate, propSize, &outputBitrate);
    }

    With the encoder ready, we can encode the captured PCM data. Two buffers are involved: audioBuffer (for incoming PCM) and aacBuffer (for encoded output), both of size audioBufferSize. Data is only sent to the AAC encoder once the PCM already held in audioBuffer plus the newly arrived data exceeds audioBufferSize (which typically corresponds to one AAC packet's worth of PCM, i.e. 1024 samples x channels x 2 bytes).

  • - (void)encodeAudioData:(nullable NSData*)audioData timeStamp:(uint64_t)timeStamp {
        if(leftLength + audioData.length >= audioBufferSize){
            NSInteger totalSize = leftLength + audioData.length;
            NSInteger encodeCount = totalSize / audioBufferSize;
            char *totalBuf = malloc(totalSize);
            char *p = totalBuf;
            
            memset(totalBuf, 0, totalSize);
            memcpy(totalBuf, audioBuffer, leftLength);
            memcpy(totalBuf + leftLength, audioData.bytes, audioData.length);
            
            for(NSInteger index = 0;index < encodeCount;index++){
                [self encodeBuffer:p  timeStamp:timeStamp];
                p += audioBufferSize;
            }
            
            leftLength = totalSize % audioBufferSize;
            memset(audioBuffer, 0, audioBufferSize);
            memcpy(audioBuffer, totalBuf + (totalSize -leftLength), leftLength);
            free(totalBuf);
        }else{
            memcpy(audioBuffer+leftLength, audioData.bytes, audioData.length);
            leftLength = leftLength + audioData.length;
        }
    }

    Next comes encoding the data handed over from audioBuffer. inputDataProc is the callback the converter calls to fetch input data.

    - (void)encodeBuffer:(char*)buf timeStamp:(uint64_t)timeStamp{
        
        AudioBuffer inBuffer;
        inBuffer.mNumberChannels = 1;
        inBuffer.mData = buf;
        inBuffer.mDataByteSize = audioBufferSize;
        
        AudioBufferList buffers;
        buffers.mNumberBuffers = 1;
        buffers.mBuffers[0] = inBuffer;
        
        
        // Initialize an output buffer list
        AudioBufferList outBufferList;
        outBufferList.mNumberBuffers = 1;
        outBufferList.mBuffers[0].mNumberChannels = inBuffer.mNumberChannels;
        outBufferList.mBuffers[0].mDataByteSize = inBuffer.mDataByteSize;   // Set the buffer size
        outBufferList.mBuffers[0].mData = aacBuffer;           // Point at the AAC output buffer
        UInt32 outputDataPacketSize = 1;
        if (AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL) != noErr) {
            return;
        }
        
        AudioFrame *audioFrame = [AudioFrame new];
        audioFrame.timestamp = timeStamp;
        audioFrame.data = [NSData dataWithBytes:aacBuffer length:outBufferList.mBuffers[0].mDataByteSize];
        
        char exeData[2];
        // FLV AAC sequence header (AudioSpecificConfig) for 44100 Hz stereo: 0x12 0x10
        exeData[0] = 0x12;
        exeData[1] = 0x10;
        audioFrame.audioInfo = [NSData dataWithBytes:exeData length:2];
        if (self.aacDeleage && [self.aacDeleage respondsToSelector:@selector(audioEncoder:audioFrame:)]) {
            [self.aacDeleage audioEncoder:self audioFrame:audioFrame];
        }
        
    }
    
    OSStatus inputDataProc(AudioConverterRef inConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription * *outDataPacketDescription, void *inUserData)
    {
        AudioBufferList bufferList = *(AudioBufferList *)inUserData;
        ioData->mBuffers[0].mNumberChannels = 1;
        ioData->mBuffers[0].mData = bufferList.mBuffers[0].mData;
        ioData->mBuffers[0].mDataByteSize = bufferList.mBuffers[0].mDataByteSize;
        return noErr;
    }

    So how are the bytes 0x12 0x10 for 44100 Hz derived?

  • Getting the sample-rate index

  • //https://wiki.multimedia.cx/index.php?title=MPEG-4_Audio
    - (NSInteger)sampleRateIndex:(NSInteger)frequencyInHz {
        NSInteger sampleRateIndex = 0;
        switch (frequencyInHz) {
        case 96000:
            sampleRateIndex = 0;
            break;
        case 88200:
            sampleRateIndex = 1;
            break;
        case 64000:
            sampleRateIndex = 2;
            break;
        case 48000:
            sampleRateIndex = 3;
            break;
        case 44100:
            sampleRateIndex = 4;
            break;
        case 32000:
            sampleRateIndex = 5;
            break;
        case 24000:
            sampleRateIndex = 6;
            break;
        case 22050:
            sampleRateIndex = 7;
            break;
        case 16000:
            sampleRateIndex = 8;
            break;
        case 12000:
            sampleRateIndex = 9;
            break;
        case 11025:
            sampleRateIndex = 10;
            break;
        case 8000:
            sampleRateIndex = 11;
            break;
        case 7350:
            sampleRateIndex = 12;
            break;
        default:
            sampleRateIndex = 15;
        }
        return sampleRateIndex;
    }

    Plugging the index into the formula below yields the AudioSpecificConfig (ASC) bytes for 44100 Hz.

  • asc[0] = 0x10 | ((sampleRateIndex>>1) & 0x7);
    asc[1] = ((sampleRateIndex & 0x1)<<7) | ((numberOfChannels & 0xF) << 3);

    asc[0] = 0x10 | ((4>>1) & 0x7) = 0x12
    asc[1] = ((4 & 0x1)<<7) | ((2 & 0xF) << 3) = 0x10
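Rather than hard-coding exeData to 0x12 0x10 in encodeBuffer:, the two bytes can be derived from the actual configuration. A small sketch reusing the sampleRateIndex: method above (the method name audioSpecificConfigWithSampleRate:channels: is ours, not part of the original code):

    // Build the 2-byte AudioSpecificConfig for AAC-LC (audioObjectType = 2):
    // 5 bits object type | 4 bits sample-rate index | 4 bits channel config | 3 bits padding.
    - (NSData *)audioSpecificConfigWithSampleRate:(NSInteger)frequencyInHz channels:(NSInteger)numberOfChannels {
        NSInteger sampleRateIndex = [self sampleRateIndex:frequencyInHz];
        char asc[2];
        asc[0] = 0x10 | ((sampleRateIndex >> 1) & 0x7);
        asc[1] = ((sampleRateIndex & 0x1) << 7) | ((numberOfChannels & 0xF) << 3);
        return [NSData dataWithBytes:asc length:2];   // 44100 Hz, stereo -> 0x12 0x10
    }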

Summary

After audio and video encoding we end up with VideoFrame and AudioFrame objects, each carrying the encoded data and its timestamp.

4. Pushing the Stream

For pushing we use librtmp, which publishes streams over the RTMP protocol.

  1. First establish a connection with the streaming server; this step is completed while the whole pusher is being initialized.

  2. Once connected, push the video and audio metadata, i.e. the stream parameters, so that the server can parse the stream.

- (void)sendMetaData {
    PILI_RTMPPacket packet;

    char pbuf[2048], *pend = pbuf + sizeof(pbuf);

    packet.m_nChannel = 0x03;                   // control channel (invoke)
    packet.m_headerType = RTMP_PACKET_SIZE_LARGE;
    packet.m_packetType = RTMP_PACKET_TYPE_INFO;
    packet.m_nTimeStamp = 0;
    packet.m_nInfoField2 = _rtmp->m_stream_id;
    packet.m_hasAbsTimestamp = TRUE;
    packet.m_body = pbuf + RTMP_MAX_HEADER_SIZE;

    char *enc = packet.m_body;
    enc = AMF_EncodeString(enc, pend, &av_setDataFrame);
    enc = AMF_EncodeString(enc, pend, &av_onMetaData);

    *enc++ = AMF_OBJECT;

    enc = AMF_EncodeNamedNumber(enc, pend, &av_duration, 0.0);
    enc = AMF_EncodeNamedNumber(enc, pend, &av_fileSize, 0.0);

    // videosize
    enc = AMF_EncodeNamedNumber(enc, pend, &av_width, _stream.videoConfiguration.videoSize.width);
    enc = AMF_EncodeNamedNumber(enc, pend, &av_height, _stream.videoConfiguration.videoSize.height);

    // video
    enc = AMF_EncodeNamedString(enc, pend, &av_videocodecid, &av_avc1);

    enc = AMF_EncodeNamedNumber(enc, pend, &av_videodatarate, _stream.videoConfiguration.videoBitRate / 1000.f);
    enc = AMF_EncodeNamedNumber(enc, pend, &av_framerate, _stream.videoConfiguration.videoFrameRate);

    // audio
    enc = AMF_EncodeNamedString(enc, pend, &av_audiocodecid, &av_mp4a);
    enc = AMF_EncodeNamedNumber(enc, pend, &av_audiodatarate, _stream.audioConfiguration.audioBitrate);

    enc = AMF_EncodeNamedNumber(enc, pend, &av_audiosamplerate, _stream.audioConfiguration.audioSampleRate);
    enc = AMF_EncodeNamedNumber(enc, pend, &av_audiosamplesize, 16.0);
    enc = AMF_EncodeNamedBoolean(enc, pend, &av_stereo, _stream.audioConfiguration.numberOfChannels == 2);

    // sdk version
    enc = AMF_EncodeNamedString(enc, pend, &av_encoder, &av_SDKVersion);

    *enc++ = 0;
    *enc++ = 0;
    *enc++ = AMF_OBJECT_END;

    packet.m_nBodySize = (uint32_t)(enc - packet.m_body);
    if (!PILI_RTMP_SendPacket(_rtmp, &packet, FALSE, &_error)) {
        return;
    }
}
3. Next, put each frame into an array and sort the array by timestamp to guarantee the push order (see the sketch after the audio code below).
4. Add the first element of the array to the push buffer.
5. Take the first element out of the buffer.
6. If it is a video frame (LFVideoFrame), check whether the header information has been sent. The header is the SPS and PPS: before pushing frame data you must push the SPS and PPS, otherwise the backend cannot parse the data. The header only needs to be sent once per connection; if the connection drops and is re-established, the SPS and PPS must be pushed again.
Pushing the SPS and PPS has a required format; the code is as follows:

  - (void)sendVideoHeader:(LFVideoFrame *)videoFrame {

    unsigned char *body = NULL;
    NSInteger iIndex = 0;
    NSInteger rtmpLength = 1024;
    const char *sps = videoFrame.sps.bytes;
    const char *pps = videoFrame.pps.bytes;
    NSInteger sps_len = videoFrame.sps.length;
    NSInteger pps_len = videoFrame.pps.length;

    body = (unsigned char *)malloc(rtmpLength);
    memset(body, 0, rtmpLength);

    body[iIndex++] = 0x17;
    body[iIndex++] = 0x00;

    body[iIndex++] = 0x00;
    body[iIndex++] = 0x00;
    body[iIndex++] = 0x00;

    body[iIndex++] = 0x01;
    body[iIndex++] = sps[1];
    body[iIndex++] = sps[2];
    body[iIndex++] = sps[3];
    body[iIndex++] = 0xff;

    /*sps*/
    body[iIndex++] = 0xe1;
    body[iIndex++] = (sps_len >> 8) & 0xff;
    body[iIndex++] = sps_len & 0xff;
    memcpy(&body[iIndex], sps, sps_len);
    iIndex += sps_len;

    /*pps*/
    body[iIndex++] = 0x01;
    body[iIndex++] = (pps_len >> 8) & 0xff;
    body[iIndex++] = (pps_len) & 0xff;
    memcpy(&body[iIndex], pps, pps_len);
    iIndex += pps_len;

    [self sendPacket:RTMP_PACKET_TYPE_VIDEO data:body size:iIndex nTimestamp:0];
    free(body);
}

Then send the frame data:

- (void)sendVideo:(LFVideoFrame *)frame {

    NSInteger i = 0;
    NSInteger rtmpLength = frame.data.length + 9;
    unsigned char *body = (unsigned char *)malloc(rtmpLength);
    memset(body, 0, rtmpLength);

    if (frame.isKeyFrame) {
        body[i++] = 0x17;        // 1:Iframe  7:AVC
    } else {
        body[i++] = 0x27;        // 2:Pframe  7:AVC
    }
    body[i++] = 0x01;    // AVC NALU
    body[i++] = 0x00;
    body[i++] = 0x00;
    body[i++] = 0x00;
    body[i++] = (frame.data.length >> 24) & 0xff;
    body[i++] = (frame.data.length >> 16) & 0xff;
    body[i++] = (frame.data.length >>  8) & 0xff;
    body[i++] = (frame.data.length) & 0xff;
    memcpy(&body[i], frame.data.bytes, frame.data.length);

    [self sendPacket:RTMP_PACKET_TYPE_VIDEO data:body size:(rtmpLength) nTimestamp:frame.timestamp];
    free(body);
}
- (NSInteger)sendPacket:(unsigned int)nPacketType data:(unsigned char *)data size:(NSInteger)size nTimestamp:(uint64_t)nTimestamp {
    NSInteger rtmpLength = size;
    PILI_RTMPPacket rtmp_pack;
    PILI_RTMPPacket_Reset(&rtmp_pack);
    PILI_RTMPPacket_Alloc(&rtmp_pack, (uint32_t)rtmpLength);

    rtmp_pack.m_nBodySize = (uint32_t)size;
    memcpy(rtmp_pack.m_body, data, size);
    rtmp_pack.m_hasAbsTimestamp = 0;
    rtmp_pack.m_packetType = nPacketType;
    if (_rtmp) rtmp_pack.m_nInfoField2 = _rtmp->m_stream_id;
    rtmp_pack.m_nChannel = 0x04;
    rtmp_pack.m_headerType = RTMP_PACKET_SIZE_LARGE;
    if (RTMP_PACKET_TYPE_AUDIO == nPacketType && size != 4) {
        rtmp_pack.m_headerType = RTMP_PACKET_SIZE_MEDIUM;
    }
    rtmp_pack.m_nTimeStamp = (uint32_t)nTimestamp;

    NSInteger nRet = [self RtmpPacketSend:&rtmp_pack];

    PILI_RTMPPacket_Free(&rtmp_pack);
    return nRet;
}

Note that the timestamp must be included when pushing:

[self sendPacket:RTMP_PACKET_TYPE_VIDEO data:body size:(rtmpLength) nTimestamp:frame.timestamp];

RTMP sends data packet by packet.

7. If it is an audio frame, the same logic applies: the header must be pushed before any frame data can be sent. The code is as follows:

- (void)sendAudioHeader:(LFAudioFrame *)audioFrame {

    NSInteger rtmpLength = audioFrame.audioInfo.length + 2;     /* length of the spec data, usually 2 */
    unsigned char *body = (unsigned char *)malloc(rtmpLength);
    memset(body, 0, rtmpLength);

    /*AF 00 + AAC RAW data*/
    body[0] = 0xAF;
    body[1] = 0x00;
    memcpy(&body[2], audioFrame.audioInfo.bytes, audioFrame.audioInfo.length);          /* audioInfo holds the AAC sequence header (spec_buf) */
    [self sendPacket:RTMP_PACKET_TYPE_AUDIO data:body size:rtmpLength nTimestamp:0];
    free(body);
}

- (void)sendAudio:(LFFrame *)frame {

    NSInteger rtmpLength = frame.data.length + 2;    /* length of the spec data, usually 2 */
    unsigned char *body = (unsigned char *)malloc(rtmpLength);
    memset(body, 0, rtmpLength);

    /*AF 01 + AAC RAW data*/
    body[0] = 0xAF;
    body[1] = 0x01;
    memcpy(&body[2], frame.data.bytes, frame.data.length);
    [self sendPacket:RTMP_PACKET_TYPE_AUDIO data:body size:rtmpLength nTimestamp:frame.timestamp];
    free(body);
}
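Steps 3 to 5 above (collect frames, keep them ordered by timestamp, pop them from a buffer) can be sketched roughly as follows; self.buffer, self.sendVideoHead and self.sendAudioHead are hypothetical properties, while sendVideoHeader:, sendVideo:, sendAudioHeader: and sendAudio: are the methods shown above:

- (void)pushFrame:(LFFrame *)frame {
    [self.buffer addObject:frame];
    // Keep the buffer ordered by timestamp so frames go out in presentation order.
    [self.buffer sortUsingComparator:^NSComparisonResult(LFFrame *a, LFFrame *b) {
        if (a.timestamp == b.timestamp) return NSOrderedSame;
        return a.timestamp < b.timestamp ? NSOrderedAscending : NSOrderedDescending;
    }];
}

- (void)sendNextFrame {
    if (self.buffer.count == 0) return;
    LFFrame *frame = self.buffer.firstObject;
    [self.buffer removeObjectAtIndex:0];

    if ([frame isKindOfClass:[LFVideoFrame class]]) {
        if (!self.sendVideoHead) {               // step 6: SPS/PPS once per connection
            self.sendVideoHead = YES;
            [self sendVideoHeader:(LFVideoFrame *)frame];
        }
        [self sendVideo:(LFVideoFrame *)frame];
    } else {
        if (!self.sendAudioHead) {               // step 7: AAC sequence header first
            self.sendAudioHead = YES;
            [self sendAudioHeader:(LFAudioFrame *)frame];
        }
        [self sendAudio:frame];
    }
}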

5. Relaying

Data transport: the encoded audio/video data has to be transmitted. Early systems carried audio/video over cables such as coax; with the rise of IP networks, transmission now happens over IP. Technologies and protocols involved: transport protocols such as RTP/RTCP, RTSP, RTMP, HTTP and HLS (HTTP Live Streaming); control signaling such as SIP/SDP and SNMP.

6. Pulling the Stream

Pulling refers to fetching live content that already exists on the server from a given address. Depending on the protocol (RTMP, RTP, RTSP, HTTP, etc.), the client connects to the server and receives data.

The flow is as follows:

  • Parse the binary data and locate the stream information in it;

  • Demux according to the container format (e.g. FLV, TS);

  • Obtain the encoded H.264 video data and AAC audio data separately;

  • Decompress the audio and video with a hardware decoder (the platform's APIs) or a software decoder (FFmpeg);

  • After decoding, we get raw video data (YUV) and raw audio data (PCM);

  • Because audio and video are decoded separately, they must be synchronized, otherwise they drift apart and, for example, speech no longer matches lip movement;

  • Finally, the synchronized audio is sent to the headphones or speaker and the video is sent to the screen for display.

7. Decoding

A simplified flow: decoding with FFmpeg

  • Get the decoder context of the stream in the file: formatContext->streams[a/v index]->codec;

  • Find the decoder from that context: AVCodec *avcodec_find_decoder(enum AVCodecID id);

  • Open the decoder: int avcodec_open2(AVCodecContext *avctx, const AVCodec *codec, AVDictionary **options);

  • Send the audio/video packets from the file to the decoder: int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt);

  • Receive the decoded audio/video frames in a loop: int avcodec_receive_frame(AVCodecContext *avctx, AVFrame *frame);

  • If the data is audio, it may need to be resampled (via SwrContext) into a format the device can play (see the sketch below).
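A compact sketch of that loop using the FFmpeg send/receive API (FFmpeg 3.1+; note that newer versions expose stream parameters through codecpar rather than the deprecated codec field mentioned above). Error handling is trimmed and onFrame stands in for whatever consumes the decoded frames:

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

// Decode every frame of the best stream of the given type and pass it to onFrame.
static int decode_stream(const char *path, enum AVMediaType type, void (*onFrame)(AVFrame *frame)) {
    AVFormatContext *fmt = NULL;
    if (avformat_open_input(&fmt, path, NULL, NULL) < 0) return -1;
    if (avformat_find_stream_info(fmt, NULL) < 0) return -1;

    int streamIndex = av_find_best_stream(fmt, type, -1, -1, NULL, 0);
    if (streamIndex < 0) return -1;

    const AVCodec *codec = avcodec_find_decoder(fmt->streams[streamIndex]->codecpar->codec_id);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(ctx, fmt->streams[streamIndex]->codecpar);
    if (avcodec_open2(ctx, codec, NULL) < 0) return -1;

    AVPacket packet;
    AVFrame *frame = av_frame_alloc();
    while (av_read_frame(fmt, &packet) >= 0) {
        if (packet.stream_index == streamIndex && avcodec_send_packet(ctx, &packet) == 0) {
            // One packet may yield zero or more decoded frames.
            while (avcodec_receive_frame(ctx, frame) == 0) {
                onFrame(frame);
            }
        }
        av_packet_unref(&packet);
    }

    av_frame_free(&frame);
    avcodec_free_context(&ctx);
    avformat_close_input(&fmt);
    return 0;
}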

Decoding video with VideoToolbox

  • Separate out the key NALU header information (SPS, PPS, etc.) from the extradata parsed by FFmpeg

  • Use that information to create the video format description: CMVideoFormatDescriptionRef, via CMVideoFormatDescriptionCreateFromH264ParameterSets / CMVideoFormatDescriptionCreateFromHEVCParameterSets

  • Create the decoder with VTDecompressionSessionCreate, specifying the relevant parameters

  • Put the compressed data into a CMBlockBufferRef: CMBlockBufferCreateWithMemoryBlock

  • Start decoding: VTDecompressionSessionDecodeFrame

  • Receive the decoded video data in the callback (a sketch of this setup follows the list)
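A condensed sketch of the H.264 setup, assuming sps/pps are raw parameter sets without start codes and that the compressed frames use 4-byte AVCC length prefixes (as produced by the encoder in section 3.1); error handling is omitted:

#import <VideoToolbox/VideoToolbox.h>

// Called once per decoded frame with the resulting CVImageBufferRef.
static void DidDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon,
                          OSStatus status, VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef imageBuffer, CMTime pts, CMTime duration) {
    if (status == noErr && imageBuffer) {
        // Hand the decoded pixel buffer to the renderer here.
    }
}

static VTDecompressionSessionRef CreateH264Decoder(const uint8_t *sps, size_t spsSize,
                                                   const uint8_t *pps, size_t ppsSize,
                                                   CMVideoFormatDescriptionRef *outDesc) {
    const uint8_t *paramSets[2] = { sps, pps };
    const size_t paramSizes[2]  = { spsSize, ppsSize };

    // 4-byte NAL unit length headers (AVCC), matching the encoder output earlier.
    OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(
        kCFAllocatorDefault, 2, paramSets, paramSizes, 4, outDesc);
    if (status != noErr) return NULL;

    // Request NV12 output so the buffers are easy to render or convert.
    NSDictionary *attrs = @{(__bridge NSString *)kCVPixelBufferPixelFormatTypeKey:
                                @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange)};
    VTDecompressionOutputCallbackRecord callback = { DidDecompress, NULL };
    VTDecompressionSessionRef session = NULL;
    status = VTDecompressionSessionCreate(kCFAllocatorDefault, *outDesc, NULL,
                                          (__bridge CFDictionaryRef)attrs, &callback, &session);
    return status == noErr ? session : NULL;
}

Each compressed frame is then wrapped in a CMBlockBuffer/CMSampleBuffer (CMBlockBufferCreateWithMemoryBlock, CMSampleBufferCreate) and passed to VTDecompressionSessionDecodeFrame.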

Decoding audio with AudioConverter

  • Create the decoder from the ASBDs of the source data and the decoded output: AudioConverterNewSpecific

  • Specify the decoder type with an AudioClassDescription

  • Start decoding: AudioConverterFillComplexBuffer

  • Note: each decode operation requires 1024 samples to be available.


8. Synchronization

Because we are decoding audio and video from a local file, as long as the timestamps written into the file are correct, the decoded data can be played back directly and will stay in sync; all we need to do is make sure the decoded audio and video are rendered at the same time. Note: a stream pulled from an RTMP address, for example, may lose data in some time window because of network problems, throwing audio and video out of sync, so a mechanism is needed to correct the timestamps. The general approach is to have video chase audio; a dedicated article will cover it later, so we won't go into detail here beyond the tiny sketch below.
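A toy version of the "video chases audio" idea, just to make the mechanism concrete (the threshold and the enum are ours, not part of the original code):

typedef NS_ENUM(NSInteger, AVSyncAction) { AVSyncActionRender, AVSyncActionDrop, AVSyncActionWait };

// audioClockMs is the timestamp of the audio currently being played (the master clock);
// videoPtsMs is the timestamp of the decoded video frame waiting to be shown.
static AVSyncAction SyncVideoToAudio(int64_t videoPtsMs, int64_t audioClockMs) {
    const int64_t threshold = 40;                       // roughly one frame at 25 fps
    int64_t diff = videoPtsMs - audioClockMs;
    if (diff < -threshold) return AVSyncActionDrop;     // video is late: drop it to catch up
    if (diff >  threshold) return AVSyncActionWait;     // video is early: hold it back
    return AVSyncActionRender;                          // close enough: render now
}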

9. Rendering

The raw video data obtained in the steps above can be rendered to the screen directly through a thin OpenGL ES wrapper; Apple's native frameworks also offer GLKViewController for on-screen rendering. For audio, an Audio Queue receives the audio frames and plays them.

Decoding and rendering with FFmpeg: first initialize FFmpeg with the file path so the audio/video streams can be parsed, then decode them with FFmpeg's decoders. Note that we use the first I-frame we read as the starting point, to keep audio and video in sync. Decoded audio is first placed into a transfer queue, because the audio queue player is designed to keep pulling data from that queue for playback; decoded video can be rendered directly.

- (void)startRenderAVByFFmpegWithFileName:(NSString *)fileName {
    NSString *path = [[NSBundle mainBundle] pathForResource:fileName ofType:@"MOV"];
    
    XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    
    XDXFFmpegVideoDecoder *videoDecoder = [[XDXFFmpegVideoDecoder alloc] initWithFormatContext:[parseHandler getFormatContext] videoStreamIndex:[parseHandler getVideoStreamIndex]];
    videoDecoder.delegate = self;
    
    XDXFFmpegAudioDecoder *audioDecoder = [[XDXFFmpegAudioDecoder alloc] initWithFormatContext:[parseHandler getFormatContext] audioStreamIndex:[parseHandler getAudioStreamIndex]];
    audioDecoder.delegate = self;
    
    static BOOL isFindIDR = NO;
    
    [parseHandler startParseGetAVPackeWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, AVPacket packet) {
        if (isFinish) {
            isFindIDR = NO;
            [videoDecoder stopDecoder];
            [audioDecoder stopDecoder];
            dispatch_async(dispatch_get_main_queue(), ^{
                self.startWorkBtn.hidden = NO;
            });
            return;
        }
        
        if (isVideoFrame) { // Video
            if (packet.flags == 1 && isFindIDR == NO) {
                isFindIDR = YES;
            }
            
            if (!isFindIDR) {
                return;
            }
            
            [videoDecoder startDecodeVideoDataWithAVPacket:packet];
        }else {             // Audio
            [audioDecoder startDecodeAudioDataWithAVPacket:packet];
        }
    }];
}

-(void)getDecodeVideoDataByFFmpeg:(CMSampleBufferRef)sampleBuffer {
    CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
    [self.previewView displayPixelBuffer:pix];
}

- (void)getDecodeAudioDataByFFmpeg:(void *)data size:(int)size pts:(int64_t)pts isFirstFrame:(BOOL)isFirstFrame {
//    NSLog(@"demon test - %d",size);
    // Put audio data from audio file into audio data queue
    [self addBufferToWorkQueueWithAudioData:data size:size pts:pts];

    // control rate
    usleep(14.5*1000);
}

Decoding with the native frameworks: again, initialize FFmpeg with the file path to parse the streams. This time an ASBD is built from the actual audio stream in the file to initialize the audio decoder, and the decoded audio and video are rendered separately. Note that if the video in the file is encoded as H.265, the decoded frames include B-frames, so their timestamps come out of order; we use a linked list to sort them and then render the sorted frames to the screen.

- (void)startRenderAVByOriginWithFileName:(NSString *)fileName {
    NSString *path = [[NSBundle mainBundle] pathForResource:fileName ofType:@"MOV"];
    XDXAVParseHandler *parseHandler = [[XDXAVParseHandler alloc] initWithPath:path];
    
    XDXVideoDecoder *videoDecoder = [[XDXVideoDecoder alloc] init];
    videoDecoder.delegate = self;

    // Origin file aac format
    AudioStreamBasicDescription audioFormat = {
        .mSampleRate         = 48000,
        .mFormatID           = kAudioFormatMPEG4AAC,
        .mChannelsPerFrame   = 2,
        .mFramesPerPacket    = 1024,
    };
    
    XDXAduioDecoder *audioDecoder = [[XDXAduioDecoder alloc] initWithSourceFormat:audioFormat
                                                                     destFormatID:kAudioFormatLinearPCM
                                                                       sampleRate:48000
                                                              isUseHardwareDecode:YES];
    
    [parseHandler startParseWithCompletionHandler:^(BOOL isVideoFrame, BOOL isFinish, struct XDXParseVideoDataInfo *videoInfo, struct XDXParseAudioDataInfo *audioInfo) {
        if (isFinish) {
            [videoDecoder stopDecoder];
            [audioDecoder freeDecoder];
            
            dispatch_async(dispatch_get_main_queue(), ^{
                self.startWorkBtn.hidden = NO;
            });
            return;
        }
        
        if (isVideoFrame) {
            [videoDecoder startDecodeVideoData:videoInfo];
        }else {
            [audioDecoder decodeAudioWithSourceBuffer:audioInfo->data
                                     sourceBufferSize:audioInfo->dataSize
                                      completeHandler:^(AudioBufferList * _Nonnull destBufferList, UInt32 outputPackets, AudioStreamPacketDescription * _Nonnull outputPacketDescriptions) {
                                          // Put audio data from audio file into audio data queue
                                          [self addBufferToWorkQueueWithAudioData:destBufferList->mBuffers->mData size:destBufferList->mBuffers->mDataByteSize pts:audioInfo->pts];

                                          // control rate
                                          usleep(16.8*1000);
                                      }];
        }
    }];
}

- (void)getVideoDecodeDataCallback:(CMSampleBufferRef)sampleBuffer isFirstFrame:(BOOL)isFirstFrame {
    if (self.hasBFrame) {
        // Note : the first frame not need to sort.
        if (isFirstFrame) {
            CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
            [self.previewView displayPixelBuffer:pix];
            return;
        }
        
        [self.sortHandler addDataToLinkList:sampleBuffer];
    }else {
        CVPixelBufferRef pix = CMSampleBufferGetImageBuffer(sampleBuffer);
        [self.previewView displayPixelBuffer:pix];
    }
}

#pragma mark - Sort Callback
- (void)getSortedVideoNode:(CMSampleBufferRef)sampleBuffer {
    int64_t pts = (int64_t)(CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer)) * 1000);
    static int64_t lastpts = 0;
//    NSLog(@"Test marigin - %lld",pts - lastpts);
    lastpts = pts;
    
    [self.previewView displayPixelBuffer:CMSampleBufferGetImageBuffer(sampleBuffer)];
}
