The WebRTC Video Capture, Encoding, and Sending Process
I. Timestamp Definitions
First, it helps to list how the code defines the various time values, so the code that follows is easier to understand.
1. NTP time
NtpTime RealTimeClock::CurrentNtpTime()          // time elapsed since 1900-01-01 00:00:00
int64_t RealTimeClock::CurrentNtpInMilliseconds() // milliseconds elapsed since 1900-01-01 00:00:00
int64_t rtc::TimeUTCMicros()                      // microseconds elapsed since 1970-01-01 00:00:00
int64_t rtc::TimeUTCMillis()                      // milliseconds elapsed since 1970-01-01 00:00:00
int64_t NtpOffsetMsCalledOnce()                   // offset between NTP time and the local clock, in ms
int64_t NtpOffsetMs()                             // same as NtpOffsetMsCalledOnce()
NtpTime TimeMicrosToNtp(int64_t time_us)          // convert a local timestamp to NTP time
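The key relationship behind these helpers is that the NTP epoch (1900-01-01) precedes the Unix epoch (1970-01-01) by 2,208,988,800 seconds, and that an NTP timestamp can be derived by adding an offset, sampled once, to the monotonic local clock. A minimal standalone sketch of that arithmetic; LocalMillis, WallClockMillis and kNtpToUnixEpochOffsetMs are illustrative names, not WebRTC's:

#include <chrono>
#include <cstdint>
#include <cstdio>

// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01), in ms.
constexpr int64_t kNtpToUnixEpochOffsetMs = 2208988800LL * 1000;

// Illustrative stand-ins: a monotonic "local" clock and a wall clock.
int64_t LocalMillis() {
  using namespace std::chrono;
  return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
}
int64_t WallClockMillis() {
  using namespace std::chrono;
  return duration_cast<milliseconds>(system_clock::now().time_since_epoch()).count();
}

int main() {
  // Sampled once, like NtpOffsetMsCalledOnce(): difference between the NTP
  // wall clock and the monotonic local clock.
  const int64_t ntp_offset_ms =
      (WallClockMillis() + kNtpToUnixEpochOffsetMs) - LocalMillis();

  // Converting a local timestamp to NTP milliseconds, as TimeMicrosToNtp()
  // conceptually does (modulo the fixed-point NtpTime representation).
  const int64_t local_ms = LocalMillis();
  const int64_t ntp_ms = local_ms + ntp_offset_ms;
  std::printf("local=%lld ms  ntp=%lld ms\n", (long long)local_ms, (long long)ntp_ms);
}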
2. Local (monotonic) time
Counted from the moment the system boots; it is not affected by the user changing the system time.
int64_t rtc::TimeMillis()                   // milliseconds
int64_t rtc::TimeMicros()                   // microseconds
int64_t rtc::TimeNanos()                    // nanoseconds
int64_t RealTimeClock::TimeInMilliseconds() // milliseconds
int64_t RealTimeClock::TimeInMicroseconds() // microseconds
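On Linux these helpers ultimately read a monotonic clock, which is why they keep counting across user changes to the system date. A minimal sketch of reading such a clock directly, as an illustration of the idea rather than WebRTC's actual implementation:

#include <cstdint>
#include <ctime>

// Monotonic nanoseconds since an arbitrary fixed point (typically boot).
// Unaffected by NTP adjustments or the user changing the wall-clock time.
int64_t MonotonicNanos() {
  timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return static_cast<int64_t>(ts.tv_sec) * 1000000000 + ts.tv_nsec;
}

int64_t MonotonicMicros() { return MonotonicNanos() / 1000; }
int64_t MonotonicMillis() { return MonotonicNanos() / 1000000; }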
II. Camera Capture, Timestamp Setting, and Data Delivery
VideoCaptureImpl is the video-capture implementation class. Each platform implements its own subclass containing the platform-specific details, and every frame captured by a subclass is passed in through VideoCaptureImpl::IncomingFrame. For example, the Android subclass is VideoCaptureAndroid and the Linux subclass is VideoCaptureModuleV4L2.
Taking the Linux platform as an example:
After VideoCaptureModuleV4L2 captures data, it hands the frame back through the following interface:
int32_t VideoCaptureImpl::IncomingFrame(
uint8_t* videoFrame,
int32_t videoFrameLength,
const VideoCaptureCapability& frameInfo,
int64_t captureTime/*=0*/) // must be specified in the NTP time format in milliseconds.
If captureTime is non-zero in the interface above, it must be an NTP timestamp. VideoCaptureModuleV4L2 does not pass this argument, so the default value 0 is used.
The frame's timestamp (timestamp_us_) is set here, and the frame is delivered to the receiver via callback:
VideoCaptureImpl::IncomingFrame(captureTime = 0)  // with VideoCaptureModuleV4L2, captureTime takes the default value 0
{
  captureFrame.set_timestamp_ms(rtc::TimeMillis());  // set this frame's timestamp to local time; timestamp_rtp_ and ntp_time_ms_ are still unset (0)
  {
    VideoCaptureImpl::DeliverCapturedFrame(captureFrame)  // deliver the captured frame
    {
      _dataCallBack->OnFrame(captureFrame);  // callback with the frame
    }
  }
}
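A frame effectively carries three timestamp fields, and the capture path above fills only the first one. The following simplified model shows how they relate; the field and method names mirror webrtc::VideoFrame, but the struct itself is only an illustration:

#include <cstdint>

// Simplified stand-in for the three timestamp fields on webrtc::VideoFrame.
struct FrameTimestamps {
  int64_t timestamp_us_ = 0;   // local (monotonic) capture time, microseconds
  int64_t ntp_time_ms_ = 0;    // capture time on the NTP clock, milliseconds
  uint32_t timestamp_rtp_ = 0; // RTP timestamp, 90 kHz units for video

  // set_timestamp_ms()/render_time_ms() are just ms<->us views of timestamp_us_.
  void set_timestamp_ms(int64_t ms) { timestamp_us_ = ms * 1000; }
  int64_t render_time_ms() const { return timestamp_us_ / 1000; }
};

// After VideoCaptureImpl::IncomingFrame only timestamp_us_ is set;
// ntp_time_ms_ and timestamp_rtp_ remain 0 until VideoStreamEncoder::OnFrame.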
1. Delivery to the encoder
void VideoStreamEncoder::OnFrame(const VideoFrame& video_frame)
Set the frame's NTP time (ntp_time_ms_):
// Capture time may come from clock with an offset and drift from clock_.
int64_t capture_ntp_time_ms;
if (video_frame.ntp_time_ms() > 0) {  // still 0 at this point, so this branch is not taken
capture_ntp_time_ms = video_frame.ntp_time_ms();
} else if (video_frame.render_time_ms() != 0) {  // render_time_ms() is derived from timestamp_us_ (local time), already set during capture
capture_ntp_time_ms = video_frame.render_time_ms() + delta_ntp_internal_ms_;
} else {
capture_ntp_time_ms = current_time_ms + delta_ntp_internal_ms_;
}
incoming_frame.set_ntp_time_ms(capture_ntp_time_ms);
delta_ntp_internal_ms_ is initialized in the class constructor as the difference between NTP time and local time:
delta_ntp_internal_ms_(clock_->CurrentNtpInMilliseconds() - clock_->TimeInMilliseconds())
Set the frame's RTP timestamp (timestamp_rtp_):
// Convert NTP time, in ms, to RTP timestamp.
const int kMsToRtpTimestamp = 90;
incoming_frame.set_timestamp(
kMsToRtpTimestamp * static_cast<uint32_t>(incoming_frame.ntp_time_ms()));
At this point the frame's render timestamp (timestamp_us_), capture NTP time (ntp_time_ms_), and RTP timestamp (timestamp_rtp_) all have values. They all represent the same capture instant, just in different forms.
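A worked example with made-up numbers ties the three values together. Assume the monotonic clock read 120,000 ms when the frame was captured and delta_ntp_internal_ms_ is 3,900,000,000,000 ms:

#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical values, for illustration only.
  const int64_t render_time_ms = 120000;                 // local capture time
  const int64_t delta_ntp_internal_ms = 3900000000000LL; // NTP - local, sampled at startup

  // VideoStreamEncoder::OnFrame: derive the capture NTP time.
  const int64_t capture_ntp_time_ms = render_time_ms + delta_ntp_internal_ms;

  // NTP milliseconds -> RTP timestamp (90 kHz clock), truncated to 32 bits.
  const int kMsToRtpTimestamp = 90;
  const uint32_t rtp_timestamp =
      kMsToRtpTimestamp * static_cast<uint32_t>(capture_ntp_time_ms);

  std::printf("ntp=%lld ms rtp=%u\n", (long long)capture_ntp_time_ms, rtp_timestamp);
}

The cast to uint32_t means the RTP timestamp wraps modulo 2^32, which is expected: RTP timestamps are modular, and receivers only ever compare differences between them.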
Ignoring the case where frames are dropped because the encoder is congested, the video frame is passed to MaybeEncodeVideoFrame(video_frame) for encoding.
void VideoStreamEncoder::MaybeEncodeVideoFrame(const VideoFrame& video_frame,
int64_t time_when_posted_us) {
// skip other code
EncodeVideoFrame(video_frame, time_when_posted_us);
}
void VideoStreamEncoder::EncodeVideoFrame(const VideoFrame& video_frame,
int64_t time_when_posted_us) {
// skip other code
VideoFrame out_frame(video_frame);
encoder_->Encode(out_frame, &next_frame_types_);
}
Tracing the code, encoder_ is created by InternalEncoderFactory:
std::unique_ptr<VideoEncoder> InternalEncoderFactory::CreateVideoEncoder(
const SdpVideoFormat& format) {
if (absl::EqualsIgnoreCase(format.name, cricket::kVp8CodecName))
return VP8Encoder::Create();
if (absl::EqualsIgnoreCase(format.name, cricket::kVp9CodecName))
return VP9Encoder::Create(cricket::VideoCodec(format));
if (absl::EqualsIgnoreCase(format.name, cricket::kH264CodecName))
return H264Encoder::Create(cricket::VideoCodec(format));
if (kIsLibaomAv1EncoderSupported &&
absl::EqualsIgnoreCase(format.name, cricket::kAv1CodecName))
return CreateLibaomAv1Encoder();
RTC_LOG(LS_ERROR) << "Trying to created encoder of unsupported format "
<< format.name;
return nullptr;
}
std::unique_ptr<H264Encoder> H264Encoder::Create(
const cricket::VideoCodec& codec) {
RTC_DCHECK(H264Encoder::IsSupported());
#if defined(WEBRTC_USE_H264)
RTC_CHECK(g_rtc_use_h264);
RTC_LOG(LS_INFO) << "Creating H264EncoderImpl.";
return std::make_unique<H264EncoderImpl>(codec);
#else
RTC_NOTREACHED();
return nullptr;
#endif
}
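Putting the two pieces together, a hedged usage sketch of obtaining an H.264 encoder from the built-in factory; the include paths are approximate for a WebRTC tree of this era:

#include <memory>

#include "api/video_codecs/sdp_video_format.h"
#include "api/video_codecs/video_encoder.h"
#include "media/base/media_constants.h"
#include "media/engine/internal_encoder_factory.h"

// Ask the built-in factory for an H.264 encoder by its SDP format name.
// Returns nullptr if the build was not compiled with H.264 support.
std::unique_ptr<webrtc::VideoEncoder> CreateH264FromInternalFactory() {
  webrtc::InternalEncoderFactory factory;
  return factory.CreateVideoEncoder(
      webrtc::SdpVideoFormat(cricket::kH264CodecName));
}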
Taking H.264 as an example, encoding happens in the following function:
int32_t H264EncoderImpl::Encode(
const VideoFrame& input_frame,
const std::vector<VideoFrameType>* frame_types) {
rtc::scoped_refptr<const I420BufferInterface> frame_buffer =
input_frame.video_frame_buffer()->ToI420();
// Encode image for each layer.
for (size_t i = 0; i < encoders_.size(); ++i) {
// EncodeFrame input.
pictures_[i] = {0};
pictures_[i].iPicWidth = configurations_[i].width;
pictures_[i].iPicHeight = configurations_[i].height;
pictures_[i].iColorFormat = EVideoFormatType::videoFormatI420;
pictures_[i].uiTimeStamp = input_frame.ntp_time_ms();  // the encoder timestamp uses the NTP time
// Downscale images on second and ongoing layers.
if (i == 0) {
pictures_[i].iStride[0] = frame_buffer->StrideY();
pictures_[i].iStride[1] = frame_buffer->StrideU();
pictures_[i].iStride[2] = frame_buffer->StrideV();
pictures_[i].pData[0] = const_cast<uint8_t*>(frame_buffer->DataY());
pictures_[i].pData[1] = const_cast<uint8_t*>(frame_buffer->DataU());
pictures_[i].pData[2] = const_cast<uint8_t*>(frame_buffer->DataV());
} else {
// skip the code
}
// Encode!
encoders_[i]->EncodeFrame(&pictures_[i], &info);
encoded_images_[i]._encodedWidth = configurations_[i].width;
encoded_images_[i]._encodedHeight = configurations_[i].height;
encoded_images_[i].SetTimestamp(input_frame.timestamp());  // set the RTP timestamp (timestamp_rtp_); capture_time_ms_ is never set and stays 0
encoded_images_[i]._frameType = ConvertToVideoFrameType(info.eFrameType);
encoded_images_[i].SetSpatialIndex(configurations_[i].simulcast_idx);
// Split encoded image up into fragments. This also updates
// |encoded_image_|.
// After encoding, the encoded data is in |info|. RtpFragmentize copies it into encoded_images_[i] and records the NALU layout in frag_header.
RTPFragmentationHeader frag_header;
RtpFragmentize(&encoded_images_[i], &info, &frag_header);
// On success, the encoded data is delivered through the callback; the receiver is VideoStreamEncoder.
encoded_image_callback_->OnEncodedImage(encoded_images_[i],
&codec_specific, &frag_header);
}
}
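RtpFragmentize essentially walks the Annex-B bitstream produced by the OpenH264 encoder, copies it into encoded_images_[i], and records the offset and length of every NALU in frag_header. A simplified, standalone sketch of that NALU scan (start-code search only, not the actual WebRTC function):

#include <cstddef>
#include <cstdint>
#include <vector>

struct NaluIndex {
  size_t payload_start;  // offset of the NALU payload (after the start code)
  size_t payload_size;   // bytes up to the next start code (or end of buffer)
};

// Find Annex-B start codes (00 00 01 or 00 00 00 01) and index each NALU.
std::vector<NaluIndex> FindNalus(const uint8_t* data, size_t size) {
  std::vector<NaluIndex> nalus;
  for (size_t i = 0; i + 3 <= size; ++i) {
    if (data[i] == 0 && data[i + 1] == 0 &&
        (data[i + 2] == 1 ||
         (i + 4 <= size && data[i + 2] == 0 && data[i + 3] == 1))) {
      size_t start = i + (data[i + 2] == 1 ? 3 : 4);
      if (!nalus.empty())
        nalus.back().payload_size = i - nalus.back().payload_start;
      nalus.push_back({start, 0});
      i = start - 1;  // continue scanning from inside the new payload
    }
  }
  if (!nalus.empty())
    nalus.back().payload_size = size - nalus.back().payload_start;
  return nalus;
}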
The callback returns to VideoStreamEncoder:
EncodedImageCallback::Result VideoStreamEncoder::OnEncodedImage(
const EncodedImage& encoded_image,
const CodecSpecificInfo* codec_specific_info,
const RTPFragmentationHeader* fragmentation) {
EncodedImageCallback::Result result = sink_->OnEncodedImage(
image_copy, codec_info_copy ? codec_info_copy.get() : codec_specific_info,
fragmentation_copy ? fragmentation_copy.get() : fragmentation);
}
It is then forwarded to VideoSendStreamImpl, where rtp_video_sender_ packetizes and sends the data:
EncodedImageCallback::Result VideoSendStreamImpl::OnEncodedImage(
const EncodedImage& encoded_image,
const CodecSpecificInfo* codec_specific_info,
const RTPFragmentationHeader* fragmentation) {
EncodedImageCallback::Result result(EncodedImageCallback::Result::OK);
result = rtp_video_sender_->OnEncodedImage(encoded_image, codec_specific_info,
fragmentation);
}
RtpVideoSender handles sending the RTP packets and the RTCP SR packets:
EncodedImageCallback::Result RtpVideoSender::OnEncodedImage(
const EncodedImage& encoded_image,
const CodecSpecificInfo* codec_specific_info,
const RTPFragmentationHeader* fragmentation) {
// Compute the RTP timestamp: add the StartTimestamp offset, which defaults to a random value.
uint32_t rtp_timestamp =
encoded_image.Timestamp() +
rtp_streams_[stream_index].rtp_rtcp->StartTimestamp();
// RTCPSender has it's own copy of the timestamp offset, added in
// RTCPSender::BuildSR, hence we must not add the in the offset for this call.
// TODO(nisse): Delete RTCPSender:timestamp_offset_, and see if we can confine
// knowledge of the offset to a single place.
if (!rtp_streams_[stream_index].rtp_rtcp->OnSendingRtpFrame(
encoded_image.Timestamp(), encoded_image.capture_time_ms_,  // no assignment to capture_time_ms_ was found, so it stays 0
rtp_config_.payload_type,
encoded_image._frameType == VideoFrameType::kVideoFrameKey)) {
// The payload router could be active but this module isn't sending.
return Result(Result::ERROR_SEND_FAILED);
}
bool send_result = rtp_streams_[stream_index].sender_video->SendEncodedImage(
rtp_config_.payload_type, codec_type_, rtp_timestamp, encoded_image,
fragmentation,
params_[stream_index].GetRtpVideoHeader(
encoded_image, codec_specific_info, shared_frame_id_),
expected_retransmission_time_ms);
}
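Because the RTP timestamp is an unsigned 32-bit value, adding the random StartTimestamp simply wraps modulo 2^32. That is harmless: receivers only ever look at timestamp differences. A short worked example with made-up values:

#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t frame_timestamp = 4000000000u;  // encoded_image.Timestamp()
  const uint32_t start_timestamp = 500000000u;   // random per-stream offset

  // Unsigned 32-bit addition wraps around: 4,500,000,000 mod 2^32.
  const uint32_t rtp_timestamp = frame_timestamp + start_timestamp;
  std::printf("rtp_timestamp=%u\n", rtp_timestamp);  // 205032704

  // Differences between consecutive timestamps are preserved across the wrap,
  // which is all an RTP receiver needs.
  const uint32_t next_frame = frame_timestamp + 3000;  // +3000 = 1/30 s at 90 kHz
  std::printf("delta=%u\n", (next_frame + start_timestamp) - rtp_timestamp);  // 3000
}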
bool RTPSenderVideo::SendEncodedImage(
int payload_type,
absl::optional<VideoCodecType> codec_type,
uint32_t rtp_timestamp,
const EncodedImage& encoded_image,
const RTPFragmentationHeader* fragmentation,
RTPVideoHeader video_header,
absl::optional<int64_t> expected_retransmission_time_ms) {
return SendVideo(payload_type, codec_type, rtp_timestamp,
encoded_image.capture_time_ms_, encoded_image, fragmentation,
video_header, expected_retransmission_time_ms);
}
bool RTPSenderVideo::SendVideo(
int payload_type,
absl::optional<VideoCodecType> codec_type,
uint32_t rtp_timestamp,
int64_t capture_time_ms,
rtc::ArrayView<const uint8_t> payload,
const RTPFragmentationHeader* fragmentation,
RTPVideoHeader video_header,
absl::optional<int64_t> expected_retransmission_time_ms) {
std::unique_ptr<RtpPacketToSend> single_packet =
rtp_sender_->AllocatePacket();
RTC_DCHECK_LE(packet_capacity, single_packet->capacity());
single_packet->SetPayloadType(payload_type);  // set the payload type
single_packet->SetTimestamp(rtp_timestamp);   // set the RTP timestamp
single_packet->set_capture_time_ms(capture_time_ms);
// skip other code
bool first_frame = first_frame_sent_();
std::vector<std::unique_ptr<RtpPacketToSend>> rtp_packets;
for (size_t i = 0; i < num_packets; ++i) {
RtpPacketToSend* packet;
int expected_payload_capacity;
// Choose right packet template:
if (num_packets == 1) {
packet = std::move(single_packet);
expected_payload_capacity =
limits.max_payload_len - limits.single_packet_reduction_len;
} else if (i == 0) {
packet = std::move(first_packet);
expected_payload_capacity =
limits.max_payload_len - limits.first_packet_reduction_len;
} else if (i == num_packets - 1) {
packet = std::move(last_packet);
expected_payload_capacity =
limits.max_payload_len - limits.last_packet_reduction_len;
} else {
packet = std::make_unique<RtpPacketToSend>(*middle_packet);
expected_payload_capacity = limits.max_payload_len;
}
packet->set_first_packet_of_frame(i == 0);
if (!packetizer->NextPacket(packet.get()))  // e.g. RtpPacketizerH264: take the next chunk and fill the packet's payload
return false;
RTC_DCHECK_LE(packet->payload_size(), expected_payload_capacity);
if (!rtp_sender_->AssignSequenceNumber(packet.get()))  // assign the sequence number
return false;
// No FEC protection for upper temporal layers, if used.
bool protect_packet = temporal_id == 0 || temporal_id == kNoTemporalIdx;
packet->set_allow_retransmission(allow_retransmission);
// Put packetization finish timestamp into extension.
if (packet->HasExtension<VideoTimingExtension>()) {
packet->set_packetization_finish_time_ms(clock_->TimeInMilliseconds());
}
// FEC logic
if (protect_packet && fec_generator_) {
if (red_enabled() &&
exclude_transport_sequence_number_from_fec_experiment_) {
// See comments at the top of the file why experiment
// "WebRTC-kExcludeTransportSequenceNumberFromFec" is needed in
// conjunction with datagram transport.
// TODO(sukhanov): We may also need to implement it for flexfec_sender
// if we decide to keep this approach in the future.
uint16_t transport_senquence_number;
if (packet->GetExtension<webrtc::TransportSequenceNumber>(
&transport_senquence_number)) {
if (!packet->RemoveExtension(webrtc::TransportSequenceNumber::kId)) {
RTC_NOTREACHED()
<< "Failed to remove transport sequence number, packet="
<< packet->ToString();
}
}
}
fec_generator_->AddPacketAndGenerateFec(*packet);
}
if (red_enabled()) {
// Queue the RED (redundancy-encoded) packet instead of the original.
std::unique_ptr<RtpPacketToSend> red_packet(new RtpPacketToSend(*packet));
BuildRedPayload(*packet, red_packet.get());
red_packet->SetPayloadType(*red_payload_type_);
// Send |red_packet| instead of |packet| for allocated sequence number.
red_packet->set_packet_type(RtpPacketMediaType::kVideo);
red_packet->set_allow_retransmission(packet->allow_retransmission());
rtp_packets.emplace_back(std::move(red_packet));
} else {
// Queue the original packet.
packet->set_packet_type(RtpPacketMediaType::kVideo);
rtp_packets.emplace_back(std::move(packet));
}
if (first_frame) {
if (i == 0) {
RTC_LOG(LS_INFO)
<< "Sent first RTP packet of the first video frame (pre-pacer)";
}
if (i == num_packets - 1) {
RTC_LOG(LS_INFO)
<< "Sent last RTP packet of the first video frame (pre-pacer)";
}
}
}
if (fec_generator_) {
// Fetch any FEC packets generated from the media frame and add them to
// the list of packets to send.
auto fec_packets = fec_generator_->GetFecPackets();
// TODO(bugs.webrtc.org/11340): Move sequence number assignment into
// UlpfecGenerator.
const bool generate_sequence_numbers = !fec_generator_->FecSsrc();
for (auto& fec_packet : fec_packets) {
if (generate_sequence_numbers) {
rtp_sender_->AssignSequenceNumber(fec_packet.get());
}
rtp_packets.emplace_back(std::move(fec_packet));
}
}
// Send the RTP packets.
LogAndSendToNetwork(std::move(rtp_packets), unpacketized_payload_size);
}
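The packet-template selection above is plain capacity arithmetic: the first and last packets of a frame may need to reserve extra bytes, so their usable payload is the maximum payload length minus a per-position reduction. A standalone sketch of that bookkeeping; the struct only mimics the spirit of the real PayloadSizeLimits, and the numbers in main are made up:

#include <cstdio>

struct PayloadSizeLimits {
  int max_payload_len = 1200;
  int first_packet_reduction_len = 0;
  int last_packet_reduction_len = 0;
  int single_packet_reduction_len = 0;
};

// Usable payload bytes for packet |i| out of |num_packets| in one frame,
// following the same single/first/last/middle cases as SendVideo().
int ExpectedPayloadCapacity(const PayloadSizeLimits& limits, int i, int num_packets) {
  if (num_packets == 1)
    return limits.max_payload_len - limits.single_packet_reduction_len;
  if (i == 0)
    return limits.max_payload_len - limits.first_packet_reduction_len;
  if (i == num_packets - 1)
    return limits.max_payload_len - limits.last_packet_reduction_len;
  return limits.max_payload_len;
}

int main() {
  PayloadSizeLimits limits;
  limits.first_packet_reduction_len = 100;  // e.g. room reserved on packet 0
  for (int i = 0; i < 3; ++i)
    std::printf("packet %d capacity = %d\n", i, ExpectedPayloadCapacity(limits, i, 3));
}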
Finally, the RTP packet is sent out:
void RtpSenderEgress::SendPacket(RtpPacketToSend* packet,
const PacedPacketInfo& pacing_info) {
const bool send_success = SendPacketToNetwork(*packet, options, pacing_info);
}
bool RtpSenderEgress::SendPacketToNetwork(const RtpPacketToSend& packet,
const PacketOptions& options,
const PacedPacketInfo& pacing_info) {
int bytes_sent = -1;
if (transport_) {
UpdateRtpOverhead(packet);
bytes_sent = transport_->SendRtp(packet.data(), packet.size(), options)
? static_cast<int>(packet.size())
: -1;
if (event_log_ && bytes_sent > 0) {
event_log_->Log(std::make_unique<RtcEventRtpPacketOutgoing>(
packet, pacing_info.probe_cluster_id));
}
}
}
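transport_->SendRtp() is where the stack hands the serialized packet to the application layer. In a normal PeerConnection this Transport is provided internally, but when driving the RTP modules directly you implement it yourself. A hedged sketch, assuming the webrtc::Transport interface from api/call/transport.h; the UDP forwarding part is a placeholder:

#include <cstddef>
#include <cstdint>

#include "api/call/transport.h"

// Minimal Transport that would forward serialized RTP/RTCP packets to a
// user-owned socket. SendUdp() is a placeholder for the application's I/O.
class UdpForwardingTransport : public webrtc::Transport {
 public:
  bool SendRtp(const uint8_t* packet, size_t length,
               const webrtc::PacketOptions& options) override {
    return SendUdp(packet, length);
  }
  bool SendRtcp(const uint8_t* packet, size_t length) override {
    return SendUdp(packet, length);
  }

 private:
  bool SendUdp(const uint8_t* /*data*/, size_t /*size*/) {
    // Write to a UDP socket here; this sketch just reports success.
    return true;
  }
};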
The RTCP SR packet is built as follows:
std::unique_ptr<rtcp::RtcpPacket> RTCPSender::BuildSR(const RtcpContext& ctx) {
// Timestamp shouldn't be estimated before first media frame.
RTC_DCHECK_GE(last_frame_capture_time_ms_, 0);
// The timestamp of this RTCP packet should be estimated as the timestamp of
// the frame being captured at this moment. We are calculating that
// timestamp as the last frame's timestamp + the time since the last frame
// was captured.
int rtp_rate = rtp_clock_rates_khz_[last_payload_type_];
if (rtp_rate <= 0) {
rtp_rate =
(audio_ ? kBogusRtpRateForAudioRtcp : kVideoPayloadTypeFrequency) /
1000;
}
// Round now_us_ to the closest millisecond, because Ntp time is rounded
// when converted to milliseconds,
uint32_t rtp_timestamp =
timestamp_offset_ + last_rtp_timestamp_ +
((ctx.now_us_ + 500) / 1000 - last_frame_capture_time_ms_) * rtp_rate;
rtcp::SenderReport* report = new rtcp::SenderReport();
report->SetSenderSsrc(ssrc_);
report->SetNtp(TimeMicrosToNtp(ctx.now_us_));  // NTP time converted from the current local time: the NTP time at which this SR is generated
report->SetRtpTimestamp(rtp_timestamp);  // last frame's RTP timestamp + offset + time elapsed since that frame was captured
report->SetPacketCount(ctx.feedback_state_.packets_sent);
report->SetOctetCount(ctx.feedback_state_.media_bytes_sent);
report->SetReportBlocks(CreateReportBlocks(ctx.feedback_state_));
return std::unique_ptr<rtcp::RtcpPacket>(report);
}
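A worked example of that extrapolation with made-up numbers: the last video frame was captured at local time 20,000 ms and sent with last_rtp_timestamp_ = 90,000, the random offset is 1,000,000, and the SR is built 40 ms later:

#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical values, mirroring the variables in RTCPSender::BuildSR.
  const uint32_t timestamp_offset = 1000000;        // random per-stream offset
  const uint32_t last_rtp_timestamp = 90000;        // RTP ts of last sent frame
  const int64_t last_frame_capture_time_ms = 20000; // local capture time of that frame
  const int64_t now_us = 20040 * 1000;              // SR built 40 ms later
  const int rtp_rate = 90;                          // 90 kHz video clock, in kHz

  // Extrapolate: "what would the RTP timestamp of a frame captured right now be?"
  const uint32_t rtp_timestamp =
      timestamp_offset + last_rtp_timestamp +
      static_cast<uint32_t>(((now_us + 500) / 1000 - last_frame_capture_time_ms) *
                            rtp_rate);

  std::printf("SR rtp_timestamp = %u\n", rtp_timestamp);  // 1000000 + 90000 + 40*90 = 1093600
}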
The code below is from an older version of WebRTC (not the current tree).
--> Set the frame's render_time_ms
1. Since the captureTime passed in is 0, it is set to the local timestamp:
captureFrame.set_render_time_ms(TickTime::MillisecondTimestamp())
2. Otherwise, it is set to the NTP value minus (the difference between NTP time and the local timestamp).
The constructor initializes that difference between NTP time and the local timestamp:
delta_ntp_internal_ms_(
Clock::GetRealTimeClock()->CurrentNtpInMilliseconds() - TickTime::MillisecondTimestamp())
captureFrame.set_render_time_ms(capture_time - delta_ntp_internal_ms_);
It is easy to see that the computed value is really just the local timestamp: capture_time - delta_ntp_internal_ms_ = capture_time - (NTP_now - local_now), and when capture_time is an NTP timestamp taken at roughly the same moment, this reduces to the local time.
--> last_capture_time_ = captureFrame.render_time_ms();  // save the latest capture timestamp (local time)
--> ViECapturer::OnIncomingCapturedFrame(I420VideoFrame& video_frame)
--> // Make sure we render this frame earlier since we know the render time set
// is slightly off since it's being set when the frame has been received from
// the camera, and not when the camera actually captured the frame.
Subtract the capture delay (190 ms on Android):
video_frame.set_render_time_ms(video_frame.render_time_ms() - FrameDelay());
III. Video Frame Flow from Capture to Encoding
ViECapturer::ViECaptureProcess()
--> ViECapturer::DeliverI420Frame(I420VideoFrame* video_frame)
--> ViEFrameProviderBase::DeliverFrame()
--> ViERenderer::DeliverFrame()  // local preview window
--> ViEEncoder::DeliverFrame()   // deliver to the encoder
--> Convert render time, in ms, to RTP timestamp.
const int kMsToRtpTimestamp = 90;
const uint32_t time_stamp = kMsToRtpTimestamp * static_cast<uint32_t>(video_frame->render_time_ms());
video_frame->set_timestamp(time_stamp);
--> VideoCodingModuleImpl::AddVideoFrame()
--> VideoSender::AddVideoFrame
--> VCMGenericEncoder::Encode
--> VideoEncoder::Encode()  // for VP8 this is the VideoEncoder subclass VP8Encoder;
                            // for H264 it is the subclass H264Encoder
--> VCMEncodedFrameCallback::Encoded()
--> VCMPacketizationCallback::SendData()
--> ViEEncoder::SendData()