Author | Ren Xuelong

Introduction

Webcasting has become a basic capability of the Internet, and live broadcast features in mobile apps keep maturing. E-commerce, news, entertainment, and other live broadcast formats provide users with rich content. As live broadcasting grows in popularity, delivering an extremely fast and smooth viewing experience matters more and more.

The full text is 6657 words, and the expected reading time is 17 minutes.

Baidu APP, as Baidu's aircraft-carrier-class application, provides users with comprehensive mobile services, and live broadcasting is one of its essential content features. As the architecture and business capabilities of the live broadcast room have matured, optimizing its playback metrics has become increasingly important. When a user taps a live broadcast resource, quickly seeing the live picture is one of the core experiences, so start-up speed has become a key metric in live broadcast room optimization.

Because of package size and other constraints, the live broadcast function in the Android version of Baidu APP is delivered as a plug-in, and the live broadcast module is loaded only when the user actually uses it. To avoid making users wait through plug-in download, installation, and loading when they tap a live broadcast entry, and to handle failed plug-in downloads, the live broadcast team extracted core capabilities such as playback and IM into a small, independent first-level plug-in built into Baidu APP, while business capabilities such as pendants, gifts, follows, and likes live in another, much larger second-level plug-in. This special plug-in architecture, combined with complex business scenarios, left the Android start-up time metrics unsatisfactory.

In Q1 2022, the 80th percentile of the overall live broadcast room start-up time was about 3s, and the second-jump scenario (swiping up or down inside the live broadcast room) was about 1s. For the scenario where the stream address is carried when entering the live broadcast room (the address is used to pre-start playback as soon as the page launches, in parallel with loading the live room list), start-up time rose noticeably: from about 1.5s at the beginning of a release, it gradually grew to 2.5s+ within two weeks as the version converged. In other words, when tapping a live resource outside the live broadcast room, a large number of users still had to wait 3 seconds or even longer before actually seeing the live picture. This has a very large negative impact on users of the live broadcast function, and the start-up time metric urgently needed optimization.

△ Start-up link

The start-up flow, briefly: the user taps a live broadcast resource, the live broadcast page opens, the stream address is requested, the kernel is called to start playback, the kernel finishes starting, the kernel notifies the business layer, and start-up is complete. Kernel-side monitoring shows that starting a live resource takes about 600-700ms inside the kernel. Considering the overhead of the other stages in the link, and that the second-jump scenario (swiping up or down inside the live room) can start playback in advance while sliding, the overall start-up target was set at 1.5 seconds. Since some entry points already carry the stream address, the "request stream address" stage can be skipped in those scenarios, so for entries that already have the stream address the start-up target was set at 1.1 seconds.

Because of the special plug-in logic and complex business scenarios, the Android start-up link is not identical each time the live broadcast is entered. When only the first-level plug-in is present and the second-level plug-in is not yet ready, the first-level plug-in requests the live data and starts playback. When both plug-ins are already loaded, the second-level plug-in requests the live data and handles start-up. When the stream address is carried into the live room, a player is created right after the Activity starts and playback begins with the address carried from outside, to achieve an instant-open effect. Beyond these, there are other variants. The complexity of the start-up link means that although timestamps are recorded at the main nodes, and daily 80th percentile reports exist for the time between adjacent nodes, the start-up paths reported online in different scenarios cannot be enumerated exhaustively, so the existing reports cannot pinpoint where time is actually spent in the overall start-up link. A new monitoring scheme had to be established and the time-consuming points located before targeted optimizations could be designed for each of them.

5.1 Design a new report and locate time-consuming points

△ Start-up link when the one-hop entry carries the stream address

Since the existing reports could not locate the time-consuming stages of the start-up link, a new monitoring scheme was needed. Looking at the flow chart above for the scenario where the live room is opened with a stream address: after entry, creating the live room list and pre-starting the player run in parallel, and start-up ends when the list has been created and the player has received the first-frame notification. There may be multiple nodes between the user's tap and the page Activity's onCreate (first-level plug-in installation, loading, etc.), multiple nodes between onCreate and the player pre-start call, and multiple nodes between kernel completion and the notification to the live business, so the whole start-up link cannot be enumerated exhaustively. However, the tap always precedes onCreate, and onCreate always precedes player creation. In other words, although the number of nodes and paths between two key nodes is not fixed, the order of the two key nodes is fixed and both will always exist. From this we can design a report with a custom link start node and a custom link end node: the difference between the end and start timestamps gives the time between any two nodes, and the 80th percentile over all online sessions gives the online time spent between those nodes. Computing this for all core key nodes in the start-up link reveals the segments with abnormal time consumption.
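The sketch below illustrates the custom start node / end node idea under stated assumptions: the node names, the in-memory timestamp log, and the local percentile helper are hypothetical stand-ins; the real report is aggregated from online point data rather than computed on the device.

```kotlin
// Minimal sketch of the "custom start node / custom end node" report idea.
// Node names and the in-memory log are illustrative assumptions.
object StartupTrace {
    private val timestamps = mutableMapOf<String, Long>()

    // Record a key node, e.g. "click", "activity_on_create", "player_create", "first_frame".
    fun mark(node: String) {
        timestamps[node] = android.os.SystemClock.elapsedRealtime()
    }

    // Duration between two arbitrary nodes for this session, or null if either is missing.
    fun durationBetween(start: String, end: String): Long? {
        val s = timestamps[start] ?: return null
        val e = timestamps[end] ?: return null
        return e - s
    }
}

// Offline aggregation: the 80th percentile of all reported durations between two nodes.
fun percentile80(durationsMs: List<Long>): Long {
    require(durationsMs.isNotEmpty())
    val sorted = durationsMs.sorted()
    return sorted[((sorted.size - 1) * 0.8).toInt()]
}
```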

After building a new report along these lines, the time consumption of each stage of the link becomes much clearer, as shown below, so we can tackle the stages one by one.

△ Time consumption between key nodes

5.2 Use a first-level plug-in to start broadcasting in one hop

Observing the time between key nodes with the new report, the gap between live room list creation (template component creation) and the actual start-playback call (business view ready) is long, and it grows as the version converges, by roughly 1000ms over two weeks. We tackled the growth between these two nodes first.

Analysis of the start-up link shows that this segment changes greatly as the version converges, mainly because the proportion of sessions in which the "business calls start-playback" node is triggered inside the second-level plug-in grows with version convergence. During the convergence period there is a high probability that the second-level plug-in has not been downloaded or installed when the user enters the live room; in that case the first-level plug-in can quickly create the list and the business view, and playback is triggered when the RecyclerView item is attached to the view tree, a path that mainly waits for the kernel to finish pulling and parsing the first frame. Once the second-level plug-in has converged, the business view is created by the second-level plug-in instead of the first-level one. Because the second-level plug-in contains many business components that are loaded one by one, and calls or event distribution from the first level to the second level also cost time layer by layer, the second-level-plug-in path greatly increases the time from live room list creation (template component creation) to the actual start-playback call (business view ready).

5.2.1 Use the first-level plug-in to start broadcasting in one hop

Based on this analysis, we changed the one-hop start-up logic so that the first hop always starts playback with the first-level plug-in. The player's parent container created by the first-level plug-in and by the second-level plug-in shares the same id, so once the parent container has been initialized in the first-level plug-in, playback can finish as soon as the kernel's first-frame callback arrives. When the second-level plug-in initializes the player's parent container, it checks by id whether the container has already been added to the view tree, and only adds it when it has not (the second-jump scenario, or an exception during the first hop). The first-level plug-in handles this faster, and the "first level first, second level as fallback" logic guarantees that the view can always be initialized after entering the live room.
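A minimal sketch of the shared-container check, assuming an externally supplied container id and a generic player view (the real plug-in classes differ): the second-level plug-in attaches the player only when the first-level plug-in has not already done so.

```kotlin
import android.view.View
import android.view.ViewGroup
import android.widget.FrameLayout

// Both plug-ins use the same container id, so the second-level plug-in only
// attaches the player if the first-level plug-in has not already done so.
fun attachPlayerIfNeeded(root: ViewGroup, playerView: View, containerId: Int) {
    val existing: View? = root.findViewById(containerId)
    if (existing != null) {
        // The first-level plug-in already attached the container; just wait for
        // the kernel's first-frame callback.
        return
    }
    // Second-jump scenario, or an exception on the first hop: fall back to
    // attaching the player in the second-level plug-in.
    val container = FrameLayout(root.context).apply { id = containerId }
    container.addView(playerView)
    root.addView(container)
}
```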

5.2.2 Requesting the interface in advance

Using the first-level plug-in for start-up and optimizing the call levels of the second-level plug-in solved many problems. Another time-consuming point is the case where only the room_id, and no stream address, is passed when entering the live room: the stream data must first be fetched through an interface before the player can be created and playback started. To optimize this, we designed a data request manager for the live room that provides cached data and timeout cleanup. When the page reaches onCreate, the manager is triggered to issue the interface request; after the live room template is created, the requested live data is obtained through the manager. If the manager's request has not finished yet, the in-flight request is reused and the data is returned as soon as the request completes. This way, in the no-stream-address case we make full use of the roughly 300ms window between the tap and page startup to do more of the necessary work while entering the live room.
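A minimal sketch of such a request manager, assuming hypothetical RoomData and fetchRoomData() stand-ins for the real model and network call: prefetch in onCreate, reuse the cache or the in-flight request when the template is ready, and clear stale data on timeout.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.async

// Hypothetical stand-ins for the real live-room data model and network call.
data class RoomData(val roomId: String, val streamUrl: String)
suspend fun fetchRoomData(roomId: String): RoomData = TODO("network request")

object RoomDataManager {
    private var cached: RoomData? = null
    private var inFlight: Deferred<RoomData>? = null

    // Triggered from the page's onCreate, as early as possible.
    fun prefetch(scope: CoroutineScope, roomId: String) {
        if (cached == null && inFlight == null) {
            inFlight = scope.async { fetchRoomData(roomId) }
        }
    }

    // Called once the live-room template has been created: reuse the cache or
    // the in-flight request, otherwise fall back to a fresh request.
    suspend fun getRoomData(scope: CoroutineScope, roomId: String): RoomData {
        cached?.let { return it }
        val request = inFlight ?: scope.async { fetchRoomData(roomId) }
        return request.await().also {
            cached = it
            inFlight = null
        }
    }

    // Timeout cleanup: drop stale data so a later entry does not show old content.
    fun clear() {
        cached = null
        inFlight?.cancel()
        inFlight = null
    }
}
```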


5.3 Pre-starting the player outside the Activity

After pre-creating the player in the live room, pre-starting playback, and using the first-level plug-in for one-hop start-up, the business-side time consumption gradually dropped below the kernel's, and the player kernel became the bottleneck for one-hop start-up time. Besides exploring optimizations inside the kernel, continuing to optimize the overall start-up link on the business side remained an important direction. The node-to-node measurements show that it takes about 300ms from the user's tap to the page Activity's onCreate. Since this part cannot be shortened, we can try to do some work in parallel during this window and reduce the logic that runs after the page starts.

After the first-level plug-in was built into Baidu APP, a plug-in pre-loading function was designed and launched; since then, for 99%+ of users entering a live room by tapping a live resource, the first-level plug-in is already loaded, so there is little room left on the first-level plug-in loading side. However, moving the pre-start of playback up to the moment the user taps lets kernel data loading run in parallel with live room start-up to a greater extent, reducing the impact of kernel time on the overall start-up time.

△ Player starting playback outside the live broadcast room

As shown above, an early-start module is added. When the user taps, a player is created and cached in parallel with the page launch. When the page later creates its player, it first tries to obtain the already-started player from the early-start module's cache; if none is obtained, the normal player creation logic runs. If the cached player is obtained and no error has occurred in it, we only need to wait for the kernel's first frame.
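A minimal sketch of the early-start module, assuming a hypothetical LivePlayer type: the player is created at tap time, in parallel with the Activity launch, and handed over to the live room once the page is ready.

```kotlin
// Hypothetical player wrapper; the real kernel player API differs.
class LivePlayer {
    var hasError = false
        private set
    fun start(streamUrl: String) { /* begin pulling and decoding the stream */ }
    fun release() { /* free decoder and surfaces */ }
}

object EarlyStartCache {
    private var player: LivePlayer? = null

    // Called when the user taps the live resource, before the Activity starts.
    fun startEarly(streamUrl: String) {
        if (player == null) {
            player = LivePlayer().also { it.start(streamUrl) }
        }
    }

    // Called when the live room creates its player: hand over the pre-started
    // player if it is healthy, otherwise return null and use the normal path.
    fun obtain(): LivePlayer? {
        val p = player ?: return null
        player = null
        return if (p.hasError) { p.release(); null } else p
    }
}
```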

With the player started in advance, the first-frame event most likely arrives after the Activity has started, but it can still arrive before the live business has registered its first-frame listener. For that case, a "playback succeeded" event is distributed as soon as playback succeeds; its meaning is distinct from the first-frame event, to avoid confusing the two.

The early-start module also has timeout recovery logic: if the early start fails, or the player is not reused by the business within 5s (tentative) because the Activity fails to start or some other business exception occurs, the cached player is actively recycled. This prevents the pre-created player from holding extra memory and from leaking when reuse does not happen; the timeout cannot be set much longer, to avoid holding memory for a player that will never be reused.
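A sketch of the 5s (tentative) timeout recovery on top of the cache sketched above; the Handler-based scheduling is an assumption about how it could be wired up, not the module's actual implementation.

```kotlin
import android.os.Handler
import android.os.Looper

private val mainHandler = Handler(Looper.getMainLooper())
private val recycleTask = Runnable {
    // Not reused in time (Activity failed to start, or another business
    // exception): free the pre-started player and its memory.
    EarlyStartCache.obtain()?.release()
}

// Scheduled right after startEarly().
fun scheduleRecycle() {
    mainHandler.removeCallbacks(recycleTask)
    mainHandler.postDelayed(recycleTask, 5_000L)
}

// Called once the live room successfully reuses the cached player.
fun cancelRecycle() {
    mainHandler.removeCallbacks(recycleTask)
}
```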

With the early-start function, comparing sessions that hit the early-start logic against those that did not during the experiment period, the overall 80th percentile start-up time improved by 450ms+ on average.

5.4 Splitting up tasks in the live broadcast room

△ Time consumed distributing the kernel's first frame

After the business link and the kernel link had been optimized to a certain extent, we continued to break down the time between key nodes. The gap between the kernel internally marking the first frame and the live business actually receiving the first-frame notification is long: as shown above, the 80th percentile of the online first-frame distribution time exceeds 1s, which has a large impact on the overall start-up time. The kernel marks the first frame on a sub-thread and notifies the business by posting a message through a main-thread Handler, so the event reaches the main thread via the system's message dispatch mechanism.

Inspecting all main-thread tasks between the moment the kernel marks the first frame and the moment the business receives the notification shows that when the first-frame distribution task starts queuing, many other tasks are already in the main-thread queue, and handling them takes a long time. This makes the first-frame distribution wait long in the queue, so the distribution as a whole is slow. The live broadcast business is complex: if other live room tasks are already queued or executing when the kernel's first-frame distribution task is enqueued, the distribution task cannot run until those tasks finish.

Screening all main-thread tasks during live room start-up shows that the second-level plug-in has many business functions and its overall loading task takes a long time to execute. To verify that, online as well, second-level business tasks were blocking the first-frame distribution, we designed an experiment in which second-level component loading had to wait for the kernel's first frame. Comparing the experimental group with the control group, both the first-frame distribution time and the overall start-up time dropped significantly for sessions in the experiment, with an overall improvement of about 500ms.

Experimental verification and local analysis of the start-up business logic show that the live room pre-loads a relatively large number of business components and their corresponding views, which is time-consuming. After the second-level plug-in is loaded, business views are pre-created in parallel with the interface request, so that components and views are already initialized and rendering is faster once the interface returns; without pre-creation, business components would only be initialized after the interface data comes back, with data set after creation. However, executing all the pre-creation tasks serially takes a long time and blocks the main thread, and doing too much work within one frame also causes visible jank on the page.

Having found this blocking problem, we split the pre-created view work: the one large task is broken into many small tasks, and each component is initialized as a separate task queued in the main-thread task queue. This avoids any single long-running task. After this change went live, the time spent loading components in the second-level plug-in dropped by 40%+.
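A minimal sketch of the splitting idea, assuming a hypothetical LiveComponent abstraction: instead of initializing every component inside one long main-thread task, each component is posted as its own message, so the kernel's first-frame dispatch (and input events) can be interleaved between them.

```kotlin
import android.os.Handler
import android.os.Looper

// Hypothetical abstraction for the live room's business components.
interface LiveComponent {
    fun init()   // inflate views, register listeners, etc.
}

// One main-thread message per component instead of one long batched task.
fun scheduleComponentInit(components: List<LiveComponent>) {
    val handler = Handler(Looper.getMainLooper())
    components.forEach { component ->
        handler.post { component.init() }
    }
}
```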

5.5 Distributing the first frame from a kernel sub-thread

Since tasks in the main thread's message queue execute in order, splitting the big task that blocked the first-frame distribution into smaller ones still does not help when those small tasks are already queued ahead of the first-frame event. Besides reducing the live business's own impact, the first-frame distribution can also be sped up by accelerating how the kernel dispatches it. We needed a scheme that lets the kernel's first-frame event skip, or spend less time in, the main-thread queue without affecting kernel stability or business logic.

To solve this, we pushed the kernel team to add the ability to notify the business of the first-frame event on a sub-thread. After the business receives the first-frame callback on the sub-thread, it inserts a new task at the front of the main-thread task queue with Handler's postAtFrontOfQueue() method, so the main thread processes our new task immediately after the task it is currently running. In that new task the player's on-screen logic can be handled right away, without waiting for the kernel's original main-thread message.
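A sketch of this path under stated assumptions: the OnFirstFrameListener interface and the render callback are hypothetical stand-ins for the kernel API, while postAtFrontOfQueue() is the real Handler method that puts a runnable at the head of the main-thread message queue.

```kotlin
import android.os.Handler
import android.os.Looper

// Hypothetical kernel callback interface; the real kernel API may differ.
interface OnFirstFrameListener {
    fun onFirstFrame()   // invoked on the kernel's worker thread
}

class FirstFrameDispatcher(private val renderFirstFrame: () -> Unit) {
    private val mainHandler = Handler(Looper.getMainLooper())

    @Volatile
    var firstFrameReceived = false
        private set

    // Registered with the kernel's sub-thread notification capability.
    val listener = object : OnFirstFrameListener {
        override fun onFirstFrame() {
            // Record the state for the checks around business tasks (see below).
            firstFrameReceived = true
            // Jump the main-thread queue: this runnable runs right after the task
            // currently executing, ahead of everything already queued.
            mainHandler.postAtFrontOfQueue {
                renderFirstFrame()
            }
        }
    }
}
```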

Inserting a task at the front of the main-thread queue cannot interrupt a task that has already started executing; the new task runs only after the current one finishes. To cover this case, the player records the first-frame state as soon as the kernel notifies it on the sub-thread, and checks are added before and after the live room business tasks in both the first-level and second-level plug-ins: if the first frame has already been received, the frame is put on screen first and the current task continues afterwards.
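A minimal sketch of that check wrapped around business tasks, reusing the (hypothetical) dispatcher above; the class and function names are illustrative, not the real plug-in code.

```kotlin
class BusinessTaskRunner(
    private val dispatcher: FirstFrameDispatcher,
    private val renderFirstFrame: () -> Unit,
) {
    private var frameShown = false

    fun runBusinessTask(task: () -> Unit) {
        showFrameIfReady()   // before: the frame may already have arrived
        task()
        showFrameIfReady()   // after: it may have arrived while we were busy
    }

    private fun showFrameIfReady() {
        if (!frameShown && dispatcher.firstFrameReceived) {
            renderFirstFrame()
            frameShown = true
        }
    }
}
```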

By inserting the kernel's first-frame message at the front of the main-thread task queue and adding the on-screen checks at key business nodes, the first-frame notification is processed faster and its distribution has less impact on start-up time.

5.6 Balancing the start-up and full-load indicators

While start-up was being optimized, the live room's full-load time indicator was also being optimized continuously (full-load time: from the user's tap to the live room's core functions being available, covering page startup, live room list creation, second-level plug-in download, installation and loading, the live room interface data request, initialization and data rendering of the functional component views, and display of the core business components). The second-level plug-in triggers its download, installation and loading logic only when one of its functions is used. In the full-load link it was also noticed that the interval from the user's tap to the page's onCreate costs time, as shown below.

△ Page startup time consumption

To optimize the live room's full-load indicator, the live broadcast team reasoned that if plug-in loading were parallelized with page startup, the full-load time would also improve to some extent. The team therefore designed a pre-loading scheme for the second-level plug-in, moving its load up to the moment the user taps (this function went live before the work in sections 5.4 and 5.5). After launch, the data showed that the experimental group's full-load time was indeed 300ms+ better than the control group's, but start-up time regressed: the experimental group's start-up was 500ms+ slower than the control group's, and the degradation kept growing as the version converged. We spotted the anomaly quickly and confirmed through data analysis that the numbers were correct. How could optimizing full load degrade start-up?

Data analysis showed that the affected stage was the distribution of the kernel's first-frame message to the main thread: the earlier the second-level plug-in is loaded, the more likely its time-consuming component loading tasks conflict with the kernel's first-frame distribution. After confirming the cause, we implemented the work in sections 5.4 and 5.5 to reduce the impact of second-level component loading on start-up. Fully splitting and scattering the second-level plug-in's time-consuming tasks just to offset the degradation caused by pre-loading the plug-in would make the solution quite complex and too invasive to the live room logic, so second-level plug-in pre-loading was not fully rolled out, and we devised other solutions to reach the full-load goal.

Although we cannot simply load the second-level plug-in when entering the live room, we can download it as much as possible beforehand and load it directly when it is used; loading costs very little compared to downloading. We optimized the plug-in pre-download module so that it is triggered when live broadcast resources are displayed outside the live room. The module weighs the device's current network, bandwidth, download frequency and other conditions, and downloads the matching second-level plug-in at a suitable time. With the plug-in downloaded in advance, the full-load indicator improves greatly. In addition, inside the live room, the second-level component initialization splitting from section 5.4 relieves main-thread blocking, so after the interface data request succeeds, the components that affect the full-load statistics can be initialized first and the rest after full load completes. This also significantly improves the full-load indicator.
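A sketch of such a pre-download gate, triggered when a live resource is shown in the feed. The plug-in name, the frequency limit, and the PluginDownloader interface are illustrative assumptions, not the real module's policy; only the Android connectivity checks are standard APIs.

```kotlin
import android.content.Context
import android.net.ConnectivityManager
import android.net.NetworkCapabilities

// Hypothetical downloader abstraction for the plug-in framework.
interface PluginDownloader {
    fun isInstalled(name: String): Boolean
    fun downloadCountToday(name: String): Int
    fun download(name: String)
}

fun maybePreDownloadSecondaryPlugin(context: Context, downloader: PluginDownloader) {
    val pluginName = "live_secondary"                            // hypothetical plug-in id
    if (downloader.isInstalled(pluginName)) return               // nothing to do
    if (downloader.downloadCountToday(pluginName) >= 3) return   // frequency limit
    if (!isOnUnmeteredNetwork(context)) return                   // avoid mobile-data cost
    downloader.download(pluginName)                              // download only; load on first use
}

private fun isOnUnmeteredNetwork(context: Context): Boolean {
    val cm = context.getSystemService(Context.CONNECTIVITY_SERVICE) as ConnectivityManager
    val caps = cm.getNetworkCapabilities(cm.activeNetwork) ?: return false
    return caps.hasCapability(NetworkCapabilities.NET_CAPABILITY_NOT_METERED)
}
```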

Beyond these two schemes, the live broadcast team optimized the full-load indicator in other directions as well, balancing the full-load and start-up indicators so that optimizing one would not degrade the other. In the end, both the start-up and full-load targets were met.

△ 2022 trend of Android start-up time

After iterating through the optimizations above, on the latest Android version the overall start-up time has dropped from 3s+ to about 1.3s, and the second-jump start-up time from 1s+ to under 700ms; the planned targets were met.

As a core indicator of the live broadcast function, start-up time needs continuous polishing. Beyond the business architecture, there are several directions still to explore, such as optimizing the streaming protocol, tuning buffer configuration, adaptive start-up based on network speed, tuning GOP configuration, and edge node acceleration. The Baidu live broadcast team will keep investing in live broadcast technology to bring users an ever-better experience.

——END——

Recommended reading:

iOS SIGKILL semaphore crash capture and optimization practice

How to implement flexible scheduling strategies in gateway services with millions of qps

Simple DDD programming

Baidu APP iOS terminal memory optimization practice-memory control scheme

Application of Ernie-SimCSE comparative learning in content anti-cheating

Quality assessment model helps improve risk decision-making level

