2.5 times better performance! ByteDance open-sources sonic-cpp, a high-performance C++ JSON library

sonic-cpp is an efficient JSON library for C++, jointly developed by the ByteDance STE team and the service framework team. It makes heavy use of current CPU hardware features and vectorized programming to greatly improve serialization and deserialization performance; its parsing performance is 2.5 times that of rapidjson. Since sonic-cpp was launched internally at ByteDance, it has saved hundreds of thousands of CPU cores for core businesses such as Douyin and Toutiao. We have now officially open-sourced sonic-cpp in the hope of helping more developers. Stars and forks are welcome.

Github: https://github.com/bytedance/sonic-cpp

At ByteDance, a large number of businesses need to parse JSON and to add, delete, query, and modify JSON data. These workloads occupy a very large number of CPU cores, and the corresponding physical machine costs are high. In some individual services, JSON processing accounts for more than 40% of CPU usage. Improving the performance of the JSON library is therefore crucial to optimizing the cost of ByteDance's business. Meanwhile, JSON parsing libraries have gone through several generations. rapidjson, which is widely used in the industry, is a large improvement over earlier libraries, but its parsing performance still lags behind some newer libraries such as yyjson and simdjson.

Figure 1.1 Parsing performance comparison of yyjson, simdjson and rapidjson

Image Source: https://github.com/ibireme/yyjson

Although yyjson and simdjson parse JSON faster, each has its own shortcomings. simdjson does not support modifying the parsed JSON structure, which makes it impractical for real business use. yyjson, in pursuit of parsing performance, uses a linked-list structure, which results in very poor performance when looking up data.

Figure 1.2 yyjson data structure

Image source: https://github.com/ibireme/yyjson

For these reasons, to reduce physical costs and optimize performance, and drawing on the experience and ideas of sonic-go, ByteDance's open source Go JSON parsing library, the STE team and the service framework team jointly developed sonic-cpp, a JSON parsing library for C++ services.

sonic-cpp mainly has the following features:

Efficient parsing: its parsing performance is 2.5 times that of rapidjson
Addresses the respective shortcomings of yyjson and simdjson, and supports efficient addition, deletion, modification and query
Supports essentially all of the common JSON library interfaces, which makes migration easy for users
Has been deployed at large scale in ByteDance's commercial advertising, search, recommendation and many other mid-platform businesses, and has been proven in production

The design of sonic-cpp combines the strengths of rapidjson, yyjson and simdjson and optimizes further on that basis. The implementation relies mainly on three key techniques: vectorized (SIMD) instructions, an optimized memory layout, and on-demand parsing. Together they allow serialization, deserialization, and addition, deletion, modification, and query operations to reach very high performance.

Vectorized optimization (SIMD)

Single Instruction Multiple Data (SIMD) is a technique in which one controller drives multiple processing units, performing the same operation on every element of a data set at the same time and thereby achieving spatial parallelism. Examples include the SSE and AVX2 instruction sets on x86 and the NEON instruction set on ARM. One of sonic-cpp's core optimizations is built on SIMD instruction sets.

Serialization optimization

When serializing the in-memory DOM to a file, one important step is escaping strings, for example adding the escape character \ before each quotation mark. For instance, This is "a" string is serialized as "This is \"a\" string" before being written to the file. A common implementation scans the characters one by one and adds escapes where needed, as cJSON does.
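The character-by-character approach can be sketched as follows. This is a minimal illustration, not cJSON's actual code: the function name EscapeScalar is hypothetical, and only the quote and backslash characters are handled (control characters are assumed to be absent).

```cpp
#include <cassert>
#include <string>

// Scalar baseline: inspect one character at a time and emit a backslash
// before every character that needs escaping. Hypothetical helper that
// only handles '"' and '\\'; control characters are out of scope.
inline std::string EscapeScalar(const std::string& in) {
  std::string out;
  out.reserve(in.size() + 2);
  out.push_back('"');  // opening quote of the JSON string
  for (char c : in) {
    assert(static_cast<unsigned char>(c) >= 0x20);  // simplification of this sketch
    if (c == '"' || c == '\\') out.push_back('\\');
    out.push_back(c);
  }
  out.push_back('"');  // closing quote of the JSON string
  return out;
}
```

Every input byte costs at least one branch here, which is the overhead the vectorized version is designed to remove.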

sonic-cpp processes 32 characters at a time through five vectorized instructions, greatly improving performance.

The serialization process is as follows:

A vectorized load instruction reads 32 bytes at a time into the vector register YMM1;
YMM1 is compared with another 32 bytes (all ") to produce a mask, which is stored in the vector register YMM2;
A move-mask instruction then reduces the mask in YMM2 into the general-purpose register R1;
Finally, an instruction counts the trailing zeros in R1, which gives the position of the first ".
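The steps above can be sketched roughly as follows. This is an illustration under assumptions, not sonic-cpp's code: the names QuoteMask32, CountTrailingZeros32 and FirstQuoteIn32 are hypothetical, the AVX2 path is only compiled when the compiler defines __AVX2__, and a scalar loop emulates the same steps otherwise.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Returns a 32-bit mask in which bit i is set when block[i] == '"'.
// The caller must guarantee that 32 bytes are readable at `block`.
inline std::uint32_t QuoteMask32(const char* block) {
  assert(block != nullptr);
#if defined(__AVX2__)
  // Step 1: load 32 bytes into a vector register.
  const __m256i data =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(block));
  // Step 2: compare against 32 copies of '"'.
  const __m256i eq = _mm256_cmpeq_epi8(data, _mm256_set1_epi8('"'));
  // Step 3: reduce the byte mask to one bit per byte.
  return static_cast<std::uint32_t>(_mm256_movemask_epi8(eq));
#else
  // Scalar emulation of steps 1-3 for builds without AVX2.
  std::uint32_t mask = 0;
  for (int i = 0; i < 32; ++i) {
    if (block[i] == '"') mask |= (1u << i);
  }
  return mask;
#endif
}

// Step 4: count trailing zeros (a single instruction on most CPUs;
// written as a portable loop here). Returns 32 for an empty mask.
inline int CountTrailingZeros32(std::uint32_t mask) {
  int n = 0;
  while (n < 32 && ((mask >> n) & 1u) == 0) ++n;
  return n;
}

// Index of the first '"' in the 32-byte block, or 32 if there is none.
inline int FirstQuoteIn32(const char* block) {
  return CountTrailingZeros32(QuoteMask32(block));
}
```

A serializer built this way can copy everything before the reported index in bulk and only fall back to per-character work at the positions the mask marks.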

However, without the masked load instructions of AVX-512, the final 32-byte read at the end of a string may run past the end of the buffer, which can cause problems such as a core dump. sonic-cpp handles this by relying on the fact that Linux allocates memory in units of pages and checking whether the memory to be read spans a page boundary. As long as the read does not cross into another page, reading past the end of the buffer is considered safe. If it does cross a page boundary, the data is processed conservatively. This guarantees correctness while still greatly improving serialization efficiency. See the sonic-cpp implementation for details.
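The page-boundary check can be sketched as below. The function names are hypothetical and the 4096-byte page size is an assumption made for the sketch; real code might query the page size at runtime instead (for example with sysconf(_SC_PAGESIZE) on POSIX systems).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: 4 KiB pages.
constexpr std::uintptr_t kPageSize = 4096;

// True when reading `width` bytes starting at address `addr` would touch
// the following page, in which case a wide load is not known to be safe.
inline bool AddressCrossesPage(std::uintptr_t addr, std::size_t width) {
  assert(width > 0 && width <= kPageSize);
  const std::uintptr_t offset = addr & (kPageSize - 1);  // offset within the page
  return offset + width > kPageSize;
}

// Pointer convenience wrapper around the address-based check.
inline bool LoadCrossesPage(const void* p, std::size_t width) {
  return AddressCrossesPage(reinterpret_cast<std::uintptr_t>(p), width);
}
```

A caller would take the 32-byte vector path when LoadCrossesPage(p, 32) is false and fall back to a byte-wise loop otherwise.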

Deserialization optimization

In JSON deserialization, parsing numeric values is also a very important step that has a major effect on parsing performance. An example is parsing the string “12.456789012345” into the floating point number 12.456789012345. Common implementations parse essentially character by character; see rapidjson's implementation of ParseNumber.

sonic-cpp also uses SIMD instructions to parse floating-point numbers, and the implementation method is shown in the figure below.

As in the vectorized serialization step, the positions of the decimal point and the terminator are obtained with the same vector instructions, and a vector subtraction instruction then subtracts '0' from the original string to obtain the actual digit values.

Once the positions of the decimal point and the terminator are known, together with the 16 raw digit values held in the vector register, they are combined into the final mantissa 12456789012345 and exponent -12.
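As a worked illustration of the result of that combination, the sketch below derives the same mantissa and exponent with a plain scalar loop. It is not the vectorized implementation: the struct and function names are hypothetical, and the input is assumed to consist of digits with at most one decimal point and no more than 19 digits in total.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: value == mantissa * 10^exponent.
struct Decimal {
  std::uint64_t mantissa;  // all digits with the decimal point removed
  int exponent;            // zero or negative: minus the number of fractional digits
};

// Scalar stand-in for the vector steps: locate the decimal point,
// subtract '0' from every digit, and accumulate the digits.
inline Decimal SplitDecimal(const std::string& s) {
  const std::size_t dot = s.find('.');  // position a vector compare would report
  Decimal d{0, 0};
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (i == dot) continue;
    assert(s[i] >= '0' && s[i] <= '9');
    d.mantissa = d.mantissa * 10 + static_cast<std::uint64_t>(s[i] - '0');
  }
  if (dot != std::string::npos) {
    d.exponent = -static_cast<int>(s.size() - dot - 1);
  }
  return d;
}
```

For "12.456789012345" the decimal point is at index 2 and 12 digits follow it, so the value is 12456789012345 × 10^-12.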

Benchmarks on floating-point numbers of different lengths show that parsing performance improves significantly.

However, we found that when the string is short (fewer than 4 characters), vectorization actually performs worse: with so little data, scalar computation is at no real disadvantage, while vectorization requires heavyweight instructions such as multiply-add.

By analyzing the characteristics of the JSON used inside ByteDance, we found a large number of short integers with fewer than 4 digits. We also assume that the integer part of a floating-point number generally has few digits while the fractional part is longer, so we refined the method further. The integer part is read and parsed in a scalar loop, while the fractional part is processed with the vectorized method described above, which achieved very good results. The process is as follows; see the sonic-cpp ParseNumber implementation for details.

On-demand parsing

In some business scenarios, users only need a few target fields from a JSON document, and fully parsing the entire document is unnecessary. For this purpose, sonic-cpp implements a high-performance on-demand parsing interface that parses a target field according to a given JsonPointer (the path representation of the target field within the JSON). Because the JSON is often large, the core operation in on-demand parsing is how to skip unneeded fields.

Traditional implementation

JSON is semi-structured data that often contains nested objects and arrays. There are currently two main ways to implement on-demand parsing: recursive descent and two-stage processing. Recursive descent has to recursively “parse” the entire JSON and skip every unneeded field along the way; it involves too many branches and performs poorly. Two-stage processing marks the positions of all structural tokens in the JSON, such as } and ], in the first stage, and in the second stage skips the unneeded fields linearly using that position information. This approach performs poorly when, for example, the required fields appear near the front of the JSON.

sonic-cpp implementation

sonic-cpp implements high-performance, single-stage on-demand parsing based on SIMD. During on-demand parsing, the core operation is skipping unneeded JSON objects or arrays. sonic-cpp takes advantage of the fact that, in a complete JSON object, the number of opening brackets must equal the number of closing brackets: it uses SIMD to read 64 bytes of JSON at a time and obtain bitmaps of the opening and closing brackets. It then counts the opening and closing brackets within the object and determines where the object ends by comparing the two counts. The specific operation is as follows:
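The bracket-counting idea can be sketched as follows. This is a simplified illustration rather than sonic-cpp's implementation: the names SkipObject and PopCount64 are hypothetical, only { and } are tracked, the input is assumed to be valid JSON, and a scalar loop stands in for the SIMD compares that would produce the two 64-bit bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Number of set bits in x.
inline int PopCount64(std::uint64_t x) {
  int n = 0;
  while (x) { x &= x - 1; ++n; }
  return n;
}

// Returns the index one past the '}' that closes the object starting at
// json[start], or std::string::npos if the object is never closed.
inline std::size_t SkipObject(const std::string& json, std::size_t start) {
  assert(start < json.size() && json[start] == '{');
  bool in_string = false, escaped = false;
  long depth = 0;  // '{' minus '}' seen in earlier 64-byte blocks
  for (std::size_t base = start; base < json.size(); base += 64) {
    const std::size_t remaining = json.size() - base;
    const std::size_t len = remaining < 64 ? remaining : 64;
    std::uint64_t open = 0, close = 0;
    // Scalar stand-in for the SIMD step: build bracket bitmaps,
    // ignoring brackets that appear inside string literals.
    for (std::size_t i = 0; i < len; ++i) {
      const char c = json[base + i];
      if (in_string) {
        if (escaped) escaped = false;
        else if (c == '\\') escaped = true;
        else if (c == '"') in_string = false;
      } else if (c == '"') {
        in_string = true;
      } else if (c == '{') {
        open |= (1ULL << i);
      } else if (c == '}') {
        close |= (1ULL << i);
      }
    }
    // Visit each '}' in order; the object ends at the first one where the
    // count of '{' seen so far equals the count of '}' seen so far.
    int closes_seen = 0;
    for (std::uint64_t rest = close; rest != 0; rest &= rest - 1) {
      const std::uint64_t lowest = rest & (~rest + 1);  // lowest set bit
      const std::uint64_t below = lowest - 1;           // bits before this '}'
      ++closes_seen;
      if (depth + PopCount64(open & below) == closes_seen) {
        return base + static_cast<std::size_t>(PopCount64(below)) + 1;
      }
    }
    depth += PopCount64(open) - PopCount64(close);
  }
  return std::string::npos;
}
```

Only the bitmaps and a few population counts are needed per block, so no per-field parsing happens while skipping.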

Across all tested scenarios, sonic-cpp's on-demand parsing is significantly faster than existing implementations. The performance test results are shown in the figure below. rapidjson-sax is built on rapidjson's SAX interface and uses recursive descent for on-demand parsing, while simdjson's on-demand parsing is based on two-stage processing. Normal, Fronter, and NotFound indicate that the target field is in the middle of the JSON, near the front, or absent, respectively. Note that when using the on-demand parsing of sonic-cpp or simdjson, the input JSON must be correct and valid.

Extending on-demand parsing

sonic-cpp uses SIMD lookahead for efficient on-demand parsing. Inside ByteDance, this technique is also applied to merging two JSON documents. Merging usually requires parsing both documents first and then serializing the result after the merge. However, if only a few fields in the two documents need to be merged, the on-demand parsing idea applies: first parse the value of each field into raw JSON form, and then perform the merge. This greatly reduces the parsing and serialization overhead of a JSON merge.

DOM design optimization


In sonic-cpp, the class that represents a JSON value is called a node. A node uses a common technique of packing the type and size information into a single 8-byte field to reduce memory usage. Each node needs only 16 bytes in memory, which makes the layout more compact. The specific structure is as follows:
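A 16-byte node of this kind might look like the sketch below. The field names, the Kind enumeration and the exact bit split (low 8 bits for the type, remaining 56 bits for the size) are assumptions made for illustration and are not taken from the sonic-cpp source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical set of JSON value kinds.
enum class Kind : std::uint8_t { kNull, kBool, kNumber, kString, kArray, kObject };

struct Node {
  // Bits 0-7: kind. Bits 8-63: string length or number of children.
  std::uint64_t meta;
  // The second 8 bytes hold the value itself or a pointer to it.
  union {
    double f64;
    std::int64_t i64;
    const char* str;
    Node* children;
  } payload;
};
static_assert(sizeof(Node) == 16, "one node should occupy 16 bytes");

// Packs kind and length into the 8-byte meta field.
inline Node MakeNode(Kind kind, std::uint64_t length) {
  assert(length < (1ULL << 56));  // must fit in the upper 56 bits
  Node n;
  n.meta = (length << 8) | static_cast<std::uint64_t>(kind);
  n.payload.i64 = 0;
  return n;
}

inline Kind KindOf(const Node& n) { return static_cast<Kind>(n.meta & 0xFF); }
inline std::uint64_t LengthOf(const Node& n) { return n.meta >> 8; }
```

Packing both fields into one word means a type check and a length read touch the same 8 bytes, and four nodes fit in a typical 64-byte cache line.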

DOM tree design

The DOM data structure of sonic-cpp follows an approach similar to rapidjson's, which allows every node, including arrays and objects, to be added, deleted, queried, and modified.

In its DOM design, sonic-cpp stores the members of an object or array in an array so that they are contiguous in memory. This array layout makes random access to array members more efficient. For objects, sonic-cpp keeps a map in the metadata that stores the index corresponding to each key and value. With this map, the complexity of a lookup drops from O(N) to O(log N). sonic-cpp has made certain optimizations for this map.
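The combination of a contiguous member array and a key-to-index map can be sketched as below. This is a minimal illustration with hypothetical names, using std::vector and std::map from the standard library and plain int values; it does not reflect sonic-cpp's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object: members stay contiguous in insertion order, and a
// sorted map from key to member index gives O(log N) lookup.
class ObjectSketch {
 public:
  void Add(const std::string& key, int value) {
    index_.emplace(key, members_.size());  // first occurrence wins on duplicates
    members_.emplace_back(key, value);
  }

  // Returns a pointer to the value, or nullptr if the key is absent.
  // The pointer is invalidated by later calls to Add.
  const int* Find(const std::string& key) const {
    const auto it = index_.find(key);
    if (it == index_.end()) return nullptr;
    assert(it->second < members_.size());
    return &members_[it->second].second;
  }

  // Positional access into the contiguous member array.
  const std::pair<std::string, int>& At(std::size_t i) const {
    return members_.at(i);
  }

 private:
  std::vector<std::pair<std::string, int>> members_;
  std::map<std::string, std::size_t> index_;
};
```

Iteration and serialization walk the vector in order, while keyed lookups go through the map, so neither access pattern pays for the other.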

Memory pool

The memory allocator provided by sonic-cpp uses a memory pool for allocation by default. This allocator comes from rapidjson. Using a memory pool has the following advantages:

It avoids frequent malloc calls. A DOM node is only 16 bytes, and a memory pool can allocate memory for such small data structures efficiently.
It avoids destroying each DOM node individually. When the DOM tree is destroyed, the allocator's memory only needs to be released once, all together.

The built-in map of an object also allocates from the memory pool, so that all memory is allocated and released in a uniform way.
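The two advantages listed above can be illustrated with a minimal bump-pointer arena. This is a sketch under assumptions, not the rapidjson allocator that sonic-cpp uses: the class name Arena and the default chunk size are hypothetical, and objects placed in the arena are assumed not to need destructors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical arena: hands out slices of large chunks and frees every
// chunk at once when the arena itself is destroyed.
class Arena {
 public:
  explicit Arena(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}

  void* Allocate(std::size_t bytes) {
    assert(bytes > 0);
    // Round up so that every returned pointer keeps maximal alignment.
    const std::size_t a = alignof(std::max_align_t);
    bytes = (bytes + a - 1) / a * a;
    if (chunks_.empty() || used_ + bytes > capacity_) {
      // Start a new chunk; oversized requests get a chunk of their own.
      capacity_ = bytes > chunk_size_ ? bytes : chunk_size_;
      chunks_.push_back(std::make_unique<unsigned char[]>(capacity_));
      used_ = 0;
    }
    void* p = chunks_.back().get() + used_;
    used_ += bytes;  // bump the offset; no per-allocation bookkeeping
    return p;
  }

  std::size_t chunk_count() const { return chunks_.size(); }

 private:
  std::size_t chunk_size_;
  std::size_t capacity_ = 0;  // size of the current chunk
  std::size_t used_ = 0;      // bytes already handed out from it
  std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

With a 4096-byte chunk, 256 sixteen-byte nodes are served by a single underlying allocation, and destroying the Arena releases all of them without visiting any node.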

While supporting efficient add, delete, modify, and query operations, sonic-cpp delivers performance comparable to simdjson and yyjson.

Performance comparison of different JSON libraries

The benchmarks are based on https://github.com/miloyip/nativejson-benchmark, extended with support for sonic-cpp and yyjson.

Deserialization (Parse) performance benchmark results

Serialization (Stringify) performance benchmark results

Performance comparison of different scenarios

Performance comparison between sonic-cpp and rapidjson, simdjson and yyjson in different scenarios (HIB: Higher is better).

Performance comparison in production environment

The performance advantages of sonic-cpp have also been well verified in production. The following compares the peak CPU usage of a ByteDance Douyin service before and after adopting sonic-cpp.

sonic-cpp currently supports only the amd64 architecture and will gradually be extended to other architectures such as ARM. We will also actively add support for features from JSON-related RFCs, for example the community's JSON merge RFC 7386, and a JSON Path designed according to RFC 8259 for more convenient JSON access.

Developers are welcome to join in and contribute PRs, so that together we can build a better C/C++ JSON library for the industry!

To help you better understand sonic-cpp, we will share its technical principles, practical results and future plans live in “Nuggets Open Class 18” at 19:30 on December 15, 2022. Join the live broadcast interaction for a chance to win merchandise gifts. Everyone is welcome to follow along and scan the QR code below to reserve a spot for the live broadcast.

