Author: Lang Yuanhui (JD Logistics)
Background
As the control system of the delivery-time domain, the Promise time-effective order control system serves requests both before and after a user places an order, making it a critical node on the golden link of order placement. Its core logic is: for each user request, find the optimal rule in the rule base that satisfies the request's conditions, and return that rule's delivery-time control result to the caller. For example, during a temporary epidemic outbreak, fine-grained delivery-time control can be applied across dimensions such as warehouse, carrier, merchant, and the customer's four-level address.
This is also the system with the highest concurrency on the Promise side: peak cluster traffic during Double 11 reaches millions of TPS, so the performance requirements are extremely demanding, with an SLA of 5 ms. How to match rules quickly and correctly against a rule base of hundreds of thousands of entries, under this massive request volume, is the system's core technical challenge.
Simple solution
The straightforward engineering approach caches the rule base into Redis row by row, asynchronously: the Key is the rule's condition and the Value is the rule's result. When a user request Request(a, b, c, d, ...) arrives, generate every combination of its parameters, look up all possibly-hit rules using those combined Keys, and then select the optimal rule among them, as shown below.
The problem with this scheme is that the full combination has time complexity 2^n, with n ≈ 12. The complexity is high and the algorithm's stability is poor: in the worst case, a single request requires 4096 computation and read operations. A local cache can provide some engineering optimization, but it does not solve the fundamental performance problem. The schematic diagram of the architecture is as follows:
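To make the combinatorial blow-up concrete, here is a minimal sketch of the key-enumeration step of the naive scheme. The dimension names and the key format are hypothetical, purely for illustration; the point is that the number of cache lookups grows as 2^n in the number of request dimensions.

```python
from itertools import combinations

def all_key_combinations(request: dict) -> list:
    """Enumerate every non-empty subset of the request's dimensions, most
    specific first, producing the 2^n - 1 candidate cache keys to probe."""
    dims = sorted(request)
    keys = []
    for r in range(len(dims), 0, -1):  # larger subsets = more specific rules
        for subset in combinations(dims, r):
            keys.append("|".join(f"{d}={request[d]}" for d in subset))
    return keys

# Even a 4-dimension request yields 2^4 - 1 = 15 candidate keys;
# with n ≈ 12 dimensions that becomes 4095 lookups in the worst case.
req = {"warehouse": "W1", "carrier": "C2", "merchant": "M3", "addr": "A4"}
keys = all_key_combinations(req)
print(len(keys))  # 15
```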
New solution
The scheme above approaches matching from the perspective of rows; yet for a row to hit, every one of its columns must also satisfy the conditions, which hints at an inner connection worth exploiting. Can we think about the problem in reverse, starting from the columns? We therefore tried a new solution. The overall architecture remains as in the figure above; the core optimization is the matching algorithm.
The new scheme as a whole adopts column inverted indexes plus bitwise operations over those inverted indexes, reducing the computational complexity from the original 2^n to n while guaranteeing very stable algorithm behavior. A column inverted index maps each column value to the set of row IDs containing it (the posting list); the bitwise operations then combine the qualifying columns' inverted indexes, so that a combined query quickly locates the qualifying rule rows.
Detailed algorithm design
1. Precompute each column's inverted index
Group and merge the values of each column to generate posting lists, establishing a KV relationship between each column value and its posting list. Taking the figure below as an example, the inverted index generated for column A is 301={1}, 201={2,3,4,5}, and so on. Note that null values are also candidates and need their own KV entry, e.g. nil={7}.
2. Generate the bitmap corresponding to the inverted index of the column
Convert the inverted indexes from step 1 into bitmaps to enable the subsequent bitmap computations. The conversion rule: each row ID becomes the corresponding bit position in the bitmap. (Steps 1 and 2 can be combined.)
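Steps 1 and 2 combined can be sketched as follows. The rule table below is a toy stand-in for the article's example (row values are invented except for A: 301={1}, 201={2,3,4,5}, nil={7}); Python's arbitrary-precision integers play the role of bitmaps, with bit i set meaning "row i matches".

```python
from collections import defaultdict

# Hypothetical rule table: row ID -> value per column (None models a null).
rules = {
    1: {"A": 301,  "B": 1101},
    2: {"A": 201,  "B": 1101},
    3: {"A": 201,  "B": None},
    4: {"A": 201,  "B": 1102},
    5: {"A": 201,  "B": 1101},
    6: {"A": 101,  "B": 1101},
    7: {"A": None, "B": None},
}

def build_bitmap_index(rules):
    """Group row IDs by (column, value) into posting lists, encoded directly
    as bitmaps: bit i is set iff row i carries that value in that column."""
    index = defaultdict(lambda: defaultdict(int))
    for row_id, cols in rules.items():
        for col, val in cols.items():
            index[col][val] |= 1 << row_id  # set the bit for this row
    return index

index = build_bitmap_index(rules)
print(bin(index["A"][201]))   # rows {2,3,4,5} -> 0b111100
print(bin(index["A"][None]))  # row {7}        -> 0b10000000
```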
3. Look up the column bitmaps for the user's request and generate the candidate rule set via bitmap operations
Using each input parameter of the user request as a Key, look up the qualifying bitmaps; within each column, perform the OR (||) operation between the value's bitmap and the null bitmap, then perform the AND (&) operation across the per-column results. The outcome is the candidate rule set, as shown in the figure below:
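The per-column OR followed by the cross-column AND can be sketched like this. The inline bitmaps are invented for illustration (A=201 hits rows 2..5, A=nil hits row 7, and assumed values for column B); ORing in the null bitmap is what lets a rule with no constraint on a column still match.

```python
# Toy inverted-index bitmaps (bit i set <=> rule row i matches that value).
index = {
    "A": {201: 0b00111100, None: 0b10000000},   # A=201 -> rows 2..5, A=nil -> row 7
    "B": {1101: 0b01100110, None: 0b10001000},  # assumed values for column B
}

def candidate_rules(index, request):
    """OR each column's value bitmap with its null bitmap, then AND across
    columns; the surviving bits are the candidate rule rows."""
    result = -1  # Python's -1 has all bits set, acting as an "all ones" bitmap
    for col, value_bitmaps in index.items():
        wanted = request.get(col)
        col_bits = value_bitmaps.get(wanted, 0) | value_bitmaps.get(None, 0)
        result &= col_bits
    return result

cands = candidate_rules(index, {"A": 201, "B": 1101})
print(bin(cands))  # 0b10101100 -> candidate rows {2, 3, 5, 7}
```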
4. Sort the candidate rules by business priority and find the optimal rule
Starting from the candidate set, sort by business priority and apply the AND (&) operation level by level; when the traversal completes or the AND result becomes 0, the optimal rule is the last non-empty result. This process progressively narrows the candidate set down to the optimum. Note that when no bitmap exists for a requested column value, the column's null bitmap must participate instead. Taking column B as an example: if the input parameter B_1102 does not exist, B_nil participates in the & operation.
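The level-by-level narrowing can be sketched as below. The candidate bitmap and the priority-ordered bitmaps are invented values for illustration; the key detail is keeping the last non-zero intermediate result when an AND step would eliminate everything.

```python
def pick_optimal(candidates: int, priority_bitmaps: list) -> int:
    """AND the candidate set with each priority-ordered bitmap in turn;
    the last non-empty intermediate result is the optimal rule set."""
    best = candidates
    for bm in priority_bitmaps:
        narrowed = best & bm
        if narrowed == 0:   # this level eliminates everything:
            break           # keep the last non-empty result instead
        best = narrowed
    return best

candidates = 0b10101100                           # rows {2, 3, 5, 7}
priorities = [0b00101100, 0b00100100, 0b01000000]  # highest priority first
print(bin(pick_optimal(candidates, priorities)))   # 0b100100
```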
Complexity Analysis
From the example above, in terms of time complexity: finding the candidate rule set takes one round of || operations and one round of & operations, and finding the optimal rule takes another round of & operations, so the overall complexity is 3n ≈ n.
In terms of space complexity, compared with the original row-based storage, the inverted-index layout must store the row IDs once per column, roughly (n-1) extra copies of posting-list storage. This is only a rough estimate, though: the row IDs are ultimately stored as bitmaps, which leave very large room for compression.
Engineering concerns – Compressing the bitmaps
If the inverted-index bitmaps are very sparse, the system wastes a great deal of space. Consider an extreme case: in a rule base of tens of millions of rows, if the only hit is row ID 10,000,000, a traditional BitSet needs about 1.2 MB to store that single bit, a serious waste of memory. Is there a compression scheme? With the RoaringBitmap compressed-bitmap solution, we found the same scenario occupies only 144 bytes; and storing 10,000 random values in a 10-million-bit space compares as 31 KB vs. 2 MB, nearly a 100x difference. In general, RoaringBitmap's compression ratio is very high.
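The dense-storage cost is easy to verify with back-of-envelope arithmetic: a flat BitSet must allocate bytes all the way up to the highest set bit, even when only one bit is set. The sparse estimate below is a simplification (one 16-bit chunk key plus one 16-bit value, ignoring per-container overhead), not RoaringBitmap's exact accounting.

```python
# Dense bitmap: bytes must cover every position up to the highest set bit.
highest_bit = 10_000_000
dense_bytes = highest_bit // 8 + 1
print(f"{dense_bytes / 1024 / 1024:.2f} MB")  # ~1.19 MB for a single set bit

# Roaring-style sparse layout stores only chunks that contain data:
# here, one 16-bit chunk key + one 16-bit value (overhead ignored).
sparse_bytes_estimate = 2 + 2
print(f"{sparse_bytes_estimate} bytes")
```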
RoaringBitmap essentially splits a large bitmap into small blocks, and a block exists only when it has data to store. When performing intersections or unions, RoaringBitmap therefore only needs to compute over the existing blocks rather than over the entire bitmap, achieving compressed storage and improved computing performance at the same time.
The figure below takes 821697800 as an example. Its hexadecimal representation is 0x30FA1D08, whose upper 16 bits are 0x30FA and lower 16 bits are 0x1D08. First, binary-search the first-level index (the Container Array) for the container keyed 0x30FA, which here is a Bitmap container; then locate the lower 16-bit value 0x1D08 (7432 in decimal) inside that container and set the corresponding bit to 1.
Applicable scenarios
Reviewing the design above, this method is only suitable when the posting list is as simple as a row ID; complex objects are not suitable for bitmap storage. It also only applies to exact-match (equality) queries, not to range queries such as like or in. Why this limitation? Because the method depends on the search-condition space: we use the column value as the search key, and that value space should be as bounded and small as possible. Range queries make the space nearly impossible to exhaust, expanding it almost without limit, so they are not applicable.
Other optimization methods
Besides accelerating the inverted index with bit operations, the ordered nature of posting lists allows other approaches such as skip lists and hash tables. Taking the skip list used in ES as an example, the & operation is really finding the common part of two ordered posting lists, and mutual binary search keeps the time complexity at the log(n) level.
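The idea can be sketched with a bisect-based intersection of two sorted posting lists; this is an illustration of the "seek to the next element >= target" primitive, not ES's actual skip-list implementation.

```python
from bisect import bisect_left

def intersect_sorted(a: list, b: list) -> list:
    """Intersect two sorted posting lists by binary-searching each element
    of the shorter list in the longer one (log-level cost per probe)."""
    if len(a) > len(b):
        a, b = b, a  # probe from the shorter list
    out = []
    for doc in a:
        i = bisect_left(b, doc)  # seek the first element >= doc
        if i < len(b) and b[i] == doc:
            out.append(doc)
    return out

print(intersect_sorted([1, 3, 5, 7, 9], [3, 4, 5, 9, 12]))  # [3, 5, 9]
```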
For details, see: "How does the industry use skip lists, hash tables, and bitmaps to accelerate inverted indexes?"