All About Priority Inversion - ByteDance Terminal Technology

Starting with a production issue

Recently we ran into a number of production hangs in [HMDConfigManager remoteConfigWithAppID:].

Initial analysis

The main thread's stack shows that the lock involved is a read-write lock. We then examined the child threads holding the lock. Their states varied, but all of them were basically executing normally: some were opening files, some were in read, some were executing NSUserDefaults methods, and so on. Every problematic thread carried the QOS:BACKGROUND mark. Overall, the child thread holding the lock was still running, but it did not finish in time for the main thread. Why would these child threads hold the lock long enough for the main thread to hang for 8 s? Either the work really is that time-consuming, or a priority inversion is occurring.

Solution

In this case, the low-priority thread holding the read-write lock cannot be scheduled for a long time. It may be preempted as soon as it is scheduled, or its time slice runs out too quickly. Meanwhile, the high-priority thread stays blocked because it cannot acquire the read-write lock, so the two threads block each other in a way that looks like a deadlock. iOS 8 introduced QualityOfService, a concept similar to thread priority. Depending on the QualityOfService value, the system allocates different amounts of CPU time, network resources, disk resources, and so on. We can therefore use it to set the priority of a queue.

Option 1: Remove the priority settings on `NSOperationQueue`

In the Threading Programming Guide, Apple gives the following hint:

Important: It is generally a good idea to leave the priorities of your threads at their default values. Increasing the priorities of some threads also increases the likelihood of starvation among lower-priority threads. If your application contains high-priority and low-priority threads that must interact with each other, the starvation of lower-priority threads may block other threads and create performance bottlenecks.

Apple advises against changing thread priorities arbitrarily, especially when high- and low-priority threads compete for a critical resource. Deleting the relevant priority-setting code therefore solves the problem.

Option 2: Temporarily raise the thread priority

The man page pthread_rwlock_rdlock(3pthread) contains the following hint:

Realtime applications may encounter priority inversion when using read-write locks. The problem occurs when a high priority thread “locks” a read-write lock that is about to be “unlocked” by a low priority thread, but the low priority thread is preempted by a medium priority thread. This scenario leads to priority inversion; a high priority thread is blocked by lower priority threads for an unlimited period of time. During system design, realtime programmers must take into account the possibility of this kind of priority inversion. They can deal with it in a number of ways, such as by having critical sections that are guarded by read-write locks execute at a high priority, so that a thread cannot be preempted while executing in its critical section.

Although this advice targets real-time systems, it still offers useful hints. Following it, we modified the problematic code. When a thread acquires _rwlock through pthread_rwlock_wrlock, its priority is temporarily raised. After it releases _rwlock, its original priority is restored.


- (id)remoteConfigWithAppID:(NSString *)appID
{
    .......
    pthread_rwlock_rdlock(&_rwlock);
    HMDHeimdallrConfig *result = ....... // get existing config
    pthread_rwlock_unlock(&_rwlock);
    
    if(result == nil) {
        result = [[HMDHeimdallrConfig alloc] init]; // make a new config
        pthread_rwlock_wrlock(&_rwlock);
        
        qos_class_t oldQos = qos_class_self();
        BOOL needRecover = NO;
        
        // Temporarily raise the thread's priority
        if (_enablePriorityInversionProtection && oldQos < QOS_CLASS_USER_INTERACTIVE) {
            int ret = pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
            needRecover = (ret == 0);
        }
            
        ......

        pthread_rwlock_unlock(&_rwlock);
        
        // Restore the thread's original priority
        if (_enablePriorityInversionProtection && needRecover) {
            pthread_set_qos_class_self_np(oldQos, 0);
        }
    }
    
    return result;
}

Note that only the pthread API works here; the API provided by NSThread cannot achieve this.

Demo Verification

To check whether manually adjusting thread priority has a measurable effect, we ran a local experiment with a demo. It defines 2000 operations, whose purpose is to keep the CPU busy, with priority NSQualityOfServiceUserInitiated. The operations whose index is divisible by 100 are set to NSQualityOfServiceBackground instead. Every operation performs the same time-consuming task, and we measure the time taken by the 10 selected operations.


for (int j = 0; j < 2000; ++j) {
    // Each task runs on its own serial NSOperationQueue
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    queue.maxConcurrentOperationCount = 1;
    queue.qualityOfService = NSQualityOfServiceUserInitiated;
    
    // Module 1: demote every 100th queue to background priority
    // if (j % 100 == 0) {
    //    queue.qualityOfService = NSQualityOfServiceBackground;
    // }
    // Module 1
    
    [queue addOperationWithBlock:^{
        // Module 2: temporarily raise the current thread to user-initiated
        // qos_class_t oldQos = qos_class_self();
        // pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
        // Module 2
        
        CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
        double sum = 0;
        for (int i = 0; i < 100000; ++i) {
            sum += sin(i) + cos(i) + sin(i*2) + cos(i*2);
        }
        CFAbsoluteTime elapsed = CFAbsoluteTimeGetCurrent() - start;
        if (j % 100 == 0) {
            printf("%.8f\n", elapsed * 1000); // elapsed time in milliseconds
        }
        
        // Module 2: restore the original QoS
        // pthread_set_qos_class_self_np(oldQos, 0);
        // Module 2
    }];
}

The results are shown below (average time of the 10 sampled tasks, in milliseconds):

Configuration	A (modules 1 and 2 commented out)	B (only module 1 enabled)	C (modules 1 and 2 both enabled)
Average time (ms)	11.8190561	94.70210189	15.04005137

We can see that:

Under normal circumstances, the average time per task is 11.8190561 ms;
When the operation is set to low priority, its time increases sharply to 94.70210189 ms;
When the operation is set to low priority but its priority is manually raised inside the block, its time drops dramatically to 15.04005137 ms. This is still higher than normal; think about why.

The demo shows that manually adjusting the priority greatly reduces the overall time of low-priority tasks. This in turn shortens how long the main thread is blocked while such a task holds the lock.

Effect in production

Verification of this fix was carried out in 2 stages:

The first stage is shown in the first red box: starting on March 6 with version 19.7, there was a relatively large drop. The main reason is that the queue being waited on in the stack changed from QOS:BACKGROUND to com.apple.root.default-qos. In other words, the queue's priority was raised from QOS_CLASS_BACKGROUND to QOS_CLASS_DEFAULT, which is equivalent to implementing Option 1 (using the default priority).
The second stage is shown in the second red box: verification started on April 24 with version 20.3, and so far the effect is not obvious. We suspect one main reason is a difference in starting points. The demo raises the priority from QOS_CLASS_BACKGROUND to QOS_CLASS_USER_INITIATED, whereas in production the queue's priority is raised from the default QOS_CLASS_DEFAULT to QOS_CLASS_USER_INITIATED. The improvement in production is therefore comparatively limited:
1. The Mach-level priority of QOS_CLASS_BACKGROUND is 4;
2. The Mach-level priority of QOS_CLASS_DEFAULT is 31;
3. The Mach-level priority of QOS_CLASS_USER_INITIATED is 37.
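The arithmetic behind this explanation can be made explicit with a minimal C sketch. The qos_level enum is a hypothetical simplification, not Apple's qos_class_t. The values 4, 31 and 37 come from the list above, and 47 is the main-thread priority observed in the experiments later in this article.

```c
/* Hypothetical, simplified QoS levels (not Apple's qos_class_t values). */
typedef enum {
    QOS_BACKGROUND,
    QOS_DEFAULT,
    QOS_USER_INITIATED,
    QOS_USER_INTERACTIVE
} qos_level;

/* Mach base priorities: 4, 31 and 37 are taken from the list above;
   47 is the main-thread priority printed by the experiments in this article. */
int mach_base_priority(qos_level qos) {
    switch (qos) {
    case QOS_BACKGROUND:       return 4;
    case QOS_DEFAULT:          return 31;
    case QOS_USER_INITIATED:   return 37;
    case QOS_USER_INTERACTIVE: return 47;
    }
    return -1; /* not reached for valid enum values */
}

/* Number of Mach priority levels gained by moving a thread from `from` to `to`. */
int priority_gain(qos_level from, qos_level to) {
    return mach_base_priority(to) - mach_base_priority(from);
}
```

With these numbers, the demo's boost (background to user-initiated) gains 33 priority levels, while the production boost (default to user-initiated) gains only 6. That is consistent with the smaller improvement seen in production.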

Understanding priority inversion in depth

So does every lock require manually raising the priority of the thread holding it, as above? Does the system adjust thread priorities automatically? If such a mechanism exists, does it cover all locks? Answering these questions requires a deeper understanding of priority inversion.

What is priority inversion?

Priority inversion occurs when a lower-priority process/thread holds a synchronization resource. A higher-priority process/thread that competes for the same resource fails to obtain it, so its scheduling is delayed. Depending on the type of blocking, priority inversion is divided into bounded priority inversion and unbounded priority inversion. The diagrams here are borrowed from Introduction to RTOS – Solution to Part 11 (Priority Inversion).

Bounded priority inversion

As shown, the high-priority task (Task H) is blocked by the low-priority task (Task L), which holds the lock. The blocking time depends on how long the low-priority task stays in its critical section (how long it holds the lock), so this is called bounded priority inversion. As long as Task L holds the lock, Task H stays blocked: the low-priority task runs ahead of the high-priority one, and their priorities are effectively inverted.

The tasks here can also be understood as threads.

Unbounded priority inversion

Suppose a medium-priority task (Task M) interrupts Task L while Task L holds the lock. The bounded inversion above then becomes unbounded. As long as Task M keeps taking the CPU away from Task L, it can block Task H for an arbitrary amount of time, and there may be more than one Task M.

Conventional solutions to priority inversion

There are currently 2 ways to solve unbounded priority inversion: the priority ceiling protocol and priority inheritance.

Priority ceiling protocol

In the priority ceiling scheme, the system associates each critical resource with a ceiling priority. When a task enters the critical section, the system gives it the ceiling priority, so it runs at least as high as any task that uses the resource. When the task exits the critical section, the system immediately restores its normal priority. This ensures that no priority inversion occurs. The ceiling priority equals the highest priority among all tasks that need the critical resource.

As shown, the lock's ceiling priority is 3. While Task L holds the lock, its priority is raised to 3, the same as Task H. This prevents Task M (priority 2) from running until neither Task L nor Task H needs the lock anymore.
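The ceiling rule can be sketched as a toy model in C. Everything here (the ceiling_lock type and the helper functions) is hypothetical and only mirrors the description above, with larger numbers meaning higher priority. It is not an OS API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the priority ceiling protocol (illustrative only). */
typedef struct {
    int ceiling;   /* highest priority of any task that may use the resource */
} ceiling_lock;

/* The ceiling is the maximum priority among all tasks that need the resource. */
ceiling_lock make_ceiling_lock(const int *user_priorities, size_t count) {
    assert(count > 0);
    ceiling_lock lock = { user_priorities[0] };
    for (size_t i = 1; i < count; i++) {
        if (user_priorities[i] > lock.ceiling) {
            lock.ceiling = user_priorities[i];
        }
    }
    return lock;
}

/* Priority a task runs at while it is inside the critical section. */
int priority_inside(const ceiling_lock *lock, int own_priority) {
    return own_priority > lock->ceiling ? own_priority : lock->ceiling;
}

/* Another task can preempt the holder only with a strictly higher priority. */
int can_preempt(int other_priority, int holder_priority) {
    return other_priority > holder_priority;
}
```

With Task L (1) and Task H (3) as the lock's users, the ceiling is 3. Task L runs at 3 inside the critical section, so Task M (2) cannot preempt it, although it could preempt Task L at its normal priority of 1.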

Priority inheritance

The general principle of priority inheritance is as follows. When a high-priority task tries to acquire a lock held by a low-priority task, the high-priority task's priority is temporarily transferred to the lock owner. The low-priority thread can then run faster and release the synchronization resource. After releasing it, the thread reverts to its original priority.

With both the priority ceiling protocol and priority inheritance, the low-priority task's priority is restored when the lock is released. Note that these 2 methods only prevent unbounded priority inversion, not bounded priority inversion. Task H still has to wait for Task L to finish its critical section before it can run, and this inversion is unavoidable.
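The difference between the two kinds of inversion, and why inheritance removes only the unbounded part, can be illustrated with a toy simulation. The model is an assumption made for illustration and does not reflect how XNU's scheduler works:

- there is a single CPU;
- Task H is already blocked on the lock held by Task L;
- each tick, the highest-priority runnable task runs for one tick.

```c
#include <stdbool.h>

/* Hypothetical priorities for the three tasks; larger means higher. */
enum { PRIO_L = 1, PRIO_M = 2, PRIO_H = 3 };

/* Returns how many ticks Task H stays blocked before Task L releases the lock.
   l_critical_ticks: CPU time L still needs inside its critical section.
   m_work_ticks: CPU time the medium-priority Task M wants.
   inherit: whether L inherits H's priority while H waits on it. */
int ticks_h_blocked(int l_critical_ticks, int m_work_ticks, bool inherit) {
    int l_left = l_critical_ticks;
    int m_left = m_work_ticks;
    int waited = 0;
    while (l_left > 0) {
        int l_prio = inherit ? PRIO_H : PRIO_L;
        if (m_left > 0 && PRIO_M > l_prio) {
            m_left--;   /* M preempts L, so H's wait grows with M's workload */
        } else {
            l_left--;   /* L makes progress inside its critical section */
        }
        waited++;
    }
    return waited;
}
```

Without inheritance, Task H's wait grows with Task M's workload: 3 + 100 ticks when Task M has 100 ticks of work. With inheritance, the wait is 3 ticks regardless of Task M. Those 3 remaining ticks are the bounded inversion that neither protocol can remove.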

Bounded priority inversion can be avoided or mitigated in the following ways:

Shorten the critical section to reduce the duration of bounded priority inversion;
Avoid using critical-section resources that can block high-priority tasks;
Use a dedicated queue to manage the resource and avoid locks altogether.
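Below is a minimal sketch of the first point, in the spirit of the remoteConfigWithAppID: code earlier. The expensive construction runs outside the lock, and the lock only guards the short read and publish steps. The names app_config, make_app_config and get_app_config are hypothetical. A pthread mutex is used because, as discussed later in this article, it gets priority inheritance on Apple platforms.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical config object standing in for HMDHeimdallrConfig. */
typedef struct {
    int value;
} app_config;

static pthread_mutex_t g_config_lock = PTHREAD_MUTEX_INITIALIZER;
static app_config *g_config = NULL;   /* guarded by g_config_lock */

/* Hypothetical stand-in for expensive construction work. */
static app_config *make_app_config(void) {
    app_config *c = malloc(sizeof *c);
    if (c != NULL) c->value = 42;
    return c;
}

app_config *get_app_config(void) {
    /* Short critical section #1: read the shared pointer. */
    pthread_mutex_lock(&g_config_lock);
    app_config *existing = g_config;
    pthread_mutex_unlock(&g_config_lock);
    if (existing != NULL) return existing;

    /* Expensive work happens with no lock held. */
    app_config *fresh = make_app_config();
    if (fresh == NULL) return NULL;

    /* Short critical section #2: publish only if no other thread won the race. */
    pthread_mutex_lock(&g_config_lock);
    if (g_config == NULL) {
        g_config = fresh;
        fresh = NULL;
    }
    existing = g_config;
    pthread_mutex_unlock(&g_config_lock);

    free(fresh);   /* frees the losing copy; free(NULL) is a no-op */
    return existing;
}
```

If two threads race, both may build a config, but only the first one is published and the other frees its copy. That extra work is the price of keeping the critical section short.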

Priority inheritance must be transitive. For example, suppose T1 is blocked on a resource held by T2, and T2 is in turn blocked on a resource held by T3. If T1 has a higher priority than both T2 and T3, T3 must inherit T1's priority through T2. Otherwise, another thread T4 whose priority is higher than T2 and T3 but lower than T1 could preempt T3, causing a priority inversion relative to T1. Therefore, the priority a thread inherits must be the highest priority among all threads directly or indirectly blocked on it.
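The transitive rule can be expressed as a short C function over a hypothetical model. base[i] is thread i's own priority, and waits_on[i] is the thread that owns the resource thread i is blocked on. This illustrates the rule only; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OWNER (-1)

/* Writes each thread's effective priority into eff[]: its own priority, raised to the
   highest priority of any thread directly or indirectly blocked on it. */
void inherit_priorities(const int *base, const int *waits_on, int *eff, size_t n) {
    for (size_t i = 0; i < n; i++) eff[i] = base[i];
    for (size_t i = 0; i < n; i++) {
        /* Push thread i's priority down the whole blocking chain, not just one hop. */
        int owner = waits_on[i];
        for (size_t hops = 0; owner != NO_OWNER && hops < n; hops++) {  /* hops < n stops deadlock cycles */
            assert(owner >= 0 && (size_t)owner < n);
            if (eff[owner] < base[i]) eff[owner] = base[i];
            owner = waits_on[owner];
        }
    }
}
```

Take T1 to T4 at priorities 3, 1, 1 and 2, with T1 waiting on T2 and T2 waiting on T3. T3 ends up at effective priority 3, so T4 (priority 2) can no longer preempt it.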

How to avoid priority inversion?

QoS propagation

iOS mainly uses the following two mechanisms to propagate QoS between different threads (or queues):

Mechanism 1: dispatch_async
- dispatch_async() Automatically propagates the QoS from the calling thread, though it will translate User Interactive to User Initiated to avoid assigning that priority to non-main threads.
- Captured at time of block submission, translate user interactive to user initiated. Used if destination queue does not have a QoS and does not lower the QoS (ex dispatch_async back to the main thread)
Mechanism 2: XPC-based interprocess communication (IPC)
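The two dispatch_async rules quoted under Mechanism 1 can be written down as a simplified C model. The enum is hypothetical. The model deliberately ignores overrides from synchronous waiters (covered below) and block flags such as DISPATCH_BLOCK_ENFORCE_QOS_CLASS. Treat it as a reading aid rather than libdispatch's actual logic.

```c
/* Hypothetical, simplified QoS classes ordered from lowest to highest. */
typedef enum {
    QOS_UNSPECIFIED,
    QOS_BACKGROUND,
    QOS_UTILITY,
    QOS_DEFAULT,
    QOS_USER_INITIATED,
    QOS_USER_INTERACTIVE
} qos_class;

/* Rule 1: the caller's QoS is captured at submission time, and user-interactive
   is translated to user-initiated so non-main threads never get UI priority. */
qos_class captured_qos(qos_class caller) {
    return caller == QOS_USER_INTERACTIVE ? QOS_USER_INITIATED : caller;
}

/* Rule 2 (simplified): the captured QoS is used only when the destination queue has
   no QoS of its own; otherwise the queue's QoS applies, so dispatching back to the
   main queue (user-interactive) does not lower it. */
qos_class effective_qos(qos_class caller, qos_class queue) {
    return queue == QOS_UNSPECIFIED ? captured_qos(caller) : queue;
}
```

The last case in the checks matches the dispatch_sync experiment later in this article. There, a block submitted asynchronously from the main thread to a QOS_CLASS_BACKGROUND queue ran at priority 4.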

The system's QoS propagation rules are fairly complex. They mainly take the following information into account:

the current thread's QoS;
if the dispatch_block was created with dispatch_block_create(), the flags passed when the block was created;
the QoS of the target queue or thread of dispatch_async or IPC.

The scheduler uses this information to decide what priority the block runs at.

If no other thread is synchronously waiting on this block, the block simply runs at the priority determined above.
If threads are synchronously waiting on each other, the scheduler adjusts the running priority of the threads accordingly.

How to trigger the priority inversion avoidance mechanism?

Suppose the current thread is waiting for an operation in progress on another thread (thread 1), such as block1. If the system knows which thread block1 runs on (its owner), it resolves the priority inversion by raising that thread's priority. Conversely, if the system does not know where block1 runs, it cannot know whose priority to raise, and the inversion cannot be resolved.

The system APIs that record holder (owner) information are:

pthread mutex, os_unfair_lock, and the higher-level APIs built on these two:
1. dispatch_once is implemented on top of os_unfair_lock;
2. NSLock, NSRecursiveLock, @synchronized, etc. are implemented on top of pthread mutex;
dispatch_sync, dispatch_wait;
xpc_connection_send_with_message_sync.

When these APIs are used, the system can engage its priority inversion avoidance mechanism whenever an inversion occurs.

Basic API Validation

Next, let's verify each of the basic system APIs mentioned above.

Test environment: iOS 15.2 Simulator

pthread mutex

The pthread mutex data structure pthread_mutex_s contains an m_tid field dedicated to recording the ID of the thread that holds the lock.


// types_internal.h
struct pthread_mutex_s {
        long sig;
        _pthread_lock lock;
        union {
                uint32_t value;
                struct pthread_mutex_options_s options;
        } mtxopts;
        int16_t prioceiling;
        int16_t priority;
#if defined(__LP64__)
        uint32_t _pad;
#endif
        union {
                struct {
                        uint32_t m_tid[2]; // thread id of thread that has mutex locked
                        uint32_t m_seq[2]; // mutex sequence id
                        uint32_t m_mis[2]; // for misaligned locks m_tid/m_seq will span into here
                } psynch;
                struct _pthread_mutex_ulock_s ulock;
        };
#if defined(__LP64__)
        uint32_t _reserved[4];
#else
        uint32_t _reserved[1];
#endif
};

Code to verify whether the thread's priority gets raised:


#include <mach/mach.h>
#include <stdio.h>

// printThreadPriority prints the priority information of the current thread
void printThreadPriority(void) {
  thread_t cur_thread = mach_thread_self();
  mach_msg_type_number_t thread_info_count = THREAD_INFO_MAX;
  thread_info_data_t thinfo;
  kern_return_t kr = thread_info(cur_thread, THREAD_EXTENDED_INFO, (thread_info_t)thinfo, &thread_info_count);
  // Release the send right returned by mach_thread_self() only after using it
  mach_port_deallocate(mach_task_self(), cur_thread);
  if (kr != KERN_SUCCESS) {
    return;
  }
  thread_extended_info_t extend_info = (thread_extended_info_t)thinfo;
  // pth_priority: base priority; pth_curpri: current priority; pth_maxpriority: maximum priority
  printf("pth_priority: %d, pth_curpri: %d, pth_maxpriority: %d\n", extend_info->pth_priority, extend_info->pth_curpri, extend_info->pth_maxpriority);
}

First, the child thread acquires the lock and sleeps; then the main thread requests the lock:


dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
  printf("begin : \n");
  printThreadPriority();
  printf("queue before lock \n");
  pthread_mutex_lock(&_lock); // ensure the background queue gets the lock first
  printf("queue lock \n");
  printThreadPriority();
  dispatch_async(dispatch_get_main_queue(), ^{
    printf("before main lock\n");
    pthread_mutex_lock(&_lock);
    printf("in main lock\n");
    pthread_mutex_unlock(&_lock);
    printf("after main unlock\n");
  });
  sleep(10);
  printThreadPriority();
  printf("queue unlock\n");
  pthread_mutex_unlock(&_lock);
  printf("queue after unlock\n");
});


begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock

The low-priority child thread acquires the lock first, at priority 4. Once the main thread requests the lock, the child thread's priority is raised to 47.

os_unfair_lock

os_unfair_lock replaces OSSpinLock in order to solve the priority inversion problem. A thread waiting on an os_unfair_lock sleeps in the kernel instead of busy-waiting in user mode. os_unfair_lock stores the owner's thread ID inside the lock, so a waiter can donate its priority to the owner, which avoids priority inversion. Let's verify it:


dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    os_unfair_lock_lock(&_unfair_lock); // ensure the background queue gets the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
      printf("before main lock\n");
      os_unfair_lock_lock(&_unfair_lock);
      printf("in main lock\n");
      os_unfair_lock_unlock(&_unfair_lock);
      printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    os_unfair_lock_unlock(&_unfair_lock);
    printf("queue after unlock\n");
  });


begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock

The result is consistent with pthread mutex.

pthread_rwlock_t

The pthread_rwlock_init documentation contains the following caveat:

Caveats: Beware of priority inversion when using read-write locks. A high-priority thread may be blocked waiting on a read-write lock locked by a low-priority thread. The microkernel has no knowledge of read-write locks, and therefore can’t boost the low-priority thread to prevent the priority inversion.

In short, the kernel is unaware of read-write locks and cannot raise the low-priority thread's priority, so priority inversion cannot be avoided. Yet the definition of pthread_rwlock_s contains an rw_tid field dedicated to recording the thread that holds the write lock. This raises a question: why does pthread_rwlock_s still fail to avoid priority inversion when it has owner information?


struct pthread_rwlock_s {
        long sig;
        _pthread_lock lock;
        uint32_t
                unused:29,
                misalign:1,
                pshared:2;
        uint32_t rw_flags;
#if defined(__LP64__)
        uint32_t _pad;
#endif
        uint32_t rw_tid[2]; // thread id of thread that has exclusive (write) lock
        uint32_t rw_seq[4]; // rw sequence id (at 128-bit aligned boundary)
        uint32_t rw_mis[4]; // for misaligned locks rw_seq will span into here
#if defined(__LP64__)
        uint32_t _reserved[34];
#else
        uint32_t _reserved[18];
#endif
};

The Hacker News thread at https://news.ycombinator.com/item?id=21751269 mentions:

xnu supports priority inheritance through “turnstiles”, a kernel-internal mechanism which is used by default by a number of locking primitives (list at [1]), including normal pthread mutexes (though not read-write locks [2]), as well as the os_unfair_lock API (via the ulock syscalls). With pthread mutexes, you can actually explicitly request priority inheritance by calling pthread_mutexattr_setprotocol [3] with PTHREAD_PRIO_INHERIT; the Apple implementation supports it, but currently ignores the protocol setting and just gives all mutexes priority inheritance.

In short, XNU implements priority inheritance with a kernel mechanism called turnstiles, and it applies this mechanism to pthread mutex and os_unfair_lock.
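For reference, this is how the explicit priority inheritance request mentioned in the quote is expressed with the portable POSIX API. According to the quote, Apple's implementation accepts PTHREAD_PRIO_INHERIT but ignores it, because every pthread mutex already gets turnstile-based inheritance. On Linux, glibc backs such mutexes with priority-inheritance futexes. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Initialises *m as a mutex that requests the POSIX priority-inheritance protocol.
   Returns 0 on success or the error code of the first call that failed. */
int init_priority_inheritance_mutex(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* the mutex keeps its own copy of the settings */
    return rc;
}
```

The request matters mainly on platforms where inheritance is opt-in. On Apple platforms, the quote indicates that a plain pthread_mutex_init already behaves this way.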

Following this lead, we found a call to _kwq_use_turnstile in ksyn_wait. Its comment is more guarded about read-write locks, adding “at least sometimes”:

pthread mutexes and rwlocks both (at least sometimes) know their owner and can use turnstiles. Otherwise, we pass NULL as the tstore to the shims so they wait on the global waitq.


// libpthread/kern/kern_synch.c
int
ksyn_wait(ksyn_wait_queue_t kwq, kwq_queue_type_t kqi, uint32_t lockseq,
                int fit, uint64_t abstime, uint16_t kwe_flags,
                thread_continue_t continuation, block_hint_t block_hint)
{
        thread_t th = current_thread();
        uthread_t uth = pthread_kern->get_bsdthread_info(th);
        struct turnstile **tstore = NULL;
        int res;

        assert(continuation != THREAD_CONTINUE_NULL);

        ksyn_waitq_element_t kwe = pthread_kern->uthread_get_uukwe(uth);
        bzero(kwe, sizeof(*kwe));
        kwe->kwe_count = 1;
        kwe->kwe_lockseq = lockseq & PTHRW_COUNT_MASK;
        kwe->kwe_state = KWE_THREAD_INWAIT;
        kwe->kwe_uth = uth;
        kwe->kwe_thread = th;
        kwe->kwe_flags = kwe_flags;

        res = ksyn_queue_insert(kwq, kqi, kwe, lockseq, fit);
        if (res != 0) {
                //panic("psynch_rw_wrlock: failed to enqueue\n"); // XXX
                ksyn_wqunlock(kwq);
                return res;
        }

        PTHREAD_TRACE(psynch_mutex_kwqwait, kwq->kw_addr, kwq->kw_inqueue,
                        kwq->kw_prepost.count, kwq->kw_intr.count);

        if (_kwq_use_turnstile(kwq)) {
                // pthread mutexes and rwlocks both (at least sometimes) know their                
                // owner and can use turnstiles. Otherwise, we pass NULL as the                
                // tstore to the shims so they wait on the global waitq.                
                tstore = &kwq->kw_turnstile;
        }
        ......
}

The definition of _kwq_use_turnstile is candid about this. Turnstile-based priority inversion protection is enabled only for KSYN_WQTYPE_MTX, while read-write locks have type KSYN_WQTYPE_RWLOCK. _kwq_use_turnstile therefore returns false for read-write locks, so they cannot avoid priority inversion.


#define KSYN_WQTYPE_MTX         0x01
#define KSYN_WQTYPE_CVAR        0x02
#define KSYN_WQTYPE_RWLOCK      0x04
#define KSYN_WQTYPE_SEMA        0x08

static inline bool
_kwq_use_turnstile(ksyn_wait_queue_t kwq)
{
        // If we had writer-owner information from the
        // rwlock then we could use the turnstile to push on it. For now, only
        // plain mutexes use it.
        return (_kwq_type(kwq) == KSYN_WQTYPE_MTX);
}

_pthread_find_owner also shows that the owner of a read-write lock is reported as 0:


void
_pthread_find_owner(thread_t thread,
                struct stackshot_thread_waitinfo * waitinfo)
{
        ksyn_wait_queue_t kwq = _pthread_get_thread_kwq(thread);
        switch (waitinfo->wait_type) {
                case kThreadWaitPThreadMutex:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_MTX);
                        waitinfo->owner  = thread_tid(kwq->kw_owner);
                        waitinfo->context = kwq->kw_addr;
                        break;
                /* Owner of rwlock not stored in kernel space due to races. Punt
                 * and hope that the userspace address is helpful enough. */
                case kThreadWaitPThreadRWLockRead:
                case kThreadWaitPThreadRWLockWrite:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_RWLOCK);
                        waitinfo->owner  = 0;
                        waitinfo->context = kwq->kw_addr;
                        break;
                /* Condvars don't have owners, so just give the userspace address. */
                case kThreadWaitPThreadCondVar:
                        assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_CVAR);
                        waitinfo->owner  = 0;
                        waitinfo->context = kwq->kw_addr;
                        break;
                case kThreadWaitNone:
                default:
                        waitinfo->owner = 0;
                        waitinfo->context = 0;
                        break;
        }
}

Switch to a read-write lock to verify the theory above:


pthread_rwlock_init(&_rwlock, NULL);
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
  printf("begin : \n");
  printThreadPriority();
  printf("queue before lock \n");
  pthread_rwlock_rdlock(&_rwlock); // ensure the background queue gets the lock first
  printf("queue lock \n");
  printThreadPriority();
  dispatch_async(dispatch_get_main_queue(), ^{
    printf("before main lock\n");
    pthread_rwlock_wrlock(&_rwlock);
    printf("in main lock\n");
    pthread_rwlock_unlock(&_rwlock);
    printf("after main unlock\n");
  });
  sleep(10);
  printThreadPriority();
  printf("queue unlock\n");
  pthread_rwlock_unlock(&_rwlock);
  printf("queue after unlock\n");
});


begin : 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock 
queue lock 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue unlock
queue after unlock
in main lock
after main unlock

With a read-write lock, the holder's priority is not raised. It stays at 4, and the main thread only gets the lock after the child thread unlocks.

dispatch_sync

This API is familiar to most readers, so let's verify it directly:


// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printThreadPriority();
dispatch_async(_queue, ^{
    printf("dispatch_async before dispatch_sync : \n");
    printThreadPriority();
});
dispatch_sync(_queue, ^{
    printf("dispatch_sync: \n");
    printThreadPriority();
});
dispatch_async(_queue, ^{
    printf("dispatch_async after dispatch_sync: \n");
    printThreadPriority();
});


pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63 
dispatch_async before dispatch_sync : 
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_sync: 
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async after dispatch_sync: 
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63

_queue is a low-priority queue (QOS_CLASS_BACKGROUND). Both the task submitted with dispatch_sync and the task submitted before it with dispatch_async are raised to priority 47, the same as the main thread. The task submitted by the last dispatch_async runs at priority 4.

dispatch_wait


// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printf("main thread\n");
printThreadPriority();
dispatch_block_t block = dispatch_block_create(DISPATCH_BLOCK_INHERIT_QOS_CLASS, ^{
    printf("sub thread\n");
    sleep(2);
    printThreadPriority();
});
dispatch_async(_queue, block);
dispatch_wait(block, DISPATCH_TIME_FOREVER);

_queue is a low-priority queue (QOS_CLASS_BACKGROUND). When the main thread waits with dispatch_wait, the output is as follows, and the low-priority task is raised to priority 47:


main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63

If dispatch_wait(block, DISPATCH_TIME_FOREVER) is commented out, the output is:


main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63

Note that dispatch_wait is a macro using C11 generic selection; where that extension is unavailable, it is an entry-point function. It accepts three types of parameters: dispatch_block_t, dispatch_group_t and dispatch_semaphore_t. Here it resolves to dispatch_block_wait, and only dispatch_block_wait adjusts priorities to avoid priority inversion.


intptr_t
dispatch_wait(void *object, dispatch_time_t timeout);
#if __has_extension(c_generic_selections)
#define dispatch_wait(object, timeout) \
                _Generic((object), \
                        dispatch_block_t:dispatch_block_wait, \
                        dispatch_group_t:dispatch_group_wait, \
                        dispatch_semaphore_t:dispatch_semaphore_wait \
                )((object),(timeout))
#endif
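The _Generic mechanism that turns dispatch_wait into dispatch_block_wait can be seen in isolation with stand-in types. The fake_* types and functions below are hypothetical. In plain C, the real dispatch object types are distinct pointer types, which is what lets _Generic select a function at compile time; this sketch only mimics that shape.

```c
/* Hypothetical stand-in handle types: distinct pointer types, like the dispatch types in C. */
typedef struct fake_block     *fake_block_t;
typedef struct fake_group     *fake_group_t;
typedef struct fake_semaphore *fake_semaphore_t;

/* Each function returns a tag so we can see which one the macro selected. */
long fake_block_wait(fake_block_t b, long timeout)         { (void)b; (void)timeout; return 1; }
long fake_group_wait(fake_group_t g, long timeout)         { (void)g; (void)timeout; return 2; }
long fake_semaphore_wait(fake_semaphore_t s, long timeout) { (void)s; (void)timeout; return 3; }

/* C11 _Generic picks the function from the static type of `object` at compile time. */
#define fake_wait(object, timeout)                  \
    _Generic((object),                              \
        fake_block_t: fake_block_wait,              \
        fake_group_t: fake_group_wait,              \
        fake_semaphore_t: fake_semaphore_wait       \
    )((object), (timeout))
```

Because the selection happens at compile time, passing a block to the macro always calls the block-specific function. In the real header, that function is dispatch_block_wait, which is the one that performs the priority adjustment.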

The mysterious semaphore

`dispatch_semaphore`

My earlier understanding of dispatch_semaphore was shallow, and I often treated a binary semaphore as equivalent to a mutex. Research showed that dispatch_semaphore has no concept of QoS and does not record which thread currently holds the semaphore (its owner). So when a high-priority thread waits on it, the kernel cannot know whose scheduling priority (QoS) to raise. If the holder has a lower priority than other threads, the high-priority waiting thread may wait for an unbounded amount of time. The article Mutex vs Semaphore: What’s the Difference? compares Mutex and Semaphore in detail.

Semaphores are for signaling (same a condition variables, events) while mutexes are for mutual exclusion. Technically, you can also use semaphores for mutual exclusion (a mutex can be thought as a binary semaphore) but you really shouldn’t.

Right, but libdispatch doesn’t have a mutex. It has semaphores and queues. So if you’re trying to use libdispatch and you don’t want the closure-based aspect of queues, you might be tempted to use a semaphore instead. Don’t do that, use os_unfair_lock or pthread_mutex (or a higher-level construct like NSLock) instead.

These warnings show that dispatch_semaphore is dangerous and must be used with special care.
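The ownership difference can be observed with plain POSIX primitives. This sketch targets Linux, because Apple platforms do not implement unnamed sem_init semaphores. It compares two primitives:

- an error-checking pthread mutex, which records its owner and rejects an unlock from another thread;
- a POSIX semaphore, which any thread may signal.

It only shows that one primitive has an owner and the other does not; it does not demonstrate the kernel's priority boosting. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t g_mutex;
static sem_t g_sem;

static void *unlock_mutex_from_other_thread(void *result) {
    *(int *)result = pthread_mutex_unlock(&g_mutex);  /* this thread is not the owner */
    return NULL;
}

static void *post_semaphore_from_other_thread(void *result) {
    *(int *)result = sem_post(&g_sem);  /* any thread may signal: there is no owner */
    return NULL;
}

/* Returns the error a non-owner gets when unlocking an error-checking mutex. */
int foreign_mutex_unlock(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);  /* mutex tracks its owner */
    pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&g_mutex);
    int rc = -1;
    pthread_t t;
    pthread_create(&t, NULL, unlock_mutex_from_other_thread, &rc);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&g_mutex);   /* the real owner releases it */
    pthread_mutex_destroy(&g_mutex);
    return rc;
}

/* Returns sem_post's result when a thread that never "acquired" the semaphore signals it. */
int foreign_semaphore_post(void) {
    sem_init(&g_sem, 0, 0);
    int rc = -1;
    pthread_t t;
    pthread_create(&t, NULL, post_semaphore_from_other_thread, &rc);
    pthread_join(t, NULL);
    sem_wait(&g_sem);                 /* consume the signal so the count returns to 0 */
    sem_destroy(&g_sem);
    return rc;
}
```

The mutex refuses the foreign unlock with EPERM because it knows who holds it. The semaphore accepts the post because there is no holder to check. This is the same missing information that prevents the kernel from knowing whom to boost.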

Here is an explanation based on Apple's official demo:


__block NSString *taskName = nil;
dispatch_semaphore_t sema = dispatch_semaphore_create(0); 
[self.connection.remoteObjectProxy requestCurrentTaskName:^(NSString *task) { 
     taskName = task; 
     dispatch_semaphore_signal(sema); 
}]; 
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER); 
return taskName;

Assume this code runs on the main thread, whose priority is QOS_CLASS_USER_INTERACTIVE;
Because the asynchronous request is issued from the main thread, the QoS of the async task's queue is translated to QOS_CLASS_USER_INITIATED;
The main thread is blocked on the semaphore sema. The async task responsible for signaling it runs at QOS_CLASS_USER_INITIATED, lower than the main thread's QOS_CLASS_USER_INTERACTIVE, so priority inversion may occur.

It is worth mentioning that Clang's static analyzer includes a check written specifically for this case:

https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Checkers/GCDAntipatternChecker.cpp


// Excerpt from GCDAntipatternChecker.cpp. callsName, equalsBoundArgDecl,
// bindAssignmentToDecl and WarnAtNode are helpers defined elsewhere in that file.
static auto findGCDAntiPatternWithSemaphore() -> decltype(compoundStmt()) {

  // 1. A semaphore created with an initial count of 0, bound to a variable
  //    either at its declaration or by a later assignment.
  const char *SemaphoreBinding = "semaphore_name";
  auto SemaphoreCreateM = callExpr(allOf(
      callsName("dispatch_semaphore_create"),
      hasArgument(0, ignoringParenCasts(integerLiteral(equals(0))))));

  auto SemaphoreBindingM = anyOf(
      forEachDescendant(
          varDecl(hasDescendant(SemaphoreCreateM)).bind(SemaphoreBinding)),
      forEachDescendant(binaryOperator(bindAssignmentToDecl(SemaphoreBinding),
                     hasRHS(SemaphoreCreateM))));

  // 2. A function call or Objective-C message that takes a block argument
  //    whose body signals that same semaphore.
  auto HasBlockArgumentM = hasAnyArgument(hasType(
            hasCanonicalType(blockPointerType())
            ));

  auto ArgCallsSignalM = hasAnyArgument(stmt(hasDescendant(callExpr(
          allOf(
              callsName("dispatch_semaphore_signal"),
              equalsBoundArgDecl(0, SemaphoreBinding)
              )))));

  auto HasBlockAndCallsSignalM = allOf(HasBlockArgumentM, ArgCallsSignalM);

  auto HasBlockCallingSignalM =
    forEachDescendant(
      stmt(anyOf(
        callExpr(HasBlockAndCallsSignalM),
        objcMessageExpr(HasBlockAndCallsSignalM)
           )));

  // 3. A dispatch_semaphore_wait on the same semaphore; the warning is
  //    reported at this call.
  auto SemaphoreWaitM = forEachDescendant(
    callExpr(
      allOf(
        callsName("dispatch_semaphore_wait"),
        equalsBoundArgDecl(0, SemaphoreBinding)
      )
    ).bind(WarnAtNode));

  // All three pieces must occur within the same compound statement.
  return compoundStmt(
      SemaphoreBindingM, HasBlockCallingSignalM, SemaphoreWaitM);
}

To use this check, enable the corresponding static analyzer setting in Xcode (the underlying Clang checker is `optin.performance.GCDAntipattern`).

In addition, `dispatch_group` behaves like a semaphore here: when `enter()` is called, there is no way to predict which thread will later call `leave()`, so the system cannot know the group's owner, and again no priority boost can take place.

Stuck on a semaphore

I have a vivid memory of `dispatch_semaphore`. I once wrote code like the following, which used a semaphore on the main thread to wait synchronously for the camera authorization result:


// Anti-pattern: the main thread sleeps until the completion handler signals.
__block BOOL auth = NO;
dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
  auth = allow;
  dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);

After release, this code sat at number 1 on the hang list for a long time, and I was puzzled. Only after understanding that a semaphore cannot avoid priority inversion did the cause finally become clear. Problems like this are generally solved in one of two ways:

Use a synchronous API


BOOL auth = [KTAuthorizeService authorizationWithType:KTPermissionsTypeCamera];
// do something next

Use an asynchronous callback and do not wait on the current thread


[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    BOOL auth = allow;
    // do something next via callback
}];

several concepts

turnstile

As mentioned above, XNU uses turnstiles to implement priority inheritance; here is a brief description of the mechanism. The XNU kernel contains a large number of synchronization objects (such as `lck_mtx_t`). To solve priority inversion, each synchronization object would need its own data structure holding a lot of information, such as the queue of threads blocked on it. If every synchronization object had to allocate such a structure, a great deal of memory would be wasted. To avoid this, XNU adopts the turnstile mechanism, a space-efficient solution. It rests on the fact that a thread cannot be blocked on more than one synchronization object at the same time.

Because of this, each synchronization object only needs to keep a pointer to a turnstile and obtain one when it is actually needed. The turnstile contains all the information required to operate on the synchronization object, such as the queue of blocked threads and a pointer to the thread that owns the object. Turnstiles are allocated dynamically from a pool whose size grows with the number of threads allocated in the system, so the total number of turnstiles is always less than or equal to the number of threads, which keeps it bounded. A turnstile is allocated by the first thread that blocks on a synchronization object; when no more threads are blocked on that object, the turnstile is freed and returned to the pool. The turnstile data structure is as follows:


struct turnstile {
    struct waitq                  ts_waitq;              /* waitq embedded in turnstile */
    turnstile_inheritor_t         ts_inheritor;          /* thread/turnstile inheriting the priority (IL, WL) */
    union {
        struct turnstile_list ts_free_turnstiles;    /* turnstile free list (IL) */
        SLIST_ENTRY(turnstile) ts_free_elm;          /* turnstile free list element (IL) */
    };
    struct priority_queue_sched_max ts_inheritor_queue;    /* Queue of turnstile with us as an inheritor (WL) */
    union {
        struct priority_queue_entry_sched ts_inheritor_links;    /* Inheritor queue links */
        struct mpsc_queue_chain   ts_deallocate_link;    /* thread deallocate link */
    };
    SLIST_ENTRY(turnstile)        ts_htable_link;        /* linkage for turnstile in global hash table */
    uintptr_t                     ts_proprietor;         /* hash key lookup turnstile (IL) */
    os_refcnt_t                   ts_refcount;           /* reference count for turnstiles */
    _Atomic uint32_t              ts_type_gencount;      /* gen count used for priority chaining (IL), type of turnstile (IL) */
    uint32_t                      ts_port_ref;           /* number of explicit refs from ports on send turnstile */
    turnstile_update_flags_t      ts_inheritor_flags;    /* flags for turnstile inheritor (IL, WL) */
    uint8_t                       ts_priority;           /* priority of turnstile (WL) */

#if DEVELOPMENT || DEBUG
    uint8_t                       ts_state;              /* current state of turnstile (IL) */
    queue_chain_t                 ts_global_elm;         /* global turnstile chain */
    thread_t                      ts_thread;             /* thread the turnstile is attached to */
    thread_t                      ts_prev_thread;        /* thread the turnstile was attached before donation */
#endif
};
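To make the allocation rule described above concrete, here is a deliberately simplified C model that follows the article's description rather than XNU's actual code. Everything in it (`turnstile`, `sync_object`, `pool_init`, `block_on`, `wake_one`, `effective_owner_pri`, the four-entry pool) is hypothetical: a synchronization object holds only a pointer, the first thread to block takes a turnstile from a per-thread pool, waiters raise the owner's effective priority, and the last waiter to leave returns the turnstile.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the turnstile idea, NOT XNU source: a synchronization
 * object stores only a pointer, and the bookkeeping lives in a turnstile
 * borrowed from a pool sized by the (hypothetical) thread count. */
#define MAX_THREADS 4

typedef struct turnstile {
    int waiters;            /* threads currently blocked on the object */
    int max_waiter_pri;     /* highest priority among those waiters */
    struct turnstile *next; /* free-list link while sitting in the pool */
} turnstile;

typedef struct {
    turnstile *ts;          /* NULL while no thread is blocked */
    int owner_pri;          /* base priority of the current owner */
} sync_object;

static turnstile storage[MAX_THREADS]; /* one turnstile per thread */
static turnstile *pool;                /* free list of unused turnstiles */

void pool_init(void) {
    pool = NULL;
    for (int i = 0; i < MAX_THREADS; i++) {
        storage[i].next = pool;
        pool = &storage[i];
    }
}

int pool_size(void) {
    int n = 0;
    for (turnstile *t = pool; t != NULL; t = t->next)
        n++;
    return n;
}

/* A thread with priority `pri` blocks on `obj`. Since a thread blocks on at
 * most one object, the pool cannot run dry while at most MAX_THREADS
 * threads exist. */
void block_on(sync_object *obj, int pri) {
    if (obj->ts == NULL) {  /* first waiter takes a turnstile from the pool */
        assert(pool != NULL);
        obj->ts = pool;
        pool = pool->next;
        obj->ts->waiters = 0;
        obj->ts->max_waiter_pri = 0;
    }
    obj->ts->waiters++;
    if (pri > obj->ts->max_waiter_pri)
        obj->ts->max_waiter_pri = pri;
}

/* Priority inheritance: the owner runs at max(own, highest waiter). */
int effective_owner_pri(const sync_object *obj) {
    if (obj->ts != NULL && obj->ts->max_waiter_pri > obj->owner_pri)
        return obj->ts->max_waiter_pri;
    return obj->owner_pri;
}

/* Wake one waiter; the last one out returns the turnstile to the pool.
 * Simplification: max_waiter_pri is not recomputed when a single waiter
 * leaves (a real implementation tracks waiter priorities in a priority queue). */
void wake_one(sync_object *obj) {
    assert(obj->ts != NULL && obj->ts->waiters > 0);
    if (--obj->ts->waiters == 0) {
        obj->ts->next = pool;
        pool = obj->ts;
        obj->ts = NULL;
    }
}
```

For example, with an owner at priority 4 and a waiter at 47, `effective_owner_pri` reports 47, and exactly one turnstile stays out of the pool until every waiter on that object has been woken.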

priority value

Some priority values appear in the verification process. Following "Mac OS® X and iOS Internals", the values involved in the experiments are Mach-level priorities, and all of them are user-thread values:

User threads have priorities 0~63:
1. NSQualityOfServiceBackground: Mach-level priority 4;
2. NSQualityOfServiceUtility: Mach-level priority 20;
3. NSQualityOfServiceDefault: Mach-level priority 31;
4. NSQualityOfServiceUserInitiated: Mach-level priority 37;
5. NSQualityOfServiceUserInteractive: Mach-level priority 47.
Kernel threads have priorities 80~95;
Real-time threads have priorities 96~127;
Priorities 64~79 are reserved for the system.
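To tie these numbers back to the semaphore discussion, here is a small C sketch. The `PRI_QOS_*` constants, `band_of` and `holder_priority` are hypothetical names, and the values are copied from the list above rather than read from a live system. It classifies a priority into the listed ranges and shows, in the simplest possible form (ignoring the scheduler's other adjustments), why a BACKGROUND lock holder keeps running at 4 behind a semaphore while a USER_INTERACTIVE thread (47) waits, whereas an owner-tracking lock would let the holder inherit 47.

```c
#include <assert.h>
#include <stdbool.h>

/* Mach-level base priorities per QoS class, copied from the list above
 * (values from the text, not queried from a running system). */
enum {
    PRI_QOS_BACKGROUND       = 4,
    PRI_QOS_UTILITY          = 20,
    PRI_QOS_DEFAULT          = 31,
    PRI_QOS_USER_INITIATED   = 37,
    PRI_QOS_USER_INTERACTIVE = 47
};

typedef enum {
    BAND_USER, BAND_RESERVED, BAND_KERNEL, BAND_REALTIME, BAND_INVALID
} priority_band;

/* Classify a Mach priority into the ranges listed above. */
priority_band band_of(int pri) {
    if (pri >= 0 && pri <= 63)   return BAND_USER;
    if (pri >= 64 && pri <= 79)  return BAND_RESERVED;
    if (pri >= 80 && pri <= 95)  return BAND_KERNEL;
    if (pri >= 96 && pri <= 127) return BAND_REALTIME;
    return BAND_INVALID;
}

/* Priority a lock holder runs at while a thread of `waiter_pri` waits on it.
 * An owner-tracking lock lets the waiter's priority be inherited; a
 * semaphore records no owner, so the holder keeps its own priority. */
int holder_priority(int holder_pri, int waiter_pri, bool lock_records_owner) {
    if (lock_records_owner && waiter_pri > holder_pri)
        return waiter_pri;
    return holder_pri;
}
```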

Summary

This article has explained the concept of priority inversion and its solutions, and examined several locks on the iOS platform in detail. With a deeper understanding, unnecessary priority inversions can be avoided, which in turn helps prevent hang (stuck) exceptions. The ByteDance APM team also monitors thread priorities in order to discover and prevent priority inversion.

join us

The ByteDance APM middle platform is committed to improving the performance and stability of all products across the group. Its technology stack covers iOS/Android/Server/Web/Hybrid/PC/Games/Mini Programs and more, and its work includes, but is not limited to, performance and stability monitoring, troubleshooting, in-depth optimization, and preventing regressions. In the long term, the team hopes to share more constructive problem-discovery and in-depth optimization methods with the industry.

Anyone interested in positions on the ByteDance APM team is welcome to send a resume to xushuangqing@bytedance.com.
