Starting from a production problem
I recently came across a number of production hangs in [HMDConfigManager remoteConfigWithAppID:].
Initial analysis
The main-thread stack shows that the lock involved is a read-write lock. I then went through the child threads holding the lock. They were in various states, all of them basically normal execution: some were opening files, some were in a read call, some were executing NSUserDefaults methods, and so on. The threads in question all carried the QOS:BACKGROUND mark. Overall, the child thread holding the lock was still executing, but it did not release the lock in time for the main thread. Why do these child threads run for so long while holding the lock, long enough for the main thread to be stuck for 8 s? Either the work is genuinely time-consuming, or there is a priority inversion.
Solution
In this case, the low-priority thread that holds the read-write lock is not scheduled for a long time (or it is preempted as soon as it is scheduled, or it gets too little CPU time when it is scheduled), while the high-priority thread stays blocked because it cannot acquire the read-write lock, so the two stall each other. iOS 8 introduced the concept of QualityOfService, which is similar to thread priority: depending on the QualityOfService value, the system allocates different amounts of CPU time, network resources, disk resources, and so on, so we can use it to set the priority of a queue.
Solution 1: Remove the priority setting on NSOperationQueue
In the Threading Programming Guide document, Apple gives a hint:
Important: It is generally a good idea to leave the priorities of your threads at their default values. Increasing the priorities of some threads also increases the likelihood of starvation among lower-priority threads. If your application contains high-priority and low-priority threads that must interact with each other, the starvation of lower-priority threads may block other threads and create performance bottlenecks.
Apple's advice is not to modify thread priorities arbitrarily, especially when high-priority and low-priority threads compete for a critical resource. Deleting the relevant priority-setting code therefore solves the problem.
Solution 2: Temporarily modify thread priority
The following hints were found in pthread_rwlock_rdlock(3pthread):
Realtime applications may encounter priority inversion when using read-write locks. The problem occurs when a high priority thread “locks” a read-write lock that is about to be “unlocked” by a low priority thread, but the low priority thread is preempted by a medium priority thread. This scenario leads to priority inversion; a high priority thread is blocked by lower priority threads for an unlimited period of time. During system design, realtime programmers must take into account the possibility of this kind of priority inversion. They can deal with it in a number of ways, such as by having critical sections that are guarded by read-write locks execute at a high priority, so that a thread cannot be preempted while executing in its critical section.
Although the note is aimed at real-time systems, it is still helpful here. Following this hint, the problematic code was modified: after a thread acquires _rwlock through pthread_rwlock_wrlock, its priority is temporarily raised, and after it releases _rwlock, its original priority is restored.
- (id)remoteConfigWithAppID:(NSString *)appID
{
    .......
    pthread_rwlock_rdlock(&_rwlock);
    HMDHeimdallrConfig *result = ....... // get existing config
    pthread_rwlock_unlock(&_rwlock);
    if (result == nil) {
        result = [[HMDHeimdallrConfig alloc] init]; // make a new config
        pthread_rwlock_wrlock(&_rwlock);
        qos_class_t oldQos = qos_class_self();
        BOOL needRecover = NO;
        // Temporarily raise the thread priority
        if (_enablePriorityInversionProtection && oldQos < QOS_CLASS_USER_INTERACTIVE) {
            int ret = pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
            needRecover = (ret == 0);
        }
        ......
        pthread_rwlock_unlock(&_rwlock);
        // Restore the thread priority
        if (_enablePriorityInversionProtection && needRecover) {
            pthread_set_qos_class_self_np(oldQos, 0);
        }
    }
    return result;
}
It is worth noting that this only works with the pthread API; the API provided by NSThread does not work.
Demo Verification
To check whether manually adjusting the thread priority as described above has any effect, I ran a local experiment with a demo. It defines 2000 operations (to keep the CPU busy) with priority NSQualityOfServiceUserInitiated, and sets the priority of every operation whose index is divisible by 100 to NSQualityOfServiceBackground. Each operation performs the same time-consuming task, and the elapsed time is then collected for the 10 selected operations.
for (int j = 0; j < 2000; ++j) {
    NSOperationQueue *operation = [[NSOperationQueue alloc] init];
    operation.maxConcurrentOperationCount = 1;
    operation.qualityOfService = NSQualityOfServiceUserInitiated;
    // Module 1
    // if (j % 100 == 0) {
    //     operation.qualityOfService = NSQualityOfServiceBackground;
    // }
    // Module 1
    [operation addOperationWithBlock:^{
        // Module 2
        // qos_class_t oldQos = qos_class_self();
        // pthread_set_qos_class_self_np(QOS_CLASS_USER_INITIATED, 0);
        // Module 2
        NSTimeInterval start = CFAbsoluteTimeGetCurrent();
        double sum = 0;
        for (int i = 0; i < 100000; ++i) {
            sum += sin(i) + cos(i) + sin(i*2) + cos(i*2);
        }
        // From here on, start holds the elapsed time in seconds
        start = CFAbsoluteTimeGetCurrent() - start;
        if (j % 100 == 0) {
            printf("%.8f\n", start * 1000); // elapsed time in milliseconds
        }
        // Module 2
        // pthread_set_qos_class_self_np(oldQos, 0);
        // Module 2
    }];
}
The statistics are shown in the table below (average time per task, in ms):
A (Module 1 and Module 2 commented out) | B (only Module 1 enabled) | C (Module 1 and Module 2 both enabled) |
---|---|---|
11.8190561 | 94.70210189 | 15.04005137 |
As can be seen:
- Under normal circumstances, the average time per task is 11.8190561 ms;
- When an operation is set to low priority, its time increases sharply to 94.70210189 ms;
- When an operation is set to low priority and its priority is manually raised again inside the block, its time drops sharply to 15.04005137 ms (this is still higher than normal; you can think about why).
The demo shows that manually adjusting the priority greatly reduces the overall time taken by low-priority tasks, which in turn reduces how long the main thread is blocked while the lock is held.
Online effect
Verification of this problem in production was divided into 2 stages:
- The first stage is shown in the first red box: starting on March 6, version 19.7 shows a relatively large drop. The main reason is that the queue being waited on in the stack changed from QOS:BACKGROUND to com.apple.root.default-qos, that is, the queue priority was raised from QOS_CLASS_BACKGROUND to QOS_CLASS_DEFAULT, which is equivalent to implementing Solution 1 and using the default priority.
- The second stage is shown in the second red box: verification started on April 24 in version 20.3. So far the effect does not appear to be significant. One likely reason is that the demo raises the priority from QOS_CLASS_BACKGROUND to QOS_CLASS_USER_INITIATED, whereas in production the queue priority is raised from the default QOS_CLASS_DEFAULT to QOS_CLASS_USER_INITIATED, so the improvement in production is comparatively limited.
- The Mach-level priority of QOS_CLASS_BACKGROUND is 4;
- The Mach-level priority of QOS_CLASS_DEFAULT is 31;
- The Mach-level priority of QOS_CLASS_USER_INITIATED is 37.
Deep understanding of priority inversion
So does every lock require manually raising the priority of the thread that holds it, as above? Does the system adjust thread priorities automatically? If such a mechanism exists, does it cover all locks? Answering these questions requires a deeper understanding of priority inversion.
What is priority inversion?
Priority inversion means that a synchronization resource is held by a lower-priority process/thread while a higher-priority process/thread competes for it and fails to acquire it, so that the higher-priority process/thread is delayed in being scheduled. Depending on the type of blocking, priority inversion is divided into bounded priority inversion and unbounded priority inversion. The diagrams here are borrowed from Introduction to RTOS – Solution to Part 11 (Priority Inversion).
Bounded priority inversion
As shown, the high-priority task (Task H) is blocked by the low-priority task (Task L) that holds the lock. Because the blocking time depends on how long the low-priority task stays in the critical section (how long it holds the lock), this is called bounded priority inversion. As long as Task L keeps holding the lock, Task H stays blocked; the low-priority task runs ahead of the high-priority task, and the priorities are inverted.
The tasks here can also be understood as threads.
Unbounded priority inversion
While Task L holds the lock, if a medium-priority task (Task M) preempts Task L, the bounded inversion above becomes unbounded: as long as Task M keeps taking the CPU away from Task L, Task H can be blocked for an arbitrary amount of time (and there may be more than 1 Task M).
Conventional solutions to priority inversion
There are currently 2 ways to resolve unbounded priority inversion: one is the priority ceiling protocol, the other is priority inheritance.
Priority ceiling protocol
In the priority ceiling scheme, the system associates a ceiling priority with each critical resource. When a task enters the critical section, the system assigns the ceiling priority to that task, so that it runs at the highest priority among the tasks that use the resource; when the task leaves the critical section, the system immediately restores its normal priority. This ensures that unbounded priority inversion does not occur. The ceiling priority is determined by the highest priority among all tasks that need the critical resource.
As shown, the ceiling priority of the lock is 3. While Task L holds the lock, its priority is raised to 3, the same as Task H. This prevents Task M (priority 2) from running until Task L and Task H no longer need the lock.
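The ceiling rule can be sketched in a few lines of C. This is only a toy model under stated assumptions, not how any real kernel implements it: the types ceiling_task_t and ceiling_lock_t and the functions ceiling_lock_acquire, ceiling_lock_release, and can_preempt are hypothetical names invented for this illustration, and a larger number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a task: a base priority and the priority
 * the scheduler currently uses for it. */
typedef struct {
    const char *name;
    int base_priority;
    int current_priority;
} ceiling_task_t;

/* Hypothetical lock that carries a fixed ceiling priority. */
typedef struct {
    int ceiling;            /* highest priority of any task that uses the lock */
    ceiling_task_t *owner;  /* NULL while the lock is free */
} ceiling_lock_t;

/* Enter the critical section: the owner runs at the ceiling priority.
 * Returns 0 on success, -1 if the lock is already held. */
int ceiling_lock_acquire(ceiling_lock_t *lock, ceiling_task_t *task) {
    assert(lock != NULL && task != NULL);
    if (lock->owner != NULL) {
        return -1;
    }
    lock->owner = task;
    if (task->current_priority < lock->ceiling) {
        task->current_priority = lock->ceiling;
    }
    return 0;
}

/* Leave the critical section: restore the owner's base priority. */
void ceiling_lock_release(ceiling_lock_t *lock) {
    assert(lock != NULL);
    if (lock->owner != NULL) {
        lock->owner->current_priority = lock->owner->base_priority;
        lock->owner = NULL;
    }
}

/* Returns 1 if candidate would preempt running under fixed-priority scheduling. */
int can_preempt(const ceiling_task_t *candidate, const ceiling_task_t *running) {
    return candidate->current_priority > running->current_priority;
}
```

With Task L at priority 1, Task M at priority 2, and a lock whose ceiling is 3, can_preempt reports that M can preempt L before L takes the lock but not while L holds it, which matches the diagram described above.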
Priority inheritance
In the priority inheritance scheme, the general principle is: when a high-priority task tries to acquire a lock that happens to be held by a low-priority task, the priority of the high-priority task is temporarily transferred to the low-priority task that owns the lock. The low-priority task can then run sooner and release the synchronization resource, and its original priority is restored after the release.
Both the priority ceiling protocol and priority inheritance restore the priority of the low-priority task when the lock is released. Also note that these 2 methods only prevent unbounded priority inversion; they do not prevent bounded priority inversion (Task H still has to wait until Task L has finished its critical section, and this inversion is unavoidable).
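The difference between bounded and unbounded inversion can be made concrete with a small simulation. The sketch below is a toy model, not real scheduler code: it assumes a single CPU, a tick-based fixed-priority scheduler, Task L already holding the lock at tick 0, and Tasks M and H becoming ready at tick 0 with H blocking on the lock immediately. The function name ticks_until_h_gets_lock and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { TASK_L = 0, TASK_M = 1, TASK_H = 2, TASK_COUNT = 3 };

/* Returns the tick at which Task L leaves its critical section, which is
 * the moment Task H can finally take the lock.
 *   l_critical_ticks: CPU ticks Task L still needs inside the critical section
 *   m_work_ticks:     CPU ticks of unrelated work Task M wants to do
 *   inherit:          whether the lock owner inherits the waiter's priority */
int ticks_until_h_gets_lock(int l_critical_ticks, int m_work_ticks, bool inherit) {
    int priority[TASK_COUNT] = {1, 2, 3};  /* larger number = higher priority */
    int l_remaining = l_critical_ticks;
    int m_remaining = m_work_ticks;
    int tick = 0;

    /* Task H blocks on the lock; with inheritance, Task L is boosted to H's priority. */
    if (inherit) {
        priority[TASK_L] = priority[TASK_H];
    }

    while (l_remaining > 0) {
        /* Task H is blocked, so only Task L and Task M compete for the CPU. */
        bool m_runs = (m_remaining > 0) && (priority[TASK_M] > priority[TASK_L]);
        if (m_runs) {
            m_remaining--;
        } else {
            l_remaining--;
        }
        tick++;
    }
    return tick;
}
```

In this model, without inheritance the wait of Task H is the sum of Task M's work and Task L's critical section, so it grows without limit as Task M does more work. With inheritance it is limited to Task L's critical section, which is exactly the bounded inversion that remains.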
Bounded priority inversion can be avoided or mitigated in the following ways:
- Reduce the execution time of the critical section, which shortens the bounded priority inversion;
- Avoid using critical-section resources that can block high-priority tasks;
- Use a dedicated queue to manage the resource instead of a lock.
Priority inheritance must be transitive. For example: T1 is blocked on a resource held by T2, and T2 is in turn blocked on a resource held by T3. If T1 has a higher priority than T2 and T3, then T3 must inherit T1's priority via T2. Otherwise, if another thread T4 comes along whose priority is higher than that of T2 and T3 but lower than that of T1, it will preempt T3 and cause a priority inversion relative to T1. Therefore, the priority a thread inherits must be the highest priority among the threads it blocks, directly or indirectly.
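The transitive rule amounts to walking the chain of lock owners and pushing the highest waiting priority along it. The following is a minimal sketch under stated assumptions: sim_thread_t, its blocked_on_owner field, and inherit_priority are hypothetical names for this illustration, a larger number means a higher priority, and the ownership chain is assumed to contain no cycle (a cycle would be a deadlock).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each thread records the thread that owns the
 * resource it is blocked on (NULL if it is not blocked). */
typedef struct sim_thread {
    int base_priority;
    int effective_priority;
    struct sim_thread *blocked_on_owner;
} sim_thread_t;

/* Propagate the waiter's priority to every thread that blocks it,
 * directly or indirectly. */
void inherit_priority(sim_thread_t *waiter) {
    assert(waiter != NULL);
    int donated = waiter->effective_priority;
    for (sim_thread_t *t = waiter->blocked_on_owner; t != NULL; t = t->blocked_on_owner) {
        if (t->effective_priority < donated) {
            t->effective_priority = donated;   /* boost the owner */
        } else {
            donated = t->effective_priority;   /* an owner that is already higher donates further */
        }
    }
}
```

Applied to the example above, calling inherit_priority on T1 raises both T2 and T3 to T1's priority, so a thread like T4 can no longer preempt T3.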
How to avoid priority inversion?
QoS propagation
iOS mainly uses the following two mechanisms to propagate QoS between threads (or queues):
- Mechanism 1: dispatch_async
  dispatch_async() automatically propagates the QoS from the calling thread, though it will translate User Interactive to User Initiated to avoid assigning that priority to non-main threads.
  Captured at time of block submission, translate user interactive to user initiated. Used if destination queue does not have a QoS and does not lower the QoS (ex dispatch_async back to the main thread)
- Mechanism 2: XPC-based interprocess communication (IPC)
The system's QoS propagation rules are fairly complex and mainly take the following information into account:
- the QoS of the current thread;
- if the dispatch_block was created with dispatch_block_create(), the parameters passed when the block was created;
- the QoS of the target queue or thread of the dispatch_async or IPC.
The scheduler uses this information to decide at what priority to run the block.
- If no other thread is waiting synchronously on the block, the block simply runs at the priority described above.
- If threads do wait on each other synchronously, the scheduler adjusts the running priority of the thread as the situation requires.
How to trigger the priority inversion avoidance mechanism?
If the current thread is waiting for an operation in progress on another thread (thread 1), such as block1, and the system knows which thread block1 is running on (the owner), the system resolves the priority inversion by raising the priority of the relevant thread. Conversely, if the system does not know which thread block1 is running on, it cannot know whose priority to raise and cannot resolve the inversion.
The system APIs that record owner information are as follows:
- pthread mutex, os_unfair_lock, and the higher-level APIs built on these two:
  - dispatch_once is implemented on top of os_unfair_lock;
  - NSLock, NSRecursiveLock, @synchronized, etc. are implemented on top of pthread mutex;
- dispatch_sync, dispatch_wait;
- xpc_connection_send_message_with_reply_sync.
Using the APIs above allows the system to enable its priority inversion avoidance mechanism when a priority inversion occurs.
Basic API Validation
Next, let's verify the basic system APIs mentioned above.
Test environment: Simulator, iOS 15.2
pthread mutex
The pthread mutex data structure pthread_mutex_s contains an m_tid field, which records the ID of the thread holding the lock.
// types_internal.h
struct pthread_mutex_s {
long sig;
_pthread_lock lock;
union {
uint32_t value;
struct pthread_mutex_options_s options;
} mtxopts;
int16_t prioceiling;
int16_t priority;
#if defined(__LP64__)
uint32_t _pad;
#endif
union {
struct {
uint32_t m_tid[2]; // thread id of thread that has mutex locked
uint32_t m_seq[2]; // mutex sequence id
uint32_t m_mis[2]; // for misaligned locks m_tid/m_seq will span into here
} psynch;
struct _pthread_mutex_ulock_s ulock;
};
#if defined(__LP64__)
uint32_t _reserved[4];
#else
uint32_t _reserved[1];
#endif
};
Verification code: will the thread's priority be raised?
// printThreadPriority prints the priority information of the current thread
void printThreadPriority() {
    thread_t cur_thread = mach_thread_self();
    mach_msg_type_number_t thread_info_count = THREAD_INFO_MAX;
    thread_info_data_t thinfo;
    kern_return_t kr = thread_info(cur_thread, THREAD_EXTENDED_INFO, (thread_info_t)thinfo, &thread_info_count);
    // Release the port reference only after thread_info has used it
    mach_port_deallocate(mach_task_self(), cur_thread);
    if (kr != KERN_SUCCESS) {
        return;
    }
    thread_extended_info_t extend_info = (thread_extended_info_t)thinfo;
    printf("pth_priority: %d, pth_curpri: %d, pth_maxpriority: %d\n", extend_info->pth_priority, extend_info->pth_curpri, extend_info->pth_maxpriority);
}
First, a child thread takes the lock and sleeps; then the main thread requests the lock:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    pthread_mutex_lock(&_lock); // make sure the background queue gets the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        pthread_mutex_lock(&_lock);
        printf("in main lock\n");
        pthread_mutex_unlock(&_lock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    pthread_mutex_unlock(&_lock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock
It can be seen that the low-priority child thread acquires the lock first, at priority 4, and when the main thread requests the lock, the child thread's priority is raised to 47.
os_unfair_lock
os_unfair_lock replaces OSSpinLock and solves the priority inversion problem. A thread waiting on an os_unfair_lock sleeps, switching from user mode to kernel mode, rather than busy-waiting. os_unfair_lock stores the ID of the owning thread inside the lock, and a waiter donates its priority to the owner, thereby avoiding priority inversion. Let's verify it:
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    os_unfair_lock_lock(&_unfair_lock); // make sure the background queue gets the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        os_unfair_lock_lock(&_unfair_lock);
        printf("in main lock\n");
        os_unfair_lock_unlock(&_unfair_lock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    os_unfair_lock_unlock(&_unfair_lock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
queue unlock
in main lock
after main unlock
queue after unlock
The result is consistent with pthread mutex.
pthread_rwlock_t
The documentation for pthread_rwlock_init contains the following note:
Caveats: Beware of priority inversion when using read-write locks. A high-priority thread may be blocked waiting on a read-write lock locked by a low-priority thread. The microkernel has no knowledge of read-write locks, and therefore can’t boost the low-priority thread to prevent the priority inversion.
The gist is that the kernel is not aware of read-write locks and cannot boost the low-priority thread, so priority inversion cannot be avoided. Looking at the definition, however, pthread_rwlock_s contains an rw_tid field that records the thread holding the write lock, which raises a question: why can't priority inversion be avoided when pthread_rwlock_s has owner information?
struct pthread_rwlock_s {
long sig;
_pthread_lock lock;
uint32_t
unused:29,
misalign:1,
pshared:2;
uint32_t rw_flags;
#if defined(__LP64__)
uint32_t _pad;
#endif
uint32_t rw_tid[2]; // thread id of thread that has exclusive (write) lock
uint32_t rw_seq[4]; // rw sequence id (at 128-bit aligned boundary)
uint32_t rw_mis[4]; // for misaligned locks rw_seq will span into here
#if defined(__LP64__)
uint32_t _reserved[34];
#else
uint32_t _reserved[18];
#endif
};
The https://news.ycombinator.com/item?id=21751269 link mentions:
xnu supports priority inheritance through “turnstiles“, a kernel-internal mechanism which is used by default by a number of locking primitives (list at [1]), including normal pthread mutexes (though not read-write locks [2]), as well as the os_unfair_lock API (via the ulock syscalls). With pthread mutexes, you can actually explicitly request priority inheritance by calling pthread_mutexattr_setprotocol [3] with PTHREAD_PRIO_INHERIT; the Apple implementation supports it, but currently ignores the protocol setting and just gives all mutexes priority inheritance.
In other words, XNU uses turnstiles, a kernel mechanism, for priority inheritance, and this mechanism is applied to pthread mutex and os_unfair_lock.
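The quoted comment mentions that priority inheritance can be requested explicitly with pthread_mutexattr_setprotocol and PTHREAD_PRIO_INHERIT, both of which are standard POSIX. A minimal sketch of that call sequence follows; the helper name init_priority_inheriting_mutex is hypothetical, and, as the quote notes, Apple's implementation reportedly gives mutexes priority inheritance regardless of this setting, so on Apple platforms the call mostly documents intent.

```c
/* Ask for the POSIX.1-2008 feature set; only relevant on strict compilers. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Initialize mutex with the priority-inheritance protocol requested.
 * Returns 0 on success or a pthread error number. */
int init_priority_inheriting_mutex(pthread_mutex_t *mutex) {
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0) {
        return err;
    }
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0) {
        err = pthread_mutex_init(mutex, &attr);
    }
    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return err;
}
```

A mutex initialized this way is locked and unlocked with the usual pthread_mutex_lock and pthread_mutex_unlock calls. There is no equivalent attribute for pthread_rwlock_t, which is consistent with the behavior analyzed in this section.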
Following this lead, a call to _kwq_use_turnstile can be found in ksyn_wait. The comment there is rather hedged about read-write locks, adding "at least sometimes":
pthread mutexes and rwlocks both (at least sometimes) know their owner and can use turnstiles. Otherwise, we pass NULL as the tstore to the shims so they wait on the global waitq.
// libpthread/kern/kern_synch.c
int
ksyn_wait(ksyn_wait_queue_t kwq, kwq_queue_type_t kqi, uint32_t lockseq,
int fit, uint64_t abstime, uint16_t kwe_flags,
thread_continue_t continuation, block_hint_t block_hint)
{
thread_t th = current_thread();
uthread_t uth = pthread_kern->get_bsdthread_info(th);
struct turnstile **tstore = NULL;
int res;
assert(continuation != THREAD_CONTINUE_NULL);
ksyn_waitq_element_t kwe = pthread_kern->uthread_get_uukwe(uth);
bzero(kwe, sizeof(*kwe));
kwe->kwe_count = 1;
kwe->kwe_lockseq = lockseq & PTHRW_COUNT_MASK;
kwe->kwe_state = KWE_THREAD_INWAIT;
kwe->kwe_uth = uth;
kwe->kwe_thread = th;
kwe->kwe_flags = kwe_flags;
res = ksyn_queue_insert(kwq, kqi, kwe, lockseq, fit);
if (res != 0) {
//panic("psynch_rw_wrlock: failed to enqueue\n"); // XXX ksyn_wqunlock(kwq);
return res;
}
PTHREAD_TRACE(psynch_mutex_kwqwait, kwq->kw_addr, kwq->kw_inqueue,
kwq->kw_prepost.count, kwq->kw_intr.count);
if (_kwq_use_turnstile(kwq)) {
// pthread mutexes and rwlocks both (at least sometimes) know their
// owner and can use turnstiles. Otherwise, we pass NULL as the
// tstore to the shims so they wait on the global waitq.
tstore = &kwq->kw_turnstile;
}
......
}
Looking at the definition of _kwq_use_turnstile, the code itself is quite candid: the turnstile is enabled for priority inversion protection only for KSYN_WQTYPE_MTX, while the type of a read-write lock is KSYN_WQTYPE_RWLOCK. This means that read-write locks do not use the turnstile, so priority inversion cannot be avoided.
#define KSYN_WQTYPE_MTX 0x01
#define KSYN_WQTYPE_CVAR 0x02
#define KSYN_WQTYPE_RWLOCK 0x04
#define KSYN_WQTYPE_SEMA 0x08
static inline bool
_kwq_use_turnstile(ksyn_wait_queue_t kwq)
{
// If we had writer-owner information from the
// rwlock then we could use the turnstile to push on it. For now, only
// plain mutexes use it.
return (_kwq_type(kwq) == KSYN_WQTYPE_MTX);
}
Likewise, in _pthread_find_owner you can see that the owner of a read-write lock is 0:
void
_pthread_find_owner(thread_t thread,
struct stackshot_thread_waitinfo * waitinfo)
{
ksyn_wait_queue_t kwq = _pthread_get_thread_kwq(thread);
switch (waitinfo->wait_type) {
case kThreadWaitPThreadMutex:
assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_MTX);
waitinfo->owner = thread_tid(kwq->kw_owner);
waitinfo->context = kwq->kw_addr;
break;
/* Owner of rwlock not stored in kernel space due to races. Punt
* and hope that the userspace address is helpful enough. */
case kThreadWaitPThreadRWLockRead:
case kThreadWaitPThreadRWLockWrite:
assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_RWLOCK);
waitinfo->owner = 0;
waitinfo->context = kwq->kw_addr;
break;
/* Condvars don't have owners, so just give the userspace address. */
case kThreadWaitPThreadCondVar:
assert((kwq->kw_type & KSYN_WQTYPE_MASK) == KSYN_WQTYPE_CVAR);
waitinfo->owner = 0;
waitinfo->context = kwq->kw_addr;
break;
case kThreadWaitNone:
default:
waitinfo->owner = 0;
waitinfo->context = 0;
break;
}
}
Replace the lock with a read-write lock to verify that the previous theory is correct:
pthread_rwlock_init(&_rwlock, NULL);
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    printf("begin : \n");
    printThreadPriority();
    printf("queue before lock \n");
    pthread_rwlock_rdlock(&_rwlock); // make sure the background queue gets the lock first
    printf("queue lock \n");
    printThreadPriority();
    dispatch_async(dispatch_get_main_queue(), ^{
        printf("before main lock\n");
        pthread_rwlock_wrlock(&_rwlock);
        printf("in main lock\n");
        pthread_rwlock_unlock(&_rwlock);
        printf("after main unlock\n");
    });
    sleep(10);
    printThreadPriority();
    printf("queue unlock\n");
    pthread_rwlock_unlock(&_rwlock);
    printf("queue after unlock\n");
});
begin :
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue before lock
queue lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
before main lock
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
queue unlock
queue after unlock
in main lock
after main unlock
It can be seen that with a read-write lock, the priority of the thread holding the lock is not raised.
dispatch_sync
Everyone is familiar with this API, so here is a direct verification:
// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printThreadPriority();
dispatch_async(_queue, ^{
    printf("dispatch_async before dispatch_sync : \n");
    printThreadPriority();
});
dispatch_sync(_queue, ^{
    printf("dispatch_sync: \n");
    printThreadPriority();
});
dispatch_async(_queue, ^{
    printf("dispatch_async after dispatch_sync: \n");
    printThreadPriority();
});
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async before dispatch_sync :
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_sync:
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
dispatch_async after dispatch_sync:
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
_queue is a low-priority queue (QOS_CLASS_BACKGROUND). It can be seen that the task submitted with dispatch_sync, as well as the task submitted before it with dispatch_async, is promoted to the higher priority 47 (the same as the main thread), whereas the last dispatch_async task runs at priority 4.
dispatch_wait
// The current thread is the main thread
dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(DISPATCH_QUEUE_SERIAL, QOS_CLASS_BACKGROUND, 0);
_queue = dispatch_queue_create("com.demo.test", qosAttribute);
printf("main thread\n");
printThreadPriority();
dispatch_block_t block = dispatch_block_create(DISPATCH_BLOCK_INHERIT_QOS_CLASS, ^{
    printf("sub thread\n");
    sleep(2);
    printThreadPriority();
});
dispatch_async(_queue, block);
dispatch_wait(block, DISPATCH_TIME_FOREVER);
_queue is a low-priority queue (QOS_CLASS_BACKGROUND). When the main thread waits with dispatch_wait, the output is as follows, and the low-priority task is promoted to priority 47:
main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
If dispatch_wait(block, DISPATCH_TIME_FOREVER) is commented out, the output is as follows:
main thread
pth_priority: 47, pth_curpri: 47, pth_maxpriority: 63
sub thread
pth_priority: 4, pth_curpri: 4, pth_maxpriority: 63
It is worth noting that dispatch_wait is a macro (a C11 generic selection), or alternatively an entry-point function, that accepts 3 types of argument: dispatch_block_t, dispatch_group_t, and dispatch_semaphore_t. The behavior described here applies specifically to dispatch_block_wait: only dispatch_block_wait adjusts priorities to avoid priority inversion.
intptr_t
dispatch_wait(void *object, dispatch_time_t timeout);
#if __has_extension(c_generic_selections)
#define dispatch_wait(object, timeout) \
_Generic((object), \
dispatch_block_t:dispatch_block_wait, \
dispatch_group_t:dispatch_group_wait, \
dispatch_semaphore_t:dispatch_semaphore_wait \
)((object),(timeout))
#endif
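To make the dispatch on argument type easier to see, here is a minimal, portable sketch of the same C11 _Generic technique. Everything in it is a stand-in invented for this illustration: demo_block_t, demo_group_t, demo_semaphore_t, the demo_*_wait functions, and the demo_wait macro are not part of libdispatch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for dispatch_block_t, dispatch_group_t
 * and dispatch_semaphore_t. */
typedef struct { int id; } demo_block_t;
typedef struct { int id; } demo_group_t;
typedef struct { int id; } demo_semaphore_t;

/* Each function reports which implementation was selected. */
const char *demo_block_wait(demo_block_t *b) { (void)b; return "block_wait"; }
const char *demo_group_wait(demo_group_t *g) { (void)g; return "group_wait"; }
const char *demo_semaphore_wait(demo_semaphore_t *s) { (void)s; return "semaphore_wait"; }

/* The compiler picks the function from the static type of the argument,
 * in the same way the dispatch_wait macro does. */
#define demo_wait(object) \
    _Generic((object), \
        demo_block_t *: demo_block_wait, \
        demo_group_t *: demo_group_wait, \
        demo_semaphore_t *: demo_semaphore_wait \
    )(object)
```

Because the selection happens at compile time, a call written as dispatch_wait on a semaphore or a group resolves to dispatch_semaphore_wait or dispatch_group_wait, neither of which has the priority adjustment described for dispatch_block_wait.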
The mysterious semaphore
dispatch_semaphore
My earlier understanding of dispatch_semaphore was shallow, and I often treated binary semaphores and mutexes as equivalent. After some research, however, I found that dispatch_semaphore has no concept of QoS and does not record the thread that currently holds the semaphore (the owner), so when a high-priority thread is waiting for the lock, the kernel cannot know which thread's scheduling priority (QoS) to raise. If the lock holder has a lower priority than other threads, the high-priority waiter may wait indefinitely. Mutex vs Semaphore: What’s the Difference? compares Mutex and Semaphore in detail.
Semaphores are for signaling (same a condition variables, events) while mutexes are for mutual exclusion. Technically, you can also use semaphores for mutual exclusion (a mutex can be thought as a binary semaphore) but you really shouldn’t.
Right, but libdispatch doesn’t have a mutex. It has semaphores and queues. So if you’re trying to use libdispatch and you don’t want the closure-based aspect of queues, you might be tempted to use a semaphore instead. Don’t do that, use os_unfair_lock or pthread_mutex (or a higher-level construct like NSLock) instead.
These warnings show that dispatch_semaphore is dangerous and must be used with special care.
Here is an explanation through Apple’s official demo:
__block NSString *taskName = nil;
dispatch_semaphore_t sema = dispatch_semaphore_create(0);
[self.connection.remoteObjectProxy requestCurrentTaskName:^(NSString *task) {
    taskName = task;
    dispatch_semaphore_signal(sema);
}];
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);
return taskName;
- Assume this code runs on the main thread, so the priority of the current thread is QOS_CLASS_USER_INTERACTIVE;
- Since the asynchronous call is made from the main thread, the QoS of the asynchronous task's queue is promoted to QOS_CLASS_USER_INITIATED;
- The main thread is blocked on the semaphore sema, while the asynchronous task responsible for signaling it runs at QOS_CLASS_USER_INITIATED, which is lower than the main thread's QOS_CLASS_USER_INTERACTIVE, so priority inversion may occur.
It is worth mentioning that Clang has a static check specifically for this case:
https://github.com/llvm-mirror/clang/blob/master/lib/StaticAnalyzer/Checkers/GCDAtipatternChecker.cpp
static auto findGCDAntiPatternWithSemaphore() -> decltype(compoundStmt()) {
const char *SemaphoreBinding = "semaphore_name";
auto SemaphoreCreateM = callExpr(allOf(
callsName("dispatch_semaphore_create"),
hasArgument(0, ignoringParenCasts(integerLiteral(equals(0))))));
auto SemaphoreBindingM = anyOf(
forEachDescendant(
varDecl(hasDescendant(SemaphoreCreateM)).bind(SemaphoreBinding)),
forEachDescendant(binaryOperator(bindAssignmentToDecl(SemaphoreBinding),
hasRHS(SemaphoreCreateM))));
auto HasBlockArgumentM = hasAnyArgument(hasType(
hasCanonicalType(blockPointerType())
));
auto ArgCallsSignalM = hasAnyArgument(stmt(hasDescendant(callExpr(
allOf(
callsName("dispatch_semaphore_signal"),
equalsBoundArgDecl(0, SemaphoreBinding)
)))));
auto HasBlockAndCallsSignalM = allOf(HasBlockArgumentM, ArgCallsSignalM);
auto HasBlockCallingSignalM =
forEachDescendant(
stmt(anyOf(
callExpr(HasBlockAndCallsSignalM),
objcMessageExpr(HasBlockAndCallsSignalM)
)));
auto SemaphoreWaitM = forEachDescendant(
callExpr(
allOf(
callsName("dispatch_semaphore_wait"),
equalsBoundArgDecl(0, SemaphoreBinding)
)
).bind(WarnAtNode));
return compoundStmt(
SemaphoreBindingM, HasBlockCallingSignalM, SemaphoreWaitM);
}
To use this check, just enable the corresponding setting in Xcode:
In addition, dispatch_group is similar to a semaphore: when enter() is called, there is no way to predict who will call leave(), so the system cannot know who the owner is, and again no priority boost takes place.
Hangs caused by semaphores
dispatch_semaphore left a deep impression on me. I once wrote code like this: using a semaphore on the main thread to wait synchronously for the camera authorization result.
__block BOOL auth = NO;
dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    auth = allow;
    dispatch_semaphore_signal(semaphore);
}];
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
After release, this stayed at the top of the hang ranking (top 1) for a long time. I was puzzled at the time; only after understanding that semaphores cannot avoid priority inversion did it finally make sense. There are generally 2 ways to solve this kind of problem:
- Use a synchronous API
BOOL auth = [KTAuthorizeService authorizationWithType:KTPermissionsTypeCamera];
// do something next
- Use an asynchronous callback and do not wait on the current thread
[KTAuthorizeService requestAuthorizationWithType:KTPermissionsTypeCamera completionHandler:^(BOOL allow) {
    BOOL auth = allow;
    // do something next via callback
}];
Some concepts
turnstile
As mentioned above, XNU uses turnstile to implement priority inheritance. Here is a brief description of the turnstile mechanism.
The XNU kernel contains a large number of synchronization objects (such as lck_mtx_t). To deal with priority inversion, each synchronization object needs an associated data structure that maintains a good deal of information, such as the queue of threads blocked on that object. Allocating such a structure for every synchronization object would waste a huge amount of memory. To solve this, XNU adopts the turnstile mechanism, a space-efficient solution. It rests on one fact: a thread cannot be blocked on more than one synchronization object at the same time. Because of this, a synchronization object only needs to keep a pointer to a turnstile and can allocate one when it is actually needed. The turnstile contains everything required to operate on the synchronization object, such as the queue of blocked threads and a pointer to the thread that owns the object.
Turnstiles are allocated dynamically from a pool whose size grows with the number of threads allocated in the system, so the total number of turnstiles is always lower than or equal to the number of threads, which keeps the count bounded. A turnstile is allocated by the first thread that blocks on a synchronization object. When no threads remain blocked on the object, the turnstile is released and returned to the pool.
The turnstile data structure is as follows:
struct turnstile {
    struct waitq                  ts_waitq;              /* waitq embedded in turnstile */
    turnstile_inheritor_t         ts_inheritor;          /* thread/turnstile inheriting the priority (IL, WL) */
    union {
        struct turnstile_list ts_free_turnstiles;    /* turnstile free list (IL) */
        SLIST_ENTRY(turnstile) ts_free_elm;          /* turnstile free list element (IL) */
    };
    struct priority_queue_sched_max ts_inheritor_queue;  /* Queue of turnstile with us as an inheritor (WL) */
    union {
        struct priority_queue_entry_sched ts_inheritor_links; /* Inheritor queue links */
        struct mpsc_queue_chain ts_deallocate_link;  /* thread deallocate link */
    };
    SLIST_ENTRY(turnstile)        ts_htable_link;        /* linkage for turnstile in global hash table */
    uintptr_t                     ts_proprietor;         /* hash key lookup turnstile (IL) */
    os_refcnt_t                   ts_refcount;           /* reference count for turnstiles */
    _Atomic uint32_t              ts_type_gencount;      /* gen count used for priority chaining (IL), type of turnstile (IL) */
    uint32_t                      ts_port_ref;           /* number of explicit refs from ports on send turnstile */
    turnstile_update_flags_t      ts_inheritor_flags;    /* flags for turnstile inheritor (IL, WL) */
    uint8_t                       ts_priority;           /* priority of turnstile (WL) */
#if DEVELOPMENT || DEBUG
    uint8_t                       ts_state;              /* current state of turnstile (IL) */
    queue_chain_t                 ts_global_elm;         /* global turnstile chain */
    thread_t                      ts_thread;             /* thread the turnstile is attached to */
    thread_t                      ts_prev_thread;        /* thread the turnstile was attached before donation */
#endif
};
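The lifecycle described above (a synchronization object holds only a pointer, the first blocked thread attaches a turnstile from the pool, and the last one returns it) can be illustrated with a small user-space sketch. This is only a toy model, not XNU code: the names toy_turnstile, toy_lock, toy_block, toy_unblock and the fixed POOL_SIZE are all hypothetical, and the real kernel additionally handles locking, hashing and priority propagation.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4 /* assumption: stands in for "one turnstile per thread" */

/* Toy stand-in for a turnstile: only the fields needed for the demo. */
struct toy_turnstile {
    int waiters;                /* number of threads blocked on the object */
    int owner_tid;              /* thread that currently owns the object */
    struct toy_turnstile *next; /* free-list link */
};

/* A synchronization object keeps a single pointer, NULL while uncontended. */
struct toy_lock {
    int owner_tid;
    struct toy_turnstile *ts;
};

static struct toy_turnstile pool[POOL_SIZE];
static struct toy_turnstile *free_list;
static int free_count;

/* Put every turnstile of the pool on the free list. */
void pool_init(void) {
    free_list = NULL;
    free_count = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
        free_count++;
    }
}

int toy_free_count(void) { return free_count; }

/* A thread blocks on the lock: the first waiter attaches a turnstile.
 * Returns 0 on success, -1 if the pool is exhausted. */
int toy_block(struct toy_lock *l) {
    if (l->ts == NULL) {
        if (free_list == NULL) return -1;
        l->ts = free_list;
        free_list = free_list->next;
        free_count--;
        l->ts->waiters = 0;
        l->ts->owner_tid = l->owner_tid;
    }
    l->ts->waiters++;
    return 0;
}

/* A waiter stops blocking: the last one returns the turnstile to the pool. */
void toy_unblock(struct toy_lock *l) {
    if (l->ts == NULL) return;
    assert(l->ts->waiters > 0);
    if (--l->ts->waiters == 0) {
        l->ts->next = free_list;
        free_list = l->ts;
        free_count++;
        l->ts = NULL;
    }
}
```

In this model an uncontended toy_lock costs one pointer, and the pool never needs more entries than there are threads that can block at once, which is the space argument made above.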
Priority values
Several priority values appear in the verification process. They are explained here with the help of “Mac OS® X and iOS Internals”. The priority values involved in the experiments are Mach-level values, and all of them are user-thread values.
- The priority of user threads is 0~63:
  - NSQualityOfServiceBackground has a Mach-level priority of 4;
  - NSQualityOfServiceUtility has a Mach-level priority of 20;
  - NSQualityOfServiceDefault has a Mach-level priority of 31;
  - NSQualityOfServiceUserInitiated has a Mach-level priority of 37;
  - NSQualityOfServiceUserInteractive has a Mach-level priority of 47;
- The priority of kernel threads is 80~95;
- The priority of real-time system threads is 96~127;
- 64~79 are reserved for system use.
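When reading thread dumps from the experiments, it can help to have these numbers in a lookup table. The following is a minimal sketch that simply encodes the values listed above. The function names mach_priority_for_qos and priority_band are hypothetical helpers, not system APIs, and the QoS names are handled as plain strings so the snippet does not depend on Apple headers.

```c
#include <stddef.h>
#include <string.h>

/* One row of the QoS-to-Mach-priority list given in the text. */
struct qos_priority {
    const char *qos_name;
    int mach_priority;
};

static const struct qos_priority qos_table[] = {
    { "NSQualityOfServiceBackground",       4 },
    { "NSQualityOfServiceUtility",         20 },
    { "NSQualityOfServiceDefault",         31 },
    { "NSQualityOfServiceUserInitiated",   37 },
    { "NSQualityOfServiceUserInteractive", 47 },
};

/* Returns the Mach-level priority for a QoS name, or -1 if unknown. */
int mach_priority_for_qos(const char *qos_name) {
    for (size_t i = 0; i < sizeof qos_table / sizeof qos_table[0]; i++) {
        if (strcmp(qos_table[i].qos_name, qos_name) == 0) {
            return qos_table[i].mach_priority;
        }
    }
    return -1;
}

/* Classifies a Mach priority into the bands listed in the text. */
const char *priority_band(int priority) {
    if (priority < 0 || priority > 127) return "invalid";
    if (priority <= 63) return "user";
    if (priority <= 79) return "system (reserved)";
    if (priority <= 95) return "kernel";
    return "real-time";
}
```

For example, a thread reported at priority 4 falls in the user band and corresponds to NSQualityOfServiceBackground, which matches the QOS:BACKGROUND threads seen in the online case.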
Summary
This article explains the concept of priority inversion and some solutions to it, and examines several locks on the iOS platform in detail. With a deeper understanding of these mechanisms, unnecessary priority inversions can be avoided, which in turn helps prevent hangs. The ByteDance APM team also monitors thread priorities in order to detect and prevent priority inversion.
Join us
The ByteDance APM middle platform is committed to improving the performance and stability of all products across the group. The technology stack covers iOS/Android/Server/Web/Hybrid/PC/Games/Mini Programs, etc. The work includes but is not limited to performance and stability monitoring, troubleshooting, in-depth optimization, and regression prevention. The long-term goal is to provide the industry with more constructive methods for problem discovery and in-depth optimization.
Anyone interested in positions on the ByteDance APM team is welcome to send a resume to xushuangqing@bytedance.com.