In big data workloads, shuffle is the process of exchanging data between partitions, and its performance often becomes the bottleneck of a job or even of the entire cluster. At ByteDance, where hundreds of PB of shuffle data are generated every day, the shuffle process exposes many problems. This article walks through these problems one by one and introduces the optimization practices adopted at ByteDance.