REST job server for Apache Spark
Main features
- Submit Spark jobs through the REST API, with support for SQL, Java/Scala, and Python jobs, decoupling business systems from Spark clusters (a submission sketch follows this list).
- Job resources are isolated and highly available: each job runs independently in its own Spark driver.
- Spark drivers are pre-started to speed up job launch; a driver is shared across multiple jobs but runs only one job at a time (see the driver-pool sketch below).
- Supports deployment across multiple YARN clusters; the client submits a job to the specified cluster.
- The driver can be customized to add further capabilities, such as table permissions, small-file compaction, and data quality checks (DQC).
- Supports task dependencies between job instances (DAG); see the payload sketch below.
- Supports Kerberos authentication.
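To make the REST submission flow concrete, here is a minimal client sketch in Python. The endpoint paths, port, and payload field names (`jobName`, `jobType`, `jobText`, `yarnQueue`, `jobId`) are assumptions for illustration only; consult the project's API documentation for the real schema.

```python
import requests

# Hypothetical server address and routes, for illustration only.
JOBSERVER = "http://jobserver-host:9000"

payload = {
    "jobName": "daily_report",
    "jobType": "python",               # e.g. sql / java / scala / python
    "jobText": "print(spark.sql('select 1').collect())",
    "yarnQueue": "default",            # queue on the chosen YARN cluster
}

# Submit the job over HTTP; the client never talks to the Spark cluster directly.
resp = requests.post(f"{JOBSERVER}/api/v1/jobs", json=payload, timeout=30)
resp.raise_for_status()
job_id = resp.json()["jobId"]

# Poll the job's status by id.
status = requests.get(f"{JOBSERVER}/api/v1/jobs/{job_id}", timeout=30).json()
print(status)
```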
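The pre-started, shared driver idea can be pictured as a pool: drivers are launched on YARN ahead of time, each incoming job borrows an idle driver, and the driver is returned for reuse once the job finishes. The class and method names below are invented for this sketch and are not the server's actual implementation.

```python
import queue

class DriverPool:
    """Keeps a set of pre-started Spark drivers so a job never waits for a
    cold driver launch on YARN. Each driver runs one job at a time."""

    def __init__(self, start_driver, size=4):
        self._idle = queue.Queue()
        for _ in range(size):
            # start_driver() is assumed to submit a long-lived driver to
            # YARN and return a handle that can run jobs on it.
            self._idle.put(start_driver())

    def run_job(self, job):
        driver = self._idle.get()      # block until a driver is free
        try:
            return driver.run(job)     # only this job uses the driver now
        finally:
            self._idle.put(driver)     # hand the driver to the next job
```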
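The DAG dependency feature amounts to declaring upstream jobs when submitting instances, so a job runs only after its dependencies succeed. The payload shape below is purely illustrative; the server's real schema may differ.

```python
# Hypothetical DAG submission: job_c runs only after job_a and job_b finish.
dag_submission = {
    "instances": [
        {"name": "job_a", "dependsOn": []},
        {"name": "job_b", "dependsOn": []},
        {"name": "job_c", "dependsOn": ["job_a", "job_b"]},
    ]
}
```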
Kyuubi and Livy already exist; why develop Jobserver?
I started working on data middle-platform products a few years ago. DataStudio needed a gateway for submitting jobs, and submitting a Spark driver to YARN is slow. For a better interactive experience, the Spark driver has to be started in advance and managed much like a JDBC connection, while also providing resource isolation. Kyuubi did not exist at that time.

Kyuubi currently supports SQL and Java/Scala job development but lacks Python job support, and most of the jobs I encounter at work are Python jobs; Spark was originally positioned as a tool for AI practitioners, and PySpark is its biggest advantage. Livy is aimed mainly at interactive scenarios and requires the client to manage the sessionId. Spark Thrift Server has no resource isolation, so a single job can use up all resources and make the service unavailable to other users.