Apache Druid is a distributed data processing system for real-time, multi-dimensional OLAP analysis. It supports both high-speed real-time data ingestion and flexible, real-time multi-dimensional analytical queries. As a result, Druid is most commonly used for fast, flexible multi-dimensional OLAP analysis over big data.
Druid also has another key feature: it can pre-aggregate data at ingestion time and run aggregation analysis based on timestamps, so it is often used to process and analyze time-series data.
Apache Druid 24.0.0 has been released. This version contains over 300 new features, bug fixes, performance enhancements, documentation improvements, and additional tests from 67 contributors. Here are some of the new features:
Multi-stage query task engine
SQL-based ingestion in Apache Druid uses a distributed multi-stage query architecture that includes a query engine called the multi-stage query task engine (MSQ task engine). The MSQ task engine extends Druid's query capabilities so that you can write queries that reference external data and ingest that data using SQL INSERT and REPLACE.
As of Druid 24.0.0, SQL-based ingestion with the multi-stage query task engine is the recommended ingestion method. Alternative methods such as native batch and Hadoop-based ingestion are still supported.
refer to:
#12524
#12386
#12523
#12589
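As a rough illustration of what SQL-based ingestion can look like, the Python sketch below builds an INSERT statement that reads external data and wraps it in a JSON payload for Druid's SQL task API. The datasource name, input URL, column list, and the local router address are hypothetical placeholders and are not taken from the release notes. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_msq_insert(datasource: str, uri: str) -> str:
    """Build an INSERT statement that ingests JSON data from an HTTP URL.

    The columns below are hypothetical; adjust them to the real input data.
    """
    input_source = json.dumps({"type": "http", "uris": [uri]})
    input_format = json.dumps({"type": "json"})
    signature = json.dumps([
        {"name": "timestamp", "type": "string"},
        {"name": "page", "type": "string"},
        {"name": "added", "type": "long"},
    ])
    return (
        f'INSERT INTO "{datasource}"\n'
        'SELECT TIME_PARSE("timestamp") AS __time, page, added\n'
        f"FROM TABLE(EXTERN('{input_source}', '{input_format}', '{signature}'))\n"
        "PARTITIONED BY DAY"
    )


def build_task_request(sql: str, router: str = "http://localhost:8888") -> urllib.request.Request:
    """Wrap the SQL in a POST request for the SQL task endpoint (router address assumed)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{router}/druid/v2/sql/task",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sql = build_msq_insert("wikipedia_demo", "https://example.com/wikipedia.json.gz")
request = build_task_request(sql)
print(sql)
```

Passing `request` to `urllib.request.urlopen` would submit the ingestion task to a running cluster. Swapping INSERT for REPLACE would require an additional OVERWRITE clause, which this sketch does not cover.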
Nested columns
Druid now supports storing nested data structures directly in the newly added COMPLEX
refer to:
#12753
#12714
#12920
Updated Java support
Java 11 is fully supported, with improved Java 17 support.
#12839
Query engine updates
Updated query handling for column indexes and filters
Column indexes have been redesigned to be very flexible, so that a wide range of index types can be modeled. A mechanism has been added to build filters that use the updated indexes. Other column implementations can also implement the built-in index types, providing adapters that let the current set of filters provided by Druid use their indexes.
#12388
Time filter operator
You can now use the Druid SQL operator TIME_IN_INTERVAL to filter query results based on time. Use TIME_IN_INTERVAL in preference to the SQL BETWEEN operator to filter by time. For more information, see Date and Time Functions.
#12662
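To make the difference from BETWEEN concrete, here is a small Python emulation of the two predicates. It assumes the documented behavior that a TIME_IN_INTERVAL interval includes its start instant and excludes its end instant, and it only handles intervals written as `start/end`, not period forms such as `2001-01-01/P1D`. The function names are made up for this sketch.

```python
from datetime import datetime


def time_in_interval(ts: datetime, interval: str) -> bool:
    """Emulate TIME_IN_INTERVAL for 'start/end' ISO 8601 intervals only."""
    start_text, end_text = interval.split("/")
    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    # Start is inclusive, end is exclusive.
    return start <= ts < end


def between(ts: datetime, low: datetime, high: datetime) -> bool:
    """Emulate SQL BETWEEN, which includes both bounds."""
    return low <= ts <= high


september = "2022-09-01T00:00:00/2022-10-01T00:00:00"
first_of_october = datetime(2022, 10, 1)

# A row stamped exactly at the end of the interval is excluded by
# TIME_IN_INTERVAL but included by BETWEEN with the same bounds.
in_interval = time_in_interval(first_of_october, september)
in_between = between(first_of_october, datetime(2022, 9, 1), datetime(2022, 10, 1))
print(in_interval, in_between)
```

With half-open intervals, adjacent ranges such as September and October never count the same row twice, which is harder to guarantee with BETWEEN.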
Null values and the “in” filter
If the values array contains null, the "in" filter matches null values. This differs from the SQL IN filter, which does not match null values.
For more information, see Query Filters and SQL Data Types.
#12863
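The contrast can be sketched in a few lines of Python, with `None` standing in for null. This is only an emulation of the semantics described above, assuming SQL-compatible null handling. It is not Druid's implementation, and the function names are invented for the example.

```python
from typing import Optional, Sequence


def native_in_filter(value: Optional[str], values: Sequence[Optional[str]]) -> bool:
    """Native "in" filter: a null in `values` matches rows whose value is null."""
    return value in values


def sql_in(value: Optional[str], values: Sequence[Optional[str]]) -> Optional[bool]:
    """SQL IN with three-valued logic: True, False, or None for UNKNOWN."""
    if value is None:
        return None  # NULL compared with anything is UNKNOWN
    if any(v is not None and v == value for v in values):
        return True
    if any(v is None for v in values):
        return None  # no match found, but a NULL in the list leaves it UNKNOWN
    return False


values = ["a", None]
rows = ["a", "b", None]

# A WHERE clause keeps a row only when the predicate is exactly True.
kept_by_native = [r for r in rows if native_in_filter(r, values)]
kept_by_sql = [r for r in rows if sql_in(r, values) is True]
print(kept_by_native, kept_by_sql)
```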
Virtual columns in search queries
Previously, search queries could only search dimensions that exist in the data source. Search queries now also accept virtual columns as parameters.
#12720
Optimizing simple MIN/MAX SQL queries on __time
Simple queries such as select max(__time) from ds now run as timeBoundary queries to take advantage of the time dimension ordering within a segment. A feature flag can be set to enable this feature.
#12472
#12491
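The following toy Python function illustrates the idea of this rewrite: it recognizes only the simplest `SELECT MIN|MAX(__time) FROM <datasource>` shape and returns the equivalent native timeBoundary query. It is a sketch for illustration, not Druid's planner, and the function name is hypothetical.

```python
import json
import re
from typing import Optional

# Matches only the bare form: SELECT MIN|MAX(__time) FROM <datasource>
_SIMPLE_BOUND = re.compile(
    r"\s*select\s+(min|max)\(__time\)\s+from\s+(\w+)\s*",
    flags=re.IGNORECASE,
)


def plan_time_boundary(sql: str) -> Optional[dict]:
    """Return a native timeBoundary query for a simple MIN/MAX(__time) query.

    Returns None for anything else (filters, grouping, other columns).
    """
    match = _SIMPLE_BOUND.fullmatch(sql)
    if match is None:
        return None
    func = match.group(1).lower()
    datasource = match.group(2)
    return {
        "queryType": "timeBoundary",
        "dataSource": datasource,
        "bound": f"{func}Time",  # "minTime" or "maxTime"
    }


native_query = plan_time_boundary("select max(__time) from ds")
print(json.dumps(native_query, indent=2))
```

Because segments are ordered by time, a timeBoundary query can read the bound from segment metadata rather than scanning and aggregating every row.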
String aggregation result
First/Last string aggregators now compare based on value only. Previously, the values of first/last string aggregators were compared first by the _time column and then by value.
If you have existing queries and want to keep comparing on both the _time column and the value, update the queries to use ORDER BY MAX(timeCol).
#12773
Jackson serialization
New helper functions have been added to JacksonUtils to enable reuse of SerializerProvider objects.
Additionally, backward compatibility for map-based rows in GroupByQueryToolChest is now disabled by default, which eliminates the need to copy the heavyweight ObjectMapper. A new configuration option allows administrators to explicitly enable backward compatibility.
#12468
Updated IPAddress Java library
A new IPAddress Java library dependency has been added to handle IP addresses. The library includes IPv6 support, and the existing IPv4 functions have been migrated to use it.
#11634
This is a large release that also includes many performance improvements. See the release announcement for more details.