User talk:Zihao H/sandbox

= Apache Doris =

Apache Doris is an open-source, high-performance, real-time analytical database built on a massively parallel processing (MPP) architecture. It is a project of the Apache Software Foundation.

Apache Doris is designed to return query results over massive datasets with sub-second response times. It supports both high-concurrency point queries and high-throughput analytical workloads. Typical uses include report analysis, ad-hoc queries, unified data warehousing, and data lake query acceleration. Users can build applications on Apache Doris such as user behavior analysis, A/B testing platforms, log retrieval and analysis, user profile analysis, and order analysis.

Doris is widely used in business OLAP applications to analyze massive data in real time. Apache Doris currently serves over 1,000 users worldwide, including leading technology companies such as Alibaba Cloud, Baidu, ByteDance (TikTok), JD.COM, Kwai, Meituan, MiHoYo, NetEase, Shopee, Tencent, and Xiaomi.

History
Originally known as Baidu PALO, Doris was created at the Chinese search engine company Baidu as a data warehouse for its advertising business. It was open-sourced in 2017 and entered the Apache Incubator in 2018.

In June 2022, Apache Doris graduated from the Apache Incubator as a Top-Level Project. Over five years, guided by the Apache Way and supported by its incubator mentors, the Apache Doris community grew to more than 400 contributors, and its GitHub repository reached 6,700 stars.

Applications
Apache Doris is commonly used in the following scenarios:

Reporting Analysis
Analysis services such as real-time dashboards, reports for decision making, and high-concurrency user-facing reports.



Ad-Hoc Query
Analyst-oriented self-service analytics with irregular query patterns and high throughput requirements.

Unified Data Warehouse
Apache Doris allows users to build a unified data warehouse on a single platform instead of maintaining multiple software stacks.

Lakehouse Query
Apache Doris can query data lakes built on Apache Hive, Apache Hudi, and Apache Iceberg, as well as data in object storage systems such as AWS S3.

Architecture


Apache Doris has a simple architecture with two types of processes: Frontend (FE) and Backend (BE).

FE
The Frontend (FE) handles user request access, query parsing and planning, metadata management, and node management.

BE
The Backend (BE) handles data storage and query plan execution.
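
The division of labor between the two process types can be illustrated with a toy model. The sketch below is not Doris code: the `Backend` and `Frontend` classes, their methods, and the sample data are hypothetical names invented for illustration. It assumes only what is stated above, namely that an FE plans and coordinates a query while each BE stores data and executes its part of the plan.

```python
class Backend:
    """Hypothetical BE: stores a slice of the data and executes work on it."""

    def __init__(self, rows):
        self.rows = list(rows)

    def execute_count(self, predicate):
        # Run the piece of the plan assigned to this node against local data only.
        return sum(1 for row in self.rows if predicate(row))


class Frontend:
    """Hypothetical FE: knows which backends exist and coordinates queries."""

    def __init__(self, backends):
        self.backends = list(backends)  # stand-in for node management metadata

    def count_where(self, predicate):
        # "Plan" the query as one partial count per backend, then merge the results.
        partial_counts = [be.execute_count(predicate) for be in self.backends]
        return sum(partial_counts)


# Two backends, each holding part of a single logical column of values.
fe = Frontend([Backend([3, 8, 12]), Backend([1, 15, 20])])
large_values = fe.count_where(lambda value: value > 10)
```

In this toy model the client only ever talks to the `Frontend` object, which mirrors the description of the FE as the entry point for user requests.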

MPP
Massively parallel processing (MPP) is the coordinated processing of a single program by two or more processors, each working on a different part of it. Spreading the work across processors can substantially reduce execution time. Doris adopts the MPP model in its query engine to execute queries in parallel across nodes. It also supports distributed shuffle joins across multiple wide tables to handle complex queries.
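
The general idea of a distributed shuffle join can be sketched in a few lines of Python. This is a minimal single-process illustration of the technique, not the Doris implementation: the function names, the tuple-based rows, and the `orders` and `users` sample data are all assumptions made for the example. Both inputs are partitioned by a hash of the join key so that matching rows land on the same node, and each node then joins only its own partitions.

```python
from collections import defaultdict


def shuffle(rows, key_index, num_nodes):
    """Route each row to a node by hashing its join key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key_index]) % num_nodes].append(row)
    return partitions


def local_hash_join(left_rows, right_rows, left_key, right_key):
    """Ordinary in-memory hash join, run independently on one node."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[right_key]].append(row)
    return [l + r for l in left_rows for r in table.get(l[left_key], [])]


def shuffle_join(left, right, left_key, right_key, num_nodes=3):
    left_parts = shuffle(left, left_key, num_nodes)
    right_parts = shuffle(right, right_key, num_nodes)
    result = []
    # Each iteration stands in for work that a separate node would do in parallel.
    for node in range(num_nodes):
        result.extend(
            local_hash_join(left_parts[node], right_parts[node], left_key, right_key)
        )
    return result


# Hypothetical sample tables: (user_id, item) and (user_id, name).
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]
users = [(1, "alice"), (2, "bob"), (4, "carol")]
joined = shuffle_join(orders, users, 0, 0)
```

Because rows with equal keys always hash to the same partition, no node needs a full copy of either table, which is what makes this approach suitable for joining large tables.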

Vectorized SQL Query Execution Engine
The Doris query engine is vectorized, with all in-memory structures laid out in a columnar format. This greatly reduces virtual function calls, improves cache hit rates, and makes efficient use of SIMD instructions. In wide-table aggregation, the vectorized engine is typically 5 to 10 times faster than a non-vectorized engine.
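
The difference between row-at-a-time and vectorized execution can be shown with a small Python sketch. It illustrates only the data layout and the call pattern; it is not how Doris is implemented, and pure Python will not reproduce the SIMD or cache effects described above. The function names, the sample rows, and the batch size are assumptions chosen for the example.

```python
from array import array

# Row-oriented layout: each record is a separate object.
rows = [
    {"city": "Beijing", "amount": 10.0},
    {"city": "Shanghai", "amount": 20.5},
    {"city": "Beijing", "amount": 5.25},
    {"city": "Shenzhen", "amount": 4.25},
]


def sum_row_at_a_time(records):
    """One step of operator work per row, touching the whole record each time."""
    total = 0.0
    for record in records:
        total += record["amount"]
    return total


# Columnar layout: all values of one column sit in a contiguous typed buffer.
amount_column = array("d", (record["amount"] for record in rows))


def sum_vectorized(column, batch_size=1024):
    """One step of operator work per batch of values from a single column."""
    total = 0.0
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]  # contiguous slice of the column
        total += sum(batch)  # a single call processes the whole batch
    return total


row_total = sum_row_at_a_time(rows)
vector_total = sum_vectorized(amount_column)
```

Both functions compute the same total. The vectorized version reads only the column it needs and amortizes per-call overhead over a batch, which is the property a native engine exploits.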

ClickBench
ClickBench is a benchmark for analytical DBMSs. It represents typical workloads in traffic analysis, web analytics, machine-generated data, structured logs, and event data.

Cold Run
Apache Doris ranked 2nd in the query performance test (cold run on instance c6a.4xlarge, 500 GB gp2).

Hot Run
Apache Doris ranked 3rd in the query performance test (hot run on instance c6a.4xlarge, 500 GB gp2).