[Feat](json) Support json_search function in 2.0 #40962

Merged: lide-reed merged 4 commits into apache:branch-2.0 on Sep 20, 2024

liutang123
Contributor

Proposed changes

pick #40948
Issue Number: close #xxx

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@liutang123
Contributor Author

run buildall

@github-actions bot (Contributor) left a comment

clang-tidy made some suggestions

@@ -357,6 +373,19 @@ class JsonbPath {
leg_vector.emplace_back(leg.release());

warning: use of undeclared identifier 'leg_vector' [clang-diagnostic-error]

        leg_vector.emplace_back(leg.release());
        ^

@@ -357,6 +373,19 @@
leg_vector.emplace_back(leg.release());
}

void pop_leg_from_leg_vector() { leg_vector.pop_back(); }

warning: use of undeclared identifier 'leg_vector' [clang-diagnostic-error]

    void pop_leg_from_leg_vector() { leg_vector.pop_back(); }
                                     ^


bool to_string(std::string* res) const {
res->push_back(SCOPE);
for (const auto& leg : leg_vector) {

warning: use of undeclared identifier 'leg_vector' [clang-diagnostic-error]

        for (const auto& leg : leg_vector) {
                               ^

}
return true;
}

size_t get_leg_vector_size() { return leg_vector.size(); }

warning: use of undeclared identifier 'leg_vector' [clang-diagnostic-error]

    size_t get_leg_vector_size() { return leg_vector.size(); }
                                          ^

}
return true;
}

size_t get_leg_vector_size() { return leg_vector.size(); }

leg_info* get_leg_from_leg_vector(size_t i) { return leg_vector[i].get(); }

warning: use of undeclared identifier 'leg_vector' [clang-diagnostic-error]

    leg_info* get_leg_from_leg_vector(size_t i) { return leg_vector[i].get(); }
                                                         ^
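
The five diagnostics above all name the same identifier, `leg_vector`, which the quoted hunks use but never declare. The following is a minimal sketch of how such accessors fit together, assuming `leg_vector` is a `std::vector<std::unique_ptr<leg_info>>` member. That type is inferred from the `emplace_back(leg.release())` and `leg_vector[i].get()` calls in the hunks and is not confirmed by this page. `PathSketch`, the `key` field, `add_leg_to_leg_vector`, and `push_two_pop_one` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified leg record; the real leg_info is not shown on this page.
struct leg_info {
    std::string key;
};

// Hypothetical stand-in for the class that owns the path legs.
class PathSketch {
public:
    void add_leg_to_leg_vector(std::unique_ptr<leg_info> leg) {
        // emplace_back builds the owning unique_ptr in place from the raw pointer.
        leg_vector.emplace_back(leg.release());
    }

    void pop_leg_from_leg_vector() { leg_vector.pop_back(); }

    [[nodiscard]] std::size_t get_leg_vector_size() const { return leg_vector.size(); }

    // Returns a non-owning pointer; the vector keeps ownership.
    [[nodiscard]] leg_info* get_leg_from_leg_vector(std::size_t i) const {
        return leg_vector[i].get();
    }

private:
    // Assumed declaration: this is the member the diagnostics report as undeclared.
    std::vector<std::unique_ptr<leg_info>> leg_vector;
};

// Pushes two legs, checks the second one, pops it, and returns the remaining size.
inline std::size_t push_two_pop_one() {
    PathSketch path;
    path.add_leg_to_leg_vector(std::make_unique<leg_info>(leg_info{"a"}));
    path.add_leg_to_leg_vector(std::make_unique<leg_info>(leg_info{"b"}));
    assert(path.get_leg_from_leg_vector(1)->key == "b");
    path.pop_leg_from_leg_vector();
    return path.get_leg_vector_size();
}
```

With the member declared in the same class, every accessor resolves `leg_vector` without a diagnostic.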

static constexpr auto name = "json_search";
static FunctionPtr create() { return std::make_shared<FunctionJsonSearch>(); }

String get_name() const override { return name; }

warning: function 'get_name' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- String get_name() const override { return name; }
+ [[nodiscard]] String get_name() const override { return name; }

static FunctionPtr create() { return std::make_shared<FunctionJsonSearch>(); }

String get_name() const override { return name; }
bool is_variadic() const override { return false; }

warning: function 'is_variadic' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- bool is_variadic() const override { return false; }
+ [[nodiscard]] bool is_variadic() const override { return false; }


String get_name() const override { return name; }
bool is_variadic() const override { return false; }
size_t get_number_of_arguments() const override { return 0; }

warning: function 'get_number_of_arguments' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- size_t get_number_of_arguments() const override { return 0; }
+ [[nodiscard]] size_t get_number_of_arguments() const override { return 0; }

bool is_variadic() const override { return false; }
size_t get_number_of_arguments() const override { return 0; }

DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {

warning: function 'get_return_type_impl' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+ [[nodiscard]] DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {

return make_nullable(std::make_shared<DataTypeJsonb>());
}

bool use_default_implementation_for_nulls() const override { return false; }

warning: function 'use_default_implementation_for_nulls' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- bool use_default_implementation_for_nulls() const override { return false; }
+ [[nodiscard]] bool use_default_implementation_for_nulls() const override { return false; }
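
The modernize-use-nodiscard suggestions above all follow one pattern: a const member function whose only purpose is to return a value gets the C++17 `[[nodiscard]]` attribute, so a call that ignores the result draws a compiler warning. A minimal sketch of that pattern follows. `FunctionBase` and `DemoJsonSearch` are hypothetical stand-ins, not the Doris interface, and `std::string` replaces the `String` alias used in the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the function interface; not the real Doris base class.
struct FunctionBase {
    virtual ~FunctionBase() = default;
    [[nodiscard]] virtual std::string get_name() const = 0;
    [[nodiscard]] virtual bool is_variadic() const = 0;
};

// Hypothetical function class that mirrors the getters quoted in the review.
struct DemoJsonSearch final : FunctionBase {
    static constexpr auto name = "json_search";

    // The attribute goes before the return type; 'const override' stays at the end.
    [[nodiscard]] std::string get_name() const override { return name; }
    [[nodiscard]] bool is_variadic() const override { return false; }
};

// Both results are consumed here, so no discard warning is emitted.
inline bool name_is_json_search(const FunctionBase& fn) {
    const bool matches = fn.get_name() == "json_search" && !fn.is_variadic();
    assert(matches || fn.get_name() != "json_search" || fn.is_variadic());
    return matches;
}
```

A bare statement such as `fn.get_name();` would now produce a warning on conforming compilers, which is what the check is meant to surface.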

@github-actions bot (Contributor) left a comment

clang-tidy made some suggestions


String get_name() const override { return name; }
bool is_variadic() const override { return false; }
size_t get_number_of_arguments() const override { return 3; }

warning: function 'get_number_of_arguments' should be marked [[nodiscard]] [modernize-use-nodiscard]

Suggested change
- size_t get_number_of_arguments() const override { return 3; }
+ [[nodiscard]] size_t get_number_of_arguments() const override { return 3; }

@liutang123
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 49159 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 847be8d0e4910ae9966c50afef238c4b234291a4, data reload: false

------ Round 1 ----------------------------------
q1	17664	4341	4351	4341
q2	2064	154	146	146
q3	10460	1941	1904	1904
q4	10342	1258	1331	1258
q5	8444	3872	3915	3872
q6	228	124	125	124
q7	2046	1638	1588	1588
q8	9506	2761	2762	2761
q9	11338	10026	9864	9864
q10	8665	3560	3458	3458
q11	426	250	248	248
q12	477	305	301	301
q13	18359	3944	4014	3944
q14	349	340	330	330
q15	509	463	457	457
q16	527	463	453	453
q17	1132	995	961	961
q18	7270	6839	6799	6799
q19	1708	1531	1516	1516
q20	519	305	310	305
q21	4420	4278	4148	4148
q22	491	392	381	381
Total cold run time: 116944 ms
Total hot run time: 49159 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4345	4346	4333	4333
q2	332	227	219	219
q3	4220	4209	4167	4167
q4	2761	2755	2759	2755
q5	7217	7172	7138	7138
q6	240	119	117	117
q7	3221	2862	2814	2814
q8	4375	4486	4507	4486
q9	13828	13668	13587	13587
q10	4383	4455	4407	4407
q11	783	710	729	710
q12	1085	898	908	898
q13	7112	3942	3909	3909
q14	471	458	438	438
q15	490	464	451	451
q16	661	613	585	585
q17	3781	3834	3789	3789
q18	9056	8713	8815	8713
q19	1726	1703	1662	1662
q20	2388	2128	2155	2128
q21	8382	8420	8410	8410
q22	1059	940	944	940
Total cold run time: 81916 ms
Total hot run time: 76656 ms
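
The report does not label its columns. For Round 1, the figures are consistent with this reading: the first number per query is the cold run, the next two are hot runs, and the last is the faster of the two hot runs. "Total cold run time" then sums the first column and "Total hot run time" sums the last. This is an interpretation checked against the numbers, not documented behaviour of the benchmark scripts. The sketch below recomputes both Round 1 totals from the three measured columns; `Row`, `kRound1`, and the two function names are hypothetical.

```cpp
#include <algorithm>
#include <array>

// One TPC-H query's timings in milliseconds, as read from the Round 1 table.
struct Row {
    long cold;
    long hot1;
    long hot2;
};

// Round 1 rows q1..q22, copied from the report (cold, hot run 1, hot run 2).
constexpr std::array<Row, 22> kRound1 = {{
    {17664, 4341, 4351}, {2064, 154, 146},     {10460, 1941, 1904},
    {10342, 1258, 1331}, {8444, 3872, 3915},   {228, 124, 125},
    {2046, 1638, 1588},  {9506, 2761, 2762},   {11338, 10026, 9864},
    {8665, 3560, 3458},  {426, 250, 248},      {477, 305, 301},
    {18359, 3944, 4014}, {349, 340, 330},      {509, 463, 457},
    {527, 463, 453},     {1132, 995, 961},     {7270, 6839, 6799},
    {1708, 1531, 1516},  {519, 305, 310},      {4420, 4278, 4148},
    {491, 392, 381},
}};

// Sum of the cold-run column.
constexpr long total_cold_ms() {
    long sum = 0;
    for (const Row& r : kRound1) sum += r.cold;
    return sum;
}

// Sum of the faster hot run per query.
constexpr long total_best_hot_ms() {
    long sum = 0;
    for (const Row& r : kRound1) sum += std::min(r.hot1, r.hot2);
    return sum;
}

// Both values match the totals printed under Round 1.
static_assert(total_cold_ms() == 116944, "cold total matches the report");
static_assert(total_best_hot_ms() == 49159, "hot total matches the report");
```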

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.81% (8144/21540)
Line Coverage: 29.56% (67020/226739)
Region Coverage: 29.05% (34582/119049)
Branch Coverage: 24.96% (17820/71386)
Coverage Report: http://coverage.selectdb-in.cc/coverage/847be8d0e4910ae9966c50afef238c4b234291a4_847be8d0e4910ae9966c50afef238c4b234291a4/report/index.html

@doris-robot

TPC-DS: Total hot run time: 211852 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 847be8d0e4910ae9966c50afef238c4b234291a4, data reload: false

query1	960	393	404	393
query2	6580	2190	2142	2142
query3	6922	201	200	200
query4	23820	21758	21559	21559
query5	19743	6506	6464	6464
query6	296	219	220	219
query7	4193	296	312	296
query8	263	272	279	272
query9	3087	2671	2609	2609
query10	439	298	293	293
query11	15849	14924	14952	14924
query12	127	75	71	71
query13	1032	442	431	431
query14	17709	13153	13704	13153
query15	382	222	229	222
query16	6440	281	258	258
query17	1819	896	903	896
query18	902	316	312	312
query19	209	150	152	150
query20	80	76	76	76
query21	187	94	95	94
query22	5195	5035	5013	5013
query23	34318	33542	33548	33542
query24	7752	6273	6265	6265
query25	523	430	411	411
query26	1267	162	157	157
query27	2504	294	290	290
query28	6105	2298	2241	2241
query29	2927	2796	2731	2731
query30	242	170	167	167
query31	951	749	734	734
query32	75	61	60	60
query33	451	259	247	247
query34	875	475	468	468
query35	1131	900	943	900
query36	1396	1156	1130	1130
query37	173	61	65	61
query38	3072	2918	2878	2878
query39	1378	1338	1309	1309
query40	306	94	93	93
query41	39	38	35	35
query42	86	88	87	87
query43	635	569	673	569
query44	1171	710	714	710
query45	247	230	232	230
query46	1235	939	971	939
query47	1935	1774	1710	1710
query48	511	419	409	409
query49	649	370	373	370
query50	861	584	601	584
query51	4755	4672	4715	4672
query52	95	74	83	74
query53	233	189	181	181
query54	2647	2500	2473	2473
query55	84	79	83	79
query56	225	210	192	192
query57	1241	1251	1038	1038
query58	217	209	208	208
query59	3588	3205	3493	3205
query60	216	207	217	207
query61	102	91	93	91
query62	883	487	491	487
query63	203	178	175	175
query64	5843	1643	1468	1468
query65	3589	3578	3554	3554
query66	623	414	425	414
query67	16199	15177	16286	15177
query68	10102	673	642	642
query69	504	276	259	259
query70	1701	1398	1601	1398
query71	421	316	303	303
query72	6901	4831	4725	4725
query73	769	318	328	318
query74	6199	5768	5883	5768
query75	5415	3668	3703	3668
query76	5836	1131	1208	1131
query77	1001	261	256	256
query78	12524	11766	11962	11766
query79	8400	657	649	649
query80	1735	388	371	371
query81	482	233	240	233
query82	1621	97	102	97
query83	179	131	128	128
query84	254	71	69	69
query85	887	317	303	303
query86	327	294	289	289
query87	3202	2990	2989	2989
query88	4540	2314	2291	2291
query89	497	291	309	291
query90	1955	210	210	210
query91	169	122	126	122
query92	56	50	51	50
query93	6939	597	551	551
query94	663	203	205	203
query95	1965	1946	1916	1916
query96	651	333	328	328
query97	6446	6380	6418	6380
query98	218	198	193	193
query99	3018	907	886	886
Total cold run time: 325369 ms
Total hot run time: 211852 ms

@doris-robot

ClickBench: Total hot run time: 30.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 847be8d0e4910ae9966c50afef238c4b234291a4, data reload: false

query1	0.02	0.02	0.02
query2	0.07	0.02	0.03
query3	0.25	0.04	0.06
query4	1.79	0.07	0.07
query5	0.54	0.52	0.52
query6	1.29	0.61	0.61
query7	0.02	0.01	0.00
query8	0.04	0.02	0.02
query9	0.52	0.50	0.48
query10	0.54	0.53	0.53
query11	0.12	0.08	0.08
query12	0.12	0.09	0.08
query13	0.62	0.61	0.62
query14	0.76	0.80	0.77
query15	0.77	0.76	0.75
query16	0.38	0.39	0.36
query17	0.98	0.99	1.01
query18	0.25	0.25	0.23
query19	1.86	1.86	1.80
query20	0.01	0.00	0.01
query21	15.48	0.55	0.54
query22	2.04	2.32	1.44
query23	17.12	1.08	0.84
query24	5.92	1.04	1.10
query25	0.41	0.07	0.06
query26	0.61	0.16	0.16
query27	0.05	0.04	0.03
query28	7.07	0.73	0.70
query29	12.64	2.33	2.27
query30	0.58	0.51	0.54
query31	2.80	0.38	0.37
query32	3.41	0.50	0.48
query33	3.10	3.07	3.04
query34	15.24	4.81	4.78
query35	4.85	4.87	4.85
query36	1.07	1.01	1.01
query37	0.06	0.05	0.04
query38	0.03	0.02	0.03
query39	0.02	0.02	0.01
query40	0.16	0.14	0.14
query41	0.06	0.01	0.01
query42	0.02	0.01	0.02
query43	0.03	0.01	0.01
Total cold run time: 103.72 s
Total hot run time: 30.34 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 847be8d0e4910ae9966c50afef238c4b234291a4 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       20.9 seconds inserted 10000000 Rows, about 478K ops/s
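
The "MB/s" figures for the three stream loads are consistent with bytes divided by seconds, divided by 1024 * 1024, then truncated to a whole number. That is an inference from the numbers, not a documented formula. A small sketch of the arithmetic, with the hypothetical helper `mib_per_second`:

```cpp
#include <cstdint>

constexpr std::int64_t kBytesPerMiB = 1024 * 1024;

// Whole MiB per second, truncated; assumed to mirror how the report rounds.
constexpr std::int64_t mib_per_second(std::int64_t bytes, std::int64_t seconds) {
    return bytes / seconds / kBytesPerMiB;
}

// Byte counts and durations copied from the three stream load lines.
static_assert(mib_per_second(2358488459LL, 20) == 112, "json");
static_assert(mib_per_second(1101869774LL, 58) == 18, "orc");
static_assert(mib_per_second(861443392LL, 31) == 26, "parquet");
```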

@liutang123
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.80% (8142/21540)
Line Coverage: 29.55% (66999/226740)
Region Coverage: 29.04% (34570/119051)
Branch Coverage: 24.95% (17812/71390)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7500e4b7da110006683996c7ead5c8c44f8bf0d1_7500e4b7da110006683996c7ead5c8c44f8bf0d1/report/index.html

@liutang123
Contributor Author

run external

@lide-reed (Contributor) left a comment

LGTM

@github-actions bot added the 'approved' label (indicates a PR has been approved by one committer) on Sep 20, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.

@lide-reed lide-reed merged commit 8e6e82d into apache:branch-2.0 Sep 20, 2024
23 of 27 checks passed
Labels: approved, area/nereids, kind/test, reviewed

4 participants