Metadata types should be DuckDB types #28
Comments
E.g. I was expecting meta for the uuid example to be:
{
"meta": [
{
"name": "uuid_string",
"type": "VARCHAR"
},
{
"name": "uuid_uuid",
"type": "UUID"
},
{
"name": "uuid_string_type",
"type": "VARCHAR"
},
{
"name": "uuid_uuid_type",
"type": "VARCHAR"
}
]
}
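(For reference, the DuckDB-native names above are what DuckDB itself reports via LogicalType::ToString(); a tiny standalone sketch against the DuckDB C++ API, not the extension's code:)

#include "duckdb.hpp"
#include <iostream>

int main() {
    using namespace duckdb;
    // DuckDB's own names for these types, as they could appear in "meta"
    std::cout << LogicalType::VARCHAR.ToString() << std::endl; // VARCHAR
    std::cout << LogicalType::UUID.ToString() << std::endl;    // UUID
    return 0;
}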
Is representing numbers as strings intentional?
@tarasglek absolutely, those are great points! The initial intention of http_server was to emulate ClickHouse alongside chsql (a reimplementation of quackpipe as a set of extensions, from the inside out), but it seems we should steer this back into being a DuckDB-native resource for other developers and extensions to build on top of, without having to fork or redo the job. Shall we retain the ClickHouse API compatibility? I'd be more than happy to introduce a "switch" or new formats to be DuckDB native if there's interest in leveraging this for integrations.
This is the de facto DuckDB RPC interface for now :)
Then it's time to do it right. If there's a proposal, please let me know. As a starting point, my instinct would be to default the format to DuckDB unless one of the ClickHouse emulation formats is explicitly specified. WDYT?
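A minimal sketch of that default, assuming entirely hypothetical names (OutputFormat and ResolveFormat are not from the extension; JSONCompact/JSONEachRow stand in for the ClickHouse emulation formats):

#include <string>

// Hypothetical output formats: DuckDB-native JSON by default,
// ClickHouse emulation only when explicitly requested.
enum class OutputFormat { DuckDBJson, JSONCompact, JSONEachRow };

static OutputFormat ResolveFormat(const std::string &default_format) {
    if (default_format == "JSONCompact") {
        return OutputFormat::JSONCompact;
    }
    if (default_format == "JSONEachRow") {
        return OutputFormat::JSONEachRow;
    }
    // No ClickHouse emulation format requested: use DuckDB-native output
    return OutputFormat::DuckDBJson;
}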
Your proposal makes sense.
Dropping in on the conversation (sorry!) just to reflect beyond JSON on the response format mimetype: there is now JSON, but what about "text/csv"? The CSV reader in DuckDB is fast, and it would be nice to be able to serve CSV for various dirty integrations or scenarios where Parquet and JSON are too modern an invention :) ... (ideally supporting streaming with range requests, which I believe the CSV reader in DuckDB can deal with). And at the other end of the spectrum, for stronger typing and more modern applications, maybe Parquet and Arrow Flight SQL would be nice (or maybe gRPC is not in scope for httpserver)?
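A text/csv responder could be sketched roughly like this, reusing the httplib types the extension already handles; HandleCsvRequest, SerializeCsv and the "query" parameter name are hypothetical placeholders for illustration, not existing code:

// Hypothetical CSV handler: not part of the extension, just a sketch.
// Assumes the httplib header bundled with the extension (namespace duckdb_httplib_openssl).
void HandleCsvRequest(const duckdb_httplib_openssl::Request &req, duckdb_httplib_openssl::Response &res) {
    // Accept the query either as a parameter or as the request body
    std::string query = req.has_param("query") ? req.get_param_value("query") : req.body;
    // SerializeCsv is a placeholder for "execute the query and write CSV"
    std::string csv = SerializeCsv(query);
    res.set_content(csv, "text/csv");
}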
@mskyttner we're building a functional prototype (quackpy) to add a Flight SQL interface on top of http_server, but it'll take a little more time.
Need to recheck, but I believe this might just be there to overcome JSON's limitations for large numbers. Will confirm during the next batch of changes for sure.
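For context on the limitation itself (my illustration, not the extension's code): most JSON consumers decode numbers into IEEE-754 doubles, which only represent integers exactly up to 2^53, so a common workaround is to emit anything outside that range as a string:

#include <cstdint>
#include <string>

// Integers with |v| above 2^53 cannot be represented exactly by a double,
// which is what JavaScript and many JSON parsers decode numbers into.
constexpr int64_t kMaxSafeInteger = 9007199254740992LL; // 2^53

// Sketch: serialize as a native JSON number when safe, otherwise as a quoted string.
std::string JsonNumberOrString(int64_t v) {
    if (v <= kMaxSafeInteger && v >= -kMaxSafeInteger) {
        return std::to_string(v);
    }
    return "\"" + std::to_string(v) + "\"";
}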
@tarasglek since the serializer is now separate from the main logic (thanks to our awesome contributors @NiclasHaderer and @gropaul), we could think of implementing header-based control for types. We could choose some headers or any other approach, i.e.:

// Handle both GET and POST requests
void HandleHttpRequest(const duckdb_httplib_openssl::Request &req, duckdb_httplib_openssl::Response &res) {
    // ...
    bool convert_types = true; // Default to true
    if (req.has_param("convert_types")) {
        convert_types = req.get_param_value("convert_types") == "true";
    } else if (req.has_header("X-Convert-Types")) {
        convert_types = req.get_header_value("X-Convert-Types") == "true";
    }
    // ...
}

and then extend the serializer to skip conversion based on the parameter:

void ResultSerializer::SerializeValue(yyjson_mut_val *parent, const Value &value, optional_ptr<string> name,
                                      const LogicalType &type, const bool convert_types) {
    yyjson_mut_val *val = nullptr;
    if (value.IsNull()) {
        goto null_handle;
    }
    // If no conversion is requested, serialize the value directly as a string
    if (!convert_types) {
        val = yyjson_mut_strcpy(doc, value.ToString().c_str());
        if (!name) {
            YY_APPEND_FAIL(yyjson_mut_arr_append(parent, val));
        } else {
            yyjson_mut_val *key = yyjson_mut_strcpy(doc, name->c_str());
            D_ASSERT(key);
            YY_APPEND_FAIL(yyjson_mut_obj_add(parent, key, val));
        }
        return;
    }
    switch (type.id()) {
    // Type conversion cases (as before)
    // ...
    }
null_handle:
    // ... (null handling and the rest of the original function)
    return;
}

Any thoughts?
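If that lands, exercising it from a client could look roughly like this; the port, the header name and the assumption that the vendored httplib client is usable standalone are all illustrative:

#include <iostream>
// Assumes the httplib header vendored by the extension (namespace duckdb_httplib_openssl)
#include "httplib.hpp"

int main() {
    duckdb_httplib_openssl::Client client("localhost", 9999);
    duckdb_httplib_openssl::Headers headers = {{"X-Convert-Types", "false"}};
    // POST the query, asking the server to skip type conversion in the JSON output
    auto res = client.Post("/?default_format=JSONCompact", headers, "SELECT 42", "text/plain");
    if (res) {
        std::cout << res->body << std::endl;
    }
    return 0;
}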
Tbh I would never convert a number to a string. I don't know why ClickHouse would do that (and if they really do it), but it sounds wrong to me and incentivizes people to do things like storing everything in strings. This is something that is already happening, but I really do not want to make bad things worse by adding features like this... The serializer (mind you, only the JsonCompact one) tries to convert every viable type to its correct corresponding value. I do agree, however, that the meta description would benefit from some love and polishing. I think Int32 is the correct value to return (even though we might want to format it a bit nicer), but complex types like maps and structs could be represented more nicely.
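On that last point, for what it's worth, DuckDB's own type rendering already produces readable names for nested types; a small standalone sketch against the DuckDB C++ API (my example, not the serializer's current behaviour):

#include "duckdb.hpp"
#include <iostream>

int main() {
    using namespace duckdb;
    // DuckDB renders nested types with their full structure
    auto struct_type = LogicalType::STRUCT({{"a", LogicalType::INTEGER}, {"b", LogicalType::VARCHAR}});
    auto map_type = LogicalType::MAP(LogicalType::VARCHAR, LogicalType::BIGINT);
    std::cout << struct_type.ToString() << std::endl; // e.g. STRUCT(a INTEGER, b VARCHAR)
    std::cout << map_type.ToString() << std::endl;    // e.g. MAP(VARCHAR, BIGINT)
    return 0;
}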
I think that the behaviour mentioned in the first message has been outdated since #26; in fact, it should serialize as a number since that PR. But there has not been a release to the DuckDB community repo, so the old serialization is still active for everybody installing via the community extensions repo.
Everything larger than a
Correct, we were waiting for 1.2.0 and it's time to release this week!
PR merged, 0.1.5 is on the way out.
I can, but my testing depends on the Go binding, which is not on 1.2 yet.
I noticed that the metadata returned when POSTing to
/?default_format=JSONCompact
uses non-DuckDB types, and they are usually strings. Is this to be compatible with ClickHouse? I think it would make more sense to use DuckDB types (e.g. VARCHAR rather than String, INT rather than Int32).