
openml integration in shogun #4628

Draft: wants to merge 33 commits into develop


gf712
Member

@gf712 gf712 commented May 3, 2019

Adds native OpenML support to Shogun.

TODO:

  • C++ specific
      • write GET
      • write POST
      • write OpenML flow class
      • write OpenML task
  • Shogun specific
      • write OpenML to SGObject
      • write SGObject to OpenML
      • expose OpenML to all interfaces
      • write shogun extension for runs with flows
  • Maintenance stuff
      • figure out how to split all the classes, i.e. files and directories in shogun

return 0;
}

const char* OpenMLReader::xml_server = "https://www.openml.org/api/v1/xml";
Member

should these things be hard-coded here?

Member

yes
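With the server base hard-coded, the per-resource request URLs can be derived from it. A minimal sketch, assuming OpenML's REST layout of `<server>/<resource>/<id>` (e.g. `flow/9602`); `make_endpoint` is a hypothetical helper for illustration, not code from this PR:

```cpp
#include <cassert>
#include <string>

// Mirrors the hard-coded OpenMLReader::xml_server value from the diff above.
const char* const xml_server = "https://www.openml.org/api/v1/xml";

// Hypothetical helper: joins the base URL, a resource name and an ID.
// Assumes the base URL carries no trailing slash.
std::string make_endpoint(
    const std::string& server, const std::string& resource,
    const std::string& id)
{
	assert(!server.empty() && server.back() != '/');
	return server + "/" + resource + "/" + id;
}
```

For example, `make_endpoint(xml_server, "flow", "9602")` yields `https://www.openml.org/api/v1/xml/flow/9602`, so only the base string needs to change if the server ever moves.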


if (!curl_handle)
{
SG_SERROR("Failed to initialise curl handle.")
Member

maybe some info on what happened?

Member Author

I am not sure what the exact error would be... From the docs: "If this function [curl_easy_init] returns NULL, something went wrong and you cannot use the other curl functions."

Collaborator


Member Author

ah thanks, yes, that seems helpful!

@karlnapf
Member

karlnapf commented May 5, 2019

sweet, looking forward to seeing this once it is working :)

@gf712
Member Author

gf712 commented May 7, 2019

@vigsterkr I copied a few things from deadbeef to add RapidJSON as a dependency

@vigsterkr
Member

saw it 😼

CONFIG_FLAG HAVE_XML)
# RapidJSON
include(external/RapidJSON)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC ${RAPIDJSON_INCLUDE_DIR})
Member Author

@vigsterkr is it ok to have RapidJSON as PUBLIC?

Member

i guess it was like this in the other PR, right? i mean, currently i include RapidJSON only in the .cpp, so PRIVATE should be fine... but maybe/probably that caused me trouble with the unit tests...?

Member Author

it was private in the other PR, but swig needs the RapidJSON header files

Member

@vigsterkr vigsterkr May 7, 2019

why? now i understand why :)

#include <shogun/io/SGIO.h>

#include <curl/curl.h>
#include <rapidjson/document.h>
Member

i would avoid this by all means :)

components_type m_components;

#ifndef SWIG
static void check_flow_response(rapidjson::Document& doc);
Member

does it really need to be part of the class, meaning can't you just have this function in the implementation?

Member Author

no, I guess I can move it to a free function and avoid exposing RapidJSON to SWIG :)
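Turning the static member into a free function in the implementation file means RapidJSON types never appear in a header, so SWIG never has to parse them. A minimal sketch of that layout, assuming a hypothetical `FlowSketch` class: a `std::map` stands in for `rapidjson::Document` and a toy `key=value` parser stands in for RapidJSON's parsing so the example compiles without RapidJSON, and the `error`/`name` checks are illustrative assumptions, not OpenML's documented response format:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// ---- Header part: only standard types, nothing for SWIG to trip over. ----
class FlowSketch
{
public:
	// Hypothetical entry point taking the raw server response.
	static FlowSketch from_response(const std::string& response);
	std::string name;
};

// ---- Implementation (.cpp) part. ----
namespace
{
	// Stand-in for rapidjson::Document; this alias is only visible here.
	using Document = std::map<std::string, std::string>;

	// Toy parser for "key=value" lines, standing in for RapidJSON parsing.
	Document parse(const std::string& response)
	{
		Document doc;
		std::istringstream in(response);
		std::string line;
		while (std::getline(in, line))
		{
			auto eq = line.find('=');
			if (eq != std::string::npos)
				doc[line.substr(0, eq)] = line.substr(eq + 1);
		}
		return doc;
	}

	// File-local free function instead of a static class member, so its
	// signature never has to appear in the header.
	void check_flow_response(const Document& doc)
	{
		auto err = doc.find("error");
		if (err != doc.end())
			throw std::runtime_error("Server returned an error: " + err->second);
		if (doc.count("name") == 0)
			throw std::runtime_error("Flow response has no name.");
	}
} // namespace

FlowSketch FlowSketch::from_response(const std::string& response)
{
	Document doc = parse(response);
	check_flow_response(doc);
	assert(doc.count("name") == 1); // guaranteed by check_flow_response
	FlowSketch flow;
	flow.name = doc.at("name");
	return flow;
}
```

Callers only ever see `from_response`; swapping the stand-in types for the real RapidJSON ones would touch the `.cpp` part alone.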

#ifndef SWIG
static void check_flow_response(rapidjson::Document& doc);

static SG_FORCED_INLINE void emplace_string_to_map(
Member

same as above...

param_dict.emplace(name, "");
}

static SG_FORCED_INLINE void emplace_string_to_map(
Member

same as above
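The same move applies to `emplace_string_to_map`. A minimal sketch, assuming the helper's job is to store a parameter's string value and fall back to an empty string when there is none, as the `param_dict.emplace(name, "")` line suggests; the `ParamDict` alias and the `std::optional` argument (standing in for a possibly-null RapidJSON value) are hypothetical:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical alias for the parameter dictionary being filled.
using ParamDict = std::unordered_map<std::string, std::string>;

namespace
{
	// File-local free function rather than a static member. The
	// std::optional argument stands in for a possibly-null RapidJSON value.
	void emplace_string_to_map(
	    ParamDict& param_dict, const std::string& name,
	    const std::optional<std::string>& value)
	{
		assert(!name.empty());
		// Fall back to an empty string when there is no value; note that
		// emplace() leaves an existing entry for `name` untouched.
		param_dict.emplace(name, value ? *value : std::string());
	}
} // namespace
```

Because the function sits in an unnamed namespace inside the `.cpp`, neither the header nor the SWIG interface ever sees its signature.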

@gf712 gf712 force-pushed the openml_rest branch 3 times, most recently from 4a6f93b to 45ac04e May 8, 2019 12:10
@gf712 gf712 force-pushed the openml_rest branch 4 times, most recently from df8809d to ea27c0a May 10, 2019 10:08
@gf712 gf712 force-pushed the openml_rest branch 2 times, most recently from 394b4bd to 8a98439 May 15, 2019 07:34
@gf712
Member Author

gf712 commented May 17, 2019

I have a local version mostly working (with binary classification tasks at least). There are some memory issues but they will be fixed once we switch to smart pointers.

@iglesias
Collaborator

@karlnapf so this works now, a minimal example looks like this:

#include <shogun/base/init.h>
#include <shogun/io/openml/OpenMLFlow.h>
#include <shogun/io/openml/OpenMLTask.h>
#include <shogun/io/openml/OpenMLRun.h>
#include <shogun/io/openml/OpenMLSplit.h>

using namespace shogun;

int main()
{
	init_shogun_with_defaults();
	sg_io->set_loglevel(MSG_GCDEBUG);
	// Flow 9602 and task 4423 are IDs known to work with the openml-python extension
	auto flow = OpenMLFlow::download_flow("9602", "");
	auto task = OpenMLTask::get_task("4423", "");
	auto run = OpenMLRun::run_flow_on_task(flow, task);
	// TODO: run.publish()
	exit_shogun();
	return 0;
}

Once the smart pointer PR is merged, it'll work with the interfaces.

What are 9602 and 4423?

@gf712
Member Author

gf712 commented May 21, 2019

What are 9602 and 4423?

Those are the IDs of a flow and a task that I used with the openml-python extension I wrote, and I know they work.

@gf712 gf712 force-pushed the openml_rest branch 2 times, most recently from 7842b61 to 54b2936 May 29, 2019 13:53
@stale

stale bot commented Aug 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 2, 2020
@karlnapf
Member

karlnapf commented Aug 6, 2020

yes, let's keep it :)


6 participants