List of Tensorboard Plugins and Julia Types #10

Open
PhilipVinc opened this issue Mar 27, 2019 · 27 comments

Comments

@PhilipVinc
Member

This is a list of all TensorBoard plugins and the Julia types they could (potentially) accept. I want to see where the different types overlap.

  • Scalar (scalar vs iteration line plot)
    • accepts <:Number; non-Real numbers (e.g. Complex numbers) can be preprocessed;
  • Scalars (taken from TensorboardX, displays multiple curves on the same scalar plot)
    • ? potentially an N-tuple of numbers ?
  • Histograms/Distributions
    • Tuple{Vector{Real},Vector{Real}} for the histogram's bin edges and bin heights;
    • Vector: plot a vector as a histogram by converting the data to (0:length(data), data), i.e. n + 1 bin edges for n bin heights;
  • Images
    • 3D arrays are naturally 2D images (one dimension holds the color channels);
    • Matrix is a grayscale 2D image;
    • Vector is a grayscale 1D image;
    • Images could also be passed as image objects (JPEG, EPS, ...)
  • Text
    • String
    • Vector{String} or Matrix{String} are also rendered as lists or tables.
  • Embeddings
    • Requires a Tensor with data, and optionally label_imgs and metadata
  • PR Curves
    • ??
  • Audio
    • This is specific enough: you must specify the sampling rate and other parameters, which I guess can be handled by a dedicated type.
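
To make the overlaps in the list above concrete, here is a minimal sketch of the mapping written as Julia multiple dispatch. plugin_for and vector_to_histogram are hypothetical helpers (not TensorBoardLogger's API), and choosing histogram over 1D image for a plain Vector is an arbitrary default picked for illustration.

```julia
# Hypothetical mapping from Julia types to TensorBoard plugins (illustration only).
plugin_for(::Number) = :scalar                             # Complex etc. need preprocessing
plugin_for(::AbstractString) = :text
plugin_for(::AbstractArray{<:AbstractString}) = :text      # rendered as a list or table
plugin_for(::AbstractMatrix{<:Real}) = :image              # grayscale 2D image
plugin_for(::AbstractArray{<:Real,3}) = :image             # 2D image with channels
plugin_for(::AbstractVector{<:Real}) = :histogram          # overlap: could also be a 1D image
plugin_for(::Tuple{AbstractVector{<:Real},AbstractVector{<:Real}}) = :histogram

# A plain Vector of n bin heights becomes (edges, heights) with n + 1 edges.
vector_to_histogram(data::AbstractVector{<:Real}) = (0:length(data), data)

# Usage
plugin_for(rand(4, 4))                           # :image
edges, heights = vector_to_histogram([5, 3, 2])  # edges == 0:3
```

The Vector and Tuple cases are exactly where the list is ambiguous: the same Julia type could reasonably go to more than one plugin.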

Extra:
(This is something I need myself, and it is not currently supported by TensorBoard.) Eventually I would like to contribute a TensorBoard plugin that shows a whole plot/curve at each iteration (very similar to what the PR curve plugin does). The TensorBoard dev team is also considering this, but apparently they don't have the time to work on it at the moment.

  • Curves
    • Tuple{Vector{Real},Vector{Real}} (this differs from histograms: here the two vectors have the same length, while for histograms their lengths differ by 1).
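
Since curves and histograms would both arrive as a Tuple of two real vectors, only the lengths can tell them apart. A minimal sketch of that runtime check, where classify_pair is a hypothetical helper:

```julia
# Curves (x, y) and histograms (edges, heights) share one Julia type, so the
# vector lengths are the only way to tell them apart at runtime.
function classify_pair(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    if length(a) == length(b)
        return :curve        # one y value per x value
    elseif length(a) == length(b) + 1
        return :histogram    # n + 1 bin edges for n bin heights
    else
        throw(ArgumentError("lengths $(length(a)) and $(length(b)) fit neither a curve nor a histogram"))
    end
end

classify_pair([0.0, 1.0, 2.0], [1.0, 4.0, 9.0])  # :curve
classify_pair(0:3, [5, 3, 2])                    # :histogram
```

Because the decision depends on runtime lengths rather than types, a dedicated type per plugin would be the more explicit alternative.
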
@oxinabox
Member

Text mode

Text mode seems really strong, because it will render markdown.
docs, python demo
And markdown is approximately a superset of HTML, since you can embed HTML within it.
It should be possible to use repr(MIME"text/markdown"(), datum), with fallbacks to MIME"text/html" and MIME"text/plain",
in order to display basically anything.
Probably add some more structure to that than just dumping the text representation, e.g. the time and step and such.
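
A minimal sketch of that fallback chain, using only Base's showable and repr. text_repr is a hypothetical name, and the extra structure (time, step) is left out. Strings get their own method because repr with text/plain would otherwise wrap them in quotes.

```julia
using Markdown  # stdlib; only needed for the Markdown.MD example below

# Strings are already text: pass them through unchanged.
text_repr(x::AbstractString) = x

# Otherwise try the richest text MIME type first and fall back to plain text.
function text_repr(x)
    for mime in (MIME"text/markdown"(), MIME"text/html"(), MIME"text/plain"())
        showable(mime, x) && return repr(mime, x)
    end
    return repr(x)  # not reached in practice: text/plain is always showable
end

text_repr(42)                         # "42"
text_repr(Markdown.parse("# Title"))  # markdown source, via text/markdown
```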

@oxinabox
Member

oxinabox commented Mar 27, 2019

I think that supporting everything in the logger via a function-based API, and then wrapping that API and dispatching to it from the logging API, are two separate steps of the task.
I.e. one can wrap all the plugins first, and separately worry about deciding which one to use when you @info a value.
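
A minimal sketch of that split, written against the standard Logging interface (AbstractLogger, min_enabled_level, shouldlog, handle_message). Event, the plugin functions and SketchLogger are hypothetical names; they only record what would be sent to TensorBoard.

```julia
using Logging

# Layer 1: one plain function per plugin (hypothetical), usable without @info.
struct Event
    plugin::Symbol
    name::String
    value::Any
end
log_scalar!(events, name, x::Real) = push!(events, Event(:scalar, name, x))
log_text!(events, name, x) = push!(events, Event(:text, name, repr(MIME"text/plain"(), x)))

# Layer 2: decide which plugin function a key-value pair goes to.
log_value!(events, name, x::Real) = log_scalar!(events, name, x)
log_value!(events, name, x) = log_text!(events, name, x)

# Minimal logger that routes every keyword of a log record through layer 2.
struct SketchLogger <: AbstractLogger
    events::Vector{Event}
end
Logging.min_enabled_level(::SketchLogger) = Logging.Info
Logging.shouldlog(::SketchLogger, level, _module, group, id) = true
Logging.catch_exceptions(::SketchLogger) = false
function Logging.handle_message(logger::SketchLogger, level, message, _module,
                                group, id, file, line; kwargs...)
    for (key, val) in kwargs
        log_value!(logger.events, string(key), val)
    end
    return nothing
end

# Usage
logger = SketchLogger(Event[])
with_logger(logger) do
    @info "step" loss=0.25 note="converged"
end
```

The point of the split is that layer 1 can be tested and used directly, while all the "which plugin?" decisions live in layer 2.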

@c42f
Member

c42f commented Apr 5, 2019

Btw, regarding text mode: I want to render textual logs as markdown-formatted text by default at some stage (in Logging.ConsoleLogger), so that's an extremely helpful "coincidence". (The only reason it hasn't been done is that stdlib Markdown was a bit awkward to use when adding extra decorations to the left of the markdown text last time I tried.)

@PhilipVinc
Member Author

@c42f, @oxinabox A reflection regarding using Wrapper types to dispatch to the right back-end, as we had briefly discussed in #9

For example, just present normal 2D arrays as images; if you want a histogram as presentation I think this implies you're interpreting the array as samples from some distribution, and you should wrap the array appropriately in a Samples wrapper. (Choose a better name of course ;-) ).
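
A minimal sketch of the wrapper idea for the histogram case, assuming a hypothetical Samples type and plugin_for helper. The equal-width binning is a naive stand-in for whatever binning a real implementation would use.

```julia
# Hypothetical wrapper: "interpret this array as samples from a distribution".
struct Samples{T<:AbstractVector{<:Real}}
    data::T
end

# A bare matrix keeps its default meaning (image); Samples opts into histograms.
plugin_for(::AbstractMatrix{<:Real}) = :image
plugin_for(::Samples) = :histogram

# Bin the samples into nbins equal-width bins, returning (edges, counts).
function histogram_from_samples(s::Samples; nbins::Int = 10)
    x = s.data
    lo, hi = extrema(x)
    edges = collect(range(lo, hi; length = nbins + 1))
    counts = zeros(Int, nbins)
    for v in x
        # Map v to a bin index in 1:nbins; the maximum lands in the last bin.
        i = hi == lo ? 1 : min(nbins, 1 + floor(Int, (v - lo) / (hi - lo) * nbins))
        counts[i] += 1
    end
    return edges, counts
end

# Usage
edges, counts = histogram_from_samples(Samples([1.0, 2.0, 3.0, 4.0]); nbins = 2)
# edges == [1.0, 2.5, 4.0], counts == [2, 2]
```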

One potential issue I foresee with wrapper types is that when you compose loggers (à la DemuxLogger) or switch from TensorBoard to anything else that is not aware of those wrapper types, you start to introduce unnecessary noise.

For example, I recently started composing two loggers, TensorBoard and a very basic dump-to-MVHistory logger, so that I can monitor my simulations in TensorBoard and keep a copy of all the data in a Julia-friendly format. It's annoying to see wrapper types in the MVHistory.

@oxinabox
Member

oxinabox commented Apr 26, 2019

hmm yes, that is annoying. That is a very good point.

The alternative is magic names.
So for, say, logging a Vector to the text logger:
@info "" x x_tensorboard=TB_Text
If the TensorBoard logger gets an argument called x and an argument called x_tensorboard, it would call the function specified by x_tensorboard on x before deciding which logger to use.
So in the above it would call TB_Text(x).

This could also be used to allow other preprocessing that you only want to do for the TensorBoardLogger, e.g. you might want to do x_tensorboard = y->sum(y; dims=1)
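
A minimal sketch of how the TensorBoard side could resolve these magic names before dispatch, assuming the hint is always spelled <key>_tensorboard. apply_magic_names is a hypothetical helper; other loggers would simply never call it.

```julia
# Keyword suffix carrying a TensorBoard-only preprocessing function.
const MAGIC_SUFFIX = "_tensorboard"

# kwargs: any iterable of Symbol => value pairs, as handle_message receives them.
function apply_magic_names(kwargs)
    d = Dict{Symbol,Any}(kwargs)
    out = Dict{Symbol,Any}()
    for (k, v) in d
        endswith(String(k), MAGIC_SUFFIX) && continue   # hints are not logged themselves
        f = get(d, Symbol(k, MAGIC_SUFFIX), identity)   # default: no preprocessing
        out[k] = f(v)
    end
    return out
end

# Usage
hinted = apply_magic_names(pairs((x = [1 2; 3 4], x_tensorboard = y -> sum(y; dims = 1))))
# hinted[:x] is the 1×2 matrix [4 6]; the x_tensorboard entry is dropped
```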

Other loggers would just see the original x, as well as the extra magic argument.
So they would still be able to handle x in their favoured way.

@c42f
Member

c42f commented Apr 28, 2019

when you compose loggers or switch from Tensorboard to anything else that is not aware about those wrapper types, you start to introduce unnecessary noise.

Can you give a concrete example? I still hold to my point that wrappers are the appropriate way to add certain kinds of meaning to key value pairs (eg, "interpret this array as samples drawn from a distribution" => histogram formatting).

I do think the idea of adding formatting hints such as x_tensorboard into the log statements themselves will never work nicely with other backends and will just cause problems in the long term.


Let's step back a bit to the general problem. At a high level I think this is a classic case of the need to separate content (the log events) from presentation (the log sinks), but we haven't figured out how to hook the presentation onto the content yet with some kind of pattern matching.

Perhaps it would make sense to consider an analogy with HTML/CSS. The primary keys for pattern matching the content in that case are element name, class and id (I think? I'm no expert whatsoever), plus nesting in the document hierarchy. To continue the analogy, at the content level maybe we have

  • html element_name ~ julia type
  • element class ~ key in the key value pair (??)
  • id ~ log event id? (not terribly useful in this context?)

Currently the log keys are generally used for variable names (or at least, that's how I use them) and there's very little in the way of guidelines for key naming. So one cannot expect them to be consistent between libraries in a way which would allow class-like pattern matching for formatting. But maybe they could or should be somehow?

@shashikdm
Contributor

I agree with @c42f. Wrappers are an elegant solution to the ambiguity problem. Since automatic dispatch is already in place, one can pass raw data (without any wrapper) for simple datatypes such as String, Real and Array when using DemuxLogger, so both loggers can work smoothly. But it can't be helped when one has to use a wrapper (such as in the case of images) 🤔.

@shashikdm
Contributor

I would like to start working on TBtext wrapper soon. Following are the datatypes that I believe TBtext should handle.

  • AbstractString
  • Array (use repr to convert it to String)
  • Matrix (use repr to convert it to String)
  • Markdown.MD (it's from a library; should we support it? The string can be found by traversing its fields)

@PhilipVinc @oxinabox @c42f Please let me know if TBtext should handle any other datatype.

@oxinabox
Member

Please let me know if TBtext should handle any other datatype.

It should handle literally every datatype.
It doesn't need to do any kind of handling.
It just needs to control the dispatch.
We already implemented all that in the https://github.com/PhilipVinc/TensorBoardLogger.jl/blob/de969bdd31d5ace88b484a76e77bea5ac08c1b59/src/Loggers/LogText.jl#L12
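
In that spirit, a minimal sketch of a text wrapper that accepts any value and only changes which plugin is chosen. TBText, text_content, plugin_for and summary_text are hypothetical illustrations, not the code in LogText.jl.

```julia
# Hypothetical wrapper: accepts any value, only forces the Text plugin.
struct TBText{T}
    data::T
end

text_content(x::AbstractString) = x                # already text
text_content(x) = repr(MIME"text/plain"(), x)      # everything else

plugin_for(::TBText) = :text
plugin_for(::AbstractMatrix{<:Real}) = :image      # unwrapped default for comparison

summary_text(t::TBText) = text_content(t.data)

# Usage
plugin_for(TBText([1 2; 3 4]))   # :text, although a bare matrix would be an image
summary_text(TBText(3))          # "3"
```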

@PhilipVinc
Member Author

PhilipVinc commented Apr 28, 2019

Can you give a concrete example? I still hold to my point that wrappers are the appropriate way to add certain kinds of meaning to key value pairs (eg, "interpret this array as samples drawn from a distribution" => histogram formatting).

My current setup involves a DemuxLogger which forwards log messages both to TensorBoard and to an extremely simple logger (let's call it MVHistoryLogger) which pushes all incoming messages to an MVHistory.

If I use a wrapper type TBImage to log a 2D Matrix so that TensorBoard shows it as a 2D image

@info "" mymatrix=TBImage(my2ddata)

then in my MVHistory the matrix data will be stored wrapped inside the TBImage type. An additional complication is that if I serialise the MVHistory holding this data with JLD2, I will also need to load TensorBoardLogger when reading it back, otherwise JLD2 might spit out errors because it does not recognise the type.

--
I do agree that using wrapper types is the most elegant solution when using only one logger. I just see some problems arising when mixing several loggers, unless all loggers are aware of this preprocessing machinery and of the wrapper types (e.g. by splitting this logic out into a separate package).

@shashikdm
Contributor

shashikdm commented Apr 29, 2019

I suggest that instead of making a struct and using it as a wrapper, we create a function which returns the data along with some metadata that tells TBLogger which logger to use, e.g.

function TBimage(data::Array)
        metadata = "log_image"
        (data, metadata)
end

Then in the preprocess function we can check whether metadata exists: if it does, use that logger, else fall back to automatic dispatch.

@shashikdm
Contributor

The downside is that the metadata will also appear in the other logger.

@c42f
Member

c42f commented Apr 29, 2019

If I use a wrapper type TBImage to log a 2D Matrix so that TensorBoard shows it as a 2D image

Thanks, it's great to have a concrete example. Certainly it's less than ideal to have the wrappers be TensorBoard-specific types because other backends then need to depend on TensorBoard for correct formatting.

Of course we could have some selected wrappers in stdlib Logging, but that wouldn't generalize either. @shashikdm's suggestion of adding metadata makes sense. We'd need a type for this which is more specific than a plain Tuple: you don't want to confuse a 2-element Tuple (value1,value2) with (value,metadata).
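
A minimal sketch of such a dedicated type, assuming the hypothetical name WithPlugin. TBimage is reused from @shashikdm's example, and auto_plugin is a stand-in for the existing automatic dispatch. A plain 2-tuple is left to the automatic rule instead of being read as (value, metadata).

```julia
# Hypothetical type pairing a value with a plugin hint; unlike a plain Tuple,
# it cannot be confused with an ordinary (value1, value2) pair.
struct WithPlugin{T}
    value::T
    plugin::Symbol
end

TBimage(data::AbstractArray) = WithPlugin(data, :image)

# Stand-in for the automatic dispatch rules (hypothetical).
auto_plugin(::Real) = :scalar
auto_plugin(::Any) = :text

# Preprocessing: honour an explicit hint, otherwise dispatch automatically.
choose_plugin(x) = auto_plugin(x)
choose_plugin(x::WithPlugin) = x.plugin

# Unwrap before handing the data to the selected plugin.
unwrap(x) = x
unwrap(x::WithPlugin) = x.value
```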

Having said that, I think a more idiomatic alternative would be to put some "log key-value matching" functions into a central location (perhaps LoggingExtras for now, with the view to eventually moving it into stdlib Logging). The existing multimedia display system seems very related, ie, display(d::AbstractDisplay, x) and show(io, mime, x) etc. It seems like we should somehow hook into those or extend them for this purpose.

Relevant prior discussion JuliaLang/julia#29397
See also JuliaLang/julia#27430

@PhilipVinc
Member Author

I agree with the fact that the multimedia display system is related. For example, if all wrapper types were subtypes of some abstract type WrapperLogType end, we could fix the formatting of wrapped types by defining

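Base.show(io::IO, mime::MIME, x::WrapperLogType) = show(io, mime, x.data)
Base.show(io::IO, mime::MIME"text/plain", x::WrapperLogType) = show(io, mime, x.data)  # avoids ambiguity with Base's text/plain fallback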

We could then define a new MIME type "TensorBoard" and put logic relevant for us there.
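
A minimal sketch of that forwarding, with a hypothetical TBImageSketch wrapper. One detail worth noting: Base already defines show(io::IO, ::MIME"text/plain", x), so a generic method on (MIME, WrapperLogType) alone is ambiguous for text/plain and needs an extra, more specific method.

```julia
abstract type WrapperLogType end

# Hypothetical wrapper, for illustration only.
struct TBImageSketch{T} <: WrapperLogType
    data::T
end

# Forward every MIME type to the wrapped data.
Base.show(io::IO, mime::MIME, x::WrapperLogType) = show(io, mime, x.data)
# Needed to break the tie with Base's show(io, ::MIME"text/plain", x).
Base.show(io::IO, mime::MIME"text/plain", x::WrapperLogType) = show(io, mime, x.data)
# Two-argument show, used by print/string/repr without a MIME type.
Base.show(io::IO, x::WrapperLogType) = show(io, x.data)

# Usage
m = [1 2; 3 4]
w = TBImageSketch(m)
repr(MIME"text/plain"(), w) == repr(MIME"text/plain"(), m)  # true
```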

But

  • We still need a way to pass down information on the key (or tag) under which data is shown in Tensorboard, which show does not support; we would need something like show(io, mime, x, key)
  • right now we don't really log to a stream, but rather push to a Vector and only serialise at the end. (Theoretically we could serialise each object individually but this would add some useless overhead if you're logging many messages)

@c42f
Member

c42f commented Apr 30, 2019

Agreed, show and display aren't general enough even though there's a tantalizing connection.

Possibly more relevant is Jameson's comment here: JuliaLang/julia#29397 (comment)? I didn't get around to looking at that yet.

@xukai92
Contributor

xukai92 commented Sep 18, 2019

tensorboardX supports logging a matplotlib object directly. How hard would it be to implement this feature here?

@oxinabox
Member

Not too hard really, use Plots.savefig then hit up the stuff for displaying images.

@xukai92
Contributor

xukai92 commented Sep 18, 2019

Cool. I will give it a try.

@oxinabox
Member

any Plots dep should be hidden behind Requires.jl

@xukai92
Contributor

xukai92 commented Sep 18, 2019

BTW I found the corresponding helper func in tensorboardX: https://github.com/lanpa/tensorboardX/blob/master/tensorboardX/utils.py#L2

@PhilipVinc
Member Author

PhilipVinc commented Sep 18, 2019

@oxinabox We haven't implemented Requires in TensorBoardLogger yet.
This makes me think that we could hide a default dispatch for Plots.Plot objects, which calls Plots.savefig on them, behind Requires.jl.

If I have a minute I'll do this with #39
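
An alternative to writing a file with Plots.savefig is to ask the object for PNG bytes through the multimedia display system. A minimal sketch, assuming the object implements show for MIME"image/png" (Plots.jl plots generally do, although this depends on the backend). png_bytes is a hypothetical helper and FakePlot only stands in for a real plot; since nothing here imports Plots, this part would not itself need to sit behind Requires.jl.

```julia
# Get PNG bytes from any object that can show itself as image/png.
function png_bytes(x)
    showable(MIME"image/png"(), x) ||
        throw(ArgumentError("$(typeof(x)) cannot be shown as image/png"))
    return repr(MIME"image/png"(), x)::Vector{UInt8}  # binary MIME => raw bytes
end

# Stand-in for a real plot object: writes the 4-byte PNG signature prefix.
struct FakePlot end
Base.show(io::IO, ::MIME"image/png", ::FakePlot) = write(io, UInt8[0x89, 0x50, 0x4e, 0x47])

# Usage
bytes = png_bytes(FakePlot())  # these bytes would then go to the image logging code
```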

@oxinabox
Member

oxinabox commented Sep 18, 2019

@oxinabox We haven't implemented Requires in TensorBoardLogger yet.

Huh, so we don't.
I think we should use Requires aggressively in this package.

I know I was hesitant before, but now I think we should add lots of optional deps and support visualizing all of them via Requires.jl.

@xukai92
Contributor

xukai92 commented Sep 20, 2019

The new TensorBoard also supports a dashboard called hyperparameters (https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams). How should one add this?

@PhilipVinc
Member Author

PhilipVinc commented Sep 21, 2019

Interesting...
Adding this to TBL.jl should be quite easy:

  1. Go to the TensorBoard repository and find the .proto file for the hyperparameters plugin;
  2. Compile the .proto to Julia files with ProtoBuf.jl, and include them in the package;
  3. We should decide on a type used internally (and probably also exposed as API) to signal that this data should be serialised as hyperparameters. Something similar to TBImage or TBAudio. Let's call it TBHyperParams for now;
  4. Write a function hyperparams_summary(name::String, data::TBHyperParams) that serialises the data to the correct protobuf message. You can have a look at text_summary, which implements something similar;
  5. Specify how TBHyperParams should be managed by the dispatch machinery by declaring the two functions
preprocess(name, val::TBHyperParams, data) = push!(data, name=>val)
summary_impl(name, val::TBHyperParams) = hyperparams_summary(name, val)
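
A minimal sketch of steps 3 to 5 with the protobuf part stubbed out. TBHyperParams is the placeholder name from step 3, and this hyperparams_summary returns a NamedTuple instead of a real protobuf message, so only the dispatch wiring is shown.

```julia
# Step 3: placeholder type, here simply a name => value dictionary.
struct TBHyperParams
    params::Dict{String,Any}
end

# Step 4 (stub): the real function must build the hparams plugin protobuf.
hyperparams_summary(name::String, hp::TBHyperParams) = (tag = name, params = hp.params)

# Step 5: hook the type into the dispatch machinery.
preprocess(name, val::TBHyperParams, data) = push!(data, name => val)
summary_impl(name, val::TBHyperParams) = hyperparams_summary(name, val)

# Usage
data = Pair{String,Any}[]
preprocess("hparams", TBHyperParams(Dict{String,Any}("lr" => 1e-3, "layers" => 3)), data)
s = summary_impl(data[1]...)   # (tag = "hparams", params = Dict(...))
```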

Most of the work will be figuring out how to do step 4. As the documentation is very scarce, you will have to go through TensorBoard's or tensorboardX's source code.

If you'd like to give this a try I'd be happy to guide you.

@xukai92
Contributor

xukai92 commented Oct 29, 2019

I'm very new to protocol buffer. For 1. I found the folder for hyperparameters is https://github.com/tensorflow/tensorboard/tree/master/tensorboard/plugins/hparams. Which files should I try to convert?

And how does one use ProtoBuf.jl? I tried run(ProtoBuf.protoc(`--julia_out=jlout tensorboard/plugins/hparams/api.proto`)) (supposing api.proto is what I want to convert), but this gives me errors starting with Plugin output is unparseable.

@PhilipVinc
Member Author

Hey @xukai92, I'm sorry I never answered, but I was overwhelmed in October/November by my PhD defense.
If you want to get back at this, let me know.

Just for reference, for anyone who attempts this in the future: you should take the .proto files in that folder and compile them with ProtoBuf.jl. They probably depend on the .proto files of the main TensorBoard package, so you should pass that folder too, but I should look into it again...

@c42f
Member

c42f commented Feb 21, 2020

Relevant crossref to this discussion of dispatch in logging messages is the Progress type @tkf just introduced in ProgressLogging, and which will also be supported in TerminalLogger:


5 participants