@tarleb
Collaborator

@tarleb tarleb commented Dec 2, 2018

A new module Text.Pandoc.Format is added and exposed to library users.

Formats supported for input and/or output can be described as values
of the KnownFormat type. The submodule
Text.Pandoc.Format.KnownFormat is hidden, but re-exported through its
parent module.

Text.Pandoc.Extensions is made a submodule of Text.Pandoc.Format.
The Extensions module is now hidden, but re-exported through
Text.Pandoc.Format.
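A minimal sketch of what a KnownFormat-style type could look like, using only `base`. The constructor list and the helpers `allFormats`, `formatName`, and `formatFromName` are hypothetical illustrations and are not taken from the PR's code.

```haskell
import Data.Char (toLower)

-- Hypothetical, heavily shortened list of formats; the real type in the
-- PR may differ in names and coverage.
data KnownFormat
  = CommonMark
  | Docx
  | HTML
  | LaTeX
  | Markdown
  deriving (Show, Eq, Ord, Enum, Bounded)

-- | Every format, obtained from the derived Enum and Bounded instances.
allFormats :: [KnownFormat]
allFormats = [minBound .. maxBound]

-- | Assumed naming scheme: the lower-cased constructor name.
formatName :: KnownFormat -> String
formatName = map toLower . show

-- | Look up a format by name; Nothing if the name is unknown.
formatFromName :: String -> Maybe KnownFormat
formatFromName name = lookup name [(formatName f, f) | f <- allFormats]
```

Deriving `Enum` and `Bounded` is what makes the formats enumerable without maintaining a separate list by hand.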

@tarleb
Collaborator Author

tarleb commented Dec 2, 2018

Experimental approach to formalizing formats.

Pros:

  • stricter types and specific errors (e.g., for the -D command line option);
  • implicit knowledge made explicit;
  • formats are enumerable.

Cons:

  • more code and (explicit) complexity;
  • naming conflicts (e.g., constructors of EPUBVersion and HTMLSlidesVariant).
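To illustrate the "specific errors" point, here is a rough sketch of how a typed format could distinguish an unknown name from a known format that cannot be used as output. Everything here (`Fmt`, `FormatError`, `isOutputFormat`, `templateTarget`) is hypothetical, and treating CSV as input-only is an assumption made for the example.

```haskell
import Data.Char (toLower)

-- Hypothetical format type for this sketch only.
data Fmt = Csv | Docx | HTML | Markdown
  deriving (Show, Eq, Enum, Bounded)

-- Distinct error cases instead of a single "unknown format" string.
data FormatError
  = UnknownFormat String
  | NotAnOutputFormat Fmt
  deriving (Show, Eq)

-- Assumption for the example: CSV can be read but not written.
isOutputFormat :: Fmt -> Bool
isOutputFormat Csv = False
isOutputFormat _   = True

-- | Parse a format name case-insensitively.
parseFmt :: String -> Either FormatError Fmt
parseFmt name =
  maybe (Left (UnknownFormat name)) Right $
    lookup (map toLower name)
           [(map toLower (show f), f) | f <- [minBound .. maxBound]]

-- | Resolve the format whose default template was requested.
templateTarget :: String -> Either FormatError Fmt
templateTarget name = do
  fmt <- parseFmt name
  if isOutputFormat fmt then Right fmt else Left (NotAnOutputFormat fmt)
```

With plain strings, both failure cases would typically collapse into one generic message; with a sum type the caller can report each case separately.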

@jgm
Owner

jgm commented Dec 4, 2018

Wow, this is a lot of code! I'm not sure yet what I think -- I haven't had time to look at it all.

Can you say more about what problem this solves, or what problem motivated it?

One small comment: since markdown_mmd etc. are implemented as sets of extensions on top of Markdown, it makes more sense to have Markdown be the format, and to implement the others as flavors, it seems to me. But maybe there's a reason for doing it this way?
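The flavor idea could be sketched roughly as follows: a single Markdown format carrying a flavor, with each flavor mapped to a set of extensions. All type and function names are hypothetical, and the extension lists are illustrative placeholders, not pandoc's actual defaults.

```haskell
-- Hypothetical extension names for this sketch only.
data Extension = Footnotes | PipeTables | MmdTitleBlock | Smart
  deriving (Show, Eq, Ord)

data MarkdownFlavor = PandocFlavor | MultiMarkdown | Strict
  deriving (Show, Eq, Enum, Bounded)

-- One Markdown constructor; the flavor is a parameter, not a format.
data Format = Markdown MarkdownFlavor | HTML
  deriving (Show, Eq)

-- Illustrative extension sets, NOT pandoc's real defaults.
flavorExtensions :: MarkdownFlavor -> [Extension]
flavorExtensions PandocFlavor  = [Footnotes, PipeTables, Smart]
flavorExtensions MultiMarkdown = [Footnotes, PipeTables, MmdTitleBlock]
flavorExtensions Strict        = []

-- | Existing format names can still be mapped onto the new shape.
formatFromName :: String -> Maybe Format
formatFromName "markdown"        = Just (Markdown PandocFlavor)
formatFromName "markdown_mmd"    = Just (Markdown MultiMarkdown)
formatFromName "markdown_strict" = Just (Markdown Strict)
formatFromName "html"            = Just HTML
formatFromName _                 = Nothing

defaultExtensions :: Format -> [Extension]
defaultExtensions (Markdown flavor) = flavorExtensions flavor
defaultExtensions HTML              = []

isMarkdown :: Format -> Bool
isMarkdown (Markdown _) = True
isMarkdown _            = False
```

Code that only cares about "some kind of Markdown" can match on `Markdown _`, while the flavor remains available where the distinction matters.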

Also: there's an issue somewhere for making Format an enumerated type (in RawBlock and RawInline) (#547). It would be good to think about that, too, in this context. We wouldn't want to end up with two distinct Format enumerations.

@tarleb
Collaborator Author

tarleb commented Dec 4, 2018

Sorry for dumping such a large chunk of code. I wanted to make a small edit, but then kept making changes to ensure the approach was sensible in a bigger context (my first two approaches turned out to be bad).

The main driver for this change is the idea of exposing functions like read_file or write_file to Lua users. I'm approaching this by trying to define the respective PandocMonad Haskell functions in a way that would be pleasant and useful for consumers of the Haskell library. We could then define a PandocLua type as an instance of PandocMonad and just use these functions. This is also where the idea for a T.P.IO module came from.

So what I'm looking for is a function writeOutput :: (PandocMonad m, MonadIO m) => OutputOptions -> FilePath -> m (), where OutputOptions contains the target format, writerOptions, etc. The problem I'm facing is that I'd have to pass formats and extensions as a string so it can be passed to getWriter. But that also means that either the extensions encoded in the string, or those stored in WriterOptions, would be ignored. That seemed non-optimal.
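A rough sketch of the underlying issue, assuming the usual `name+ext-ext` spelling of a format spec: if the string is resolved once into a plain data value, the extensions live in exactly one place. The record `OutputOptions` below is a simplified stand-in, and `parseSpec`, `applyChanges`, and `fromSpec` are hypothetical helpers, not pandoc functions.

```haskell
-- A single extension toggle parsed from a format spec string.
data ExtChange = Enable String | Disable String
  deriving (Show, Eq)

-- | Split e.g. "markdown+smart-footnotes" into the format name and
-- the list of extension toggles that follow it.
parseSpec :: String -> (String, [ExtChange])
parseSpec spec = (name, go rest)
  where
    (name, rest) = break isSign spec
    isSign c = c == '+' || c == '-'
    go [] = []
    go (sign : more) =
      let (ext, rest') = break isSign more
          change = if sign == '+' then Enable ext else Disable ext
      in change : go rest'

-- | Apply toggles, in order, to a list of default extensions.
applyChanges :: [String] -> [ExtChange] -> [String]
applyChanges = foldl step
  where
    step exts (Enable e)  = if e `elem` exts then exts else exts ++ [e]
    step exts (Disable e) = filter (/= e) exts

-- Simplified stand-in for the OutputOptions discussed above: plain
-- data only, so there is a single source of truth for extensions.
data OutputOptions = OutputOptions
  { outputFormat     :: String
  , outputExtensions :: [String]
  } deriving (Show, Eq)

-- | Resolve a spec string against the given default extensions.
fromSpec :: [String] -> String -> OutputOptions
fromSpec defaults spec = OutputOptions name (applyChanges defaults changes)
  where
    (name, changes) = parseSpec spec
```

For instance, `fromSpec ["footnotes", "pipe_tables"] "markdown+smart-footnotes"` yields a record with format `"markdown"` and extensions `["pipe_tables", "smart"]`, so nothing downstream needs to re-parse the string.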

So this PR started as an edit to getWriter and getReader, but then got out of hand when I tried to get it "right".


I added separate constructors for all markdown types because the format is passed to filters; filters would no longer have the ability to distinguish between markdown and markdown_mmd output. I feel that a single Markdown type would be the right thing to do, but still didn't want to break backwards compatibility.

@tarleb
Collaborator Author

tarleb commented Dec 4, 2018

I forgot an important piece: I'd like OutputOptions to be easy to create as a Lua value, so it should not contain types like Writer m.

@tarleb
Collaborator Author

tarleb commented Jun 25, 2022

I've rebased the PR, downsized its scope, and made it backwards compatible. It now just shows the general idea, but applies it only in selected parts of the code base.

My plan would be to iterate on this after (if) this PR gets merged:

  1. implement the same for output formats;
  2. use these formats where possible, including template selection ("Fix unintended behavior for template search with custom writer", #8137);
  3. unify input/output formats -- possibly use a format algebra like that in "Write new Format and Formats types, some helper functions" (pandoc-types#78).

@tarleb tarleb force-pushed the known-formats branch 2 times, most recently from e1d4f7a to e0342a8, June 25, 2022 07:34
@tarleb tarleb changed the title from "RFC: add module Text.Pandoc.Format" to "Use ADT to represent input formats", Jun 25, 2022
@tarleb
Collaborator Author

tarleb commented Jun 25, 2022

The downside of this approach is that steps 1 and 2 would lead to moderate code duplication. But that would then disappear again with step 3.

A new module Text.Pandoc.Format.Input is added and exposed to library
users. Types supported as input format can be represented as values of
the *InputFormat* type.
@tarleb
Collaborator Author

tarleb commented Dec 16, 2022

This is outdated, and some useful parts of this approach have already made it into the code. Closing.

@tarleb tarleb closed this Dec 16, 2022