metalsmith-collections-related

A Metalsmith plugin to find related files within collections.

Files are "related" if they share important terms in their contents.

For each file in a collection, Term Frequency-Inverse Document Frequency (TF-IDF) is used to:

  • Find the top natural.maxTerms important terms in the file's contents
  • Find how much weight those terms have in every other file in the collection
  • Keep only matches with a weight of at least natural.minTfIdf
  • Sort by descending weight (most "related" first)
  • Keep at most maxRelated matches
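
The steps above can be sketched in plain JavaScript. This is a simplified, hypothetical illustration, not the plugin's implementation: findRelated, tokenize and termCounts are made-up names, contents are assumed to be plain text, and the weight of a term is taken as tf × ln(N / df). The natural option suggests the plugin delegates this work to the natural NLP library, whose weighting differs in detail.

```javascript
// Hypothetical sketch of the TF-IDF steps above; NOT the plugin's code.
// Assumes plain-text contents and weight(t, doc) = tf(t, doc) * ln(N / df(t)).

// Lowercase the text and split it into alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count how often each token occurs (raw term frequency).
function termCounts(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  return counts;
}

// docs: [{ path, contents }] for one collection.
// Returns { [path]: [related paths, most related first] }.
function findRelated(docs, { maxTerms = 10, minTfIdf = 0, maxRelated = 3 } = {}) {
  const counts = docs.map((d) => termCounts(tokenize(d.contents)));
  const n = docs.length;

  // Document frequency: number of docs containing each term.
  const df = new Map();
  for (const c of counts) {
    for (const t of c.keys()) df.set(t, (df.get(t) || 0) + 1);
  }

  // TF-IDF weight of term t in doc i (0 if the doc lacks the term).
  const tfidf = (i, t) => (counts[i].get(t) || 0) * Math.log(n / df.get(t));

  const result = {};
  docs.forEach((doc, i) => {
    // 1. The doc's own top maxTerms terms by TF-IDF weight.
    const topTerms = [...counts[i].keys()]
      .sort((a, b) => tfidf(i, b) - tfidf(i, a))
      .slice(0, maxTerms);

    result[doc.path] = docs
      // 2. Total weight of those terms in every doc.
      .map((other, j) => ({
        index: j,
        path: other.path,
        weight: topTerms.reduce((sum, t) => sum + tfidf(j, t), 0),
      }))
      // 3. Skip the doc itself and matches below minTfIdf.
      .filter((m) => m.index !== i && m.weight >= minTfIdf)
      // 4. Most related first.
      .sort((a, b) => b.weight - a.weight)
      // 5. At most maxRelated matches.
      .slice(0, maxRelated)
      .map((m) => m.path);
  });
  return result;
}

// Example with three tiny documents.
const sampleDocs = [
  { path: 'a.md', contents: 'metalsmith plugin collections related' },
  { path: 'b.md', contents: 'metalsmith plugin templates handlebars' },
  { path: 'c.md', contents: 'cooking pasta recipe' },
];
// a.md and b.md list each other; c.md gets no matches.
console.log(findRelated(sampleDocs, { minTfIdf: 0.1 }));
```

With a threshold of 0, every other file passes the "at least" check, because weights are never negative; raising the threshold is what drops unrelated files such as c.md here.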

Installation

npm install --save metalsmith-collections-related

JavaScript Usage

Collections must be processed before related files can be found, so register metalsmith-collections before this plugin:

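import path from 'path';
import { fileURLToPath } from 'url';
import Metalsmith from 'metalsmith';
import collections from 'metalsmith-collections';
import related from 'metalsmith-collections-related';

// __dirname is not defined in ES modules, so derive it from import.meta.url
const __dirname = path.dirname(fileURLToPath(import.meta.url));

Metalsmith(__dirname)
    .use(collections({
        // options here
    }))
    // must come after collections so collection metadata exists
    .use(related({
        // options here
    }))
    .build((err) => {
        if (err) {
            throw err;
        }
    });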

File metadata

This plugin adds a metadata field named related to each file in the format:

{
  "contents": "...",
  "path": "...",
  "related": {
    "[collection name]": [
      { "contents": "...", "path": "..." },
      { "contents": "...", "path": "..." }
      // up to maxRelated files
    ],
    "[another collection name]": [
      { "contents": "...", "path": "..." },
      { "contents": "...", "path": "..." }
      // up to maxRelated files
    ]
    // one key per collection the file belongs to
  }
}

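which can be used with templating engines such as Handlebars. Because related is keyed by collection name, loop over the collections first and then over each collection's files:

{{#each related}}
    <h3>{{@key}}</h3>
    {{#each this}}
        <a href="{{ path }}">{{ path }}</a>
    {{/each}}
{{/each}}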
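
Outside a template engine, the same metadata can be read directly, for example from a later plugin or a build script. A minimal sketch, assuming a hypothetical helper named renderRelatedLinks and a hypothetical collection named posts; neither is part of this plugin:

```javascript
// Hypothetical helper (not part of the plugin): turns a file's `related`
// metadata, in the format shown above, into HTML links grouped by
// collection. Paths are inserted unescaped, so only use trusted paths.
function renderRelatedLinks(file) {
  const related = file.related || {};
  return Object.entries(related)
    .map(([collection, files]) => [
      `<h3>${collection}</h3>`,
      ...files.map((f) => `<a href="${f.path}">${f.path}</a>`),
    ].join('\n'))
    .join('\n');
}

// Example file object; "posts" is an assumed collection name.
const post = {
  path: 'posts/a.md',
  related: {
    posts: [{ path: 'posts/b.md' }, { path: 'posts/c.md' }],
  },
};
console.log(renderRelatedLinks(post));
```

Files without a related field render as an empty string, so the helper can be called on every file in the build.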

Options

pattern (optional)

Type: string, default: "**/*"

A micromatch glob pattern to find input files.

maxRelated (optional)

Type: number, default: 3

The maximum number of related files to add to each file's metadata, per collection.

natural (optional)

Type: object, default:

{
  "minTfIdf": 0,
  "maxTerms": 10
}

natural.minTfIdf (optional)

Type: number, default: 0

The minimum TF-IDF weight another file must reach to be listed as related.

natural.maxTerms (optional)

Type: number, default: 10

The maximum number of important terms taken from each file's contents for TF-IDF weighting.

sanitizeHtml (optional)

Type: object, default:

{
  "allowedTags": [],
  "allowedAttributes": {},
  "nonTextTags": ["pre"]
}

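Options passed to sanitize-html, used to strip markup from file contents before terms are extracted. The defaults remove all tags and discard the text inside <pre> elements, so code blocks do not contribute terms.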
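
All of the options above can be combined in a single object. The values below are illustrative assumptions (a hypothetical posts/ directory and arbitrary thresholds); only the option names come from this list. Full natural and sanitizeHtml objects are given so the example does not depend on whether the plugin merges nested defaults.

```javascript
// Illustrative options for metalsmith-collections-related.
// Option names are from the docs; the values are assumptions.
const relatedOptions = {
  // Hypothetical layout: only consider files under posts/.
  pattern: 'posts/**/*',
  // Keep up to 5 related files per collection instead of the default 3.
  maxRelated: 5,
  natural: {
    // Drop weak matches; a useful threshold depends on your content.
    minTfIdf: 1,
    // Use the 20 most important terms of each file.
    maxTerms: 20,
  },
  sanitizeHtml: {
    allowedTags: [],
    allowedAttributes: {},
    // Also discard inline <code> text, not just <pre> blocks.
    nonTextTags: ['pre', 'code'],
  },
};

console.log(JSON.stringify(relatedOptions, null, 2));
```

The object would then be passed as .use(related(relatedOptions)), after metalsmith-collections has run.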

Changelog