A Metalsmith plugin to find related files within collections.
Files are "related" if they share important terms in their contents.
For each file in a collection, Term Frequency-Inverse Document Frequency (TF-IDF) is used to:
- Find the top `natural.maxTerms` important terms in the file's contents
- Find how much weight those terms have in every other file in the collection
- Filter matches that have at least `natural.minTfIdf` weight
- Sort by descending weight (most "related" first)
- Limit to `maxRelated` number of matches
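The steps above can be sketched in plain JavaScript. This is a minimal, dependency-free illustration and not the plugin's actual implementation: the helper names (`tokenize`, `buildIndex`, `tfidf`, `findRelated`) are hypothetical, and the TF-IDF formula is one common smoothed variant chosen for the example.

```javascript
// Split text into lowercase alphanumeric terms (a deliberately naive tokenizer).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build one term -> count map per document.
function buildIndex(docs) {
  return docs.map((doc) => {
    const counts = new Map();
    for (const term of tokenize(doc.contents)) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
    return counts;
  });
}

// TF-IDF of `term` in document `i`, using tf * (1 + ln(N / (1 + df))).
function tfidf(index, term, i) {
  const tf = index[i].get(term) || 0;
  const df = index.filter((counts) => counts.has(term)).length;
  return tf * (1 + Math.log(index.length / (1 + df)));
}

function findRelated(docs, i, { maxTerms = 10, minTfIdf = 0, maxRelated = 3 } = {}) {
  const index = buildIndex(docs);

  // 1. Find the top `maxTerms` terms of document `i`.
  const topTerms = [...index[i].keys()]
    .map((term) => ({ term, weight: tfidf(index, term, i) }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, maxTerms)
    .map(({ term }) => term);

  return docs
    // 2. Sum the weight of those terms in every document.
    .map((doc, j) => ({
      j,
      path: doc.path,
      weight: topTerms.reduce((sum, term) => sum + tfidf(index, term, j), 0),
    }))
    // 3. Drop the document itself and anything below `minTfIdf`.
    .filter((match) => match.j !== i && match.weight >= minTfIdf)
    // 4. Sort by descending weight (most related first).
    .sort((a, b) => b.weight - a.weight)
    // 5. Limit to `maxRelated` matches.
    .slice(0, maxRelated)
    .map(({ path, weight }) => ({ path, weight }));
}

const docs = [
  { path: 'a.md', contents: 'apple banana apple pie' },
  { path: 'b.md', contents: 'apple pie recipe with apple' },
  { path: 'c.md', contents: 'car engine repair' },
];

// With maxRelated set to 1, b.md is the only match returned for a.md.
const related = findRelated(docs, 0, { maxRelated: 1 });
console.log(related.map((match) => match.path));
```

Note that with the default `minTfIdf` of 0, files that share no terms still pass the filter with a weight of 0, so raising the threshold is what excludes unrelated files in this sketch.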
npm install --save metalsmith-collections-related
Collections need to be processed before related files can be found:
import Metalsmith from 'metalsmith';
import collections from 'metalsmith-collections';
import related from 'metalsmith-collections-related';

Metalsmith(__dirname)
    .use(collections({
        // options here
    }))
    .use(related({
        // options here
    }))
    .build((err) => {
        if (err) {
            throw err;
        }
    });
This plugin adds a metadata field named `related` to each file in the format:
{
    "contents": "...",
    "path": "...",
    "related": {
        "[collection name]": [
            { "contents": "...", "path": "..." },
            { "contents": "...", "path": "..." }
            // up to the `maxRelated` number of files
        ],
        "[another collection name]": [
            { "contents": "...", "path": "..." },
            { "contents": "...", "path": "..." }
            // up to the `maxRelated` number of files
        ]
        // up to as many collections as the file is in
    }
}
This field can be used with templating engines such as Handlebars.
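As a rough illustration of consuming the `related` field, the sketch below renders one collection's related files as an HTML list. It uses plain JavaScript in place of a Handlebars template, the `renderRelated` helper and the `posts` collection name are hypothetical, and it assumes each `path` is usable as a site-relative URL.

```javascript
// Hypothetical helper: build an HTML list of related files for one collection.
// No HTML escaping is done here; a real template engine would handle that.
function renderRelated(file, collectionName) {
  const matches = (file.related && file.related[collectionName]) || [];
  if (matches.length === 0) {
    return '';
  }
  const items = matches.map(
    (match) => `  <li><a href="/${match.path}">${match.path}</a></li>`
  );
  return ['<ul>', ...items, '</ul>'].join('\n');
}

// Example file metadata in the format shown above.
const file = {
  contents: '...',
  path: 'posts/a.html',
  related: {
    posts: [
      { contents: '...', path: 'posts/b.html' },
      { contents: '...', path: 'posts/c.html' },
    ],
  },
};

const html = renderRelated(file, 'posts');
console.log(html);
```

A file that is not in the requested collection yields an empty string, so the caller can skip the section entirely.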
Type: string
Default: "**/*"
A micromatch glob pattern to find input files.
`maxRelated`
Type: number
Default: 3
The maximum number of related files to add to each file's metadata, per collection.
`natural`
Type: object
Default:
{
    "minTfIdf": 0,
    "maxTerms": 10
}
`natural.minTfIdf`
Type: number
Default: 0
The minimum TF-IDF weight a file must have to count as a match.
`natural.maxTerms`
Type: number
Default: 10
The maximum number of terms to use for TF-IDF weighting.
Type: object
Default:
{
    "allowedTags": [],
    "allowedAttributes": {},
    "nonTextTags": ["pre"]
}
An object of sanitize-html options.
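To tie the options together, here is a small sketch of an options object that overrides the documented defaults. The values are illustrative only, and both `natural` keys are set explicitly so the example does not depend on how the plugin merges nested defaults.

```javascript
// Illustrative values; the documented defaults are maxRelated: 3,
// natural.minTfIdf: 0, and natural.maxTerms: 10.
const relatedOptions = {
  // Attach up to five related files per collection.
  maxRelated: 5,
  natural: {
    // Ignore matches below this TF-IDF weight (example threshold).
    minTfIdf: 0.1,
    // Consider up to 20 important terms per file.
    maxTerms: 20,
  },
};

// Passed to the plugin in the build chain as: .use(related(relatedOptions))
```

Raising `minTfIdf` above 0 trades recall for precision: fewer files are listed, but those that are share more heavily weighted terms.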