openmpf · ZachCafego · Feb 21, 2025 · Mar 18, 2026
diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -20,8 +20,9 @@ A list of algorithms currently integrated into the OpenMPF as distributed proces
 | Detection | Speech | Sphinx
 | Detection | Speech | Azure Cognitive Services Batch Transcription API
 | Detection | Scene | OpenCV
+| Detection | Captions/Features | LLaVA
 | Detection | Classification | OpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)
-| Detection | Classification | Clip
+| Detection | Classification | CLIP
 | Detection/Tracking | Classification | OpenCV DNN (YOLO)
 | Detection/Tracking | Classification/Features | TensorRT (COCO classes)
 | Detection | Text Region | EAST

diff --git a/docs/site/index.html b/docs/site/index.html
@@ -295,13 +295,18 @@ <h1 id="overview">Overview</h1>
 </tr>
 <tr>
 <td>Detection</td>
+<td>Captions/Features</td>
+<td>LLaVA</td>
+</tr>
+<tr>
+<td>Detection</td>
 <td>Classification</td>
 <td>OpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)</td>
 </tr>
 <tr>
 <td>Detection</td>
 <td>Classification</td>
-<td>Clip</td>
+<td>CLIP</td>
 </tr>
 <tr>
 <td>Detection/Tracking</td>
@@ -443,5 +448,5 @@ <h1 id="overview">Overview</h1>
 
 <!--
 MkDocs version : 0.17.5
-Build Date UTC : 2026-02-09 17:23:22
+Build Date UTC : 2026-03-18 17:41:24
 -->
diff --git a/docs/site/search/search_index.json b/docs/site/search/search_index.json
@@ -2,12 +2,12 @@
     "docs": [
         {
             "location": "/index.html",
-            "text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nOverview\n\n\nThere are numerous video and image exploitation capabilities available today. The Open Media Processing Framework (OpenMPF) provides a framework for chaining, combining, or replacing individual components for the purpose of experimentation and comparison.\n\n\nOpenMPF is a non-proprietary, scalable framework that permits practitioners and researchers to construct video, imagery, and audio exploitation capabilities using the available third-party components. Using OpenMPF, one can extract targeted entities in large-scale data environments, such as face and object detection.\n\n\nFor those developing new exploitation capabilities, OpenMPF exposes a set of Application Program Interfaces (APIs) for extending media analytics functionality. The APIs allow integrators to introduce new algorithms capable of detecting new targeted entity types. For example, a backpack detection algorithm could be integrated into an OpenMPF instance. OpenMPF does not restrict the number of algorithms that can operate on a given media file, permitting researchers, practitioners, and developers to explore arbitrarily complex composites of exploitation algorithms.\n\n\nA list of algorithms currently integrated into the OpenMPF as distributed processing components is shown here:\n\n\n\n\n\n\n\n\nOperation\n\n\nObject Type\n\n\nFramework\n\n\n\n\n\n\n\n\n\n\nDetection/Tracking\n\n\nFace\n\n\nLBP-Based OpenCV\n\n\n\n\n\n\nDetection/Tracking\n\n\nMotion\n\n\nMOG w/ STRUCK\n\n\n\n\n\n\nDetection/Tracking\n\n\nMotion\n\n\nSuBSENSE w/ STRUCK\n\n\n\n\n\n\nDetection/Tracking\n\n\nLicense Plate\n\n\nOpenALPR\n\n\n\n\n\n\nDetection\n\n\nSpeech\n\n\nSphinx\n\n\n\n\n\n\nDetection\n\n\nSpeech\n\n\nAzure Cognitive Services Batch Transcription API\n\n\n\n\n\n\nDetection\n\n\nScene\n\n\nOpenCV\n\n\n\n\n\n\nDetection\n\n\nClassification\n\n\nOpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)\n\n\n\n\n\n\nDetection\n\n\nClassification\n\n\nClip\n\n\n\n\n\n\nDetection/Tracking\n\n\nClassification\n\n\nOpenCV DNN (YOLO)\n\n\n\n\n\n\nDetection/Tracking\n\n\nClassification/Features\n\n\nTensorRT (COCO classes)\n\n\n\n\n\n\nDetection\n\n\nText Region\n\n\nEAST\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nApache Tika\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nTesseract OCR\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nAzure Cognitive Services Read API\n\n\n\n\n\n\nDetection\n\n\nForm Structure (with OCR)\n\n\nAzure Cognitive Services Form Recognizer API\n\n\n\n\n\n\nDetection\n\n\nKeywords\n\n\nBoost Regular Expressions\n\n\n\n\n\n\nDetection\n\n\nKnown Phrases\n\n\nTransformer Tagging\n\n\n\n\n\n\nDetection\n\n\nImage (from document)\n\n\nApache Tika\n\n\n\n\n\n\nCorrection\n\n\nText\n\n\nNatural Language Processing using cython_hunspell library\n\n\n\n\n\n\nDetection\n\n\nLanguage\n\n\nfastText with the GlotLID model\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nAzure Cognitive Services Translate API\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nNo Language Left Behind (NLLB)\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nArgos\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nWhisper\n\n\n\n\n\n\nVideo Summarization\n\n\nActivity\n\n\nLLAMA3\n\n\n\n\n\n\n\n\nThe OpenMPF exposes data processing and job management web services via a User Interface (UI). These services allow users to upload media, create media processing jobs, determine the status of jobs, and retrieve the artifacts associated with completed jobs. The web services give application developers flexibility to use the OpenMPF in their preferred environment and programming language.",
+            "text": "NOTICE:\n This software (or technical data) was produced for the U.S. Government under contract, and is subject to the\nRights in Data-General Clause 52.227-14, Alt. IV (DEC 2007). Copyright 2024 The MITRE Corporation. All Rights Reserved.\n\n\nOverview\n\n\nThere are numerous video and image exploitation capabilities available today. The Open Media Processing Framework (OpenMPF) provides a framework for chaining, combining, or replacing individual components for the purpose of experimentation and comparison.\n\n\nOpenMPF is a non-proprietary, scalable framework that permits practitioners and researchers to construct video, imagery, and audio exploitation capabilities using the available third-party components. Using OpenMPF, one can extract targeted entities in large-scale data environments, such as face and object detection.\n\n\nFor those developing new exploitation capabilities, OpenMPF exposes a set of Application Program Interfaces (APIs) for extending media analytics functionality. The APIs allow integrators to introduce new algorithms capable of detecting new targeted entity types. For example, a backpack detection algorithm could be integrated into an OpenMPF instance. OpenMPF does not restrict the number of algorithms that can operate on a given media file, permitting researchers, practitioners, and developers to explore arbitrarily complex composites of exploitation algorithms.\n\n\nA list of algorithms currently integrated into the OpenMPF as distributed processing components is shown here:\n\n\n\n\n\n\n\n\nOperation\n\n\nObject Type\n\n\nFramework\n\n\n\n\n\n\n\n\n\n\nDetection/Tracking\n\n\nFace\n\n\nLBP-Based OpenCV\n\n\n\n\n\n\nDetection/Tracking\n\n\nMotion\n\n\nMOG w/ STRUCK\n\n\n\n\n\n\nDetection/Tracking\n\n\nMotion\n\n\nSuBSENSE w/ STRUCK\n\n\n\n\n\n\nDetection/Tracking\n\n\nLicense Plate\n\n\nOpenALPR\n\n\n\n\n\n\nDetection\n\n\nSpeech\n\n\nSphinx\n\n\n\n\n\n\nDetection\n\n\nSpeech\n\n\nAzure Cognitive Services Batch Transcription API\n\n\n\n\n\n\nDetection\n\n\nScene\n\n\nOpenCV\n\n\n\n\n\n\nDetection\n\n\nCaptions/Features\n\n\nLLaVA\n\n\n\n\n\n\nDetection\n\n\nClassification\n\n\nOpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)\n\n\n\n\n\n\nDetection\n\n\nClassification\n\n\nCLIP\n\n\n\n\n\n\nDetection/Tracking\n\n\nClassification\n\n\nOpenCV DNN (YOLO)\n\n\n\n\n\n\nDetection/Tracking\n\n\nClassification/Features\n\n\nTensorRT (COCO classes)\n\n\n\n\n\n\nDetection\n\n\nText Region\n\n\nEAST\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nApache Tika\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nTesseract OCR\n\n\n\n\n\n\nDetection\n\n\nText (OCR)\n\n\nAzure Cognitive Services Read API\n\n\n\n\n\n\nDetection\n\n\nForm Structure (with OCR)\n\n\nAzure Cognitive Services Form Recognizer API\n\n\n\n\n\n\nDetection\n\n\nKeywords\n\n\nBoost Regular Expressions\n\n\n\n\n\n\nDetection\n\n\nKnown Phrases\n\n\nTransformer Tagging\n\n\n\n\n\n\nDetection\n\n\nImage (from document)\n\n\nApache Tika\n\n\n\n\n\n\nCorrection\n\n\nText\n\n\nNatural Language Processing using cython_hunspell library\n\n\n\n\n\n\nDetection\n\n\nLanguage\n\n\nfastText with the GlotLID model\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nAzure Cognitive Services Translate API\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nNo Language Left Behind (NLLB)\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nArgos\n\n\n\n\n\n\nTranslation\n\n\nLanguage\n\n\nWhisper\n\n\n\n\n\n\nVideo Summarization\n\n\nActivity\n\n\nLLAMA3\n\n\n\n\n\n\n\n\nThe OpenMPF exposes data processing and job management web services via a User Interface (UI). These services allow users to upload media, create media processing jobs, determine the status of jobs, and retrieve the artifacts associated with completed jobs. The web services give application developers flexibility to use the OpenMPF in their preferred environment and programming language.",
             "title": "Home"
         },
         {
             "location": "/index.html#overview",
-            "text": "There are numerous video and image exploitation capabilities available today. The Open Media Processing Framework (OpenMPF) provides a framework for chaining, combining, or replacing individual components for the purpose of experimentation and comparison.  OpenMPF is a non-proprietary, scalable framework that permits practitioners and researchers to construct video, imagery, and audio exploitation capabilities using the available third-party components. Using OpenMPF, one can extract targeted entities in large-scale data environments, such as face and object detection.  For those developing new exploitation capabilities, OpenMPF exposes a set of Application Program Interfaces (APIs) for extending media analytics functionality. The APIs allow integrators to introduce new algorithms capable of detecting new targeted entity types. For example, a backpack detection algorithm could be integrated into an OpenMPF instance. OpenMPF does not restrict the number of algorithms that can operate on a given media file, permitting researchers, practitioners, and developers to explore arbitrarily complex composites of exploitation algorithms.  A list of algorithms currently integrated into the OpenMPF as distributed processing components is shown here:     Operation  Object Type  Framework      Detection/Tracking  Face  LBP-Based OpenCV    Detection/Tracking  Motion  MOG w/ STRUCK    Detection/Tracking  Motion  SuBSENSE w/ STRUCK    Detection/Tracking  License Plate  OpenALPR    Detection  Speech  Sphinx    Detection  Speech  Azure Cognitive Services Batch Transcription API    Detection  Scene  OpenCV    Detection  Classification  OpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)    Detection  Classification  Clip    Detection/Tracking  Classification  OpenCV DNN (YOLO)    Detection/Tracking  Classification/Features  TensorRT (COCO classes)    Detection  Text Region  EAST    Detection  Text (OCR)  Apache Tika    Detection  Text (OCR)  Tesseract OCR    Detection  Text (OCR)  Azure Cognitive Services Read API    Detection  Form Structure (with OCR)  Azure Cognitive Services Form Recognizer API    Detection  Keywords  Boost Regular Expressions    Detection  Known Phrases  Transformer Tagging    Detection  Image (from document)  Apache Tika    Correction  Text  Natural Language Processing using cython_hunspell library    Detection  Language  fastText with the GlotLID model    Translation  Language  Azure Cognitive Services Translate API    Translation  Language  No Language Left Behind (NLLB)    Translation  Language  Argos    Translation  Language  Whisper    Video Summarization  Activity  LLAMA3     The OpenMPF exposes data processing and job management web services via a User Interface (UI). These services allow users to upload media, create media processing jobs, determine the status of jobs, and retrieve the artifacts associated with completed jobs. The web services give application developers flexibility to use the OpenMPF in their preferred environment and programming language.",
+            "text": "There are numerous video and image exploitation capabilities available today. The Open Media Processing Framework (OpenMPF) provides a framework for chaining, combining, or replacing individual components for the purpose of experimentation and comparison.  OpenMPF is a non-proprietary, scalable framework that permits practitioners and researchers to construct video, imagery, and audio exploitation capabilities using the available third-party components. Using OpenMPF, one can extract targeted entities in large-scale data environments, such as face and object detection.  For those developing new exploitation capabilities, OpenMPF exposes a set of Application Program Interfaces (APIs) for extending media analytics functionality. The APIs allow integrators to introduce new algorithms capable of detecting new targeted entity types. For example, a backpack detection algorithm could be integrated into an OpenMPF instance. OpenMPF does not restrict the number of algorithms that can operate on a given media file, permitting researchers, practitioners, and developers to explore arbitrarily complex composites of exploitation algorithms.  A list of algorithms currently integrated into the OpenMPF as distributed processing components is shown here:     Operation  Object Type  Framework      Detection/Tracking  Face  LBP-Based OpenCV    Detection/Tracking  Motion  MOG w/ STRUCK    Detection/Tracking  Motion  SuBSENSE w/ STRUCK    Detection/Tracking  License Plate  OpenALPR    Detection  Speech  Sphinx    Detection  Speech  Azure Cognitive Services Batch Transcription API    Detection  Scene  OpenCV    Detection  Captions/Features  LLaVA    Detection  Classification  OpenCV DNN (GoogLeNet, Yahoo NSFW, vehicle color)    Detection  Classification  CLIP    Detection/Tracking  Classification  OpenCV DNN (YOLO)    Detection/Tracking  Classification/Features  TensorRT (COCO classes)    Detection  Text Region  EAST    Detection  Text (OCR)  Apache Tika    Detection  Text (OCR)  Tesseract OCR    Detection  Text (OCR)  Azure Cognitive Services Read API    Detection  Form Structure (with OCR)  Azure Cognitive Services Form Recognizer API    Detection  Keywords  Boost Regular Expressions    Detection  Known Phrases  Transformer Tagging    Detection  Image (from document)  Apache Tika    Correction  Text  Natural Language Processing using cython_hunspell library    Detection  Language  fastText with the GlotLID model    Translation  Language  Azure Cognitive Services Translate API    Translation  Language  No Language Left Behind (NLLB)    Translation  Language  Argos    Translation  Language  Whisper    Video Summarization  Activity  LLAMA3     The OpenMPF exposes data processing and job management web services via a User Interface (UI). These services allow users to upload media, create media processing jobs, determine the status of jobs, and retrieve the artifacts associated with completed jobs. The web services give application developers flexibility to use the OpenMPF in their preferred environment and programming language.",
             "title": "Overview"
         },
         {