
AEM Log Analysis Scripts

This collection of Node.js scripts fetches, processes, and analyzes AEM publish logs from two environments (bacom and da-bacom) to identify content synchronization discrepancies and to generate lists of paths for follow-up actions such as deletion or re-import.

Prerequisites

  • Node.js (v16 or later recommended)
  • npm (usually included with Node.js)
  • A valid X-Auth-Token for accessing the AEM Admin log API (https://admin.hlx.page/log/...).

Installation

Install the necessary dependencies:

npm ci

Configuration

The main configuration happens within the orchestrator script main.js:

  1. Authentication Token: Open main.js and replace the empty string value for X-Auth-Token within the headers object with your valid token.
    const headers = {
      'X-Auth-Token': 'YOUR_VALID_TOKEN_HERE' // <-- EDIT THIS
    };
  2. Log Start Date: Adjust the fromDateParam variable in main.js to set the starting date (in ISO 8601 format) from which logs are fetched and the comparison begins.
    const fromDateParam = 'YYYY-MM-DDTHH:mm:ss.sssZ'; // <-- EDIT THIS (e.g., '2024-04-01T00:00:00.000Z')
  3. (Optional) Processing Workflow: The processingScripts array in main.js defines which analysis scripts run automatically after logs are fetched. You can modify this list if needed.
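
For reference, here is a minimal sketch of how these two values are presumably combined into a request against the admin log API. The org/site/ref path segments and the response's entries field are illustrative assumptions based on the URL above; see 1-get-logs.js for the actual request.

// Minimal sketch only -- the real request lives in 1-get-logs.js.
// Assumes Node 18+ global fetch and runs inside an async function.
// The org/site/ref segments and the `entries` field are assumptions.
const url = `https://admin.hlx.page/log/adobecom/bacom/main`
  + `?from=${encodeURIComponent(fromDateParam)}`;
const response = await fetch(url, { headers });
const logs = await response.json();
console.log(`Fetched ${logs.entries?.length ?? 0} log entries`);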

Running the Main Workflow

To execute the standard analysis process, run the main orchestrator script:

node main.js

This script will perform the following steps sequentially:

  1. Fetch Logs: Uses 1-get-logs.js to fetch logs for both the da-bacom and bacom sites (defined in main.js), starting from fromDateParam. Raw logs are saved to data/da-bacom-logs.json and data/bacom-logs.json; a copy is also written to data/bacom-logs-bulk.json.
  2. Extract Last Actions: Runs 2-extract-last-published-and-deleted.js. This processes the raw logs and creates JSON files detailing the last known action (publish or delete) for each unique path, saving them as:
    • data/da-bacom-last-published.json
    • data/da-bacom-last-deleted.json
    • data/bacom-last-published.json
    • data/bacom-last-published-bulk.json
    • data/bacom-last-deleted.json
  3. Extract Paths to Re-import: Runs 3-extract-paths-to-reimport-to-da-bacom.js. This compares bacom-last-published-bulk.json against the DA last action files to find paths potentially needing re-import to DA-Bacom. Outputs are saved to the results/ directory:
    • results/paths-to-reimport-to-da-bacom.txt: List of paths.
    • results/paths-to-reimport-to-da-bacom.json: Detailed JSON for the paths above.
    • results/paths-to-not-reimport-to-da-bacom.json: Detailed JSON for paths that were superseded on DA-Bacom.
  4. Extract Paths to Delete: Runs 3-extract-paths-to-delete-from-da-bacom.js. This compares da-bacom-last-published.json and bacom-last-deleted.json to identify paths published on DA that were subsequently deleted on Bacom since the from date (see the sketch after this list). Outputs:
    • results/paths-to-delete-from-da-bacom.txt: List of paths potentially safe to delete from DA.
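
As a rough illustration of the step 4 comparison, here is a sketch (not the actual script): a path becomes a deletion candidate when Bacom's delete is newer than the last DA-Bacom publish. The JSON shape used here (objects keyed by path, each entry carrying a timestamp field) is an assumption; check 3-extract-paths-to-delete-from-da-bacom.js for the real field names.

// Sketch only; the JSON shape and field names are assumptions.
const daPublished = require('./data/da-bacom-last-published.json');
const bacomDeleted = require('./data/bacom-last-deleted.json');

// A path qualifies when Bacom's delete happened after DA's last publish.
const candidates = Object.keys(daPublished).filter((path) => {
  const deleted = bacomDeleted[path];
  return deleted
    && new Date(deleted.timestamp) > new Date(daPublished[path].timestamp);
});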

Output Files

  • data/ directory: Contains raw log files (*-logs.json) and processed files detailing the last known state (*-last-published.json, *-last-deleted.json).
  • results/ directory: Contains the final analysis outputs, typically lists of paths (.txt) or detailed reports (.json).

Individual Scripts

Besides the main workflow run by main.js, other scripts exist for specific analyses. They typically read files from data/ or results/ and output new files to results/. Run them manually using node <script-name>.js.

  • compare-live-html.js: Reads paths from results/temp.json (NOTE: temp.json is generated by commented-out code in 3-extract-paths-to-reimport-to-da-bacom.js), fetches the live .plain.html for each path from both the Bacom and DA-Bacom environments, performs a normalized comparison (ignoring certain URLs, attributes, etc.), logs detailed differences (text, classes, IDs), and writes a list of paths with significant differences to results/html-differences.txt. This script is an unfinished prototype.
  • find-potentially-overwritten-megan-paths.js: Reads a specific list of paths from data/paths-megan.txt, compares each path's last Bacom publish time against its last DA action time (excluding actions by [email protected] by default; check the script for the correct user), and outputs paths potentially overwritten on DA at least 1 hour after the Bacom publish to results/megan-paths-conflict.txt (see the first sketch after this list). This was used because Megan had sent me a list of her own files to also import to DA, and I wanted to check whether importing them would overwrite any newer content in DA.
  • generate-combined-urls.js: Reads path lists from two specified .txt files (currently results/paths-to-reimport-to-da-bacom.txt and data/paths-megan.txt), combines them, removes duplicates, sorts the list, and writes the unique paths to a specified file (currently results/paths-to-reimport-to-da-bacom-with-content-megan.txt). I used this to merge my list of imports with Megan's (see the second sketch after this list).
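
For illustration, a sketch of the 1-hour conflict window checked by find-potentially-overwritten-megan-paths.js (the timestamp inputs are assumptions, and the script's user-exclusion rules are not reproduced here):

// Sketch: flag a conflict when the last DA action landed at least one hour
// after the Bacom publish. Inputs are assumed to be ISO 8601 strings.
const HOUR_MS = 60 * 60 * 1000;
const isConflict = (bacomPublishedAt, lastDaActionAt) =>
  new Date(lastDaActionAt).getTime() - new Date(bacomPublishedAt).getTime() >= HOUR_MS;

And a sketch of the combine/dedupe/sort logic in generate-combined-urls.js, using the default file names described above:

const fs = require('fs');

// Read a path list, trimming whitespace and dropping empty lines.
const readPaths = (file) => fs.readFileSync(file, 'utf8')
  .split('\n')
  .map((line) => line.trim())
  .filter(Boolean);

// Combine both lists, drop duplicates via a Set, and sort.
const combined = [...new Set([
  ...readPaths('results/paths-to-reimport-to-da-bacom.txt'),
  ...readPaths('data/paths-megan.txt'),
])].sort();

fs.writeFileSync(
  'results/paths-to-reimport-to-da-bacom-with-content-megan.txt',
  combined.join('\n'),
);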
