This collection of Node.js scripts helps fetch, process, and analyze AEM publish logs from two environments (bacom and da-bacom) to identify content synchronization discrepancies and generate lists of paths for potential actions like deletion or re-importing.
- Node.js (v16 or later recommended)
- npm (usually included with Node.js)
- A valid `X-Auth-Token` for accessing the AEM Admin log API (https://admin.hlx.page/log/...).
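To sanity-check a token before running the scripts, it helps to see the shape of the request they make. This is a minimal sketch, assuming the usual `org`/`site`/`ref` path segments of the Admin log API; the `my-org` value and the `buildLogRequest` helper are hypothetical, not part of this repo:

```javascript
// Sketch: build a request to the AEM Admin log API.
// org/site/ref values below are hypothetical placeholders.
function buildLogRequest({ org, site, ref, from, token }) {
  const url = new URL(`https://admin.hlx.page/log/${org}/${site}/${ref}`);
  url.searchParams.set('from', from); // ISO 8601 start date
  return { url: url.toString(), headers: { 'X-Auth-Token': token } };
}

const req = buildLogRequest({
  org: 'my-org',
  site: 'bacom',
  ref: 'main',
  from: '2024-04-01T00:00:00.000Z',
  token: 'YOUR_VALID_TOKEN_HERE',
});
console.log(req.url);
```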
Install the necessary dependencies:

```sh
npm ci
```

The main configuration happens within the orchestrator script `main.js`:
- Authentication Token: Open `main.js` and replace the empty string value for `X-Auth-Token` within the `headers` object with your valid token.

  ```js
  const headers = {
    'X-Auth-Token': 'YOUR_VALID_TOKEN_HERE' // <-- EDIT THIS
  };
  ```
- Log Start Date: Adjust the `fromDateParam` variable in `main.js` to set the starting date (in ISO 8601 format) from which logs are fetched and compared.

  ```js
  const fromDateParam = 'YYYY-MM-DDTHH:mm:ss.sssZ'; // <-- EDIT THIS (e.g., '2024-04-01T00:00:00.000Z')
  ```
- (Optional) Processing Workflow: The `processingScripts` array in `main.js` defines which analysis scripts run automatically after logs are fetched. You can modify this list if needed.
To execute the standard analysis process, run the main orchestrator script:

```sh
node main.js
```

This script will perform the following steps sequentially:
- Fetch Logs: Uses `1-get-logs.js` to fetch logs for both the `da-bacom` and `bacom` sites (defined in `main.js`) starting from `fromDateParam`. Raw logs are saved to `data/da-bacom-logs.json` and `data/bacom-logs.json`. A copy, `data/bacom-logs-bulk.json`, is also created.
- Extract Last Actions: Runs `2-extract-last-published-and-deleted.js`. This processes the raw logs and creates JSON files detailing the last known action (publish or delete) for each unique path, saving them as:
  - `data/da-bacom-last-published.json`
  - `data/da-bacom-last-deleted.json`
  - `data/bacom-last-published.json`
  - `data/bacom-last-published-bulk.json`
  - `data/bacom-last-deleted.json`
- Extract Paths to Re-import: Runs `3-extract-paths-to-reimport-to-da-bacom.js`. This compares `bacom-last-published-bulk.json` against the DA last-action files to find paths potentially needing re-import to DA-Bacom. Outputs are saved to the `results/` directory:
  - `results/paths-to-reimport-to-da-bacom.txt`: List of paths.
  - `results/paths-to-reimport-to-da-bacom.json`: Detailed JSON for the paths above.
  - `results/paths-to-not-reimport-to-da-bacom.json`: Detailed JSON for paths that were superseded on DA-Bacom.
- Extract Paths to Delete: Runs `3-extract-paths-to-delete-from-da-bacom.js`. This compares `da-bacom-last-published.json` and `bacom-last-deleted.json` to identify paths published on DA that were subsequently deleted on Bacom since the from date. Outputs:
  - `results/paths-to-delete-from-da-bacom.txt`: List of paths potentially safe to delete from DA.
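At its core, the delete-detection step reduces to a timestamp comparison per path. A minimal sketch, assuming the last-action files map each path to an ISO timestamp (the real files may carry more detail per entry):

```javascript
// Sketch: return DA-published paths that Bacom deleted afterwards.
// The input shape (path -> ISO timestamp) is an assumption for illustration.
function pathsToDelete(daLastPublished, bacomLastDeleted) {
  return Object.keys(daLastPublished).filter((path) => {
    const deletedAt = bacomLastDeleted[path];
    // Keep the path only if a Bacom delete exists and happened
    // after the last DA publish.
    return deletedAt && new Date(deletedAt) > new Date(daLastPublished[path]);
  });
}

console.log(pathsToDelete(
  { '/a': '2024-04-01T00:00:00.000Z', '/b': '2024-04-05T00:00:00.000Z' },
  { '/a': '2024-04-02T00:00:00.000Z' },
));
// ['/a'] -- '/b' was never deleted on Bacom; '/a' was deleted after its DA publish
```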
- `data/` directory: Contains raw log files (`*-logs.json`) and processed files detailing the last known state (`*-last-published.json`, `*-last-deleted.json`).
- `results/` directory: Contains the final analysis outputs, typically lists of paths (`.txt`) or detailed reports (`.json`).
Besides the main workflow run by `main.js`, other scripts exist for specific analyses. They typically read files from `data/` or `results/` and output new files to `results/`. Run them manually using `node <script-name>.js`.
- `compare-live-html.js`: Reads paths from `results/temp.json` (NOTE: `temp.json` is generated by commented-out code in `3-extract-paths-to-reimport-to-da-bacom.js`), fetches the live `.plain.html` for each path from both the Bacom and DA-Bacom environments, performs a normalized comparison (ignoring certain URLs, attributes, etc.), logs detailed differences (text, classes, IDs), and outputs a list of paths with significant differences to `results/html-differences.txt`. This script is unfinished and very much a prototype.
- `find-potentially-overwritten-megan-paths.js`: Reads a specific list of paths from `data/paths-megan.txt`, compares their last Bacom publish time against the last DA action time (excluding actions by [email protected] by default; check the script for the correct user), and outputs paths potentially overwritten on DA at least 1 hour after the Bacom publish to `results/megan-paths-conflict.txt`. This was used because Megan had sent me a list of her own files to also import to DA, and I wanted to check whether they would override any newer content in DA.
- `generate-combined-urls.js`: Reads path lists from two specified `.txt` files (currently `results/paths-to-reimport-to-da-bacom.txt` and `data/paths-megan.txt`), combines them, removes duplicates, sorts the list, and outputs the unique paths to a specified file (currently `results/paths-to-reimport-to-da-bacom-with-content-megan.txt`). I used this to merge my import list with Megan's.