A Local Setup Guide for Lucene Index Inspection with Luke

Why Inspect Indexes with Luke?

Luke works like an X-ray for AEM's search index. It lets you see exactly how AEM stores and finds your content.

Luke is a standalone Java GUI for inspecting Lucene index files. It exposes segments, term dictionaries, and analyzer tokenization. Because direct filesystem access is blocked in AEMaaCS cloud environments, you must run Luke locally against the AEMaaCS SDK to debug custom oak:QueryIndexDefinition nodes. Validate index definitions locally before Cloud Manager deploys them to the remote Elasticsearch engine.

Version Identification

You need an extraction tool that exactly matches the Oak version inside your AEM SDK. A mismatched version may not be able to read the repository.

Each AEMaaCS SDK bundles a specific Jackrabbit Oak version. Check /system/console/bundles for org.apache.jackrabbit.oak-core (e.g., 1.90.0), then download the exactly matching oak-run-[version].jar from Maven Central. For Luke, download the legacy luke-all-4.7.0.jar. Do not use Luke 8+: the local AEM SegmentNodeStore generates Lucene 4.7.x index files, and Lucene releases that far ahead can no longer read that format.
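To make the download step concrete, the snippet below builds the Maven Central URL for a given oak-run version. It is a minimal sketch: the function name oak_run_url is made up for illustration, the URL assumes the standard Maven Central layout for the org.apache.jackrabbit group, and 1.90.0 is only the example version from the text above, so substitute whatever your bundle console reports.

```shell
#!/usr/bin/env bash
# Sketch: derive the Maven Central download URL for an oak-run release.
# oak_run_url is a hypothetical helper name, not part of any Oak tooling.

oak_run_url() {
  local version="$1"
  # Standard Maven layout: <group path>/<artifact>/<version>/<artifact>-<version>.jar
  printf 'https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/%s/oak-run-%s.jar\n' \
    "$version" "$version"
}
```

With the function loaded, curl -fLO "$(oak_run_url 1.90.0)" would fetch the JAR into the current directory. The -f flag makes curl fail visibly if that version does not exist.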

Extracting the Index

AEM stores large search files outside its main database. You must use a command to pull them together into a single folder.

Oak stores Lucene segments as external blobs in the FileDataStore. The segmentstore only holds lightweight blobId references. Extraction requires the --fds-path parameter to resolve these external binaries. Use exact version suffixes for out-of-the-box indexes (e.g., damAssetLucene-13). Never run oak-run tools against an active repository. Stop the local AEM instance first.

java -jar oak-run-1.90.0.jar index "C:\aem-sdk\crx-quickstart\repository\segmentstore" --index-dump --index-paths=/oak:index/damAssetLucene-13 --fds-path="C:\aem-sdk\crx-quickstart\repository\datastore"
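The command above uses Windows paths. On macOS or Linux the same dump can be assembled as in the sketch below. The helper names build_dump_cmd and repo_layout_ok, and the /opt/aem-sdk location in the usage note, are assumptions for illustration. Only the oak-run flags are taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the oak-run index dump command for a Unix-like shell.
# Helper names and any paths passed in are placeholders, not real install values.

# Print the dump command for a repository directory, index path, and oak-run version.
build_dump_cmd() {
  local repo_dir="$1" index_path="$2" oak_version="$3"
  printf 'java -jar oak-run-%s.jar index "%s/segmentstore" --index-dump --index-paths=%s --fds-path="%s/datastore"\n' \
    "$oak_version" "$repo_dir" "$index_path" "$repo_dir"
}

# Succeed only if both directories the dump reads from are present.
repo_layout_ok() {
  local repo_dir="$1"
  [ -d "$repo_dir/segmentstore" ] && [ -d "$repo_dir/datastore" ]
}
```

For example, after stopping AEM, repo_layout_ok /opt/aem-sdk/crx-quickstart/repository confirms the two folders exist, and build_dump_cmd /opt/aem-sdk/crx-quickstart/repository /oak:index/damAssetLucene-13 1.90.0 prints the command to review before running it.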

Launching Luke

AEM uses a custom language to save its search data. You must combine Luke with AEM's toolset to translate it.

Oak writes its Lucene index files with a custom codec named oakCodec, which ships with Oak rather than with Lucene. Standalone Luke cannot find that codec and throws an IllegalArgumentException for the missing SPI. To fix this, put both oak-run and luke-all on a single Java classpath at launch. In the Luke UI, point the index path at the generated \data sub-directory, not at its parent folder.

java -cp "luke-all-4.7.0.jar;oak-run-1.90.0.jar" org.getopt.luke.Luke
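The semicolon in that classpath is the Windows separator. Java on macOS and Linux expects a colon instead. The sketch below picks the separator from the operating system name and prints the matching launch command. The helper names classpath_sep and luke_cmd are hypothetical, and the OS detection is an assumption based on the prefixes that uname -s typically reports under Git Bash, MSYS, and Cygwin.

```shell
#!/usr/bin/env bash
# Sketch: build the Luke launch command with the right classpath separator.
# classpath_sep and luke_cmd are illustrative helper names.

# Print ';' for Windows-hosted shells, ':' otherwise.
# Takes an optional OS name; falls back to `uname -s` when none is given.
classpath_sep() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*|CYGWIN*) printf ';' ;;
    *) printf ':' ;;
  esac
}

# Print the launch command for a given oak-run version (and optional OS name).
luke_cmd() {
  local oak_version="$1" sep
  sep="$(classpath_sep "${2:-}")"
  # The classpath is quoted so the shell never interprets the separator itself.
  printf 'java -cp "luke-all-4.7.0.jar%soak-run-%s.jar" org.getopt.luke.Luke\n' \
    "$sep" "$oak_version"
}
```

Calling luke_cmd 1.90.0 with no second argument prints the command for the machine it runs on.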

Troubleshooting Cheat Sheet

| Error | Cause | Fix |
| --- | --- | --- |
| Dumped (0 B) | Partial path match. | Use the exact, versioned node name from CRXDE (e.g., damAssetLucene-13). |
| IllegalStateException: read external blob | Missing Datastore reference. | Append --fds-path pointing to repository/datastore. |
| IndexNotFoundException | Luke pointing to parent metadata folder. | Append \data to the path in the Luke UI browser. |
| IllegalArgumentException: oakCodec | Missing custom AEM codec SPI. | Launch Java using -cp with both the Luke and oak-run JARs. |
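The IndexNotFoundException case can be checked before opening Luke. The sketch below searches a dump folder for a directory named data that contains a Lucene segments file (a Lucene index directory always holds a file whose name starts with segments). The function name find_index_data_dir is hypothetical, and the snippet makes no assumption about where oak-run wrote the dump: you pass that folder in.

```shell
#!/usr/bin/env bash
# Sketch: locate the "data" directory that Luke should be pointed at.
# find_index_data_dir is an illustrative helper name.

# Print the first directory named "data" under the given root that holds a
# segments* file; return non-zero if there is none.
find_index_data_dir() {
  local dump_root="$1" found
  found="$(
    find "$dump_root" -type d -name data 2>/dev/null | while IFS= read -r dir; do
      for f in "$dir"/segments*; do
        if [ -f "$f" ]; then
          printf '%s\n' "$dir"
          break 2   # stop after the first matching directory
        fi
      done
    done
  )"
  if [ -z "$found" ]; then
    return 1
  fi
  printf '%s\n' "$found"
}
```

If the function prints nothing and returns non-zero, the dump most likely produced no index files, which matches the Dumped (0 B) case rather than a wrong path in Luke.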
