How to configure Nuxeo 5.2 to use JCR

This section covers fulltext indexing with Nuxeo 5.2 and Jackrabbit. Please read "Nuxeo 5.2 with JCR and PostgreSQL" first for generic information and PostgreSQL-specific configuration.

For general background on configuring fulltext indexing of attached documents in Jackrabbit, see http://wiki.apache.org/jackrabbit/IndexingConfiguration.

For Nuxeo, use the following steps:

  1. start Nuxeo once with the general Jackrabbit configuration described in other documents,
  2. stop Nuxeo,
  3. create the file $NUXEO/server/default/data/NXRuntime/repos/default/workspaces/default/indexing_configuration.xml as described below,
  4. modify the file $NUXEO/server/default/data/NXRuntime/repos/default/workspaces/default/workspace.xml as described below,
  5. start Nuxeo again.

The file indexing_configuration.xml should contain your chosen indexing configuration. You can use the following as an example:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:ecmdt="http://nuxeo.org/ecm/jcr/docs"
               xmlns:ecmft="http://nuxeo.org/ecm/jcr/fields">
  <aggregate primaryType="ecmdt:File">
    <include primaryType="ecmft:content">*</include>
    <include primaryType="ecmft:content">*/*/*</include>
  </aggregate>
</configuration>

The first include element matches all content nodes that are direct children of the main document's node. This covers the structure of the file schema, in which a child node ("content") has a property holding the binary data. The second include element covers the node structure used by the files schema, in which a child ("files") has children (one for each attached file in the list), each of which has a child ("file") with a property holding the binary data. In both cases, the lowest nodes, which are the ones we are interested in, have type "ecmft:content".
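
If other document types in your repository carry attachments, the same pattern can presumably be repeated with one aggregate element per type. The following is only a sketch: the type name "ecmdt:MyReport" is hypothetical, and it is assumed (not confirmed here) that the type uses the files schema and that its JCR node type lives in the same ecmdt namespace as ecmdt:File.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:ecmdt="http://nuxeo.org/ecm/jcr/docs"
               xmlns:ecmft="http://nuxeo.org/ecm/jcr/fields">
  <!-- Rule from the example above: File documents (file and files schemas) -->
  <aggregate primaryType="ecmdt:File">
    <include primaryType="ecmft:content">*</include>
    <include primaryType="ecmft:content">*/*/*</include>
  </aggregate>
  <!-- Hypothetical custom type, assumed to use only the files schema -->
  <aggregate primaryType="ecmdt:MyReport">
    <include primaryType="ecmft:content">*/*/*</include>
  </aggregate>
</configuration>
```

Each aggregate applies only to nodes of the primary type it names, so a type without its own aggregate element would not have its attachments indexed together with the document.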

The file workspace.xml should be modified to add the path to your indexing_configuration.xml file. Usually this means having:

<SearchIndex class="org.nuxeo.ecm.core.repository.jcr.jackrabbit.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,
                                org.apache.jackrabbit.extractor.MsExcelTextExtractor,
                                org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
                                org.apache.jackrabbit.extractor.PdfTextExtractor,
                                org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
                                org.apache.jackrabbit.extractor.RTFTextExtractor,
                                org.apache.jackrabbit.extractor.XMLTextExtractor"/>
  <param name="extractorPoolSize" value="2"/>
</SearchIndex>
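
To show where this element sits, here is a rough skeleton of workspace.xml. It assumes the usual Jackrabbit layout, in which SearchIndex is a child of the root Workspace element. The other children are shown only as a comment, since their content depends on your installation and should be left as generated at first start.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Workspace name="default">
  <!-- FileSystem and PersistenceManager elements generated at first start:
       leave them unchanged -->

  <SearchIndex class="org.nuxeo.ecm.core.repository.jcr.jackrabbit.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- The parameter to add: points at the file created in step 3 -->
    <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
    <!-- textFilterClasses and extractorPoolSize parameters as listed above -->
  </SearchIndex>
</Workspace>
```
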
Version 3.2 last modified by Thierry Martins on 24/06/2009 at 17:05

Creator: Florent Guillaume on 2009/04/26 18:21
© 2008-2010 Nuxeo