Extractor plugin modules are available in Confluence 1.4 and later versions.

Extractor plugins allow you to hook into the mechanism by which Confluence populates its search index.

Each time content is created or updated in Confluence, it is passed through a chain of extractors that assemble the fields and data that will be added to the search index for that content. By writing your own extractor you can add information to the index.
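The chain idea can be pictured with a small, self-contained model. This is only a sketch and not Confluence's implementation: SimpleExtractor, index and the "length" field are hypothetical names made up for illustration, and a plain map stands in for the Lucene document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractorChainSketch {
    /** Hypothetical stand-in for an extractor; not the real Confluence interface. */
    interface SimpleExtractor {
        void addFields(Map<String, String> fields, StringBuilder searchableText, String content);
    }

    /** Passes one piece of content through every extractor, sharing one field map and one text buffer. */
    static Map<String, String> index(List<SimpleExtractor> chain, StringBuilder searchableText, String content) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (SimpleExtractor extractor : chain) {
            extractor.addFields(fields, searchableText, content);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<SimpleExtractor> chain = new ArrayList<>();
        // First extractor contributes the body text to the default searchable text.
        chain.add((fields, text, content) -> text.append(content));
        // Second extractor contributes an extra, individually searchable field.
        chain.add((fields, text, content) -> fields.put("length", String.valueOf(content.length())));

        StringBuilder text = new StringBuilder();
        Map<String, String> fields = index(chain, text, "hello wiki");
        System.out.println(fields + " | " + text);
    }
}
```

Each extractor sees the same field map and text buffer, so a custom extractor only adds to what the others have already assembled.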

Extractor plugins can be used to extract the content from attachment types that Confluence does not support.

Confluence's internal search is built on top of the Lucene Java library. While familiarity with Lucene is not an absolute requirement for writing an extractor plugin, you'll need it to write anything more than the most basic of plugins.

Extractor Plugins

Here is an example atlassian-plugin.xml file containing a single search extractor:

<atlassian-plugin name="Sample Extractor" key="confluence.extra.extractor">
    ...
    <extractor name="Page Metadata Extractor" key="pageMetadataExtractor" 
               class="confluence.extra.extractor.PageMetadataExtractor" priority="1000">
        <description>Extracts certain keys from a page's metadata and adds them to the search index.</description>
    </extractor>
    ...
</atlassian-plugin>

As a general rule, extractors should have a priority below 1000. The exception is an extractor for a new attachment type, which should have a priority greater than 1000.

If you are not sure what priority to choose, use priority="900" for regular extractors and priority="1200" for attachment content extractors.

To see the priorities of the extractors that are built into Confluence, look in WEB-INF/classes/plugins/core-extractors.xml and WEB-INF/classes/plugins/attachment-extractors.xml. From Confluence 2.6.0 onward, these files are packaged inside confluence-2.6.0.jar; see Editing files within .jar archives if you are unfamiliar with the process.

The Extractor Interface

All extractors must implement the following interface:

package bucket.search.lucene;

import bucket.search.Searchable;
import org.apache.lucene.document.Document;

public interface Extractor
{
    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable);
}

Attachment Content Extractors

If you are writing an extractor that indexes the contents of a particular attachment type (for example, OpenOffice documents or Flash files), you should extend the abstract class bucket.search.lucene.extractor.BaseAttachmentContentExtractor. This class ensures that only one attachment content extractor successfully runs against any file (you can manipulate the priorities of attachment content extractors to make sure they run in the right order).

For more information, see: Attachment Content Extractor Plugins

An Example Extractor

The following example extractor is untested. It associates a set of page-level properties with the page in the index, both as part of the regular searchable text and as Lucene Text fields that can be searched individually, for example in a custom {abstract-search} macro.

package com.example.extras.extractor;

import bucket.search.lucene.Extractor;
import bucket.search.Searchable;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import com.atlassian.confluence.core.ContentEntityObject;
import com.atlassian.confluence.core.ContentPropertyManager;
import com.opensymphony.util.TextUtils;

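/**
 * Adds selected content properties to the search index, both as part of the
 * default searchable text and as individually searchable fields.
 */
public class ContentPropertyExtractor implements Extractor
{
    // Keys of the content properties that are copied into the index.
    public static final String[] INDEXABLE_PROPERTIES = {"status", "abstract"};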
    
    private ContentPropertyManager contentPropertyManager;
    
    public void addFields(Document document, StringBuffer defaultSearchableText, Searchable searchable)
    {
        if (searchable instanceof ContentEntityObject)
        {
            ContentEntityObject contentEntityObject = (ContentEntityObject) searchable;
            for (int i = 0; i < INDEXABLE_PROPERTIES.length; i++)
            {
                String key = INDEXABLE_PROPERTIES[i];
                String value = contentPropertyManager.getStringProperty(contentEntityObject, key);

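                // Skip properties that are missing or empty.
                if (TextUtils.stringSet(value))
                {
                    // Make the value findable by ordinary searches...
                    defaultSearchableText.append(value).append(" ");
                    // ...and by searches on a field named after the property key.
                    document.add(Field.Text(key, value));
                }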
            }
        }
    }

    public void setContentPropertyManager(ContentPropertyManager contentPropertyManager)
    {
        this.contentPropertyManager = contentPropertyManager;
    }
}
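
To register this example extractor, a descriptor along the following lines could be used. Treat it as a sketch: the plugin name, plugin key and module key are hypothetical, the class attribute matches the package and class name of the example above, and priority="900" follows the suggestion given earlier for regular extractors.

```xml
<atlassian-plugin name="Content Property Extractor" key="com.example.extras.extractor">
    <!-- plugin-info and any other modules omitted -->
    <extractor name="Content Property Extractor" key="contentPropertyExtractor"
               class="com.example.extras.extractor.ContentPropertyExtractor" priority="900">
        <description>Adds the "status" and "abstract" content properties to the search index.</description>
    </extractor>
</atlassian-plugin>
```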

Debugging

There's a really primitive Lucene index browser hidden in Confluence which may help when debugging. You'll need to tell it the filesystem path to your $conf-home/index directory.

http://yourwiki.example.com/admin/indexbrowser.jsp