How do I disable indexing of attachments

Users sometimes experience problems when indexing large Microsoft Excel or PowerPoint documents, and reindexing may produce Unknown Ptg warning messages, which are harmless. There is already a request to suppress these warnings from the re-indexing of unreadable documents by the POI library.

The error is usually not serious, but it can sometimes cause problems when large attachments are used. In that case, you may want to disable indexing of a particular type of document.

You can do this using either of the following methods.

Method 1: Using the Administration Console

You can disable the relevant modules of the Attachment Extractors or Office Connector plugins by going to Administration -> Configuration -> Plugins and disabling the plugin modules listed below:

To disable the indexing of PDF attachments, go to the Attachment Extractors plugin and disable the following module:
- PDF Content Extractor — For PDF attachments
To disable the indexing of Office attachments, go to the Office Connector plugin and disable the following modules as required:
- Word Content Extractor — For Word 97/2007 (.doc and .docx) attachments
- PowerPoint 97 Content Extractor — For PowerPoint 97 (.ppt) attachments
- PowerPoint 2007 Content Extractor — For PowerPoint 2007 (.pptx) attachments
- Excel 97 Content Extractor — For Excel 97 (.xls) attachments
- Excel 2007 Content Extractor — For Excel 2007 (.xlsx) attachments

The search query will ignore all attachments of the type corresponding to the disabled module.

Method 2: Editing the `atlassian-plugin.xml` files of plugins

You need to modify the content of the atlassian-plugin.xml file in the following JAR files and comment out the relevant file type extractor:

confluence-attachment-extractors-x.x.jar (for PDF) or
OfficeConnector-x.x.jar (for Office files)

Both of these JAR files are located in the confluence\WEB-INF\classes\classes\com\atlassian\confluence\setup\atlassian-bundled-plugins.zip file.

If you are unfamiliar with modifying JAR files, please refer to the Editing Files within JAR Archives document for further information.

You can identify file type extractors in atlassian-plugin.xml files by the occurrence of ContentExtractor in their key attribute.

Disabling the ContentExtractor for a particular file type excludes all files of that type from search.

The example below shows the pdfContentExtractor commented out, which prevents PDF attachments from being indexed.

<atlassian-plugin key="com.atlassian.confluence.plugins.attachmentExtractors" name="Attachment Extractors">
    <plugin-info>
        <description>This plugin extracts searchable text from various attachment types.</description>
        <version>1.1</version>
        <vendor name="Atlassian Pty Ltd" url="http://www.atlassian.com/"/>
    </plugin-info>

    <!--
    <extractor name="PDF Content Extractor" key="pdfContentExtractor" class="com.atlassian.bonnie.search.extractor.PdfContentExtractor" priority="1100">
        <description>Indexes contents of PDF files</description>
    </extractor>
    -->

</atlassian-plugin>

The following table shows the file type extractors in the atlassian-plugin.xml of the OfficeConnector-x.x.jar file. Comment out the extractor for each attachment type you do not want indexed:

Attachment type	File type extractor
Word 97/2007 (`.doc` and `.docx`)	<extractor name="Word Content Extractor" key="wordContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor" priority="1099"> <description>Indexes contents of Word 97/2007 files</description> </extractor>
PowerPoint 97 (`.ppt`)	<extractor name="PowerPoint 97 Content Extractor" key="ppt97ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.powerpoint.PowerPointTextExtractor" priority="1099"> <description>Indexes contents of PowerPoint 97 files</description> </extractor>
PowerPoint 2007 (`.pptx`)	<extractor name="PowerPoint 2007 Content Extractor" key="ppt2k7ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.powerpoint.PowerPointXMLTextExtractor" priority="1099"> <description>Indexes contents of PowerPoint 2007 files</description> </extractor>
Excel 97 (`.xls`)	<extractor name="Excel 97 Content Extractor" key="excel97ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor" priority="1099"> <description>Indexes contents of Excel 97 files</description> </extractor>
Excel 2007 (`.xlsx`)	<extractor name="Excel 2007 Content Extractor" key="excel2k7ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.excel.ExcelXMLTextExtractor" priority="1099"> <description>Indexes contents of Excel 2007 files</description> </extractor>
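
As an illustration, here is a minimal sketch of what part of the OfficeConnector atlassian-plugin.xml might look like with only the Excel 97 extractor commented out. It assumes the extractor definitions in your version match the table above. The rest of the file is omitted, and the explanatory comments are added here for illustration only.

```xml
<!-- Excel 97 extractor commented out: .xls attachments are no longer indexed -->
<!--
<extractor name="Excel 97 Content Extractor" key="excel97ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor" priority="1099">
    <description>Indexes contents of Excel 97 files</description>
</extractor>
-->

<!-- Excel 2007 extractor left active: .xlsx attachments are still indexed -->
<extractor name="Excel 2007 Content Extractor" key="excel2k7ContentExtractor" class="com.atlassian.confluence.extra.officeconnector.index.excel.ExcelXMLTextExtractor" priority="1099">
    <description>Indexes contents of Excel 2007 files</description>
</extractor>
```

XML comments cannot be nested, so if the element you are disabling already contains a comment, remove that inner comment first. Otherwise the file will no longer be well-formed.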
