How to exclude extracted text from attachments directory backup

This guide only applies to Confluence Server or Data Center 6.5 and later.

Purpose

When a file is uploaded to Confluence, its text is extracted and indexed so that people can search for the content of the file, not just the filename. From Confluence 6.5, we store this extracted text in the filesystem alongside the attached file, so that when the file needs to be reindexed (for example, when the page it's attached to changes), we don't need to re-extract its content. We only re-extract the content when a new version of the file is uploaded, and we store extracted text only for the latest version of the file, not for earlier versions.

The files containing the extracted text are generally quite small, but over time this can add up to a lot of additional files, and increase the total size of the attachments directory backup (part of your home / shared home directory). For this reason, you might want to exclude these files when backing up your attachments directory.
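Before deciding whether to exclude them, it can help to see how much the extracted text actually contributes. The following is a minimal sketch, not an official Atlassian tool: the function names are hypothetical, and it assumes only that the files end in .extracted_text (the extension described further down this page) and that standard find, cat, and wc are available.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Confluence).

# Print the number of *.extracted_text files under the given directory.
count_extracted_text() {
    find "$1" -type f -name '*.extracted_text' | wc -l | tr -d '[:space:]'
}

# Print the combined size, in bytes, of those files.
# This reads every file, which is acceptable because they are usually small.
extracted_text_bytes() {
    find "$1" -type f -name '*.extracted_text' -exec cat {} + | wc -c | tr -d '[:space:]'
}
```

For example, `count_extracted_text ./shared/attachments` followed by `extracted_text_bytes ./shared/attachments` reports the file count and total bytes for the attachments directory used in this article.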

Solution

To exclude these files from your backup, you can rely on the file extension, which is always .extracted_text. For example, the following Unix shell command backs up the attachments directory without including the extracted text files.

$ tar -czf attachments.tar.gz --exclude '*.extracted_text' ./shared/attachments
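The same idea can be wrapped in a small script that also checks the result. This is a sketch under stated assumptions rather than a supported backup procedure: the function names are hypothetical, it assumes a tar that supports --exclude with glob patterns (GNU tar and bsdtar both do), and it assumes the archive is written outside the directory being backed up.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Confluence).

# Archive the contents of a source directory into a gzip-compressed tarball,
# skipping every file whose name ends in .extracted_text.
# $1 = directory to back up, $2 = archive path (must be outside $1).
backup_attachments() {
    tar -czf "$2" --exclude '*.extracted_text' -C "$1" .
}

# Succeed (exit status 0) if the archive still lists any extracted text file.
archive_has_extracted_text() {
    tar -tzf "$1" | grep '\.extracted_text$' >/dev/null
}
```

A typical run would be `backup_attachments ./shared/attachments /backups/attachments.tar.gz`, where /backups is a placeholder destination, followed by `archive_has_extracted_text /backups/attachments.tar.gz && echo "exclusion did not work"`.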


Note: The extracted text files are also not included when you perform a space or site export.


Last modified on Feb 22, 2022
