4.7.3 Index management
4.7.3.1 Optimizing the index
If you have a lot of documents in the repository, you can speed up fulltext searches by optimizing the index.
This is done as follows.
First go to the JMX console, as explained elsewhere in this documentation.
Follow the link titled: Daisy:name=FullTextIndexer
Look for the operation named optimizeIndex and invoke it (by pressing the Invoke button).
Afterwards, choose "Return to MBean view". The IndexerStatus field will indicate that the index optimization is in progress. If you have a very small index, the optimization might already be finished by the time you get back to that page. On larger indexes, it can take quite a while.
4.7.3.2 Rebuilding the fulltext index
Rebuilding the fulltext index can be useful in a variety of situations:
- new or updated text extractors are available, and so you want to reindex the documents
- something went seriously wrong or you restored a backup and want to be sure the index is complete
- sometimes installing new Daisy versions requires rebuilding the index
The index can be rebuilt for all documents or for a selection of documents.
If you want to completely rebuild the index, you might first want to delete all the index files, which can be found in:
<daisydata dir>/indexstore
It is harmless to delete these, as the index can be rebuilt at any time. However, do not delete them while the repository server is running.
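The deletion step can also be scripted. The following is a minimal sketch, not part of Daisy: the function name clear_index_store is hypothetical, the data directory path must be supplied by you, and the script assumes the repository server has already been stopped (it does not check this).

```python
import shutil
from pathlib import Path


def clear_index_store(daisy_data_dir):
    """Delete everything inside <daisydata dir>/indexstore.

    Hypothetical helper: run it only while the repository server is stopped.
    Returns the names of the removed entries, sorted alphabetically.
    """
    index_store = Path(daisy_data_dir) / "indexstore"
    if not index_store.is_dir():
        raise FileNotFoundError(f"no indexstore directory at {index_store}")

    removed = []
    for entry in sorted(index_store.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            # Remove a subdirectory together with its contents.
            shutil.rmtree(entry)
        else:
            # Remove a plain file or a symlink.
            entry.unlink()
        removed.append(entry.name)
    return removed
```

The indexstore directory itself is kept and only its contents are removed, which matches the instruction to delete the index files rather than the directory.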
To trigger the rebuilding, go to the JMX console, as explained elsewhere in this documentation.
Follow the link titled: Daisy:name=FullTextIndexUpdater
In case you want to rebuild the complete index, invoke the operation named reIndexAllDocuments.
In case you want to rebuild the index only for some documents, you can use the operation reIndexDocuments. As its parameter, you need to enter a query that selects the documents to reindex. For example, to reindex all documents containing PDFs, you can use:
select id where HasPartWithMimeType('application/pdf')
What you put in the select-clause of the query doesn't matter.
After invoking the reindex operation, choose "Return to MBean view". Look at the attribute ReindexStatus. This shows the progress of the reindexing or, more correctly, of the scheduling of the reindexing (refresh the page to see its value being updated). It is important that this scheduling ends completely before the repository server is stopped, otherwise the reindexing will be incomplete.
If you have a large repository, ReindexStatus might show the message "Querying the repository to retrieve the list of documents to re-index" for a long time. This happens when the repository server has only just been started, because the documents still need to be loaded into the cache.
Note that the reindex operation only pushes reindex jobs onto the work queue of the fulltext indexer; the reindexing itself doesn't happen immediately.
To follow the status of the actual indexing, go back to the start page of the JMX console by choosing the "Server view" tab.
There, follow this link: org.apache.activemq:BrokerName=DaisyJMS,Type=Queue,Destination=fullTextIndexerJobs
Look for the attribute named QueueSize. This indicates the number of jobs waiting for the fulltext indexer to process. Each time you refresh this page, you will see this number go down (or up, if new jobs are being added faster than they are processed).
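Instead of refreshing the page by hand, the QueueSize check can be automated with a small polling loop. The sketch below is an illustration only: the function wait_for_queue_to_drain and its parameters are hypothetical, and how the QueueSize value is actually read (for example through a JMX client of your choice) is left to the read_queue_size callable that you supply.

```python
import time


def wait_for_queue_to_drain(read_queue_size, poll_seconds=10.0,
                            timeout_seconds=3600.0, sleep=time.sleep):
    """Poll read_queue_size() until it returns 0.

    read_queue_size is a caller-supplied function returning the current
    QueueSize value; this sketch does not talk to JMX itself.
    Returns the list of observed sizes, or raises TimeoutError.
    """
    observed = []
    waited = 0.0
    while True:
        size = read_queue_size()
        observed.append(size)
        if size == 0:
            return observed
        if waited >= timeout_seconds:
            raise TimeoutError(f"queue still holds {size} jobs after {waited} seconds")
        # The size may go up as well as down between polls, so keep waiting.
        sleep(poll_seconds)
        waited += poll_seconds
```

Start polling only after ReindexStatus shows that scheduling has finished. Before that point, a QueueSize of 0 may simply mean that the indexer is keeping up with the jobs scheduled so far.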
If you have a large index, it can be beneficial to optimize it after the reindexing has finished, as explained above.