
Showing posts with label Indexing. Show all posts

Friday, June 19, 2015

UCM: Running indexer from the command line

First, make sure that the standalone applets are working. If they are not, check Note 1265076.1. In short, the fix is to first reset the password of the sysadmin user, like this:
UPDATE USERS SET DPASSWORD='welcome1' WHERE DNAME='sysadmin';
UPDATE USERS SET DPASSWORDENCODING='' WHERE DNAME='sysadmin';

and then to set the JDBC connection in the System Properties application.

Create indexer.hda with the following content:
<?hda version="5.1.1 (build011203)" jcharset=Cp1252 encoding=iso-8859-1?>
#Full Collection Rebuild
@Properties LocalData
IdcService=CONTROL_SEARCH_INDEX
cycleID=rebuild
action=start
getStatus=1
fastRebuild=0
GetCurrentIndexingStatus=1
PerformProcessConversion=1
@end
<<EOF>>

Run the following command:
IdcCommand -f C:\Work\indexer.hda -u sysadmin -l C:\Work\indexer.log

NOTE: To perform a "fast" rebuild instead of a "full" collection rebuild, set fastRebuild=1 in the HDA file
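
The HDA file and command line above can also be produced from a small script, which makes switching between a full and a fast rebuild less error-prone. This is a minimal sketch, not part of UCM: the function names are hypothetical, the script only renders the file content and the argument list shown in this post, and it assumes IdcCommand is on the PATH when the command is actually run. Only the fastRebuild value is known to matter; the changing "#" title line is purely cosmetic.

```python
from pathlib import Path

# Content of indexer.hda as shown above; only the title comment and
# the fastRebuild value vary between a full and a fast rebuild.
HDA_TEMPLATE = """<?hda version="5.1.1 (build011203)" jcharset=Cp1252 encoding=iso-8859-1?>
#{title}
@Properties LocalData
IdcService=CONTROL_SEARCH_INDEX
cycleID=rebuild
action=start
getStatus=1
fastRebuild={fast}
GetCurrentIndexingStatus=1
PerformProcessConversion=1
@end
<<EOF>>
"""


def render_indexer_hda(fast: bool = False) -> str:
    """Return the HDA text for a full (default) or fast rebuild."""
    title = "Fast Collection Rebuild" if fast else "Full Collection Rebuild"
    return HDA_TEMPLATE.format(title=title, fast=1 if fast else 0)


def write_indexer_hda(path: str, fast: bool = False) -> None:
    """Write the HDA file using the charset declared in its header."""
    Path(path).write_text(render_indexer_hda(fast), encoding="cp1252")


def build_idccommand_args(hda_path: str, user: str, log_path: str) -> list:
    """Build the IdcCommand argument list used in this post."""
    return ["IdcCommand", "-f", hda_path, "-u", user, "-l", log_path]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) on a machine where IdcCommand is installed.
    print(" ".join(build_idccommand_args(
        r"C:\Work\indexer.hda", "sysadmin", r"C:\Work\indexer.log")))
```

Calling write_indexer_hda(r"C:\Work\indexer.hda", fast=True) would write the fast-rebuild variant to the path used in the command.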

NOTE: If you receive an error while executing the CONTROL_SEARCH_INDEX command, add one of the following entries to the intradoc.cfg file (the IP address shown is an example):
IdcCommandServerHost=10.141.107.1
or
IdcCommandServerHost=localhost

Thursday, June 18, 2015

UCM: Indexing

To configure UCM to place a .hcst file in the weblayout directory instead of a copy of the native file, set IndexVaultFile=true. This works only when the file is a passthru file (one that did not go through IBR). The .hcst file in the weblayout directory only points to the vault file.
IndexVaultFile=true
NOTE: IndexVaultFile=true was replaced with UseNativeFormatInIndex=true. Either of these configuration settings will force the indexer to index the native file.
NOTE: When using webless storage, use UseNativeFormatInIndex=true. IndexVaultFile=true should not be used at all.

If the above configuration variable is set to true but some documents should still be copied to the weblayout directory, list their formats in IndexVaultExclusionWildcardFormats:
IndexVaultExclusionWildcardFormats=*/hcs*|*/ttp|*/xsl|*/wml|*template*|*/jsp*|*/gif|*/png|*/pdf|*/doc*|*/msword|*/*ms-excel|text/plain
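
To get a feel for which formats such a pattern list would catch, the sketch below tests MIME-type strings against it. It assumes the entries are separated by "|" and behave like simple shell-style wildcards compared case-insensitively; that is an assumption about UCM's matching rules, not something confirmed here, and is_excluded is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# The pattern list from the configuration entry above.
EXCLUSIONS = ("*/hcs*|*/ttp|*/xsl|*/wml|*template*|*/jsp*|*/gif|*/png|"
              "*/pdf|*/doc*|*/msword|*/*ms-excel|text/plain")


def is_excluded(doc_format: str, patterns: str = EXCLUSIONS) -> bool:
    """Return True if doc_format matches any '|'-separated wildcard.

    Assumption: '*' matches any run of characters (including '/')
    and the comparison ignores case.
    """
    fmt = doc_format.lower()
    return any(fnmatchcase(fmt, p.lower())
               for p in patterns.split("|") if p)


if __name__ == "__main__":
    for fmt in ("application/pdf", "application/vnd.ms-excel",
                "text/plain", "image/jpeg"):
        print(fmt, is_excluded(fmt))
```

Under these assumptions, a format such as image/jpeg matches none of the patterns, so IndexVaultFile=true would still apply to it.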


If textexport times out while a large file is being indexed, you can increase the timeout. The default value is 15 seconds.
TextExtractorTimeoutInSec=60
IndexerTextExtractionGuardTimeout=60


By default, UCM does not index files larger than 10485760 bytes (10 MB). To change the limit, set the configuration entry MaxIndexableFileSize (20 MB in this example). Setting it to 0 (zero) stops full-text indexing but still allows use of Oracle Text Search. This is useful if you still need case-insensitive searches but do not need full-text indexing.
MaxIndexableFileSize=20971520
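
The value is given in bytes, so a limit expressed in megabytes has to be multiplied by 1024 * 1024. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of UCM.

```python
BYTES_PER_MB = 1024 * 1024  # 1 MB = 1048576 bytes


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in megabytes to bytes."""
    return megabytes * BYTES_PER_MB


def max_indexable_entry(megabytes: int) -> str:
    """Format the configuration line for a limit given in megabytes."""
    return f"MaxIndexableFileSize={mb_to_bytes(megabytes)}"


if __name__ == "__main__":
    print(mb_to_bytes(10))          # the default limit in bytes
    print(max_indexable_entry(20))  # the entry used in this example
```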


The TextIndexerFilterFormats parameter lists the formats that are text indexed. If a file's format is not on the list, textexport is not invoked and the file is indexed as metadata only.
TextIndexerFilterFormats=pdf,msword,ms-word,doc*,ms-excel,xls*,ms-powerpoint,powerpoint,ppt*,rtf,xml,msg,zip

For more in-depth information, see Doc ID 445871.1.