We have a growing index, at about 3 million items. Not humongous, but enough to cause crawl headaches.
So here are some lessons learnt:
- Use a dedicated web front end server for crawl, preferably one that doesn’t get user traffic. The caveat: in a farm where you load balance your web front ends, with multiple web apps that each have their own IP assigned in IIS, you have to set the crawl server value for your Index server in the Central Administration website (CA) to “Use all web front ends for crawling.” Say what? Yes. If you pick a dedicated crawl server in CA, SharePoint automatically adds host entries to the hosts file on your index server, grabbing the default IP from that server and associating it with all of your web apps. A timer job overwrites the values, so don’t bother trying to correct the IPs to point to your load balanced member IPs. So what you do is set CA to crawl all web front ends, which stops this host entry rewrite madness, and then go to the Index server and enter the correct IPs in the hosts file yourself (even though CA is set to crawl all front ends, the index server will use whatever is in the hosts file instead). So enjoy the added maintenance task of updating the hosts file every time you add a new host header site collection. To their credit, Microsoft documents this here (http://technet.microsoft.com/en-us/library/cc261810.aspx). Also, the MS enterprise search team mentions here that too many crawl hosts can starve your crawl, which reinforces the dedicated crawl server scenario.
- Know the stsadm operations osearch and spsearch. We had an admin reset all the search services in an attempt to resolve crawl issues, and he really screwed things up (he’s no longer an admin). He turned on WSS Help search through CA, but CA doesn’t let you pick an index location for WSS search (only Office Server Search, i.e. MOSS search, has this option in the CA interface), so we started getting alerts about the C:\ drive running out of space. I learned that you can only set this value through the stsadm -o spsearch -indexlocation switch. We have an automated farm install that shells out to stsadm, which set all of these search settings over a year ago when we built this farm. The guy who created the script knew about this and accounted for it; the admin did not.
- Make sure your search crawl account has Full Read only in the Policy for Web Application setting in CA. An admin set it to Full Control, which causes search to crawl unpublished items. Again, he’s not an admin any longer.
- If you’re fortunate enough to have a dedicated Index server, set Indexer Performance to Maximum. Keep an eye on this with some performance monitors and tune with crawler impact rules (covered in the MS search team’s post linked in item 1 above). Right now I have a crawler impact rule against all sites to request 64 documents at a time. The performance monitors are okay with this for now, but I’m keeping an eye on it.
- Make sure you’re using RAID 1+0 on all disks related to search (Index, Query, and DB). We designed the search DB to reside on its own dedicated disk, but someone came along and put a bunch of content DBs on the same disk, so we had to go through the extra maintenance task of moving the search DB file.
- Read the search team’s post about crawl performance (see link in item 1 above). The guidance in that post prompted me to stop all of our content source crawls, which were starving, and rerun a full crawl of each one individually. We have over 5000 site collections in this farm and use only host header URL site collections for portals, so our content sources are broken out to crawl managed path based team sites in one web app, portals in 4 separate web apps, My Sites, and some individual site content crawls that have a higher SLA for crawl. The thought is that this should speed up future incremental crawls and will also help in configuring a better crawl schedule.
- On your dedicated crawl front end server, disable Internet Explorer Enhanced Security Configuration. I was getting thousands of errors with the message “An unrecognized http status was received…”. The MS KB says to turn off proxy server settings in IE, but we don’t have proxy server settings in IE. We disable IE Enhanced Security on all of our WFE servers anyway, but this one snuck by. I disabled it through Add/Remove Programs > Windows Components, ran a new full crawl on an 850,000 item content source, and watched errors go from over 20,000 to 1000, with 1.3 million items now attributed to this content source.
- Be careful with metadata property mappings. Someone added a mapping that made the “This List” contextual search stop functioning properly. MS KB968476 addresses this. However, I think this KB is missing steps. If the KB doesn’t correct the search issue, perform the extra step of going to Metadata Properties, browsing to the “Path” managed property, and removing all mappings except for Basic:9 (text).
- Don’t stop search services or add a new query server while a crawl is running. Pause or stop the crawl before making changes to search configurations.
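The hosts file chore from the first item is the kind of thing worth scripting. Here is a minimal Python sketch, not something we run in production: it takes the text of a hosts file plus a map of host header names to the IPs you want, strips any existing entries for those names, and appends fresh ones. The function name `update_hosts`, the sample host names, and the IPs are all made up for illustration. It only transforms text; reading and writing the real file on the index server (normally %SystemRoot%\System32\drivers\etc\hosts) is left to you.

```python
def update_hosts(hosts_text, mappings):
    """Return hosts_text with exactly one entry per host name in mappings.

    mappings is a dict of host name -> IP address (hypothetical values
    below). Existing entries for those names are removed, then new
    entries are appended. Everything else is left as it was.
    """
    wanted = {name.lower(): ip for name, ip in mappings.items()}
    kept_lines = []

    for line in hosts_text.splitlines():
        # Split off any trailing comment so it is not parsed as a host name.
        body, sep, comment = line.partition("#")
        fields = body.split()
        if len(fields) < 2:
            # Blank line or comment-only line: keep as-is.
            kept_lines.append(line)
            continue

        ip, names = fields[0], fields[1:]
        remaining = [n for n in names if n.lower() not in wanted]

        if len(remaining) == len(names):
            # Line does not mention any managed host name: keep untouched.
            kept_lines.append(line)
        elif remaining:
            # Line mixes managed and unmanaged names: keep only the unmanaged.
            rebuilt = ip + "\t" + " ".join(remaining)
            if sep:
                rebuilt += " #" + comment
            kept_lines.append(rebuilt)
        # Otherwise the line held only managed names and is dropped.

    # Append the managed entries in a stable order so reruns give the same file.
    for name in sorted(wanted):
        kept_lines.append(wanted[name] + "\t" + name)

    return "\n".join(kept_lines) + "\n"


# Hypothetical example: point a host header site collection at its
# load balanced member IP instead of the default IP.
sample = (
    "# sample hosts file\n"
    "127.0.0.1\tlocalhost\n"
    "10.0.0.5\tportal.example.com teams.example.com\n"
)
print(update_hosts(sample, {"portal.example.com": "10.0.1.20"}))
```

Because the managed entries are always rewritten the same way, running it twice with the same map produces the same file, so it is safe to rerun every time a new host header site collection is added.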
If you get a chance, check out the Microsoft Enterprise Search Team Blog (http://blogs.msdn.com/enterprisesearch/).