Wednesday, July 15, 2009

Points about SharePoint Search

Have a growing index – at about 3 million items – not humongous but enough to cause crawl headaches.

So here are some lessons learnt:

  1. Use a dedicated web front end server for crawl, preferably one that doesn’t get user traffic.  The caveat: in a farm where you load balance your web front ends, with multiple web apps that each have their own IP assigned in IIS, you have to set the crawl server value for your Index server in the Central Administration website (CA) to “Use all web front ends for crawling.”  Say what?  Yes.  If you pick a dedicated crawl server in CA, SharePoint automatically adds host entries to the hosts file on your index server, grabbing the default IP from that server and associating it with all of your web apps (a timer job overwrites the values, so don’t bother trying to correct the IPs to point to your load balanced member IPs).  So what you do is set CA to crawl all web front ends, which stops this host entry re-write madness, and then go to the Index server and enter the correct IPs in the hosts file yourself (even though CA is set to crawl all front ends, the index server will use whatever is in the hosts file instead).  So enjoy the added maintenance task of updating the hosts file every time you add a new host header site collection.  To their credit, Microsoft documents this here (http://technet.microsoft.com/en-us/library/cc261810.aspx).  Also, the MS enterprise search team mentions here that too many crawl hosts can starve your crawl, which reinforces the dedicated crawl server scenario.
  2. Know the stsadm commands osearch and spsearch.  We had an admin reset all the search services in an attempt to resolve crawl issues, and he really screwed things up (he’s no longer an admin).  He turned on WSS Help search through CA, but CA doesn’t let you pick an Index location for WSS search (only Office Server Search, i.e. MOSS search, has this option in the CA interface), so we started getting alerts about the C:\ drive running out of space.  I learned that you can only set this value through the stsadm -o spsearch -indexlocation switch.  We have an automated farm install that shells out to stsadm, which set all of these search settings over a year ago when we built this farm, so the guy that created the script knew about this and accounted for it.  The admin did not.
  3. Make sure your search crawl account has FULL READ only in the Policy for Web Application setting in CA.  An admin set it to FULL CONTROL, which causes search to crawl unpublished items.  Again, he’s not an admin any longer.
  4. If you’re fortunate enough to have a dedicated Index server, set Indexer Performance to Maximum.  Keep an eye on this with some performance monitors and tune with crawler impact rules (part of MS search team’s post from link in item 1 above).  Right now I have a crawler impact rule against all sites to Request 64 documents at a time.  Performance monitors okay with this for now but I’m keeping an eye on this.
  5. Make sure you’re using RAID 1+0 on all disks related to search (Index, Query, and DB).  We designed the search DB to reside on its own dedicated disk, but someone came along and put a bunch of content DBs on the same disk, so we had to go through the extra maintenance task of moving the search DB file.
  6. Read the search team’s post about crawl performance (see link in item 1 above).  The guidance in that post prompted me to stop all of our content source crawls, which were starving, and rerun a full crawl of each one individually.  (We have over 5000 site collections in this farm and use only host header URL site collections for portals, so we have content sources broken out to crawl managed path based team sites in one web app, portals in 4 separate web apps, my sites, and some individual site content crawls that have a higher SLA for crawl.)  The thought is that this should speed up future incremental crawls and will also help in configuring a better crawl schedule.
  7. On your dedicated crawl front end server, disable Internet Explorer Enhanced Security.  Was getting thousands of errors with message “An unrecognized http status was received…”  MS KB says to turn off proxy server settings in IE – we don’t have proxy server settings in IE.  We disable IE Enhanced security on all of our WFE servers anyway but this one snuck by.  I disabled through Add/Remove Programs > Windows Components, did a new, full crawl on an 850,000 item content source and watched errors go from over 20,000 to 1000 with 1.3 million items now attributed to this content source.
  8. Be careful with metadata property mappings.  Someone added a mapping that made the “This list..” contextual search stop functioning properly.  MS KB968476 addresses this.  However, I think this KB is missing steps.  If the KB doesn’t correct the search issue, perform the extra step of going to Metadata Properties, browsing to the “Path” Managed Property, and removing all mappings except for Basic:9 (text).
  9. Don’t stop search services or add a new query server while a crawl is running.  Pause or stop the crawl before making changes to search configurations.
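
Since item 1 leaves you hand-maintaining the hosts file on the index server, here is a minimal sketch of how that chore could be scripted. The host names, IP addresses, and the "# crawl-target" marker are all hypothetical placeholders, not values from my farm; the idea is simply to tag the lines the script owns so they can be regenerated each time a host header site collection is added.

```python
# Hypothetical mapping of crawled host names to load balanced member IPs.
# Replace these placeholders with the values for your own farm.
CRAWL_TARGETS = {
    "portal1.example.com": "10.0.0.11",
    "portal2.example.com": "10.0.0.12",
    "teams.example.com": "10.0.0.13",
    "my.example.com": "10.0.0.14",
}

# Trailing comment used to recognize lines managed by this script (an assumption).
MARKER = "# crawl-target"


def render_hosts_entries(targets):
    """Return one hosts-file line per host name, sorted for stable diffs."""
    return [f"{ip}\t{name}\t{MARKER}" for name, ip in sorted(targets.items())]


def merge_hosts(existing_text, targets):
    """Drop previously managed lines and append the current set of entries."""
    kept = [
        line
        for line in existing_text.splitlines()
        if not line.rstrip().endswith(MARKER)
    ]
    return "\n".join(kept + render_hosts_entries(targets)) + "\n"


if __name__ == "__main__":
    # Sample content standing in for the real hosts file on the index server.
    sample = (
        "127.0.0.1\tlocalhost\n"
        "10.0.0.99\tportal1.example.com\t# crawl-target\n"
    )
    print(merge_hosts(sample, CRAWL_TARGETS), end="")
```

The sketch only builds the new file content; writing it back to the hosts file (and doing so with CA set to crawl all web front ends, so the timer job leaves it alone) is left to whatever deployment process you already use.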

If you get a chance, check out the Microsoft Enterprise Search Team Blog (http://blogs.msdn.com/enterprisesearch/)

Thursday, July 9, 2009

SharePoint Load Balancing with F5 BIG-IP LTM

NOTE: Since originally posting, I realized you may want to do this by binding to individual ports in IIS instead of individual IPs (saves the need for all of those extra IPs and steps to add and configure as presented below, i.e. the members can be any port but the VIP would still be port 80)

In the MOSS 2007 deployments a co-worker and I designed for my company, the F5 BIG-IP Local Traffic Manager is the network device used to load balance Web Applications.  In the F5 configuration (sample below), each client facing MOSS Web Application requires a VIP (Virtual IP Address).  The VIP has several host (AKA member) server IP addresses associated to it to make up a Virtual Server Pool.  The VIP is what we associate to the end-user URL in DNS.

In the sample, you see the MOSS Web Front Ends (WFE) listed as Hosts.  One WFE host can be included as a member in many Virtual Server Pools; however, one WFE host IP address cannot be a member of more than one Virtual Server Pool (Huh?..You may need to re-read that last sentence a few times – graphic at bottom of post should help).

So for the WFE hosts, get as many IP addresses as there are MOSS Web Applications (consider getting even more in case the need to add or extend web apps in the future arises…it will).  The IPs are added to the Network Adapter on each WFE (steps with screenshots here).  Once the IPs are added to the Network Adapter, they become available in IIS so a unique IP can be assigned to each IIS Website (Note:  While you’re in the IIS website properties, remove the host header from IIS website that was created when you first added the web app, otherwise, you’ll never be able to reach host header site collections created in this web app by adding a DNS alias to the VIP – don’t worry about this if you’re going with managed path based sites…I’ve said too much already…moving on).  As an example, in the figure below, there are 4 client facing web applications (Portal 1, Portal 2, Team Sites, and My Sites) and each of the 5 WFE Hosts has 4 unique IP addresses.
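
To make the IP bookkeeping concrete, here is a rough sketch that hands out one IP per WFE host and web application pair and then checks that no member IP ends up in more than one Virtual Server Pool. The host names, the starting address 10.0.1.10, and the function names are made up for illustration; your network team will hand you the real ranges.

```python
from ipaddress import ip_address
from itertools import product


def allocate_member_ips(hosts, web_apps, first_ip):
    """Assign one unique IP per (host, web app) pair, counting up from first_ip."""
    start = ip_address(first_ip)
    pairs = product(hosts, web_apps)
    return {pair: str(start + offset) for offset, pair in enumerate(pairs)}


def pools_by_web_app(allocation):
    """Group member IPs into one Virtual Server Pool per web application."""
    pools = {}
    for (_host, app), ip in allocation.items():
        pools.setdefault(app, []).append(ip)
    return pools


def assert_no_shared_ips(pools):
    """Raise if any member IP appears in more than one pool."""
    seen = {}
    for app, members in pools.items():
        for ip in members:
            if ip in seen:
                raise ValueError(f"{ip} is in both {seen[ip]!r} and {app!r}")
            seen[ip] = app


if __name__ == "__main__":
    hosts = ["WFE1", "WFE2", "WFE3", "WFE4", "WFE5"]  # hypothetical names
    web_apps = ["Portal 1", "Portal 2", "Team Sites", "My Sites"]
    allocation = allocate_member_ips(hosts, web_apps, "10.0.1.10")
    pools = pools_by_web_app(allocation)
    assert_no_shared_ips(pools)
    for app, members in pools.items():
        print(app, members)
```

With 5 hosts and 4 web applications this allocates 20 member IPs, 4 per host, which matches the example in the figure.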

Further, the design takes load weights into consideration.  For high availability, the deployments consist of multiple WFEs.  With the multiple WFEs, you can distribute the load to maximize performance and availability.  MOSS installs all web applications and their corresponding application pool on every WFE host within the farm. However, to maximize CPU, not every application pool runs on every host (requires configuration of application pools performance settings – Joel Oleson has guidance here – If you’re dealing with load weights using multiple front ends and web apps, this can get complex -- I’ll post my PowerShell script for configuring IIS on multiple servers with multiple websites by reading in settings from a CSV file in a future post).

Using load weights, traffic load is distributed only on a select number of hosts allowing hosts to concentrate processes on a few application pools instead of all application pools.  The design leaves us with spare IPs.  This provides flexibility to add or remove WFEs to the Virtual Server Pools as needed.
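
As a rough illustration of the load weight idea, the sketch below takes a table of weights per web application and works out which application pools each host actually needs to run, plus each host's share of a web application's traffic. The weights and host names are invented, and treating the share as weight divided by total weight is a simplification I am assuming here; how the F5 really distributes traffic depends on the load balancing method configured on the pool.

```python
# Hypothetical weights: web app -> {host: weight}. A weight of 0 means the
# host receives no traffic for that web app.
POOL_WEIGHTS = {
    "Portal 1": {"WFE1": 3, "WFE2": 1, "WFE3": 0},
    "Team Sites": {"WFE1": 0, "WFE2": 1, "WFE3": 1},
    "My Sites": {"WFE1": 0, "WFE2": 0, "WFE3": 1},
}


def traffic_share(weights):
    """Return each active host's fraction of traffic, proportional to weight."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one host needs a non-zero weight")
    return {host: w / total for host, w in weights.items() if w > 0}


def app_pools_per_host(pool_weights):
    """List the web apps (and so application pools) each host must serve."""
    result = {}
    for app, weights in pool_weights.items():
        for host, weight in weights.items():
            if weight > 0:
                result.setdefault(host, []).append(app)
    return result


if __name__ == "__main__":
    for app, weights in POOL_WEIGHTS.items():
        print(app, traffic_share(weights))
    print(app_pools_per_host(POOL_WEIGHTS))
```

A table like this is also a handy input for deciding which application pools can be left idle on a given host.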

 

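[Figure "lb": load balancing diagram of the WFE hosts, their IP addresses, and the Virtual Server Pools]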

For more information about load balancing and F5, the F5 folks have a SharePoint deployment guide here.  They also have a nice Load Balancing 101 paper here.

Friday, July 3, 2009

SharePoint and Novell eDirectory LDAP

Setting up an LDAP authentication source in SharePoint is well documented so I'm not going to reinvent the wheel here, but I will give some of the nuances of how I deployed it and glue together the documentation.

My company has invested in Novell eDirectory for its identity management deployment. With eDirectory, we're able to consolidate and sync multiple LDAPs so current users get a single sign-on experience. It also provides account self-provisioning and management for users via an externally facing website.

First the links:
  • Microsoft documents multiple authentication providers for SharePoint and setting up an LDAP membership provider with examples here.
  • Nick Kellett did a great job documenting every step required to connect the eDirectory LDAP here.
  • Wen He provided a piece Nick missed to expose the LDAP through the SharePoint people picker here.
  • My own blog details some steps to push out the web.config changes to all of your farm web apps here.

Now here's what I did:

  • Planned out the membership provider attributes to use in the web.config.
  • Downloaded and installed the free Softerra LDAP browser recommended in Nick Kellett's post on a Web Front End Server in my test Farm.
  • Obtained the Novell LDAP server name and port information from the team at my company that manages Novell eDirectory.
  • Entered server information and port into the LDAP Browser and started browsing objects.
  • Drilled down in the browser to the user container object, right clicked on the container, and took the LDAP path from the properties screen to use in the provider.


  • Started browsing the available attributes of users.
  • Decided on using the mail attribute for the login in the provider. Setting userNameAttribute='mail' means that when people use this form of authentication, they use their e-mail address as their username. We have both employees and non-employee content managers using this authentication to SharePoint sites in our DMZ. With this, our employees can log in using their e-mail address and the same password they use to log in to our network. External users can also log in with their e-mail address and a password that they set and maintain by visiting an externally facing site.
  • Tested and discovered that useDNAttribute="false" has to be set. This kind of renders some of the other attributes useless, but a value of "true" makes your entire provider useless.
  • My provider looks like this:
    <add name='IDM'
      type='Microsoft.Office.Server.Security.LDAPMembershipProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C'
      server='[ldap server name]'
      port='389'
      useSSL='false'
      useDNAttribute='false'
      userDNAttribute='cn'
      userNameAttribute='mail'
      userContainer='[Ldap path discovered using ldap browser]'
      userObjectClass='person'
      userFilter='(ObjectClass=*)'
      scope='Subtree'
      otherRequiredUserAttributes='sn,givenname,cn' />

  • The usage will be in a DMZ, so I had the firewall opened up on ports 389 and 636 (used for SSL) bi-directionally between the internal NIC IP addresses of all of the Web front end servers and the eDirectory LDAP server IP.
  • Through SharePoint Central Administration site, extended web apps to Custom zone (our default zone uses Windows Based Authentication).
  • Manually added the peoplepicker and provider values to the Central Admin site web.config.
  • Ran the PowerShell script detailed in my previous blog to update the web.config files for all of my portal web applications (see link above).
  • Updated the Zone provider for all the portals via a batch file -- for this, I threw in stsadm command like following for all the newly extended sites (used batch file to save my Operations team from having to perform same steps through Central Admin -- also, this prevents chance of them putting in wrong value):
  • stsadm -o authentication -url http://[extended site host header value] -type forms -membershipprovider [membership provider name] -allowanonymous
  • In Central Administration, added site collection administrator for one of the root portal sites (browsed through people picker and was able to select my e-mail address).
  • Opened IIS and browsed the new website, where I was prompted with the standard SharePoint forms login page. Entered my e-mail address and network password and successfully signed in.
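
Because one wrong attribute value can make the whole provider useless, a quick sanity check before pushing the web.config change out to every web app can save some pain. The sketch below is only an illustration: the list of attributes it requires and the checks it performs are my own assumptions based on the provider entry above, not an official schema, and the server name and userContainer value are placeholders.

```python
import xml.etree.ElementTree as ET

# Provider entry modeled on the one above; server and userContainer are
# placeholder values, not real ones.
PROVIDER_XML = """<add name='IDM'
    type='Microsoft.Office.Server.Security.LDAPMembershipProvider, Microsoft.Office.Server, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71E9BCE111E9429C'
    server='ldap.example.com'
    port='389'
    useSSL='false'
    useDNAttribute='false'
    userDNAttribute='cn'
    userNameAttribute='mail'
    userContainer='ou=users,o=example'
    userObjectClass='person'
    userFilter='(ObjectClass=*)'
    scope='Subtree'
    otherRequiredUserAttributes='sn,givenname,cn' />"""

# Attributes this sketch insists on (an assumption, not an official list).
REQUIRED = ("name", "type", "server", "port", "userNameAttribute", "userContainer")


def check_provider(xml_text):
    """Return a list of problems found in a membership provider <add> element."""
    element = ET.fromstring(xml_text)
    problems = []
    for attr in REQUIRED:
        if not element.get(attr):
            problems.append(f"missing attribute: {attr}")
    # In my testing against eDirectory, the provider only worked with "false".
    if element.get("useDNAttribute", "").lower() != "false":
        problems.append("useDNAttribute should be 'false'")
    if not element.get("port", "").isdigit():
        problems.append("port must be a number")
    return problems


if __name__ == "__main__":
    issues = check_provider(PROVIDER_XML)
    print("provider looks sane" if not issues else "\n".join(issues))
```

Running it against the entry you plan to deploy returns an empty list when nothing obvious is wrong; it does not contact the LDAP server, so a clean result still needs the sign-in test described above.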