Crowd Synchronisation against Active Directory times out with a load balancer in place
Scenario - Does this apply to you?
After upgrading to Crowd 2.7 or higher, you may notice that synchronisations against your Active Directory fail with timeouts similar to the following:
2014-12-02 15:29:20,185 scheduler_Worker-6 ERROR [atlassian.crowd.directory.DbCachingDirectoryPoller] Error occurred while refreshing the cache for directory [ 13467652 ].
com.atlassian.crowd.exception.OperationFailedException: Error looking up attributes for highestCommittedUSN
    at com.atlassian.crowd.directory.MicrosoftActiveDirectory.fetchHighestCommittedUSN(MicrosoftActiveDirectory.java:807)
    at com.atlassian.crowd.directory.ldap.cache.UsnChangedCacheRefresher.synchroniseAll(UsnChangedCacheRefresher.java:159)
    at com.atlassian.crowd.directory.DbCachingRemoteDirectory.synchroniseCache(DbCachingRemoteDirectory.java:1120)
    at com.atlassian.crowd.manager.directory.DirectorySynchroniserImpl.synchronise(DirectorySynchroniserImpl.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy39.synchronise(Unknown Source)
    at com.atlassian.crowd.directory.DbCachingDirectoryPoller.pollChanges(DbCachingDirectoryPoller.java:50)
    at com.atlassian.crowd.manager.directory.monitor.poller.DirectoryPollerJobRunner.runJob(DirectoryPollerJobRunner.java:93)
    at com.atlassian.scheduler.core.JobLauncher.runJob(JobLauncher.java:135)
    at com.atlassian.scheduler.core.JobLauncher.launchAndBuildResponse(JobLauncher.java:101)
    at com.atlassian.scheduler.core.JobLauncher.launch(JobLauncher.java:80)
    at com.atlassian.scheduler.quartz1.Quartz1Job.execute(Quartz1Job.java:32)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:223)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
Caused by: org.springframework.ldap.UncategorizedLdapException: Uncategorized exception occured during LDAP processing; nested exception is javax.naming.NamingException: LDAP response read timed out, timeout used:120000ms.; remaining name '/'
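The failing step is the read of highestCommittedUSN in MicrosoftActiveDirectory.fetchHighestCommittedUSN, and "timeout used:120000ms" is the JNDI LDAP read timeout. To reproduce that single lookup outside Crowd, a minimal sketch like the one below could be run from the Crowd server. It is a hypothetical diagnostic, not part of Crowd: the class name RootDseUsnProbe, the command-line arguments and the bind account are placeholders. It also assumes the connection URL carries no base DN, so that the empty name refers to the RootDSE.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical diagnostic, not part of Crowd: reads highestCommittedUSN from the
// RootDSE, the attribute Crowd was fetching when the timeout above occurred.
final class RootDseUsnProbe {

    // Builds a JNDI environment for a simple bind with an explicit read timeout.
    static Hashtable<String, String> buildEnv(String url, String bindDn, String password, long readTimeoutMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url); // no base DN, so "" below names the RootDSE
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        // JNDI read timeout; 120000 ms matches the value reported in the log above.
        env.put("com.sun.jndi.ldap.read.timeout", Long.toString(readTimeoutMs));
        return env;
    }

    // Returns the RootDSE highestCommittedUSN value, or null if it was not returned.
    static String readHighestCommittedUsn(Hashtable<String, String> env) throws NamingException {
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute usn = ctx.getAttributes("", new String[] {"highestCommittedUSN"})
                    .get("highestCommittedUSN");
            return usn == null ? null : String.valueOf(usn.get());
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws NamingException {
        // Usage: java RootDseUsnProbe.java ldap://host:389 bindDn password
        Hashtable<String, String> env = buildEnv(args[0], args[1], args[2], 120_000L);
        long start = System.nanoTime();
        String usn = readHighestCommittedUsn(env);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("highestCommittedUSN=" + usn + " (" + elapsedMs + " ms)");
    }
}
```

If this small read also stalls for the full 120 seconds, the problem lies in the network path rather than in the size of the synchronisation.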
Other applicable factors:
- There is a load balancer between Crowd and Active Directory
- The same setup, with no changes, worked before the upgrade (for example, one customer reported no problems with Crowd 2.6.4; problems only began appearing after the upgrade to 2.7+)
There are several possible causes, and addressing any one of them may or may not resolve the problem. Investigate each cause below in turn, moving on to the next one if the timeouts persist:
Has "Follow Referrals" been enabled?
The most common cause of these timeouts is having "Follow Referrals" enabled. The timeouts generally have one of two root causes:
- DNS for the domain is not valid, which causes the referral lookups to time out. See User Lookups Fail With PartialResultExceptions for more information.
- A large domain, particularly a partitioned one, can cause similar timeouts. Disabling the option prevents Crowd from following referrals into other partitions, which should shorten synchronisation time but may not return a complete result.
If "Follow Referrals" is enabled, disable it and run the synchronisation again.
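To gauge how much referral chasing costs, the same search can be timed with referrals followed and with referrals ignored. The sketch below assumes that Crowd's "Follow Referrals" option corresponds to the standard JNDI property java.naming.referral ("follow" versus "ignore"); the class name ReferralTiming, the arguments and the user filter are illustrative placeholders, not Crowd's own configuration.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.PartialResultException;
import javax.naming.SizeLimitExceededException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical diagnostic: times the same user search with referrals followed and ignored.
final class ReferralTiming {

    static Hashtable<String, String> buildEnv(String url, String bindDn, String password, boolean followReferrals) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        // Context.REFERRAL is "java.naming.referral"; JNDI accepts "follow", "ignore" or "throw".
        env.put(Context.REFERRAL, followReferrals ? "follow" : "ignore");
        env.put("com.sun.jndi.ldap.read.timeout", "120000");
        return env;
    }

    // Counts matching entries under baseDn (may be truncated, see the catch below).
    static int countUsers(Hashtable<String, String> env, String baseDn) throws NamingException {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setReturningAttributes(new String[] {"sAMAccountName"});
        DirContext ctx = new InitialDirContext(env);
        int count = 0;
        try {
            NamingEnumeration<SearchResult> results =
                    ctx.search(baseDn, "(&(objectCategory=Person)(sAMAccountName=*))", controls);
            try {
                while (results.hasMore()) {
                    results.next();
                    count++;
                }
            } catch (PartialResultException | SizeLimitExceededException e) {
                // Unfollowed AD continuation references, or AD's limit on unpaged
                // results (1000 entries by default); the count so far is kept.
            }
        } finally {
            ctx.close();
        }
        return count;
    }

    public static void main(String[] args) throws NamingException {
        // Usage: java ReferralTiming.java ldap://host:389 bindDn password baseDn
        for (boolean follow : new boolean[] {true, false}) {
            long start = System.nanoTime();
            int users = countUsers(buildEnv(args[0], args[1], args[2], follow), args[3]);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("follow=" + follow + ": " + users + " entries in " + elapsedMs + " ms");
        }
    }
}
```

A large gap between the two timings, or a timeout only in the "follow" run, suggests that referrals are the bottleneck.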
Restricting the LDAP Scope
Use a narrower filter to limit the LDAP search to a single, small OU; the smaller the better. If synchronisation then succeeds, the size of the searched scope is likely contributing to the timeouts.
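A sketch of that comparison, timing the same search at two scopes. It assumes a domain-wide base DN and a single OU DN are supplied on the command line (both hypothetical), and the filter values are illustrative rather than Crowd's configuration. The andFilter helper shows how an extra clause narrows a filter with an LDAP AND.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.PartialResultException;
import javax.naming.SizeLimitExceededException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

// Hypothetical diagnostic: times one user search over the whole domain and over a single OU.
final class ScopeComparison {

    // Wraps a base filter and an optional extra clause in an LDAP AND.
    static String andFilter(String base, String extra) {
        if (extra == null || extra.isEmpty()) {
            return base;
        }
        return "(&" + base + extra + ")";
    }

    // Runs a subtree search under baseDn and returns the elapsed milliseconds.
    static long timeSearch(Hashtable<String, String> env, String baseDn, String filter) throws NamingException {
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setReturningAttributes(new String[] {"sAMAccountName"});
        long start = System.nanoTime();
        DirContext ctx = new InitialDirContext(env);
        try {
            NamingEnumeration<SearchResult> results = ctx.search(baseDn, filter, controls);
            try {
                while (results.hasMore()) {
                    results.next();
                }
            } catch (PartialResultException | SizeLimitExceededException e) {
                // Unfollowed referrals, or AD's unpaged result limit (1000 entries by default).
            }
        } finally {
            ctx.close();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws NamingException {
        // Usage: java ScopeComparison.java ldap://host:389 bindDn password domainDn ouDn
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, args[0]);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, args[1]);
        env.put(Context.SECURITY_CREDENTIALS, args[2]);
        env.put("com.sun.jndi.ldap.read.timeout", "120000");

        String filter = andFilter("(objectCategory=Person)", "(sAMAccountName=*)");
        System.out.println("whole domain: " + timeSearch(env, args[3], filter) + " ms");
        System.out.println("single OU:    " + timeSearch(env, args[4], filter) + " ms");
    }
}
```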
Does upgrading to Crowd 2.8.2 or higher resolve the problem?
Crowd 2.8.2 introduced some important performance improvements, particularly for large directories. See the Crowd 2.8.2 Release Notes for more information.
Can you bypass the load balancer?
Bypass the load balancer and connect directly to Active Directory. Doing so, even temporarily, will either confirm the load balancer as the cause of the problem or rule it out. Some customers have reported problems with certain load balancer products or configurations after upgrading to Crowd 2.7 or higher, even though the same configuration works without any problems in Crowd 2.6. Some customers have reported success using HAProxy as a load balancer in front of Active Directory.
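One way to compare the two paths is to run the same small request against the load balancer address and against a domain controller directly. The sketch below repeats the RootDSE read of highestCommittedUSN (the lookup that fails in the log above) with explicit connect and read timeouts. It is hypothetical: the class name EndpointComparison and the URLs are placeholders, and it relies on Active Directory permitting anonymous reads of the RootDSE.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical diagnostic: the same anonymous RootDSE read against each endpoint,
// e.g. the load balancer address first and then a domain controller directly.
final class EndpointComparison {

    static final class Result {
        final String url;
        final boolean ok;
        final long elapsedMs;
        final String detail;

        Result(String url, boolean ok, long elapsedMs, String detail) {
            this.url = url;
            this.ok = ok;
            this.elapsedMs = elapsedMs;
            this.detail = detail;
        }

        @Override
        public String toString() {
            return url + " -> " + (ok ? "OK" : "FAILED") + " after " + elapsedMs + " ms: " + detail;
        }
    }

    // Connects anonymously and reads highestCommittedUSN; never throws, reports failures instead.
    static Result probe(String url, long connectTimeoutMs, long readTimeoutMs) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put("com.sun.jndi.ldap.connect.timeout", Long.toString(connectTimeoutMs));
        env.put("com.sun.jndi.ldap.read.timeout", Long.toString(readTimeoutMs));
        long start = System.nanoTime();
        try {
            DirContext ctx = new InitialDirContext(env);
            try {
                Attribute usn = ctx.getAttributes("", new String[] {"highestCommittedUSN"})
                        .get("highestCommittedUSN");
                String detail = usn == null ? "attribute not returned" : "highestCommittedUSN=" + usn.get();
                return new Result(url, true, elapsedSince(start), detail);
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            return new Result(url, false, elapsedSince(start), e.toString());
        }
    }

    static long elapsedSince(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        // Usage: java EndpointComparison.java ldap://lb.example.com:389 ldap://dc1.example.com:389
        for (String url : args) {
            System.out.println(probe(url, 10_000L, 30_000L));
        }
    }
}
```

A read that fails or is markedly slower through the load balancer than against the domain controller points at the load balancer rather than at Crowd or Active Directory.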
Please note that the setup or configuration of a load balancer is not covered by Atlassian Support Offerings.
We are currently unsure why some load balancer configurations time out during LDAP searches. If you have information about timeouts against Active Directory through a load balancer after upgrading to Crowd 2.7, we'd love to hear from you!
Use one of the following workarounds:
- Bypass the load balancer if possible
- Use an alternative load balancer, such as HAProxy