Thursday 22 March 2007

Crawling like Ants

During experiments yesterday I discovered that running Java processes from inside the Ant build tool can cause unforeseen performance problems. I had written an Ant task to build the browse indices for my DSpace system. This involves producing 9 separate indices for around 80,000 records, and is not a rapid process at the best of times. Previous executions of this code have yielded index rates of approximately 10 to 15 items per second, which I was pretty happy with. Running it in Ant, though, dropped my performance right down to a low of 1 item every 2 seconds! After this had run for several hours I killed it and tried again directly from the command line; performance came straight back up to its usual standard.
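To put those rates in perspective, here is a rough back-of-the-envelope sketch in Java. It assumes that an "item" in the rates above means one of the roughly 80,000 records, and that the rate stays constant over the whole run; the class and method names are made up for illustration and are not part of DSpace.

```java
import java.util.Locale;

public class IndexEta {

    /** Hours needed to index the given number of records at a constant rate. */
    static double hours(int records, double itemsPerSecond) {
        return records / itemsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        int records = 80_000; // approximate record count from the post

        // Usual command-line rates: 10 to 15 items per second.
        System.out.println(String.format(Locale.ROOT,
                "at 15 items/s:  %.1f hours", hours(records, 15.0)));
        System.out.println(String.format(Locale.ROOT,
                "at 10 items/s:  %.1f hours", hours(records, 10.0)));

        // Worst rate seen under Ant: 1 item every 2 seconds = 0.5 items/s.
        System.out.println(String.format(Locale.ROOT,
                "at 0.5 items/s: %.1f hours", hours(records, 0.5)));
    }
}
```

On those assumptions a full run takes somewhere between one and a half and two and a quarter hours at the usual rate, against roughly 44 hours at the rate I saw under Ant.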

So, what is going on? Here are some details:

- while indexing in Ant, the box was under almost no load - no physical memory shortages, disk I/O bottlenecks, etc.

- I toyed with the idea that memory allocation to the JVM was the problem, but I've seen the indexer run with different memory allocations, and it has so far never caused a speed problem (just OutOfMemory errors)

Answers on a postcard. Or in a comment.

7 comments:

Jim said...

Gaaargh! Ant is not an execution environment!

Jim said...

On a more serious note - how low have you pushed the memory allocation to the indexing process? I've no idea what the default for processes running under Ant is.

Perhaps Ant defaults its sub-processes to a low priority? Grabbing at straws, really.

Richard said...

Jim, you are right, of course, Ant is not an execution environment :) But. The initial data import and index process is an installation procedure, and I needed to provide a script that our system administrators can use which would build everything from scratch to the final thing at the flick of a command, as it were. Everything else was in Ant, so I figured why not.

The memory allocation for the index process is based on what you give the JVM at the start (so I have modified my dsrun to give me lots and lots of memory on the production environment). I don't expect that it is this, though, as I've run the importer with -Xmx256M and -Xmx2056M with the same performance. Low prioritising the thread seems possible, although there was barely anything else running on the box!

Jim said...

So are you running it directly as a java task or as an exec task to dsrun?

Richard said...

As a Java task:

<!-- build the browse indices -->
<java classname="org.dspace.browse.IndexBrowse"
      classpathref="build.class.path"
      fork="yes"
      failonerror="yes"
      maxmemory="2056m">
    <sysproperty key="log4j.configuration" value="file:${live}/config/log4j.properties"/>
    <sysproperty key="dspace.configuration" value="${dscfg}"/>
    <arg line="${flag}" />
</java>

Jim said...

OK. Had a quick look at the Java task in Ant. It looks like using fork='true' ends up using Runtime.exec.

1/ Perhaps your Runtime exec launches into some screwy shell that slows all the IO down to a crawl (not sounding v likely).

2/ Perhaps your indexing has actually finished, but Ant's watchdog thread that alerts Ant when the indexing has finished is failing to do so. This would explain the 0 processor load...

Have you tried it with fork=false?

Anonymous said...

Good read! Thanks!