Closed
Description
This issue was initially brought up by Mohamed Elsayed on the openwayback-dev group:
https://groups.google.com/forum/#!topic/openwayback-dev/Kv57MEzOAqw
What follows is quoted from the post above.
Under either OpenJDK IcedTea6 1.12.6 or Oracle JDK 1.8.0-b132, the first request through the ARCRecordingProxy (tested through the ARCUnwrappingProxy) after a Tomcat restart returns 'HTTP 504 Gateway Timeout', as seen in liveweb/arcs/live-.arc.gz. After that, every subsequent request stays at "connecting" for a very long time.
This was tested in Tomcat 6.0.35 on Debian 7.
This is the live web configuration (LiveWeb.xml):
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
  <bean name="8099" class="org.archive.wayback.liveweb.ARCRecordingProxy">
    <property name="arcCacheDir">
      <bean class="org.archive.wayback.liveweb.ARCCacheDirectory"
            init-method="init">
        <property name="arcDir" value="${wayback.basedir}/liveweb/arcs/" />
        <property name="arcPrefix" value="live" />
      </bean>
    </property>
    <property name="cacher">
      <bean class="org.archive.wayback.liveweb.URLtoARCCacher">
        <property name="recorderCacheDir" value="${wayback.basedir}/liveweb/tmp/" />
        <property name="backingFileBase" value="recorder-tmp" />
        <property name="userAgent" value="ia_archiver(OS-Wayback)" />
        <property name="connectionTimeoutMS" value="10000" />
        <property name="socketTimeoutMS" value="10000" />
      </bean>
    </property>
  </bean>
  <bean name="8098" class="org.archive.wayback.liveweb.ARCUnwrappingProxy">
    <property name="proxyHostPort" value="localhost:3128" />
  </bean>
  <bean id="proxylivewebcache"
        class="org.archive.wayback.liveweb.RemoteLiveWebCache">
    <property name="proxyHostPort" value="localhost:8099" />
    <!--
      If you've set up a local squid/varnish to cache requests to the above
      ARCRecordingProxy, you should use the port for that, instead of 8099:
      <property name="proxyHostPort" value="localhost:3128" />
    -->
  </bean>
  <bean id="excluder-factory-robot" class="org.archive.wayback.accesscontrol.robotstxt.RobotExclusionFilterFactory">
    <property name="maxCacheMS" value="86400000" />
    <property name="userAgent" value="ia_archiver" />
    <property name="webCache" ref="proxylivewebcache" />
  </bean>
</beans>
And this is the exclusion filter configuration (from wayback.xml):
<import resource="LiveWeb.xml"/>
<bean id="excluder-factory-robot" class="org.archive.wayback.accesscontrol.robotstxt.RobotExclusionFilterFactory">
  <property name="maxCacheMS" value="86400000" />
  <property name="userAgent" value="ia_archiver" />
  <property name="webCache" ref="proxylivewebcache" />
</bean>
<!--
  The 'excluder-factory-static' bean defines an exclusionFactory object which
  consults a local text file containing either URLs or SURTs of content to
  block from the ResourceIndex. These URLs or SURTs are treated as prefixes:
  "http://www.archive.org/ima" will block anything starting with that string
  from being returned from the index.
-->
<!--
<bean id="excluder-factory-static" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory">
  <property name="file" value="/var/tmp/os-cdx/exclusion-2008-09-22-cleaned.txt" />
  <property name="checkInterval" value="600000" />
</bean>
-->
<!--
  The 'excluder-factory-composite' bean creates a single exclusionFactory
  which restricts from both a static list of URLs, and also by live web
  robots.txt documents.
-->
<!--
<bean id="excluder-factory-composite" class="org.archive.wayback.accesscontrol.CompositeExclusionFilterFactory">
  <property name="factories">
    <list>
      <ref bean="excluder-factory-static" />
      <ref bean="excluder-factory-robot" />
    </list>
  </property>
</bean>
-->
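To tell a 504 response apart from a hang when reproducing this, a small stand-alone probe can send one request through either proxy port and report the status code and elapsed time. The following is only a sketch using the standard java.net API, not part of OpenWayback: the class name ProxyProbe, its helper methods, the 15-second client timeout, and the target URL http://example.com/ are all assumptions made for illustration; only the ports 8098 and 8099 come from the configuration above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper, not part of OpenWayback.
class ProxyProbe {

    // Splits a "host:port" string (the same shape as the proxyHostPort
    // values in LiveWeb.xml) without performing a DNS lookup.
    static InetSocketAddress parseHostPort(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx <= 0 || idx == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Sends one GET for targetUrl through the given HTTP proxy and returns
    // the status code. Throws SocketTimeoutException if the proxy does not
    // connect or answer within timeoutMs.
    static int probe(String proxyHostPort, String targetUrl, int timeoutMs)
            throws IOException {
        InetSocketAddress parsed = parseHostPort(proxyHostPort);
        // Build a resolved address for the actual connection.
        InetSocketAddress resolved =
                new InetSocketAddress(parsed.getHostString(), parsed.getPort());
        Proxy proxy = new Proxy(Proxy.Type.HTTP, resolved);

        HttpURLConnection conn =
                (HttpURLConnection) new URL(targetUrl).openConnection(proxy);
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: 8098 is the ARCUnwrappingProxy bean above;
        // the target URL is a placeholder.
        String proxy = args.length > 0 ? args[0] : "localhost:8098";
        String target = args.length > 1 ? args[1] : "http://example.com/";
        int timeoutMs = 15000; // assumed; slightly above the 10000 ms configured above

        long start = System.nanoTime();
        try {
            int status = probe(proxy, target, timeoutMs);
            System.out.println("status=" + status);
        } catch (SocketTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("I/O error: " + e);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("elapsed=" + elapsedMs + " ms via " + proxy);
    }
}
```

Running it once right after a Tomcat restart and then a second time, first against localhost:8098 and then against localhost:8099, should show whether the 504 and the later stall originate in the ARCRecordingProxy itself or only appear when going through the ARCUnwrappingProxy.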