Dear Heritrix3 Community,
Thank you for this great tool! Please help me with this issue:
I am using version 3.10.0.
I need to crawl the previous version of a site that has undergone a major upgrade. The old site was moved to a domain that the developers put behind HTTP basic authentication. (Every request sent to it must include the Authorization header field, which supplies the credentials as the base64-encoded username and password granted by the site administrators.)
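To make clear what I mean by that header, here is a minimal sketch of how its value is built under standard HTTP basic authentication. The function name `basic_auth_header` is just something I made up for illustration, and the login and password are the same placeholders used in this report, not real credentials.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header.

    The scheme joins username and password with a colon and
    base64-encodes the result (hypothetical helper, for illustration).
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


# Placeholder credentials, matching the ones used in this report.
header = basic_auth_header("myloginname", "passwordformyloginname")
print(f"Authorization: {header}")
```

A browser or any other HTTP client that sends this header with each request gets through to the old site, so I assume the crawler needs to send the same thing.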

I configured the job as described in the docs, so the crawl has these two beans for basic authentication:
<bean id="credentialStore" class="org.archive.modules.credential.CredentialStore">
  <property name="credentials">
    <map>
      <entry key="OLDSiteLoginCredential" value-ref="OLDSiteLoginCredential"/>
    </map>
  </property>
</bean>
<bean id="OLDSiteLoginCredential" class="org.archive.modules.credential.HttpAuthenticationCredential">
  <property name="domain" value="https://old.site.edu:443"/>
  <property name="realm" value="oldsiterealm"/>
  <property name="login" value="myloginname"/>
  <property name="password" value="passwordformyloginname"/>
</bean>
But every time I build and launch the job, it finishes right after the DNS lookup and two 401 responses, one for the main page URL and one for robots.txt:
401 381 https://old.site.edu/ - - text/html #001
401 381 https://old.site.edu/robots.txt P https://old.site.edu/ text/html #001
1 51 dns:old.site.edu P https://old.site.edu/ text/dns #001
Could you please help me identify what I am doing wrong here? Or would you happen to know how I should do this?
Thanks a lot!