How to crawl a site behind basic authentication (CredentialStore/HttpAuthenticationCredential ends up with 401) #662

Dear Heritrix3 Community,
 
Thank you for this great tool! Please help me with this issue:
I am using version 3.10.0.
 
I need to crawl the previous version of a site that has undergone a major upgrade. The old site was moved to a domain that the developers placed behind HTTP Basic authentication. (_Every request sent to it includes an `Authorization` header that carries the Basic credentials: the Base64-encoded username and password issued by the site administrators._)

<img width="590" height="337" alt="Image" src="https://github.com/user-attachments/assets/befdfbc4-79d0-4ef1-b0d7-7ba09f3a462d" />
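To make the header described above concrete, here is a minimal sketch of how I understand the `Authorization` value is built for Basic authentication. The class and method names (`BasicAuthHeader`, `basicAuthValue`) are hypothetical and only for illustration, and the login and password are the same placeholders I use in the job configuration, not real credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the Authorization header value for HTTP Basic authentication:
    // the literal "Basic " followed by Base64("login:password").
    static String basicAuthValue(String login, String password) {
        String pair = login + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials (assumption), matching the bean values in this post.
        System.out.println("Authorization: "
                + basicAuthValue("myloginname", "passwordformyloginname"));
    }
}
```

This only shows what the browser sends; it does not involve Heritrix itself.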

I configured the job as described in the docs, so the crawl configuration has these two beans for basic authentication:
 
```xml
<bean id="credentialStore" class="org.archive.modules.credential.CredentialStore">
   <property name="credentials">
     <map>
       <entry key="OLDSiteLoginCredential" value-ref="OLDSiteLoginCredential"/>
     </map>
   </property>
</bean>

<bean id="OLDSiteLoginCredential" class="org.archive.modules.credential.HttpAuthenticationCredential">
   <property name="domain" value="https://old.site.edu:443"/>
   <property name="realm" value="oldsiterealm"/>
   <property name="login" value="myloginname"/>
   <property name="password" value="passwordformyloginname"/>
</bean>
```
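One thing I am unsure about is the `realm` value. My assumption is that it has to match the realm string the server advertises in the `WWW-Authenticate` header of its 401 response. The sketch below shows how that string could be pulled out of a challenge header for comparison; the class and method names (`RealmFromChallenge`, `realmOf`) and the sample header value are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RealmFromChallenge {

    // Matches realm="..." in a challenge; the parameter name is case-insensitive.
    private static final Pattern REALM =
            Pattern.compile("realm\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the quoted realm from a WWW-Authenticate value, if one is present.
    static Optional<String> realmOf(String wwwAuthenticate) {
        Matcher m = REALM.matcher(wwwAuthenticate);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample header value (assumption); the real one comes from the server's 401 response.
        String challenge = "Basic realm=\"oldsiterealm\"";
        System.out.println(realmOf(challenge).orElse("(no realm found)"));
    }
}
```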
 
But every time I build and launch the job, it finishes right after the DNS lookup and two 401 responses, one for the main page URL and one for robots.txt:
 
```
401        381 https://old.site.edu/ - - text/html #001
401        381 https://old.site.edu/robots.txt P https://old.site.edu/ text/html #001
1          51  dns:old.site.edu P https://old.site.edu/ text/dns #001
```
 
Could you please help me identify what I am doing wrong here? Or would you happen to know how I should do this?
Thanks a lot!
