I've just run into a situation where reusing an SSL session raised an exception and Spidr subsequently skipped the page. Currently the exception is silently swallowed, so I modified the code to capture the following trace:
EOFError (end of file reached):
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `sysread_nonblock'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/openssl/buffering.rb:174:in `read_nonblock'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:141:in `rbuf_fill'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1293:in `request'
rest-client (1.6.7) lib/restclient/net_http_ext.rb:51:in `request'
/home/nirvdrum/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/net/http.rb:1026:in `get'
spidr (0.4.1) lib/spidr/agent.rb:513:in `block in get_page'
spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
app/models/cookie_login_option.rb:150:in `fetch_remote_form'
app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
app/models/cookie_login_option.rb:150:in `fetch_remote_form'
app/models/cookie_login_option.rb:158:in `block in fetch_remote_form'
spidr (0.4.1) lib/spidr/agent.rb:518:in `block in get_page'
spidr (0.4.1) lib/spidr/agent.rb:684:in `prepare_request'
spidr (0.4.1) lib/spidr/agent.rb:512:in `get_page'
If I modify the code to remove the session cache, the page is fetched without problems. It might be good to rescue EOFError and retry with a new session when this happens. Rescuing the error at every call site could get messy, though.
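To make the retry idea concrete without scattering rescue clauses around, here is a rough sketch of one way it could be centralized. `SessionRetrier`, `with_session`, and the factory block are hypothetical names made up for illustration; none of this is Spidr's actual API, and the real session cache may be structured differently.

```ruby
# Hypothetical helper (not part of Spidr): keeps the EOFError handling in
# one place so individual request methods don't need their own rescue.
class SessionRetrier
  # The block is a factory that builds a fresh session for a given key.
  def initialize(max_retries = 1, &factory)
    @factory = factory
    @max_retries = max_retries
    @sessions = {} # key => cached session
  end

  # Yields the cached session for +key+. If the block raises EOFError,
  # the cached session is dropped and the block is retried with a fresh
  # one, up to +max_retries+ times. After that the error is re-raised.
  def with_session(key)
    attempts = 0
    begin
      session = (@sessions[key] ||= @factory.call(key))
      yield session
    rescue EOFError
      @sessions.delete(key) # assume the cached session is stale
      attempts += 1
      retry if attempts <= @max_retries
      raise
    end
  end
end

# Possible usage with Net::HTTP (assumes require 'net/https'):
#
#   retrier = SessionRetrier.new do |(host, port)|
#     http = Net::HTTP.new(host, port)
#     http.use_ssl = true
#     http.start
#   end
#   retrier.with_session(['example.com', 443]) { |http| http.get('/') }
```

With something like this, only the one method that hands out sessions needs to know about EOFError. Re-raising once the retries are exhausted also means the failure would no longer be silently swallowed.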