5 changes: 5 additions & 0 deletions README.md
@@ -160,6 +160,11 @@ A basic Stroom index designed for `event-logging` XML.
| [v1.0](https://github.com/gchq/stroom-content/releases/tag/example-index-v1.0) | No | No | Y |


### Other Content

* _Proxy_
* [squidplus-proxy](./source/proxy/squidplus-proxy/README.md) `Agent and Stroom content`

## Building the content packs

Each content pack is defined as a directory within _stroom-content-source_ with the name of content pack being the name of the directory.
22 changes: 22 additions & 0 deletions source/proxy/squidplus-proxy/CHANGELOG.md
@@ -0,0 +1,22 @@
# Change Log

All notable changes to this content pack will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).

## [Unreleased]

### Added

### Changed

### Removed

## [squidplus-proxy-v1.0]

Initial version.


[Unreleased]: https://github.com/gchq/stroom-content/compare/squidplus-proxy-v1.0...HEAD
[squidplus-proxy-v1.0]: https://github.com/gchq/stroom-content/compare/squidplus-proxy-v1.0...squidplus-proxy-v1.0
54 changes: 54 additions & 0 deletions source/proxy/squidplus-proxy/README.md
@@ -0,0 +1,54 @@
# _squidplus-proxy_ Content Pack

## Summary

The _squidplus-proxy_ Content Pack provides both client artefacts, which acquire Squid access logs and post them to a Stroom instance, and Stroom content artefacts, which normalise those logs into the Stroom [`event-logging-schema`](https://github.com/gchq/event-logging-schema) format.

This package does not use the standard Squid access log format, but a bespoke format called _squidplus_. This format records additional information such as port numbers, complete request and response headers, additional status information, data transfer sizes and next-hop host information.

Client deployment information can be found in the supplied [README](clientArtefacts/README.md) file.


## Stroom Contents

The following represents the folder structure and content that will be imported in to Stroom with this content pack.

* _Event Sources/Proxy/Squid-Plus-XML_
* **Squid-Plus-XML-V1.0-EVENTS** `Feed`

The feed used to store and process Squid Proxy events using the enriched SquidPlus Proxy XML format.

* **Squid-Plus-XML-V1.0-EVENTS** `Xslt`

The XSLT translation to convert the SquidPlus Proxy XML format into `<Event>`-type XML.

* **Squid-Plus-XML-V1.0-EVENTS** `Pipeline`

The pipeline to process the SquidPlus Proxy XML format into `<Event>`-type XML.

### Dependencies

| Content pack | Version | Notes |
|:------------ |:------- |:----- |
| [`template-pipelines` Content Pack](../../../template-pipelines/README.md) | [v0.3](https://github.com/gchq/stroom-content/releases/tag/template-pipelines-v0.3) | Content pack element is the Event Data (XML) Pipeline |
| [`event-logging-xml-schema` Content Pack](../../../event-logging-xml-schema/README.md) | [v3.2.3](https://github.com/gchq/stroom-content/releases/tag/event-logging-xml-schema-v3.2.3) | Content pack element is the Event Logging Schema |

## Client Contents

The client artefacts are:

* **README.md** `Document`

Basic documentation to configure and deploy the SquidPlus logging capability on a Linux Squid server.

* **squidplusXML.pl** `Script - Perl`

Perl script that ingests squidplus-format Squid logs, corrects for possible errant log lines (caused by large request/response header values), resolves fully qualified domain names from IP addresses and adds them to the events, and converts the result to a simple XML format.

* **squid_stroom_feeder.sh** `Script - Bash`

Bash script that orchestrates the rolling over of the Squid logs, runs the **squidplusXML.pl** script, then posts the resultant output to the appropriate feed within Stroom.

## Documentation Contents

There are no separate documentation artefacts.
7 changes: 7 additions & 0 deletions source/proxy/squidplus-proxy/build.gradle
@@ -0,0 +1,7 @@
//squidplus-proxy

dependencies {
compileSource project(path: ':event-logging-xml-schema', configuration: 'distConfig')
compileSource project(path: ':template-pipelines', configuration: 'distConfig')
}

85 changes: 85 additions & 0 deletions source/proxy/squidplus-proxy/clientArtefacts/README.md
@@ -0,0 +1,85 @@
# Synopsis
The script `squid_stroom_feeder.sh` and its supporting Perl script, `squidplusXML.pl`, are designed to run from a crontab entry that periodically collects and enriches events from an appropriately configured single Squid Proxy service, posting the enriched events to an instance of Stroom. It is expected that

- your squid deployment has been set up to generate SquidPlus format squid logs (see later)
- Stroom has been configured to accept streams of events in the Squid-Plus-XML-V1.0-EVENTS feed.

If you need to deploy multiple Squid Proxy services on the same system then you WILL need to modify the `squid_stroom_feeder.sh` script and the Squid configuration to cater for the multiple instances.

NOTE: Currently the SquidPlus format captures both the original receive request header (`%>h`) and the reply header (`%<h`). In a poorly secured environment, where authentication requests are serviced in the clear, there is a risk that credentials may be captured in 'Authorization' header values. A Squid-cache Bugzilla enhancement request exists to allow users to add a 'header name defeat' concept to the existing optional header field name/value filter mechanism; see http://bugs.squid-cache.org/show_bug.cgi?id=4737


# Storage Imposts
- Deployment size of approximately 40 KB
- Temporary storage of up to 8.00 GB for a period of 90 days (given failure of the web service) within the configured Squid log directory (see later). These values are the default posture; if you need to change them, edit `squid_stroom_feeder.sh` and change one or both of the **FAILED_RETENTION** and **FAILED_MAX** environment variables
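The age-off described above can be sketched with `find`. The directory and the 90-day retention figure come from this document's defaults; the guard and variable handling here are illustrative, not the script's actual implementation:

```shell
#!/bin/sh
# Illustrative age-off over the queue directory, mirroring the
# FAILED_RETENTION default described above. QUEUE is an assumption
# matching the queuing directory configured later in this document.
QUEUE="${QUEUE:-/var/log/squid/squidlogQueue}"
FAILED_RETENTION=90   # days to keep queued files after a post failure

if [ -d "$QUEUE" ]; then
    # Delete queued gzip'd logs older than the retention period
    find "$QUEUE" -type f -name '*.gz' -mtime +"$FAILED_RETENTION" -delete
fi
```

The **FAILED_MAX** size limit would need a separate pass (e.g. summing file sizes and deleting oldest first), which the real script handles itself.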

# Prerequisites
- Squid configured appropriately (see later)
- Requires the bind-utils, coreutils, curl, net-tools, gzip and sed packages, plus perl and the perl(XML::Simple) package (including dependencies such as perl-Socket6)
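A quick pre-flight check for these prerequisites might look like the following. The command-to-package mapping (`host` from bind-utils, `netstat` from net-tools) is an assumption; adjust for your distribution:

```shell
#!/bin/sh
# Report any missing prerequisite commands (illustrative mapping:
# host -> bind-utils, netstat -> net-tools; the rest are named directly).
for cmd in host curl netstat gzip sed perl; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing command: $cmd"
done
# XML::Simple is required by squidplusXML.pl
perl -MXML::Simple -e 1 2>/dev/null || echo "missing perl module: XML::Simple"
```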

# Capability Workflow
The workflow is such that, every 10 minutes, cron starts a script which
- delays randomly between 7 and 580 seconds before collecting audit data in order to balance network load. One doesn't want many Linux systems 'pulsing' the network every 10 minutes.
- runs the `squid` command with the `-k rotate` option to roll over the current Squid log file
- runs the supporting Perl script `squidplusXML.pl` on all rolled-over log files (i.e. log files of the form `access.log.N`), which parses and enriches the original logs, reformatting them as simple XML in a queuing directory
- gzips the resultant log files within the queuing directory
- posts all files in the queue directory to the Stroom web service. On failure, the files are left in place and will be posted on the next iteration when the web service is available. A protection mechanism that places size and time limits on the stored files removes (ages off) the oldest files.
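One iteration of the steps above can be sketched as follows. Everything here is illustrative: a `DRY_RUN` guard is added so the sketch only prints the privileged commands, the random delay is computed with `awk` for portability, and the `Feed` header on the post is an assumption about the Stroom datafeed interface:

```shell
#!/bin/sh
# Illustrative sketch of one collection iteration (not the real script).
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# Random delay of 7..579 seconds to avoid systems 'pulsing' the network
MAX_SLEEP=580
DELAY=$(awk -v max="$MAX_SLEEP" 'BEGIN { srand(); print int(rand() * (max - 7)) + 7 }')
run sleep "$DELAY"

# Roll over the current Squid log file
run squid -k rotate

# Parse and enrich each rolled-over log into XML in the queue directory,
# then gzip it (script path and directories as configured in this document)
for f in /var/log/squid/squidCurrent/access.log.*; do
    [ -e "$f" ] || continue
    run perl /usr/audit/squid/squidplusXML.pl "$f"
done
run gzip /var/log/squid/squidlogQueue/*.xml

# Post everything queued to Stroom; failures stay queued for next time.
# URL is the document's default; the Feed header name is an assumption.
for q in /var/log/squid/squidlogQueue/*.gz; do
    [ -e "$q" ] || continue
    run curl -k -H "Feed:Squid-Plus-XML-V1.0-EVENTS" \
        --data-binary @"$q" "https://stroomp00.strmdev00.org/stroom/datafeed"
done
```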

# Squid Configuration
The main Squid configuration file, `/etc/squid/squid.conf`, holds the configuration information for Squid Access logging. The default posture is to log using the standard 'squid' format via the directive
```
access_log stdio:/var/log/squid/access.log squid
```
which results in standard squid logs being collected in `/var/log/squid/access.log`.

To gain better information from the Squid proxy we use the so-called SquidPlus log format, which gathers more information about a proxy transaction. We also save the resultant SquidPlus format logs in a different directory, to deconflict with any pre-configured log rotation mechanisms. Further, by default Squid does not log query terms on URLs; should you require the terms, turn off the `strip_query_terms` option. The configuration changes needed are:
- Create the SquidPlus log directory and also a squidplus log queuing directory. Note these directories are preset in the `squid_stroom_feeder.sh` script, so if you change them, change the script as well (see the **STROOM_LOG_QUEUED** and **STROOM_LOG_SOURCE** variables). Ensure the directories are writable by the appropriate user (typically squid:squid)

```bash
mkdir /var/log/squid/squidCurrent /var/log/squid/squidlogQueue
chown squid:squid /var/log/squid/squidCurrent /var/log/squid/squidlogQueue
```
- Change the `/etc/squid/squid.conf` file to:
  - Comment out the existing `access_log` directive
  - Add a `logformat` directive to define the SquidPlus format
  - Define a new `access_log` directive that uses the SquidPlus format and saves logs in the `/var/log/squid/squidCurrent` directory
  - Allow for log file rotation
  - Optionally turn off the `strip_query_terms` option
  - If the proxy uses user attribution and the standard `%un` element doesn't record the attributed user identity, use an appropriate alternative directive

The above would look like


```
# Logging
# access_log stdio:/var/log/squid/access.log squid
logformat squidplus %ts.%03tu %tr %>a/%>p %<a/%<p %<la/%<lp %>la/%>lp %Ss/%>Hs/%<Hs %<st/%<sh %>st/%>sh %mt %rm "%ru" "%un" %Sh "%>h" "%<h"
logfile_rotate 10
access_log stdio:/var/log/squid/squidCurrent/access.log squidplus
```

```
# Turn off stripping query terms
strip_query_terms off
```
- Restart the squid service


# Manual Deployment

You need to configure your Squid service as per the above and, after a restart, validate that the access log is now being formed in the file `/var/log/squid/squidCurrent/access.log`.

Ensuring you have met the prerequisites (especially bind-utils and perl(XML::Simple)), deploy the scripts `squidplusXML.pl` and `squid_stroom_feeder.sh`, making sure the squid user can execute them. A suitable location might be __/usr/audit/squid__. The following assumes both scripts have been deployed in `/usr/audit/squid` with ownerships set correctly.

You will need to modify the file `/usr/audit/squid/squid_stroom_feeder.sh` to change at least one variable, and to ensure the other variables are appropriately set.

You must change the **URL** variable to direct processed log files to your Stroom instance. By default it posts to `https://stroomp00.strmdev00.org/stroom/datafeed`, as per
```
URL=https://stroomp00.strmdev00.org/stroom/datafeed
```

You may also want to change the **SYSTEM** and **mySecZone** variables. Further, check that **STROOM_LOG_SOURCE** and **STROOM_LOG_QUEUED** use the directories set up in the Squid configuration, that the 'standard' Squid access log location is correct, and that the log rotate mechanism will rotate it. We concatenate all our logs into this file, defined by the variable **PrimaryLog**.
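Collected in one place, the variables to review might look like the following fragment. Only the **URL** default is quoted from this document; the **SYSTEM** and **mySecZone** values are invented placeholders, not the script's defaults:

```shell
# Variables to review in squid_stroom_feeder.sh (SYSTEM and mySecZone
# values below are illustrative placeholders only).
URL=https://stroomp00.strmdev00.org/stroom/datafeed   # must point at YOUR Stroom instance
SYSTEM="MySquidProxy"                                 # placeholder
mySecZone="MyZone"                                    # placeholder
STROOM_LOG_SOURCE=/var/log/squid/squidCurrent         # must match squid.conf
STROOM_LOG_QUEUED=/var/log/squid/squidlogQueue        # must match squid.conf
```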

If you want to change the periodicity of execution, ensure the **MAX_SLEEP** variable is LOWER than the crontab periodicity. The default is 10 minutes, so your crontab entry should look like
```
*/10 * * * * /usr/audit/squid/squid_stroom_feeder.sh >> /var/log/squid/stroom_squid_post.log
```
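As a quick sanity check on that relationship (illustrative arithmetic only; 600 seconds is the `*/10` cron period and 580 is the **MAX_SLEEP** default described above):

```shell
#!/bin/sh
# The random delay must complete before cron fires again: a */10 entry
# gives a 600-second period, and the default MAX_SLEEP of 580 leaves
# headroom for the rotate, parse and post steps themselves.
PERIOD=600      # seconds between cron invocations (*/10)
MAX_SLEEP=580   # default maximum random delay in squid_stroom_feeder.sh
if [ "$MAX_SLEEP" -lt "$PERIOD" ]; then
    echo "MAX_SLEEP fits within the cron period"
fi
```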
