3
3
The aggregation framework is a powerful tool in your MongoDB toolbox. It allows
4
4
you to run complex queries on your data, shaping and modifying documents to suit
5
5
your needs. This power comes through a lot of different pipeline stages and
6
- operators, which comes with a certain learning challenge. MongoDB Compass comes
7
- with an aggregation pipeline builder that allows you to see results in real-time
6
+ operators, which in turn brings a certain learning challenge. MongoDB Compass
7
+ includes an aggregation pipeline builder that allows you to see results in real-time
8
8
for each stage and fix mistakes early on. Once your pipeline is complete, you
9
9
can export the pipeline to your language and use it in your code. In the PHP
10
- driver, from now on your pipeline lives as an array, completely untyped, and
11
- sometimes a relatively complex structure of stages and operators. As an example,
12
- let's take this pipeline from one of my projects:
10
+ driver, that pipeline would live on as an array, completely untyped, and
11
+ sometimes with a relatively complex structure of stages and operators. Let's
12
+ take this pipeline from one of my projects as an example:
13
13
14
14
``` php
15
15
$pipeline = [
@@ -78,7 +78,7 @@ $pipeline = [
78
78
];
79
79
```
80
80
81
- Phew, that's a lot of logic. To better understand what this pipeline does, let's
81
+ That's a lot of logic! To better understand what this pipeline does, let's
82
82
look at a single source document:
83
83
84
84
``` json
@@ -93,7 +93,7 @@ look at a single source document:
93
93
```
94
94
95
95
I've left out some fields that we're not using right now. The aggregation
96
- pipeline aggregates all of these documents, producing a document for each day:
96
+ pipeline aggregates all of these documents, producing a document for each month:
97
97
98
98
``` json
99
99
{
@@ -122,17 +122,17 @@ Without going into more details on this, even if we were to comment on parts of
122
122
the aggregation pipeline to explain what it does, there will still be a high
123
123
cognitive load when going through the aggregation pipeline. One reason for this
124
124
is that any PHP editor will not know that this is an aggregation pipeline, and
125
- thus can't provide any better syntax highlighting other than "this is a string
126
- in an array". Couple that with a few levels of nesting, and you've got yourself
125
+ thus can't provide much help beyond syntax highlighting (e.g. "this is a string
126
+ in an array"). Couple that with a few levels of nesting, and you've got yourself
127
127
this magical kind of code that you can write, but not read. We can of course
128
128
refactor this code, but before we get into that, we want to move away from these
129
129
array structures.
130
130
131
131
## Introducing the Aggregation Pipeline Builder
132
132
133
- Previously released as a standalone package, version 1.21 of the MongoDB Driver
134
- for PHP now comes with a fully grown aggregation pipeline builder. Instead of
135
- writing complex arrays, you now get factory methods to generate pipeline stages
133
+ Previously released as a standalone package, version 1.21 of the MongoDB PHP
134
+ driver now includes a comprehensive aggregation pipeline builder. Instead of
135
+ writing complex arrays, you can use factory methods to generate pipeline stages
136
136
and operators. Here is that same pipeline as we had before, this time written
137
137
with the aggregation pipeline builder:
138
138
@@ -199,9 +199,10 @@ $pipeline = new Pipeline(
199
199
);
200
200
```
201
201
202
- Ok, this is still a complex pipeline, and we'll be working on this, but it now
203
- becomes significantly easier to look at and differentiate operators from field
204
- names, etc.
202
+ This is still a complex pipeline, but compared to the original array example
203
+ it is now much easier to infer the context of each pipeline component. Operators
204
+ are clearly differentiated from field names, and this typing can enable code
205
+ editors and tooling to better assist the developer.
205
206
206
207
To run an aggregation pipeline, you can pass a ` Pipeline ` instance to any method
207
208
that can receive an aggregation pipeline, such as ` Collection::aggregate ` or
@@ -217,24 +218,23 @@ to represent the somewhat flexible type system and give better guidance to users
217
218
when writing aggregation pipelines. That's why you will see expressions like
218
219
` dateFieldPath `, ` doubleFieldPath `, or ` arrayFieldPath `. Each expression
219
220
resolves to a certain type when it's evaluated. For example, we know that the
220
- ` $year ` operator expression resolves to an integer. The argument is an
221
+ ` $year ` operator expression resolves to an integer, and its argument is an
221
222
expression that resolves to a date, timestamp, or ObjectId. While we could use
222
- ` $reportDate ` to use the ` reportDate ` field from the document being evaluated,
223
+ ` $reportDate ` to reference the ` reportDate ` field of the document being evaluated,
223
224
` dateFieldPath ` is more expressive and shows intent of receiving a date field.
224
225
This also allows IDEs like PhpStorm to make better suggestions when offering
225
226
code completion.
226
227
227
228
For all expressions, there are factory classes with methods to create the
228
229
expression objects. The use of static methods makes the code a little more
229
- verbose, but using functions was impossible due to aggregation pipeline using
230
- operator names that are reserved keywords in PHP (such as ` and `, ` if `, and
231
- ` switch `). I'll show alternatives to using these static methods later in this
232
- blog post.
230
+ verbose, but using functions was impossible due to conflicts between aggregation
231
+ operator names and reserved keywords in PHP (e.g. ` and `, ` if `, ` switch `). I'll
232
+ show alternatives to using these static methods later in this blog post.
233
233
234
234
## Bonus Feature: Query Objects
235
235
236
236
As a side effect of building the aggregation pipeline builder, there's now also
237
- a builder for query objects. This is because the ` $match ` stage takes a query
237
+ a builder for query filters. This is because the ` $match ` stage takes a query
238
238
object, and to avoid falling back to query arrays like you would pass them to
239
239
` Collection::find ` , we also built a builder for query objects. Here you see an
240
240
example of a ` find ` call, along with the same query specified using the builder:
@@ -253,7 +253,8 @@ $collection->find(
253
253
```
254
254
255
255
While this is a little more verbose, it provides a more expressive API than PHP
256
- array structures do. It's up to you to decide which option you like better.
256
+ array structures and brings the same improvements for IDEs and tooling. It's
257
+ up to you to decide which option you like better.
257
258
258
259
## Refactoring For Better Maintainability
259
260
@@ -262,10 +263,10 @@ array structures do. It's up to you to decide which option you like better.
262
263
With the basic builder details explained, there's still one problem: the builder
263
264
helps you write a pipeline, but it doesn't really make existing pipelines more
264
265
maintainable. Yes, it makes them easier to read, but a complex pipeline will
265
- remain just as complex. So, let's discuss some refactorings we can make to make
266
- the aggregation pipeline easier to read, but also to make parts of the pipeline
267
- reusable. Note that all of these example apply the same way to pipelines written
268
- as PHP arrays, but I'll use the aggregation builder in the example.
266
+ remain just as complex. So, let's discuss some refactorings we can make to both
267
+ improve the pipeline's readability and make parts of the pipeline more reusable.
268
+ Note that although the following example uses the aggregation builder, the same
269
+ suggestions can also be applied to pipelines written as PHP arrays.
269
270
270
271
Let's look at the first ` $group ` stage in the original example:
271
272
@@ -284,8 +285,8 @@ Stage::group(
284
285
);
285
286
```
286
287
287
- As you can see, we use the ` reportDate ` and ` price ` fields multiple times. An
288
- obvious refactoring would be to extract a variable for this:
288
+ As you can see, we reference the ` reportDate ` and ` price ` fields multiple times.
289
+ A quick refactoring would be to extract those to variables:
289
290
290
291
``` php
291
292
$reportDate = Expression::dateFieldPath('reportDate');
@@ -305,9 +306,8 @@ Stage::group(
305
306
);
306
307
```
307
308
308
- The ` fuelType ` and ` station.brand ` fields could be extracted as well. Since
309
- these are only used once, I didn't do that, but you may want to do so to favour
310
- consistency.
309
+ The ` fuelType ` and ` station.brand ` fields could be extracted as well. I opted not
310
+ to since they are only used once, but you may want to do so in favor of consistency.
311
311
312
312
### Comments Or Methods
313
313
@@ -361,8 +361,8 @@ fuel types with their prices, which is then converted to an object in
361
361
` $addFields ` . Ideally, we want to hide this implementation detail and extract
362
362
both stages together.
363
363
364
- To do so, we once again extract a factory method, except that this time we'll be
365
- returning a ` Pipeline ` instance:
364
+ To do so, we once again extract a factory method, except that this time we'll
365
+ return a ` Pipeline ` instance:
366
366
367
367
``` php
368
368
public static function groupAndAssembleFuelTypePriceObject(
@@ -401,7 +401,7 @@ public static function groupAndAssembleFuelTypePriceObject(
401
401
}
402
402
```
403
403
404
- By once again keeping fields as parameters, we keep the method flexible and
404
+ By once again keeping fields as parameters, the method remains flexible and we
405
405
allow using it in a pipeline that produces slightly different documents up to
406
406
this point. Since the method works independently of how we group documents, we
407
407
also keep the identifier as a parameter. Using this method further simplifies
@@ -468,7 +468,7 @@ $pipeline = new Pipeline(
468
468
469
469
So far, we've only extracted entire pipeline stages that contain relatively
470
470
simple expressions. Sometimes your aggregation pipeline will contain a more
471
- complex expression. From the same project that I took the previous example from,
471
+ complex expression. From the same project that yielded the previous example,
472
472
there's also this gem that is part of a pipeline that computes the weighted
473
473
average price for each day:
474
474
@@ -519,8 +519,7 @@ $pipeline = [
519
519
];
520
520
```
521
521
522
- Once again, the builder can make this a little more concise, but the complexity
523
- remains:
522
+ The builder can make this a little more concise, but the complexity remains:
524
523
525
524
``` php
526
525
$prices = Expression::arrayFieldPath('prices');
@@ -603,7 +602,7 @@ public static function computeDurationBetweenDates(
603
602
}
604
603
```
605
604
606
- Again, this reduces the complexity of the pipeline stage tremendously:
605
+ This reduces the complexity of the pipeline stage tremendously:
607
606
608
607
``` php
609
608
$prices = Expression::arrayFieldPath('prices');
@@ -667,7 +666,7 @@ returns a date, e.g. `$dateFromString`.
667
666
Now that we know about these value holder objects, we still need to make sure
668
667
the server knows what we're talking about. When you call ` Collection::aggregate `
669
668
with the pipeline you built, what happens internally to it? Here, a series of
670
- encoders springs into action. We use a single entry point, the ` BuilderEncoder `
669
+ encoders spring into action. We use a single entry point, the ` BuilderEncoder `
671
670
class. This class contains multiple encoders that are able to handle all
672
671
pipeline stages, operators, and accumulators and transform them into their BSON
673
672
representations.
@@ -682,24 +681,24 @@ accordingly.
682
681
When creating a ` MongoDB\Client ` instance, you can now pass an additional
683
682
` builderEncoder ` option in the ` $driverOptions ` argument. This specifies the
684
683
encoder used to encode aggregation pipelines, but also query objects. All
685
- ` Database ` and ` Client ` instances inherit this value from the client, but you
686
- can override it through the options when fetching such an instance. This allows
684
+ ` Database ` and ` Collection ` instances inherit this value from the client, but you
685
+ can override it through the options when selecting those objects. This allows
687
686
you to have your custom logic applied whenever pipelines or queries are encoded
688
687
for the server.
689
688
690
689
With factories, value holders, and encoders, we wanted to ensure that creating
691
690
the builder does not turn into a repetitive chore. As you can imagine, many
692
691
operators will mostly consist of the same logic, resulting in tons of code
693
- duplication. To make matters worse, every new server version adds some new
694
- operators or even stages, so we wanted to make sure that we can easily expand
692
+ duplication. To make matters worse, every new server version may introduce new
693
+ operators and stages, so we wanted to make sure that we can easily expand
695
694
the builder.
696
695
697
696
We could try to rely on generative AI to help us with this, but this only goes
698
697
so far. Instead, we leverage code generation to make the task easier. All
699
698
factories, value holders, and encoders are generated from a configuration. When
700
699
a new operator is introduced, we create a config file with all of its details:
701
700
input types, what the operator resolves to, documentation for parameters, and
702
- even the examples from the documentation are included. We then run the
701
+ even examples from the MongoDB documentation. We then run the
703
702
generator, and are given all code necessary to use the operator.
704
703
705
704
As if that wasn't good enough, the generator also takes the examples we added