@@ -121,12 +121,12 @@ pipeline aggregates all of these documents, producing a document for each month:
121
121
Without going into more details on this, even if we were to comment on parts of
122
122
the aggregation pipeline to explain what it does, there will still be a high
123
123
cognitive load when going through the aggregation pipeline. One reason for this
124
- is that any PHP editor will not know that this is an aggregation pipeline, and
125
- thus can't provide much help beyond syntax highlighting (e.g. "this is a string
126
- in an array"). Couple that with a few levels of nesting, and you've got yourself
127
- this magical kind of code that you can write, but not read. We can of course
128
- refactor this code, but before we get into that, we want to move away from these
129
- array structures.
124
+ is that the only way to express the aggregation framework domain-specific
125
+ language (DSL) is through untyped arrays, and any PHP editor can't provide much
126
+ help beyond syntax highlighting. Pair that with a few levels of nesting, and
127
+ you've got yourself the kind of code that you can write, but not read. We could
128
+ start off by refactoring the code, but instead let's try to move away from array
129
+ structures and use a better solution .
130
130
131
131
## Introducing the Aggregation Pipeline Builder
132
132
@@ -137,6 +137,12 @@ and operators. Here is that same pipeline as we had before, this time written
137
137
with the aggregation pipeline builder:
138
138
139
139
``` php
140
+ use MongoDB\Builder\Accumulator;
141
+ use MongoDB\Builder\Expression;
142
+ use MongoDB\Builder\Pipeline;
143
+ use MongoDB\Builder\Stage;
144
+ use function MongoDB\object;
145
+
140
146
$pipeline = new Pipeline(
141
147
Stage::group(
142
148
_id: object(
@@ -209,7 +215,9 @@ that can receive an aggregation pipeline, such as `Collection::aggregate` or
209
215
` Collection::watch ` . In addition, methods like ` Collection::updateMany ` and
210
216
` Collection::findOneAndUpdate ` can receive a ` Pipeline ` instance to run an
211
217
update. Keep in mind that you won't be able to use all available aggregation
212
- pipeline stages in update operations.
218
+ pipeline stages in update operations. Since a pipeline instance could be used
219
+ everywhere, the builder will allow you to add stages to a pipeline that aren't
220
+ supported in updates.
213
221
214
222
### Builder Design
215
223
@@ -223,7 +231,8 @@ expression that resolves to a date, timestamp, or ObjectId. While we could use
223
231
` $reportDate ` to reference the ` reportDate ` field of the document being evaluated,
224
232
` dateFieldPath ` is more expressive and shows intent of receiving a date field.
225
233
This also allows IDEs like PhpStorm to make better suggestions when offering
226
- code completion.
234
+ code completion, as well as a certain amount of type checking to ensure you're
235
+ not creating pipelines that would result in server errors.
227
236
228
237
For all expressions, there are factory classes with methods to create the
229
238
expression objects. The use of static methods makes the code a little more
@@ -314,17 +323,17 @@ to since they are only used once, but you may want to do so in favor of consiste
314
323
In complex pipelines, you'll often find comments explaining what a certain
315
324
pipeline stage or segment does. You should definitely include comments like
316
325
that, but you can also consider extracting parts of a pipeline to your own
317
- builder method. If you choose a descriptive method name, this can already
318
- explain what the stage or segment does, without the reader having to see the
319
- internal workings.
326
+ builder method. Choosing a descriptive method name that explains what the stage
327
+ or segment does can also remove the need for a comment and spare the reader from
328
+ having to dive into the internal workings.
320
329
321
330
``` php
322
331
public static function groupAndComputeStatistics(
323
- stdClass $_id ,
332
+ stdClass $groupBy ,
324
333
Expression\ResolvesToDouble $price,
325
334
): GroupStage {
326
335
return Stage::group(
327
- _id: $_id ,
336
+ _id: $groupBy ,
328
337
lowest: Accumulator::min($price),
329
338
highest: Accumulator::max($price),
330
339
average: Accumulator::avg($price),
@@ -342,7 +351,7 @@ $reportDate = Expression::dateFieldPath('reportDate');
342
351
$price = Expression::doubleFieldPath('price');
343
352
344
353
self::groupAndComputeStatistics(
345
- _id : object(
354
+ groupBy : object(
346
355
year: Expression::year($reportDate),
347
356
month: Expression::month($reportDate),
348
357
fuelType: Expression::fieldPath('fuelType'),
@@ -366,7 +375,7 @@ return a `Pipeline` instance:
366
375
367
376
``` php
368
377
public static function groupAndAssembleFuelTypePriceObject(
369
- stdClass $_id ,
378
+ stdClass $groupBy ,
370
379
Expression\ResolvesToString $fuelType,
371
380
Expression\ResolvesToInt $count,
372
381
Expression\ResolvesToDouble $lowest,
@@ -375,7 +384,7 @@ public static function groupAndAssembleFuelTypePriceObject(
375
384
): Pipeline {
376
385
return new Pipeline(
377
386
Stage::group(
378
- _id: $_id ,
387
+ _id: $groupBy ,
379
388
count: Accumulator::sum($count),
380
389
prices: Accumulator::push(
381
390
object(
@@ -409,7 +418,7 @@ the pipeline:
409
418
410
419
``` php
411
420
self::groupAndAssembleFuelTypePriceObject(
412
- _id : object(
421
+ groupBy : object(
413
422
year: Expression::fieldPath('_id.year'),
414
423
month: Expression::fieldPath('_id.month'),
415
424
brand: Expression::fieldPath('_id.brand'),
@@ -432,7 +441,7 @@ $price = Expression::doubleFieldPath('price');
432
441
433
442
$pipeline = new Pipeline(
434
443
self::groupAndComputeStatistics(
435
- _id : object(
444
+ groupBy : object(
436
445
year: Expression::year($reportDate),
437
446
month: Expression::month($reportDate),
438
447
fuelType: Expression::fieldPath('fuelType'),
@@ -441,7 +450,7 @@ $pipeline = new Pipeline(
441
450
price: $price,
442
451
),
443
452
self::groupAndAssembleFuelTypePriceObject(
444
- _id : object(
453
+ groupBy : object(
445
454
year: Expression::fieldPath('_id.year'),
446
455
month: Expression::fieldPath('_id.month'),
447
456
brand: Expression::fieldPath('_id.brand'),
@@ -453,7 +462,7 @@ $pipeline = new Pipeline(
453
462
average: Expression::fieldPath('average'),
454
463
),
455
464
self::groupBrandsAndSort(
456
- _id : object(
465
+ groupBy : object(
457
466
year: Expression::fieldPath('_id.year'),
458
467
month: Expression::fieldPath('_id.month'),
459
468
),
@@ -588,16 +597,16 @@ public static function computeElapsedSecondsOnDay(
588
597
* argument.
589
598
*/
590
599
public static function computeDurationBetweenDates(
591
- Expression\ResolvesToDate $previousReportDate ,
592
- Expression\ResolvesToDate $reportDate ,
600
+ Expression\ResolvesToDate $startDate ,
601
+ Expression\ResolvesToDate $endDate ,
593
602
): Expression\ResolvesToInt {
594
603
return Expression::min(
595
604
Expression::dateDiff(
596
- startDate: $previousReportDate ,
597
- endDate: $reportDate ,
605
+ startDate: $startDate ,
606
+ endDate: $endDate ,
598
607
unit: 'second',
599
608
),
600
- self::computeElapsedSecondsOnDay($reportDate ),
609
+ self::computeElapsedSecondsOnDay($endDate ),
601
610
);
602
611
}
603
612
```
@@ -638,7 +647,7 @@ This would include all the logic included in the pipeline. By extracting complex
638
647
parts into separate methods, you can test those in isolation under controlled
639
648
conditions, knowing that they behave as you intended. Looking at the last
640
649
example where we've extracted the expression that computes the duration between
641
- two dates, you can now easily verify that all the individual parts of that
650
+ two dates, you can now better verify that all the individual parts of that
642
651
complex expression work as expected.
643
652
644
653
Just beware of premature abstractions: extract complex logic with the goal of
@@ -656,12 +665,12 @@ but they allow us to know what types an expression will resolve to.
656
665
For an example, let's take the ` $hour ` operator. When you use
657
666
` Expression::hour ` , you will receive an instance of an ` HourOperator ` class.
658
667
This class implements an ` OperatorInterface ` , telling the builder that this is
659
- an operator that can be used in aggregation pipeline. It also implements a
660
- ` ResolvesToInt ` interface, as we always know that evaluating the expression
661
- results in an integer value. The required ` date ` parameter of the operator is a
662
- date, which in the builder is one of many things. It could be a BSON
663
- ` UTCDateTime ` instance, but it could also be the result of any operator that
664
- returns a date, e.g. ` $dateFromString ` .
668
+ an operator that can be used in an aggregation pipeline stage . It also
669
+ implements a ` ResolvesToInt ` interface, as we always know that evaluating the
670
+ expression results in an integer value. The required ` date ` parameter of the
671
+ operator is a date, which in the builder is one of many things. It could be a
672
+ ` MongoDB\BSON\ UTCDateTime` instance, but it could also be the result of any
673
+ operator that returns a date, e.g. ` $dateFromString ` .
665
674
666
675
Now that we know about these value holder objects, we still need to make sure
667
676
the server knows what we're talking about. When you call ` Collection::aggregate `
@@ -674,8 +683,8 @@ representations.
674
683
This allows us to keep the logic customizable. For example, the Doctrine MongoDB
675
684
ODM allows users to specify different names for fields in the database. In turn,
676
685
it needs to convert the name of a property in the mapped class to the name of
677
- the field in the database. Such a feature could easily be built by creating a
678
- custom encoder for all ` fieldPath ` expressions, and changing the field path
686
+ the field in the database. Such a feature could be built by creating a custom
687
+ encoder for all ` fieldPath ` expressions, and changing the field path
679
688
accordingly.
680
689
681
690
When creating a ` MongoDB\Client ` instance, you can now pass an additional
@@ -690,7 +699,7 @@ With factories, value holders, and encoders, we wanted to ensure that creating
690
699
the builder does not turn into a repetitive chore. As you can imagine, many
691
700
operators will mostly consist of the same logic, resulting in tons of code
692
701
duplication. To make matters worse, every new server version may introduce new
693
- operators and stages, so we wanted to make sure that we can easily expand
702
+ operators and stages, so we wanted to make sure that we can quickly expand
694
703
the builder.
695
704
696
705
We could try to rely on generative AI to help us with this, but this only goes
@@ -706,7 +715,6 @@ before and adds them to the tests. In these tests, we manually write the builder
706
715
code that would generate the pipeline or expression from the example. This
707
716
ensures that the generated logic behaves as we expect, is protected from
708
717
regression should we make any changes to the generator, and it allows us to use
709
- the builder and feel what it's like. To top it all off, we could add this code
710
- to the server documentation, similar to how we add other language-specific code
711
- snippets. We're not quite there yet, but we'd love to include this in the
712
- documentation.
718
+ the builder and feel what it's like. Ultimately, we love to add this builder
719
+ code to the server documentation, similar to how we add other language-specific
720
+ code snippets, but we're not quite there yet.
0 commit comments