Blogs

How GroupBy Works

By Leo Liu posted 12-21-2021 03:00

  

GroupBy is a function that accepts a list of objects as input and generates a grouped list as output. 

It's a powerful tool to implement the use cases of total/subtotal. For example, group the invoice items with the same charge name together and show the subtotal amount of each charge.

Data Transformation

Given a list-type input Items like the following:

[
    { "A": "a1", "B": "b1", "C": "c1", "D": "d1" },
    { "A": "a1", "B": "b1", "C": "c1", "D": "d2" },
    { "A": "a1", "B": "b2", "C": "c1", "D": "d1" },
    { "A": "a2" }
]

The value object has four named properties: A, B, C, D. Items|GroupBy(A,B,C) will output the following:

[
    {
        "A": "a1",                                                         ①
        "_Group": [                                                        ②
            {
                "B": "b1",
                "_Group": [                                                ③
                    {
                        "C": "c1",
                        "_Group": [
                            { "A": "a1", "B": "b1", "C": "c1", "D": "d1" }, ④
                            { "A": "a1", "B": "b1", "C": "c1", "D": "d2" }
                        ]
                    }
                ]
            },
            {
                "B": "b2",
                "_Group": [
                    {
                        "C": "c1",
                        "_Group": [
                            { "A": "a1", "B": "b2", "C": "c1", "D": "d1" }
                        ]
                    }
                ]
            }
        ]
    },
    {
        "A": "a2",
        "_Group": []
    }
]

Note:

① Instead of using the value of groupBy property as key name, like  {"a1": [ { "b1": [ { "c1": [.. ] } ] } ], "a2": []}, we keep the original key name, because we need to refer to this key value by property name.

② "_Group" is a reserved keyword, it's the hard-coded property name of the grouped list. 

③ There can be multiple "_Group" properties at different levels, it's a good idea to give it a meaningful name by using Cmd_Assign command.

④ At the deepest level of grouped items, the object schema is the same as the original in the input list.

Nested Loops

With the above in mind, let's take a look at the following template:

{{#Items|GroupBy(A,B,C)}}
    {{Cmd_Assign(ItemsWithSameA,_Group)}}
    {{#ItemsWithSameA}}
        {{Cmd_Assign(ItemsWithSameB,_Group)}}
        {{#ItemsWithSameB}}
            {{Cmd_Assign(ItemsWithSameC,_Group)}}
            {{#ItemsWithSameC}}
                {{A}} - {{B}}
            {{/ItemsWithSameC}}
        {{/ItemsWithSameB}}
    {{/ItemsWithSameA}}
{{/Items|GroupBy(A,B,C)}}

If you are not familiar with the Mustache template yet, the following pseudo-code can explain what it does.

GroupedItems = Items.GroupBy(A, B, C)
for {A,_Group} in GroupedItems {             // `{A,_Group}` is a deconstructed object
  ItemsWithSameA = _Group
  for {B,_Group} in ItemsWithSameA {         // B and _Group are key names
    ItemsWithSameB = _Group
    for {C,_Group} in ItemsWithSameB {
      ItemsWithSameC = _Group
      for item in ItemsWithSameC {           // item is in shape of the original
        print(item.A)

        print(" - ")
        print(item.B)
      }
    }
  }
}

Let's put them side by side:

Mustache Template

Pseudo-Code

{{#Items|GroupBy(A,B,C)}}

GroupedItems = Items.GroupBy(A, B, C)

for {A,_Group} in GroupedItems {

  {{Cmd_Assign(ItemsWithSameA,_Group)}}

  ItemsWithSameA = _Group

  {{#ItemsWithSameA}}

  for {B,_Group} in ItemsWithSameA {

    {{Cmd_Assign(ItemsWithSameB,_Group)}}

    ItemsWithSameB = _Group

    {{#ItemsWithSameB}}

    for {C,_Group} in ItemsWithSameB {

      {Cmd_Assign(ItemsWithSameC,_Group)}}

      ItemsWithSameC = _Group

      {{#ItemsWithSameC}}

      for item in ItemsWithSameC {

        {{A}} - {{B}} 

        print(item.A)

        print(" - ")
        print(item.B)

      {{/ItemsWithSameC}}

      }

    {{/ItemsWithSameB}}

    }

  {{/ItemsWithSameA}}

  }

{{/Items|GroupBy(A,B,C)}}

}


Hopefully, the above line-by-line comparison could help you understand how the nested loops work.

Aggregation

If you want to aggregate some numeric fields for a certain group, you need to be aware of the data structure and the variable scope. For example, let's say, you want to aggregate field E for all items with the same B property:

{{#Items|GroupBy(A,B,C)}}
    {{Cmd_Assign(ItemsWithSameA,_Group)}}
    {{#ItemsWithSameA}}
          {{Cmd_Assign(ItemsWithSameB,_Group)}}
          {{#ItemsWithSameB}}
            {{Cmd_Assign(ItemsWithSameC,_Group)}}
            {{#ItemsWithSameC}}
                {{A}} - {{B}}
            {{/ItemsWithSameC}}
          {{/ItemsWithSameB}}
          {{ItemsWithSameB|FlatMap(_Group)|Sum(E)}}               ①
    {{/ItemsWithSameA}}
{{/Items|GroupBy(A,B,C)}}

See line ①, 

  • ItemsWithSameB is a variable defined in the context of ItemsWithSameA, so you can only refer to it within the section of ItemsWithSameA.
  • The object schema of item in ItemsWithSameB is { "C": "..", "_Group": [] }, so you can't directly do {{ItemsWithSameB|Sum(E)}}, you need to flatten the _Group list first, please refer to the doc of FlatMap function for its usage.

Now, you should understand how GroupBy function works.

Here are some side notes:

  • Right now, GroupBy function can support up to 3 arguments, if you need more, please reach out to our support and let us know your use case.
  • If the value of a group-by property is null, it will be treated as blank.
The argument property name can be dotted data path, for example: InvoiceItems|GroupBy(RatePlanCharge.ChargeType).

------------------------------
Leo Liu
Zuora
------------------------------
0 comments
65 views