Pivoting in Endeca

The impetus for Branchbird’s SmartStateManager

The prologue.

Pivoting across entities in your organization’s data is a central feature of any data discovery application.  For example, understanding what parts in your organization’s supply chain have the highest number of quality issues, and then pivoting to the discrete list of suppliers that provide those parts is a powerful, yet expected, data discovery capability.

Those familiar with Endeca’s approach to data modeling are well-versed in data “denormalization” — or joining the entities in the data together upon ingest.  This denormalization process makes for wider, fully-attributed records which allow for very performant information access as offered by the Endeca Server.   As an example, in this denormalized data model, our “part” record in Endeca is fully-attributed with the information about the supplier who supplies it and the technical attributes that describe it.

This denormalized approach is straightforward when consuming data from the star schemas in the data warehouse.  The “fact” drives the grain of our Endeca records and all of the surrounding dimension tables can merely be joined on to our fact to make our records wider and further attributed.  With data combined in this fashion, pivoting in Endeca is as easy as filtering our the records by attributes from one dimension table (say, “product_type:bikes”) and aggregating by attributes from another dimension table (say, “GROUP BY supplier_name”).

The challenge.

Pivoting in Endeca becomes challenging when the data to be consumed does not join together nicely into one “record type”.

(Note: A record type is nothing more than an additional attribute that we tag our Endeca records with to help our front end queries distinguish between different types of data.  Adding an additional attribute (e.g. “record_type:sales”) allows us to classify and summon certain types of records comprised of different entities at different grains for different purposes.)

To those who haven’t spent large amounts of time modeling data in Endeca, the “record type” concept can be confusing.  It is best illustrated by example.  Take the following logical data model…

In this logical data model, we have three entities: vendors, contracts, and payments.  In one system, a company establishes contracts (or purchase orders) with vendors to acquire their goods and/or services.  In a separate system, said company issues payments to those vendors once the company has been invoiced.  Under this design, there is no way to join contracts and payments together.  They are “sibling” entities that share the same parent, in this case vendors.  A company may have multiple contracts with a single vendor and, similarly, may have issued multiple payments to that same vendor, but with no direct link between the two.

As a common data store with self-describing records, the Endeca Server can bring this contract and payment data together as two record types.  The resulting data model would likely look like the following:

The two record types in the Endeca Server can clearly be seen in the figure above, as can where they overlap and share attribution.  (Note: For simplicity sake, only some of the entities’ attribution is shown.)  There is no surprise that they both share attribution from their common parent, vendor (e.g. attribute=vendor_num).

The rub.

Even under this multiple “record type” data model, pivoting remains simple if the end user of the application refines by one of the shared attributes (e.g. Vendor_Num, Department, or a range filter on Amount).  In this case, both of the record types, contracts and payments, refine in lockstep and the application can still easily summon metrics around both contracts and payments (e.g. How many contracts and payments were in place in 2011 by the IT department?).

The OEID pivoting challenge is truly encountered when the end user of the application chooses to refine by one of the attributes that is not shared.  By design, the Endeca Server will only return those records that carry the attribute value selected by the end user.  Thus, if the end user wanted to see all records where the purchase order description was “DEMOLITION” only contract records would be returned since the “PO_Desc” attribute only exists on our contracts records.  This refinement would wipe out our payments records altogether, offering the end user no means to pivot to associated payments based on those “DEMOLITION” contracts.

This problem surfaces on enough OEID implementations, that experienced Endeca consultants have resorted to referring to shared attributes as “safe” and unshared attributes as “unsafe”.

There’s hope.

Branchbird has developed a solution to this problem.  Branchbird’s own Patrick Rafferty has developed our own SmartStateManager that addresses this challenge specifically.  The solution is elegant since OEID StateManagers are an expected extension point of the OEID Studio product.

The Branchbird SmartStateManager can be configured to address this problem in two ways:

1) Avoid filtering unsafe/unshared attributes.  

The Branchbird SmartStateManager can be configured to interrogate the Endeca Server to discern which attributes belong to which record types.  Thus, it knows not to apply filters to record types that do not carry the said attribution.  In the example above, the SmartStateManager could be configured to keep payments in context when a contracts-specific attribute like “PO_Desc:DEMOLITION” is used as the filter.

2) Allow for a key to be designated between the sibling entities.  

In our example above, we could also use the Branchbird SmartStateManager to define a shared key between contracts and payments, such as “Vendor Name”.  Thus, when the end user decides to filter down to contracts with a purchase order description of “DEMOLITION”, the Branchbird SmartStateManager subsequently filters down the associated payments by the collection of “Vendor Name” still in context.

To fully illustrate the problem and the solution, per the example above, I downloaded some publicly accessible contract and payment data from the City of Chicago.

You can see from the screen captures below, this commonly encountered issue is easily addressed allowing for new insights in the OEID product based on sophisticated pivoting that would otherwise not be available.

Without Branchbird’s SmartStateManager:

With Branchbird’s SmartStateManager:

If you’re interested to know more about Branchbird’s SmartStateManager or how Branchbird can help with your OEID initiatives, please drop us a note at info@branchbird.com.