This month, I celebrated 4 years of working on a large data platform that uses the medallion architecture for data organization. All my previous experiences involved different data organization approaches. So, to mark this milestone, I decided to share some lessons I’ve learned along the way, which I believe may be helpful to others working with a similar approach.
First… some context: What is the medallion architecture?
In short, it’s a design pattern in the data domain intended for logical data organization within a lakehouse. The goal is to progressively improve the structure and quality of the data as it moves through the layers defined in the pattern (bronze, silver, gold):
- Bronze layer: Raw data. This is the landing zone where the ingested data arrives. There are no data transformations at this stage.
- Silver layer: Here, you can apply simple cleaning to your raw data and store the result in this layer. A defined schema and data types are expected.
- Gold layer: This is your consumption layer. It’s the place for complex aggregations, joins, and business logic. The platform’s users consider it the place to go when running their data queries.
The following diagram illustrates this concept and is most likely self-explanatory:
If you want to learn more about this topic, I can refer you to these links:
Now, let’s talk about the lessons learned!
All right, conceptually speaking, everything is beautiful, but when the daily routine arrives, we often need to be flexible and work out the best solution for the specific situation in our business.
So, these are the 4 takeaways that I’d like to share based on my last 4 years of experience with this topic:
#1 Don’t be orthodox when applying the medallion architecture
As you can see in the references, the official documentation outlines a few key guidelines for applying the medallion architecture. For example, no schema is required in the bronze layer, and only minimal data cleansing is indicated in the silver layer. However, depending on your demands, you should feel confident making some adjustments based on your project’s business reality.
In my experience, there were several instances where we had to adapt the guidelines to achieve the best results. I can share the following:
- We have data schemas for all the layers in the data platform, including the bronze layer. As we interact with different data sources (such as EventHubs, CSV files, Oracle connections, and others), we adopted schema enforcement to ensure that any unexpected changes in the data sources and their data contracts are detected and addressed promptly, preventing disruptions to downstream processes.
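To make the idea of schema enforcement at the bronze layer more concrete, here is a minimal sketch in plain Python. It compares one incoming record with an agreed contract and reports what drifted. The names `ORDERS_BRONZE_SCHEMA`, `check_record`, and `SchemaDrift` are hypothetical and chosen only for illustration; in a real lakehouse this check would normally be delegated to the ingestion engine rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDrift:
    missing: tuple      # columns the contract expects but the source did not send
    unexpected: tuple   # columns the source sent that the contract does not know
    wrong_type: tuple   # columns whose value does not match the expected type

    @property
    def ok(self) -> bool:
        # The record is accepted only when nothing drifted.
        return not (self.missing or self.unexpected or self.wrong_type)


def check_record(expected: dict, record: dict) -> SchemaDrift:
    """Compare one ingested record against the expected bronze schema."""
    missing = tuple(sorted(set(expected) - set(record)))
    unexpected = tuple(sorted(set(record) - set(expected)))
    wrong_type = tuple(sorted(
        name for name, typ in expected.items()
        if name in record and not isinstance(record[name], typ)
    ))
    return SchemaDrift(missing, unexpected, wrong_type)


# Hypothetical contract for an "orders" source (illustrative names only).
ORDERS_BRONZE_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}

good = check_record(
    ORDERS_BRONZE_SCHEMA,
    {"order_id": 1, "customer_id": 7, "amount": 9.5},
)
drifted = check_record(
    ORDERS_BRONZE_SCHEMA,
    {"order_id": "1", "customer_id": 7, "currency": "EUR"},
)
```

In this sketch, `good.ok` is true, while `drifted` reports a missing `amount`, an unexpected `currency`, and a wrongly typed `order_id`. A pipeline could reject or quarantine such a record instead of letting the change reach downstream layers.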
#2 New layers, why not?
This one is somewhat connected to the item above, but it deserves its own section since it may be disruptive for some people. The point here is that, in some cases, it’s better to define your own specific data layer than to try to figure out which layer of the by-the-book design would be the right one.
I’m quite confident that if you have worked with the medallion architecture, you have already asked yourself about the layers’ purpose and where you should place certain data after a couple of transformations (leave it in silver? Move it to gold?).
Over the past years, we have arrived at a point where:
- Our project’s data organization includes layers such as ‘Reference’, ‘Sandbox’, and ‘Checkpoints’, among others. These layers were introduced to address the need for a precise data location in some cases. For example, the ‘Reference’ layer was created to store lookup data. This clear separation ensures that everyone on the team knows exactly where to find and add lookup data, eliminating any confusion between the silver and gold layers.
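One simple way to keep extra layers explicit is to list them in a single place in code, next to the classic three, so nobody has to guess where a dataset belongs. The sketch below is an assumption about how this could look: the `Layer` enum, the `STORAGE_ROOT` value, and the `dataset_path` helper are hypothetical names, not part of any library.

```python
from enum import Enum


class Layer(Enum):
    # The three classic medallion layers
    BRONZE = "bronze"
    SILVER = "silver"
    GOLD = "gold"
    # Custom layers added alongside them
    REFERENCE = "reference"
    SANDBOX = "sandbox"
    CHECKPOINTS = "checkpoints"


# Hypothetical storage root; replace with your own location.
STORAGE_ROOT = "/lake"


def dataset_path(layer: Layer, dataset: str) -> str:
    """Build the storage path of a dataset inside a given layer."""
    return f"{STORAGE_ROOT}/{layer.value}/{dataset}"


# Lookup data goes to the 'Reference' layer, not to silver or gold.
lookup_path = dataset_path(Layer.REFERENCE, "country_codes")
```

Because every path is built from the enum, adding a new layer is a one-line change that is visible to the whole team.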
#3 Mappings in the catalog tool
You’re likely using a data catalog tool. In my current project, we use Unity Catalog for data governance, which requires us to map our external storage locations (aligned with the medallion architecture) to the catalog. This mapping requires careful attention.
Since there are no strict constraints between schema names and storage roots, it’s important to avoid mismatches. For example, mapping a bronze table to a silver schema, or any other confusing configuration, can lead to misinterpretations and errors.
- In our specific case, we have different permission sets for Personally Identifiable Information (PII) and non-PII data. To manage this within the catalog, we mapped both PII and non-PII data to the same silver layer but differentiated them by placing them in two separate schemas. This approach allows us to maintain the logical grouping of the silver layer while implementing granular access control based on data sensitivity.
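Since the catalog itself does not stop you from pointing a schema at the wrong storage root, a small consistency check over the mapping definition can catch mismatches early. The sketch below is plain Python over a configuration dictionary and does not call any Unity Catalog API. The schema names (`silver_pii`, `silver_non_pii`), the paths, and the naming rule (a schema name equals its layer or starts with the layer plus an underscore) are all assumptions made for the example.

```python
# Hypothetical catalog mapping: schema name -> (layer, external storage root).
SCHEMA_MAPPINGS = {
    "bronze": ("bronze", "/lake/bronze"),
    # PII and non-PII share the silver layer but live in separate schemas,
    # so each schema can receive its own permission set.
    "silver_pii": ("silver", "/lake/silver/pii"),
    "silver_non_pii": ("silver", "/lake/silver/non_pii"),
    "gold": ("gold", "/lake/gold"),
}


def find_mismatches(mappings: dict) -> list:
    """Return schema names whose name or storage root disagrees with its layer."""
    mismatches = []
    for schema, (layer, root) in mappings.items():
        # Assumed rule: the schema name is the layer or is prefixed by it.
        name_ok = schema == layer or schema.startswith(layer + "_")
        # The layer must appear as a folder in the storage root.
        root_ok = layer in root.strip("/").split("/")
        if not (name_ok and root_ok):
            mismatches.append(schema)
    return mismatches


clean = find_mismatches(SCHEMA_MAPPINGS)
# A silver schema pointing at a bronze folder is flagged.
broken = find_mismatches({"silver_sales": ("silver", "/lake/bronze/sales")})
```

Here `clean` is an empty list and `broken` contains `silver_sales`. Running a check like this before applying catalog changes is one possible guard against the confusing configurations described above.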
#4 Flexible but not a mess
Although I’ve just said that you should feel confident making transformations in different layers or even adding new ones, we should be careful not to completely reconfigure the design pattern, as this would make maintenance significantly harder.
The key is that the underlying medallion architecture should remain recognizable even with the modifications. For example, a new team member should easily understand the data flow and recognize the established design pattern. This consistency is crucial for long-term maintainability.
Additionally, be careful when mixing in other data concepts such as data mesh, data vault, data warehouse, and others as you set up your data organization. Consider how these concepts integrate with your medallion architecture rather than implementing them inside it.
Final Thoughts
I hope this is useful to you in some way. If you’ve made it this far, I genuinely appreciate your attention and encourage you to share your thoughts in the comments section. 🙂
See you!