Challenge
An international Life Sciences organisation was generating huge amounts of R&D data, but was struggling to surface it for data analytics and machine learning.
As an MVP, a single product was chosen as the first to join the mesh, in order to demonstrate the mesh's ability to improve data use throughout R&D.
A large and varied range of equipment was used to run scientific experiments. This variety meant data arrived in different formats and structures, requiring extensive manual processing before it could be manually handed over to data science teams.
Solution
A data mesh hosted on Google Cloud Platform (GCP) was designed, and the client team was trained in data mesh principles and governance.
Data was ingested automatically into a data lake and underwent schema validation. It was then tagged with appropriate metadata and logged in a data catalogue.
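The ingestion steps described above (schema validation, metadata tagging, catalogue registration) can be illustrated with a minimal Python sketch. It is not the project's implementation: the schema fields, the `DataCatalogue` class, the `ingest` function and the example file location are all hypothetical, and an in-memory list stands in for a real catalogue service.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical schema: required field name -> expected Python type.
EXPERIMENT_SCHEMA = {
    "experiment_id": str,
    "instrument": str,
    "recorded_at": str,
    "readings": list,
}


def validate_record(record, schema):
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    for name, expected_type in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return errors


@dataclass
class CatalogueEntry:
    location: str        # where the raw file sits in the data lake
    metadata: dict       # searchable tags attached at ingestion time
    registered_at: str   # UTC timestamp of registration


class DataCatalogue:
    """In-memory stand-in for a real data catalogue service (assumption)."""

    def __init__(self):
        self._entries = []

    def register(self, location, metadata):
        entry = CatalogueEntry(
            location, dict(metadata), datetime.now(timezone.utc).isoformat()
        )
        self._entries.append(entry)
        return entry

    def search(self, **criteria):
        """Return entries whose metadata matches every given key/value pair."""
        return [
            e for e in self._entries
            if all(e.metadata.get(k) == v for k, v in criteria.items())
        ]


def ingest(record, location, catalogue):
    """Validate a record, derive its metadata tags, and log it in the catalogue."""
    errors = validate_record(record, EXPERIMENT_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    metadata = {
        "experiment_id": record["experiment_id"],
        "instrument": record["instrument"],
        "reading_count": len(record["readings"]),
    }
    return catalogue.register(location, metadata)
```

In this sketch a record that fails validation is rejected before it reaches the catalogue, so anything found through `search` has already passed the schema check.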
Data was also processed into a consistent structured format using a relational database (Postgres) to enable more rapid analytics and BI over experiment data.
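As an illustration of mapping differently shaped instrument output onto one consistent relational structure, the sketch below normalises field names and loads the rows into a single table. It makes several assumptions: the source field names, the `experiment_reading` table and its columns are hypothetical, and Python's built-in `sqlite3` module is used as a stand-in for Postgres so the example is self-contained.

```python
import sqlite3

# Hypothetical mapping: target column -> field names used by different instruments.
FIELD_ALIASES = {
    "experiment_id": ("experiment_id", "expId", "run"),
    "instrument": ("instrument", "device"),
    "value": ("value", "reading", "measurement"),
}


def normalise(raw):
    """Map instrument-specific field names onto one shared set of columns."""
    row = {}
    for column, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in raw:
                row[column] = raw[alias]
                break
        else:
            # No alias matched, so the record cannot be placed in the table.
            raise KeyError(f"no source field found for column {column!r}")
    row["value"] = float(row["value"])  # store all readings as numbers
    return row


def load(conn, raw_records):
    """Create the target table if needed and insert the normalised rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS experiment_reading (
               experiment_id TEXT NOT NULL,
               instrument    TEXT NOT NULL,
               value         REAL NOT NULL
           )"""
    )
    rows = [normalise(r) for r in raw_records]
    conn.executemany(
        "INSERT INTO experiment_reading (experiment_id, instrument, value) "
        "VALUES (:experiment_id, :instrument, :value)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Once the rows share one structure, analytics and BI queries can run over the whole table without per-instrument handling, for example `SELECT instrument, AVG(value) FROM experiment_reading GROUP BY instrument`.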
Project Details
Project Duration - 1 month of design, 12 weeks of product development
Project Team - 1 delivery lead, 1 data architect, 3 data engineers, 1 data scientist
Benefits
The data mesh was successfully established with a working MVP.
Experiment data was discoverable via the data catalogue and searchable using metadata.
Structured experiment data was available for analytics.