Challenge

An international Life Sciences organisation was generating huge amounts of R&D data, but was struggling to surface it for data analytics and machine learning.

For the MVP, a single product was chosen as the first to join the data mesh, to demonstrate the mesh's ability to improve data use throughout R&D.

A large amount of equipment was used to complete scientific experiments. The variety of equipment meant data arrived in different formats and structures, requiring extensive manual processing before it could be provided to data science teams.

Solution

A data mesh hosted on Google Cloud Platform (GCP) was designed, and the client team was trained in data mesh principles and governance.

Data was ingested automatically into a data lake and underwent schema validation. It was then tagged with appropriate metadata and logged in a data catalogue.
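The ingestion steps above (validate against a schema, tag with metadata, log in a catalogue) can be illustrated with a minimal sketch. This is not the project's implementation: the schema fields, the `Catalogue` class, and the `ingest` function are all hypothetical names chosen for illustration, and an in-memory list stands in for the real data catalogue.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical schema (an assumption): required field name -> expected type.
EXPERIMENT_SCHEMA = {
    "experiment_id": str,
    "instrument": str,
    "readings": list,
}


def validate(record: dict, schema: dict) -> list:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    for name, expected in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")
    return errors


@dataclass
class Catalogue:
    """In-memory stand-in for a data catalogue (illustrative only)."""
    entries: list = field(default_factory=list)

    def register(self, record: dict, tags: dict) -> dict:
        entry = {
            "experiment_id": record["experiment_id"],
            "tags": tags,
            "registered_at": datetime.now(timezone.utc).isoformat(),
        }
        self.entries.append(entry)
        return entry

    def search(self, **tags) -> list:
        """Return entries whose tags match every given key/value pair."""
        return [
            e for e in self.entries
            if all(e["tags"].get(k) == v for k, v in tags.items())
        ]


def ingest(record: dict, catalogue: Catalogue) -> dict:
    """Validate a record, derive metadata tags, and log it in the catalogue."""
    errors = validate(record, EXPERIMENT_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    tags = {
        "instrument": record["instrument"],
        "reading_count": len(record["readings"]),
    }
    return catalogue.register(record, tags)
```

A record that passes validation becomes searchable by its tags, for example `catalogue.search(instrument="plate_reader")`, while a record with missing or mistyped fields is rejected before it reaches the catalogue.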

Data was also processed into a consistent structured format using a relational database (Postgres) to enable more rapid analytics and BI over experiment data.
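As a rough sketch of what processing data into a consistent structured format can look like, the example below maps two differently shaped instrument records onto one relational table. Everything here is assumed for illustration: the "wide" and "long" input formats, the `measurement` table, and its columns are hypothetical, and Python's built-in `sqlite3` is used as a stand-in so the example runs without a Postgres server.

```python
import sqlite3

# Keys that describe the record itself rather than a measurement (assumed layout).
META_KEYS = {"format", "id", "instrument"}


def normalise(raw: dict) -> list:
    """Map one instrument-specific record to (experiment_id, instrument, metric, value) rows."""
    if raw["format"] == "wide":
        # Wide layout: one key per metric, e.g. {"temp_c": 37.0, "ph": 7.2}.
        return [
            (raw["id"], raw["instrument"], key, float(value))
            for key, value in raw.items()
            if key not in META_KEYS
        ]
    if raw["format"] == "long":
        # Long layout: a list of {"name": ..., "value": ...} measurements.
        return [
            (raw["id"], raw["instrument"], m["name"], float(m["value"]))
            for m in raw["measurements"]
        ]
    raise ValueError(f"unknown format: {raw['format']}")


def load(conn: sqlite3.Connection, records: list) -> int:
    """Create the table if needed, insert all normalised rows, and return the row count."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS measurement (
            experiment_id TEXT NOT NULL,
            instrument    TEXT NOT NULL,
            metric        TEXT NOT NULL,
            value         REAL NOT NULL
        )
        """
    )
    rows = [row for record in records for row in normalise(record)]
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Once every instrument's output lands in the same table shape, analytics and BI queries no longer need per-instrument parsing; a single `SELECT` over `measurement` covers all experiments.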

Project Details

Project Duration - 1-month design, 12-week product development

Project Team - 1 delivery lead, 1 data architect, 3 data engineers, 1 data scientist

Benefits

  1. The data mesh was successfully established with a working MVP.

  2. Experiment data was discoverable via the data catalogue and searchable using metadata.

  3. Structured experiment data was available for analytics.
