Cloud-Based Data Lake Streamlines the Analysis of Complex Data
Data Lake Creation
An international chemical process development and licensing organization that specializes in helping production plants manufacture plastic bottles, antifreeze and a variety of other products.
Data analysis and data processing were long-winded, complex tasks for an international chemical process development and licensing organization. The company, which specializes in helping production plants manufacture plastic bottles, antifreeze and a variety of other consumer products, needed an effective, routine method to record and collect the data being generated daily. It had no proper way to monitor and assess the performance of catalysts and of ethylene oxide, ethylene glycol and derivative production. The company had no shortage of data; the biggest problem was that the data arrived in a variety of inconsistent formats, including handwritten documents and spreadsheets with no common structure. Integrating and evaluating this data was also done manually, so team members and leaders spent large amounts of time wrangling data. In addition, data was often not provided in the same units of measure, and it was not stored in a normalized central repository. These challenges made it difficult for company scientists to analyze the data sets and derive the detailed insights needed to deliver value to their customers. Without effective data analysis, production processes could suffer with no way to fine-tune them. Dunn Solutions was brought on board to conduct a much-needed data intervention and solve the company's data processing and data storage problems.
It was critical to automate and optimize data ingestion and transformation and to store the data in a unified repository. The company also required a scalable, performant repository for sensor and measurement data. The Dunn Solutions implementation team designed and developed a modern data warehouse based on Azure and Snowflake, consisting of a data lake working in conjunction with a data warehouse. The solution used a combination of Azure Blob Storage and Snowflake to house a raw zone, a structured zone and a Kimball dimensional data warehouse. Data was ingested from files provided by customers, typically CSV or Excel files, using Azure Data Factory to move the data into the raw zone in Blob Storage. From there, data was transformed and processed into the structured zone. Processing was done in memory using Databricks, which allowed critical data manipulation functions (e.g., converting to common units of measure, identifying outliers and computing statistical measures) to execute quickly regardless of data volume or complexity. From the structured zone, data was further processed to populate a dimensional star schema for analytics. This design gave the company both speed and flexibility: data analysts could query the highly polished star schema, the structured data in the data lake, or both.
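The raw-zone to structured-zone step described above can be sketched in a few lines. This is a minimal, hypothetical example in plain Python rather than Databricks or Spark code, and it is not the project's actual implementation: the record layout (`sensor_id`, `quantity`, `value`, `unit`), the `CONVERSIONS` table, the helper names and the choice of a median-absolute-deviation (modified z-score) outlier test with a 3.5 cutoff are all assumptions made for illustration. It shows the three functions the paragraph names: converting to a common unit, flagging outliers and computing summary statistics per sensor.

```python
"""Hypothetical raw-zone -> structured-zone transformation sketch.

Assumed (not from the case study): each raw record is a dict with
'sensor_id', 'quantity', 'value' and 'unit'; the conversion table and
the outlier cutoff below are illustrative values only.
"""
from collections import defaultdict
from statistics import mean, median, pstdev

# Illustrative table: (quantity, source unit) -> (canonical unit, converter).
CONVERSIONS = {
    ("temperature", "degC"): ("K", lambda v: v + 273.15),
    ("temperature", "degF"): ("K", lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15),
    ("temperature", "K"): ("K", lambda v: v),
    ("pressure", "bar"): ("kPa", lambda v: v * 100.0),
    ("pressure", "psi"): ("kPa", lambda v: v * 6.894757),
    ("pressure", "kPa"): ("kPa", lambda v: v),
}


def normalize(record):
    """Return a copy of a raw record with its value in the canonical unit."""
    key = (record["quantity"], record["unit"])
    if key not in CONVERSIONS:
        raise ValueError(f"no conversion defined for {key}")
    unit, convert = CONVERSIONS[key]
    return {**record, "value": convert(record["value"]), "unit": unit}


def flag_outliers(values, cutoff=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # The modified z-score is undefined when MAD is zero; flag nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]


def to_structured(raw_records):
    """Normalize units, flag outliers per (sensor, quantity), and summarize."""
    groups = defaultdict(list)
    for record in raw_records:
        rec = normalize(record)
        groups[(rec["sensor_id"], rec["quantity"])].append(rec)

    structured, stats = [], {}
    for key, recs in groups.items():
        values = [r["value"] for r in recs]
        flags = flag_outliers(values)
        structured.extend({**r, "is_outlier": f} for r, f in zip(recs, flags))
        # At least half the values lie within one MAD of the median,
        # so `clean` is never empty.
        clean = [v for v, f in zip(values, flags) if not f]
        stats[key] = {
            "unit": recs[0]["unit"],
            "count": len(clean),
            "mean": mean(clean),
            "stdev": pstdev(clean),
            "outliers": len(values) - len(clean),
        }
    return structured, stats


if __name__ == "__main__":
    raw = [
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 250.0, "unit": "degC"},
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 482.0, "unit": "degF"},
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 251.0, "unit": "degC"},
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 249.0, "unit": "degC"},
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 252.0, "unit": "degC"},
        {"sensor_id": "R1-T", "quantity": "temperature", "value": 900.0, "unit": "degC"},
        {"sensor_id": "R1-P", "quantity": "pressure", "value": 20.0, "unit": "bar"},
    ]
    _, summary = to_structured(raw)
    for key, row in summary.items():
        print(key, row)
```

In a Databricks deployment the same logic would more likely be written as Spark DataFrame operations grouped by sensor. The median-based test is used here because a mean and standard-deviation z-score can be masked by the very outliers it is meant to catch.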
Now equipped with a modern cloud data warehouse, company scientists can focus on analyzing and processing customer sensor and process data, gaining the insight needed to optimize catalysts and patented processes. The new solution readily ingests sensor and measurement data provided by customers in various formats, automates data integration across customer sources in minutes rather than the hours it previously took, and unifies the data so that calculations and analysis can be performed. The Dunn Solutions team was pleased that the solution simplified, streamlined and shortened the company's data processes for long-term success and impact.