manual effort needed to produce unified and consolidated reports
infrastructure maintenance needed
A North American Health group was struggling with consolidating their accounting reporting as the group consists of a number of companies and clinics each of them using different accounting software solutions. The group was also looking at a centralized repository for storing and reporting on their EMR data (Electronic Medical Records).
The solution is comprised of two major components – a data lake to store any data, in any format, from any source and a Redshift data warehouse which sources data from the data lake and hosts a robust integrated data model, allowing for consolidated reporting. The predefined reports are implemented using AWS QuickSight.
Business Case Story
Our client was experiencing difficulties in consolidating their accounting data and achieving consolidated reporting on it, as different companies and health clinics from the group were using different accounting software solution. The ways to extract data and its format varied greatly across the sources, which posed a challenge from technical perspective, as well – these solutions could provide structured or semi-structured data in various format – XML, JSON, CSV, etc. Also, the different accounting solution implemented different models and concepts, which required significant effort to standardize the multiple data sets. Apart from the consolidated reporting requirements, the client needed a centralized and secure repository to store large EMR data sets, which contain patient sensitive information.
The solution is built entirely within AWS cloud environment, with no on-premises elements and using only serverless or AWS managed services, which leads to no need for maintenance and/or upgrades. The solution consists of two main areas – a data lake and a data warehouse.
AWS Data lake
The goal of this area is to ingest and store any data using schema-on-read approach, regardless of the data structure (structured or semi-structured), format (plan text, json, xml, etc.) or nature of the source system. Within the data lake, the data is processed and standardized to a common, efficient format (parquet) which allows for quick and easy access to the data, as well as reduced cost for storage.
Data is ingested and processed with AWS Lambda and Glue, described in the Glue catalogue, profiled with AWS DataBrew, and stored securely in S3. On the fly access to the data is available via AWS Athena and all processes are orchestrated with AWS StepFunctions.
AWS Data Warehouse
The data warehouse models and integrates data from multiple source systems into a single custom-tailored model, enabling consolidated analytics on it. Data deltas are exported from the data lake and copied into Redshift on a daily basis.
AWS QuickSights is used to implement the required pre-defined reports which allowed the business users to automatically gain precious insights into consolidated accounting data sets and take well-informed decision.
Centralizing data from all various source systems into the AWS data lake and the Redshift Data Warehouse allowed for a number of benefits and improvements:
10x increased productivity of analytics team – being able to pull all necessary data into the data lake fully automatically and having access to all the data they need, the analysts stopped spending time on trying to get access to the data but focused on creating much more complex analytics, delivering additional value and gaining precious advantage.
0 manual effort needed to produce unified and consolidated reports – with all of the data being processed fully automatically and available in one central repository, it is now possible to create all necessary reports and dashboards which provided consolidated insights into the results of all companies and clinics within the group.
0 infrastructure maintenance needed – the data lake and data warehouse were created using only either serverless or fully managed services, which eliminates the need for manual support, maintenance, and patching.