7. Implementation time – The completeness of the HPCC Systems data workflow means users can deploy a complete data lake platform in just a few hours. By contrast, building a data lake with comparable capabilities from individual solutions offered by different vendors can require weeks of compatibility testing before the system is fully operational.
8. Ongoing maintenance – Because HPCC Systems runs on commodity hardware and requires a smaller cluster footprint than competing platforms, additional processing or storage capacity can be added to an HPCC Systems data lake at lower cost. The platform also includes native support for user authentication, authorization, reliability and performance monitoring, and job management, reducing both the time and the number of IT staff needed to manage the data lake.
9. Flexibility – The HPCC Systems roadmap follows a monthly feature release plan consisting of release candidates and gold releases. Because the platform is open source, outside developers can contribute directly to that plan. This combination of a predictable release cadence and open community contribution creates a flexible development model that brings new features to market quickly.
10. Developer ecosystem – HPCC Systems has a global developer network actively working to customize the platform for use in a range of vertical markets, including agriculture, finance, healthcare, insurance, IoT, and others.
11. Reliability/maturity – The HPCC Systems data lake platform was created in 1999. Since then, numerous successful deployments have served as proof of its reliability and efficacy. LexisNexis Risk Solutions and its customers use HPCC Systems to execute production workloads daily on a platform that is battle tested and production ready. Several of these deployments are detailed in case studies available on the HPCC Systems website.
12. Data support – HPCC Systems is adept at handling and storing data in any format (CSV, XML, JSON, plain text, and binary files are all supported). Raw data files are maintained in standard Linux filesystems, and indexed data storage is implemented via a modernized version of Indexed Sequential Access Method (ISAM) files, which quickly builds search indexes that make it easy to find individual records within a dataset.
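As an illustration, the pattern above is typically expressed in ECL, the platform's declarative language: a record layout is declared, a logical file is referenced, and an index is built over chosen key fields. This is a minimal sketch; the file paths and field names here are hypothetical, not taken from any real deployment.

```ecl
// Hypothetical record layout for a person dataset
PersonRec := RECORD
    STRING20  LastName;
    STRING20  FirstName;
    UNSIGNED8 ID;
END;

// Reference a CSV logical file already sprayed onto the cluster
// ('~example::persons' is an assumed file path for illustration)
Persons := DATASET('~example::persons', PersonRec, CSV);

// Declare an index keyed on LastName, with the full record as payload
PersonIdx := INDEX(Persons, {LastName}, {Persons}, '~example::persons_idx');

// Build the physical index file so records can be retrieved by key
BUILD(PersonIdx);
```

Once built, such an index supports fast keyed lookups (e.g. filtering `PersonIdx` on a `LastName` value) without scanning the full raw file, which is the practical benefit of the ISAM-style storage described above.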