Okera Inc., a startup founded by two former Cloudera Inc. executives to simplify the management of large heterogeneous data stores at scale, today is introducing a schema management tool designed to make it easier for companies to find, access and structure data from popular data analytics tools running on top of Amazon Web Services Inc.’s S3 cloud storage service.
The company, which launched out of stealth mode in May with $14.6 million in venture financing, specializes in data governance for data lakes: collections of largely unstructured data that aren’t organized according to a schema, the formal definition of how data is structured, such as the tables in a database, their fields and the relationships among them. Schemas are typically applied to structured data before it is used in production, but unstructured data can defy such rigid classification.
“All of the functionality that we’ve become used to in the world of relational databases has been missing from data lakes,” said Okera CEO Amandeep Khurana. “We’re bringing that functionality.”
The new release of Okera’s Active Data Access Platform features what the company calls “intelligent schema management,” which it says enables data administrators to automatically discover new data sets, infer their schemas and assign universal access permissions at a fine-grained level.
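To give a rough sense of what inferring a schema involves, here is a minimal sketch in Python. It is not Okera's implementation, which the company has not detailed; the function name `infer_schema`, the sample records and the type-merging rules are all assumptions made for illustration. It scans JSON records of the kind that might sit in an S3 bucket and derives a column-to-type mapping.

```python
import json
from collections import defaultdict


def infer_schema(records):
    """Infer a flat {field: type name} mapping from an iterable of dicts.

    Hypothetical helper for illustration only; not part of any Okera API.
    """
    seen = defaultdict(set)
    for record in records:
        for field, value in record.items():
            types = seen[field]  # registers the field even if the value is null
            if value is not None:
                types.add(type(value).__name__)

    schema = {}
    for field, types in seen.items():
        if not types:
            schema[field] = "null"      # only nulls were observed
        elif len(types) == 1:
            schema[field] = next(iter(types))
        elif types == {"int", "float"}:
            schema[field] = "float"     # widen mixed numerics
        else:
            schema[field] = "str"       # assumed fallback on conflicting types
    return schema


# Sample records, invented for this example.
raw_lines = [
    '{"user_id": 1, "amount": 9.5, "country": "US"}',
    '{"user_id": 2, "amount": 12, "country": null}',
]
schema = infer_schema(json.loads(line) for line in raw_lines)
print(schema)
```

A production tool would also have to handle nested fields, sampling across very large files and formats other than JSON, none of which this sketch attempts.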
It also features a new file system manager that the company said streamlines the discovery, access, governance and use of unstructured data in S3 data stores. Supported analytics platforms include Amazon’s Elastic MapReduce, Apache Hive, Presto, Apache Spark and business intelligence software from Tableau Software Inc., Birst Inc. and Qlik Inc.
The platform is similar to a data catalog in that it enables data to be registered and governed according to an assigned set of metadata. However, “most catalogs focus on business metadata. We are the technical and operational metadata,” Khurana said. “With schema ingest, we’re making life easier for the data producer who’s on-boarding the data set.”
Data lakes have been plagued by a lack of tools to provide structure and access control, both of which are essential to performing reliable analysis without risking inadvertent disclosure.
Okera says its platform not only enables administrators to keep track of all their data in one place but also to enforce access rules down to the field level. Okera says it can automate these administrative procedures at scale, and that it is already managing multi-petabyte data lakes for customers.
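Field-level access control can be pictured with a short Python sketch. This is an illustration under stated assumptions, not a description of how Okera enforces its rules: the `POLICIES` table, the role names and the masking behavior are all hypothetical. Each role is mapped to the set of fields it may read, and every other field is masked before a row is returned.

```python
# Hypothetical role-to-field policy table, invented for this example.
POLICIES = {
    "analyst": {"user_id", "country"},
    "admin": {"user_id", "country", "email"},
}


def apply_field_policy(row, allowed_fields, mask="***"):
    """Return a copy of the row with every non-permitted field masked."""
    return {
        field: (value if field in allowed_fields else mask)
        for field, value in row.items()
    }


def read_rows(rows, role):
    """Apply the role's field policy to each row; unknown roles see nothing."""
    allowed = POLICIES.get(role, set())
    return [apply_field_policy(row, allowed) for row in rows]


rows = [{"user_id": 1, "email": "a@example.com", "country": "US"}]
analyst_view = read_rows(rows, "analyst")
admin_view = read_rows(rows, "admin")
print(analyst_view)
print(admin_view)
```

The design choice worth noting is that the policy is applied at read time in one place, so the same rule holds regardless of which analytics tool issues the query.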
Pricing is based on usage, but Okera didn’t provide details.