UpGuard can now report the discovery of multiple misconfigured cloud storage buckets under the control of Hortonworks, an enterprise data processing company that completed a merger with Cloudera in January of 2019. Most of the massive number of files accessible in those buckets were intended for public distribution, as Hortonworks contributes to and supports the open source Apache Hadoop project. Amidst terabytes of intentionally public files, however, were numerous system credentials and other internal developer information.
Discovery and Notification
UpGuard analysts identified a cloud storage bucket configured for public access located at the URL "dev.hortonworks.com.s3.amazonaws.com" and proceeded to download and review a sample of the files. Initial analysis by the UpGuard team showed sufficient reason to believe sensitive information stemming from Hortonworks was most likely exposed to the public internet through the discovered file repositories. We initiated notification efforts on July 27, 2020.
On August 8th, UpGuard received an email from Cloudera stating that "investigation and remediation is complete", that "[t]he S3 buckets will remain open for downloads", and that analysis had shown only three files potentially containing confidential information, all of which had been removed by July 30th, 2020.
However, Cloudera's determinations would soon be revised. A follow-up email arrived the next morning. Cloudera had realized that a backup of its internal collaboration and automation system, a local installation of the popular "Jenkins" software, was also available within the publicly accessible files. Jenkins is used to automate parts of the software development lifecycle, such as building a software package and running tests. The backup of the configuration for Hortonworks' Jenkins installation stored in the bucket contained the usernames of developers and their encrypted passwords.
After receiving the follow-up email, UpGuard analysts checked on the status of the bucket and found that all public access to the dev.hortonworks.com S3 bucket had now been removed.
Additionally, the email that morning from Cloudera's CIO asked our team to share any additional relevant details we knew of. As researchers who have reported many data exposures, we always appreciate a constructive response like this one. Open communication reduces risk and expedites remediation.
Chris Vickery, UpGuard's Director of Risk Research, replied with a list of eight additional buckets we knew of bearing either "hortonworks" or "cloudera" in their names. Three of these were still publicly accessible at that point in time, but they did not appear to contain files at the same level of security concern as the original discovery.
Significance
While the vast majority of files we reviewed were indeed the kind normally made available for public consumption and community contribution, this incident illustrates the risks inherent in extremely large cloud storage containers. After several hours of recording the names of files available for download, the resulting list was 2.4 gigabytes of pure text, and that was just the names of the files with no content. The scale of these storage containers means that even fully automated processes (like downloading and text searches) were slow by human standards, to say nothing of the time it would take for a human to review the contents. But within that vast collection were files that contained credentials to systems that appear to be core to Hortonworks' software development.
One file, named hwx_secrets.conf, contains plaintext credentials for numerous other systems. UpGuard never attempts to use discovered credentials, so the actual impact of this exposure is unknown, but the credentials' potential reach extends into multiple sites for the production and delivery of software. The file includes:
- A password for the user "relengjenkins" on Linux
- A password for the user "jenkins-daemon" on Windows
- Three API keys for authentication on GitHub
- A database password
- A Jenkins signing key
- Credentials for users on the public and private Nexus instances
- Credentials for the "releng" Jira instance
When that many directories and files of varying format are all stashed away together, it becomes all too easy for something to be mistakenly placed among them and remain unnoticed, as appears to have happened here. Even after their designated security response email address received UpGuard's formal notification of a potentially serious security concern, it took eleven days for Cloudera/Hortonworks to recognize the true gravity and scope of the problem.
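One way to reduce that risk is to triage a bucket's object listing by name before anything else. The following is a minimal sketch of that idea in Python, not a description of the tooling UpGuard or Cloudera actually used. The pattern list, the function name `flag_sensitive_names`, and the sample paths are all hypothetical; only the file name hwx_secrets.conf comes from this report.

```python
import re
from typing import Iterable, Iterator

# Hypothetical name patterns that often indicate secrets; tune for your own data.
SUSPICIOUS_NAME = re.compile(
    r"(secret|credential|passw(or)?d|\.pem$|\.key$|id_rsa|\.env$)",
    re.IGNORECASE,
)


def flag_sensitive_names(names: Iterable[str]) -> Iterator[str]:
    """Yield object names whose final path component looks sensitive.

    Accepts any iterable, so a multi-gigabyte listing can be streamed
    line by line from a file instead of being loaded into memory.
    """
    for name in names:
        # Only inspect the last path component, not the directory names.
        basename = name.rstrip("/").rsplit("/", 1)[-1]
        if SUSPICIOUS_NAME.search(basename):
            yield name


# Illustrative listing; the directory paths are invented for this example.
listing = [
    "HDP/centos7/2.x/hdp.repo",
    "releng/backup/hwx_secrets.conf",
    "docs/index.html",
]
flagged = list(flag_sensitive_names(listing))
print(flagged)
```

A name-based filter like this only narrows the search. Files with innocuous names can still hold credentials, so flagged results are a starting point for content review rather than a substitute for it.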
Conclusion
Hadoop, Hortonworks’ primary product, is built for big data problems at the largest scale. Facebook, for example, uses Hadoop; Cloudera’s home page currently boasts that their customer base includes most of the top ten largest companies in any industry. Both Cloudera and Hortonworks also appear in the portfolio of In-Q-Tel, the private investment branch of the CIA. For companies that develop technology central to the backbone of huge swathes of commerce and network resilience, security practices that prevent data leaks and accelerate incident response are vital.