Political History: How A Democratic Organization Leaked Six Million Email Addresses

The Discovery
Over 6 Million Email Addresses
Email Domain Analysis
Bucket Permissions
The Significance
Political Data
The Longevity of Data
Conclusion

The UpGuard Data Breach Research Team can now disclose that approximately 6.2 million email addresses were exposed by the Democratic Senatorial Campaign Committee (DSCC) in a misconfigured Amazon S3 storage bucket. The comma-separated list of addresses was uploaded to the bucket in 2010 by a DSCC employee. The bucket and file name both reference “Clinton,” presumably in connection with one of Hillary Clinton’s earlier runs for Senator from New York. The list contained email addresses from major email providers, along with universities, government agencies, and the military.

Political campaigns now rely more than ever on data-driven decision-making to maximize the effectiveness of their electioneering efforts. This bucket shows the reach and longevity of such data, and how operational errors in handling it can leave it exposed to the public.

The Discovery

At approximately 4PM on Thursday, July 25th, 2019, UpGuard researchers discovered an Amazon S3 storage bucket named “toclinton.” This bucket was available to globally authenticated AWS users, one of the two public groups available in S3 permissions. This means that anyone with a free AWS account could access the bucket and its contents. The bucket contained a single file, EmailExcludeClinton.zip. The unprotected zip file contained a .csv file with over 6 million email addresses.

Examining the permission set of the S3 bucket revealed a user with the prefix “DSCC.” This acronym stands for the Democratic Senatorial Campaign Committee, a Democratic electioneering group. According to their website, the DSCC “is the only organization solely dedicated to electing a Democratic Senate. From grassroots organizing to candidate recruitment to providing campaign funds for tight races, the DSCC is working hard all year, every year to elect Democrats to move our country forward.” The username matched an individual who worked for the DSCC at the time the zip file was uploaded, and whose job would have been relevant to the data present in the bucket.

UpGuard contacted the DSCC the next morning, Friday, July 26th, and notified them of the exposure. By 2PM the same day, the bucket had been secured, preventing future malicious use of the data.

Over 6 Million Email Addresses

The 145MB .csv file contained 6,235,397 lines, each of which was an email address. The filename, “EmailExcludeClinton.csv,” seems to indicate that this was a list of people who had opted out or should otherwise be excluded from DSCC marketing emails. From 2001 to 2009 Hillary Clinton served as Senator from New York. In 2008 she unsuccessfully sought the nomination of the Democratic Party as a candidate for President, and in 2009 she began serving as Secretary of State under Barack Obama. The file “EmailExcludeClinton.csv” was last modified on September 17, 2010. How the contents of the file fit into the timeline of Clinton’s political career cannot be determined from what is in this bucket, but the file certainly predates her 2016 presidential bid by several years.

Email Domain Analysis

On inspection, the vast majority of the file’s contents looked like plausible email addresses from real people. Counting the addresses per email domain supports the hypothesis that these are real email addresses from ordinary citizens. The table below shows the number of email addresses for each of the ten most common domains. As far as consumer email addresses go, this is not a surprise: it looks like a list of commonly used email providers because that is most likely what it is.

Domain	Addresses
aol.com	1,662,935
yahoo.com	1,017,538
hotmail.com	760,295
gmail.com	255,294
msn.com	224,343
comcast.net	190,042
sbcglobal.net	111,413
earthlink.net	92,875
verizon.net	82,594
bellsouth.net	65,351

Analysis also showed a long tail of thousands of other, less commonly used email domains, including domains associated with businesses and 492 distinct .edu domains. The most frequently used .edu domains were those belonging to large universities, which again is not surprising: large universities provide email addresses to tens of thousands of people, and in a sample of six million email addresses, those providers will show up frequently. The list also included 7,766 .gov addresses and 3,457 .mil addresses, as one would expect in any sufficiently large sample of Americans’ email addresses.
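The kind of per-domain tally described above can be reproduced with a short script. The following is a minimal sketch, not the tooling UpGuard used: the function names `domain_of` and `summarize_domains` are hypothetical, and it assumes the input is one email address per line, as the file is described as being.

```python
from collections import Counter
from typing import Iterable, Optional


def domain_of(address: str) -> Optional[str]:
    """Return the lowercased domain of an address, or None if it is malformed."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def summarize_domains(addresses: Iterable[str], top_n: int = 10) -> dict:
    """Count addresses per domain and summarize .edu, .gov, and .mil usage."""
    counts = Counter()
    for address in addresses:
        domain = domain_of(address)
        if domain is not None:
            counts[domain] += 1
    return {
        # Most common domains, largest first.
        "top": counts.most_common(top_n),
        # Number of distinct .edu domains (not addresses).
        "edu_domains": sum(1 for d in counts if d.endswith(".edu")),
        # Number of addresses (not domains) under .gov and .mil.
        "gov_addresses": sum(n for d, n in counts.items() if d.endswith(".gov")),
        "mil_addresses": sum(n for d, n in counts.items() if d.endswith(".mil")),
    }


if __name__ == "__main__":
    # Made-up sample addresses for illustration only.
    sample = ["a@aol.com", "B@AOL.com", "c@yahoo.com", "d@example.edu", "bad-line"]
    print(summarize_domains(sample, top_n=2))
```

Passing an open file handle instead of a list streams the input line by line, so a file with millions of lines never has to be held in memory at once.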

Bucket Permissions

The contents of Amazon S3 buckets are public when they are configured to allow at least read access to all users or to globally authenticated users (anyone logged into a free AWS account). In some cases, however, those global user groups have more extensive permissions, allowing them to modify the bucket’s contents or its permissions. In this case, both the owner of the bucket and the global authenticated user group had “FULL_CONTROL” permissions, allowing anyone with an AWS account to download or modify the contents of the bucket, as well as the permission set itself.
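To make the permission model above concrete, here is a minimal sketch of how an access control list could be checked for grants to the two global groups. It operates on a plain dictionary shaped like an S3 bucket ACL response (a `Grants` list whose entries have a `Grantee` and a `Permission`). The function name `public_grants`, the example ACL, and the owner ID in it are hypothetical, constructed only to mirror the configuration described in this report.

```python
from typing import List, Tuple

# The two predefined S3 groups that make a grant effectively public.
PUBLIC_GROUPS = {
    "http://acs.amazonaws.com/groups/global/AllUsers": "all users",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers": "any authenticated AWS user",
}


def public_grants(acl: dict) -> List[Tuple[str, str]]:
    """Return (group label, permission) pairs for grants made to a global group."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") != "Group":
            continue  # grants to individual accounts are not public
        label = PUBLIC_GROUPS.get(grantee.get("URI"))
        if label is not None:
            findings.append((label, grant.get("Permission")))
    return findings


# Hypothetical ACL mirroring the situation described above: the owner and the
# authenticated-users group both hold FULL_CONTROL.
example_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "hypothetical-owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"},
         "Permission": "FULL_CONTROL"},
    ]
}

if __name__ == "__main__":
    for label, permission in public_grants(example_acl):
        print(f"{permission} granted to {label}")
```

With the boto3 library, a dictionary of this shape is what `get_bucket_acl` returns for a bucket, so the same function could be pointed at a real bucket’s ACL. Note that this sketch inspects ACLs only; bucket policies can also make contents public and would need a separate check.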

The Significance

Political Data

Data collection and analysis have rapidly become core capabilities for a political campaign, but the nature of those campaigns (short-lived exercises that quickly raise and spend large amounts of money, with third-party revolving-door consultants, in a winner-take-all competition) is antagonistic to the conditions of good data management. Both Republican and Democratic campaigners benefit from having easy access to huge amounts of personal data on American citizens; the citizens whose data is at stake do not. It is a situation that predictably and consistently results in data exposures.

UpGuard has previously reported on two significantly larger exposures related to the political data economy. In one case, a data analytics provider exposed the Republican National Committee’s enriched voter database, which included both personal and psychographic information for every registered American voter. In another, a software provider for that kind of analysis exposed their code base, revealing the mechanisms for how voter data is gathered, tracked, and enriched across platforms.

The list of six million email addresses, with some link to Clinton and the DSCC, is a much smaller exposure than one containing data on the entire U.S. electorate. But it is still a large number of potential targets for a malicious actor, with enough context to make reasonable guesses about how to craft an attack against them. In sum, these exposures highlight the problem of passing large amounts of personal data through the modern political campaign, where the need for mass marketing and data sharing contributes to the risk of exposures.

The Longevity of Data

The most obvious interpretation of the evidence here is that this file was uploaded in 2010, meaning it has been publicly available for almost a decade. Whether it was accessed by any parties other than UpGuard is not knowable with the information we have available.

Data was important in 2010. The same tactics and strategies deployed in the 2016 election were created and honed long before that. But the scale of political data has grown significantly along with its importance. Consideration should be given to what artifacts of our current political data system will be unearthed, and whom they will affect. This list contained only email addresses, but other political data sets contain far more information on individuals, down to psychographic information such as their habits, behaviors, and likely beliefs. The same things that make this data valuable to political campaigns make it valuable to malicious actors: it is intelligence on individuals that can be used to contact and influence them. If political data can be exposed for ten years, the risk created by that data has an unknown half-life.

Conclusion

The digitization of every sphere of life has created a myriad of consequences that are just now coming to light. Healthcare, finance, and politics are among the major convergences of personal data being collected and used every day. Interactions are tracked, behavior is modeled by analytics that compile huge data sources, and information is microtargeted to audiences that are known better than they know themselves. The crumbs of data that fall from these operations and end up in misconfigured storage locations or are otherwise unintentionally exposed are but a fraction of the total data circulating in a vicious and competitive economy of knowledge. Unless steps are taken to better control the way in which data is gathered, concentrated, and processed, exposures of this kind will continue, and their scope and scale will increase. Organizations should treat their data with the same respect they give to the success it allows them to achieve.

