Microsoft Download Center Archive
Enron Stimuli for Text-Entry Experiments
This download contains sets of 10, 20, 50, 100, 200, and 500 representative phrases from the Enron corpus. Last published: May 4, 2011.
- Each phrase contains four words. The Enron data originally comes from a data set collected and prepared by the CALO (A Cognitive Assistant that Learns and Organizes) Project. It contains data from about 150 users, mostly Enron senior management, organized into folders. The corpus contains about half a million messages. The data was originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation of Enron. The data set does not include attachments, and some messages have been deleted “as part of a redaction effort due to requests from affected employees.” To make the stimuli more representative of “general email,” as opposed to email specific to the Enron setting, we filtered the data to remove all email addresses and phone numbers. Furthermore, because many of the emails were replies that quoted the original message, we removed duplicate sentences. This step may also have removed repeated sentences that were not quotations.
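The filtering steps described above can be sketched in Python. This is a minimal illustration, not the authors' actual pipeline: the regular expressions for email addresses and phone numbers, the punctuation-based sentence splitter, the case-insensitive duplicate check, and the exact four-word criterion are all assumptions, and the function names `scrub`, `unique_sentences`, and `four_word` are hypothetical.

```python
import re

# Hypothetical patterns (assumption): the exact rules used to build the
# Enron stimuli are not documented on this page.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}\b")
# Naive splitter (assumption): break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def scrub(text):
    """Remove email addresses and phone numbers, then collapse whitespace."""
    text = EMAIL_RE.sub(" ", text)
    text = PHONE_RE.sub(" ", text)
    return " ".join(text.split())


def unique_sentences(messages):
    """Yield scrubbed sentences, keeping only the first occurrence of each.

    Like the de-duplication described above, this drops sentences repeated
    in quoted replies, but also any repeated sentence that was not a quote.
    """
    seen = set()
    for message in messages:
        for sentence in SENTENCE_SPLIT_RE.split(scrub(message)):
            key = sentence.lower()  # case-insensitive comparison (assumption)
            if sentence and key not in seen:
                seen.add(key)
                yield sentence


def four_word(sentences):
    """Keep sentences that are exactly four whitespace-separated words."""
    return [s for s in sentences if len(s.split()) == 4]
```

For example, given the messages `"Please review the draft. Email jeff@example.com with comments."` and `"Sounds good to me. Please review the draft."`, the second copy of "Please review the draft." is dropped, the email address is stripped, and `four_word` keeps "Please review the draft." and "Sounds good to me." How the final "representative" 10 to 500 phrases were selected from such candidates is not described here, so the sketch stops at filtering.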
Files
Status: Live. This download is still available on microsoft.com. The downloads below will come directly from the Microsoft Download Center.
File | Size |
---|---|
SHA1: ceef49db40c5e763f93af46e584f49e433e9897c | 24 KB |
File sizes and hashes are retrieved from the Wayback Machine’s indexes. They may not match the latest versions of files hosted on Microsoft servers.
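Because the listed hash comes from an archive index, it can be useful to check a downloaded file against it. A minimal sketch using Python's standard `hashlib` module follows; the function names are hypothetical, and a mismatch may simply mean the file on Microsoft's servers has been republished rather than that the download is corrupt.

```python
import hashlib

# SHA1 recorded for this download in the Wayback Machine's index
# (taken from the file table on this page).
ARCHIVED_SHA1 = "ceef49db40c5e763f93af46e584f49e433e9897c"


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_archive(path, expected=ARCHIVED_SHA1):
    """True if the file's SHA-1 equals the expected hash (case-insensitive)."""
    return sha1_of(path) == expected.lower()
```

Call `matches_archive("path/to/downloaded-file")` with the path of the downloaded file (a placeholder here). Reading in fixed-size chunks keeps memory use constant regardless of file size, although this 24 KB file would fit in memory either way.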
System Requirements
Operating Systems: Windows 10, Windows 7, Windows 8
Installation Instructions
- Click Download and follow the instructions.