Microsoft Download Center Archive

Enron Stimuli for Text-Entry Experiments

  • Published: May 4, 2011
  • Version: 1.0
  • Category: Tool
  • Language: English

This download contains sets of 10, 20, 50, 100, 200, and 500 representative phrases from the Enron corpus. Last published: May 4, 2011.

  • The phrases contain four words. The original Enron data come from a data set collected and prepared by the CALO (A Cognitive Assistant that Learns and Organizes) Project. The data set contains messages from about 150 users, mostly Enron senior management, organized into folders, for a total of about half a million messages. The data were originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation of Enron. The data set does not include attachments, and some messages have been deleted “as part of a redaction effort due to requests from affected employees.” To make the stimuli more representative of “general email,” as opposed to emails common in an Enron setting, we filtered the data to remove all email addresses and phone numbers. Furthermore, because a large portion of the emails contained replies quoting the original message, we removed duplicate sentences. This step might have inadvertently removed repeated sentences that were not quotations.
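The filtering described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the stimuli: the regular expressions, the choice to drop a whole sentence when it contains an email address or phone number, and the function name filter_sentences are all assumptions made for illustration.

```python
import re

# Assumed patterns for illustration only; the original filtering rules
# are not described in enough detail to reproduce them exactly.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(
    r"(?:\+?\d{1,2}[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}"
)


def filter_sentences(sentences):
    """Drop sentences containing an email address or phone number,
    then drop repeated sentences, keeping the first occurrence."""
    seen = set()
    kept = []
    for sentence in sentences:
        if EMAIL_RE.search(sentence) or PHONE_RE.search(sentence):
            continue
        # Compare case-insensitively with whitespace collapsed (an assumption).
        key = " ".join(sentence.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(sentence)
    return kept
```

As the description notes, deduplicating this way cannot tell a quoted reply from a sentence that two people happened to write independently, so both kinds of repeat are removed.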

Files

Status: Live

This download is still available on microsoft.com. The downloads below will come directly from the Microsoft Download Center.

File: EnronSimuli.zip
Size: 24 KB
SHA1: ceef49db40c5e763f93af46e584f49e433e9897c

File sizes and hashes are retrieved from the Wayback Machine’s indexes. They may not match the latest versions of files hosted on Microsoft servers.
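To check a downloaded copy against the SHA1 listed above, something like the following Python sketch could be used. The function names sha1_of_file and matches_expected are hypothetical helpers, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# SHA1 listed on this page for EnronSimuli.zip.
EXPECTED_SHA1 = "ceef49db40c5e763f93af46e584f49e433e9897c"


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare the file's SHA1 with the expected value, ignoring case."""
    return sha1_of_file(path) == expected.lower()
```

Calling matches_expected("EnronSimuli.zip") returns True only if the local file matches the archived hash. A mismatch does not necessarily mean the file is corrupt, since the hosted file may have changed since the hash was indexed.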

System Requirements

Operating Systems: Windows 10, Windows 7, Windows 8

Installation Instructions

    • Click Download and follow the instructions.