"The Henry A. Murray Research Archive is Harvard's endowed repository for quantitative and qualitative research data at the Institute for Quantitative Social Science. Our collection comprises over 100 terabytes of data, audio, and video. We provide long-term preservation of all types of data of interest to the research community, including numerical, video, audio, interview notes, and other data."
"ICPSR advances and expands social and behavioral research, acting as a global leader in data stewardship and providing rich data resources and responsive educational opportunities for present and future generations.... ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences."
"QDR curates, stores, preserves, publishes, and enables the download of digital data generated through qualitative and multi-method research in the social sciences.... QDR’s overarching goals are to make sharing qualitative data customary in the social sciences, to broaden access to social science data, and to strengthen qualitative and multi-method research."
"Data for Research (DfR) provides datasets of content on JSTOR for use in research and teaching. Researchers may use DfR to define and submit their desired dataset to be automatically processed. Data available through the service includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets are produced at no cost to researchers and may include data for up to 25,000 documents."
"Project Gutenberg offers over 59,000 free eBooks. Choose among free epub and Kindle eBooks, download them or read them online. You will find the world's great literature here, with focus on older works for which U.S. copyright has expired. Thousands of volunteers digitized and diligently proofread the eBooks, for enjoyment and education."
"Scraping describes the method to extract data hidden in documents – such as Web Pages and PDFs and make it useable for further processing. It is among the most useful skills if you set out to investigate data – and most of the time it’s not especially challenging. For the most simple ways of scraping you don’t even need to know how to write code."
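As a rough illustration of what scraping involves, the sketch below pulls link targets and link text out of a small piece of HTML using only Python's standard library. The sample markup, the `LinkExtractor` class, and the `extract_links` helper are all hypothetical names invented for this example; real pages are messier, and a site's terms of use should be checked before collecting data from it.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/ebooks/1">First <em>sample</em> title</a></li>
  <li><a href="/ebooks/2">Second sample title</a></li>
</ul>
"""

for href, text in extract_links(SAMPLE_HTML):
    print(href, text)
```

The same pattern extends to other tags: decide which elements hold the data, record them as the parser walks the page, and write the results out to a spreadsheet or CSV file for further processing.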
"Session on text analysis with NLTK, including discussion of cleaning data, creating text corpora, and analyzing texts programmatically."
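The steps named in that session description (cleaning, building a corpus, analyzing programmatically) can be sketched in a few lines. The example below deliberately uses only Python's standard library rather than NLTK, so it is a stand-in for the idea and not the session's own code; the stopword list and function names are assumptions made up for illustration. NLTK provides richer equivalents, such as `nltk.word_tokenize`, `nltk.bigrams`, and `nltk.FreqDist`.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real projects use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def clean(tokens, stopwords=STOPWORDS):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stopwords]


def bigrams(tokens):
    """Pair each token with the one that follows it (2-grams)."""
    return list(zip(tokens, tokens[1:]))


def word_counts(text):
    """Count how often each non-stopword appears in the text."""
    return Counter(clean(tokenize(text)))


sample = "The cat and the hat. The cat sat."
print(word_counts(sample).most_common(2))
print(bigrams(clean(tokenize(sample))))
```

Word counts and n-grams of this kind are the same sort of derived data that services such as JSTOR's Data for Research supply ready-made.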
Any portion of the DHRI curriculum is highly recommended, whether you work through it at your own pace or attend a workshop. If you are interested in attending future on-campus DHRI sessions, please email firstname.lastname@example.org to be added to the mailing list.