University Library: Find Data and Datasets: Home

What are data and datasets?

Data refers to raw, unprocessed information or facts. It can come in various forms, such as numbers, text, images, or measurements. Imagine it as the building blocks of knowledge.

Datasets are structured collections of data points or observations. Think of a dataset as a big table with rows (instances) and columns (variables). Datasets are like organized treasure troves of information.
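To make the table picture concrete, here is a minimal Python sketch that uses only the standard library. The column names and values are invented for illustration and do not come from any real dataset.

```python
import csv
import io

# Hypothetical dataset: each row is one instance, each column is one variable.
RAW_CSV = """county,year,population
Adams,2020,1000
Baker,2020,2500
Clark,2020,4000
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row (instance)."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(RAW_CSV)
variables = list(rows[0].keys())                    # the column names
populations = [int(r["population"]) for r in rows]  # one variable, all rows

print(len(rows), "rows (instances)")
print("Variables:", variables)
print("Total population:", sum(populations))
```

Each dict in `rows` is a single observation, and reading one key across all rows gives you a single variable.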

Remember, data provides the raw material, while datasets serve as the foundation for analysis and research. Now, let's explore how to find these datasets.

Before you begin your research, ask yourself these questions:

What is the population you want to study?
- People (individuals, couples, households)
- Organizations (companies, political parties, professional groups)
- Commodities or things (arrests, commercial travel, crops)
When is the time period you want to study?
- A single point in time
- A time series (annually, quarterly, monthly, etc.)
Where is the location (geography or place) you want to study?
- Political boundary (county, state, or country)
- Census boundary (metropolitan statistical area, census tracts, block groups)
Who would collect or publish this information? Who would need this information?
- Government agencies (Census Bureau, Department of Labor, Centers for Disease Control and Prevention)
- NGOs, IGOs, or think tanks (World Bank, United Nations, Pew Research)
- Academic institutions (research funded by outside foundations is often publicly available)
- Non-profits or associations (American Cancer Society, National Education Association)
- Private sector (individual companies, marketing, and commercial firms who charge for data)
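Once you have a dataset in hand, the what, when, and where questions above become filters. The sketch below assumes a small, invented list of records. The field names (`unit`, `year`, `state`, `value`) and the `select` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical records; every value below is invented for illustration.
RECORDS = [
    {"unit": "household", "year": 2019, "state": "OH", "value": 50000},
    {"unit": "household", "year": 2020, "state": "OH", "value": 52000},
    {"unit": "household", "year": 2020, "state": "PA", "value": 55000},
    {"unit": "company", "year": 2020, "state": "OH", "value": 120},
]

def select(records, unit, years, state):
    """Keep records matching the population (unit), time (years), and place (state)."""
    return [
        r for r in records
        if r["unit"] == unit and r["year"] in years and r["state"] == state
    ]

# A single point in time: Ohio households in 2020.
snapshot = select(RECORDS, "household", {2020}, "OH")

# A time series: Ohio households across several years, sorted by year.
series = sorted(
    select(RECORDS, "household", {2019, 2020}, "OH"),
    key=lambda r: r["year"],
)

print(snapshot)
print([(r["year"], r["value"]) for r in series])
```

Writing the three answers down as explicit filter values before you search also makes it easier to tell whether a candidate dataset covers what you need.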

Remember, flexibility and resourcefulness are essential when working with data.

Finding Data and Datasets

You can find data and datasets in many different types of sources. Here are some of the best places to search:

To see the library's data and statistical databases, select the Research Databases icon on the library homepage and filter by the type Datasets and Statistics. Most of the databases listed have data and statistics incorporated into their collections. Sage Data, however, is entirely devoted to data and statistics.

Sage Data
Browse by Subject, Source, or Geography. Once you find a dataset, export it into Excel, XML, or PDF format. Sage Data also has an excellent section called Data Basics that guides users through evaluating and understanding data.

Open data derives from publicly available sources and is available for use, sharing, and distribution by anyone. Generally, it requires only attribution of the original source. Open data differs from public sector data in that public sector data is usually mandated through laws and regulations to be shared. Open data includes public sector data and data that is published and shared by private sector organizations.

Open data is valuable to researchers not only because it is freely available, but also because new combinations of data can be used to create new knowledge, insights, and ultimately new fields of application.

For more information on open data, check out the Open Data Handbook for guides, case studies, and resources on open data.

A data repository is a storage space where researchers deposit datasets associated with their research. Open-access data repositories store data in a way that gives anyone immediate access.

Please note that this is not an exhaustive list of the repositories available on the web. The repositories listed are general and cover multiple disciplines. Treat this list as a jumping-off point for your research. For repository recommendations specific to your discipline, we recommend speaking to your instructor or advisor, or using the Ask Us! service.

A Search Engine for Repositories

Google Dataset Search
Google Dataset Search is like Google's regular search but strictly for data. You can search it by keyword to find links to data from publicly available data sources, such as Statista and Data.gov. The search provides clear summaries, descriptions, and information about data providers. Use the Dataset Search Quickstart Guide for tips.

General Data Repositories

Data.gov
Search by topic and filter by location, dataset type, government type (federal, state, or local), format, and publisher to find raw data on many topics. Check out the Data.gov User Guide for assistance with searching this site.
Dryad
Best for science and healthcare-related topics. Use the search box at the top right of the homepage to look for a topic, e.g., Parkinson's disease. The next screen will list results you can limit by subject keyword, geographical location, or journal. Once you have selected a specific study, look for the dataset download button on the right side of the page.
Figshare
Using the search box at the top of the home page, search for topics to find hundreds of researcher-contributed datasets, mainly from published journal articles. For the best results, use the limiters on the left side of the screen to select datasets under "Item Type."
ICPSR Data Search
A unique resource that contains a wealth of local, state, and federal government data. Browse ICPSR using the topic headings on the homepage, or use the search box to find datasets on more specific topics. Use limiters such as data type, collection method, and time period to narrow a search. Note: Not all ICPSR datasets are free.
Mendeley Data
This works like an aggregator of data repositories. Use the search box to enter your topic and find data on a vast array of topics. Limit by date, data type, or source to find data from repositories around the world, including the UK Data Archive, Harvard Dataverse, the Australian National University, and more.

Using Data and Datasets

Sage Campus provides courses of various lengths on how to find, use, and interpret datasets, and includes in-depth courses such as Introduction to Data Management, Collecting Social Media Data, Cleaning Messy Data, and Gathering Your Data Online.

Another great resource is Sage Research Methods Datasets, a collection of datasets and guides indexed by method and data type. It gives users a chance to learn data analysis by practicing with real data.

Understanding Biases in Datasets

When working with datasets, it's essential to recognize that data is not always objective. Biases can creep in due to a variety of factors. Here are some common biases to be aware of:

Sampling Bias
- Sampling bias occurs when the sample of data does not represent the entire population.
- Example: If you're studying smartphone preferences but only survey iPhone users, you're missing out on Android fans.
Selection Bias
- Selection bias happens when specific data points are systematically included or excluded.
- Example: If you analyze job satisfaction but only include responses from people who love their jobs, you are missing the dissatisfied coworkers.
Measurement Bias
- Measurement bias results from errors in data collection methods.
- Example: Weather data might be skewed if the thermometers consistently read a few degrees too high.
Cultural Bias
- Cultural bias reflects values specific to a particular group.
- Example: If you're studying reading instruction but only looking at data from one country, you might miss global trends.
Label Bias
- Label bias arises from how data points, such as customers, are categorized. Be careful not to oversimplify people's preferences based on broad labels.
- Example: If you label all millennials as "tech-savvy" without considering their diverse interests, your marketing strategy might miss the mark.
Social Desirability Bias
- Social desirability bias happens when respondents provide socially acceptable answers.
- Example: People might exaggerate how often they exercise or eat veggies in surveys about healthy habits.

Remember, being aware of biases helps us work with data more effectively.
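A small numeric illustration can show how much a bias can move a result. The sketch below uses invented numbers throughout: a hypothetical ten-person survey for sampling bias, and an assumed constant thermometer error of 2 degrees for measurement bias.

```python
from statistics import mean

# Hypothetical population of 10 survey respondents (invented values).
# Each tuple is (phone type, satisfaction score from 1 to 10).
POPULATION = [
    ("iPhone", 9), ("iPhone", 8), ("iPhone", 9), ("iPhone", 8),
    ("Android", 5), ("Android", 6), ("Android", 4),
    ("Android", 5), ("Android", 6), ("Android", 5),
]

def mean_score(respondents):
    """Average the score column of a list of (phone type, score) tuples."""
    return mean(score for _, score in respondents)

# Sampling bias: surveying only iPhone users.
iphone_only = [r for r in POPULATION if r[0] == "iPhone"]
population_mean = mean_score(POPULATION)
biased_mean = mean_score(iphone_only)

# Measurement bias: a thermometer that always reads too high.
OFFSET = 2.0  # assumed constant instrument error, in degrees
true_temps = [20.0, 22.0, 24.0]
measured_temps = [t + OFFSET for t in true_temps]

print("Whole population mean:", population_mean)
print("iPhone-only mean:", biased_mean)
print("Temperature shift:", mean(measured_temps) - mean(true_temps))
```

In this made-up example the iPhone-only sample overstates average satisfaction by two points, and the faulty thermometer shifts every summary statistic by the same offset. Neither error would be visible from the biased data alone.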

Mitigating Bias

Awareness: Recognize potential biases and actively address them.
Diverse Data Collection: Include diverse samples to avoid underrepresentation.
Random Sampling: Use random sampling techniques to reduce bias.
Transparency: Document data collection methods.
Regular Audits: Regularly assess and correct biases in data.
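The random sampling, diverse data collection, and transparency steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full sampling design: the sampling frame is invented, and the function names `simple_random_sample` and `stratified_sample` are hypothetical helpers built on the standard library's `random` module.

```python
import random

def simple_random_sample(population, k, seed):
    """Draw k items without replacement; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, key, per_group, seed):
    """Draw the same number of items from every group so that none is left out."""
    rng = random.Random(seed)
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    sample = []
    for name in sorted(groups):  # sorted for a stable, documented order
        sample.extend(rng.sample(groups[name], per_group))
    return sample

# Hypothetical sampling frame: 80 urban and 20 rural respondents.
frame = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]

srs = simple_random_sample(frame, 10, seed=42)
strat = stratified_sample(frame, key=lambda item: item[0], per_group=5, seed=42)

# Transparency: record how the sample was drawn alongside the data.
method_note = {
    "method": "stratified",
    "per_group": 5,
    "seed": 42,
    "frame_size": len(frame),
}

print(srs)
print(strat)
print(method_note)
```

Drawing an equal number from each group guarantees that the smaller rural group is represented, but it also oversamples that group relative to its share of the frame, so estimates for the whole population would need weighting. Recording the seed and method lets someone else reproduce the same draw.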

Additionally, researchers should be aware of issues related to data sovereignty when working with data collected from or with Indigenous peoples. Indigenous data sovereignty is the right of Indigenous Peoples to control data collected about their members, knowledge systems, customs, or territories. By asserting data sovereignty, Indigenous communities can protect their information, ensure ethical data practices, and make informed decisions that align with their values and traditions.

Citing Datasets in APA Style

For information and examples on citing datasets, see the APA's site:

Data Set References
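As a rough illustration, the general APA 7 pattern for a dataset reference (Author. (Year). Title (Version) [Data set]. Publisher. URL) can be assembled with a small helper. The function name and all example values below are hypothetical. The sketch produces plain text only, so it does not italicize the title, and it ignores edge cases such as omitting the publisher when it matches the author. Check the APA page for the authoritative rules.

```python
def apa_dataset_reference(author, year, title, publisher, url, version=None):
    """Assemble a plain-text dataset reference in the general APA 7 pattern."""
    # Group authors need a closing period; personal names ending in an
    # initial ("Doe, J.") already have one.
    author_part = author if author.endswith(".") else author + "."
    version_part = f" (Version {version})" if version else ""
    return (
        f"{author_part} ({year}). {title}{version_part} [Data set]. "
        f"{publisher}. {url}"
    )

# Placeholder values only; this is not a real dataset.
example = apa_dataset_reference(
    author="Doe, J.",
    year=2020,
    title="Example survey data",
    publisher="Example Repository",
    url="https://example.org/dataset",
    version="2.0",
)
print(example)
```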
