University Library

Find Data and Datasets

What are data and datasets?

Data refers to raw, unprocessed information or facts. It can come in various forms, such as numbers, text, images, or measurements. Imagine it as the building blocks of knowledge.

Datasets are structured collections of data points or observations. Think of a dataset as a big table with rows (instances) and columns (variables). Datasets are like organized treasure troves of information.
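To make the table picture concrete, here is a minimal Python sketch. The survey values and column names are invented for illustration and do not come from any real source. It reads a small CSV table and reports its rows (instances) and columns (variables).

```python
import csv
import io

# Hypothetical data, invented purely for illustration.
RAW_CSV = """respondent_id,state,age,owns_smartphone
1,AZ,34,yes
2,CA,51,no
3,TX,27,yes
"""


def load_dataset(text):
    """Parse CSV text into a list of rows, each a dict keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_dataset(RAW_CSV)
columns = list(rows[0].keys())
ages = [int(row["age"]) for row in rows]  # one variable, read down its column

print(f"{len(rows)} rows (instances), {len(columns)} columns (variables)")
print("Variables:", columns)
print("Ages:", ages)
```

Each line after the header is one instance, and each column name is one variable. Most datasets you download as CSV files can be explored the same way.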

Remember, data provides the raw material, while datasets serve as the foundation for analysis and research. Now, let's explore how to find these datasets.

Before you begin your research, ask yourself these questions:

  • What is the population you want to study?
    • People (individuals, couples, households)
    • Organizations (companies, political parties, professional groups)
    • Commodities or things (arrests, commercial travel, crops)
  • What time period do you want to study?
    • A single point in time
    • A time series (annually, quarterly, monthly, etc.)
  • What location (geography or place) do you want to study?
    • Political boundary (county, state, or country)
    • Census boundary (metropolitan statistical area, census tracts, block groups)
  • Who would collect or publish this information? Who would need this information?
    • Government agencies (Census Bureau, Department of Labor, Centers for Disease Control and Prevention)
    • NGOs, IGOs, or think tanks (World Bank, United Nations, Pew Research)
    • Academic institutions (research funded by outside foundations is often publicly available)
    • Non-profits or associations (American Cancer Society, National Education Association)
    • Private sector (individual companies, and marketing or commercial firms that charge for data)

Remember, flexibility and resourcefulness are essential when working with data.

Finding Data and Datasets

You can find data and datasets in many different types of sources. Here are some of the best places to search:

A list of data and statistical databases can be found in the library by selecting the Research Databases icon on the homepage and filtering by the type Datasets and Statistics. Most of the databases listed have data and statistics incorporated into their collections. However, Sage Data is entirely devoted to data and statistics.

Open data comes from publicly available sources and can be used, shared, and distributed by anyone. Generally, it requires only attribution of the original source. Open data is a broader category than public sector data: it includes public sector data, which governments are usually required by laws and regulations to share, as well as data that private sector organizations publish and share.

Open data is valuable to researchers not only because it is freely available, but also because new combinations of data can be used to create new knowledge, insights, and ultimately new fields of application.

For more information on open data, check out the Open Data Handbook for guides, case studies, and resources on open data.

A data repository is a storage space where researchers deposit the datasets associated with their research. Open-access data repositories store data in a way that gives anyone immediate access.

Please note that this is not an exhaustive list of the repositories available on the web. The repositories listed are general and cover multiple disciplines. Treat this list as a jumping-off point for your research. We recommend speaking to your instructor or advisor, or using the Ask Us! service, for repository recommendations specific to your discipline.

A Search Engine for Repositories

General Data Repositories

Using Data and Datasets

Sage Campus provides courses of various lengths on how to find, use, and interpret datasets, and includes in-depth courses such as Introduction to Data Management, Collecting Social Media Data, Cleaning Messy Data, and Gathering Your Data Online.

Another great resource is Sage Research Methods Datasets, a collection of datasets and guides indexed by method and data type. It gives users a chance to learn data analysis by practicing on real data.

Understanding Biases in Datasets

When working with datasets, it's essential to recognize that data is not always objective. Biases can creep in due to a variety of factors. Here are some common biases to be aware of:

  1. Sampling Bias
    • Sampling bias occurs when the sample of data does not represent the entire population.
    • Example: If you're studying smartphone preferences but only survey iPhone users, you're missing out on Android fans.
  2. Selection Bias
    • Selection bias happens when specific data points are systematically included or excluded.
    • Example: If you analyze job satisfaction but only include responses from people who love their jobs, you are missing the dissatisfied coworkers.
  3. Measurement Bias
    • Measurement bias results from errors in data collection methods.
    • Example: Weather data might be skewed if the thermometers consistently read a few degrees too high.
  4. Cultural Bias
    • Cultural bias occurs when data reflects the values or assumptions of one particular group.
    • Example: If you're studying reading instruction but only looking at data from one country, you might miss global trends.
  5. Label Bias
    • Label bias occurs when the categories or labels applied to data oversimplify what they describe. Be careful not to reduce people's preferences to broad labels.
    • Example: If you label all millennials as "tech-savvy" without considering their diverse interests, your marketing strategy might miss the mark.
  6. Social Desirability Bias
    • Social desirability bias happens when respondents provide socially acceptable answers.
    • Example: People might exaggerate how often they exercise or eat veggies in surveys about healthy habits.

Remember, being aware of biases helps us work with data more effectively.
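The smartphone example under Sampling Bias can be sketched in a few lines of Python. The population and its 45% iPhone share are invented numbers, chosen only to show how a sample drawn from one group distorts an estimate compared with a random sample of the same size.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 people: 450 iPhone users, 550 Android users.
population = ["iPhone"] * 450 + ["Android"] * 550


def iphone_share(sample):
    """Return the fraction of people in the sample who use an iPhone."""
    return sum(1 for phone in sample if phone == "iPhone") / len(sample)


# Biased sample: the survey only reaches people who already use an iPhone.
biased_sample = [phone for phone in population if phone == "iPhone"][:100]

# Random sample: every member of the population is equally likely to be chosen.
random_sample = rng.sample(population, 100)

print("True share:          ", iphone_share(population))
print("Biased sample share: ", iphone_share(biased_sample))
print("Random sample share: ", iphone_share(random_sample))
```

The biased sample reports that everyone uses an iPhone, while the random sample should land much closer to the true share. A random sample still varies from draw to draw, so it is an estimate rather than an exact answer.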

Mitigating Bias

  • Awareness: Recognize potential biases and actively address them.
  • Diverse Data Collection: Include diverse samples to avoid underrepresentation.
  • Random Sampling: Use random sampling techniques to reduce bias.
  • Transparency: Document data collection methods.
  • Regular Audits: Regularly assess and correct biases in data.
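As one possible way to combine diverse data collection, random sampling, and a simple audit, the sketch below uses stratified random sampling. This is one technique among several, not a method prescribed by this guide. The records, the "region" field, and the stratified_sample function are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Randomly sample the same fraction from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed documents exactly how the draw was made
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for members in strata.values():
        # Take at least one record per group so no group is left out entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records: 800 urban and 200 rural respondents.
records = [{"id": i, "region": "urban"} for i in range(800)]
records += [{"id": i, "region": "rural"} for i in range(800, 1000)]

sample = stratified_sample(records, key="region", fraction=0.1)

# Simple audit: compare group counts in the sample with the population.
counts = Counter(record["region"] for record in sample)
print("Sample size:", len(sample))
print("Sample counts by region:", dict(counts))
```

Because each group is sampled at the same rate, the sample keeps the population's 80/20 split between urban and rural respondents. Recording the seed and the sampling fraction alongside the data supports the transparency step above.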

Additionally, researchers should be aware of data sovereignty issues when working with data collected from or with Indigenous Peoples. Indigenous data sovereignty is the right of Indigenous Peoples to control data collected about their members, knowledge systems, customs, or territories. By asserting data sovereignty, Indigenous communities can protect their information, ensure ethical data practices, and make informed decisions that align with their values and traditions.


Citing Datasets in APA Style

For information and examples on citing datasets, see the APA Style website.