Data refers to raw, unprocessed information or facts. It can come in various forms, such as numbers, text, images, or measurements. Imagine it as the building blocks of knowledge.
Datasets are structured collections of data points or observations. Think of a dataset as a big table with rows (instances) and columns (variables). Datasets are like organized treasure troves of information.
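The table picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the city names, years, and population figures are made up, and `load_dataset` is a hypothetical helper name, not part of any library resource mentioned in this guide.

```python
import csv
import io

# Hypothetical raw data: made-up values, for illustration only.
RAW_CSV = """city,year,population
Springfield,2020,1000
Springfield,2021,1100
Shelbyville,2020,2000
"""


def load_dataset(text):
    """Parse CSV text into a list of rows, each a dict keyed by column name."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


dataset = load_dataset(RAW_CSV)

# Columns are the variables; rows are the instances (observations).
variables = list(dataset[0].keys())
n_instances = len(dataset)

print(variables)    # ['city', 'year', 'population']
print(n_instances)  # 3
```

Each row is one instance and each column name is one variable. Note that `csv.DictReader` reads every value as text, so numeric columns such as `population` would need converting before any calculation.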
Remember, data provides the raw material, while datasets serve as the foundation for analysis and research. Now, let's explore how to find these datasets.
Before you begin your research, ask yourself these questions:
Remember, flexibility and resourcefulness are essential when working with data.
You can find data and datasets in many different types of sources. Here are some of the best places to search:
A list of data and statistical databases can be found in the library by selecting the Research Databases icon on the homepage and filtering by the type Datasets and Statistics. Most of the databases listed have data and statistics incorporated into their collections. However, Sage Data is entirely devoted to data and statistics.
Open data derives from publicly available sources and is available for use, sharing, and distribution by anyone. Generally, it requires only attribution of the original source. Open data differs from public sector data in that public sector data is usually mandated through laws and regulations to be shared. Open data includes public sector data and data that is published and shared by private sector organizations.
Open data is valuable to researchers not only because it is freely available, but also because new combinations of data can be used to create new knowledge, insights, and ultimately new fields of application.
For more information on open data, check out the Open Data Handbook for guides, case studies, and resources on open data.
A data repository is a storage space where researchers deposit the datasets associated with their research. Open-access data repositories store data in a way that gives anyone immediate access.
Please note that this is not an exhaustive list of the repositories available on the web. The repositories listed are general and cover multiple disciplines. This list should be considered a jumping-off point for your research. We recommend speaking to your instructor or advisor, or using the Ask Us! service, for repository recommendations specific to your discipline.
Sage Campus provides courses of various lengths on how to find, use, and interpret datasets, and includes in-depth courses such as Introduction to Data Management, Collecting Social Media Data, Cleaning Messy Data, and Gathering Your Data Online.
Another great resource is Sage Research Methods Datasets, a collection of datasets and guides indexed by method and data type. It gives users a chance to learn data analysis by practicing with real data.
When working with datasets, it's essential to recognize that data is not always objective. Biases can creep in due to a variety of factors. Here are some common biases to be aware of:
Remember, being aware of biases helps us work with data more effectively.
Additionally, researchers should be aware of issues related to data sovereignty when working with data collected from or with Indigenous Peoples. Indigenous data sovereignty is the right of Indigenous Peoples to control data collected about their members, knowledge systems, customs, or territories. By asserting data sovereignty, Indigenous communities can protect their information, ensure ethical data practices, and make informed decisions that align with their values and traditions.
For information and examples on citing datasets, see the APA's site: