5 min read • June 18, 2024
Minna Chow
Milo Chang
One of the neat things about data is that we can get information from it! By examining it closely, we can identify trends, make connections and address problems. This is true in almost every field, from science to history.
One or a few data points may not be enough to support a coherent conclusion: you could be dealing with an outlier, and it's difficult to see trends. With larger data sets (often very large data sets, known as big data), you can establish more comprehensive patterns. However, it is important to note that a correlation does not necessarily indicate that a causal relationship exists. A correlation just suggests areas for additional research to understand the exact nature of the relationship.
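To make the idea of a correlation concrete, here is a minimal sketch that measures how closely two lists of numbers move together, using the Pearson correlation coefficient. The function name and the two data lists are invented for illustration and are not real measurements. A value near 1 means the two lists rise together; it says nothing about whether one causes the other.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's distance from its own mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How spread out each list is around its mean.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return co_deviation / (spread_x * spread_y)

# Hypothetical, made-up numbers for illustration only.
ice_cream_sales = [20, 35, 50, 65, 80]
sunburn_cases = [2, 4, 5, 7, 9]

r = pearson_correlation(ice_cream_sales, sunburn_cases)
print(round(r, 2))  # close to 1: a strong positive correlation
```

In this made-up example the two lists are strongly correlated, yet ice cream does not cause sunburn. A third factor, such as sunny weather, could plausibly drive both, which is exactly the kind of question further research would need to answer.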
As the world gets bigger and more interconnected, a vast amount of data becomes more accessible and needs to be tracked.
🔗 Take a look at this visual of global shipping in the year 2012!
To create this graphic, researchers had to track millions of shipments over months. This is just one example of the rapidly growing need for big data processing.
As data sets get larger and larger, computers become a necessary tool to help us process them. They can process data faster and with fewer errors than humans. (Imagine sorting through all of that shipping data by hand!) At larger scales, you may even need multiple computers or parallel systems to process all the data involved.
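The idea of spreading work across several workers can be sketched in a few lines. This is only an illustration of the split, process, and combine pattern: the shipment weights are made-up numbers, the helper names are hypothetical, and Python threads stand in for what would be separate processes or separate machines in a real big data system.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, chunk_count):
    """Split a list into consecutive chunks of roughly equal size."""
    chunk_size = max(1, -(-len(items) // chunk_count))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def total_weight(shipments):
    """Process one chunk: here, just add up the weights in it."""
    return sum(shipments)

# Hypothetical data: the weights of 1,000 shipments.
shipment_weights = list(range(1, 1001))

# Step 1: split the data set into pieces, one per worker.
chunks = split_into_chunks(shipment_weights, 4)

# Step 2: let each worker process its own piece.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_totals = list(pool.map(total_weight, chunks))

# Step 3: combine the partial results into one answer.
grand_total = sum(partial_totals)
print(grand_total)
```

Each worker only ever sees its own chunk, which is why the same pattern scales from a few threads on one computer to many computers in a server farm.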
This demand has even led to the creation of server farms, areas where many large computers are housed for the purpose of meeting intense processing needs such as dealing with big data sets. Server farms are often located in large data centers, but they can also be housed in much smaller rooms.
In terms of data processing, you're only as good as the tools you have to work with. The more powerful your computer is, the more data you'll be able to process and the faster you'll be able to process it.
One of the ways to make processing data easier is through the use of metadata.
Metadata is data about data.
It's like the packing label on a box in the mail or the tags on a piece of clothing: it gives you information about the item it's attached to. For example, the metadata for a YouTube video could include the title, creator, description, and tags for the video, as well as when it was uploaded and how large it is.
Metadata is used to help find and organize data: you can use it to sort and group the data it describes. It can also provide additional information to help you use your data more effectively. For example, metadata that tells you when a video was uploaded or a post was made can help you decide whether or not the information you're looking at is outdated.
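Here is a small sketch of how metadata can be used to sort, filter, and group data. The video records, field names, and helper functions are all hypothetical and only loosely modeled on the YouTube example; no real video service is involved.

```python
from datetime import date

# Hypothetical metadata records: each one describes a video, not its contents.
videos = [
    {"title": "Intro to Big Data", "creator": "Channel A",
     "uploaded": date(2019, 3, 14), "size_mb": 120},
    {"title": "Cleaning Data Basics", "creator": "Channel B",
     "uploaded": date(2023, 9, 2), "size_mb": 85},
    {"title": "What Is Metadata?", "creator": "Channel A",
     "uploaded": date(2022, 1, 20), "size_mb": 60},
]

def newest_first(items):
    """Sort records by upload date, most recent first."""
    return sorted(items, key=lambda item: item["uploaded"], reverse=True)

def uploaded_since(items, cutoff):
    """Keep only records uploaded on or after the cutoff date."""
    return [item for item in items if item["uploaded"] >= cutoff]

def group_titles_by_creator(items):
    """Group video titles under the creator who uploaded them."""
    groups = {}
    for item in items:
        groups.setdefault(item["creator"], []).append(item["title"])
    return groups

print([v["title"] for v in newest_first(videos)])
print([v["title"] for v in uploaded_since(videos, date(2022, 1, 1))])
print(group_titles_by_creator(videos))
```

Notice that none of these functions ever open a video. Everything is decided from the metadata alone, which is what makes it so useful for organizing large collections.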
Regardless of how large they are, data sets can be challenging to deal with. Fortunately, computers can help us deal with some of these issues.
For example, data may not be uniform due to its collection process.
Imagine you made a Google Survey to find out what class people like the most at your school. You create a form that looks like this:
You might also run into this uniformity issue if you're compiling data from many different sources, where formatting standards may be a little different. For example, let's say you're working with a friend to track what time of day people find the most productive, but you're writing down results using 12-hour time while they're using 24-hour time. (Probably just to be difficult.)
The way that computers deal with this is through a process known as cleaning data. This process makes data uniform by eliminating all of these differences. Cleaning data can also help flag or remove invalid and incomplete data.
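As a rough sketch of what cleaning might look like for the time-of-day example, the function below converts both 12-hour and 24-hour entries into a single 24-hour format and flags anything it cannot read. The function name, the accepted formats, and the sample entries are assumptions made for illustration. It also assumes an English locale for AM/PM, and that an entry with no AM or PM is already in 24-hour time.

```python
from datetime import datetime

# Formats this sketch knows how to read, tried in order.
KNOWN_FORMATS = ("%I:%M %p", "%I:%M%p", "%I %p", "%I%p", "%H:%M")

def clean_time(raw):
    """Return the time as 'HH:MM' in 24-hour format, or None if unreadable."""
    # Normalize small differences first: spacing, letter case, "p.m." vs "PM".
    text = raw.strip().upper().replace(".", "")
    for pattern in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, pattern).strftime("%H:%M")
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # invalid or incomplete entry

# Hypothetical raw entries collected by two people with different habits.
raw_entries = ["3:30 PM", "15:30", "9 am", " 7:05pm ", "25:99", "after lunch"]

cleaned = [clean_time(entry) for entry in raw_entries]
valid = [time for time in cleaned if time is not None]
flagged = [entry for entry, time in zip(raw_entries, cleaned) if time is None]

print(valid)
print(flagged)
```

Entries that cannot be converted are flagged rather than silently dropped, so a person can decide whether to fix them or remove them from the data set.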
Data sets may be biased for a variety of reasons.
Taking the favorite class questionnaire as an example, there are many ways in which bias could sneak in:
© 2024 Fiveable Inc. All rights reserved.