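【Geek Time】: "geek" time, where we chat with tech nerd Brad about mysterious and fascinating cutting-edge tech, high tech, and hard tech. Follow the public account 【璐璐的英文小酒馆】 (Lulu's English Pub) for more great content and the full English transcript.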
Welcome back to Geek Time. This is the advanced episode about Big Data. Hello, Lulu.
Hi Brad.
We're gonna start off by talking about some of the benefits of big data.
I mean, the benefits are pretty obvious, right? Last time you were talking about the three Vs: volume, variety, and velocity. So it's basically the ability to process huge amounts of data that would probably have taken people years to process before.
Now it can be done in a matter of days or even hours or seconds.
Exactly. And it's not just about the amount of data; we're also looking at larger geographical areas, like I mentioned when we were talking about the weather. And it's not just weather: we can look at health-related issues, and a lot of other things too. We can connect variables that typically wouldn't be linked when we look at things that may be related to our health.
Take a doctor, for example. When you go in to see the doctor, they're gonna ask you certain questions, and they can start making some correlations based on how you answer.
But with big data, they can actually look at larger groups of people who have certain health conditions. And based on those conditions, they might be able to find a better explanation for why people have them.
So one possible case that jumps to mind: if people from a certain area, a certain geographical background, or some other type of background all develop similar symptoms, the doctor might not notice, but big data would help the doctor find, or build, that connection.
Yes, so they can look for those things much more easily. It's not just one doctor looking for everything; it's several doctors putting out their information, then looking at that data and finding a more reliable cause for something.
And in the age of big data, just about everyone is able to access a lot more data than in the past.
When we look at the data that people have access to, we start to see some of the difficulties: it gets really hard to randomize, that is, anonymize, the data.
In the past, people would just go out and collect data from, you know, random people. They wouldn't collect their names or anything. But nowadays, when companies collect data, there's all this information attached to one particular person. It's hard to really randomize that when you have all these data sets tied to one person.
Does that mean that, for example, when they collect data from you, they say this is a random person, but because they also collect your age, your nationality, and your geographical location, they eventually build up a pretty precise picture of who you are?
So it's not really random at all; it's a specific person.
They have all these data points. And unless they strip away several of those data points, it's really hard to randomize whose data is whose. It's something…
When we look at that amount of data, another issue that comes up is data overload. When someone's doing research, they're supposed to look at specific sets of data, but because they have all this extra data, they start including it just because they have it at hand.
When they start doing that, they start finding correlations that aren't really there.
I see. So they start to read too much into the data just because they have it.
There's this data, and it's kind of unclear whether there's any correlation in it at all.
So for example, everyone who loves Hello Kitty seems to be developing a cough. And then you draw a false causal link, saying that people who like Hello Kitty are likely to have lung disease. B