Talk:Data mining/2014

From Wikipedia, the free encyclopedia


Making it more readable

Excuse my uncoordinated overhaul (Jan-14). I find the article unclear: long blocks of text containing redundant information, and some parts consist mainly of examples. I am suggesting an overhaul to make the article readable. Philip Habing (talk) 12:30, 19 January 2014 (UTC)

The article is quite a mess because everybody wants his products and results to be prominently listed, which is largely why the "notable uses" section is so big, gets additions again and again, and isn't called "examples".
As you can see, I added back most of your changes. But when done in smaller steps (have a look at the diff of your edit!), it's easier to see how much has really changed.
I'd prefer to keep the “buzzword” paragraph in the lead, because one of the reasons the article is such a mess is that literally everything is being dubbed “data mining” by public media and marketing these days. NSA: data mining. Amazon: data mining. Google: data mining. Audi: data mining. Wikipedia: godfather of data mining.
The article should help readers understand that not every kind of data collection and processing can sensibly be called “data mining”; sometimes we should stick to calling it “massive data collection”, “privacy breach”, or “mass surveillance”. The current abuse of the term (fortunately, they tend to prefer the big data bullshit bingo these days) IMHO belongs in the introduction, not the “history”. 188.98.222.114 (talk) 16:53, 19 January 2014 (UTC)
Aha, I see. :-) I understand that doing it in smaller steps would have been better, sorry about that :-(
Splitting the lemma sounds like a good idea: pull out all the examples.
I partly agree with your buzzword text. The buzzword paragraph itself adds information, but the sentence "Even the popular book..." doesn't add to understanding the ideas behind data mining.
When splitting the lemma, the privacy concerns section would still have its place in the data mining article.
By the way, your name, an IP address, is sort of strange to talk to. No offence meant. Philip Habing (talk) 15:05, 20 January 2014 (UTC)

Sharad is a .NET developer at LERA TECHNOLOGIES, but now he is working on Hadoop. Unfortunately he succeeded and became a Hadoop expert, and now he is going to start an institute in UP, his native place. Now — Preceding unsigned comment added by 183.82.3.79 (talk) 10:57, 24 June 2014 (UTC)

Data Access Mining

With the advent of new cryptography-based virtual currencies such as Bitcoin and myriad other so-called alternative coins, or altcoins, "data mining" has taken on a new meaning besides its older one. I am currently creating a startup enterprise named Data Access Mining And Gaming, and I have decided to use the word "access" to differentiate my business's activities from traditional data mining. Data Access Mining in this case means running computer hardware, be it CPU, GPU, ASIC, or otherwise, to process the public registry, or blockchain, for a reward based on the established algorithms of the virtual coin's network. I encourage others to help me find this topic's rightful place on Wikipedia, because I feel this topic deserves its own listing. Matthew Biebel (talk) 19:30, 24 June 2014 (UTC)

Can you name literature that uses it this way? Bitcoin mining seems to be the common term. If I were you, I would avoid overloading "data mining"; it only causes confusion. --Chire (talk) 09:50, 25 June 2014 (UTC)
Thank you for your response. I will have to do some research to find "literature" that uses the terminology I have described, but Bitcoin itself is essentially data. Also, the mining I am referring to is not restricted to Bitcoin; it extends to alternative coins based on similar algorithms, such as Litecoin or the myriad other examples. Encountering "data mining" in reference to bitcoin mining has been more of a first-hand experience for me. For instance, I spoke recently with someone, and after I told him the name of my company, he said to me "Data Mining? Oh, like Bitcoin? I keep getting emails about that. Have you found any yet?" Perhaps the literature has yet to catch up with the lingo. Matthew Biebel (talk) 01:23, 26 June 2014 (UTC)
Everything is data. If you move your mouse, it is data. Yet visualizing your mouse movements on the screen is not data mining. I don't think literature has to catch up with uninformed use of words. You will notice that the article Bitcoin does not make much use of the term "data", nor does cryptocurrency. If you want to avoid the name Bitcoin because of alternative coins, why don't you use cryptocurrency mining? That would be much more standard and precise. Why use something ambiguous? Do yourself a favor, and avoid the stop word "data" and the buzzword "data mining"! --Chire (talk) 09:54, 26 June 2014 (UTC)
To be honest, the reason I chose the term for my company is that we had an acronym first, and I developed and trademarked the name. At the time, I did not realize that "data mining" was its own field in computing, and when I spoke of "data mining" in the context of "bitcoin mining", people understood that I was talking about machines crunching data to produce bitcoin. I agree that "cryptocurrency mining" is more precise. Thank you for your contribution and clarifications. Matthew Biebel (talk) 19:00, 26 June 2014 (UTC)
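For readers of this thread who wonder why "machines crunching data to produce bitcoin" is a different activity from data mining in the knowledge-discovery sense, here is a minimal toy sketch of a proof-of-work search. It is an illustration only: the function names, the string-based block contents, and the "leading hex zeros" difficulty rule are assumptions made for this example, and real cryptocurrency networks use more involved block formats and target rules. The point is that the loop brute-forces a hash puzzle and extracts no patterns or knowledge from the data.

```python
import hashlib


def mine(block_data: str, difficulty: int = 3, max_nonce: int = 10_000_000):
    """Toy proof-of-work (hypothetical simplification).

    Try nonces 0, 1, 2, ... until the SHA-256 hex digest of
    block_data + nonce starts with `difficulty` zeros.
    Returns (nonce, digest), or None if the search budget runs out.
    """
    prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None  # no suitable nonce found within max_nonce attempts


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes one hash; finding it takes many."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    result = mine("hypothetical block contents", difficulty=3)
    if result is not None:
        nonce, digest = result
        print(nonce, digest)
```

Each extra hex zero in the required prefix multiplies the expected number of attempts by 16, which is why this kind of "mining" is about raw computation and not about analysing data.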

Data mining in Econometrics

Is it possible or desirable to include the rather different attitude to data mining that econometricians tend to have? I was expecting to find this here but found a rather different approach. The material I have in mind can be found in: Lovell, Michael C. (1983) ‘Data mining’, The Review of Economics and Statistics, 65: 1–12; Hoover, Kevin D. (1995) ‘In defense of data mining: some preliminary thoughts’, in Kevin D. Hoover and Steven M. Sheffrin (eds) Monetarism and the Methodology of Economics: Essays in Honour of Thomas Mayer. Aldershot: Edward Elgar; and Hoover, Kevin D. and Perez, Stephen J. (2000) ‘Three attitudes towards data mining’, Journal of Economic Methodology, 7(2): 195–210. In the last article they offer a definition of data mining:

"‘Data mining’ refers to a broad class of activities that have in common a search over different ways to process or package data statistically or econometrically with the purpose of making the final presentation meet certain design criteria."
And they list three attitudes towards it. "Data mining is"
  1. "to be avoided and, if it is engaged in, we must adjust our statistical inferences to account for it"
  2. "inevitable and that the only results of any interest are those that transcend the variety of alternative data mined specifications."
  3. "essential and that the only hope that we have of using econometrics to uncover true economic relationships is to be found in the intelligent mining of data."

This stuff does seem to be using data mining more like a synonym for some aspects of data dredging. Anyway, I was expecting to find material on this kind of data mining here but didn't, and I think it ought to be somewhere. Best wishes (Msrasnw (talk) 10:38, 11 September 2012 (UTC))

Yes, this clearly refers to the "old" use of the term data mining with respect to generating hypotheses, which is covered by the article data dredging. The term "data mining" is way too broad to cover everything, and it is not used consistently (or correctly) throughout the literature. The much more clearly defined term is "knowledge discovery". I do not think we need to cover all abuses of the term in the article; instead we should focus on the "knowledge discovery" based term, and the others should be referenced as "maybe you are looking for: data dredging". --Chire (talk) 11:08, 15 September 2012 (UTC)
I think this alternative, negative use of data mining needs to be acknowledged more in the article. It is what is commonly meant by the term among many scientists. --Pengortm (talk) 22:13, 11 August 2014 (UTC)
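To make the first attitude quoted above ("we must adjust our statistical inferences to account for it") concrete, here is a small sketch of why a search over many specifications inflates false positives. It is not the procedure of any of the cited authors; it assumes the m tests are independent and that every null hypothesis is true, so each p-value is uniform on [0, 1). Under those assumptions the chance of at least one "significant" result at level α is 1 − (1 − α)^m. The function names are hypothetical.

```python
import random


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among `num_tests`
    independent tests, each run at significance level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_search(alpha: float, num_tests: int,
                    trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: in each trial, 'try' num_tests specifications
    on pure noise and report only the best (smallest) p-value."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(trials):
        # Under a true null hypothesis a p-value is uniform on [0, 1).
        best_p = min(rng.random() for _spec in range(num_tests))
        if best_p < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha, num_tests = 0.05, 20
    print("analytic :", familywise_error_rate(alpha, num_tests))
    print("simulated:", simulate_search(alpha, num_tests))
```

With 20 specifications tried at the 5% level, the analytic rate is roughly 64%, so a reported "best" result from such a search is far weaker evidence than its nominal p-value suggests.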

Temporal data mining

This page needs to include info on temporal data mining. An attempt to do so was undone by user Chire in order to remove a valid reference. Can someone please add the temporal section again and use a more relevant reference if they can find it? — Preceding unsigned comment added by 172.219.28.49 (talk) 02:56, 31 August 2014 (UTC)