Talk:Group testing

Group testing has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
Review: March 5, 2018. (Reviewed version).

A fact from Group testing appeared on Wikipedia's Main Page in the Did you know column on 21 May 2018 (check views). The text of the entry was as follows:

Did you know... that group testing can be used for DNA sequencing, cryptography, machine learning, and data forensics?

A record of the entry may be seen at Wikipedia:Recent additions/2018/May. The nomination discussion and review may be seen at Template:Did you know nominations/Group testing.

Wikipedia

About examples[edit]

Would it be a good idea to add a section for example algorithms? I.e. that explains the process of performing the Generalised Splitting Algorithm and also non-adaptive algorithms such as Combinatorial Orthogonal Matching Pursuit (COMP) and Combinatorial Basis Pursuit (CBP), or even their more recent (2014) extensions, DD, SSS and SCOMP (Sequential COMP)? Or maybe this should be a new article? 'Group Testing Algorithms' or something? ♫CheChe♫ talk 13:51, 8 October 2016 (UTC)[reply]

If anyone is interested in this I'm currently writing up what I mean over at my sandbox. This could either be it's own article or added to this one. --♫CheChe♫ talk 23:10, 2 November 2016 (UTC)[reply]

Changed opening example[edit]

I just wanted to say that I've changed the example in the lead to be (what I think is) a bit more accessible. The old example (involving a balance scale) was nice though, and even included an animated diagram. I've left the old text in a comment at the end of the lead if anyone feels like re-incorporating it into the main body of the article somehow. –♫CheChe♫ talk 10:35, 2 July 2017 (UTC)[reply]

GA Review[edit]

This review is transcluded from Talk:Group testing/GA1. The edit link for this section can be used to add comments to the review.

Reviewer: Mike Christie (talk · contribs) 22:22, 3 March 2018 (UTC)[reply]

I'll review this. Mike Christie (talk - contribs - library) 22:22, 3 March 2018 (UTC)[reply]

Looking through the article, I can see immediately that much of it is uncited; I'm going to pause the review to allow the citations to be added. Once everything is cited, I'll complete the review. Mike Christie (talk - contribs - library) 23:00, 3 March 2018 (UTC)[reply]

@Mike Christie Thank you for taking the time to review. It's my first time on this end so I'll do my best!

As for the citations, I don't think the issue is that we need more references (not that you said that), but that certain references apply to large sections of material. Combinatorial group testing and its applications (the canonical text-book for this field) is the chief example of this. It can be used as a reference for almost every claim in Basic description, as well as most of the Formalization section. I thought it would be wrong to spam sections of the article with citations all from this one source, so I left it as a general reference at the bottom. In particular, I believe the basic facts come under Scientific citation guidelines/Uncontrovertial knowledge. If you have any suggestions on how to make the sources of the information clearer that would be greatly appreciated. Thanks again. –♫CheChe♫ talk 10:09, 4 March 2018 (UTC)[reply]

OK, I'll go ahead with the review on that basis. It's been a while since I've seen an article that used this approach; I did wonder if that was your intention. The only suggestion I can give is directly contrary to SCICITE: you could use that basic reference as the citation at the end of each paragraph. It would at least stop readers making the same assumption I did, but it's not required for GA. FYI, featured status does require inline citations, if you decide to go on to FAC. Mike Christie (talk - contribs - library) 11:52, 4 March 2018 (UTC)[reply]

Thank you for the advice. I've gone for a hybrid approach, where the citation is introduced early and put at the end of the paragraph.

I am copyediting as I go; please revert if I make any mistakes. I should also mention that although my degree is in maths, it's decades old and I have no particular background in this area, so please excuse any stupid questions.

~~and although the computational complexity of group testing has not been determined, it is suspected to be hard in some complexity class: what does the second half of this mean?~~
I believe this is standard terminology in complexity theory. I've added a wiki-link to clarify matters. The 'it' in the second clause refers to group testing, if that isn't already clear.
The link explains it, so that's fine. By the way, I'm tweaking your indents a bit when I reply; the rule is to repeat the previous indent and then add a colon or asterisk, depending on whether you want an indent or a bullet. Lots of editors don't do it right, since most of the time it'll look OK, but if you occasionally see stray bullets to the left of the list this is the reason. Mike Christie (talk - contribs - library) 21:51, 4 March 2018 (UTC)[reply]
Optional, but you might consider moving the history section to the end, since it uses terms that are not defined till later in the article.
I would prefer to define any terms needed in the Basic description section, though moving the whole thing is also reasonable. Would you mind saying which terms they are? Or do you mean they haven't been formally defined yet?
Well, perhaps "terms" was the wrong word. For example, you discuss the generalised binary-splitting algorithm; naturally I was curious as to what this was, but you don't define it till later. Similarly COMP and DD are discussed, and then defined further down the article. I agree that the "Basic description" section is the right way to start, but if you were to move the history section to the end, you could rephrase it a little, since these algorithms would have been mentioned earlier in the article. You'd be able to be more concise, and everything would be explained when first mentioned. I'm certainly not going to hold up GA for this; it's just a suggestion. Mike Christie (talk - contribs - library) 23:11, 4 March 2018 (UTC)[reply]
Ok. For now I've put wiki-links to later in the article for the relevant algorithms. –♫CheChe♫ talk 09:56, 5 March 2018 (UTC)[reply]
In particular, it is known that zero-error algorithms require significantly more tests asymptotically (in the number of defective items) than algorithms that allow asymptotically small probabilities of error: I don't understand the first "asymptotically".

Done. It means as the number of defective items approaches the total number of items (for an arbitrarily large total number of items). I've changed it so that hopefully that is clear now.

The performance of this hypothetical algorithm suggests that there is, and how much, possible room for improvement there is when {\displaystyle d^{2}<n} {\displaystyle d^{2}<n}. Needs rephrasing; "there is...possible room for improvement there is".

Partly done. I've tried to rephrase this but it still feels a little clunky.

It is both possible to make an error with no incorrect tests: surely this is only possible with a flawed algorithm? Reading further, I see that COMP and its enhancements can generate false negatives; perhaps we could add something like "if the algorithm is designed to estimate the probability of items being defective"?
Not done. Not flawed as such. As the article explains later, most modern combinatorial algorithms work 'probabilistically' (even though they aren't probabilistic algorithms in the way described in the article), producing significant gains for even small probabilities of error. This makes me hesitate to add extra qualifiers.
OK, I've struck the point, since it's correct as written. I was surprised to see it, since I hadn't realized the algorithms might be designed to work that way, and it wasn't until later in the article that the explanation was apparent. If you come up with a concise way to clarify this I think it would be good to add. Mike Christie (talk - contribs - library) 22:04, 4 March 2018 (UTC)[reply]
Alright. I've added a little explanation similar to the one here. –♫CheChe♫ talk 22:26, 4 March 2018 (UTC)[reply]
That's helpful; looks like you missed the noun after "combinatorial", though? Mike Christie (talk - contribs - library) 22:56, 4 March 2018 (UTC)[reply]
Right you are. Fixed again.
~~required to always solve find: looks like a word needs to be deleted.~~

Done.

~~For any group testing problem with sample space ${\mathcal {S}}$ and group-testing algorithm, we have that: what's the relevance of "and group-testing algorithm" here?~~

Partly done. It's saying that the following statement is true for every problem and for every algorithm. I've tried to make it a bit clearing the the article. I guess it would be possible to leave it out, and hope that it is implied / clear from context.

In the simplest noisy case, where there is a constant probability, {\displaystyle q} q, that a group test will have an erroneous result: Surely this doesn't account for the case when false positives occur at a different probablity than false negatives, which I believe is the case for some medical applications? I see we say "In the simplest noisy case", so perhaps this is not in conflict, but I would have thought this situation occurs rarely in real life.
(Nothing changed.) That is indeed what it's trying to say. As mentioned in the article, this is called the Bernoulli noise model. I think it's a standard way of treating noise without it getting too messy.
OK -- there's certainly nothing wrong in the article; I'm just surprised the sources don't discuss this possibility. Mike Christie (talk - contribs - library) 21:51, 4 March 2018 (UTC)[reply]
~~A binary matrix, $M$ is called $d$ -disjunct if the Boolean sum of any $d$ columns does not contain any other column. What does "contain" mean here?~~
Done This one is difficult to explain succinctly. One column A contains column B if every index where B has a 1, A also has a 1. I've added this explanation to the article.
I think that does it, so I've struck the point. Mike Christie (talk - contribs - library) 21:59, 4 March 2018 (UTC)[reply]
You might consider a "See also" link to balance puzzle; a quick search finds this paper, for example. I recall seeing years ago, I think in one of the MAA journals, papers extending the problem to cases when a single error, or two, or three errors occur in the weighing; as I recall these were not probabilities, but were discrete situations: you knew that up to N errors could occur in the weighing.

Partly done. You're right. This article used to be linked when balance puzzles were used as the motivating example.

~~Are all the log functions base 2 where not otherwise specified? You give a base of 2 in one case but not in others.~~

Done. That's right they're all base 2. Most of the cases don't matter since they're inside big-O notation, but I've added all the 2's in anyway.

Hence there are $119$ remaining items, so $2^{\alpha _{2}}=14$ : Doesn't ${\alpha _{2}}$ have to be integer, since it's defined with a floor function? So this should be a power of 2, shouldn't it?
Partly done. That's a good point. I've corrected the text. However, when I uploaded a corrected version of the diagram nothing changed. Even stranger, when I tried to upload the corrected version again I got and error about uploading an identical copy of a file. I'll have to get in touch with an admin to sort it out.
It was probably cached in your browser. Shift-F5 will usually clear it; I think you can add a purge pulldown (which will purge your cache) to your top-of-page links; I can dig that up for you if you like. Mike Christie (talk - contribs - library) 21:51, 4 March 2018 (UTC)[reply]
Oh, of course, silly me. It's all fine now. –♫CheChe♫ talk 22:17, 4 March 2018 (UTC)[reply]
~~Why is "information lower bound" sometimes italicized and sometimes not?~~

Done. It should only have been italicised when it is defined.

~~The generality of the theory of group testing and lends it to many diverse applications: something wrong here.~~

Done.

The problem is now to find all the active users in a given epoch, and schedule a time for them to transmit (if they have not already done so successfully). This is then a group testing problem where the query asks a set of users to attempt a transmission if they are active. The outcomes are $\{0,1,2^{+}\}$ , corresponding respectively to the possible results of a query: no active users, exactly one active user (message successful) or more than one active user (message collision). I don't follow this. First we say the problem is to "schedule a time for them to transmit"; then we say it's a problem "where the query asks a set of users to attempt a transmission". The first phrasing indicates the users will be told when to transmit; the second indicates the users will make an attempt without instructions, and possibly fail. These seem to be in conflict.
There is no conflict. The users are scheduled to transmit (if they need to) and may be scheduled to transmit again. The algorithm chooses which users to transmit in a given query (and may be run independently by each user).
Not sure this is really relevant for GA, since the difficulty here is just my understanding, but I'm still not getting it, I think because I'm not clear what is happening. A set of users have received a message and now may wish to transmit; those users are active for the new epoch. They are aware they should transmit if they wish to; the scheduling is telling them they are able to transmit in this epoch? That is, no finer time granularity for sending their message than the entire epoch is given to them? So they attempt sends, some of which fail; and then they reattempt? Since the epoch doesn't end till they've all succeeded? I'm not clear what is meant by "the query"; this is a message to the users informing them they are active in the epoch, and receiving a response from them as to whether they succeeded? Mike Christie (talk - contribs - library) 23:03, 4 March 2018 (UTC)[reply]
In my opinion it is relevant, since it means the text has done a bad job of explaining itself. I'll see what I can do to fix it tomorrow. –♫CheChe♫ talk 23:41, 4 March 2018 (UTC)[reply]

I've tried to clarify things here.
I found a copy of Combinatorial Group Testing and Its Applications on Google Books, so I checked a couple of the paragraphs with no inline citation to verify that the book did indeed source them. I couldn't find the word "hash" in the source, so could you tell me where that's covered in the book? I couldn't see anything about data forensics in the contents page. Similarly, I couldn't find anything there about compressed sensing.

Hashes aren't covered in the book. That section uses Goodrich et al. I've added in citations for the stronger claims. The use of hashes generally in security is covered in the wiki-linked article.
The begginning part of the machine learning section is covered in the book in the final section. I've added citations for the other parts.

Not an issue for GA, but you don't have any footnotes linking to Kagan & Ben-gal. If this is in the references list as a further reading item, you might want to add "|ref=no" to the template, to prevent it from generating an anchor.

Done.

-- Mike Christie (talk - contribs - library) 15:23, 4 March 2018 (UTC)[reply]

@Mike Christie: Phew, I think that's everything for the time being. ~~Would you know where to go to ask about this problem with the image?~~ –♫CheChe♫ talk 18:38, 4 March 2018 (UTC)[reply]

@Mike Christie: Ok, I think that really is everything now. –♫CheChe♫ talk 10:11, 5 March 2018 (UTC)[reply]

Yes, that does it. Promoting to GA. Mike Christie (talk - contribs - library) 12:00, 5 March 2018 (UTC)[reply]

Pool testing of virus samples[edit]

Pool testing for coronavirus would seem to come under this topic, can something be added? [1] The Language Learner (talk) 09:14, 25 May 2020 (UTC)[reply]