Talk:Backup/Archive 2

From Wikipedia, the free encyclopedia

Storage media – hard drives

A problem with this paragraph is that the references don’t identify any “reduction” in hard drive fragility. Rather (in 2 of the 3 of the refs anyhow), they talk about “ramp load/unload technology” (first implemented in 2000) and set forth the results of in-house drop tests showing how the company’s portable drives exceed the “industry average” for drop tests. Indeed, Iomega warns, “survival in falls up to the specification is not guaranteed. Dropping your Iomega drive, or exposing it to incidental or repeated impacts, as well as other factors, can damage the hard drive despite any of the above features.”

The refs quantify additional shock-resisting characteristics of certain of their products, as of a point in time. They don’t describe changes over time, or represent that the drives are anything but more durable than drives built without these and other durability features. It’s WP:Synthesis and editorial inference to conclude that the documents say more. I’ve revised the text to match the refs; and I think too, removed any need for the disputed footnote. (Diff here.) JohnInDC (talk) 17:56, 26 July 2018 (UTC)

If we want to say that hard drives are closing the gap on tape on this issue, then all we need is a reliable source that says that. I've looked (though not exhaustively) and didn't find anything suitable. JohnInDC (talk) 18:07, 26 July 2018 (UTC)
IMHO the real problem with this paragraph is that JohnInDC rewrote a sentence in it at "17:54, 26 July 2018", and now complains that the refs in that rewritten sentence "don’t describe changes over time, or represent that the drives are anything but more durable than drives built without these and other durability features. It’s WP:Synthesis and editorial inference to conclude that the documents say more."
The sentence used to say (omitting refs) "However, as the technology of ramp loading and the accelerometer (sometimes termed a "shock sensor") has migrated over the last few years from laptop computers down to individual hard disks, three manufacturers' descriptions of their portable hard disk technology indicate that the transport vulnerability has been reduced." After JohnInDC's rewrite, it now says "To ameliorate this concern, several manufacturers produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), which exceed industry averages in drop tests." The words "which exceed industry averages in drop tests" aren't mine—they're his.
The second sentence in the first section in the Iomega article says "The baseline shock tolerance specification for Drop Shock Technology requires that sampled drives subjected to drop testing must be intact and working after a non-operating shock of 900G @ 1ms and an operating shock of 250G @ 2ms – the technical equivalent of a 36 inch drop." That 36 inch drop is exactly the 2010 industry average I calculated from the first two sentences in the second paragraph of the second section of the Iomega article (the "drop testing at a height of 51 inches" in that same paragraph is for their "semi-rugged" Drop Guard Technology that "features special internal cushioning that protects the hard drive inside the case, increasing its shock resistance"). The HGST 2007 white paper claims "1,000 Gs of non-operating shock.", vs. the 2010 industry-average "non-operating shock of 900G @ 1ms" for Iomega Drop Shock Technology, but that's only 11% over the industry average—a difference that would be hardly worth mentioning even if it didn't constitute "buying advice".
Thinking it over, I think the original fault was mine for imprecise wording of the sentence. I should have written something like "... the transport vulnerability has been reduced as other manufacturers adopted these mid-1990s-to-2000 inventions"—as the HGST ref clearly says they have. With this alteration, the purpose of my proposed footnote becomes clear; it is to aid the reader in answering an IMHO key question "Will the data on my non-"rugged" portable HDD be safe if I drop it out of my 36-inch-high pocket or purse?" The best answer—but one that will be OR pending a comprehensive reference—seems to be "Yes, if the drop is onto nothing less resilient than an industrial carpet" (per the 2010 Iomega ref) but "Maybe no, if the drop is onto a hard floor" (per the 2016 PCWorld ref cited in the preceding sentence). DovidBenAvraham (talk) 06:14, 27 July 2018 (UTC)
You misunderstand. I rewrote the sentence to conform to the refs and am happy with it. As for the footnote, the drop test information can be found within the first couple of pages of the refs and doesn't need to be echoed here. This is precisely the same issue on which you sought the previous third opinion. JohnInDC (talk) 10:55, 27 July 2018 (UTC)
Thank you, JohnInDC; you have conceded the need for the footnote and the sentence rewrite by showing—in your "17:56, 26 July 2018 (UTC)" comment—that you totally misunderstood the refs.
First, the Iomega ref discusses 3 different technologies with different degrees of "ruggedizing": Drop Shock Technology, Drop Guard Technology, and Drop Guard Extreme. Since I eliminated all discussion of "ruggedized" portable HDDs from the WP paragraph (because The Wirecutter deleted in its 2018 review revision its paragraph discussing enlightening tests of "ruggedized" HDDs), the only section of the Iomega ref we are concerned with is the one discussing non-"ruggedized" Drop Shock Technology. That is "the engineering mechanism inside all Iomega Portable Hard Drives that helps protect the heads and media in case the drive is accidentally bumped or dropped"—namely ramp loading and accelerometer technology. The only information we need to consider from the Iomega article's Drop Guard Technology section is the string of words "... 51 inches onto industrial carpeting. This shock tolerance specification is 40% above the industry average for portable hard drives ...", which by a simple calculation gives us the industry average—which is precisely what Drop Shock Technology achieves.
Second, none of the non-"ruggedized" portable HDDs in the refs appreciably exceeds that industry average in drop tests. Based on comparing its non-operating 1000G rating to the non-operating 900G rating for Iomega's Drop Shock Technology, the HGST drive exceeds it by about 11%, but I've chosen to avoid that "buying advice". The really useful point of the HGST ref is the mid-1990s "patent to this invention (US 6,025,968), which is used in all HGST drives and any hard drive incorporating load/unload technology."
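The arithmetic behind the two figures above can be checked with a short sketch. It assumes that "40% above the industry average" means the 51-inch Drop Guard height equals 1.4 times the average, and that the HGST comparison is a plain ratio of the 1,000G and 900G non-operating ratings. The variable names are illustrative only and appear in none of the refs.

```python
# Figures quoted from the Iomega and HGST refs in this thread.
DROP_GUARD_HEIGHT_IN = 51.0      # Drop Guard test height, inches
ABOVE_AVERAGE_FRACTION = 0.40    # "40% above the industry average"
HGST_NON_OPERATING_G = 1000.0    # HGST non-operating shock rating
IOMEGA_NON_OPERATING_G = 900.0   # Iomega Drop Shock non-operating rating

# Assumption: 51 in = (1 + 0.40) * industry average.
industry_average_in = DROP_GUARD_HEIGHT_IN / (1.0 + ABOVE_AVERAGE_FRACTION)

# Assumption: the margin is the relative difference of the two G ratings.
hgst_margin_pct = (
    (HGST_NON_OPERATING_G - IOMEGA_NON_OPERATING_G)
    / IOMEGA_NON_OPERATING_G
    * 100.0
)

print(f"Implied industry average: {industry_average_in:.1f} inches")
print(f"HGST margin over 900G: {hgst_margin_pct:.1f}%")
```

Under that reading the implied average comes to roughly 36.4 inches, so "36 inches" is a rounded figure rather than an exact one, and the HGST margin is roughly 11%.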
I doubt that by 2018 there are any portable HDDs on the market that don't incorporate load/unload technology. The Toshiba ref says the Canvio portable HDD does include it. The reason all portable HDD published specifications leave out the non-operating drop shock ratings is the fear of litigation described in my "22:13, 10 July 2018 (UTC)" comment. The 2010 Iomega whitepaper is an exception, but its Note at the bottom you quote includes hedging that would allow any lawyer to sleep easy.
That's why I think, to avoid less-intellectually-gifted readers of the paragraph making the same mistakes you did, we need the footnote—plus the sentence rewrite proposed in the last paragraph of my "06:14, 27 July 2018 (UTC)" comment. DovidBenAvraham (talk) 23:32, 27 July 2018 (UTC)
I'm content to rely on my observations above, and as I laid out at DRN. And once again: Quit the personal attacks. Thanks. JohnInDC (talk) 00:08, 28 July 2018 (UTC)
You are intellectually-gifted, JohnInDC; it's just that IMHO you don't have even the modest computer hardware/software knowledge I do. You're an above-average representative of the likely readers of this article, and if you can misunderstand the refs—which you demonstrably have while insisting that you haven't—then they will too without the footnote and the sentence rewrite. DovidBenAvraham (talk) 05:01, 28 July 2018 (UTC)
Thanks. Clarification accepted. But if I am in fact confused about the refs after examining them for a week or more, with the *benefit* of the footnote, then the thing can’t sensibly be called illuminating or simplifying. As I’ve said from the outset. JohnInDC (talk) 11:42, 28 July 2018 (UTC)
You raise an excellent point, and it's led me to rethink the sequence of sentences within the paragraph. What was the next-to-last sentence could become the third-from-last sentence, and say—using the 3 manufacturers' refs—something like "Because of the introduction of ramp loading and accelerometer technology (sometimes termed a 'shock sensor') in the early 2000s, several manufacturers now produce improved portable drives whose tests show they survive an average 36-inch non-operating drop onto industrial carpet." What was the third-from-last sentence could become the second-from-last sentence, and say—using the PCWorld ref—something like "However the drives may still be damaged, especially while being transported (e.g., for off-site backups), if they are dropped some distance onto a hard floor such as is found in a bank safety-deposit vault." Would that be more understandable, JohnInDC? If so, and you thought it would be helpful, the new third-from-last sentence could have a very short footnote explaining how the 36-inch industry average drop test can be calculated from the words of the Drop Guard Technology section of the Iomega ref. DovidBenAvraham (talk) 20:28, 28 July 2018 (UTC)
The articles don't say what you want them to say, and it's Synthesis to force them to do it. The PC World article is expressly comparing and contrasting backup media options. It lists several advantages to hard drives but then says they're mechanical devices, and subject to shock, and you can do everything right but if you drop them on a hard floor, they break. The manufacturers' white papers don't make any such comparisons. They just lay out specs. They don't compare hard drive specs to anything - not other media, not hard drives from particular prior years, nothing. (Contrast WD's year-by-year comparison chart for GST Areal Density.) They say, this-and-that technology gives their portable drives "enhanced" shock tolerance (compared to ... ?), that their own tests show that the portable drives survive a drop of so many inches onto carpet, but oh yes, even so, your drive may break anyhow, best be careful. In the words of one,
[Our technologies] offer extra shock protection for Iomega Portable Hard Drives, but survival in falls up to the specification is not guaranteed. Dropping your Iomega drive, or exposing it to incidental or repeated impacts, as well as other factors, can damage the hard drive despite any of the above features. [These technologies] will generally provide additional protection to your drive and data as compared to standard hard drives.
In other words: Hard drives are fragile mechanical devices. Be careful.
What you seem to want to do is to use these manufacturers' spec & marketing documents as counterpoint to the PC World article that, again, expressly compares backup media. But you can't do that, because the sources don't do that; they do not contradict the PC World article. They're just raw data, in a vacuum. I've said it at least twice above - if you want this Backup article to say that hard drives are much better than when PC World looked at them in 2016 and they really aren't so subject to mechanical failure any more, or that their fragility is kind of a myth - then find a reliable source that says that, and cite it. I've done as close a thing as possible with these refs, which is to recite what the PC World article says, and then say, well, manufacturers do make portable drives that are supposed to be "more" shock resistant (whatever that is), and cite to their documents. Even that's a bit dicey, because it implies counterpoint where really there is none - but at least the text matches what the refs say.
So the paragraph as now written tracks the refs, and readers don't need a footnote to explain any of this, because anyone who wants to know still more can click through to the conveniently wiki-linked, one to four-page-long references, and within one or three screens find exactly the information that they might want about just how shock-resistant these portable drives are, laid out in nice tables. That's how Wikipedia works. We summarize; and people who want to know more can read the references. These particular refs are not long, or obscure, or challenging. The footnote is at least unnecessary; but in fact it's worse because it requires a reader to click through to the refs to figure out what it's talking about. It forces the reader to do more work. It doesn't illuminate, but confuses. It doesn't make the article better; it makes it worse. JohnInDC (talk) 22:02, 28 July 2018 (UTC)
No, the paragraph as written does not track the references, because the words "which exceed [my emphasis] industry averages in drop tests" in the next-to-last sentence simply aren't correct as of 2010 or later. To see why, please reread this diff—especially my second and third paragraphs. It is still true as of mid-2018 that, as I wrote at the top of the third paragraph in the diff'd comment, "... none of the non-"ruggedized" portable HDDs in the refs appreciably exceeds that industry average in drop tests." That industry average is a 36-inch non-operating drop onto industrial carpet.
I now propose that the next-to-last sentence in the article paragraph be changed to "To ameliorate this concern, several manufacturers produce non-"ruggedized" portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), which do not deviate greatly from the industry average for drop testing of surviving a 36-inch non-operating drop onto industrial carpet", using the same refs as currently. Since the words "non-'ruggedized'" are likely to confuse many readers, I would suggest that the revised sentence include a footnote saying at a minimum "This article does not discuss 'ruggedized' portable hard drives, of which Iomega's no-longer-manufactured Drop Guard Xtreme is a minimally-cushioned technology example."
We could, instead of that suggested footnote, add an additional sentence mentioning "ruggedized" portable HDDs. However there would be two problems with doing so: (1) Because it was the only RS on that subject, I would have to use the Wayback Machine to resurrect the 2017 version of The Wirecutter article discussed in this diff. (2) As this example (on which you should mute the audio) shows, "portable for off-site backup" for such HDDs requires that the reader have a larger-than-normal pocket or purse because of modern non-minimal cushioning. DovidBenAvraham (talk) 21:25, 29 July 2018 (UTC)
I've now substituted a better example video, which also compares the "ruggedized" HDD to non-"ruggedized" HDDs to emphasize how much bulkier the "ruggedized" one is. IMHO the "ruggedized" HDD wouldn't fit in a reasonably-priced safety deposit box, which would rule out its use for off-site backup unless the backer-upper has a generous budget. DovidBenAvraham (talk) 20:26, 30 July 2018 (UTC)
No. The product sheets do not compare hard drives to other media, they don't claim that hard drives aren't fragile or aren't subject to shock damage. They merely lay out hard drive drop specs that the manufacturers won't even guarantee - because hard drives are fragile. As regards the PC World article, the spec sheets are a non-sequitur. (Indeed literally, in that at least 2 of them predate the PC World article by several years.) You can't use those sources to say, "hard drives aren't as bad as all that". In fact the more we talk about this, the more I think the article text (which, yes, I drafted) already goes too far in connecting the two separate points. And finally, even if they were related, the footnote - exactly like the ref quotes that you sought the 3O on - is unhelpful to readers in navigating these very short, well-captioned references. JohnInDC (talk) 22:33, 29 July 2018 (UTC)
Further, and again, it is not for us to guide prospective purchasers to a suitable product. Anyone who reads this article and wonders whether a hard drive is suitable for them in lieu of - say - more durable, but more expensive SSDs, can look at these (quite compact) refs, quickly see in plain English that certain hard drives are designed (but not warranted!) to survive a 36" drop onto carpet or even better, and decide. It's all right there. If you want the article to say, or imply, or suggest, or lead the reader to conclude, that "hard drives are probably a viable option for the average user", then find a source that says that and cite to it; and short of that, we aren't Consumer Reports or tech advisors. JohnInDC (talk) 23:00, 29 July 2018 (UTC)

But really - I just have to quit. I have made these same points over and over, and we have a 3O that says the same things, and still we're going in circles - and in yet another forum to boot. I intend now to await the views of the DRN volunteer, who has plenty of material to work with, and hope I have the willpower to refrain from further comment. JohnInDC (talk) 23:46, 29 July 2018 (UTC)

OK, I now propose to split the next-to-last sentence in the paragraph into two sentences: "To ameliorate this concern, several manufacturers started in 2007 to produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"). By 2010 the industry average for drop testing of drives with that technology was being intact and working after a 36-inch non-operating drop onto industrial carpeting." There would be no footnotes. The ref for the first sentence would be the 2007 HGST whitepaper, which says "the patent to this invention (US 6,025,968) [enabling the sliders to move off the disk area to the ramp in a controlled fashion during an unexpected power down situation] is used in all HGST drives and any hard drive incorporating load/unload technology". The refs for the second sentence would be all three of the refs for the current next-to-last sentence. DovidBenAvraham (talk) 05:36, 31 July 2018 (UTC)
That is better, thank you. I would revise it along these lines:
One disadvantage of hard disk backups vis-a-vis tape is that hard drives are mechanical devices and may be more easily damaged, especially while being transported (e.g., for off-site backups). Beginning in the mid-2000s, several drive manufacturers began to produce portable drives employing ramp loading and accelerometer technology (sometimes termed a "shock sensor"), and by 2010, the industry average in drop tests for drives with that technology showed drives remaining intact and working after a 36-inch non-operating drop onto industrial carpeting. The manufacturers do not, however, guarantee these results and note that a drive may fail to survive even a shorter drop.
Refs distributed appropriately throughout. The PC World article substantially post-dates the white papers and this formulation clarifies that a bit. In other words - "drives are getting better but they still can break for stupid reasons". The revision still gets in the general idea of what kind of accident hard drives might be expected to survive, but reinforces that the tests are aspirational, and not guaranteed. The manufacturers take pains to point this out. (This isn't like - I don't know, bulletproof glass that's guaranteed to stop the first 5 rounds shot from a .45 at 6 feet or (well, pick your product promise).) That continuing caution is also consistent with the 2017 version of the Wirecutter article you've mentioned, archived here, which captions a section, "Don’t buy a rugged portable hard drive", devotes it to telling how ruggedized drives failed to meet their manufacturers' representations, and concludes, "If you’re concerned about dropping your drive, you should consider a solid-state drive." Let me know what you think. JohnInDC (talk) 11:02, 31 July 2018 (UTC)
I'm so glad we have solved this cooperatively; thanks. I've made the change as you suggested, making only a minor grammar fix (two versions of begin/began in the same sentence was a bit much, so I eliminated one). IMHO we should consider whether to include a sentence on "ruggedized" HDDs, since they are a viable choice for readers with either larger-than-normal pockets and/or larger-than-normal safety deposit boxes. DovidBenAvraham (talk) 13:28, 1 August 2018 (UTC)
Yes, that's gratifying! Do you want to go to DRN and report that they don't have to throw any thought our way? As for ruggedized drives - do we have a source that makes the observation, or offers up the recommendation to hard-drive-backup-purchasers? I'd prefer that, to our harvesting information and presenting it here as an option that sprang from our heads, rather than reliable sources. JohnInDC (talk) 13:29, 2 August 2018 (UTC)
I've reported that to DRN, and asked that they withdraw the request—which they've done. As for "ruggedized" HDDs, thanks to your efforts on the Wayback Machine we have this section of the 2017 Wirecutter review as a ref. What the experimenters reported in that section is that Silicon Power's claims of water-proofness are total BS, but back-handedly concede that their two older comparatively-cheap drives survived at least 16 48-inch (yes, not 36-inch) drops onto something apparently harder than industrial carpeting. There's a new PC Mag review of rugged portable HDDs and SSDs, but its "What Exactly Makes a Drive Rugged?" section says "However, not all enclosures are designed for maximum shock resistance; a rugged drive might have a metal shell, to provide crush protection as well as some safety in case of a drop. As a result, you're mostly at the mercy of the drive vendor to tell you the rated maximum drop distance for the drive."—meaning PC Mag didn't do any testing.
Over the weekend I think I can put together a couple of sentences that summarize the "ruggedized" portion of the preceding paragraph, and point out that "ruggedized" HDDs are worthy of some users' consideration so long as large-capacity SSDs are so much more expensive. I promise there will be no footnotes. I have to go into the hospital for major back surgery on Tuesday 7 August, so I'll leave those sentences there for JohnInDC to revise or totally delete. DovidBenAvraham (talk) 05:03, 3 August 2018 (UTC)
Good luck with your surgery. As for ruggedized drives, I'd suggest nothing more extensive than a sentence clause (if it fits) like, "some manufacturers also offer 'ruggedized' portable drives, which may include a shock-absorbing case, and are designed to higher drop specifications". We don't need to observe that they're bulkier, or that they may not do all the things the manufacturers claim, or any of that. Just - "these things exist too". Punto. JohnInDC (talk) 12:06, 4 August 2018 (UTC)
Regarding your edit "revise first sentence to remove aside; rm second sentence. We don't need to editorialize or reflect upon the value of these representations": As for my "aside", I put that in to clue the reader that 6 out of the 10 "Best Rugged ..." listed at the top and bottom of the PCMag article are SSDs, so he/she should ignore them for the purposes of that paragraph. If you've got a ref for "The Best Rugged Rotating Hard Drives", please give me its URL; pending that, I've got to work with the ref I've got. As for my "editorializing", the "...you're mostly at the mercy of the drive vendor to tell you ..." in that second sentence was a direct quote from the same PCMag ref, and I thought that my "... and in for one manufacturer's set of 'rugged' hard drives in 2017 ..." second clause was an appropriate warning (not in the PC Mag article) for the reader to take with caution what the drive vendor tells him/her. How can it be "editorializing" if it simply recounts what happened with The Wirecutter's tests of Silicon Power's "rugged" HDDs? DovidBenAvraham (talk) 01:38, 5 August 2018 (UTC)
We can accomplish the same end without lapsing into the narrative voice, and I've edited the text to reflect both the range of specs and to remind the reader that they're merely claims. JohnInDC (talk) 12:47, 5 August 2018 (UTC)
I've put The Wirecutter article back in strictly as a ref—no added text. If you hate that added reminder to the reader about "ruggedized" claims, please feel free to remove it. More importantly, I've added one sentence to the renamed (to include the term everyone uses nowadays as opposed to 2007) "SSD/Solid state storage" paragraph, because the already-used Forbes ref says over a period of years SSD's stability is less than HDD's. I've also added the already-used PCMag ref to the now-next-to-last sentence, because its top table gives concrete examples of SSD capacities and prices as of mid-2018—supplementing an academic allusion as of 2017. If you hate that too, please feel free to remove it. DovidBenAvraham (talk) 22:22, 5 August 2018 (UTC)
Thanks, yes, I should have left that in. JohnInDC (talk) 23:13, 5 August 2018 (UTC)

Sketch diagram

A computer sends its data to a backup server, at a scheduled time.

-Inowen (nlfte) 03:52, 12 October 2018 (UTC)

Keep it simple. In computer storage, the term "backup" may refer to a copying of files from main storage to separate data storage media, or a copying of files from the main computer to distant storage media (often as a third-party service). Depending on whether the backup media is "always-connected" or removable, backups can be scheduled, or left to user action. In the simple picture, backups these days are almost always done with either a thumb drive, or a send to a data backup server, which is often scheduled. -Inowen (nlfte) 04:01, 14 October 2018 (UTC)

So why are the clock, which indicates a scheduled backup run (which would certainly be in the "backup window"), and a backup server in the diagram? I decided that the only place the diagram makes any sense in the article is at the beginning of the Backup#Enterprise client-server backup section, and with the label as I revised it. As an alternative I considered deleting the diagram, because IMHO it doesn't add anything to the text in the article. However I took pity on Inowen, because drawing the diagram must have cost him some appreciable effort. DovidBenAvraham (talk) 04:30, 21 October 2018 (UTC)

Source file integrity

Added "Backing up interactive applications via coordinated snapshots" paragraph, initially linking and referencing only to Veeam. This addition is long overdue, but it took me months to figure out how to tie it into the existing section rather than having to start a whole new section of the article. I'll improve it. DovidBenAvraham (talk) 03:32, 28 October 2018 (UTC)

Adding the "Backing up interactive applications via coordinated snapshots" paragraph required making it clear, in a parenthetical note in the page number for the ref for the Retrospect Windows 12 User's Guide, that "Snapshots" in that ref refer to the Retrospect term instead of the current Computer Science term. The CS term was introduced in UNIX systems sometime shortly after 2000, but the Retrospect term was introduced shortly after 1985. Both the Windows and the Mac variants of the Retrospect application code still use "snapshot" in its older sense. However the Retrospect Mac 8 User's Guide, copyrighted in 2011, eliminated the use of the term "snapshot" in documentation—probably to avoid confusion with the new CS term. DovidBenAvraham (talk) 20:16, 18 November 2018 (UTC)

The "Live data" subsection of the article discusses the basic problem for which the "Source file integrity" subsection provides solutions—solutions that are being actually used by applications. IMHO the reason why the "Live data" subsection doesn't provide any solutions is that none of its references is any later than 2008, and the Oracle refs—which I have flagged as dead—date from 1997. I thought seriously about updating the "Live data" subsection, because non-enterprise backup applications such as Arq now implement the same type of NTFS Open File backup capability that Retrospect/NetBackup/Backup Exec do. However I couldn't find an explanation of how Arq's implementation works, although I suspect that it is by the same kind of VSS snapshotting at a natural pause that Retrospect uses. Therefore I decided to leave the "Live data" subsection alone, other than adding a single-sentence paragraph at the end linking to the "Source file integrity" subsection. DovidBenAvraham (talk) 16:34, 26 November 2018 (UTC)

I found an Arq Blog article written by the principal developer that confirms my suspicion, so I inserted a ref to it in a new first sentence in the final paragraph—preceded by a ref to a Macworld review of Arq. Retrospect and other enterprise client-server backup applications don't seem to have the same problem with MS Outlook .ost files, probably because they wait for a natural pause before doing NTFS Open Files backup. DovidBenAvraham (talk) 21:27, 27 November 2018 (UTC)
This is way too far down in the weeds for a general article on computer backups. We don't need to know, or explain, the different ways in which particular backup applications handle, or don't handle, source file integrity issues specific to one or two personal computer applications. JohnInDC (talk) 21:32, 27 November 2018 (UTC)
JohnInDC fails to understand that my adding that final paragraph to the "Live data" subsection was a first attempt to deal with the out-of-dateness and confusingness of that section compared to the "Source file integrity" subsection. A summary of the first two paragraphs of that section would lead a reader to conclude:
  • Snapshot backup "... is hardly an effective backup mechanism by itself." "An effective way to back up live data is to temporarily quiesce them (e.g., close all files), take a snapshot, and then resume live operations."
  • Open file backup ".... In order to back up a file that is in use, it is vital that the entire backup represent a single-moment snapshot of the file, rather than a simple copy of a read-through." .... "Either the database file must be locked to prevent changes, or a method must be implemented to ensure that the original snapshot is preserved long enough to be copied, all while changes are being preserved."
But wait; how about "Hot database (online) backup" two paragraphs below? "... This usually includes an inconsistent image of the data files plus a log of changes made while the procedure is running. Upon a restore, the changes in the log files are reapplied ...." Doesn't that contradict the conclusion in the "Snapshot backup" paragraph? And how can the NTFS Open File backup capabilities of Retrospect and NetBackup and Backup Exec work—as they have been doing for years—if they don't obey "During a cold backup, the database is closed or locked and not available to users"? Sorry, to find out how enterprise client-server backup applications really work, a reader must also consult the "Source file integrity" subsection. And that's why I added that final paragraph to the "Live data" subsection—choosing to override "we don't link to 'further discussions' elsewhere".
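As a rough illustration of the "quiesce, take a snapshot, then resume" approach quoted above, here is a minimal Python sketch. The LiveStore class and its methods are hypothetical, and a lock around an in-memory dict stands in for pausing writers. It does not represent how VSS, Retrospect, or any other product is actually implemented.

```python
import copy
import threading


class LiveStore:
    """Toy stand-in for live data: a dict guarded by a lock (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        # Normal live operation: writers take the lock briefly.
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        # Holding the lock blocks writers (the "quiesce" step) while a
        # single-moment copy is taken; releasing it resumes live operation.
        with self._lock:
            return copy.deepcopy(self._data)


store = LiveStore()
store.write("balance", 100)
snap = store.snapshot()        # consistent point-in-time copy
store.write("balance", 250)    # later change does not affect the copy

print(snap)
print(store.snapshot())
```

The copy taken before the second write keeps the earlier value, which is the property the quoted "single-moment snapshot" wording describes. A hot backup that records an inconsistent image plus a change log is a different technique and is not shown here.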
How did the "Live data" subsection get so messed up? IMHO it's because its main source is two links to the notes of a 1997(!) University of Wisconsin lecture by Nina Boss. Those notes are only available at the Wayback Machine now, but Nina Boss has been Sr. Database Administrator at University of Wisconsin-Madison 1986 – Present, for 32 years. "I am an experienced oracle database administrator." Whoever wrote the "Live data" subsection also threw in a couple of refs to books copyrighted in 2003 and 2009, before Microsoft SQL Server and Exchange started really using VSS and open source database management systems started using Linux's snapshotting capability. Oracle Corp. doesn't seem to be so dominant (see the Revenue column) in DBMS anymore.
So what I propose to do, with JohnInDC's OK, is to temporarily put back in a revised version of that final paragraph. Maybe the ref to Arq's problem with an NTFS Open File backup feature, that wasn't implemented with the inactivity checking that Retrospect's similar feature (and no doubt NetBackup's and Backup Exec's) was, belongs in one of the other paragraphs in the subsection. In the meantime I'll figure out how to rewrite the subsection so that it gives enough correct information for a reader contemplating using a personal backup application, which is what it should do. And I await a comment from JohnInDC pointing out what WP rule says "we don't link to 'further discussions' elsewhere". DovidBenAvraham (talk) 07:33, 28 November 2018 (UTC)
Again. Way, way, way too far into the weeds. This isn't an essay, or an analytical piece, it isn't an exhaustive review of every possible backup approach, their advantages and drawbacks, and the precise technical way in which they each accomplish their tasks. This Wikipedia article is, should be, a survey of what reliable sources have said on the subject of backups. Is there a reliable source that weaves all these issues together in the way that you have - or are you Synthesizing the sources into a narrative that is your own, rather than that of the sources? All of this that you want to add, you could accomplish in about three sentences. "On a live computer system, files may change during the backup process and be recorded in an inconsistent state. There are a variety of strategies for ensuring source file integrity, including pausing the system or particular files, or recording files at different stages and then coordinating their states thereafter" or whatever; followed by refs. You could do the same kind of thing for snapshots, open file backups and the rest. A general description of what each is is sufficient - that whole long paragraph under "open file backup" can be pared back (which I've just done).
I don't know where to direct you to a "rule" about articles referencing themselves - we've had this discussion before. Go see WP:Tone for starters. Otherwise the most I'm inclined to do for now is to say that I've edited or contributed in some fashion to literally thousands of different articles, and can reliably represent that it's not done. The articles aren't essays, wherein we try to impose our own order or understanding or terminology on a field. We report what the sources say. There is no narrative voice in Wikipedia, no time when we say things like, "throughout this article, we will use the term X to mean thus-and-such", or "for further information, see the discussion at (linked article)". JohnInDC (talk) 12:07, 28 November 2018 (UTC)
WP:YOU. Also MOS:NOTE and Wikipedia:Manual of Style/Self-references to avoid. JohnInDC (talk) 14:23, 28 November 2018 (UTC)
Seeming copyvio issues aside, I'm surprised and dismayed that on the heels of my comment above about the proper Wikipedia voice - including relevant links - you would edit the section to read more like a manual, with first person references like, "'Interrelated database files', which we will use—for lack of a better term—in this subsection..." and commentary expressly directed at the reader like, "One must consider that the snapshotting process could take several minutes to snapshot a large file such as a database". Did you read the links that you asked me for, and which I provided? JohnInDC (talk) 16:52, 29 November 2018 (UTC)
As for any copyvio, it was a mistake on my part. I wrote that edit early this morning, as a combination of material that was currently in the subsection, material that had been in the subsection until it was deleted by JohnInDC yesterday https://en.wikipedia.org/w/index.php?title=Backup&diff=871018545&oldid=870983323 , and material that I wrote from scratch this morning. AFAICT the subsection was previously basically unchanged since around 2008; much of it appears to have been written based on the notes of a 1997 University of Wisconsin lecture by Nina Boss. If there's a copyright violation, it's likely to be related to that lecture note material—which I had thought Ms. Boss may have copied from Oracle Corp.'s 1997 documentation. It didn't occur to me that nevertheless she'd be entitled to copyright on it; if I can get Username_Needed to let me revert his/her reversion, I'll put in plenty of cites to the archived copy of her notes page. If JohnInDC spotted any other kind of copyvio, I'd really appreciate it if he'd let me know on this Talk page.
As for "One must consider that the snapshotting process ...", that was a one-word-revised copy of "one must consider that the backup process ...", which was probably in the subsection for years. It didn't occur to me early this morning that JohnInDC wouldn't have liked that "commentary directed at the reader" if he had seen it years ago; I'll fix it if I ever get the chance (see the last two sentences of the preceding paragraph).
As for "'Interrelated database files'", how else does JohnInDC propose that I deal with the fact that Microsoft Exchange Server has interrelated files (including log files) just the way an officially-anointed database application does? The only alternative I can think of is a clumsily-worded addition saying that what's sauce for databases is sauce for MS Exchange Server etc. DovidBenAvraham (talk) 20:06, 29 November 2018 (UTC)
You don't create and define a needless new term but rather use simple English words - "interrelated files" - and note that if one or more of them changes between the beginning and end of a backup, they may be stored in an inconsistent state. Here is the whole paragraph:
If a computer system is in use during backup, a file or files may change while the process is underway, resulting in a backup that is internally inconsistent and cannot be usefully restored. This is especially true for interrelated files, as may be found in a conventional database or in applications such as Microsoft Exchange Server. The term fuzzy backup can be used to describe a backup of live data that looks like it ran correctly, but does not represent the state of the data at a single point in time.
JohnInDC (talk) 20:22, 29 November 2018 (UTC)
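The inconsistency described in the paragraph quoted above can be illustrated with a minimal Python sketch. It is a toy model only: the two "files" are in-memory objects, and every name in it (make_system, apply_write, read_through_backup, snapshot_backup) is hypothetical rather than taken from any backup application. It contrasts a read-through copy, where a write lands between copying two interrelated files, with a copy taken from a single-moment frozen view.

```python
import copy

def make_system():
    # Two interrelated "files": a data file and an index that must list every key.
    return {"data": {"a": 1}, "index": ["a"]}

def apply_write(system, key, value):
    # A live write touches both files, as a database transaction would.
    system["data"][key] = value
    if key not in system["index"]:
        system["index"].append(key)

def is_consistent(backup):
    # The backup is usable only if the index matches the data exactly.
    return sorted(backup["index"]) == sorted(backup["data"].keys())

def read_through_backup(system, write_during_backup):
    # Copy the files one at a time while the system stays in use.
    backup = {"data": copy.deepcopy(system["data"])}
    write_during_backup(system)  # a change lands between the two copies
    backup["index"] = copy.deepcopy(system["index"])
    return backup

def snapshot_backup(system, write_during_backup):
    # Freeze a single-moment view first, then copy from the frozen view.
    frozen = copy.deepcopy(system)
    write_during_backup(system)  # later changes do not reach the frozen view
    return {"data": frozen["data"], "index": frozen["index"]}

def writer(system):
    apply_write(system, "b", 2)

fuzzy = read_through_backup(make_system(), writer)
clean = snapshot_backup(make_system(), writer)
print("read-through copy consistent:", is_consistent(fuzzy))
print("snapshot copy consistent:", is_consistent(clean))
```

In this toy the read-through copy ends up with an index entry for a key the copied data file lacks, which is the "fuzzy backup" case; the snapshot copy does not. Real applications get the frozen view from the filesystem or the database rather than from a deep copy.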
As far as the asserted copyvio is concerned, Username_Needed reverted that reversion yesterday. I'm very glad, because it seemed as if there had been a recent merger of "let the bot determine who is a suspect" and "Justification? We don't need no stinkin' justification". I intend to make a bit of a stink about that.
Other than that, I'm reasonably happy with JohnInDC's edits. DovidBenAvraham (talk) 05:32, 2 December 2018 (UTC)
All the gory details of the asserted-and-later-withdrawn allegation of copyright violation can now be found at this section of my personal Talk page. The short version is even more shocking, if less Kafkaesque, than what I suggested in the comment immediately above. User:Username_Needed used a copyright-violation-detecting tool he/she didn't understand; what it actually detected was two reverse copyright violations in 2013. At that time a Spanish/English-speaking LA-County consultant copied verbatim, without attribution, most of the contents of the Backup article (as it apparently was then) into two of his own blogs. User:Username_Needed said on his/her own Talk page, over a week after 29 November (in a comment which I have copied into the section of my Talk page), "I was using User:Crow's copypartol tool, which has quite a complicated interface, and I misread it. I have no idea if there ever was any copyvio, whether it has been removed and whether it is still in the revs." DovidBenAvraham (talk) 00:00, 9 December 2018 (UTC)
This wasn't that big a deal in the first place - particularly after the editor's self-revert, with the accompanying edit summary - and I think we can safely lay it to rest now. JohnInDC (talk) 00:10, 9 December 2018 (UTC)
You're way off the mark here, JohnInDC; it was a big deal to me. For over 18 hours, between the time User:Username_Needed did the original revert of my edit and the time he/she did the self-revert, I stood accused of a copyright violation without having been informed of what copyright I had violated. During that time I (temporarily, because of later events described below) decided to take at least a 3-month break from doing any WP editing. I also, in desperation, put the comment beginning "I am totally mystified by your reversion for copyright violation on 15:19, 29 November 2018‎ (UTC) of my edit ...." on his/her Talk page (which I later copied to my own Talk page). It took 16 hours after that comment for User:Username_Needed to do the self-revert, and another 7 days for him/her to put on his/her own Talk page a first feeble apology which basically says "I didn't know what the f**k I was doing. Sorry for my own confusion, which resulted in your confusion."
An hour after I replied to that apology with a further comment beginning "What does "elsewhere" mean? ....", he/she posted an explanation of his/her error which more-technically expands on "I didn't know what the f**k I was doing." Since Wikipedia runs like the legendary late-1960s self-regulating San Francisco "hippie" community, I think we can all agree that community shouldn't include mentally-still-juveniles running around waving a dangerous edged tool and later saying—only when pressed—"My parents never showed me how to use that tool without cutting anybody; sorry I used it the wrong way". DovidBenAvraham (talk) 02:20, 9 December 2018 (UTC)
As you like. I tried! JohnInDC (talk) 02:34, 9 December 2018 (UTC)
I'm sorry to belabor this, JohnInDC, but the "accompanying edit summary" you refer to above says—in toto—"Reverted 1 edit by Username Needed (talk): SelfRev- misidentification." What useful information was that supposed to communicate to me? DovidBenAvraham (talk) 04:12, 9 December 2018 (UTC)
That the text you added was not a copyvio and had been mistakenly identified as such. Had it been me, I'd have then gone to my Talk page, removed the copyvio warning and maybe left the self-revert diff in my edit summary. Mistakes happen, case closed. JohnInDC (talk) 13:37, 9 December 2018 (UTC)
"SelfRev- misidentification." could have meant one of several things as of 30 November. (1) Least likely: Username_Needed could have discovered a different article had a copyvio, and misidentified it as Backup.
(2) More likely: Username_Needed could have discovered a different subsection in the first 7 pages of the article had a copyvio, and misidentified it as the subsection I had just edited. This seemed plausible, because I knew that the first 7 pages—except for the "Storage media" subsection—had not been substantially edited since around 2011—and I didn't know what copyvio-detecting tools had been in use back then.
(3) Inconceivable to me on 30 November: What actually happened was that Username_Needed was using CopyPatrol—with so little experience with the tool that he/she couldn't even spell its name in a comment 7 days later—and misidentified a WP-to-blog copyvio (I'm calling it a "copyvio", even though I don't know the legality of making an un-credited copy of a WP article) as a blog-to-WP copyvio. Once I figured out the CopyPatrol GUI, I knew immediately by the blog dates that Sr. Damicelli must have copied the version of the WP article existing as of April 2013 rather than the reverse. But Username_Needed didn't have my knowledge of the WP article, and didn't think to look in its View History for an edit date earlier than April 2013. That's what I characterized above as "mentally-still-juveniles running around waving a dangerous edged tool and later saying—only when pressed—'My parents never showed me how to use that tool without cutting anybody; sorry I used it the wrong way'".
What occurred to me was that I should at least initiate a RfC on this dispute. But that's likely to produce only the "Mistakes happen, case closed" response that I've already gotten from JohnInDC above. IMHO what's needed at the minimum is an administrative ruling that Username_Needed is only allowed to use Earwig's Copyvio Detector, not CopyPatrol, until he/she receives further training and demonstrates that it has been absorbed. But that would not be the correct systems solution, even for "mentally-still-juveniles" like Username_Needed. Earwig's Copyvio Detector didn't detect the reverse copyvio, but that may simply be because Earwig's Copyvio Detector doesn't look at non-institutional blogs such as Sr. Damicelli's. OTOH Earwig's Copyvio Detector may have coding to detect reverse copyvios and ignore them (comparing in only one direction and seeing which file creation date is earlier are two simple-minded techniques that occur to me); maybe CopyPatrol should have similar coding added. What Dispute Resolution method should I initiate to make sure the systems problem gets considered?
Speaking of WP systems, wouldn't my removing the copyvio warning from my Talk page have left my "handle" in some centralized database of WP editors who have been previously warned? I don't want a future (no doubt unintentional) copyvio by me to be turned into an immediate ban. DovidBenAvraham (talk) 21:30, 9 December 2018 (UTC)
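The "which date is earlier" technique suggested in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input format (a list of dated revision texts) and the sample dates are all invented here, and nothing in it reflects how CopyPatrol or Earwig's Copyvio Detector actually work.

```python
from datetime import date

def likely_copy_direction(wiki_revisions, external_date, passage):
    """Guess who copied whom by comparing first-appearance dates.

    wiki_revisions: list of (date, text) pairs from the article history
                    (hypothetical input format).
    external_date:  publication date claimed by the external page.
    passage:        the overlapping text flagged by a detector.
    """
    dates = [d for d, text in wiki_revisions if passage in text]
    if not dates:
        return "passage not found in wiki history"
    first_in_wiki = min(dates)
    if first_in_wiki < external_date:
        return "external page likely copied from wiki"
    if first_in_wiki > external_date:
        return "wiki likely copied from external page"
    return "undetermined"

# Made-up sample data, for illustration only.
history = [
    (date(2011, 3, 1), "A backup is a copy of data kept for restoration."),
    (date(2018, 11, 29), "A backup is a copy of data kept for restoration. New text."),
]
result = likely_copy_direction(
    history, date(2013, 4, 15), "A backup is a copy of data kept for restoration."
)
print(result)
```

The obvious weakness is that the external page's date is self-reported and could be wrong or backdated, so a result like this is a prompt for a human to check the history, not a ruling.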
I don't know how much more clearly I can say it: This is trivial. It matters to no one except you. Also - no matter how steamed this episode may leave you, you've got to avoid personal attacks and name calling. It inclines others against you, not for you, particularly since this has been pointed out to you before. JohnInDC (talk) 21:51, 9 December 2018 (UTC)
I changed "mental juveniles" to "mentally-still-juveniles" in several comments; I hadn't meant to say that they were "mental" in the sense of being of unsound mind. Maybe that was the "name calling" JohnInDC is referring to. DovidBenAvraham (talk) 19:59, 10 December 2018 (UTC)
I note your adjustment here. If you're going to bother at all, you need to do better than that. There's not a hair's breadth of difference between calling someone a "mental juvenile" and "still-mentally juvenile". Both are comments directed at the person instead of at their edits. I've seen worse characterizations than this, and these aren't going to earn you more than a bit of disapprobation from anyone else who notices them, but I meant it when I said below that you should review Wikipedia:Don't call a spade a spade - you don't seem to have absorbed the point of that essay. Just - stop disparaging people. It's not hard. JohnInDC (talk) 20:36, 10 December 2018 (UTC)
I should add, don't keep tweaking it in order to make me happy. It's not that egregious, I've said my piece, and it's not like I'm going to turn you in. It's just that - you don't seem to be able to resist these embedded little insults, and I'm not sure what you suppose you gain by them. Now. I think I'm done with this issue. JohnInDC (talk) 22:04, 10 December 2018 (UTC)
Nevertheless if you are determined to escalate this matter, try Wikipedia talk:Copyright problems. I have no idea if that's the right place but someone there is bound to be able to point you in the right direction. JohnInDC (talk) 21:57, 9 December 2018 (UTC)
Thanks, JohnInDC, for your suggestion of Wikipedia talk:Copyright problems. I did a brief search of its Archives earlier tonight; there is plenty of discussion of reverse copyvios, but only one brief announcement of the availability of CopyPatrol last year. A charitable view of what Username_Needed did is that he/she used a new tool having expanded capabilities with which he/she was unfamiliar, encountered a really-audacious reverse copyvio, and at that point failed to check the View History for an edit date earlier than April 2013, as he/she should have done to find out in which direction the copyvio was done. I think my name-calling is justifiable in this case, mostly because of Username_Needed's week of delay before admitting what he/she had done, but I'll report the case in Wikipedia talk:Copyright problems and let someone else deal with that editor. DovidBenAvraham (talk) 05:02, 10 December 2018 (UTC)

Perhaps you should take another look at WP:NPA and WP:CIVIL and see if you can find the part in either that excepts cases where one editor thinks another editor's work is particularly substandard. After you fail in that, go read Wikipedia:Don't call a spade a spade. The rules are simple and clear and you've been admonished about failing to follow them already. JohnInDC (talk) 12:20, 10 December 2018 (UTC)

I created a new section on the page JohnInDC suggested. After I added a comment giving an "executive summary" of what another editor had complained was a "wall of text", I was responded to by that same editor with a comment giving an invalid analogy and a mis-characterization of my proposed solutions. That comment was intended to make me walk away from the discussion, and for 4 days I considered doing so. Then I created a subsection of that section, and began it with a comment explaining again my proposed solutions—but without naming any other editors. Possibly because I was showing that I was not giving the matter up but was avoiding assigning any blame, that comment elicited only conciliatory responses by the other editors concerned.
I have since apologized—on his/her Talk page—to the editor who made the original erroneous copyvio determination, because I had come to the conclusion that the error resulted from a combination of his/her using an unfamiliar (and IMHO still imperfect) tool and—due to other commitments—not having the time to examine the result of using that tool. DovidBenAvraham (talk) 07:28, 22 December 2018 (UTC)

Is Rmokadem's added reference in the "Performance Impact" paragraph of the "Limitations" subsection spamming?

(adapted from a section I put into Rmokadem's Talk page)

I noticed his 23 February 2019 addition to the "Performance Impact" paragraph of the "Limitations" subsection. To be frank, I don't understand the relevance of that added reference to the text of the subsection—but then I only have a non-PhD-track Master in Computer Science degree. I see that his contributions on 23 February to other WP articles also consist of references to academic papers of which someone with the name Riad Mokadem is a co-author.

Is he just spamming non-relevant references to those papers into WP articles to promote his career? If he doesn't explain the relevance of his added reference in this section of the Talk page, I'm going to revert his edits to the article.

His "Disk Backup Through Algebraic Signatures in Scalable Distributed Data Structures" paper says in the Abstract that it is about backing up the RAM on each storage node onto the local disk. However the Backup article's lead is fairly clear—and I've now made its first sentence clearer—that it is about backing up data already on disk into an archive file. That is why I don't think his article is relevant. DovidBenAvraham (talk) 07:15, 25 February 2019 (UTC)

I moved the Rmokadem ref in this article to the Clustered_file_system article, which is where it seems to be applicable. DovidBenAvraham (talk) 05:20, 3 March 2019 (UTC)

"Archive file" as a term used in this article

The old lack-of-common-terminology problem in this article, especially in the "Enterprise client-server backup" section, has surfaced again. I added, as the second paragraph in the lead of the Archive file article, the following (I have changed the formatting of the refs): "The word "archive" in this term of art goes back in computing at least as far as Multics. However that use of the word conflicts with its non-computer definition: "A place or collection containing records, documents, or other materials of historical interest: old land deeds in the municipal archives". What is stored in an archive file is not necessarily old or of historical interest. Consistent with the non-computer definition, several enterprise client-server backup applications (which do not use the term "archive file") use the term "archiving" to describe a backup operation that deletes data from a client source once the data's backup is complete.[3][4][5]".

JohnInDC didn't like that paragraph; he deleted it with the Edit Summary "Rv GF edit - article is about what an archive file is, not what it isn't; this doesn't belong in the lead in any case". I had put the paragraph in that article because its first two sentences would IMHO be just as intrusive as the second paragraph in the lead of this article. AFAICT this article is the only one in Wikipedia to use "archive file" to mean the destination of a backup—rather than as a file formatted in special ways that go beyond the features in the OS's normal filesystem. The creators of Multics needed a jargon term to distinguish that special formatting, so they concocted "archive file"—disregarding a couple of thousand years of use of the word "archive" in its non-computer sense.

The jargon word "archive" persists in computer systems mainly abbreviated as 'ar' in filename extensions such as .tar and .jar. However AFAICT no backup applications use "archive file" as the term for the destination of their backups. The old version of Code42's CrashPlan application used to use the term when CrashPlan could back up to a folder on a computer or an external drive, but the term was apparently dropped when CrashPlan was limited to backing up to cloud destinations. The term is also used in MS Outlook, and in a securities application named TradeStation, but neither of these is a backup application.

Moreover the destinations for backups by Apple's Time Machine, a backup application that is very widely used and is mentioned in this article, are not specially-formatted files but normal macOS filesystem files. DovidBenAvraham (talk) 06:20, 27 December 2018 (UTC)

There's no confusion. An "archive" is a place for old documents, records, etc. of historical interest. An "archive file" - a different term - is where backups or other data are stored, whether or not they're important, or historical, or what have you. Twain said, “The difference between the almost right word and the right word is really a large matter. ’Tis the difference between the lightning bug and the lightning.” Which is true; but it's funny because in everyday use, people know the difference (and don't need it pointed out to them). JohnInDC (talk) 15:48, 27 December 2018 (UTC)
What you mean is that in your opinion there should be no confusion. However I think you are failing to consider that at least the first 7 pages of this article are supposed to be written for readers who are not already familiar with computer jargon, and the third paragraph of my comment that starts this section adequately demonstrates that "archive file" is sparsely-used computer jargon.
If that concept is difficult to understand, think about a reader of your Twain quotation who has grown up in a big city and has never seen a "lightning bug". I grew up in the suburbs, but I've lived in Manhattan for 50 years—which is less time than since my brief encounter with the predecessor of Multics—and I don't think I've ever seen a "lightning bug" there. If you look at references 24 and 25 for the article linked to in the first sentence of this paragraph, you'll see (my Portuguese is sketchy, but my knowledge of French enables me to get the general sense of reference 24) "Our findings suggest that light pollution is likely to adversely impact firefly populations ...." per reference 25. It may not be the result of light pollution, but AFAICT "archive"—in its non-computer sense—is likely to be much more familiar to the average reader of the article than "archive file".
That's why I intend to turn the first through third sentences of my rejected Archive file paragraph into a note following the term "archive file" in the first sentence of this article. I'll leave out the fourth sentence, since that's already in a note in the lead of the "Enterprise client-server backup" section of the article. DovidBenAvraham (talk) 21:45, 27 December 2018 (UTC)
This isn't Wiktionary. We don't have to define ordinary words for ordinary users. Please don't clutter this, or any other, article with this superfluous clarification. We don't need it any more than we need a note saying that the "lightning" in "lightning bug" isn't actually electrical or dangerous. Even though many of our readers are from cities. JohnInDC (talk) 22:07, 27 December 2018 (UTC)
Ah, I see you went ahead and added the note anyhow. I disagree that it's needed at all; but if it's going to be included it only needs to make the simple point that "archive files" aren't an "archive". We don't need to know about Multics or any of the rest of it. JohnInDC (talk) 23:22, 27 December 2018 (UTC)
Your simplification is fine; it gets the necessary distinction across. Thanks, and Happy New Year. DovidBenAvraham (talk) 01:11, 28 December 2018 (UTC)

I've been wondering for weeks what idiot 😉 introduced the term "archive file" in the first sentence of this article's lead. I just searched View History, and the idiot was me on 03:24, 13 May 2018 (UTC). Since we seem to be stuck with "archive file" as the best non-proprietary term for what a backup is copied into (I'd prefer "media set", but only Retrospect and MS SQL Server use that term), I've changed other terminology to that wherever appropriate, mostly in the "Enterprise client-server backup" section. DovidBenAvraham (talk) 04:29, 2 January 2019 (UTC)

‎Changed "data repository" to "archive file" in the "Manipulation of data and dataset optimization" section only; in other sections "data repository" means something larger than a single archive file. BTW, that "something larger" in this article doesn't correspond to what's defined in the "Data repository" article; however I'll let someone else deal with that problem—which is another example of the unreferenced received wisdom as of 2007 which pervaded the first 7 pages of this article. DovidBenAvraham (talk) 05:48, 15 January 2019 (UTC)

What I said about Time Machine in the last paragraph of the first comment in this section turns out not to be strictly true. Further research shows it depends on local destination disks allowing hard links to directories, which is a not-standard-Unix capability of Apple's old HFS+ filesystem but not of its new APFS filesystem. So I guess you could say that Time Machine, too, uses specially-formatted "archive files" as local disk destinations. DovidBenAvraham (talk) 16:33, 23 April 2019 (UTC)
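For readers unfamiliar with hard links, the following Python sketch shows the general idea of a hard-link-based snapshot for ordinary files: an unchanged file gets a second directory entry pointing at the same data instead of a second copy. It is an assumption-laden illustration, not a description of Time Machine's internals. The make_snapshot helper and the directory names are hypothetical, the sketch handles only a flat directory of files, and it does not attempt hard links to directories (the capability discussed in the comment above).

```python
import os
import tempfile

def make_snapshot(prev_dir, new_dir, changed):
    """Build new_dir from prev_dir; `changed` maps file name -> new bytes."""
    os.makedirs(new_dir)
    for name in os.listdir(prev_dir):
        if name not in changed:
            # Unchanged file: add another directory entry for the same data.
            os.link(os.path.join(prev_dir, name), os.path.join(new_dir, name))
    for name, content in changed.items():
        # Changed file: write a fresh copy, leaving the older snapshot intact.
        with open(os.path.join(new_dir, name), "wb") as f:
            f.write(content)

root = tempfile.mkdtemp()
snap1 = os.path.join(root, "snap1")
snap2 = os.path.join(root, "snap2")
os.makedirs(snap1)
for name, content in {"a.txt": b"one", "b.txt": b"two"}.items():
    with open(os.path.join(snap1, name), "wb") as f:
        f.write(content)

make_snapshot(snap1, snap2, {"b.txt": b"two, edited"})

# a.txt is shared between snapshots; b.txt exists as two separate files.
same_a = os.path.samefile(os.path.join(snap1, "a.txt"), os.path.join(snap2, "a.txt"))
same_b = os.path.samefile(os.path.join(snap1, "b.txt"), os.path.join(snap2, "b.txt"))
print(same_a, same_b)
```

Each snapshot directory then looks like a complete copy, while unchanged files occupy disk space only once. This requires a destination filesystem that supports hard links.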

In the article, I globally changed the term "data repository" to "information repository". When Austinmurphy inserted "data repository" in this article around 2006, he couldn't have known that it would later be made a synonym for data library—apparently around 4 July 2017 by JakobVoss. What Austinmurphy did in 2007 is a textbook example of Making Stuff Up as it used to be done on WP. The two refs he gives under Backup#Managing the information repository do not, as far as Google Books will let me see them, use the term "data repository". DovidBenAvraham (talk) 16:09, 29 April 2019 (UTC)

"Information repository", too, turns out to be an example of Making Stuff Up in 2007, for the apparent purpose of selling a (proposed at that time?) product. Both references in that article are dead, but I looked them up on the Wayback Machine. The first ref is just a session listing on a conference agenda, but the second one actually gives a session agenda detail. The session instructor is "Mark Armstrong, President, SoleraTec". The editor who created that article is User:SoleraTec, who has since had an article deleted "because the article appears to be a clear copyright infringement." And would you believe that SoleraTec LLC sells surveillance-oriented products, which are "based on the Phoenix Information Repository [my bolding], an active, tiered secondary-storage environment comprising a mixed set of storage resources"?
Maybe it would add to the value of the link from this article if I add a lead paragraph to the information repository article, quoting and referencing this 2005 definition of "repository" by Margaret Rouse. Do you think SoleraTec LLC and/or Mark Armstrong (who is its founder and CEO) will object if I do that? DovidBenAvraham (talk) 17:31, 30 April 2019 (UTC)
I added that lead paragraph to the information repository article, and so far have had no objection. Let me clarify that nobody seems to have a better term than "information repository" for the unstructured/full-imaging/incremental/differential/reverse-delta/CDP organizing strategy described in the information repository models subsection of this article. The Kissell ref talks about "varieties" of "archives", but that would be confusing considering the way this article uses the term "archive file". DovidBenAvraham (talk) 03:48, 5 May 2019 (UTC)
I re-arranged the Information repository models sub-section lead paragraph for clearer distinction between repository organization and backup rotation scheme, and inserted or substituted "organization" in each of the type paragraphs. There doesn't seem to be any other group noun for the unstructured/full-imaging/incremental/differential/reverse-delta/CDP organizing strategy, and "strategy" didn't seem to quite fit. DovidBenAvraham (talk) 10:50, 9 May 2019 (UTC)
Now that I've inserted or substituted "repository organization", it would be very easy to change "organization" to "model", which may be what Austinmurphy wanted back in 2004. "Model" sounds even fuzzier to me; what do you editors think? DovidBenAvraham (talk) 00:27, 11 May 2019 (UTC)
I substituted "backup method" for "organization" in most occurrences in Information repository models, thus shifting to more-standard terminology and avoiding confusion between sense 1 of the noun "organization" as used in this section and sense 2 as used elsewhere in the article. Feel free to revise this edit, as I'm still struggling with lack of standard terminology. DovidBenAvraham (talk) 01:21, 18 May 2019 (UTC)

Should a new-to-subject editor be allowed to damage two-audience-level structure of article because of his urge to "simplify"?

Should a new-to-the-subject editor be allowed to damage the two-audience-level structure of the "Backup" article simply because of his irresistible urge to "simplify"?

The title of this section is Request for comment on whether new-to-subject editor is allowed to damage two-audience-level structure of article because of his urge to "simplify". DovidBenAvraham (talk) 05:51, 13 June 2019 (UTC)

Mainly between 2007 and 2011, the first 7 screen-pages of the article evolved as a comprehensive summary of what every computer-using person should know about backing up his/her data. In November 2017 I moved the description of certain backup features from another article to a new 2-screen-page "Enterprise client-server backup" section at the end of the article. The lead of that section clearly says it is about "a class of software applications that back up data from a variety of client computers centrally to one or more server computers, with the particular needs of enterprises in mind." The section goes on to describe special features typically incorporated in that class of applications, with an explanation of the enterprise need for each feature.

On 21 May 2019 an editor new to the article, whose contributions list since August 2016 shows no edits to articles dealing with IT less than 25 years old, did a "cut-and-paste move" of the "Continuous data protection" article into the "Backup" article. The Talk page for that article shows there was no previous discussion of the "merge", and the only non-bot comment on that Talk page said in 2011 "It would be good to have a section discussing real-world implementations of CDP: which companies provide such a service, which tools they use to provide it, etc.". I pointed out here in this Talk page that most backup applications that say they do CDP really do near-CDP via incremental backups every few minutes, but the new-to-the-subject editor reverted the edits I had done after his "merge" that pointed out this "inconvenient fact".

On 22 May 2019 the same editor new to the article did another "cut-and-paste move" of the "Information Repository" article into the "Backup" article. Again the Talk page for that article shows there was no previous discussion of the "merge", and this time the new-to-the-subject editor immediately deleted the entire "merged" article contents—except for the new lead I had added on 1 May. I have copied the body of the "Information Repository" article here in this Talk page; you can see that the new editor deleted the article body because it had nothing to do with backup. Couldn't the new editor have been satisfied with what I did to the "Backup" article on 1 May, which was to put in a link to the "Information Repository" article in order to use the backup-related lead I had just added to that article?

The fact that he wasn't satisfied is why I have written the two preceding illustrative paragraphs. Together they show that the new editor has an IMHO irresistible urge to simplify several Wikipedia articles into a single one. He carried this urge further on 22 May 2019 when he "merged" two paragraphs from the "performance" sub-section of the "Enterprise client-server backup" section forward into the personal sections of the article. I have copied the original versions of those paragraphs here in this Talk page; you can see that the feature descriptions in both paragraphs explain their importance to enterprise backup administrators. The new editor promptly simplified those moved descriptions so that they would fit into the worldview of a reader needing to know about personal backup applications, and deleted my attempts to add clarified versions of the original feature descriptions back into the "Enterprise client-server backup" section. You can see that the personal backup versions of the descriptions of the two features, here and here, have been pruned of so much information as to be essentially useless.

Since the new-to-the-subject editor has previously contributed to WP articles about IT hardware and software used 25 years ago by enterprises, he surely has some basic understanding that the backup needs of an enterprise are more extensive than those of a personal computer user—for legal and business continuity reasons. We could split off the "Enterprise client-server backup" section into a separate article. But it's highly probable that, based on what the same editor did to the cluster of "Outsourcing"-related articles as described in another section of this Talk page, the new-to-the-subject editor would in a week or two succumb to his urge to "merge" the split-off "Enterprise client-server backup" article into the "Backup" article—repeating the same "simplifications". DovidBenAvraham (talk) 05:51, 13 June 2019 (UTC)

  • Worst RfC Ever - This RfC is sooooooooo bad. Seriously dude.... take a chill pill. You gotta relax. You'd find RfC's are much more effective if you simply and neutrally state the question. You could have done this entire RfC by simply asking "Should 'Continuous data protection' be split into its own article?". Instead, you spent several paragraphs raging out at someone that no one cares about. Just chill. NickCT (talk) 19:51, 13 June 2019 (UTC)
I totally agree, but my previous RFC attempt was even worse—because I didn't suppress my rage. My ideal question would be "Should 'Enterprise client-server backup' be split into its own article, and—if I do that—can you editors lay down a set of comments that will persuade 'new-to-the-subject editor' not to merge it back in to 'Backup' and dumb it down again?" The problem is that, IMHO for a combination of psychological and cultural reasons, "new-to-the-subject editor" simply won't listen to anyone's comments. I initiated a 3O, and he simply refused to respond to the Third Opinion editor. I'm hoping an RfC will have more influence on him, but I'm reluctantly prepared to go to Administrator's Noticeboard or Arbitration. I'd prefer not to get "new-to-the-subject editor" banned, because I think contributing to WP is an important part of his life, but I think that some of his conduct in connection with this and other articles would support doing so. As you can see it's a very tricky situation—which WP no longer permits dealing with directly via an RfC, and I'd appreciate any advice you editors can offer. DovidBenAvraham (talk) 21:08, 13 June 2019 (UTC)
  • Retry RfC? I'd be glad to give my opinion like many other Wikipedians if this RfC were to be done right. It's too inconvenient for anyone who wants to help. However it sounds like this editor has a WP:ITSCRUFT problem perhaps. Information relevant to backup should stay on backup and it can refer to other articles with useful information if necessary. --NikkeKatski [Elite] (talk) 15:38, 15 June 2019 (UTC)
@DovidBenAvraham: Yeah as pointed out by redrose you should recreate the RfC and make it more neutral and aimed at the actual article rather than the person. If we gain consensus for your obviously superior version of the article then any attempts to revert it can probably be considered edit warring (if it wasn't considered that already) and would be more easily punishable. --NikkeKatski [Elite] (talk) 15:57, 15 June 2019 (UTC)

Removed the template from this sub-section; my third try is below in a new sub-section. NikkeKatski [Elite], based on what the other editor did on 22 May 2019, as described in this same subsection in the paragraph beginning "The fact that he wasn't satisfied ...", the other editor again wouldn't let a version of the article improved by me exist long enough for any of you editors to see it—that's just the way he has demonstrated he operates. DovidBenAvraham (talk) 00:04, 16 June 2019 (UTC)

Rewrite of Continuous data protection sub-section by User:Pi314m

Yesterday User:Pi314m merged the former Continuous data protection article into this one. He thereby wiped out a separate article without any prior discussion on that article's Talk page. I believe that's a violation of WP rules, and I intend to be up his tuchus about that.

But what really bothers me is that, after I spent about 5 hours editing his inserted sub-section into early this morning, Pi314m reverted all my editing. My editing was necessary because the Continuous data protection article left out an inconvenient fact about many recent "CDP" backup applications, was poorly worded in places, and had references from 2007 - 2012 that were basically marketing blurbs for software that no longer exists—in one case written by a marketer whose software company went out of business after an uncontested fine for bribery.

The inconvenient fact is that many recent backup applications that call themselves "CDP" are really "near-CDP", meaning that they are actually doing incremental backup at short intervals to track changes—well-known examples being Apple Time Machine and CrashPlan. My editing included a paragraph that revealed the inconvenient fact, but Pi314m reverted that paragraph out. Instead he substituted "An alternative is snapshots, a bear-continuous [sic] solution, whereby restore points are periodically created to track changes", which links to a non-existent sub-section of the article using a non-standard definition of "snapshots".

As to poorly worded in places, let's first consider "Ideal continuous data protection is that the recovery point objective is unlimited in content". My edit changed that to "In true CDP the recovery point objective is zero", which is consistent with the definition in the WP article I linked to. Let's next consider "CDP differs from RAID, replication, or mirroring by enabling rollback to any point in time. A related technique is journaling." My editing changed that to "CDP is often done by saving byte or block-level differences rather than file-level differences, making it dependent on journaling."

The references that were basically marketing blurbs included those by Bezad Behtash, Posey, and the infosectoday article by the uncredited Pat Hanavan. The eWeek article's author, Bobby Crouch, was in a class by himself; in 2010—when he wrote the article—he was the Product Marketing Manager at FalconStor Software, which in 2012 agreed to pay $5.8 million in fines for bribery. The company was further charged with falsifying its corporate books and records associated with the bribery. Reliable sources, indeed! DovidBenAvraham (talk) 19:57, 22 May 2019 (UTC)

As to Continuous Data Protection, my 17:42, 6 June 2019 (UTC) comment below is a later and clearer explanation of how, and IMHO why, Pi314m messed up my revisions after his initial merge-in. DovidBenAvraham (talk) 02:08, 7 June 2019 (UTC)
Here's the procedure that Pi314m should have followed. I think the merger itself was uncontroversial, so it needn't have been discussed on the merged-in article's Talk page. It's Pi314m's wholesale reversion of my edits afterwards that should have been discussed on this Talk page. In the first paragraph of my section-starting comment I've summarized 3 types of fault in the merged-in article—ones that my edits attempted to correct. I've now discovered a 4th problem with Pi314m's post-merger edits; he moved the pre-existing "Create synthetic full backups" paragraph from the "Performance" sub-section to a new named paragraph in the "Incremental" sub-sub-section. Pi314m evidently didn't understand "from one archive file to another" in the first sentence of the paragraph he moved, which explains why the paragraph—with a clarification of its second sentence that keeps the refs—belongs underneath the "Enterprise client-server backup" section. The single-sentence paragraph just above the paragraph, in its current position, describes an operational technique used by such non-enterprise backup applications as Apple's Time Machine for condensing a single archive file. The moved paragraph OTOH describes an enterprise backup administrator facility for creating a copy of an archive file, such as a longer-term tape copy of a disk archive file. This second copy is typically created to satisfy legal retention requirements, and may therefore intentionally omit some backups—either because there is no need to retain them or because retaining them would violate regulations such as the European GDPR Right_to_erasure. I intend to move a clarified version of the moved paragraph back to the "Performance" sub-section, while enhancing the single-sentence paragraph just above it in the "Incremental" section to explain that it refers to an operational technique for condensing a single archive file. DovidBenAvraham (talk) 00:21, 26 May 2019 (UTC)
I did what I said I intended to do in the last sentence of the preceding comment, but Pi314m promptly messed that up by moving my clarified version of the paragraph from "Performance" under "Enterprise client-server backup" up-article to "Synthetic full backup" under "Storage, the basis of a backup system". This shows conclusively that Pi314m hasn't read enough of the article to understand one basic thing: the first seven screen pages were written starting around 2007 for a person who needs to know enough to set up backup for his/her individual computer, but the last 2.5 pages were written—primarily by me—for a person who needs to set up backup for his/her enterprise. So Pi314m shouldn't have moved the clarified paragraph from the back to the front of the article, but he tried to make up for that with a cutesy-poo trick: he wrapped the enterprise-applicable part of the moved paragraph in "ref" tags—which didn't identify it as a Note because he omitted "Group=note" from the lead tag. His having not read the article is further demonstrated by his beginning that moved paragraph with "Tapes of disk archives ..."; if he had read the first sentence of the article he would have seen that it establishes "archive file" as the term—used consistently throughout—for the output of a backup, and "Tapes of disk ..." sounds like Pi314m is still mentally stuck in the days of IBM System/370. And, BTW, Pi314m totally wiped out the "Automated data grooming" paragraph because he couldn't logically move it up front under "Backup types". DovidBenAvraham (talk) 05:05, 27 May 2019 (UTC)
I've put a copy of the original "Create synthetic full backups" paragraph in front of the copy of the original "Automated data grooming" paragraph that was already in this Talk sub-section below. I did it that way in order to have only one place for Notes and References. DovidBenAvraham (talk) 17:19, 28 May 2019 (UTC)

I just discovered another thing that Pi314m did that's definitely a violation of Wikipedia rules. He "merged" the first paragraph of "Information Repository" into this article (before "Backup types", from which he later deleted the "Unstructured" paragraph), and then deleted that entire article. As you can see from this previous version, and also from the WikiVisually copy (made before I demoted the previous contents to a Federated Information Repository section with modernized refs and added a new lead—which is the only part that Pi314m kept), Pi314m wasn't entitled to delete the article under rule 4 of the Wikipedia:Deletion policy. That's because IMHO the article does have "relevant or encyclopedic content", even though it describes a system that is a superset of what SoleraTec had developed by around 2008. Pi314m's tuchus is likely to be very populated, especially after I inform SoleraTec LLC of what Pi314m has done. DovidBenAvraham (talk) 06:12, 26 May 2019 (UTC)

I've put a copy of the edited-out "Federated Information Repository" section of the "Information Repository" article in back of the copy of the original "Automated data grooming" paragraph that was already in this Talk sub-section below. I did it that way in order to have only one place for Notes and References. DovidBenAvraham (talk) 20:43, 28 May 2019 (UTC)

I'm about fed up with pi314m's "my way or the highway, even if I don't understand what I'm editing and violate WP rules" approach to this article. If I don't get a response from him by 3 p.m. EDT this afternoon on this Talk page, I'm going to file for a 3O. DovidBenAvraham (talk) 05:30, 27 May 2019 (UTC)

This is not a response to the anatomy-attacking and other threats, but just to highlight that "Automated data grooming" (which perhaps I should have worked on earlier) is now ahead of "Consolidation." Explanation? The flow/sequence is now Deletion ("Automated ..."), then consolidation, followed by compression, etc. Pi314m (talk) 07:11, 27 May 2019 (UTC)
Sorry, but Pi314m's messed-up response (including refs that aren't to the articles intended) is not nearly good enough to make me put off the 3 p.m. deadline. He seems to have a conceptual problem with the basic sequence of the article, which has been—for over 1.5 years—features needed for individual backup (first 7 screen pages) followed by features needed for enterprise backup (last 2.5 screen pages). For a reason I can't understand, Pi314m has decided "Automated data grooming" is a feature needed for individual backup—which AFAIK it isn't and is thus absent in its formerly-described form from applications intended for that purpose.
DovidBenAvraham (talk) 12:54, 27 May 2019 (UTC)

In creating the promised Third Opinion Active Disagreement statement tonight, I discovered that what I had described in this section's beginning paragraph as a "cutesy-poo thing" is not that at all. It is instead simply a repeat of what's on a WP user's Talk page. I have therefore revised that beginning paragraph to eliminate some snark, while leaving in the phrase "up his tuchus"—referring to my questioning what a user is allowed to do via the moving-an-article facility (we'll have to let the 3O sort that question out, particularly in regard to Pi314m's subsequent move of "Information Repository" which wiped out all but the lead two sentences of that article). I belatedly apologise to Pi314m for my unjustified snark. DovidBenAvraham (talk) 04:05, 28 May 2019 (UTC)

I accept the words "I belatedly apologise to pi314m" as is, and don't see need for "belatedly" Pi314m (talk) 07:59, 28 May 2019 (UTC)

Early this morning I revised the 3O description of the dispute. I've replaced Pi314m's and my "handles" with "Editor #1" and "Editor #2", and made it—I hope—a bit more dignified and less whiny. DovidBenAvraham (talk) 15:14, 29 May 2019 (UTC)

In regard to "automated data grooming", Pi314m's crypto-Note "usually implemented as a customizable feature" is an incorrect and totally inadequate substitute for the descriptive paragraph that was in the "Performance" sub-section—which you will find I've copied into this Talk sub-section below. Personal backup applications usually don't have this as a customizable feature (CrashPlan was an exception, but that turned out to be a designed enterprise "push" application that for a few years was also marketed as a personal backup application). OTOH enterprise backup applications have to have this as a very customizable feature, because each enterprise has its own "regulatory requirements"—as stated in the descriptive paragraph Pi314m wiped out in creating the inadequate sentence in the front part of the article. If you want to know what "very customizable" means, read the Kaczorek and Jain and Dorion references for that paragraph. DovidBenAvraham (talk) 02:57, 3 June 2019 (UTC)

We haven't had a reply from Pi314m yet, and my thoughts turned to Windows File History—which someone recommended yesterday on an Ars Technica thread. Let's look at the key reference for that WP article sub-section. It says "... its closest analog is Mac OS X’s Time Machine .... The basic function of File History is to periodically [my emphasis] back up your Libraries (your documents, music, pictures, videos) to another hard drive. These backed up files are saved as versions, which you can easily browse through and restore with a couple of clicks". That reference goes on to say "By default, File History backs up a version of your files every hour. If you head into Advanced Settings, you can change this to a value between “Every 10 minutes” and Daily; personally, I opted for every 10 minutes (and even then, it would be nice to have an option for every 60 seconds — maybe it’s possible via a registry hack)." BTW Apple's Time Machine backs up once an hour, and only an independently-written add-on can change that.

So that covers the two most-readily-available examples of "continuous data protection" backup software; we see that neither of them "allows restoring data to any point in time". In my 06:51, 22 May 2019 (UTC) edit to Pi314m's original merge-in, I wrote (refs omitted) "However, because of the performance penalty imposed by necessary tight integration with the filesystem, a frequently-encountered alternative is near-CDP (often wrongly referred to as "CDP"), wherein restore points are created at short intervals to track changes. Nevertheless, given the proper precautions for live data, changes captured by near-CDP can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, databases and logs."

My edit to the first part of the "Backup" article is certainly sufficient for the user of a personal backup application, and it includes a caution needed by the user of an enterprise backup application. But Pi314m reverted that edit, leaving "Continuous data protection ... refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. It allows restoring data to any point in time"—referenced by what I have characterized up-section as "marketing blurbs". This is a blatant example of what I said in the Talk section below this, which is "The overall picture that emerges is of Pi314m deciding without any discussion to consolidate a whole series of related articles into a single article that conforms to his concept of the subject matter." DovidBenAvraham (talk) 17:42, 6 June 2019 (UTC)

[Copied, with changes from "you" to "Pi314m", from a portion of a 05:39, 2 June 2019 (UTC) comment I made here on Pi314m's personal Talk page] Pi314m also seems to have a strong urge to merge and simplify descriptions, but accompanied by a willingness to sacrifice the precision of those descriptions. An example is what Pi314m did for the "Continuous data protection" subsection of the article. The reason I called the references there "marketing" is that they all basically say "it's nice to have backups at more frequent intervals than is normally done with scheduled scripts", but they don't talk about any performance hit. But if you look at the 2017 Carbonite reference Pi314m left in (Mozy has been merged into Carbonite), it says "we noticed no performance hit at all while using Carbonite to back up about 0.5GB worth of frequently-changing files ... That's probably because it's not actually done in real time, just on a tight schedule (okay, so maybe there is scheduling): 10 minutes if a file is saved once, 24 hours if it's save[d] more than once." By contrast my 2010 ComputerWeekly reference Pi314m deleted says "Because true CDP copies all delta changes, a system can be restored to any point in time required. This can be especially useful if you need to roll back to a point before a corruption event took place, for example. [new paragraph] Because they depend on fixed-interval copies, near-CDP/snapshots only allow you to roll back to a given point in time. For this reason, true CDP offers a recovery point objective (RPO) of zero [my emphasis], while the equivalent for near-CDP/snapshots is the last time a copy took place."
Pi314m's link for "snapshots" at the end of the sub-section doesn't go anywhere, which is just as well because a correct WP link to "snapshots" goes to an article on "the state of a system at a particular point in time"—a capability used for near-CDP backups instead of a kind of backup (the ComputerWeekly author also got the terminology wrong in 2010). Isn't this, as I suspect, too technical for Pi314m—so he considers it too technical for any article reader? DovidBenAvraham (talk) 15:26, 17 June 2019 (UTC)

Descriptions in the two paragraphs as they originally were in the "Performance" sub-section, plus the parts of the "Information Repository" and "Continuous data protection" articles that were edited out

"Performance" subsection of "Backup" article

Create synthetic full backups
For example, onto tapes from existing disk archive files—by copying multiple backups of the same source(s) from one archive file to another. This is termed a "synthetic full backup" because, after the transfer, the destination archive file contains the same data it would after a full backup.[1][2][3] One application can exclude[note 1] files and folders from the synthetic full backup.[4]
Automated data grooming
Frees up space on disk archive files by removing out-of-date backup data—usually based on an administrator-defined retention period.[5][6][1][7][8][9][note 2] One method of removing data is to keep the last backup of each day/week/month for the last respective week/month/specified-number-of-months, permitting compliance with regulatory requirements.[10] One application has a "performance-optimized grooming" mode that only removes outdated information from an archive file that it can quickly delete.[11] This is the only mode of grooming allowed for cloud archive files, and is also up to 5 times as fast when used on locally stored disk archive files. The "storage-optimized grooming" mode reclaims more space because it rewrites the archive file, and in this application also permits exclusion compliance with the GDPR "right of erasure" [12] via rules[note 1]—that can instead be used for other filtering.[13]
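The day/week/month retention rule described above can be sketched in a few lines. This is only an illustration, not how any particular backup application implements grooming: the function name `groom`, the 7-day and 30-day windows, and the `months_to_keep` parameter are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def groom(backups, now, months_to_keep=12):
    """Return the backup timestamps to retain (hypothetical policy).

    Assumed policy, for illustration only:
      - the last backup of each day for the past 7 days
      - the last backup of each ISO week for the past 30 days
      - the last backup of each calendar month for roughly `months_to_keep` months
    Everything older than that is groomed out.
    """
    keep = {}
    for ts in sorted(backups):
        age = now - ts
        if age < timedelta(0):
            continue  # ignore timestamps that lie in the future
        if age <= timedelta(days=7):
            key = ("day", ts.date())
        elif age <= timedelta(days=30):
            iso = ts.isocalendar()
            key = ("week", iso[0], iso[1])  # ISO year and ISO week number
        elif age <= timedelta(days=30 * months_to_keep):
            key = ("month", ts.year, ts.month)
        else:
            continue  # past the retention period
        keep[key] = ts  # a later backup in the same bucket replaces an earlier one
    return sorted(keep.values())
```

An enterprise application would make each of these windows administrator-configurable, which is the "very customizable" aspect the discussion above refers to.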

"Federated information repository" section of "Information Repository" article

A federated information repository is an easy way to deploy a secondary tier of data storage that can comprise multiple, networked data storage technologies running on diverse operating systems, where data that no longer needs to be in primary storage is protected, classified according to captured metadata, processed, de-duplicated, and then purged, automatically, based on data service level objectives and requirements. In federated information repositories, data storage resources are virtualized as composite storage sets and operate as a federated environment.[14]

Federated information repositories were developed to mitigate problems arising from data proliferation and eliminate the need for separately deployed data storage solutions because of the concurrent deployment of diverse storage technologies running diverse operating systems. They feature centralized management for all deployed data storage resources. They are self-contained, support heterogeneous storage resources, support resource management to add, maintain, recycle, and terminate media, track off-line media, and operate autonomously.[15]

Automated data management

Since one of the main reasons for the implementation of a federated information repository is to reduce the maintenance workload placed on IT staff by traditional data storage systems, federated information repositories are automated. Automation is accomplished via policies that can process data based on time, events, data age, and data content. Policies manage the following:

  • File system space management
  • Irrelevant data elimination (mp3, games, etc.)
  • Secondary storage resource management

Data is processed according to media type, storage pool, and storage technology.

Because federated information repositories are intended to reduce IT staff workload, they are designed to be easy to deploy and offer configuration flexibility, virtually limitless extensibility, redundancy, and reliable failover.

Data recovery

Federated information repositories feature robust, client-based data search and recovery capabilities that, based on permissions, enable end users to search the information repository, view information repository contents, including data on off-line media, and recover individual files or multiple files to either their original network computer or another network computer.[15]

Edited-out portions of "Continuous data protection" article

CDP runs as a service that captures changes to data to a separate storage location. There are multiple methods for capturing the continuous changes involving different technologies that serve different needs. CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mail boxes, messages, and database files and logs.[16]

Differences from traditional backup

Continuous data protection is different from traditional backup in that it is not necessary to specify the point in time to recover from until ready to restore. Traditional backups only restore data from the time the backup was made. Continuous data protection has no backup schedules. When data is written to disk, it is also asynchronously written to a second location, usually another computer over the network. This introduces some overhead to disk-write operations but eliminates the need for scheduled backups.

Continuous vs near continuous

Some solutions marketed as continuous data protection may only allow restores at fixed intervals such as one hour or 24 hours. Such schemes are not universally recognized as true continuous data protection, as they do not provide the ability to restore to any point in time. These solutions are often based on periodic snapshots, an example of which is CDP Server, disk-based backup software that periodically creates restore points using a snapshot and volume filter device driver to track disk changes.

There is debate in the industry as to whether the granularity of backup must be "every write" to be CDP, or whether a solution that captures the data every few seconds is good enough. The latter is sometimes called near continuous backup. The debate hinges on the use of the term continuous: whether only the backup process must be continuous, which is sufficient to achieve the benefits cited above, or whether the ability to restore from the backup also must be continuous. The Storage Networking Industry Association (SNIA) uses the "every write" definition.
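The practical difference between "every write" capture and interval-based capture can be shown with a small sketch. It is an illustration only: the function names and the sample timestamps (seconds since an arbitrary start) are made up, and both input lists are assumed to be sorted in ascending order.

```python
import bisect


def restore_point(capture_times, failure_time):
    """Latest captured point at or before failure_time, or None if there is none."""
    i = bisect.bisect_right(capture_times, failure_time)
    return capture_times[i - 1] if i else None


def lost_writes(write_times, capture_times, failure_time):
    """Writes made before the failure that are newer than the best restore point."""
    point = restore_point(capture_times, failure_time)
    return [
        w for w in write_times
        if w <= failure_time and (point is None or w > point)
    ]


# Hypothetical timeline, in seconds
writes = [5, 130, 610, 1790]
snapshots = [0, 600, 1200, 1800]   # near-CDP: one restore point every 10 minutes
failure = 1795

near_cdp_lost = lost_writes(writes, snapshots, failure)
true_cdp_lost = lost_writes(writes, writes, failure)  # every write is a restore point
```

With every write captured, nothing written before the failure is lost, which is what a recovery point objective of zero means. With interval-based capture, any write after the last completed restore point is lost.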

Differences from RAID, replication or mirroring

Continuous data protection differs from RAID, replication, or mirroring in that these technologies only protect one copy of the data (the most recent). If data becomes corrupted in a way that is not immediately detected, these technologies simply protect the corrupted data with no way to restore an uncorrupted version.

Continuous data protection protects against some effects of data corruption by allowing restoration of a previous, uncorrupted version of the data. Transactions that took place between the corrupting event and the restoration are lost, however. They could be recovered through other means, such as journaling.

Backup disk size

In some situations, continuous data protection requires less space on backup media (usually disk) than traditional backup. Most continuous data protection solutions save byte or block-level differences rather than file-level differences. This means that if one byte of a 100 GB file is modified, only the changed byte or block is backed up. Traditional incremental and differential backups make copies of entire files.
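As a rough illustration of why block-level differences need less backup space than file-level copies, the sketch below compares two versions of a file block by block. The function name `changed_blocks` and the 4096-byte block size are assumptions; a real CDP product would typically intercept writes as they happen rather than compare whole files after the fact.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096):
    """Return {block_index: new_block_bytes} for every block that differs."""
    changed = {}
    n_blocks = (max(len(old), len(new)) + block_size - 1) // block_size
    for i in range(n_blocks):
        start = i * block_size
        old_block = old[start:start + block_size]
        new_block = new[start:start + block_size]
        if old_block != new_block:
            changed[i] = new_block  # only this block would need to be backed up
    return changed


# Hypothetical example: a 16 KiB file in which a single byte changes
original = bytes(16384)
modified = bytearray(original)
modified[5000] = 1
delta = changed_blocks(original, bytes(modified))
```

Here only the one 4 KiB block containing the modified byte appears in `delta`, whereas a file-level incremental backup would copy all 16 KiB again.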

Risks and disadvantages

The protection afforded by continuous data protection is often heralded without consideration of the disadvantages and challenges that it can present. Specifically, the continuous bandwidth usage can adversely affect network performance, especially in operations where file sizes are large, such as multimedia and CAD design environments. To mitigate this risk, companies employ throttling techniques that prioritize network traffic to reduce the impact of backup on day-to-day operation.[17]

Notes

  1. ^ a b Exclusion and/or inclusion is done with Selectors in the Windows variant; this misleading term has been changed to Rules in the Macintosh variant.
  2. ^ Some backup applications—notably rsync and CrashPlan—term removing backup data "pruning" instead of "grooming".[1][2]

References

  1. ^ a b "New EMC Dantz Retrospect 7 Improves Data Protection for SMBs and the Distributed Enterprise". DellEMC [current]. EMC Corp. [orig. publisher]. 31 January 2005. Retrieved 23 November 2016.
  2. ^ "About synthetic backups". Veritas Support. Veritas Technologies LLC (US). 25 September 2017. Retrieved 18 November 2017.
  3. ^ "Symantec Backup Exec: About the synthetic backup feature". Helpmax.net. HelpMax Software Help & Shop Inc. Retrieved 13 January 2018.
  4. ^ "Retrospect ® 12 Windows User's Guide" (PDF). Retrospect. Retrospect Inc. 2017. pp. 30-31(deduplication via "Snapshots"—a Retrospect term which predates and is distinct from Snapshot_(computer_storage)), 31-32(Dashboard), 41-43(removable disk drives), 216-218(selector as subset filter for synthetic full backups), 230-233(Scripted Verification), 280(Multiple Executions), 369(Duplicate Execution Options), 420(Startup Preferences—Launcher for auto-launch), 426-427(E-mail), 433-434(Open File Backup Tips—VSS snapshot at natural pause), 530-544(SQL Server Agent—coordinating VSS snapshot), 545-566(Exchange Server Agent—coordinating VSS snapshot). Retrieved 2 September 2018.
  5. ^ Preimesberger, Chris (31 March 2017). "World Backup Day 2017: 'We Don't Know the Day Nor the Hour'". eWeek. QuinStreet. Ian Wood of Veritas. Retrieved 11 November 2017.
  6. ^ Fernando, Sal (30 April 2008). "Combine disk, tape benefits to protect data". ZDNet. Retrieved 13 November 2017.
  7. ^ Kaczorek, Mariusz (15 August 2015). "NetBackup Storage Lifecycle Policy (SLP): Overview". Settlersoman. Settlersoman. Retrieved 2 February 2018.
  8. ^ Jain, Hemant (14 April 2015). "VOX Knowledge Base: Data Protection Knowledge Base: Data Protection". VOX. Veritas Technologies LLC. Retrieved 13 January 2018. Employee [of Veritas]
  9. ^ Dorion, Pierre (January 2007). "IBM Tivoli Storage Manager vs. traditional backup". TechTarget. Tech Target Inc. Backup versions. Retrieved 30 October 2018.
  10. ^ "Retrospect ® 12.0 Mac User's Guide" (PDF). Retrospect. Retrospect Inc. 2015. pp. 8-9(Improved Grooming). Retrieved 28 December 2017.
  11. ^ Schmitz, Agen (5 March 2016). "Retrospect 13". TidBITS. TidBITS Publishing Inc. Retrieved 27 October 2016.
  12. ^ "Support: Knowledge Base". Retrospect. Retrospect Inc. 24 April 2019. #Resources (Auto Launching Guide ..., ... difference between "Backup" and "Duplicate", Avid Support ..., Instant Scan FAQ, Can't use Open File Backup ...), #Email Backup, #Top Articles (BackupBot – Deep Dive into ProactiveAI, How to Set Up Remote Backup, GDPR – Deep Dive into Data Retention Policies, Deep Dive - Components [and phases] of a Retrospect Backup, How to Set Up the Management Console, Management Console - How to Use Shared Scripts, How to Use Storage Groups, Support End-of-Life Announcement for Mac OS X 10.3, 10.4, and 10.5, Retrospect Compatibility with Apple File System (APFS)), #Hooks (Script Hooks: External Scripting with Event Handlers, Script Hooks: How to Protect MongoDB with Retrospect, Script Hooks: How to Protect MySQL with Retrospect, Script Hooks: How to Protect PostgreSQL with Retrospect). Retrieved 4 May 2019.
  13. ^ Schmitz, Agen (28 May 2018). "Retrospect 15.1.1". TidBITS. TidBITS Publishing Inc. Retrieved 20 June 2018.
  14. ^ Armstrong, Mark (9 August 2007). "Benefits of a Federated Information Repository as a Secondary Storage Tier". SNIA Enterprise Information World 2007 Conference. Storage Networking Industry Association (SNIA). Retrieved 1 May 2019.
  15. ^ a b "Area Under Surveillance". SoleraTec. SoleraTec LLC. 2019. Phoenix RSM: (Record, Store, Manage), Surveillance Video Management (information repository), Ultra-fast Search and Playback (content-based search queries). Retrieved 6 May 2019.
  16. ^ "An Overview of Continuous Data Protection". Infosectoday.com. Retrieved 12 November 2011.
  17. ^ Off-Site Backup - The Bandwidth Hog Archived 2011-07-07 at the Wayback Machine

DovidBenAvraham (talk) 20:43, 28 May 2019 (UTC)