SoylentNews is people

posted by n1 on Wednesday April 23 2014, @09:28AM   Printer-friendly
from the checksum-did-not-match dept.

It has been discovered that Microsoft OneDrive for Business has been altering files being uploaded by users, injecting additional data into the files.

This means that people who use OneDrive for Business or SharePoint need to be very careful about what they sync with it, especially those handling third-party data subject to confidentiality requirements. For example, if an employee needs to transfer confidential files that absolutely must not be touched between their laptop and PC, and decides to do so through a synced folder in OneDrive for Business, those files will end up being modified without the user's knowledge. This could have severe consequences if, say, a file is used as evidence in a court case. How do you prove that the company did not intentionally modify it?

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by omoc on Wednesday April 23 2014, @09:38AM

    by omoc (39) on Wednesday April 23 2014, @09:38AM (#34768)

    Wow, a company of this size that does not understand the importance of integrity.

    • (Score: 4, Informative) by lhsi on Wednesday April 23 2014, @10:25AM

      by lhsi (711) on Wednesday April 23 2014, @10:25AM (#34780) Journal

      The SoylentNews tagline "checksum-did-not-match" is particularly accurate here as the person who discovered this backed up then deleted a folder in order to force a "fresh" sync and most of the checksums did not match after he did this.
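For anyone who wants to repeat that backup-then-resync check, here is a minimal sketch in Python. It assumes you have a backup copy of the folder and a freshly synced copy side by side; the function names and the choice of SHA-256 are illustrative assumptions, not what the blogger actually used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(backup_dir: Path, synced_dir: Path) -> list[str]:
    """List relative paths whose checksum differs between the two trees.

    Files that are missing from synced_dir are reported as mismatches too.
    """
    mismatches = []
    for original in sorted(backup_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(backup_dir)
        synced = synced_dir / relative
        if not synced.is_file() or sha256_of(original) != sha256_of(synced):
            mismatches.append(relative.as_posix())
    return mismatches
```

Calling `mismatched_files(Path("backup"), Path("synced"))` (both folder names are placeholders) returns the files a sync round trip altered or dropped.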

    • (Score: 2, Insightful) by Anonymous Coward on Wednesday April 23 2014, @11:20AM

      by Anonymous Coward on Wednesday April 23 2014, @11:20AM (#34793)

      Wow, a company of this size that does not understand the importance of integrity.

      Micro$oft is the very polar opposite of integrity, always was. Not surprised it still is.

      http://wayback.archive.org/web/20120605103241/http://www.msversus.org/ [archive.org]

      Add to that at least "restricted boot" and the OOXML standardizing fiasco...
      But if you say any of these things you're a "Microsoft basher" and get told "haters will hate" and "it's so childish to spell it M$" etc ad nauseam.

    • (Score: 2) by Dunbal on Wednesday April 23 2014, @04:44PM

      by Dunbal (3515) on Wednesday April 23 2014, @04:44PM (#34987)

      There are entire countries today who don't give a fuck about integrity.

  • (Score: 3, Interesting) by Bartman12345 on Wednesday April 23 2014, @10:12AM

    by Bartman12345 (1317) on Wednesday April 23 2014, @10:12AM (#34773)

    The sheer audacity of this is just... breathtaking. Surely MS didn't seriously think that no-one would ever twig that this was happening. I am looking forward to hearing their response to this farce, should be quite entertaining.

    • (Score: 5, Insightful) by tftp on Wednesday April 23 2014, @10:27AM

      by tftp (806) on Wednesday April 23 2014, @10:27AM (#34781) Homepage

      The sheer audacity of this is just... breathtaking. Surely MS didn't seriously think that no-one would ever twig that this was happening. I am looking forward to hearing their response to this farce, should be quite entertaining.

      They just see all MS Office formats and XML schemas as their private property, and they believe that they can do whatever they want with them for the best "customer experience" (this means anything they want it to mean at a given time and place.)

      What this really signifies is that every MS Office document (and perhaps other files) is opened and parsed by some agent. Most people would be uncomfortable, to the point of calling the police, if USPS decided to open mail "to prevent duplication of content," even though plenty of mail that USPS delivers is duplicates.

      Microsoft should treat all files as opaque binary objects. However that's not what they really want; there is no fun in being just a warehouse worker. They want to own all your information, even if they don't say it up front. The fact of parsing the files says that they access the content, and insertion of a GUID indicates that the GUID is coming from somewhere, and that it is a key in some database. What else is in that database, and on what grounds does MS even maintain it?

      At some point paranoia toward "cloud" services appears to be just a sound business decision. I am glad that I have no cloud accounts of any kind.

      • (Score: 3, Insightful) by Nerdfest on Wednesday April 23 2014, @10:33AM

        by Nerdfest (80) on Wednesday April 23 2014, @10:33AM (#34785)

        Wow, it sure sounds like people are being "scroogled".

      • (Score: 3, Interesting) by edIII on Wednesday April 23 2014, @06:17PM

        by edIII (791) on Wednesday April 23 2014, @06:17PM (#35056)

        I don't think there was a whole lot of maliciousness here on the part of Microsoft.

        The key part that I took away from it is that it was only happening to the enterprise and business customers. I think you are spot on about the GUID and database. I'm going to go with pure stupidity here. Microsoft was probably trying to put something in every file to enable some sort of service or function.

        It would really behoove Microsoft to come out and explain whatever new service they were trying to offer and they obviously had morons implementing it.

        Seriously. These are the people who created ALT streams in NTFS. They had to store this extra information with the file (why??) and modified the file directly instead of attaching this new metadata in a more transparent way.

        I'm honestly just baffled here. What enterprise feature could they have possibly been offering with this?

        Otherwise, this is just really stupid. A new level of stupid.

        --
        Technically, lunchtime is at any moment. It's just a wave function.
        • (Score: 2) by Hairyfeet on Wednesday April 23 2014, @09:33PM

          by Hairyfeet (75) <bassbeast1968NO@SPAMgmail.com> on Wednesday April 23 2014, @09:33PM (#35154) Journal

          Sigh...for years MSFT has been pushing the whole "metadata for a better future" kind of thing, hell did nobody look at WinFS? It was nothing BUT metadata! Most likely just as Google+ uses metadata to sort your pictures into a dozen different categories MSFT was trying to sort metadata and when a file didn't actually HAVE metadata? Then put the basics of where and when it was uploaded so it would have metadata to parse.

          --
          ACs are never seen so don't bother. Always ready to show SJWs for the racists they are.
    • (Score: 2) by ngarrang on Wednesday April 23 2014, @02:31PM

      by ngarrang (896) on Wednesday April 23 2014, @02:31PM (#34903) Journal

      There is a simple solution to this...Microsoft just needs to be open about this practice so that businesses understand this when subscribing to the service. Then, the businesses can make an informed decision.

    • (Score: 2) by Grishnakh on Wednesday April 23 2014, @03:08PM

      by Grishnakh (2831) on Wednesday April 23 2014, @03:08PM (#34932)

      If you use MS software, you really shouldn't be worried about data integrity, privacy, or anything else like that. If you use a closed-source vendor with secret file formats, you have no right to expect any privacy from that vendor, since you explicitly gave up your right to privacy by relying on secret file formats. It goes double for using that vendor's cloud service.

  • (Score: 4, Funny) by clone141166 on Wednesday April 23 2014, @10:14AM

    by clone141166 (59) on Wednesday April 23 2014, @10:14AM (#34774)

    If it were any other company this might actually be news. "Micro$oft stuffed up again" is like a "Man bites dog" headline.

    • (Score: 1) by clone141166 on Wednesday April 23 2014, @10:17AM

      by clone141166 (59) on Wednesday April 23 2014, @10:17AM (#34775)

      Doh! I mean "Dog bites man" *sigh* I will submit my job application to Micro$oft tomorrow.

  • (Score: 5, Interesting) by mendax on Wednesday April 23 2014, @10:17AM

    by mendax (2840) on Wednesday April 23 2014, @10:17AM (#34776)

    Last year I attended a webinar on cloud computing for attorneys, a primer for those attorneys who wished to keep their case files in the cloud. At that time OneDrive did not exist. But as I recall, one of the concerns expressed was integrity. For lawyers, it's of prime importance. It's very easy for an entire case to be thrown out because a piece of evidence can be shown by the defense to have been tampered with in some way. It casts a shadow of doubt over its veracity and often that's all that's needed to lose the case.

    --
    It's really quite a simple choice: Life, Death, or Los Angeles.
    • (Score: 4, Informative) by lhsi on Wednesday April 23 2014, @10:31AM

      by lhsi (711) on Wednesday April 23 2014, @10:31AM (#34784) Journal

      There is an example of the change in the article - one is adding some extra tags into an HTML page, which is at least obvious as to what has changed.

      However later on it points out that about 8KB is being added to Word/Excel/Publisher files, apparently adding uniquely identifiable information.

      • (Score: 4, Insightful) by Horse With Stripes on Wednesday April 23 2014, @12:10PM

        by Horse With Stripes (577) on Wednesday April 23 2014, @12:10PM (#34816)

        The additional 8K may just be what the OS shows as the file size. This often happens when a little bit of data causes a file size to increase to the next size (I forget if it's cluster size or allocation size ... or something like that). Of course, if he's using the default 4KB then at least 4097B were added.

        It's still wrong, oh so wrong, and who knows what MS is adding to the Office documents? I wish the blogger had provided more information. There are plenty of tools available to show the differences in files, and making this extra effort wouldn't have taken long at all.
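The "at least 4097B" arithmetic above can be checked with a small sketch. It assumes the reported size is the allocated ("size on disk") figure and that the file occupies whole clusters, which does not hold for very small files that NTFS can keep inside the MFT. Both function names are made up for illustration.

```python
def size_on_disk(logical_size: int, cluster_size: int = 4096) -> int:
    """Round a logical file size up to a whole number of clusters."""
    if logical_size < 0 or cluster_size <= 0:
        raise ValueError("sizes must be non-negative and cluster_size positive")
    clusters = -(-logical_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def added_bytes_range(allocation_growth: int, cluster_size: int = 4096) -> tuple[int, int]:
    """Bounds on the real number of bytes added, given growth in allocated size.

    If the allocated size grew by G bytes, the logical growth d satisfies
    G - cluster_size < d < G + cluster_size.
    """
    lowest = max(allocation_growth - cluster_size + 1, 0)
    highest = allocation_growth + cluster_size - 1
    return lowest, highest
```

With 4KB clusters, `added_bytes_range(8192)` gives `(4097, 12287)`: an "8KB" jump in allocated size only pins the real addition down to somewhere between roughly 4KB and 12KB.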

        • (Score: 3, Informative) by lhsi on Wednesday April 23 2014, @12:18PM

          by lhsi (711) on Wednesday April 23 2014, @12:18PM (#34821) Journal

          I mentioned that it added uniquely identifiable information, here is the full quote about it from the article:

          As for Word, Excel and Publisher files ('docx', 'xlsx' and 'pub' file extensions), these grew by about 8KB. Unlike the web files, these Microsoft Office files had what appears to be uniquely identifiable code added, potentially making it possible to match them to a company and possibly even to a specific user's account.

          • (Score: 2) by Horse With Stripes on Wednesday April 23 2014, @12:26PM

            by Horse With Stripes (577) on Wednesday April 23 2014, @12:26PM (#34824)

            Yes, I read that. And I'm not disputing that MS added uniquely identifiable data in any way. The fact that he didn't post any of that information leads me to wonder if the "about 8KB" is actually "about 8KB" of data or just the new file size of the modified file after MS added data to the file (NTFS reports file sizes in even multiples of the allocation size).

            Providing the data that was added would have given us more information and really wouldn't have taken much time at all. 8KB of additional data is a lot of information to add for tracking purposes, and knowing what it was would have given us something to look for in files we've received, or to be able to verify his results when testing this issue for ourselves.

            • (Score: 3, Informative) by lhsi on Wednesday April 23 2014, @01:06PM

              by lhsi (711) on Wednesday April 23 2014, @01:06PM (#34853) Journal

              I found the original blog about it with some more information. Hopefully sufficient information to reproduce (while also not giving away his potentially unique identifier).

              Source: http://www.myce.com/news/microsoft-onedrive-for-business-modifies-files-as-it-syncs-71168/ [myce.com]

              To get an idea of what was added, I used 7-Zip to extract the content of the Word file before and after syncing. There were two '.rels' files and one XML file modified and three folders with files added - 'customXml' containing 6 XML files, a folder '_rels' inside this containing three '.rels' files and a '[trash]' folder containing a '0000.dat' file. In the 'docProps' folder, a file 'custom.xml' contains a property with a 'ContentTypeId' name attribute with a unique ID. When I used 7-zip to look inside the two Microsoft Publisher files, the synced Publisher file had a 'MsoDataStore' folder added in it, inside which contains 3 folders with gibberish names and 2 XML files inside each. I found the same ContentTypeID code inside as the Word file and while it matched, it was different to that in files I compared with other users.
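The 7-Zip comparison described in that quote can be approximated without extracting anything, since 'docx' and 'xlsx' files are ordinary ZIP archives. The sketch below uses Python's standard `zipfile` module to list which members were added, removed, or changed. The function name is hypothetical, and it will not work on '.pub' files, which as far as I know are not ZIP-based.

```python
import zipfile


def package_diff(original_path, synced_path):
    """Compare the members of two ZIP-based Office files by name and CRC-32."""
    with zipfile.ZipFile(original_path) as original, zipfile.ZipFile(synced_path) as synced:
        before = {info.filename: info.CRC for info in original.infolist()}
        after = {info.filename: info.CRC for info in synced.infolist()}
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(name for name in shared if before[name] != after[name]),
    }
```

Run against an original and a synced copy, this should surface the same kind of result the blogger describes: a few modified '.rels' entries plus newly added members under folders such as 'customXml' and 'docProps'.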

            • (Score: 2) by tempest on Wednesday April 23 2014, @01:13PM

              by tempest (3050) on Wednesday April 23 2014, @01:13PM (#34855)

              "8KB of additional data is a lot of information to add for tracking purposes"

              We're talking about Microsoft Office documents here. Not exactly the pinnacle of efficiency, so that sounds about right.

              • (Score: 2) by Horse With Stripes on Wednesday April 23 2014, @02:27PM

                by Horse With Stripes (577) on Wednesday April 23 2014, @02:27PM (#34897)

                We're talking about Microsoft Office documents here. Not exactly the pinnacle of efficiency, so that sounds about right.

                Toucé

                • (Score: 2) by tangomargarine on Wednesday April 23 2014, @03:46PM

                  by tangomargarine (667) on Wednesday April 23 2014, @03:46PM (#34957)

                  *Touché

                  (don't use so many caps warning? wat)

                  --
                  "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
                  • (Score: 2) by Horse With Stripes on Wednesday April 23 2014, @07:12PM

                    by Horse With Stripes (577) on Wednesday April 23 2014, @07:12PM (#35084)
                    Yes, I got the same warning so I quoted the parent and it stopped the caps warning. Of course, when I added the 'é' I forgot the 'h'. 'A' for effort, 'C' for execution and 'F' for spelling.
              • (Score: 2) by etherscythe on Wednesday April 23 2014, @08:22PM

                by etherscythe (937) on Wednesday April 23 2014, @08:22PM (#35120) Journal

                Yeah, I dunno about you guys but I haven't seen a Word doc come in at below 20KB even freshly made with nothing in it.

                --
                "Fake News: anything reported outside of my own personally chosen echo chamber"
              • (Score: 1) by meisterister on Wednesday April 23 2014, @11:41PM

                by meisterister (949) on Wednesday April 23 2014, @11:41PM (#35250) Journal

                Then I'd say that 8KB is a bit small for Microsoft. I was expecting something more along the lines of 8GiB.

                --
                (May or may not have been) Posted from my K6-2, Athlon XP, or Pentium I/II/III.
      • (Score: 2) by tangomargarine on Wednesday April 23 2014, @03:43PM

        by tangomargarine (667) on Wednesday April 23 2014, @03:43PM (#34953)

        It's already enough of a What The Fuck maneuver touching customers' data files *in any way*, but they're even adding *uniquely identifiable IDs*?! So blatantly unethical, no wonder they didn't want to admit they were doing it...

        --
        "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
        • (Score: 1, Interesting) by Anonymous Coward on Wednesday April 23 2014, @06:56PM

          by Anonymous Coward on Wednesday April 23 2014, @06:56PM (#35073)

          This is a value-add for companies that want to track access to documents. It helps find leakers. Some companies have internal investigators who know how to use MS metadata in Word files for just that purpose. Sanitizers for Office docs exist for a good reason, too.

    • (Score: 1) by choose another one on Wednesday April 23 2014, @12:38PM

      by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @12:38PM (#34831)

      It's not the cloud, it's what system you are buying, wherever it is. Any lawyer dealing with electronic evidence discovery storage or production who does not know what Records Management is and why they need a proper Electronic Records Management system, and what standards apply, should IMO be disqualified from practice.

      Feel free to deal with such a lawyer, but be aware that your attorney-client correspondence is probably in clear-text on several email servers, and a couple of USB sticks lost in the car park.

      • (Score: 2) by mendax on Wednesday April 23 2014, @06:22PM

        by mendax (2840) on Wednesday April 23 2014, @06:22PM (#35059)

        Feel free to deal with such a lawyer, but be aware that your attorney-client correspondence is probably in clear-text on several email servers, and a couple of USB sticks lost in the car park.

        Indeed, and another topic that was discussed was ensuring that encryption was used for all documents put in the cloud. The recommendation was NOT to rely upon the cloud provider's encryption but instead to encrypt them BEFORE putting them in the cloud.

        --
        It's really quite a simple choice: Life, Death, or Los Angeles.
    • (Score: 0) by Anonymous Coward on Wednesday April 23 2014, @07:24PM

      by Anonymous Coward on Wednesday April 23 2014, @07:24PM (#35092)

      Damn, if it had been an actual physical seminar, you could've locked the doors and covered the building with concrete.

  • (Score: 3, Insightful) by geb on Wednesday April 23 2014, @10:23AM

    by geb (529) on Wednesday April 23 2014, @10:23AM (#34777)

    When you sign up for the service, is there anything in the agreement which guarantees your ability to get your data back? Getting back a file that superficially resembles your original data is not the same thing. Unless the contract says they can, it shouldn't be the cloud provider deciding which bytes are important, and which can be edited.

    • (Score: 2) by GreatAuntAnesthesia on Wednesday April 23 2014, @03:21PM

      by GreatAuntAnesthesia (3275) on Wednesday April 23 2014, @03:21PM (#34940) Journal

      > When you sign up for the service, is there anything in the agreement which guarantees your ability to get your data back?

      I very much doubt it. Why on Earth write an EULA that actually commits them to doing anything? They wrote it, your job is to tick "I accept" without reading it. They can (and will) make it as one-sided as they please. Far more likely it reads something like:

      "Microsoft will graciously allow you to upload your data to Microsoft servers. Microsoft will make every reasonable[1] effort to make sure that you can download your data at a later date. Any interruption in service, any lost / missing / munged data is entirely your problem and not ours.

      Oh, and you totally agree to let us read your data, pull data off your windows computer that maybe you didn't intend to upload, analyse your data, use your data, futz with your data, have you arrested or sued because we don't like your data, publish your data, claim ownership of your data, screw your wife, kill your dog, drink all the beer out of your fridge, and anything else that we consider reasonable [2]"

      [1] the word "reasonable" is defined as "whatever Microsoft damn well pleases".

      [2] there's that word again.

  • (Score: 3, Insightful) by Horse With Stripes on Wednesday April 23 2014, @10:24AM

    by Horse With Stripes (577) on Wednesday April 23 2014, @10:24AM (#34779)

    I can see all sorts of issues with this. I wonder what the last mod timestamp is? Does it match your local files? I doubt it, or that would make it appear as if MS were trying to cover their tracks.

    These are clearly MS specific tags. How presumptuous is that? And they are only added once, so MS is scanning the files to look for these tags and then adding them if they don't exist? Could they be updating them as well to track changes in your documents?

    So what happens when some small developer is using this service as a backup repository and syncs their files? Do all their local copies get overwritten? What happens to Sharepoint files when a small business uses OneMoreModDrive for business?

    Depending on when these files are modified by MS this could wreak havoc on backups and/or file synchronization processes that rely on last mod date or file sizes.

    How does this ever get past a management group that is supposed to ask "what could go wrong?"

    • (Score: 3, Informative) by Bytram on Wednesday April 23 2014, @11:47AM

      by Bytram (4043) on Wednesday April 23 2014, @11:47AM (#34803) Journal

      Horse With Stripes [soylentnews.org] wrote:

      I can see all sorts of issues with this. I wonder what the last mod timestamp is? Does it match your local files? I doubt it, or that would make it appear as if MS were trying to cover their tracks.

      I think your question is answered in the source used to construct the linked article: http://www.myce.com/news/microsoft-onedrive-for-business-modifies-files-as-it-syncs-71168/ [myce.com] (emphasis added)

      Even though OneDrive for Business modified these files, it left the 'Date Modified' attribute in every file unchanged, so to an unsuspecting user who just checks when the files were modified, they appear untouched. For example, the Word file shows a modified time of '16:14:14' for both the original and synced file, even though the file sizes are clearly different. The only files that remain untouched are those that were placed in the synced folder on the original computer, so even if a user checks the files they place in a synced folder, they would not know anything is being modified unless they physically took those files to another computer with the matching synced folder to compare them.
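Since the quoted passage says 'Date Modified' stays the same while the content changes, a check that trusts timestamps will miss it. Here is a rough sketch of a test for exactly that case: identical modification time, different bytes. The helper names are assumptions for illustration, and timestamps are compared at whole-second resolution to sidestep filesystem precision differences.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's full contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def silently_modified(original, synced):
    """True when two files share a modification time but differ in content."""
    same_mtime = int(os.stat(original).st_mtime) == int(os.stat(synced).st_mtime)
    return same_mtime and file_digest(original) != file_digest(synced)
```

A backup or sync tool that decides what to copy from the modification date alone would treat such a pair as identical, which is the scenario raised earlier in this thread.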

    • (Score: 2) by mrclisdue on Wednesday April 23 2014, @11:52AM

      by mrclisdue (680) on Wednesday April 23 2014, @11:52AM (#34806)

      I wonder what the last mod timestamp is? Does it match your local files?

      from a link in tfa:

      Even though OneDrive for Business modified these files, it left the ‘Date Modified’ attribute in every file unchanged, so to an unsuspecting user who just checks when the files were modified, they appear untouched. For example, the Word file shows a modified time of ’16:14:14’ for both the original and synced file, even though the file sizes are clearly different. The only files that remain untouched are those that were placed in the synced folder on the original computer, so even if a user checks the files they place in a synced folder, they would not know anything is being modified unless they physically took those files to another computer with the matching synced folder to compare them.

      cheers,

      • (Score: 2) by mrclisdue on Wednesday April 23 2014, @11:54AM

        by mrclisdue (680) on Wednesday April 23 2014, @11:54AM (#34807)

        ...I see I was beaten to the punch, something about timestamps...

    • (Score: 1) by choose another one on Wednesday April 23 2014, @12:10PM

      by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @12:10PM (#34814)

      I can see all sorts of issues with this. I wonder what the last mod timestamp is? Does it match your local files?

      I just uploaded a file to web hosting, downloaded it again - yep, last mod timestamp has changed! OMG - my web host is modifying my files !!

      I just ftped a test file to another local machine and back and last mod timestamp changed again - OMFG, the NSA are modifying files on my home network - PULL ALL THE PLUGS! .NO CARRIER

      • (Score: 3, Informative) by Horse With Stripes on Wednesday April 23 2014, @12:18PM

        by Horse With Stripes (577) on Wednesday April 23 2014, @12:18PM (#34820)

        Actually, modifying the timestamps would be the right thing to do if they are modifying the files. The fact that they aren't makes this worse.

        BTW, if you want to maintain the timestamps of files that are FTP'd you can just select the "preserve timestamps of transferred files" option in your FTP application and it won't muck with the dates. If you're syncing any files between servers then maintaining timestamps is kind of important.

        • (Score: 0) by Anonymous Coward on Wednesday April 23 2014, @05:10PM

          by Anonymous Coward on Wednesday April 23 2014, @05:10PM (#35006)

          you don't know what the fuck you're talking about. assuming you're on windows, so what you see in the explorer thing or whatever file browsing utility as the modification time of a file -- that's a bit of metadata maintained by the filesystem.

          when you transfer a file to somewhere, a file is created on a different filesystem. it has just been both created and modified in that filesystem. if you transfer a file from somewhere, you've just created and modified a file in yet a different filesystem.

          the client behavior you're talking about "preserve timestamps" -- THAT's lying about modification time. your application alters the true creation and modification times of the file in the filesystem to which it is transferred so it looks like it did on the source.

          you should EXPECT creation and modification times of transferred files to change, unless you are specifically asking for filesystem metadata to be transferred (as you do with "preserve timestamps")
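The two behaviours being argued about here are easy to see side by side. This sketch uses Python's standard `shutil` module as a stand-in for an FTP client: `shutil.copyfile` copies only the data, so the target gets a fresh modification time, while `shutil.copy2` also tries to carry the source's metadata across. The wrapper function is hypothetical and only there to make the comparison visible.

```python
import os
import shutil


def copy_both_ways(source, plain_target, preserved_target):
    """Copy a file twice and return the two resulting modification times."""
    shutil.copyfile(source, plain_target)    # data only: mtime is "now" on the target
    shutil.copy2(source, preserved_target)   # data plus metadata such as mtime
    return os.stat(plain_target).st_mtime, os.stat(preserved_target).st_mtime
```

Neither result is wrong in itself. The first reflects when the bytes landed on the new filesystem, the second reflects when the content last changed at the source, and sync tools generally want the second.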

  • (Score: 4, Insightful) by Geezer on Wednesday April 23 2014, @10:28AM

    by Geezer (511) on Wednesday April 23 2014, @10:28AM (#34782)

    Really, after all the NSA revelations, major-site MITM hacks, Heartbleed, and now this, isn't it about time the suits get over this "cloud" bullshit and get back to doing secure data storage like it's a fixed-cost necessity again and not a value-added commodity?

    Maybe when this asinine "cloud" finally blows away we'll have a bright, sunny storage day.

    • (Score: 4, Insightful) by bucc5062 on Wednesday April 23 2014, @11:35AM

      by bucc5062 (699) on Wednesday April 23 2014, @11:35AM (#34799)

      Agreed. I did not quite agree with this sentiment in the blog:

      Of course, in many situations it may not cause a problem for Microsoft to meddle with the contents of the files that it stores and syncs with users, but that’s not the point.

      It causes a problem every time Microsoft opens a file that is not theirs and alters it. The problem is privacy and trust. If the TOS has buried in it some phrase that absolves them of this then shame on all. Customers for not reading the fine print, Microsoft for being Big Brother.

      This is akin to a storage facility opening boxes you stored, checking the content then putting a tracking tag on it for their own use. If that was stated up front I doubt that facility would get much if any business.

      For the 64,000 dollar question, do other Cloud services do something similar? Perhaps it is time to study this further (looking at you Drive and E2).

      --
      The more things change, the more they look the same
      • (Score: 2) by Bot on Wednesday April 23 2014, @01:03PM

        by Bot (3902) on Wednesday April 23 2014, @01:03PM (#34851) Journal

        > The problem is privacy and trust

        And screwed up checksums, root of potentially serious headaches for admins and users.

        --
        Account abandoned.
      • (Score: 3, Interesting) by Common Joe on Wednesday April 23 2014, @02:00PM

        by Common Joe (33) <common.joe.0101NO@SPAMgmail.com> on Wednesday April 23 2014, @02:00PM (#34878) Journal

        It isn't just the cloud where they modify files either. When you open an Excel file on your desktop to simply look at it and then close it without saving it, it will modify your file without your knowledge, but the file size will not change and the modified date will not change. I'll provide proof. First, the link [microsoft.com] at Microsoft's website. Next, how to reproduce the problem yourself:

        1. Login to your machine as UserName1.

        2. Create an excel file and sprinkle some data into it. Save it and close it.

        3. Copy it under a different file name. You now have two identical files. We'll call them FileName1 and FileName2.

        4. Login to your machine as UserName2.

        5. Open FileName1 with Excel. Do not change anything. Close it. Do NOT open FileName2.

        6. Perform a binary compare between FileName1 and FileName2. It will fail. In my test cases from way back when, it left the modified date and the file size exactly the same. I was using Excel 2007 and 2010 on Windows 8.0, and the user names were very close in size.

        From the link I provided: "This problem occurs because when you open an Excel workbook, Excel writes the name of the current user to the header of the file. This is necessary so that other users receive the "file in use" notification. The operating system then updates the last modified date." I didn't see it updating that modified date but it did modify that file. Making it read-only should prevent that, but you shouldn't have to do that with your files.

        I found this out because I was writing a backup program (never finished it), but I got some failures one day. I knew I didn't have to copy the file so I only did a verification on it. It stumped me because there were no outside changes to the file (modified date, file size, etc), but there was definitely a difference and when I opened it up with a hex editor, the only thing that changed was the user name and maybe a couple of other bytes. (It's been a while since I looked at the exact issue, so I forget the exact details.)
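Step 6 above (the binary compare) and the hex-editor inspection can be roughly reproduced with a few lines of Python. This is a sketch with a made-up function name. It reads both files fully into memory, which is fine for spreadsheets but not for huge files, and it only compares the overlapping part if the lengths differ.

```python
from pathlib import Path


def differing_offsets(path_a, path_b, limit=16):
    """Report whether two files have equal size and where their bytes differ."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    # zip() stops at the shorter file, so only the common prefix is compared.
    offsets = [i for i, (x, y) in enumerate(zip(data_a, data_b)) if x != y]
    return {"same_size": len(data_a) == len(data_b), "offsets": offsets[:limit]}
```

If the behaviour described in this comment holds, FileName1 and FileName2 would come back with `same_size` true and a short run of offsets where the user name is stored.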

        Truecrypt can do this to you too, but there is a setting that you set. Under Settings --> Preferences --> Windows, there is a checkbox for "Preserve modification timestamp of file containers".

    • (Score: 2) by VLM on Wednesday April 23 2014, @11:55AM

      by VLM (445) on Wednesday April 23 2014, @11:55AM (#34808)

      Well, "cloud" means nothing and you can screw over the users any way you want. Like MS is doing.

      There's nothing wrong with the classical "ftp site as a service" (or ssh equiv, etc) assuming the operators are trustworthy (aka not MS).

    • (Score: 2) by MrGuy on Wednesday April 23 2014, @12:49PM

      by MrGuy (1007) on Wednesday April 23 2014, @12:49PM (#34838)

      There are a lot of cloud concepts that are really powerful. Multi-device sync. Easy shared folder management that doesn't require you to be in the office or everyone to be online. Not losing access to your files if a single server goes down. Easy virtualized server management. One-button server cloning for testing. All cool stuff.

      The problem isn't with clouds per se. It's with THIRD PARTY HOSTED clouds, where you get all these advantages at the cost of someone else having your data.

      If you're a semi-large company, you can get a lot of these same advantages from a self-hosted private cloud. You can't quite spin up capacity on demand, but if you have some excess you can approximate it pretty well.

      Of course, there's also a potential market for a cloud service that truly encrypts data strongly, so they couldn't do something like this even if they wanted. But in this day and age of the NSA, I'm not sure we'll ever trust that again, so maybe that ship has sailed.

  • (Score: 5, Informative) by choose another one on Wednesday April 23 2014, @11:44AM

    by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @11:44AM (#34802)

    This is the way it's been for a decade or more with SharePoint, and with other similar DMSs, and is hardly news.

    Here's another one - if you open an Office document and save it again, the file may have changed on disk even if you didn't change anything. It's done that for as long as I can remember - maybe back to Office 2000.

    The bits of the office document format that get modified are metadata, not content - as you'll find out if you digitally sign the files in the way the file format intends / is documented, e.g. using the MS tools in Office. SkyDrive or SharePoint will not invalidate your signatures - because the metadata is not covered by the signature. By design, and documented, and has been known for years.
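
To illustrate the idea of an integrity check that covers content but not metadata, here is a rough sketch that assumes an OOXML-style zip package. The choice of `docProps/` and `customXml/` as "metadata" prefixes is an assumption made for the example; real Office signatures follow the rules of the file format specification, which this does not reproduce. `content_digest` and `build_package` are hypothetical helper names.

```python
import hashlib
import io
import zipfile

# Assumption for illustration: parts under these prefixes count as metadata.
METADATA_PREFIXES = ("docProps/", "customXml/")


def content_digest(package_bytes):
    """Hash every non-metadata part of a zip package, in name order."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in sorted(package.namelist()):
            if name.startswith(METADATA_PREFIXES):
                continue  # metadata parts are left out of the digest
            data = package.read(name)
            digest.update(name.encode("utf-8") + b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def build_package(parts):
    """Demo helper: build a zip package from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, data in parts.items():
            package.writestr(name, data)
    return buffer.getvalue()
```

Two packages that differ only in a `docProps/` part have different whole-file checksums but the same `content_digest`, which is the distinction between "the bytes changed" and "the signed content changed".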

    If it were an all-Apple world and all filesystems had resource forks, this stuff would be in a fork and no one would bother about it - except it would still have to be packaged into one stream for the network (which is what the Office file format, old and OOXML versions do), and then someone would sniff the stream and complain that what was sent out is not what is coming back [sigh].

    The whole complaint is akin to complaining that your email integrity is compromised because every server in the delivery path is adding its own headers, or complaining that your web content integrity is compromised because your hosting server adds an ETag header.

    Furthermore, if you want to properly guarantee the integrity of the bytes, you need a records management system (which you can get in cloud or on premise, and SkyDrive is not one, nor is SharePoint), not a (mutable) document / content management system. Again, this shouldn't be news - we've had government standards for electronic records management systems around the world, since last century. If you haven't figured out by now that for electronic business records requiring full integrity you might need a certified and tested electronic records management system - then really what did you think all those standardisation efforts were for ? Where have you been ? Storing all your jewellery in the garden shed and then complaining it hasn't maintained integrity as well as a bank vault perhaps ?

    Horses for courses, know your tools and what they do, and don't complain when a hammer does a bad job putting in a screw.

    • (Score: 3, Insightful) by MrGuy on Wednesday April 23 2014, @12:10PM

      by MrGuy (1007) on Wednesday April 23 2014, @12:10PM (#34815)

      There are times when I really wish I got mod points more often than I do since the recent change. This is one of them. Nice writeup.

    • (Score: 5, Informative) by bucc5062 on Wednesday April 23 2014, @12:49PM

      by bucc5062 (699) on Wednesday April 23 2014, @12:49PM (#34837)

      Hold on a second. I can see that opening an MS Word or Excel file may alter the metadata in some way, but this is not the same as what is being presented in this article. The original author even made a simple test [myce.com] that showed an HTML file getting *altered* by the sync process. While it would have been nice to give a similar example with Word or Excel documents, he did not say the file size remained the same while the content changed (a metadata change); he stated that *extra data* was added.

      Boil it down, a sync is nothing but a file transfer, no matter how fancy you make it. With a file transfer there is no need to "open" the file in its default program, there is no need to open it beyond a system style file IO to move *the same bytes* from one location to another. The end product being the same file on both systems. This is not the case.

      Talking with a fellow Tech person as to why MS would do this, the thought was to save on duplicate files. Say Joe has an io.h file and Jane has the same file. Why have both on the cloud when it is just "the same file"? So MS will tag it, store one copy and present to the customer their "file" on demand. However, even outside the privacy aspects of reading content, this is MS actually taking control of your files without your knowledge. Let us not give them a pass just because they alter metadata when opening/closing a MS standard file. They took a step beyond that, and if this spreads and turns out to be true, it could really alter the Cloud Business model.

      --
      The more things change, the more they look the same
      • (Score: 2) by Sir Garlon on Wednesday April 23 2014, @02:30PM

        by Sir Garlon (1264) on Wednesday April 23 2014, @02:30PM (#34899)

        Talking with a fellow Tech person as to why MS would do this the thought was to save on duplicate files. Say Joe has an io.h file and Jane has the same file. Why have both on the cloud when it is just "the same file" so MS will tag it, store one copy and present to the customer their "file" on demand.

        There is no need to tag a file in order to do that. Just store the MD5 checksum of each file and compare it to whatever the customer uploads.
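
A toy sketch of that approach, assuming a store keyed by checksum. It uses SHA-256 rather than the MD5 named above, since MD5 collisions can be constructed on purpose; `DedupStore` and its methods are invented for the example and say nothing about how any real service is built.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept once."""

    def __init__(self):
        self._blobs = {}    # digest -> file bytes
        self._catalog = {}  # (owner, filename) -> digest

    def upload(self, owner, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this digest has not been seen before.
        self._blobs.setdefault(digest, data)
        self._catalog[(owner, filename)] = digest
        return digest

    def download(self, owner, filename):
        # The customer gets back exactly the bytes that were uploaded.
        return self._blobs[self._catalog[(owner, filename)]]

    def stored_copies(self):
        return len(self._blobs)
```

If Joe and Jane both upload the same io.h, `stored_copies()` stays at 1 and each download returns the file unmodified; nothing has to be written into the file itself.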

        --
        [Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
        • (Score: 2) by tibman on Wednesday April 23 2014, @05:33PM

          by tibman (134) Subscriber Badge on Wednesday April 23 2014, @05:33PM (#35023)

          Collisions would make for some hilarity though.

          --
          SN won't survive on lurkers alone. Write comments.
      • (Score: 2, Insightful) by choose another one on Wednesday April 23 2014, @03:01PM

        by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @03:01PM (#34925)

        It's SharePoint (under the hood). It's not a file transfer, it is storing a document in SharePoint and getting it out again and it has always affected/synced metadata, where the file format provides for it.

        As to why MS do it - read through the SharePoint docs on metadata sync, document information panel, etc. - try here for 2007 example (although 2003 did it too, so it's at least a decade old): http://msdn.microsoft.com/en-us/library/ms550037(v=office.12).aspx [microsoft.com]

        Your Tech person is way off by the way - MS will de-dupe using the SharePoint shredded storage backend (worth looking up, including the associated MS-FSSHTTP protocol, and remembering they bought Groove Networks) which fragments files intelligently (content dependent boundaries) for office XML and similar formats, and falls back to "dumb" fragmentation for other (opaque) file formats, similar then to block-level de-dupe. No metadata injection is required for this.
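
For readers unfamiliar with content-dependent boundaries, here is a generic sketch of content-defined chunking with a simple rolling hash. It is not the SharePoint shredded storage algorithm, whose details are not given here; the table `GEAR`, the mask and the function `chunk` are assumptions chosen to keep the example short, and real systems also enforce minimum and maximum chunk sizes.

```python
import hashlib

# Deterministic table of 256 pseudo-random 32-bit values, one per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]
BOUNDARY_MASK = 0xFF000000  # cut when the top 8 bits are zero (~1 in 256 bytes)


def chunk(data):
    """Split data at positions chosen by a rolling hash of the preceding bytes."""
    chunks, start, rolling = [], 0, 0
    for position, byte in enumerate(data):
        # The left shift drops the influence of bytes more than 32 positions
        # back, so a cut depends only on nearby content, not on file offset.
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        if (rolling & BOUNDARY_MASK) == 0:
            chunks.append(data[start:position + 1])
            start = position + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because the cut points follow the content, inserting a few bytes near the start of a file changes only the first chunk or so; the later chunks come out identical and can be de-duplicated, which fixed-size blocks would not allow.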

        As to the cloud business model, you can always store your stuff encrypted in the cloud, which will prevent other than block-level de-dupe. If your cloud storage starts charging you more for that, then you'll know it makes a difference and you can choose. Doesn't alter, or break, anything.

        • (Score: 3, Interesting) by bucc5062 on Wednesday April 23 2014, @03:15PM

          by bucc5062 (699) on Wednesday April 23 2014, @03:15PM (#34937)

          You got me curious, so I did look into OneDrive for Business and indeed, you are correct. See this link [microsoft.com].
          From the page:

          OneDrive for Business is a personal library intended for storing and organizing your work documents. As an integral part of Office 365 or SharePoint Server 2013, OneDrive for Business lets you work within the context of your organization, with features such as direct access to your organization’s address book.

          That does put a slightly different spin on the topic. With that said, if companies are not aware of this and use it for more secure documents, does it invalidate them in a court of law situation? As another post mentioned, metadata *is* data in the file, and altering it could be viewed, when done in a public cloud, as breaking privacy. SharePoint administered within a company may be one thing, SharePoint administered by Microsoft another, but I'll admit I do not know much more, so I'll refrain from commenting further.

          --
          The more things change, the more they look the same
    • (Score: 0) by Anonymous Coward on Wednesday April 23 2014, @12:55PM

      by Anonymous Coward on Wednesday April 23 2014, @12:55PM (#34849)

      And we all know metadata isn't data, right?

      >if you open an Office document and save it again
      That's not what happened. Nice try, though.

    • (Score: 1) by quacking duck on Wednesday April 23 2014, @01:47PM

      by quacking duck (1395) on Wednesday April 23 2014, @01:47PM (#34867)

      Disagree that they should get a pass on this. Files are NOT email. This would be analogous to servers adding tags or other metadata onto file attachments *in* that email.

      A file copy is not an open-and-save at the application level, it is an open-filestream+copy+close-filestream on the OS/filesystem level. There are certain expectations attached to the latter and changing the structure and contents of the file itself is not one of them.

      • (Score: 1) by choose another one on Wednesday April 23 2014, @02:34PM

        by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @02:34PM (#34905)

        > Disagree that they should get a pass on this. Files are NOT email. This would be analogous to servers adding tags or other metadata onto file attachments *in* that email.

        Like, say, whether that attachment was virus checked at a server ?

        > A file copy is not an open-and-save at the application level, it is an open-filestream+copy+close-filestream on the OS/filesystem level.

        "OneDrive for Business" is not really a filesystem - it is an online application. It used to be called "SharePoint Workspace" then "SkyDrive Pro". It is effectively a SharePoint client syncing stuff with a SharePoint server, and SharePoint has always altered (some) files in this way.

        SharePoint is not a filesystem (yes, can look like one through explorer view, no it still isn't one) - often it is really really hard to get it through to some people who insist it looks like a filesystem and therefore they should be able to store dvd rips and database files inside it [sigh]. It's not a filesystem, doesn't work like one and never will.

        You want a filesystem, buy a filesystem, you want SharePoint online/cloud, buy SharePoint online. Be aware of the differences and which one you have purchased / which one the article is about.

        • (Score: 1) by quacking duck on Thursday April 24 2014, @02:24AM

          by quacking duck (1395) on Thursday April 24 2014, @02:24AM (#35321)

          > Disagree that they should get a pass on this. Files are NOT email. This would be analogous to servers adding tags or other metadata onto file attachments *in* that email.

          Like, say, whether that attachment was virus checked at a server ?

          Email antivirus modifying actually harmful files or false positives I get, but they actually modify files that *pass* the virus check too?

    • (Score: 1) by darkfeline on Wednesday April 23 2014, @03:42PM

      by darkfeline (1030) on Wednesday April 23 2014, @03:42PM (#34952) Homepage

      While I see your point, "it's metadata, not content" isn't really a compelling argument, wouldn't you say? Just ask the NSA.

      Also, XML metadata is not the same as, say, POSIX file metadata (mod times and permission bits). If a server replaced the reply header in my email with their own proxy server "for my convenience", would I be mad? Yes, very.

      --
      Join the SDF Public Access UNIX System today!
  • (Score: 3, Insightful) by Anonymous Coward on Wednesday April 23 2014, @11:52AM

    by Anonymous Coward on Wednesday April 23 2014, @11:52AM (#34805)

    There is no such thing if it resides on somebody else's server.

  • (Score: 2) by Subsentient on Wednesday April 23 2014, @02:23PM

    by Subsentient (1111) on Wednesday April 23 2014, @02:23PM (#34896) Homepage Journal

    Gotta love that cloud!

    --
    "It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti
  • (Score: 0) by Anonymous Coward on Wednesday April 23 2014, @03:00PM

    by Anonymous Coward on Wednesday April 23 2014, @03:00PM (#34923)

    require at least some evidence. This is a big farking deal, IF it's true. (Many of the comments here seem to be taking it at face value.)

    Has no one else with a "corporate" version of OneDrive managed to reproduce this issue?

    'Cause otherwise, despite Cluley's reputation, this is just an alarming anecdote that someone reported to him and he's passing on.

    Surely there's someone who can confirm or deny this behavior?

    • (Score: 5, Insightful) by choose another one on Wednesday April 23 2014, @03:05PM

      by choose another one (515) Subscriber Badge on Wednesday April 23 2014, @03:05PM (#34929)

      I can confirm as a long time SharePoint user and implementation consultant / architect, that SkyDrive Pro / OneDrive Business is SharePoint, and always has been. I can also confirm that SharePoint does metadata injection back into retrieved documents, and has done for at least a decade, and that it is documented, e.g. http://msdn.microsoft.com/en-us/library/ms550037(v=office.12).aspx [microsoft.com]

      That good enough ?

      • (Score: 0) by Anonymous Coward on Wednesday April 23 2014, @03:14PM

        by Anonymous Coward on Wednesday April 23 2014, @03:14PM (#34935)

        Thank you! So then the problem is really in the article, which by my count uses the word "sync" approximately 937,402 times.

        Document management is indeed very different from file synchronization, and it's a shame Cluley has managed to help us confuse the two.

        I think to me and many others, once he started using the word "sync" (and using MD5s no less) we did imagine file level synchronization a la Dropbox.

        On the other hand, it would be good for MS to distinguish these things in good faith. I wonder if Google Drive does anything similar to files that are uploaded and downloaded.

    • (Score: 2) by Reziac on Thursday April 24 2014, @03:36AM

      by Reziac (2489) on Thursday April 24 2014, @03:36AM (#35344) Homepage
      --
      And there is no Alkibiades to come back and save us from ourselves.