
SoylentNews is people

posted by janrinok on Wednesday March 04 2015, @07:27PM
from the over-to-you dept.

What free software is there in the way of organizing lots of documents?

To be more precise, the ones I *need* to organize are the files on hard drives, though if I could include documents I have elsewhere (bookshelves and photocopy files) I wouldn't mind. They are text documents in a variety of file formats and languages, source code for current and obsolete systems, JPEG images, film clips, drawings, SVG files, object code, shared libraries, fragments of drafts of books, ragged software documentation, works in progress ...

Of course the files are already semi-organized in directories, but I haven't yet managed to find a suitable set of directory names. Hierarchical classification isn't ideal: there are files that fit in several categories, and there are a lot of files that have to be in a particular location because of the way they are used (executables in a bin directory, for example) or the way they are updated or maintained. Taxonomists would advise setting up a controlled vocabulary of tags and attaching tags to the various files. I'd end up with a triple store or some other database describing the files.
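A minimal sketch of that tag database, assuming SQLite as the store (through Python's built-in `sqlite3` module). The controlled vocabulary is a table of allowed tags, and each assignment is a (file, predicate, tag) triple whose tag must come from that vocabulary, enforced with a foreign key. The table names, the `hasTag` predicate, and the helper functions are illustrative, not an existing tool; `file_id` is deliberately left as an opaque string, because choosing a stable identifier is exactly the open question.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a tag store backed by SQLite."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # enforce the controlled vocabulary
    db.execute("CREATE TABLE IF NOT EXISTS vocabulary (tag TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " file_id TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " tag TEXT NOT NULL REFERENCES vocabulary(tag),"
        " PRIMARY KEY (file_id, predicate, tag))"
    )
    return db


def add_terms(db, *tags):
    """Extend the controlled vocabulary."""
    db.executemany("INSERT OR IGNORE INTO vocabulary VALUES (?)", [(t,) for t in tags])


def tag_file(db, file_id, tag, predicate="hasTag"):
    """Attach a tag; raises sqlite3.IntegrityError if the tag is not in the vocabulary."""
    db.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (file_id, predicate, tag))


def files_with(db, *tags):
    """Return the ids of files that carry every one of the given tags."""
    placeholders = ",".join("?" * len(tags))
    rows = db.execute(
        f"SELECT file_id FROM triples WHERE tag IN ({placeholders}) "
        "GROUP BY file_id HAVING COUNT(DISTINCT tag) = ? ORDER BY file_id",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]


db = open_store()
add_terms(db, "source-code", "obsolete-system", "drawing")
tag_file(db, "file-1", "source-code")
tag_file(db, "file-1", "obsolete-system")
tag_file(db, "file-2", "source-code")
try:
    tag_file(db, "file-3", "holiday-photos")  # not in the vocabulary
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because a file can carry any number of triples, it can sit in several categories at once without being copied into several directories.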


But how do I identify the files being tagged? A file-system pathname isn't enough: files get moved, and sometimes entire directory trees full of files get moved from one place to another for various pragmatic reasons. A hash code isn't enough either: files get edited, upgraded, recompiled, reformatted, converted from JIS code to UTF-8, and so forth, and images get cropped and colour-corrected. Through all of these changes they should keep their assigned classification tags.
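To make the trade-off concrete, here is a small Python sketch (standard library only) that moves a file, edits it in place, and then saves it the way many editors do, by writing a temporary file and renaming it over the original. It uses the (device, inode) pair as one candidate identifier: that pair survives the move and the in-place edit, the content hash does not survive the edit, and the inode is lost on the write-and-rename save. The file names are invented for the demonstration, and inode numbers only mean something within a single filesystem, so this illustrates the problem rather than solving it.

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """SHA-256 of the file's bytes: changes whenever the content changes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def inode_id(path):
    """(device, inode) pair: names the underlying file, not its pathname."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "draft.txt")
    with open(original, "w") as f:
        f.write("chapter one\n")
    first_id, first_hash = inode_id(original), content_hash(original)

    # 1. Move the file into another directory: its pathname changes.
    os.mkdir(os.path.join(d, "books"))
    moved = os.path.join(d, "books", "draft.txt")
    os.rename(original, moved)

    # 2. Edit it in place: its content hash changes.
    with open(moved, "a") as f:
        f.write("chapter two\n")
    same_inode_after_move_and_edit = inode_id(moved) == first_id
    hash_changed_after_edit = content_hash(moved) != first_hash

    # 3. Save the way many editors do: write a new file, rename it over the old one.
    temp = moved + ".tmp"
    with open(temp, "w") as f:
        f.write("chapter one, revised\n")
    os.replace(temp, moved)
    same_inode_after_safe_save = inode_id(moved) == first_id
```

Copies to another disk and restores from backup also produce new inodes, so any practical scheme would probably have to combine several weak signals rather than rely on one.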

A number of file formats can accommodate metadata, and some software that manipulates files can preserve that metadata and even let the user edit it. But most software doesn't.
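On Linux, one place to keep such metadata outside the file formats themselves is the extended-attribute store, which Python exposes through `os.setxattr` and `os.getxattr` (Linux-only). The sketch below assumes a hypothetical attribute name, `user.tags`, holding a comma-separated tag list. Whether it works depends on the filesystem and mount options, so the functions report failure instead of crashing. As the question notes, much software will not preserve this: plain copies, many editors, and many archive formats drop extended attributes unless explicitly asked to keep them (`rsync`, for example, needs `-X`).

```python
import os
import tempfile

TAG_ATTR = "user.tags"  # hypothetical attribute name in the Linux "user." namespace


def write_tags(path, tags):
    """Store tags in an extended attribute; return False where unsupported."""
    if not hasattr(os, "setxattr"):  # os.setxattr is only available on Linux
        return False
    try:
        # Tags must not contain commas in this simple encoding.
        os.setxattr(path, TAG_ATTR, ",".join(sorted(tags)).encode("utf-8"))
        return True
    except OSError:  # e.g. the filesystem or mount does not allow user xattrs
        return False


def read_tags(path):
    """Return the stored tags, or an empty set if there are none."""
    if not hasattr(os, "getxattr"):
        return set()
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:  # attribute missing, or xattrs unsupported here
        return set()
    return set(raw.decode("utf-8").split(",")) if raw else set()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sketch.svg")
    with open(path, "w"):
        pass  # create an empty file to tag
    stored = write_tags(path, {"drawing", "svg"})
    found = read_tags(path)
```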

Much of it could perhaps be done by automatic content analysis. Other material may require labour-intensive manual classification. Now I don't expect to see any off-the-shelf solution for all of this, but does anyone have ideas as to how to accomplish even some of this? Even poorly? Does anyone know of relevant practical tools? Or have ideas towards tools that *should* exist but currently don't? I'm ready to experiment.
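As a starting point for automatic content analysis, a file's first few bytes often identify its type regardless of its name or location. The sketch below checks a handful of well-known signatures (JPEG, PNG, GIF, PDF, and ELF object code and executables), then falls back to guessing between UTF-8 and Shift_JIS text, Shift_JIS being one common JIS-family encoding. The category labels and the 4 KiB read limit are arbitrary choices, and the SVG check is a crude substring test; a real tool would more likely lean on libmagic, the library behind the `file` command.

```python
import codecs

# Leading-byte signatures for a few common formats
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg-image"),
    (b"\x89PNG\r\n\x1a\n", "png-image"),
    (b"GIF87a", "gif-image"),
    (b"GIF89a", "gif-image"),
    (b"%PDF-", "pdf-document"),
    (b"\x7fELF", "elf-object-or-executable"),
]


def decodes_as(data, encoding):
    """True if data is valid in the encoding, tolerating a sequence cut off at the end."""
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        decoder.decode(data, final=False)
        return True
    except UnicodeDecodeError:
        return False


def sniff(data):
    """Guess a coarse category from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    if b"<svg" in data[:1024]:
        return "svg-drawing"
    if b"\x00" in data:  # NUL bytes rarely appear in 8-bit text
        return "unknown-binary"
    if decodes_as(data, "utf-8"):
        return "utf-8-text"
    if decodes_as(data, "shift_jis"):
        return "shift-jis-text"
    return "unknown-binary"


def classify(path, limit=4096):
    """Read only the start of the file, since signatures live there."""
    with open(path, "rb") as f:
        return sniff(f.read(limit))
```

The resulting labels could seed the tag store automatically, leaving manual classification for what the sniffing cannot tell apart.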

 
  • (Score: 2) by frojack (1554) on Thursday March 05 2015, @04:50AM (#153393)

    Probably not what the OP needs, since he is on Linux.

    Often you don't have the option of appending crap to the name. For instance, I have to index tons of source code, and you don't get to change those names.
    Legal documents are another thing you really can't mess with.

    And a filename-only search means you pretty much have to know the filename, which is unlikely once you get beyond a few hundred thousand files.
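    Content indexing is the usual way around that. A toy inverted index in Python shows the idea: each identifier-like token in a file's text points to the set of files containing it, so a search for parse_header finds both the source file and a note that mentions it without knowing either file's name. The tokenizer regex, the class name, and the example file names are made up for illustration; real indexers and code-search tools tokenize far more carefully.

```python
import re
from collections import defaultdict

# Identifier-like tokens of two or more characters, so parse_header stays whole
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]+")


class ContentIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # lowercased token -> ids of files containing it

    def add(self, file_id, text):
        for token in TOKEN.findall(text):
            self.postings[token.lower()].add(file_id)

    def search(self, *words):
        """Return the files containing all of the given words."""
        if not words:
            return []
        hits = [self.postings.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*hits))


index = ContentIndex()
index.add("src/parse.c", "int parse_header(FILE *fp) { return 0; }")
index.add("notes/todo.txt", "fix bounds check in parse_header")
```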

    --
    No, you are mistaken. I've always had this sig.

  • (Score: 2) by Immerman (3985) on Thursday March 05 2015, @05:13AM (#153406)

    I put the tweaks needed for WINE there for a reason: I mostly run Linux, have pretty much given up on finding a comparable native tool, and it's far too useful to abandon.

    As for filename limitations: yes, source code is a challenge, though I can't say I've seen any content-indexing system that is worth much for source either. Fortunately, source does tend to lend itself to hierarchical organization along module lines. For almost everything else, I've begun abandoning short, condensed names in favor of descriptive ones that don't need to be memorized: "Trigonometry and Geometry quick-reference sheet v14.9.svg" will almost certainly be in the top ten results when I type "geo ref tri", and if not, I'll just keep typing more terms.
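    The "geo ref tri" behaviour can be approximated by treating each space-separated term as a case-insensitive substring that must appear somewhere in the name, in any order. This Python sketch is an assumption about the matching rule, not a description of how Everything is actually implemented.

```python
def matches(name, query):
    """True if every space-separated term of query occurs in name, ignoring case and order."""
    lowered = name.lower()
    return all(term in lowered for term in query.lower().split())


NAME = "Trigonometry and Geometry quick-reference sheet v14.9.svg"
```

    Because every term is required, adding a term can only shrink the result list, which is why typing a few more fragments is an effective way to narrow a search.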

    One of the absolutely essential features of Everything is that I don't have to initiate a search: it shows the matching results as fast as I type, starting with a list of all 200,000+ files on my computer when first launched. Even the fastest results after some "initiate search" trigger (hitting [Enter] or whatever) offer a qualitatively inferior experience.
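    One reason search-as-you-type can stay fast is that, when the user only appends characters, every new match must already be among the previous matches, so the program can filter the previous (shorter) result list instead of rescanning every filename. Below is a Python sketch of that narrowing step, using an all-terms, case-insensitive substring match as a simplifying assumption. The class name and sample filenames are invented, and a real tool would also need a prebuilt, continuously updated list of filenames, which this sketch simply takes as given.

```python
class IncrementalSearch:
    """Narrow a filename list as the query grows one keystroke at a time."""

    def __init__(self, names):
        self.all_names = list(names)
        self.last_query = ""
        self.last_results = self.all_names

    def update(self, query):
        terms = query.lower().split()
        # Appending characters can only add or lengthen terms, so matches
        # for the new query are a subset of matches for the old one
        if query.startswith(self.last_query):
            pool = self.last_results
        else:
            pool = self.all_names  # e.g. after a backspace: rescan everything
        results = [n for n in pool if all(t in n.lower() for t in terms)]
        self.last_query, self.last_results = query, results
        return results


search = IncrementalSearch([
    "Trigonometry and Geometry quick-reference sheet v14.9.svg",
    "geology field notes.txt",
    "invoice.pdf",
])
```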

  • (Score: 2) by hendrikboom (1125) on Thursday March 05 2015, @07:25PM (#153606)

    Even though Everything is Windows-only, it's still worth discussing, because its salient features could perhaps be implemented in a Linux program. The OP *is* interested in tools that should exist but don't, after all.

    -- the OP

    • (Score: 2) by frojack (1554) on Thursday March 05 2015, @10:48PM (#153671)

      > The OP *is* interested in tools that should exist but don't, after all.

      The tools do exist. The OP just doesn't know what or where they are.

      --
      No, you are mistaken. I've always had this sig.