Stories
Slash Boxes
Comments

SoylentNews is people

posted by LaminatorX on Sunday March 16 2014, @03:28AM   Printer-friendly
from the premature-optimization-is-the-root-of-all-evil dept.

Subsentient writes:

"I've been writing C for quite some time, but I never followed good conventions I'm afraid, and I never payed much attention to the optimization tricks of the higher C programmers. Sure, I use const when I can, I use the pointer methods for manual string copying, I even use register for all the good that does with modern compilers, but now, I'm trying to write a C-string handling library for personal use, but I need speed, and I really don't want to use inline ASM. So, I am wondering, what would other Soylenters do to write efficient, pure, standards-compliant C?"

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Interesting) by ls671 on Sunday March 16 2014, @03:42AM

    by ls671 (891) on Sunday March 16 2014, @03:42AM (#17061) Homepage

    Look at the java implementation of Strings. It is pretty interesting. Strings are final and are reused. So if you have 1000 instances of the String "yes" in your program at once, only one area of the memory will contain "yes".

    There is other nifty tricks as well in the string copy operations.

    Get the source code.

    --
    Everything I write is lies, including this sentence.
    Starting Score:    1  point
    Moderation   +4  
       Interesting=3, Informative=1, Total=4
    Extra 'Interesting' Modifier   0  
    Karma-Bonus Modifier   +1  

    Total Score:   5  
  • (Score: -1, Troll) by Ethanol-fueled on Sunday March 16 2014, @03:56AM

    by Ethanol-fueled (2792) on Sunday March 16 2014, @03:56AM (#17069) Homepage

    We call those "const" in my hood, nigga.

    • (Score: 3, Informative) by ls671 on Sunday March 16 2014, @04:33AM

      by ls671 (891) on Sunday March 16 2014, @04:33AM (#17075) Homepage

      "const" don't have the copy on write feature. You can't even write them. So sorry, we are not talking about the same thing..

      --
      Everything I write is lies, including this sentence.
    • (Score: 1) by sjames on Sunday March 16 2014, @06:32AM

      by sjames (2882) on Sunday March 16 2014, @06:32AM (#17106) Journal

      Not the same thing at all. Const can't be assigned at all (other than the initial assignment). Thi is entiorely different. There is a string pool containing one and only one copy of every string assigned anywhere. If you have char *a="hello " and char *b="there" and char *c = "hello there!", there will be 3 strings stored. If you concatenate a and b assigned to char *d, d will point to the same instance of "hello there" as c does (and it's reference count will be incremented).

      Unlike a const, you can then do d = "Never Mind".

      • (Score: 0) by Anonymous Coward on Sunday March 16 2014, @03:32PM

        by Anonymous Coward on Sunday March 16 2014, @03:32PM (#17191)

        You mean whenever I change a string in Java, the JVM runs through ALL my strings in memory to make sure I don't have a duplicate string already? and there's no opting out??

        Yikes, that sounds like a huge waste of CPU cycles. I'll happily trade memory to get that performance back, thanks.

  • (Score: 3, Informative) by zip on Sunday March 16 2014, @04:04AM

    by zip (702) on Sunday March 16 2014, @04:04AM (#17072)

    I'm not sure if this still applies with newer versions, but maybe it's unrelated: https://stackoverflow.com/questions/16123446/java- 7-string-substring-complexity [stackoverflow.com]

    But as far as I know, Qt strings still behave this way.

    • (Score: 2) by ls671 on Monday March 17 2014, @02:07AM

      by ls671 (891) on Monday March 17 2014, @02:07AM (#17346) Homepage

      Before:

      String a = "1234567890";
      a = a.substring(0,2); // a is now "12"

      1) No memory copy has occured.
      2) "1234567890" is still kept into memory with a reference count of 1

      Now (apparently according to the link you posted):

      String a = "1234567890";
      a = a.substring(0,2); // a is now "12"

      1) Memory copy has occured.
      2) "1234567890" is eligible for garbage collection while a new area of memory has been assigned to "12".

      IMHO, there is work to be done to combine the 2 approaches. Hint: Keep old behavior but revert back to new behavior in the background after a given TTL where "1234567890" hasn't been accessed, then transparently copy the substring "12" to a new memory area.

      I know it sounds very complicated but it is pretty simple compared to the many flavors of garbage collection floating around ;-)

      --
      Everything I write is lies, including this sentence.
  • (Score: 1) by germanbird on Sunday March 16 2014, @05:17AM

    by germanbird (2619) on Sunday March 16 2014, @05:17AM (#17090)

    As cool as the java implementation is, it may not be what the original poster is looking for. Sure it is very memory efficient, but optimizing for memory tends to reduce performance (as optimizing for performance often increases memory usage).

    Maintaining a "library" of final/immutable strings means that every time you create or modify a string, you will have to search the list of existing strings for an existing copy. Likewise any destroyed or modified strings may need to remove an entry from the list. At this point, you are implementing some sort of a reference counting mechanism and/or garbage collection. Interesting (well, at least interesting to certain code geeks) problems to solve and play with, but maybe not exactly what you are looking for.

    The big implication of all this is that anytime you modify a string, a new immutable string is (possibly) created. So think about appending text in a loop. For every iteration of that loop, a new immutable string is added (or found) in your library of strings. This can make certain text operations pretty expensive.

    I think all of this stuff is worth thinking through, but if you are just looking to write a library to help you with or abstract common string operations, this is probably more work than you need to do. Either way from your original post, it sounds like you care more about cpu performance than memory usage, which probably means this is the wrong direction for you.

    Either way, food for thought and a good mental exercise.

    • (Score: 3, Insightful) by jorl17 on Sunday March 16 2014, @05:19AM

      by jorl17 (3747) on Sunday March 16 2014, @05:19AM (#17091)

      It really depends on the context of application. If he wants to manipulate strings ever so often, then that seems like an overkill. If he's going to be crunching strings like beta crunches F-bombs, then it might be a good idea to go beyond that.

      I often say that Context Information is everything. And it really is.

    • (Score: 1) by TheGratefulNet on Sunday March 16 2014, @06:00AM

      by TheGratefulNet (659) on Sunday March 16 2014, @06:00AM (#17102)

      deduplication is great for disks, but if I have something in memory, I want it to be fast.

      it sounds 'elegant' for java to de-dupe strings but I'm not sure its going to be faster. I never pick elegant of speed, personally. elegant is for school. simple and obvious is much better for production code. and not having to search for instances of a string, just to de-dupe it, is not at all simple or fast.

      --
      "It is now safe to switch off your computer."
      • (Score: 2) by Aighearach on Monday March 17 2014, @09:10PM

        by Aighearach (2621) on Monday March 17 2014, @09:10PM (#17799)

        As a Rubyist I challenge the idea that elegant is at odds with simple. The simpler code is generally more elegant.
        It is the more clever code that uses the excuse of being faster, not the more elegant. Elegant code is faster where the algorithm it uses is simpler. That is the essence of an elegant improvement; the code gets simpler and typically faster.

    • (Score: 1) by sjames on Sunday March 16 2014, @07:20AM

      by sjames (2882) on Sunday March 16 2014, @07:20AM (#17113) Journal

      There are tradeoffs. Yes, you do extra work in some cases, but in others you do a lot less. For example, getting the length of the string becomes trivial rather than scanning the buffer (potentially wiping out your cache) Comparisons are, of coure, trivial. You do more accounting when you cat the strings, but you do less scanning and there's no chance of overrunning the buffer. It can also be a win if you have good optimization.

      Which is overall better and faster will strongly depend on what you're doing. I do suspect that on average final strings will be a bit slower but safer.

      • (Score: 2) by maxwell demon on Sunday March 16 2014, @12:15PM

        by maxwell demon (1608) on Sunday March 16 2014, @12:15PM (#17158) Journal

        Any sane string implementation (that is, not C's) will have a length field (or, alternatively, an additional pointer to end), so you don't have to scan the string to find out its length (an O(n) operation which, as you noted, will do bad things to your cache), but can just read the length (or calculate it in O(1) by simple pointer subtraction). This also means you'll not arbitrarily restrict the characters which can be stored in your string (C applications are generally easily identified by them being tripped off by a zero byte in positions supposed to contain text).

        --
        The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2) by Debvgger on Sunday March 16 2014, @08:30AM

    by Debvgger (545) on Sunday March 16 2014, @08:30AM (#17124)

    In C, if the strings are constant (I mean, if they are hard-coded) their memory positions are reused, too. This is one of the most basic optimizations C compilers do.

  • (Score: 1) by koja on Sunday March 16 2014, @08:02PM

    by koja (3832) on Sunday March 16 2014, @08:02PM (#17250)

    Doesn't need to go as far as java lies. std::string in C++ is much nearer and has the same COW optimization.
    https://en.wikipedia.org/wiki/Copy-on-write [wikipedia.org]