Journal of turgid (4318)

The Fine Print: The following are owned by whoever posted them. We are not responsible for them in any way.
Sunday February 26, 2023
10:45 PM
Code

What are the requirements? How can I implement one?

Have at ye!

  • (Score: 4, Interesting) by istartedi on Monday February 27, @02:47AM (6 children)

    by istartedi (123) on Monday February 27, @02:47AM (#1293496) Journal

    I've implemented a simple reference-counted language. It's fun. The answer to your question depends on what you already know about language development in general. Memory management is just one aspect of it. Terms to google: parser, lexer, "reference counting", "garbage collection", "mark and sweep", "incremental garbage collector", etc. This is just off the top of my head. There are many facets to language design, and many reasons to do it (or not do it).

    • (Score: 2) by DannyB on Monday February 27, @06:04PM (4 children)

      by DannyB (5839) Subscriber Badge on Monday February 27, @06:04PM (#1293551) Journal

      Once you've experienced garbage collection, and think about it and its implementation, sooner or later the whole idea of reference counting should occur to you. It did to me in the early 90s, until I realized that it cannot handle circular data structures. Object A points to Object B, which points to a series of objects eventually pointing back to A. You'll need a tracing GC to find that one.
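      A toy sketch (hypothetical names, not any real runtime) makes the failure concrete: with naive reference counting, one cycle A -> B -> A keeps both counts at one forever after the last outside handle is dropped.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0          # reference count
        self.ref = None      # single outgoing reference, for simplicity

def add_ref(obj):
    if obj is not None:
        obj.rc += 1

def release(obj):
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        # "free" the object: recursively drop its outgoing reference
        release(obj.ref)
        obj.ref = None

# Build the cycle A -> B -> A, plus one external reference to A.
a, b = Obj("A"), Obj("B")
a.ref = b; add_ref(b)     # A -> B
b.ref = a; add_ref(a)     # B -> A  (the cycle)
add_ref(a)                # an external handle to A: rc(A) == 2

# Drop the only external reference: both objects are now unreachable,
# but naive RC never frees them because the cycle keeps both rc >= 1.
release(a)
print(a.rc, b.rc)   # 1 1 -- a leak that a tracing GC would find
```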

      --
      How often should I have my memory checked? I used to know but...
      • (Score: 1, Insightful) by Anonymous Coward on Wednesday March 01, @05:47AM (3 children)

        by Anonymous Coward on Wednesday March 01, @05:47AM (#1293855)

        A tracing GC is probably the simplest way to handle cycles like that, but it isn't the only one. Non-naive implementations of reference counting can handle them just fine. One popular approach has references of different strengths, with the data model arranged so that only weak references can be used in the situations that would otherwise create cycles. A different way is to just prevent cycles by keeping certain types of objects from referring to other types that could make circular references possible.
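        The weak/strong split is easiest to see in a language that already ships it. A small sketch using Python's weakref module (the Node class is made up for illustration): the child's back-pointer is weak, so parent and child never form a strong cycle, and plain refcounting reclaims the parent on its own.

```python
import weakref

class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []           # strong references downward
        # weak reference upward, so child -> parent is never a strong cycle
        self._parent = weakref.ref(parent) if parent is not None else None

    @property
    def parent(self):
        # a dead weak reference returns None instead of dangling
        return self._parent() if self._parent is not None else None

root = Node("root")
child = Node("child", parent=root)
root.children.append(child)

assert child.parent is root   # the back-pointer still works
del root                      # drop the only strong reference to root
# CPython's refcounting frees root immediately; the weak reference
# now reports the referent is gone rather than keeping it alive.
print(child.parent)           # None
```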

        Another is to do that implicitly by differentiating between non-collection and collection types (objects that can refer to other objects). For non-collection types, you use a standard RC implementation. For collection types, you track strong and weak references separately, and you only allow a strong reference to a collection from another collection at instantiation. In some data models, that alone is enough to prevent cycles. In the more generalized implementation, you keep the restriction above, and when the strong references to an object no longer exist you attempt to convert one of its weak references to a strong reference. If you don't find a weak candidate to convert, then that object has no strong references and can be collected, and the entire cycle is collected in turn. There are a number of algorithms that can do that without having to trace from the root.

        There is also naive age of allocation, a subset of reference counting. When the oldest reference is dropped in a pure implementation, the object and all its children can be freed, because the only remaining references are circular by definition. Or you can combine standard naive reference counting with allocation aging: you keep track of the number of references and the age of the oldest one. When you remove a reference, you decrement the counter. If the counter reaches zero, you can delete the object. If not, you use the age-propagation algorithm to mark the collections as having an unknown age (which technically means running it twice, because removing a reference is a mutation as well, but there are ways to avoid that). This has the side effect of detecting cycles without other references, which you can then handle appropriately.

        There are other reference counting schemes that can do it as well. But as those examples make clear, you have to get a little bit more fancy than just naive reference counting in language models that allow strong reference cycles.
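        One of those fancier schemes can be sketched briefly: trial deletion, the idea behind Bacon/Rajan-style cycle collectors. Subtract the references that come from inside a suspect subgraph; whatever still has a positive count is externally reachable, and everything else, including pure cycles, is garbage. The data layout below is hypothetical, just to show the shape of the algorithm.

```python
def find_garbage_cycles(objects, refs, rc):
    """objects: list of ids; refs: {id: [ids it points to]};
    rc: total refcount per id (internal + external references)."""
    # Trial-delete: remove the contribution of internal edges.
    trial = dict(rc)
    for src in objects:
        for dst in refs.get(src, []):
            trial[dst] -= 1
    # Anything still positive is referenced from outside the subgraph;
    # everything reachable from it is live too.
    live, stack = set(), [o for o in objects if trial[o] > 0]
    while stack:
        o = stack.pop()
        if o in live:
            continue
        live.add(o)
        stack.extend(refs.get(o, []))
    # The rest is garbage, including cycles, found without root tracing.
    return [o for o in objects if o not in live]

# A -> B -> A is a dead cycle; C is held by one external reference.
refs = {"A": ["B"], "B": ["A"], "C": []}
rc = {"A": 1, "B": 1, "C": 1}   # A: from B; B: from A; C: external
print(find_garbage_cycles(["A", "B", "C"], refs, rc))   # ['A', 'B']
```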

        • (Score: 3, Interesting) by DannyB on Wednesday March 01, @03:13PM (2 children)

          by DannyB (5839) Subscriber Badge on Wednesday March 01, @03:13PM (#1293894) Journal

          A different way is to just prevent cycles by keeping certain types of objects from referring to other types that could make circular references possible.

          That approach limits what the programmer can express in the language. Tracing GC imposes no such limitations upon the programmer.

          Optimize for programmer productivity. Optimize for dollars. After that, and at lower priority to those first two, optimize for cpu cycles and bytes. But cpu cycles and bytes are not the top priority for most purposes. Computers are cheap and getting cheaper. Programmers are expensive and getting more expensiver.

          --
          How often should I have my memory checked? I used to know but...
          • (Score: 0) by Anonymous Coward on Wednesday March 01, @07:44PM

            by Anonymous Coward on Wednesday March 01, @07:44PM (#1293935)

            Programmers are expensive and getting more expensiver.

            The trends in higher education suggest programmers are going the way of lawyers within the next few years. Everybody and their parents know that becoming a programmer will get you big money, so the number of CS majors is booming. The same thing happened around the dot-com crash, and enrollment crashed just after it; the notorious volatility of tech companies may dampen the rate this time, but probably not enough.

          • (Score: 1, Interesting) by Anonymous Coward on Wednesday March 01, @11:15PM

            by Anonymous Coward on Wednesday March 01, @11:15PM (#1293972)

            You'd think so, but the reduced expressiveness isn't necessarily an issue for programmers. The language paradigm could be declarative to the point that the lack of imperative control isn't noticed. Or the natural relationships in the type hierarchy make such references unnecessary. Or there are immutability restrictions. Not that those are the only three choices. Unidirectional heaps are more common than most people think because of the power they provide. Besides, if there were one perfect method for memory management, that method would be used everywhere.

    • (Score: 2) by DannyB on Monday February 27, @06:18PM

      by DannyB (5839) Subscriber Badge on Monday February 27, @06:18PM (#1293554) Journal

      Rather, lexer and then parser. I got fascinated with these in my Common Lisp (CL) days. NFAs. The algorithm to convert an NFA into a DFA state table. I implemented all that in CL. I used a CL reader macro so it was possible to write RegExes directly into CL expressions, producing my particular RegEx data structure used by my implementation. This made it easy to write lexical analyzers. I extended the RegEx to have more flexibility than a mere regular language, but don't remember the precise details right now. (I do have the source code somewhere.)
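      For anyone who wants to see the NFA-to-DFA conversion without digging up old CL source, here is a compact, much-simplified sketch of the subset construction. The NFA is hand-built and hypothetical (it recognizes (a|b)*ab), not produced by any reader macro.

```python
from collections import deque

def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` via epsilon ("") edges."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(nfa, start, alphabet):
    """Subset construction: each DFA state is a set of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    table, work = {}, deque([start_set])
    while work:
        cur = work.popleft()
        if cur in table:
            continue
        table[cur] = {}
        for ch in alphabet:
            nxt = set()
            for s in cur:
                nxt |= set(nfa.get(s, {}).get(ch, ()))
            if nxt:
                target = epsilon_closure(nfa, nxt)
                table[cur][ch] = target
                work.append(target)
    return start_set, table

# NFA for (a|b)*ab: state 0 loops on a/b, 0 -a-> 1, 1 -b-> 2 (accept).
nfa = {0: {"a": [0, 1], "b": [0]}, 1: {"b": [2]}}
start, table = nfa_to_dfa(nfa, 0, "ab")

# Drive the resulting DFA state table over an input string.
state = start
for ch in "aab":
    state = table[state][ch]
print(2 in state)   # True: "aab" ends with "ab"
```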

      As I played with parsers, I really ended up favoring nice neato recursive descent parsers. They could handle anything I wanted to throw at them. Well, except maybe C++. But I wasn't going to implement a parser for that. I was more interested in parsing Pascal or Modula-3 or other languages.
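      Recursive descent really is this small. A hedged sketch (grammar and names invented for illustration) that parses and evaluates arithmetic, one function per grammar rule:

```python
import re

def tokenize(src):
    # numbers, parentheses, and the four operators
    return re.findall(r"\d+|[()+\-*/]", src)

class Parser:
    # expr   -> term (('+'|'-') term)*
    # term   -> factor (('*'|'/') factor)*
    # factor -> NUMBER | '(' expr ')'
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected=None):
        t = self.peek()
        assert t is not None and (expected is None or t == expected)
        self.pos += 1
        return t

    def expr(self):
        v = self.term()
        while self.peek() in ("+", "-"):
            v = v + self.term() if self.eat() == "+" else v - self.term()
        return v

    def term(self):
        v = self.factor()
        while self.peek() in ("*", "/"):
            v = v * self.factor() if self.eat() == "*" else v / self.factor()
        return v

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            v = self.expr()
            self.eat(")")
            return v
        return int(self.eat())

print(Parser(tokenize("2*(3+4)-5")).expr())   # 9
```

The call stack mirrors the grammar, which is why the technique scales to Pascal-sized languages so comfortably.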

      Going down the rabbit hole here, I then realized that this only gets you half a compiler. You can have one or more internal representations, and various transformations on those to do optimizations. But at some point you've got to dive down into the ugly details of the machine and generate machine code.

      I never went there. There were too many other fun things to do. But I did read up on the subject any time interesting articles came along. I was particularly fascinated with the whole SSA (Static Single Assignment) idea when that came along. I can't remember if it was an article or a video. But it was brilliant work.

      Naturally I think LLVM is a great tool for generating the back end of a compiler for multiple targets. I appreciated that you can only write something to a register exactly one time. No changes allowed after that.
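      The write-once property is easiest to see on straight-line code. A toy sketch (input format invented for illustration) that renames plain assignments into SSA form, so every destination is defined exactly once and every use names the defining version:

```python
import re

def to_ssa(lines):
    """Rename straight-line assignments like 'x = a + b' into SSA form.
    Version 0 stands for a variable's incoming (undefined-here) value."""
    version = {}          # latest SSA version of each variable
    out = []
    for line in lines:
        dest, rhs = [s.strip() for s in line.split("=", 1)]
        # rewrite every use on the right-hand side to its current version
        rhs = re.sub(r"\b[a-z]\w*\b",
                     lambda m: f"{m.group(0)}{version.get(m.group(0), 0)}",
                     rhs)
        # every definition gets a fresh version: assigned exactly once
        version[dest] = version.get(dest, 0) + 1
        out.append(f"{dest}{version[dest]} = {rhs}")
    return out

for s in to_ssa(["x = a + b", "x = x * 2", "y = x + a"]):
    print(s)
# x1 = a0 + b0
# x2 = x1 * 2
# y1 = x2 + a0
```

Real SSA construction also has to insert phi nodes where control-flow paths merge; this sketch deliberately stops short of that.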

      Finally, I also notice that Chez Scheme is now fully open source. It compiles down to real machine code through a series of transforms that first convert to CPS (continuation passing style) and then down to machine code. There are definitely one or more YouTube videos on this; they were uploaded presentations from some Clojure conference. (A few years ago, before covid but definitely after 2014.) I always wanted to get this code and study it. But I seem to have less free time and less energy than decades ago.

      --
      How often should I have my memory checked? I used to know but...
  • (Score: 2, Insightful) by Anonymous Coward on Monday February 27, @03:55AM (2 children)

    by Anonymous Coward on Monday February 27, @03:55AM (#1293508)

    Memory safety is the requirement that any pointer used always points to valid memory. There are a number of ways to guarantee that in a language. The best way to do it is to make the rules of the language and its abstract machine not allow such bad pointers to exist. You can accomplish that by forbidding it in the data model or execution model. Once you do that, any implementation technique will be correct, since the underlying behavior of the implementation is independent from that of the abstract machine. So pick any form (or combination) of garbage collection you like (root tracing, allocation aging, reference counting, smart pointers, escape analysis, thread ownership, the stack, etc.) and any form of buffer protection you like (transparent dependent typing, index checking, runtime detection, static calculation, etc.), and then go to town.
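    One concrete flavor of "the model forbids bad pointers, so any implementation works": generation-checked handles, a technique often used in arena allocators. Programs hold (index, generation) pairs instead of raw pointers, so a stale handle is detected instead of dereferencing freed memory. The Arena class below is a hypothetical sketch, not any particular allocator's API.

```python
class Arena:
    def __init__(self):
        self.slots = []       # stored values (None when freed)
        self.gens = []        # generation counter per slot
        self.free_list = []   # reusable slot indices

    def alloc(self, value):
        if self.free_list:
            i = self.free_list.pop()
            self.slots[i] = value
        else:
            i = len(self.slots)
            self.slots.append(value)
            self.gens.append(0)
        return (i, self.gens[i])          # the "pointer" the program holds

    def get(self, handle):
        i, gen = handle
        if i >= len(self.slots) or self.gens[i] != gen:
            raise ValueError("dangling handle")   # detected, not undefined
        return self.slots[i]

    def free(self, handle):
        i, gen = handle
        if self.gens[i] != gen:
            raise ValueError("double free")
        self.slots[i] = None
        self.gens[i] += 1                 # invalidates every old handle
        self.free_list.append(i)

arena = Arena()
h = arena.alloc("hello")
print(arena.get(h))           # hello
arena.free(h)
h2 = arena.alloc("world")     # reuses slot 0 under a new generation
try:
    arena.get(h)              # stale handle to reused memory
except ValueError as err:
    print(err)                # dangling handle
```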

    The real trick, then, is to decide how you want to forbid it in the language model and abstract machine. But that is a matter of personal taste and can depend heavily on the "feel" you are going for. Do you want scope-based lifetimes for everything? Do you want immutable variables? What are variables in your language? Do you want provable compile-time correctness? Does the abstract machine use call by value, reference, pointer, or a combination? Do you want to allow pointers at all? Where does your language's type system fall on the four major axes (strong vs weak, static vs dynamic, explicit vs implicit, structural vs nominal)? In many ways, these model/AM decisions are the most important, because tacking on the requirement that all pointers point to valid memory is relatively easy after those and can be highly dependent on them.

    • (Score: 2) by DannyB on Monday February 27, @06:38PM (1 child)

      by DannyB (5839) Subscriber Badge on Monday February 27, @06:38PM (#1293558) Journal

      After seeing how radically far both Red Hat and Oracle (independently of each other) have taken tracing GC in Java, I now consider that a "solved" problem. At least for the rest of my lifetime. I now know just what is actually possible. I am impressed and feel like I've arrived in some sci fi future.

      You can have a heap of many Terabytes (TB) with maximum GC pause times of 1 millisecond. This is the product of a couple decades of research.

      Red Hat's implementation is the Shenandoah Garbage Collector.

      Oracle's independent implementation is their ZGC (the Z Garbage Collector).

      Another very noteworthy mention is Azul Systems' C4, or Continuously Concurrent Compacting Collector.

      Java has been an open platform for research. The Java JIT may be the most advanced compiler on the planet, even though its "source code" is JVM bytecode. IBM opened this up with its OpenJ9 implementation of Java. Let the research projects begin. And they did.

      I sincerely wish that other GC languages and runtimes could get this. (Python, Go, etc., even Microsoft's .NET)

      Once you have this kind of GC, what more is there to wish for in a memory safe runtime system? Not just Java, but any language that compiles to JVM bytecode and runs on the JVM. Real tracing GC. No limitations of compile-time escape analysis. No performance issues of reference counting. As I have pointed out, GC that happens on other CPU cores actually speeds up the primary workload. The primary workload does not see one single cycle of memory management overhead. It can just malloc away. You do need more hardware, but this is cheap and getting cheaper. Considering the hardware dedicated to playing games, I don't exactly consider it extravagant or wasteful to have another core doing GC. Or many cores doing GC in a big system.


      Aside: Red Hat, Oracle, IBM, Microsoft and others that contribute to Java's OpenJDK don't just do this out of the goodness of their hearts. They think this is profitable for them and in their best business interests. I say that for the Java haters on SN who really don't know what they're talking about, and who have said before that I like GC because I really don't know how to manually manage memory, like back in the day with languages like Pascal and C.
      --
      How often should I have my memory checked? I used to know but...
      • (Score: 1, Informative) by Anonymous Coward on Monday February 27, @11:29PM

        by Anonymous Coward on Monday February 27, @11:29PM (#1293609)

        If you think that is neat, just wait for Java's implementation of green threads. They not only improve performance of your threads, but have the additional benefit of reducing your root set. This can make even more workloads pauseless, even without specifically engineering a pauseless process.

  • (Score: 0) by Anonymous Coward on Monday February 27, @08:35PM (1 child)

    by Anonymous Coward on Monday February 27, @08:35PM (#1293587)

    Figure out whether you're making an interpreted or compiled or interpreted and compiled language, and what the basic data structures and operations are, and how much work you want to do to make it work, and how much you care about being called names by programmers and academics. The solution will just fall out naturally at that point ;)

    • (Score: 2) by DannyB on Monday February 27, @10:47PM

      by DannyB (5839) Subscriber Badge on Monday February 27, @10:47PM (#1293604) Journal

      Hopefully, new languages are an effort to improve the current state of the art in some way: either to make existing types of programs easier to write, requiring less human effort (which saves dollars), or to make the most complex of problems easier to solve than with any current language.

      Or both.

      --
      How often should I have my memory checked? I used to know but...