Programming thread -

ConcernedAnon

Concerned and hopefully anonymous
kiwifarms.net
So I was watching this video about assembly code and I encountered something that muddied the waters for me.
He mentions that C++ and C# are for virtual machines, exes, and drivers, but he makes no mention of operating systems. Isn't Windows written in those languages? I thought a virtual machine was supposed to be an OS loaded entirely into random access memory as a security feature, or for running old software for backwards compatibility. Is he saying that operating systems and virtual machines are the same thing?
EXEs and drivers typically contain machine code, machine code being in a sense the language that machines understand, but they require Windows/Linux/an OS to actually get them running. I don't know exactly why he's grouping these with JITed virtual machine languages like C#, but whatever. Virtual machine generally means "a program that runs other programs", so I guess you could compare that aspect of an OS to something like the CLR (the virtual machine that C# code runs on).

Here's the deal essentially, you have a language called machine code which computer processors can actually understand, but unfortunately it's literally just a big pile of numbers and is basically unreadable to humans. Heaven forbid you ever have to write some machine code by hand.
So then you have assembly, which is sort of like a human-readable version of machine code, i.e. it's the pinyin to machine code's hanzi. Programs called assemblers turn the assembly into machine code. This is a little better, but it's still not a good way of visualizing or tackling most programming tasks.
To assist with that you have languages like C++, which are translated to assembly by compilers. An exe or driver would typically be written in C/C++/assembly or some other language that translates to machine code, because these applications typically require performance and access to machine code features.
C++ and languages like it have a lot of gotchas however, so then you have virtual machines, which are programs written in those same types of "low level" languages typically (C etc.). Virtual machines act as translators between more convenient languages like Python, and machine code.
Finally you get your languages like Python and JavaScript which are read and executed by some virtual machine. They have the advantage of being "abstracted" from machine code which means they can operate in more intuitive ways.

The low level languages are useful when you need speed or fine control, and the high level languages are useful for general tasks.
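You can feel that tradeoff without leaving Python: compare a loop that runs as interpreted bytecode with the same loop done by the C-implemented `sum` builtin. A rough sketch (the exact timings will vary by machine, but the gap is consistently large):

Python:
```python
import timeit

# A pure-Python loop: every iteration goes through the bytecode interpreter.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

# The built-in sum() does the same loop in C, below the virtual machine.
def fast_sum(n):
    return sum(range(n))

assert slow_sum(1_000_000) == fast_sum(1_000_000) == 499999500000

loop_time = timeit.timeit(lambda: slow_sum(100_000), number=20)
c_time = timeit.timeit(lambda: fast_sum(100_000), number=20)
print(f"pure Python: {loop_time:.3f}s, C builtin: {c_time:.3f}s")
```

Same answer either way; the only difference is which level of the stack does the grunt work.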


I'm at the point where I see no reason to bother with C, C++ or anything else that isn't JavaScript. I'm pretty sure everything will switch to JS in the future.
Consider this though: the JS runtime will have to be written in C/C++/Rust/whatever. Unless you are suggesting the diabolical: we start emulating JavaScript using JavaScript :cryblood:
 
Last edited:

MarvinTheParanoidAndroid

This will all end in tears, I just know it.
True & Honest Fan
kiwifarms.net
C++ and languages like it have a lot of gotchas however, so then you have virtual machines, which are programs written in those same types of "low level" languages typically (C etc.). Virtual machines act as translators between more convenient languages like Python, and machine code.
So you're saying that virtual machines are basically like a different kind of assembler or compiler?

What exactly is a virtual machine and how does it fit into the programming hierarchy & computer architecture?
 

Cryonic Haunted Bullets

Niemals schlafen! Alles Lügen!
kiwifarms.net
So you're saying that virtual machines are basically like a different kind of assembler or compiler?

What exactly is a virtual machine and how does it fit into the programming hierarchy & computer architecture?
Interpreted languages like Python compile to special bytecode. This is not machine code - it can only run on the language's VM. The VM provides a fake "CPU" and memory model for the bytecode to operate on, divorced from the actual hardware implementation and semantics. In this way you can create a higher-level "base" for the language to work on.
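You can even hold that bytecode in your hands. A quick CPython sketch (`co_code` is a CPython implementation detail, so the exact bytes vary between versions):

Python:
```python
# Compile a Python expression down to bytecode without running it.
code = compile("a + b", "<example>", "eval")

# The "machine code" of the Python VM is literally a string of bytes...
print(code.co_code)

# ...which no physical CPU understands, but the interpreter can execute:
print(eval(code, {"a": 1, "b": 2}))  # 3
```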

Let me give you an example - a simple function implemented in C and in Python:

C:
int add(int a, int b) {
    return a + b;
}

int main(int argc, char** argv) {
    add(1, 2);
}
Python:
def add(a, b):
    return a + b

def main():
    add(1, 2)
We can obtain the C machine code by compiling the source and running objdump on the executable. Here's the assembly for the relevant functions:

Code:
add:
  push rbp
  mov rbp, rsp
  mov DWORD PTR [rbp-4], edi
  mov DWORD PTR [rbp-8], esi
  mov edx, DWORD PTR [rbp-4]
  mov eax, DWORD PTR [rbp-8]
  add eax, edx
  pop rbp
  ret

main:
  ; unrelated stuff
  mov esi, 2
  mov edi, 1
  call add
  ; yet more unrelated stuff
This is as bare-bones as assembly gets.

First it pushes the initial value of the stack base pointer (rbp) onto the top of the stack. Then it moves the current stack pointer (rsp) into the base pointer. This is called the function prologue. Think of function calls (and their stack variables) as levels of a skyscraper: rbp is the floor, and rsp is the ceiling. The prologue essentially creates a new level in the call skyscraper by setting the "floor" of this call to the "ceiling" of the last one. This allows the rest of the function to ignore the stack frames above and below it.

The function then moves edi and esi (the registers containing the arguments) onto the stack, directly "above" the floor ("above" means a negative offset - assembly likes to fuck with people like that). The arguments are then moved back into registers, this time edx (no significance) and eax (called the accumulator, often used to hold return values). The pointless round trip from registers to the stack and back happens because this was compiled without optimization: at -O0 the compiler gives every argument and local variable its own stack slot and makes no attempt to keep values in registers. Compile with -O2 and the whole function collapses to a single lea and a ret. The function (finally) adds edx to eax, leaving the result in eax. Then it pops the value of the base pointer that was pushed in the prologue, and returns, jumping back to where it was called. You can see how the main function moves the constants 1 and 2 into the argument registers, then calls the add function.

If you don't get it, that's OK. I'm trying to make the point that machine code has to deal with a lot of low-level things that are irrelevant to the computation being done, like micro-managing registers and the memory layout.

Now let's look at the Python bytecode. We can do this with the dis module:

Python:
>>> import dis
>>> dis.dis(add)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE

>>> dis.dis(main)
  2           0 LOAD_GLOBAL              0 (add)
              2 LOAD_CONST               1 (1)
              4 LOAD_CONST               2 (2)
              6 CALL_FUNCTION            2
              8 POP_TOP
             10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
Coming from assembly, this is totally foreign. There are no registers, only a stack. Let's take a look at the add function:

It calls the LOAD_FAST instruction twice with variables a and b. This pushes them from the local namespace onto the stack. Then it calls BINARY_ADD, which pops the top two values off the stack and pushes their sum. Then it returns. The return value is implicit: the main function puts the arguments on the stack for the function to use, and the return value is whatever's left on the stack when the function's had its way. Notice how, since there are no registers, everything centers around the stack. This makes bytecode programming and code generation easier, but at the cost of performance (using registers is orders of magnitude faster than getting all your data from main memory).

There's no actual computer that works like this. Hardware with that level of complexity would be impossible to produce at a reasonable price, and useless for most purposes. What you can do is write a program in a low-level language (like this one) that pretends to be a computer that works with this memory model and instruction set. This way, you can leave the nitty-gritty stuff to the low-level implementation, and your language's compiler only has to worry about producing good bytecode.
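To make that concrete, here's a toy interpreter for just the three opcodes from the `add` example above. The opcode names mirror CPython's, but this is emphatically not CPython - the "hardware" is a Python list used as a stack and a chain of if/elif handlers:

Python:
```python
# A toy stack-based VM: each instruction is an (opcode, argument) pair.
def run(program):
    stack = []
    for op, arg in program:
        if op == "LOAD_CONST":
            stack.append(arg)          # push a constant
        elif op == "BINARY_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)        # pop two values, push their sum
        elif op == "RETURN_VALUE":
            return stack.pop()         # whatever's on top is the result
        else:
            raise ValueError(f"unknown opcode {op}")

# Equivalent of add(1, 2): push the constants, add, return.
program = [
    ("LOAD_CONST", 1),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program))  # 3
```

Notice there are no registers and no memory layout to manage - exactly the simplification the real VM buys for the language on top of it.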

There's also a mixed approach called a JIT compiler, which compiles the most heavily used parts of a high-level program into machine code. PyPy is the most well-known one for Python. JIT compilers usually only work with a subset of the target language, and tend to open up gnarly security issues.
 
Last edited:

ConcernedAnon

Concerned and hopefully anonymous
kiwifarms.net
So you're saying that virtual machines are basically like a different kind of assembler or compiler?

What exactly is a virtual machine and how does it fit into the programming hierarchy & computer architecture?
A virtual machine could be thought of as a program that facilitates the running of other programs, and typically they have to do some translation in the process. Compilers only translate between one language and another, and assemblers are just a specific type of compiler that translate between assembly and machine language.
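The difference can be shown with a deliberately silly made-up language (everything below is invented for the example): a compiler only produces another program, while a virtual machine actually carries out the work.

Python:
```python
# "Compiler" for a toy prefix language: "add 1 2" -> Python source text.
def compile_toy(src):
    op, a, b = src.split()
    assert op == "add"
    return f"({a} + {b})"          # translation only; nothing has run yet

# "Virtual machine" for the same toy language: executes it directly.
def vm_run_toy(src):
    op, a, b = src.split()
    assert op == "add"
    return int(a) + int(b)         # the VM performs the work itself

translated = compile_toy("add 1 2")
print(translated)                  # (1 + 2)  - the output is another program
print(eval(translated))            # 3        - something else had to run it
print(vm_run_toy("add 1 2"))       # 3        - the VM ran it itself
```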


As an example, a videogame might be written in C# because of C#'s relative ease of use. The C# code is translated into a language called IL (Intermediate Language) by a C# compiler. IL is not the same as machine language, and so a computer can't directly run the compiled code, but the CLR (Common Language Runtime) virtual machine can.
If you've ever installed a game and seen something about "installing .Net Framework Redistributable" that's a version of the CLR being installed. The CLR is just a regular program written in machine language, so your computer knows how to run it without any help. When your player goes to run the game, the CLR program is run and begins translating the game for their computer, and running the translated copy as it goes.

To simplify, the C# compiler translates C# code into an IL program, then later the CLR program runs the IL program. This may all seem a little roundabout, but it has some advantages, namely that dialects of machine language vary between computers; but IL does not. This means that you can make one version of your game that can run on every player's computer. If you wrote your game in C++ it would be compiled to a single dialect of machine language, and thus you would need to make different versions of your game for each type of computer.
One of the primary things here is cross-compatibility: if a new OS or processor was released and a program equivalent to the CLR was written for it, it would then be possible to run any existing C# program on that computer. Think about the recent demand for mainframe programmers. Commonly those mainframes are actually emulated (something like DOSBox), but the programs running on them are written in machine code or assembly, so they can't be easily ported to new hardware; instead the old hardware is ported to new hardware as a program. With a language like C# you don't have to worry as much about the specifics of the machine running the program, because those details are handled by the CLR virtual machine, and so you get some future proofing and consistency.
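CPython pulls the same trick on a smaller scale: a .pyc file is serialized bytecode that any compatible CPython build can run, whatever the OS or CPU. A rough sketch using the stdlib `marshal` module (real .pyc files add a header on top of this, and the format only round-trips between matching CPython versions):

Python:
```python
import marshal

# Compile once, serialize the bytecode - the rough equivalent of shipping IL.
code = compile("result = 6 * 7", "<shipped>", "exec")
blob = marshal.dumps(code)         # roughly what a .pyc file contains

# Any machine with a compatible CPython can deserialize and run the blob.
namespace = {}
exec(marshal.loads(blob), namespace)
print(namespace["result"])         # 42
```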

An interesting example of this that blurs a bunch of lines is JavaScript DOS emulation, where JavaScript is used to interpret DOS programs. In this case JavaScript (which runs in its own virtual machine) is being used to interpret old machine language programs that were built for DOS computers. It's a virtual machine in a virtual machine. Roughly speaking it is translating from old machine language -> JavaScript -> modern machine language. This is kind of inefficient, but modern computers are so much faster than DOS-era machines that it's workable.
As another even more absurd example, here is an Atari emulator written in Minecraft, so roughly Atari machine language -> Minecraft script -> Java -> modern machine language :story:

Roughly, a compiler just translates from one language to another, while a virtual machine interprets non-machine language for machines and actually facilitates the running of programs in those languages. The reason virtual machines are used is the consistency they provide. Machine language is the uneven ground you first build upon, and virtual machines act as a foundation to stabilize that ground so that you might build higher.
 

Splendid

Ignore mods. Report and negrate their posts.
True & Honest Fan
Retired Staff
kiwifarms.net
JIT compilers usually only work with a subset of the target language, and tend to open up gnarly security issues.
Java and C# are both normally JIT compiled and they're both considered quite secure platforms. In fact, in terms of how buggy actual production systems are, they're probably safer than languages that don't have automatic memory management.
As an example, a videogame might be written in C# because of C#'s relative ease of use.
Games tend to be written in C++ because you pay a heavy performance penalty for C#'s nice features. It doesn't matter for normal programs, but it does for games.
Unity is a C# engine though.
 

ConcernedAnon

Concerned and hopefully anonymous
kiwifarms.net
Games tend to be written in C++ because you pay a heavy performance penalty for C#'s nice features. It doesn't matter for normal programs, but it does for games.
Unity is a C# engine though.
I was considering mentioning that some engine components might be built with C++, but I didn't want to complicate things. Either way, that's arguable. The major cost that interactive applications have trouble with is GC stalls. In normal operation C# has performance similar to C++ written with heavy use of virtual functions and less aggressive optimization. I'd agree you should generally write low level engine code like rendering and physics in C++, but you really shouldn't do scripting in C++, or even general game code; C++ code is too cumbersome to change for that to be practical.
On top of this is the fact that GC troubles can be largely eliminated. Take for example a composition-based system where game entities are made up of components. Naively you would just store all the components as objects in a multi-dictionary of sorts, but with a little reflection you can fix this. The basic tactic is to pool all your objects by type, and then associate entities with the pooled objects by some kind of index into the pool.

C#:
/* Crude approximation of such a system
* Some details excluded, done from memory so may not be entirely correct.
*/

/// <summary>A numeric id representing an entity</summary>
public struct EntityId
{
    /* Conversions to/from long and equality operators not included
     * ...
     */
  
    private long value_;
}

/// <summary>Helps with type safe conversions in the component system</summary>
internal static class ComponentSystemHelper
{
    /// <summary>A function which creates ref TOut from ref TIn</summary>
    /// <remarks>All instances created by ComponentSystemHelper will have TIn == TOut</remarks>
    public delegate ref TOut ReferenceFunction<TIn, TOut>(ref TIn input);
    /// <summary>A function which converts TIn to TOut</summary>
    public delegate TOut ConvertFunction<TIn, TOut>(in TIn input);

    /// <summary>Trivially returns the input</summary>
    private static ref T Reference<T>(ref T val)
    {
        return ref val;
    }

    /// <summary>Cast to a subclass or interface</summary>
    private static TOut ConvertUp<TIn, TOut>(in TIn input)
        where TIn : TOut
    {
        return (TOut)input;
    }

    /// <summary>Implementation of conversions for arbitrary TIn/TOut</summary>
    static class Impl<TIn, TOut>
    {
        // Filled out by the initializer
        public static readonly ReferenceFunction<TIn, TOut> ref_;
        public static readonly bool canRef_;

        public static readonly ConvertFunction<TIn, TOut> convert_;
        public static readonly bool canConvert_;

        // Specializes Convert for this case
        private static ConvertFunction<TIn, TOut> SpecializeConvert()
        {
            return (ConvertFunction<TIn, TOut>)typeof(ComponentSystemHelper)
                .GetMethod(nameof(ConvertUp), System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static)
                .MakeGenericMethod(typeof(TIn), typeof(TOut)).CreateDelegate(typeof(ConvertFunction<TIn, TOut>));
        }
      
        // Specializes Reference for this case
        private static ReferenceFunction<TIn, TOut> SpecializeReference()
        {
            return (ReferenceFunction<TIn, TOut>)typeof(ComponentSystemHelper)
                .GetMethod(nameof(Reference), System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static)
                .MakeGenericMethod(typeof(TIn)).CreateDelegate(typeof(ReferenceFunction<TIn, TOut>));
        }

        static Impl()
        {
            // Default initialize in case something goes horribly wrong
            (ref_, canRef_, convert_, canConvert_) = (null, false, null, false);

            Type inType = typeof(TIn), outType = typeof(TOut);

            if (inType.Equals(outType)) // Same types, reference is possible
            {
                (ref_, canRef_) = (SpecializeReference(), true);

                (convert_, canConvert_) = (SpecializeConvert(), true);
            }
            else if (outType.IsAssignableFrom(inType)) // InType can be turned to out type
            {
                (convert_, canConvert_) = (SpecializeConvert(), true); // Only convert is possible
            }
        }
    }


    public static bool CanConvert<TIn, TOut>() // Ideally should be a field, but that would require exposing impl
    {
        return Impl<TIn, TOut>.canConvert_;
    }

    public static bool CanReference<TIn, TOut>()
    {
        return Impl<TIn, TOut>.canRef_;
    }

    public static TOut Convert<TIn, TOut>(in TIn input)
    {
        return Impl<TIn, TOut>.convert_(in input);
    }

    public static ref TOut Reference<TIn, TOut>(ref TIn input)
    {
        return ref Impl<TIn, TOut>.ref_(ref input);
    }
}


public class ComponentSystem
{
    private abstract class Pool
    {
        public System.Type type => type_;
      
        public abstract bool CanGet<T>();
        public abstract bool CanReference<T>();
        public abstract bool CanSet<T>();
      
        public abstract T Get<T>(int index);
        public abstract ref T Reference<T>(int index);
        public abstract void Set<T>(int index, in T value);
      
        private System.Type type_;
    }
  
    private class Pool<T> : Pool
    {
        public sealed override bool CanGet<TA>() => ComponentSystemHelper.CanConvert<T, TA>();
        public sealed override bool CanReference<TA>() => ComponentSystemHelper.CanReference<T, TA>();
        public sealed override bool CanSet<TA>() => ComponentSystemHelper.CanConvert<TA, T>();
      
        public sealed override TA Get<TA>(int index)
        {
            if (!CanGet<TA>())
                throw new InvalidOperationException();
          
            ValidateBounds(index);
          
            return ComponentSystemHelper.Convert<T, TA>(in components_[index]);
        }
      
        public sealed override ref TA Reference<TA>(int index)
        // In order to prevent stale references from being used extra steps would need to be taken
        // One simple solution is to simply separate the pool into fixed sized segments, and just add more when the pool is resized
        {
            if (!CanReference<TA>())
                throw new InvalidOperationException();
          
            ValidateBounds(index);
          
            return ref ComponentSystemHelper.Reference<T, TA>(ref components_[index]);
        }
      
        public sealed override void Set<TA>(int index, in TA value)
        {
            if (!CanSet<TA>())
                throw new InvalidOperationException();
          
            ValidateBounds(index);
          
            components_[index] = ComponentSystemHelper.Convert<TA, T>(in value);
        }
      
      
        private void ValidateBounds(int index)
        {
            if (index < 0 || index >= count_)
                throw new IndexOutOfRangeException();
        }
      
        T[] components_;
        int count_;
    }
  
    /// <summary>Specifies a component in a pool</summary>
    private struct ComponentBinding
    {
        int poolIndex {get;}
        int index {get;}
      
        /* A pair that specifies a pool and an item in the pool
         * Boilerplate not included
         * ...
         */
    }
  
    /// <summary>Specifies the components of an entity</summary>
    private class ComponentSet
    {
        /* A list type with pooled storage that contains Bindings
         * ...
         */
    }
  
  
    public bool TryGetComponent<T>(EntityId entity, out T value)
    {
        if (components_.TryGet(entity, out var cSet))
        {
            var bindings = cSet.bindings; // Some kind of array-like type
            for (int i = 0; i < bindings.length; i++)
            {
                var b = bindings[i];
              
                var p = pools_[b.poolIndex];
              
                if (p.CanGet<T>())
                {
                    value = p.Get<T>(b.index);
                    return true;
                }
            }
        }
        value = default;
        return false;
    }
  
    public bool TryUpdateComponent<T>(EntityId entity, in T value)
    {
        if (components_.TryGet(entity, out var cSet))
        {
            var bindings = cSet.bindings; // Some kind of array-like type
            for (int i = 0; i < bindings.length; i++)
            {
                var b = bindings[i];
              
                var p = pools_[b.poolIndex];
              
                if (p.CanSet<T>())
                {
                    p.Set<T>(b.index, in value);
                    return true;
                }
            }
        }
        return false;
    }
  
    public ref T ReferenceComponent<T>(EntityId entity)
    {
        if (components_.TryGet(entity, out var cSet))
        {
            var bindings = cSet.bindings; // Some kind of array-like type
            for (int i = 0; i < bindings.length; i++)
            {
                var b = bindings[i];
              
                var p = pools_[b.poolIndex];
              
                if (p.CanReference<T>())
                {
                    return ref p.Reference<T>(b.index);
                }
            }
        }
        throw new InvalidOperationException("No such component");
    }
  
    /* Allocation method for pools and components not included
     * ...
     */

  
    // A custom container type could significantly optimize this, first by allowing value type ComponentSet
    // Second by allowing the use of the EntityIds for ordering, and thus allowing easy traversal of the set.
    // No matter what though, some portion of ComponentSet would have to be pooled due to the need to store an arbitrary number of components.
    Dictionary<EntityId, ComponentSet> components_ = new Dictionary<EntityId, ComponentSet>();
    List<Pool> pools_ = new List<Pool>();
}
Not a perfect implementation, but I think it gets the idea across. Major improvements include more reflection to simplify the call chains for conversions and eliminate virtual calls, better type intuition through the caching of relationships, sorting for components so that entities can be accessed by component, and the list could go on.
If you want an extreme example look at Unity's ECS, which implements its own memory management for components, and achieves C++-level performance if used correctly. The problem with ECS (and the reason I'm making my own with blackjack and hookers) is that its custom memory management scheme prevents it from storing ref types, so like uh... 95% of the .Net library. Kind of a steep price lol.
 
Last edited:

MarvinTheParanoidAndroid

This will all end in tears, I just know it.
True & Honest Fan
kiwifarms.net
The syntax objection I have with Haskell is that it's got significant whitespace. I hate Python for that and I hate Haskell for it too. (Still neat and useful languages, it just annoys the shit out of me.)
Doesn't Fortran also have a lot of white noise garbage in it too?
 
  • Thunk-Provoking
Reactions: Marvin

Yotsubaaa

Discord is the best!
True & Honest Fan
kiwifarms.net
Doesn't Fortran also have a lot of white noise garbage in it too?
Not so much these days. It used to be the case that position and stuff mattered a great deal in Fortran code, especially in Fortran77, back in the 70s when every single line of Fortran code had to be literally punched into a punch card (behold!):
fortran.jpg

(God, can you imagine programming like that? Fuck programming like that.)

Since Fortran90 (in the 90s), we've had free-form syntax for Fortran code, and so Fortran looks and works much the same way as other programming languages in the way it handles whitespace and everything. Here, check it out: here's part of a style guide for Fortran90 code on some project. Most of the conventions in this guide are pretty typical of most modern programming languages:
fortranstyle.png
 

AnOminous

Really?
True & Honest Fan
Retired Staff
kiwifarms.net
Not so much these days. It used to be the case that position and stuff mattered a great deal in Fortran code, especially in Fortran77, back in the 70s when every single line of Fortran code had to be literally punched into a punch card (behold!):
View attachment 1351979
(God, can you imagine programming like that? Fuck programming like that.)
I believe the character set was also EBCDIC which is as unintelligible as Linear A to modern humans. You couldn't use certain characters in certain positions or they'd cause the rest of the line to be interpreted differently. Absolute madness.
 

RandomTwitterGuy

kiwifarms.net
I'm programming in C now
Fuck programming in C.
For a project I had to program a TCP/IP connection in C++. It was kind of a pain to get working, and it forced us to use a bunch of C-style bullshit, because "You need to understand it without using outside libraries".

If you want to see true madness, look into PLC programming and the 6 languages they support. Especially ladder.
It is the bane of my existence. I can write in ST, or structured text; think of it as weird C++ with a few things you have to keep in mind, but nothing scary. Then some jerk-wad comes to you with a fucking diagram you have to try and rewrite into a useful program.
 

Spamy the Bot

Notorious Moon
kiwifarms.net
I always wanted to learn to program and do art. I will speak of the programming here.
As an outsider I totally have zero clue, even after a few years of lurking, where to really start.
 

Least Concern

Pretend I have a waifu avatar like everyone else
kiwifarms.net
I always wanted to learn to program and do art. I will speak of the programming here.
As an outsider I totally have zero clue, even after a few years of lurking, where to really start.
What's your goal? What sort of things do you want to eventually be able to write?

If web applications, learn PHP.
If games or Linux software, learn C++.
If Windows software, learn C#.
If Mac or iOS software, learn Swift.

All of these languages will have plenty of introductory-level books to help you get started, and we can point you in the right direction. We need to know where you want to go first.
 

Spamy the Bot

Notorious Moon
kiwifarms.net
What's your goal? What sort of things do you want to eventually be able to write?

If web applications, learn PHP.
If games or Linux software, learn C++.
If Windows software, learn C#.
If Mac or iOS software, learn Swift.

All of these languages will have plenty of introductory-level books to help you get started, and we can point you in the right direction. We need to know where you want to go first.
I think the C# and C++ angle looks most interesting from the bunch.
 

SIGSEGV

Segmentation fault (core dumped)
True & Honest Fan
kiwifarms.net
I think the C# and C++ angle looks most interesting from the bunch.
Those languages are very different from one another. If you want to start with an easy language, then go with C#. If you want something more difficult and powerful, then try C++.
 
  • Informative
Reactions: Spamy the Bot

Spamy the Bot

Notorious Moon
kiwifarms.net
Those languages are very different from one another. If you want to start with an easy language, then go with C#. If you want something more difficult and powerful, then try C++.
Yeah. I have heard C++ is like black magic that nobody really knows how to use properly.
C# seems reasonable. I guess some of the programming principles should carry over to other languages. I am not so sure about that of course.
 

SIGSEGV

Segmentation fault (core dumped)
True & Honest Fan
kiwifarms.net
Yeah. I have heard C++ is like black magic that nobody really knows how to use properly.
You've heard correctly. I've been using it for several years now and feel like I've barely scratched the surface.
C# seems reasonable. I guess some of the programming principles should carry over to other languages. I am not so sure about that of course.
Things absolutely do carry over. I had to use C# for one semester in college and it was a pretty nice language.
 
  • Informative
Reactions: Spamy the Bot

Spamy the Bot

Notorious Moon
kiwifarms.net
You've heard correctly. I've been using it for several years now and feel like I've barely scratched the surface.

Things absolutely do carry over. I had to use C# for one semester in college and it was a pretty nice language.
What do you think: is the self-taught programmer a meme, or can it be done?
 
Tags
None