Hello from a libc-free world! (Part 1)

Posted in computer architecture on March 16th, 2010 by Jessica McKellar (104 comments)

As an exercise, I want to write a Hello World program in C simple enough that I can disassemble it and be able to explain all of the assembly to myself.

This should be easy, right?

This adventure assumes compilation and execution on a Linux machine. Some familiarity with reading assembly is helpful.

Here’s our basic Hello World program:

jesstess@kid-charlemagne:~/c$ cat hello.c
#include <stdio.h>

int main()
{
  printf("Hello World\n");
  return 0;
}

Let’s compile it and get a bytecount:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10931 hello

Yikes! Where are 11 kilobytes’ worth of executable coming from? objdump -t hello gives us 79 symbol-table entries, most of which we can blame on our using the standard library.

So let’s stop using it. We won’t use printf, so we can get rid of our include file:

jesstess@kid-charlemagne:~/c$ cat hello.c
int main()
{
  char *str = "Hello World";
  return 0;
}

Recompiling and checking the bytecount:

jesstess@kid-charlemagne:~/c$ gcc -o hello hello.c
jesstess@kid-charlemagne:~/c$ wc -c hello
10892 hello

What? That barely changed anything!

The problem is that gcc is still using standard library startup files when linking. Want proof? We’ll compile with -nostdlib, which according to the gcc man page won’t “use the standard system libraries and startup files when linking. Only the files you specify will be passed to the linker”.

jesstess@kid-charlemagne:~/c$ gcc -nostdlib -o hello hello.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8

Well, it’s just a warning; let’s check it anyway:

jesstess@kid-charlemagne:~/c$ wc -c hello
1329 hello

That looks pretty good! We got our bytecount down to a much more reasonable size (nearly an order of magnitude smaller!)…

jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

…at the expense of segfaulting when it runs. Hrmph.

For fun, let’s get our program to be actually runnable before digging into the assembly.

So what is this _start entry symbol that appears to be required for our program to run? Where is it usually defined if you’re using libc?

From the perspective of the linker, by default _start is the actual entry point to your program, not main. It is normally defined in the crt1.o ELF relocatable. We can verify this by linking against crt1.o and noting that _start is now found (although we develop other problems by not having defined other necessary libc startup symbols):

# Compile the source files but don't link
jesstess@kid-charlemagne:~/c$ gcc -Os -c hello.c
# Now try to link
jesstess@kid-charlemagne:~/c$ ld /usr/lib/crt1.o -o hello hello.o
/usr/lib/crt1.o: In function `_start':
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:106: undefined reference to `__libc_csu_fini'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:107: undefined reference to `__libc_csu_init'
/build/buildd/glibc-2.9/csu/../sysdeps/x86_64/elf/start.S:113: undefined reference to `__libc_start_main'
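The linker cares about _start because it has to record exactly one address in the e_entry field of the ELF header, and that field is where the kernel begins executing the program. When no _start existed, ld fell back to 0x4000e8, which is what the earlier warning reported. A minimal sketch that reads the field back, assuming a 64-bit ELF image already loaded into memory in the host's byte order and using the Elf64_Ehdr definitions from glibc's <elf.h>; the helper name elf64_entry_point is hypothetical:

```c
#include <elf.h>     /* Elf64_Ehdr, ELFMAG, SELFMAG, EI_CLASS, ELFCLASS64 */
#include <stdint.h>
#include <string.h>  /* memcpy, memcmp, memset, size_t */

/* Hypothetical helper: return the entry point stored in a 64-bit ELF
   header, or 0 if the buffer is too short or is not an ELF64 image.
   Assumes the file's byte order matches the host (true for an x86-64
   binary inspected on an x86-64 machine). */
uint64_t elf64_entry_point(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr hdr;

    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, buf, sizeof hdr);   /* copy out to avoid misaligned reads */

    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;

    return hdr.e_entry;              /* the address execution starts at */
}
```

Reading the first sizeof(Elf64_Ehdr) bytes of hello into a buffer and passing them in should give the same value that readelf -h hello prints as "Entry point address".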

This check conveniently also tells us where _start lives in the libc source: sysdeps/x86_64/elf/start.S for this particular machine. This delightfully well-commented file exports the _start symbol, sets up the stack and some registers, and calls __libc_start_main. If we look at the very bottom of csu/libc-start.c we see the call to our program’s main:

/* Nothing fancy, just call the function.  */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);

and down the rabbit hole we go.
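The argc and argv that main receives, and the environment that ends up in __environ, all come from the stack the kernel builds before jumping to _start. On x86-64 Linux, %rsp points at argc, followed by the argv pointers, a NULL, the environment pointers and another NULL, with the auxiliary vector after that. A minimal sketch of decoding that layout, modelling each 8-byte slot as a char * so it can run in an ordinary hosted program; the struct and function names are hypothetical:

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* intptr_t */

/* What main(argc, argv, envp) is built from (hypothetical type). */
struct initial_stack {
    int argc;
    char **argv;   /* argv[argc] == NULL */
    char **envp;   /* NULL-terminated list of "NAME=value" strings */
};

/* sp models %rsp on entry to _start: sp[0] holds argc, sp[1..argc] the
   argument pointers, sp[argc + 1] a NULL terminator, and the environment
   pointers start right after that terminator. */
struct initial_stack decode_initial_stack(char **sp)
{
    struct initial_stack s;

    s.argc = (int)(intptr_t)sp[0];
    s.argv = sp + 1;
    s.envp = s.argv + s.argc + 1;   /* skip argv's NULL terminator */
    return s;
}
```

glibc does roughly the same thing: start.S pops argc and points argv at the stack, and __libc_start_main takes the environment from &argv[argc + 1] before the main call shown above, then hands main's return value to exit().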

So that’s what _start is all about. Conveniently, we can summarize what happens between _start and the call to main as “set up a bunch of stuff for libc and then call main”. Since we don’t care about libc, let’s export our own _start symbol that just calls main, and link against that:

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
	call main

Compiling and running with our stub _start assembly file:

jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
Segmentation fault

Hurrah, our compilation problems go away! However, we still segfault. Why? Let’s compile with debugging information and take a look in gdb. We’ll set a breakpoint at main and step through until the segfault:

jesstess@kid-charlemagne:~/c$ gcc -g -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ gdb hello
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
(gdb) break main
Breakpoint 1 at 0x4000f4: file hello.c, line 3.
(gdb) run
Starting program: /home/jesstess/c/hello

Breakpoint 1, main () at hello.c:5
5	  char *str = "Hello World";
(gdb) step
6	  return 0;
(gdb) step
7	}
(gdb) step
0x00000000004000ed in _start ()
(gdb) step
Single stepping until exit from function _start,
which has no line number information.
main () at helloint.c:4
4	{
(gdb) step

Breakpoint 1, main () at helloint.c:5
5	  char *str = "Hello World";
(gdb) step
6	  return 0;
(gdb) step
7	}
(gdb) step

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000001 in ?? ()
(gdb)

Wait, what? Why are we running through main twice? …It’s time to look at the assembly:

jesstess@kid-charlemagne:~/c$ objdump -d hello

hello:     file format elf64-x86-64

Disassembly of section .text:

00000000004000e8 <_start>:
  4000e8:	e8 03 00 00 00       	callq  4000f0 <main>
  4000ed:	90                   	nop
  4000ee:	90                   	nop
  4000ef:	90                   	nop    

00000000004000f0 <main>:
  4000f0:	55                   	push   %rbp
  4000f1:	48 89 e5             	mov    %rsp,%rbp
  4000f4:	48 c7 45 f8 03 01 40 	movq   $0x400103,-0x8(%rbp)
  4000fb:	00
  4000fc:	b8 00 00 00 00       	mov    $0x0,%eax
  400101:	c9                   	leaveq
  400102:	c3                   	retq
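The listing has just enough information to replay the crash by hand. The callq at 0x4000e8 is the opcode e8 followed by a little-endian 32-bit displacement (3) measured from the next instruction, so it pushes 0x4000ed and jumps to 0x4000ed + 3 = 0x4000f0, which is main. The sketch below is a toy model, not an emulator: it tracks only 8-byte stack slots, hard-codes the addresses from this listing, and assumes (per the x86-64 System V ABI) that %rsp points at argc when _start begins. All names in it are hypothetical.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint64_t, int32_t, int64_t */

/* Toy model of %rsp: 8-byte slots, growing downward like the real stack. */
enum { TOY_SLOTS = 8 };

struct toy_stack {
    uint64_t slot[TOY_SLOTS];
    size_t top;                  /* index of the slot %rsp points at */
};

static void toy_push(struct toy_stack *s, uint64_t v) { s->slot[--s->top] = v; }
static uint64_t toy_pop(struct toy_stack *s) { return s->slot[s->top++]; }

/* callq rel32 (opcode e8): push the address of the next instruction
   (the call is 5 bytes long) and jump to that address plus rel32. */
static uint64_t toy_call(struct toy_stack *s, uint64_t insn_addr, int32_t rel32)
{
    uint64_t next = insn_addr + 5;
    toy_push(s, next);
    return next + (uint64_t)(int64_t)rel32;
}

/* main from the listing: push %rbp; mov %rsp,%rbp; the movq store and
   mov $0x0,%eax leave %rsp alone; leaveq restores %rsp and pops %rbp;
   retq pops a return address. Returns where retq jumps. */
static uint64_t toy_main(struct toy_stack *s)
{
    toy_push(s, 0);        /* push %rbp: the saved value is irrelevant here */
    (void)toy_pop(s);      /* leaveq */
    return toy_pop(s);     /* retq */
}

struct replay {
    uint64_t call_target;  /* where callq in _start jumps */
    uint64_t first_ret;    /* where main's first retq jumps */
    uint64_t second_ret;   /* where main's second retq jumps */
};

struct replay replay_stub_start(uint64_t argc)
{
    struct toy_stack s = { .top = TOY_SLOTS };
    struct replay r;

    toy_push(&s, argc);                           /* (%rsp) at process entry */
    r.call_target = toy_call(&s, 0x4000e8, 3);    /* e8 03 00 00 00 */
    r.first_ret = toy_main(&s);                   /* back to 0x4000ed */
    /* The nops at 0x4000ed-0x4000ef run straight into main at 0x4000f0
       without a call, so nothing is pushed before main runs again. */
    r.second_ret = toy_main(&s);                  /* pops argc */
    return r;
}
```

replay_stub_start(1) reports a second return address of 1: run without arguments, the program has argc equal to 1, matching the 0x0000000000000001 in the gdb session above.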

D’oh! Let’s save a detailed examination of the assembly for later, but in brief: when main returns to _start, execution continues through the nops after the callq and runs right back into main. This time nothing called main, so no return address was pushed onto the stack, and the second retq pops whatever is on top of the stack and jumps to it. At process entry that slot holds argc (1 for a program run with no arguments), so we jump to address 0x1, exactly where gdb reports the crash. We need an exit strategy.

Literally. After the call to main returns, move 1, the i386 system call number for SYS_exit, into %eax. Because we want to say that we’re exiting cleanly, put a status of 0, SYS_exit’s only argument, into %ebx (xorl %ebx, %ebx is the idiomatic way to zero it). Then trap into the kernel with int $0x80, which goes through the 32-bit system call interface even on this x86_64 machine.
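For comparison, here is a minimal sketch of the same request made from C with GCC-style inline assembly. It deliberately swaps in the native x86-64 interface instead of int $0x80: the system call number goes in %rax (60 for exit on x86-64, versus 1 on i386), the status goes in %rdi, and the syscall instruction enters the kernel, clobbering %rcx and %r11. It assumes x86-64 Linux; raw_exit and exit_status_seen_by_parent are hypothetical names, and the second one uses libc's fork and waitpid only so a hosted program can observe the result.

```c
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork */

/* Hypothetical helper: end the process with the raw exit system call,
   skipping libc's exit() entirely (no atexit handlers, no stdio flush).
   The article's i386 form is: movl $1,%eax; xorl %ebx,%ebx; int $0x80. */
static void raw_exit(int status)
{
#if defined(__x86_64__) && defined(__linux__)
    __asm__ volatile ("syscall"
                      : /* no outputs: this never returns */
                      : "a"(60L),            /* __NR_exit on x86-64 */
                        "D"((long)status)    /* exit status */
                      : "rcx", "r11", "memory");
#else
#error "this sketch assumes x86-64 Linux"
#endif
    __builtin_unreachable();
}

/* Hosted demo only: run raw_exit(status) in a child process and return
   the exit status the parent sees, or -1 if anything goes wrong. */
int exit_status_seen_by_parent(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit(status);   /* the child ends here */

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

exit_status_seen_by_parent(42) should return 42. Because raw_exit talks to the kernel directly, it needs nothing from libc, which is the property the stub _start relies on.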

jesstess@kid-charlemagne:~/c$ cat stubstart.S
.globl _start

_start:
	call main
	movl $1, %eax
	xorl %ebx, %ebx
	int $0x80
jesstess@kid-charlemagne:~/c$ gcc -nostdlib stubstart.S -o hello hello.c
jesstess@kid-charlemagne:~/c$ ./hello
jesstess@kid-charlemagne:~/c$

Success! It compiles, it runs, and if we step through this new version under gdb it even exits normally.

Hello from a libc-free world!

Stay tuned for Part 2, where we’ll walk through the parts of the executable in earnest and watch what happens to it as we add complexity, in the process understanding more about x86 linking and calling conventions and the structure of an ELF binary.


  1. Punya says:

    Thanks for writing this the way someone might plausibly discover what’s going on, rather than just stating the facts.

  2. hurst says:

    Nice article

  3. Tiago S. says:

    Thanks!

    As pointed out in the HN thread, this is also a good reference: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

  4. Curt says:

    Oh man. I used to love the days of hacking C and doing assembly. I really miss that!

    I haven’t done a lot of Linux assembly. Surprised they are still using interrupts to terminate programs like we used to do in the old days. Oh the joys of INT 21h DOS programming.

  5. Michael Williamson says:

    Using dietlibc allows you to avoid pulling in a whole load of code you never use:

    http://www.fefe.de/dietlibc/

    @Curt: when moving from x86 to x86-64 Linux, int 0x80 is replaced by the syscall instruction.

  6. JoostM says:

    In the 64-bit world, assembly hacking is a lot of fun, because there are now 16 registers and of course a huge address space. Also, there are different syscall numbers and the syscalls are triggered with the ‘fast syscall’ instruction, “syscall”, instead of “int 0x80”. And syscalls use different registers now: rdi, rsi, rdx, r10, r8, r9.

    However many systems ship with both 32-bit and 64-bit syscall interfaces, like yours does.
    So in 64-bit architecture your 32-bit bootstrapper would become:

    _start:
    call main
    mov rax, 0x3c ; __NR_exit from /usr/include/asm/unistd_64.h
    mov rdi, 0x00
    syscall

  7. Vince says:

    Nice article, looking forward to your next post.

  8. JoostM says:

    @Curt : Interrupts/syscalls are still used for basically all of the core functionality of a POSIX kernel, like writing to file, opening file, etc. libc shields us from that.

    I’m not sure what happens on win32/win64 but I suspect some of it is still handled through trusty 0xCD 0x21 opcodes :-)

  9. Well written, thank you.

  10. MrNightLifeLover says:

    I remember doing the same thing some years ago (but also with a bit of more complicated programs).. what gcc are you using? IIRC we used GCC 2.something because GCC 3 optimized and generated hard to read code. However I don’t remember which libc we used.

  11. MrNightLifeLover says:

    Anybody on a Mac (running SN with latest XCode)? So far I failed to compile with -nostdlib

    macbook:code blah$ gcc -nostdlib -o hello hello.c
    ld: could not find entry point “start” (perhaps missing crt1.o)
    collect2: ld returned 1 exit status

  12. pfarrell says:

    Nice essay. I went through a period of wanting to learn assembler, but it passed :) .

    Also, Steely Dan for your server? Brilliant!

  13. Jessica McKellar says:

    @MrNightLifeLover

    “gcc -v” gives me “gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4)”

    and “/lib/libc.so.6” gives me “…GNU C Library stable release version 2.9…”

  14. JoostM says:

    MrNightLifeLover: it seems (from the error message) that Macs use entry point ‘start’ instead of ‘_start’, so you’d have to adjust the code in stubstart.S accordingly. This has to do with C name mangling.

  15. digitocero says:

    Very nice article. The kind of step by step and experimenting I like

  16. atx says:

    I enjoyed the article very much and I’m looking forward to part 2!!

  17. matthias says:

    @JoostM thx for pointing that out.
    Superb article.

  18. Null says:

    Great article. Reminds me of a similar article but pushed in another direction.

    http://oep.tumblr.com/

  19. Ogre says:

    NNNNNNNNNNNNEEEEEEEEEEEEERRRRRRRRRRRRRRRRRRRRRRRDDDDDDDDDDDSSSSSSSSSSSSSSSSSSSS

  20. Ofer says:

    Very amusing. Thank you!

  21. manish says:

    nice article.

  22. Thomas says:

    Of course, you could have just made Hello World in assembly from the beginning. :)

    Nice article

  23. not_dmr says:

    output?

  24. sisis says:

    you can use the system call write to print text on stdout:
    file descriptor number 1 corresponds to stdout, store that in ebx. push “Hello world\n” on the stack and store esp in ecx and finally store the correct size in edx and call write (syscall number 4)

  25. starkstech says:

    Amusing cant wait for part II

  26. Thank you, not_dmr and sisis! I was wondering what had happened to this being a Hello World program :)

  27. Fido says:

    Brilliant, by far one of the simplest and most complete explanations I ever read! Thanks for sharing!

  28. dominiko says:

    I get hello world in 5298 bytes (about half of your size) using libc and gcc-4.4.1 (Ubuntu-9.10).

    * use puts() instead of printf(). puts() is simpler since it does not take care of formatting (formatting is useless here)
    * strip the executable

    $ cat hello.c
    int main()
    {
    puts("hello world");
    return 0;
    }

    $ gcc -Os hello.c; strip a.out ; ls -l a.out
    -rwxr-xr-x 1 pel pel 5298 2010-03-17 04:17 a.out

    $ ./a.out
    hello world

    A c++ program is barely bigger (5600 bytes):

    $ cat hello.cpp
    #include <iostream>
    int main()
    {
    std::cout << "hello world\n";
    return 0;
    }

    $ g++ -Os hello.cpp ; strip a.out ; ls -l a.out
    -rwxr-xr-x 1 pel pel 5600 2010-03-17 04:25 a.out

    $ ./a.out
    hello world

  29. TJ says:

    This is hardly a Hello World program anymore if it’s not going to output the words “Hello World” when executed. Just setting a pointer to the string in memory isn’t the same.

  30. JeffFromOhio says:

    Hey, could you, in your next segment, possibly explore a little bit of what else is in your ‘stripped down’ file? That is, the file is over 1000 bytes long, still, and the section of disassembly you showed only accounts for maybe what, like 100 bytes of that? The string constant you compiled in should only account for something like 12 or 13 bytes.

    I played around with objdump a little bit (tried using the -D option instead of -d), and I see there are a bunch of other sections – what are all those other sections for?

  31. Systam/360 says:

    Speaking as a Windows programmer, Assembly Language is hardly dead. Modern versions of Microsoft MASM, and others assemblers, now have things like the INVOKE statement, that provides a direct interface to the WIN32/64 API, the very same API used by Visual Studio and other development environments. Open source packages like MASM32 include the Asm source code for virtually ever function, structure and equate used by Windows, and it includes the .LIB files needed to map the various Windows DLLs. We don’t use INT 21h much these days. :)

    C may be the “popular” language for Windows / Linux development, but there are still plenty of us “asm only” geeks out there today. You can’t beat Assembly Language when it comes to speed, efficiency, and program size.

    And this is true on just about every computer platform. Most of my time these days, and for the last 40+ years, is with IBM mainframe computers, where I also program in Assembly Language. But mainframe Asm, and PC Asm, are 2 entirely different beasts, as they should be. :)

  32. Jessica McKellar says:

    @TJ I absolutely agree that writing to stdout and setting a local variable aren’t the same. I structured the post this way because it let me dig a little deeper on some other ideas without getting too long winded, and in particular because of the way the sequel is structured. Thanks for reading, and stay tuned for part 2!

  33. alvare says:

    Very cute and all, but …. it doesn’t even say “Hello World” after all that fuss xP

  34. Jalal Hajigholamali says:

    Hi,

    very useful for me , thanks a lot…

  35. Eric says:

    If you change this line:

    char *str = "Hello World";

    to this:

    // char *str = "Hello World";

    You would make it smaller still, without affecting the output of the program in any way.

  36. Very nice article.

    One thing. If you’re not using libc, you do not need an ‘int main()’. You do not need to write a separate _start in assembly either. You could just do this:

    void _start()
    {
    char *str = "Hello World";
    asm("movl $1, %eax");
    asm("xorl %ebx, %ebx");
    asm("int $0x80");
    }

    That would compile just fine and run too.

  37. Me again. Interestingly, if (for the sake of keeping to conventions somewhat) you rearrange the code like below, the file size jumps to about 5KB, just by adding two c functions.

    int main()
    {
    char *str = "Hello World";
    return 0;
    }

    void quit()
    {
    asm("movl $1, %eax");
    asm("xorl %ebx, %ebx");
    asm("int $0x80");
    }

    void _start()
    {
    main();
    quit();
    }

  38. Johan Bezem says:

    Nice overview. In embedded systems programming this is done every day!

  39. Paul Gray says:

    Great article, not only do you cover the facts but you do it with the reader, which is an art unto itself. Reminds me of the days of doing chess on a ZX81 in only 1k of memory. Not many people bother to get to know things inside out and why things are done.

  40. Cesar Claveria says:

    Great, thanks!
    It was a fun read! I always love to learn more about the linker on Linux :-)

  41. Christian Treczoks says:

    Looks like a project I did about fifteen to twenty years ago. The system was an AmigaOS 3.0 and SAS C compiler. The system was extremely C-friendly, and it was very easy to stuff the standard C library overhead. OK, it was working without the parachute, but still very comfortable (AmigaOS had all the niceties like printf in the OS!).

    My masterpiece in code tightness was a fortune cookie program written in C, no startup used. It
    - opened the DOS library and the Timer device (The latter a bit difficult, AmigaOS devices were a bitch to open/close on that level),
    - opened an index file and got its size,
    - generated a random number in the range of the indexes of the file.
    - got that random index record and closed the file,
    - opened another file and moved to the index position,
    - read a variable length record into a proper size AllocMem()ed buffer,
    - wrote that string to stdout, and
    - properly closed and freed all the resources
    It even dealt with “impossible” cases like “can not open DOS library”, and exited gracefully if a resource could not be allocated, freeing all the things allocated or opened so far.
    The only “unclean” thing was that I linked a small assembler routine for the random number generator.
    Total executable size was somewhere between 950 and 1000 bytes, and it was clean enough to get an “S” flag on the system, meaning that one could load the executable into memory once and execute it concurrently (_S_hared) from different processes.

    At the other end of the scale, my first ADA program was a “hello world!” (of course!), that gave me a) headaches and b) a 374 Kilobytes executable (on VAX/VMS).

    Yours, Christian

  42. manuel says:

    Hi, thanks very much, nice work :)

    grettings from Colombia

  43. Henrik says:

    Very nice article indeed. But this program can barely be classified as the classic “hello world”, as it does not output the famous string.

    But a very interesting read! Nice work.

    - Henrik

  44. DavidG says:

    There’s one thing obviously wrong with your _start code, instead of moving 1 into eax and clearing ebx, you should move the content of eax (return value of main) into ebx and then move 1 into eax instead.

  45. Velan says:

    First of all. This is not a Hello World code. Plus a Hello World need not have a return and int type for main. But out of curiosity, I tried by taking it out and it gives me 881.

  46. Abhinay says:

    That’s a great way of learning! Thanks for this blog post!
    Very informative post!

  47. Lakshmipathi says:

    Interesting one…

  48. will w says:

    So besides the fact you no longer put “Hello World” to the screen, what was the byte count after you had done this?
    gcc -nostdlib stubstart.S -o hello hello.c
    wc -c hello

  49. richard k says:

    thank you for this.. i was just looking for info to get into assembly and this article sure tickles my fancy!

  50. Chris Lee says:

    your app isn’t actually outputting hello world though

  51. trh says:

    Thanks, waiting for part 2 ;)

  52. Sean says:

    Thank for an interesting article. I enjoyed reading it and following the conclusions that you made. Please write more of these!

  53. Scott says:

    Um, where is “Hello World?”

  54. blah says:

    this is better than 99.9% of the “learn c” [text]books out there…
    thanks for enlightenment…

  55. Jessica McKellar says:

    @DavidG Totally true that one should in general be propagating the return value from main() to the exit value instead of fixing it at 0. For this toy example I opted to gloss over that. Thanks for the comment!

  56. Dr. Mitch says:

    I don’t see “Hello World” in the output.

  57. taraCkaans says:

    It’s kinda sad that gcc produces such a bloatware-ish output by default when you’re just trying to display a string… How much smaller apps would generally get if gcc wouldn’t link against unnecessary libs when possible?

  58. Rob says:

    No output and it’s not portable.

  59. Bernie Roehl says:

    I would move main()’s return code (in eax) into ebx, so that the program’s exit status reflects what main() returns.

    In other words…

    _start:
    call main
    movl %eax,%ebx
    movl $1, %eax
    int $0x80

  60. Anonim says:

    Shouldn’t you start by using the proper version of hello world.

    #include <stdio.h>
    int main()
    {
    return printf("hello, world\n") < 0;
    }

  61. Wimp says:

    Great article!
    Any idea when we can expect the sequel?
    Is this going to be a 2 part article only or can we expect more?!?

  62. chezgi says:

    very nice and simple.

  63. Craig Landrum says:

    Don’tcha just love geek critics? How many times do you think someone will point out that the program didn’t actually print anything out? Like, duh. Wonder what portion of “Part 1” they didn’t understand? I’m just going to take a wild guess and anticipate that Part 2 will show the progression into an actual “Hello World” program as well as some other insightful goodies that you have discovered on how to achieve a teensy executable under x86 Linux.

    Right on with your bad-ass, bare-metal, thick-skinned, empowered grrrl-coder self. Knowledge is power and shared knowledge elevates your status. Keep this up and all those nit-pickers will wonder why you got the corner office with the window view.

  64. A. Sembler says:

    Good on you for having the curiosity and taking the initiative to figure all of this out. I’m an old assembler jock who’s written a lot of code that runs on the CPU with *no* operating system to take care of the niceties. It’s good to see there are still people who are interested in things at a lower level and not just what the latest thing out of the Java world can do. (That’s not a knock on Java, but there are a lot of people writing software that have little or no understanding of what the rest of the computer is doing while their programs are running.)

    If I can be critical for a minute, you didn’t do what you set out to do, which was “to write a Hello World program in C simple enough that I can disassemble it and be able to explain all of the assembly to myself.” What you did was write a very short replacement for _start which calls a compiled C function function that implements Hello World by calling another compiled C function out of libc. What have you learned about what the longer _start does and why it does what it does?

    If you really want to see what’s down the rabbit hole, study what the compiler produces for the higher-level constructs like if, for, switch, etc. and how it deals with pointers, arrays and structures. Then pull out your Spice Weasel and kick it up a notch (BAM!) by turning the optimizer on and seeing what it does to make your code run faster. (For example, compiled with -O, your *str = “Hello World” program does absolutely nothing because str is never used and is optimized out.)

    Happy hunting.

  65. /dev/null says:

    Something is inherently wrong in the way gnu/linux uses libraries. If only _one_ function is needed – the whole frigging library is loaded. In the ‘good ole days’ we had ‘ranlib’ and sometimes we had to load the library several times before all the dependencies were satisfied. What’s the point of libraries when they are nothing but ‘blobs’?

  66. A. Sembler says:

    @/dev/null: Modern Unix kernels (including Linux) demand page executables and shared libraries, so while a library is considered “loaded,” the entire thing isn’t brought into memory, and parts that are used infrequently get paged out. But before any of that happens, the dynamic linker has to make sure all of the symbols resolve before allowing the executable to use it, which means it has to be gone through. That’s the advantage of shared libraries: those that are used often (e.g., libc) are loaded early in the system’s life and remain available for every other executable to use without incurring the time penalty during startup.

  67. ddade says:

    Nice hacking, Jessica. Glad to see that there are still people who care to know how their systems actually work.

    @A. Sembler:
    The points you raised are valid, but orthogonal to Jessica’s. Her main point was that even *without* linking against glibc at runtime, which you correctly point out would not be as bad a hit as the ELF binary would lead one to believe, crt1.o calls in a load of code on its behalf, i.e. crt1.o cannot and does not respond to the -nostdlib flag. But what you said is true and as interesting nonetheless… how about you post a separate article? I’d read that one too :)

    In any case, looking forward to the next part done in your “as I experienced it” style.

  68. ddade says:

    A.
    Sembler.

    I see what you did there.

  69. QuietObserver says:

    I wrote the entire code in 39 bytes for the Commodore 64 (6510 CPU) using a miniassembler (not to show you up, but simply because I, too, felt 11k of “Hello, World” was outrageous).

    For laughs, here’s the entire code (using Address labels, since text labels are not an option when using miniassemblers; the text is PETSCII):

    C000:
    LDA #$19 ;Load the text address
    LDY #$C0
    STA $FB ; Store the text address in a memory pointer
    STY $FC
    LDY #$00
    C00A:
    LDY ($FB),Y
    BEQ $C018
    JSR $FFD2
    INY
    BNE $C00A
    INC $FC
    BNE $C00A
    C018:
    RTS
    C019:
    DB "HELLO, WORLD!", $00

  70. nicomp says:

    Thanks for an interesting read. “Back in the day” (before Windows bloated everything) we used to consider the footprint of the executable as an informal benchmark of the compiler/linker.

  71. Consider the bigger issue; this article talks about a dynamically linked “Hello World”, but a statically linked one doesn’t even fit on a 720KB floppy anymore:

    int main() { printf("Hello, World\n"); return(0); }

    Turns into:

    -rwxr-xr-x 1 mbt mbt 737584 Mar 12 11:46 hello

    This is 304 bytes more than an unformatted 720KB floppy disk.

    That is seriously bloated, IMHO.

  72. bandit says:

    There are things that _start needs to do to create a “real” C executable.

    Lets say you have a global

    int foo;

    foo will live in .BSS – and the C standard *demands* it be inited = 0 by the time main() is called. the entire .BSS segment *must* be inited to 0 before main() is called.

    You have another global (or a static in a function)

    int bar = 5;

    or

    void baz()
    {
    static int bletch = 5;
    }

    both variables live in .DATA and must be properly set (somehow) before main() is called.

    Note the embedded world works a bit differently than the PC world. Often, the .INIT segment lives in FLASH or other non-volatile memory and gets copied to .DATA (a RAM segment) in _start(). Generally the two segments have the same memory mapping, so a straight byte copy works (at least that is the best way to do it). Note there can be several flavors of .INIT segments in an embedded system – only one gets copied to RAM.

    The other major thing _start() needs to do for most cases is setup the stack. It may be that the PC world sets up a stack for the program, but not in the embedded world.

    Not to knock your project – it is a good demo of digging down to the iron. Good for you.

    … bandit

  73. victor n. says:

    very informative. thanks a lot.

  74. donkey says:

    pure C and nostdlib

    #include

    int main(int argc, char** argv);

    void _start()
    {
    main(0, NULL);
    }

    int main(int argc, char** argv)
    {
    return 0;
    }

  75. Suresh says:

    Very informative !
    Looking for next article !!

  76. Aboelnour says:

    Actually if you really need to get the assembly code of a C program this is too easy
    just run: gcc -S main.c then view the file main.s

    the compiling steps of the gcc is:
    * preprocessing (to expand macros)
    * compilation (from source code to assembly language) —–>here we go
    * assembly (from assembly language to machine code)
    * linking (to create the final executable)

    when you give the gcc the flag ‘S’ you make him to stop
    after step 2 and generate the assembly file.

    In your way you disassembly the executable file which
    contains the lib’s which he used .

    nice article keep going :)

  77. Alex says:

    Jesus Christ, HURRY UP WITH PART 2. The suspense is killing me.

  78. Andreas Salwasser says:

    After various tips and information from this post and the comments, I want to submit a libc-free version of a “Hello World”-program in C, which actually prints “Hello, World!”.

    main.c:
    —————————————————cut here—————————————————
    /*
    * ============================================================================
    *
    * Filename: main.c
    *
    * Description: libc-free
    *
    * Version: 1.0
    * Created: 04/05/2010 10:05:33 PM
    * Revision: none
    * Compiler: gcc
    *
    * Author: Andreas Salwasser (AnSa), anonuanon@googlemail.com
    * Company:
    *
    * ============================================================================
    */

    int
    print ( char *Cstring )
    {
    int len = 0;
    while( Cstring[len] != '\0' ) /* calculate length of string */
    {
    len++;
    }
    asm(
    "movl %0, %%edx\n\t" /* third argument: message length */
    "movl %1, %%ecx\n\t" /* second argument: pointer to message to write */
    "movl $1, %%ebx\n\t" /* first argument: file handle (stdout) */
    "movl $4, %%eax\n\t" /* system call number (sys_write) */
    "int $0x80" /* call kernel */
    : /* no output */
    : "r" (len), "r" (Cstring) /* input variables
    (above: %0 refers to len, %1 refers to Cstring) */
    /* no clobbered register */
    );
    return 0;
    }

    int
    main ()
    {
    char *str = "Hello, World!\n";
    print(str);
    return 0;
    }

    void
    quit ()
    {
    asm(
    "movl %eax, %ebx\n\t" /* return value of main into %ebx */
    "xorl $1, %eax\n\t"
    "int $0x80"
    );
    }

    void
    _start ()
    {
    main();
    quit();
    }
    —————————————————cut here—————————————————

    Makefile:
    —————————————————cut here—————————————————
    main : main.o
    gcc -nostdlib -o main main.o

    main.o : main.c
    gcc -c -nostdlib main.c
    —————————————————cut here—————————————————

    Thanks for the inspiration.

  79. Daniele says:

    nice article! <3
