
Monday, March 2, 2015

Stages of Compilation in Linux using gcc

When you write a program, it doesn't do anything until you compile it. People working on Linux machines use GCC, a compiler for C, C++, Java, Fortran and other languages that runs on Unix and GNU/Linux machines. It is distributed as Free Software under the GNU General Public License (GNU GPL). Knowing the compilation stages step by step is useful for an experienced developer and a beginner alike.
        During compilation the code goes through four stages, and each stage uses a tool to translate the code from one form to the next until we reach the loadable binary image (the executable file) for the target architecture. Because a sequence of tools is used, it is called the GNU toolchain. Understanding the stages of compilation also helps with cross compilation of code.
The steps below show the compilation process using the gcc compiler.
Source file:
        It contains the source program in text format. It can be in any language gcc supports (C, C++, etc.). For example, first.c is a C source file.


Step1:
Pre-processing: (here we use the cpp tool, the C preprocessor)
  • macros are expanded in place, with no function-call overhead, which helps in creating fast and efficient code.
  • it copies the contents of the header files named in #include directives into the source, creating a pre-processed source file.
  • all macros and symbolic constants (#define) are replaced by their definitions.
  • all conditional pre-processor directives (#if, #ifdef, #else, #endif ...) are processed, so only the selected lines remain.
$ gcc -E first.c -o first.i

    -E option halts the compilation after the pre-processing stage. Refer to the man page.
    -o option writes the output to the new file first.i.


first.i contains the entire content of the included header files plus the code. To see the sequence of steps gcc follows in generating first.i:
$ gcc -v -E first.c -o first.i
    -v option stands for verbose.
 

Step2:
Compilation:
(here we use the compiler tool, cc1)

  • Takes the pre-processed file and creates a file with the .s extension, called the assembly file.
  • This is the stage where optimization (for speed and space) of the code is done, for example with -O2.

$ gcc -S first.i -o first.s

     -S option halts the compilation after the assembly file is generated.

Step3:
Relocatable Binary:
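(here we use the assembler tool, as)

  • The assembler translates the assembly code into machine code. Addresses in this file are offsets from the start of each section, assigned when the file is assembled; an object dump of first.o shows these offset addresses.
For example, a relocatable object contains a call such as "call 11 <main+0x11>" whose target is only a placeholder; the final address depends on where main and the called function end up after linking.
This file contains the machine code for the source plus a symbol table and relocation entries; library routines such as printf are not included yet.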

$ gcc -c first.s -o first.o

Note: first.o is a binary file, so it is not human readable. To view its content we use objdump, a binary disassembler tool.

$ objdump -D first.o
    -D option disassembles all sections (-d disassembles only the code sections); refer to the man pages.
 

Step4:
Linking:
(here we use the linker tool, ld)

  • The linker builds the executable image: it combines the object files with the required libraries and start-up code, producing loadable binary code that can be loaded and executed.

$ gcc first.o
    By default gcc first.o creates a.out. To get an executable with a specified name,
we can give
$ gcc first.o -o first (here first is the executable name we specified)

  • When listed with ls --color, this executable (first) is usually shown in green in the Bash shell, because its execute permission is set.
  • This loadable file contains absolute (virtual) addresses organized into segments, instead of the section offsets found in first.o.
  • Calls to shared-library functions go through entries in the PLT (Procedure Linkage Table).
  • The executable file also contains C runtime start-up code; the shared C library itself is loaded at run time.
  • This file is created by the linker, and its format and start-up code are OS dependent.
    To view the content of the executable first page by page:
$ objdump -D first | more
 


Observations:
So finally we have five different files, first.c, first.i, first.s, first.o and first. We shall
check their file formats using the tool file.
(all the steps shown together)

$ gcc -E first.c -o first.i
$ gcc -S first.i -o first.s
$ gcc -c first.s -o first.o
$ gcc first.o -o first
$ file first.c
first.c: ASCII text
$ file first.i
first.i: ASCII C program text
$ file first.s
first.s: ASCII assembler program text
$ file first.o
first.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
$ file first
first: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
$

$ objdump -D first.o | more
first.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  10:   e8 fc ff ff ff          call   11 <main+0x11>
  15:   c9                      leave
  16:   c3                      ret

Now disassemble the executable and scroll down to the <main> section:
$ objdump -D first | more
080483e4 <main>:
 80483e4:   55                      push   %ebp
 80483e5:   89 e5                   mov    %esp,%ebp
 80483e7:   83 e4 f0                and    $0xfffffff0,%esp
 80483ea:   83 ec 10                sub    $0x10,%esp
 80483ed:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
 80483f4:   e8 fc ff ff ff          call   11 <main+0x11>
 80483f9:   c9                      leave
 80483fa:   c3                      ret

       You can view the machine instruction codes. The important thing to observe is the address at the extreme left of each line. In first.o it is an offset from the start of the .text section; in the executable it has been relocated (remapped) to a virtual address by adding the offset to the base address the linker assigned to the segment. Likewise, the call operand e8 fc ff ff ff in first.o is only a placeholder; the linker patches it so that in a linked executable the call reaches the real function, for a shared-library function through a PLT entry such as <puts@plt>.

       We have obtained the executable (first) from the relocatable object (first.o). In first, observe that the instruction addresses are mapped to 32-bit virtual addresses. The step 3 output (first.o) is therefore called relocatable, because its offset addresses are remapped to virtual addresses. The linker does the job of relocating offset addresses to platform-specific addresses. The virtual address concept is a huge, interesting and important topic of discussion, which I will post about soon. :-)


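Note: The .c source can be compiled for any architecture. The .i file already contains the headers of the target it was prepared for, and the .s file, the .o file and the executable are specific to the target architecture; the executable is additionally tied to the platform (OS).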

Please leave a comment :-) Queries are free of cost.
