Compilation, Linking And Execution

Compilation, linking, and execution constitute the process of transforming C++ source code into an executable program. Here's an overview of each step:

  1. Preprocessing: The preprocessor handles directives such as #include and #define, inserting the contents of included headers and expanding macros. The result is a single translation unit that is passed to the compiler. Some toolchains write it to a temporary file, but that detail is not essential to the core concept.

  2. Compilation: The compiler translates the preprocessed source code into object code, which is machine code for the target platform that may still contain unresolved references to symbols defined elsewhere. Conceptually, the compiler first produces assembly, which an assembler then turns into object code. Some toolchains run these as separate steps, while others use an integrated assembler and emit object code directly, which avoids writing and re-reading an intermediate assembly file.

  3. Linking: The linker combines object files and libraries and resolves external references to produce the final output. It replaces references to undefined symbols with the correct addresses, pulls in the required system libraries and startup code, and merges any debug information carried in the object files. Some toolchains can also optimize across object files at this stage (link-time optimization). The output can be an executable file (for example, a .exe file on Windows) or a shared library.

  4. Execution: When the program is run, the operating system loads the executable created by the linker into memory. Control passes to the program's entry point, which is normally startup code that sets up the runtime environment and then calls main(). Note that an object file is never executed directly: it must first be linked into an executable, which is then loaded and run.
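
The four steps above can be made concrete with a small source file. The following is a minimal sketch, not a definitive recipe: the file names (shapes.cpp, main.o, app) are hypothetical, and the commands in the comments assume a GCC- or Clang-style compiler driver.

```cpp
// shapes.cpp (hypothetical file name)
//
// Assumed GCC/Clang-style commands for each stage:
//   g++ -E shapes.cpp                 preprocess only; SQUARE(side) is expanded
//   g++ -S shapes.cpp                 stop after generating assembly (shapes.s)
//   g++ -c shapes.cpp -o shapes.o     compile to an object file, no linking
//   g++ main.o shapes.o -o app        link with a hypothetical main.o that defines main()
//   ./app                             the OS loads the executable and runs it

// Handled by the preprocessor: every SQUARE(x) is replaced textually
// before the compiler proper sees the code.
#define SQUARE(x) ((x) * (x))

// Defined here, so shapes.o exports the symbol for area_of_square.
// Another translation unit only needs the declaration
//   int area_of_square(int side);
// and the linker connects its call to this definition.
int area_of_square(int side) {
    return SQUARE(side);
}
```

Because this file has no main(), it can be compiled to an object file on its own but cannot be linked into an executable by itself. That is the linker's job in step 3: it needs a second object file that supplies main() and calls area_of_square.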

What is a compiler?

A compiler is a special program that translates a programming language's source code into machine code, bytecode or another programming language. The source code is typically written in a high-level, human-readable language such as Java or C++. A programmer writes the source code in a code editor or an integrated development environment (IDE) that includes an editor, saving the source code to one or more text files. A compiler that supports the source programming language reads the files, analyzes the code, and translates it into a format suitable for the target platform.

How does a compiler work?

Compilers vary in the methods they use for analyzing and converting source code to output code. Despite their differences, they typically carry out the following steps:

  • Lexical analysis - The compiler splits the source code into lexemes, which are individual code fragments that represent specific patterns in the code. The lexemes are then tokenized in preparation for syntax and semantic analyses.
  • Syntax analysis - The compiler verifies that the code's syntax is correct, based on the rules for the source language. This process is also referred to as parsing. During this step, the compiler typically creates abstract syntax trees that represent the logical structures of specific code elements.
  • Semantic analysis - The compiler checks that the code is meaningful under the rules of the language, which goes beyond checking its grammar. For example, semantic analysis might check whether variables have been declared before use and whether the types in an expression or assignment are compatible.
  • IR code generation - After the code passes through all three analysis phases, the compiler generates an intermediate representation (IR) of the source code. The IR code makes it easier to translate the source code into a different format. However, it must accurately represent the source code in every respect, without omitting any functionality.
  • Optimization - The compiler optimizes the IR code in preparation for the final code generation. The type and extent of optimization depends on the compiler. Some compilers let users configure the degree of optimization.
  • Output code generation - The compiler generates the final output code, using the optimized IR code.
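
As an illustration of the first phase in the list above, here is a minimal sketch of lexical analysis for a toy expression language. The token categories and the tokenize function are hypothetical names chosen for this example. A real compiler's lexer handles many more cases, such as multi-character operators, string literals and comments.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token categories for a toy expression language.
enum class TokenKind { Identifier, Number, Operator };

struct Token {
    TokenKind kind;
    std::string text;  // the lexeme: the exact characters that were matched
};

// Split the input into lexemes and classify each one as a token.
// Whitespace separates lexemes and is discarded. Any character that is
// not a digit, letter, underscore or space is treated as a
// one-character operator, which is a simplification.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // skip whitespace
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, source.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Operator, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("total = 42 + y") yields five tokens: the identifier total, the operator =, the number 42, the operator + and the identifier y. A parser would then consume this token sequence to build a syntax tree.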

Compilers are sometimes confused with programs called interpreters. Although the two are similar, they differ in important ways. Compilers analyze and convert source code written in languages such as Java, C++, C# or Swift. They're commonly used to generate machine code or bytecode that can be executed by the target host system.

A classic interpreter does not produce a standalone machine-code file. It processes the code one statement at a time at runtime, without converting the whole program in advance for a particular platform. Interpreters are commonly used for code written in scripting languages such as Perl, PHP, Ruby or Python, although many modern implementations of these languages first compile the source to an internal bytecode and then interpret that.
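
To make the statement-at-a-time model concrete, here is a minimal sketch of an interpreter for an invented two-command language. The statement format ("set name value" and "add name value") and the interpret function are assumptions made for this example and do not correspond to any real scripting language.

```cpp
#include <map>
#include <sstream>
#include <string>

// Execute a toy program one statement at a time.
// Each line has the hypothetical form "<op> <name> <integer>",
// where <op> is either "set" or "add".
std::map<std::string, int> interpret(const std::string& program) {
    std::map<std::string, int> variables;
    std::istringstream lines(program);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream statement(line);
        std::string op;
        std::string name;
        int value = 0;
        if (!(statement >> op >> name >> value)) {
            continue;  // skip blank or malformed lines
        }
        // The statement is carried out immediately; nothing is
        // translated ahead of time or written to an output file.
        if (op == "set") {
            variables[name] = value;
        } else if (op == "add") {
            variables[name] += value;  // an unknown name starts at 0
        }
    }
    return variables;
}
```

Running interpret on the three lines "set x 5", "add x 3" and "set y 1" leaves x equal to 8 and y equal to 1. The source text is needed every time the program runs, which is the key contrast with a compiled executable.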

Difference Between Compiler, Interpreter and Assembler

Topic asked in the February 2022 (CBCS), July 2022 (CBCS) and December 2022 (CBCS) question papers.

| Compiler | Interpreter | Assembler |
|---|---|---|
| Saves the generated machine code to disk. | Does not save machine code. | Converts assembly language code into machine code. |
| Compiled code runs faster than interpreted code. | Interpreted code runs slower than compiled code. | Assembled code runs natively, so it is usually faster than interpreted code; its speed relative to compiled code depends on how the assembly was written. |
| Generates an output file (object code or an executable). | Does not generate an output file. | Does not produce a runnable program directly; it produces object files containing machine code. |
| Any change to the source program requires recompilation before the change takes effect. | A change to the source program does not require retranslating the entire program. | A change to assembly code requires reassembling only the affected modules. |
| Errors are reported together after the whole program has been compiled. | Errors are reported one line at a time, as each line is processed. | Errors are usually reported per line or per statement. |
| Sees the whole program up front, which enables optimizations that make the code run faster. | Works line by line, which limits the optimization it can do compared with a compiler. | Optimization is typically limited to individual instructions or short instruction sequences. |
| Source code is not required for later execution. | Source code is required for every execution. | Source code is required only for reassembly or modification. |
| Object code is saved permanently for future use. | No object code is saved for future use. | Generates object files that are used for linking and execution. |
| C, C++, C#, etc. are compiler-based languages. | Python, Ruby, Perl, SNOBOL, MATLAB, etc. are interpreter-based languages. | Assembly language programs are translated directly into machine code. |