# \[2.1] Lexical Analysis

We've reached our first stage, known as Lexical Analysis, Scanning, or just Lexing. The Lexical Analyzer's job is to take the stream of characters that makes up the source program (in any shape or form, as long as the Lexer has a constant flow of characters) and group them into meaningful sequences known as *lexemes*. For each lexeme, the Lexical Analyzer produces a *token*, which will often look like this:

<div align="center"><img src="https://2672771087-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MZIKsr7SRbZdC9diYtD%2F-M_ILfo2bnXm8GXXQbGe%2F-M_IMN6DckOIeshwbHus%2FGroup%204.png?alt=media&#x26;token=04e1ad8e-0a5b-4ca8-821e-38e07662fe35" alt="(Figure 1)"></div>

This token will later make its way to the syntax analysis phase. In the example token above, it has a name and a value: the *token-name* represents an abstract symbol used during the syntax analysis phase, and the *attribute-value* points to an entry in what's known as a *symbol table*. This information is needed for semantic analysis and code generation.
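As a concrete illustration, a token and its symbol-table entry could be modeled like this. This is a minimal sketch in Python, and every name in it (`Token`, `symbol_table`, the dictionary fields) is hypothetical, not taken from any particular compiler:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    name: str             # abstract symbol for the syntax analyzer, e.g. "id"
    value: object = None  # attribute value, e.g. an index into the symbol table

# The symbol table holds the real information about each identifier.
symbol_table = [
    {"lexeme": "amount", "type": "float"},  # entry 1 (1-based, as in the text)
]

tok = Token("id", 1)
entry = symbol_table[tok.value - 1]
print(tok.name, entry["lexeme"])  # id amount
```

The token itself stays tiny; everything we actually know about *amount* lives in the symbol-table entry it points to.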

Let's pick apart a statement together and see if we can't figure out how source code becomes tokens. We'll start with something simple:

```csharp
amount = paid + tip * 0.6
```

Before explaining in depth, I'd like to mention that lexing is purely positional: the lexer has no knowledge of the language's grammar, just a constant stream of characters to convert into tokens. Now let's analyze that statement:

1. *amount* is a lexeme that gets mapped into a new token along the lines of \<id, 1>. *id* is what you'd call an *abstract symbol*: it doesn't directly represent "amount" but merely stands in for it, marking it as an *identifier*. The value 1 points to an entry somewhere in the symbol table, which holds the actual information about the identifier, such as its name, type, value, and anything else the table tracks.
2. Next we encounter the = lexeme, which in most languages denotes assignment. What token should this map to, though? This can be somewhat of a controversial topic: some argue that such tokens should get an abstract symbol, something along the lines of *assign* or *assignment*, while others say there's no need, and the lexeme can simply represent itself. For the sake of simplicity, we'll represent <=> with the lexeme itself as the name of the abstract symbol.
3. *paid* is another lexeme that gets mapped into a new token, \<id, 2>, where 2 points to the symbol-table entry for *paid*. You know the drill; it's exactly the same as the first lexeme.
4. Since we've decided to forego representing the operators as abstract symbols, our + lexeme will simply be <+>.
5. *tip* follows the same pattern as *paid* and *amount*, being mapped into \<id, 3>, with 3 pointing to the symbol-table entry for *tip*.
6. As always, we'll be representing the lexeme \* as <\*>.
7. 0.6 is what's known as a *literal*, but it's also a lexeme that gets mapped into a token, <0.6>. Just because it has no name does not mean it can't be represented as a valid token. We can be more precise and represent it as a number token, \<number, 0.6>, which gives the compiler information about its type and how to treat it later on.

If we put all the tokens together, we'll have something along the lines of this:

```csharp
<id, 1> <=> <id, 2> <+> <id, 3> <*> <number, 0.6>
```
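The whole walkthrough above can be sketched as a tiny lexer. This is a minimal sketch under stated assumptions, not a production scanner: the function name `tokenize` and the list-of-dicts symbol table are made up for illustration, identifiers and numbers are the only multi-character lexemes handled, and there's no real error recovery:

```python
def tokenize(source):
    """Scan the character stream left to right, producing (name, value) tokens."""
    symbol_table = []  # identifiers get an entry here, numbered from 1
    indices = {}       # lexeme -> symbol-table index, so repeats share one entry
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                       # whitespace separates lexemes
            i += 1
        elif ch.isalpha() or ch == "_":        # identifier lexeme -> <id, n>
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            if lexeme not in indices:
                symbol_table.append({"lexeme": lexeme})
                indices[lexeme] = len(symbol_table)
            tokens.append(("id", indices[lexeme]))
        elif ch.isdigit():                     # number literal -> <number, v>
            start = i
            while i < len(source) and (source[i].isdigit() or source[i] == "."):
                i += 1
            tokens.append(("number", float(source[start:i])))
        elif ch in "=+-*/":                    # operators represent themselves
            tokens.append((ch, None))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens, symbol_table

tokens, table = tokenize("amount = paid + tip * 0.6")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3), ('*', None), ('number', 0.6)]
```

Note that if *tip* appeared twice in the source, both occurrences would map to the same \<id, 3> token, since the lexer looks up existing symbol-table entries before creating new ones.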

See? It's not that complicated. We'll be covering actual Lexing techniques later on in the book.
