FPGA, CPLD working principle and introduction

Programmable Logic Devices (PLDs) originated in the 1970s as a new type of logic device developed on the basis of Application Specific Integrated Circuits (ASICs), and they are now the main hardware platform for digital system design. Their main feature is that the user configures and programs them entirely through software to perform a specific function, and they can be erased and reprogrammed repeatedly. Modifying or upgrading a PLD design requires no changes to the PCB; only the program is modified and updated on the computer. This turns hardware design work into software development work, shortens the system design cycle, increases implementation flexibility and reduces cost, which has won PLDs the favor of hardware engineers and created a large PLD industry.

Common PLD products include Programmable Read Only Memory (PROM), Field Programmable Logic Array (FPLA), Programmable Array Logic (PAL), Generic Array Logic (GAL), Erasable Programmable Logic Array (EPLA), Complex Programmable Logic Device (CPLD) and Field Programmable Gate Array (FPGA). In terms of size, PLD devices can be subdivided into simple PLDs (SPLDs), complex PLDs (CPLDs), and FPGAs, whose internal structures are implemented in different ways.

Programmable logic devices can be divided into three categories according to the granularity of the basic unit: ① small granularity (e.g., “sea of gates” architecture), ② medium granularity (e.g., FPGA), and ③ large granularity (e.g., CPLD). According to the programming process, there are four categories: ① fuse and antifuse programming devices, ② ultraviolet-erasable programmable read-only memory (UV EPROM) programming devices, ③ electrically erasable programmable read-only memory (EEPROM) programming devices (e.g., CPLD), and ④ SRAM programming devices (e.g., FPGA). In this process classification, the first three categories are non-volatile: after programming, the configuration data is retained on the device. The fourth category is volatile: the configuration data is lost at power-down, so the device must be reconfigured after each power-up.

Development history of programmable logic devices

The development of programmable logic devices can be divided into four phases: phase 1 from the early 1970s to the mid-1970s, phase 2 from the mid-1970s to the mid-1980s, phase 3 from the 1980s to the late 1990s, and phase 4 from the late 1990s to the present.

Phase 1

The only three types of programmable devices were simple programmable read-only memories (PROMs), ultraviolet erasable read-only memories (EPROMs), and electrically erasable read-only memories (EEPROMs), which could only perform simple digital logic functions due to structural limitations.

Phase 2


Slightly more complex programmable array logic (PAL) and generic array logic (GAL) devices emerged. These were formally known as PLDs and could perform a variety of logic functions. A typical PLD consisted of an “AND” array and an “OR” array and used “AND-OR” expressions to implement any combinational logic, so PLDs could implement a large number of combinational functions in sum-of-products form.

Phase 3


Xilinx and Altera introduced FPGAs, which resemble standard gate arrays, and scalable CPLDs, which resemble PAL structures. These devices improved the speed of logic operations and offered flexible architectures and logic units, high integration, and wide applicability. They combined the advantages of PLDs and general-purpose gate arrays, could implement very large circuits, and were flexible to program, which made them the devices of choice for prototyping and for small and medium production runs (generally fewer than 10,000 units). During this phase, CPLD and FPGA devices advanced significantly in manufacturing process and product performance, reaching 0.18 μm processes and gate counts in the millions.

Phase 4

SOPC and SOC technologies emerged from the integration of PLD and ASIC technologies, covering real-time digital signal processing, high-speed data transceivers, complex computing, and embedded system design. Logic devices of this phase embed hard-core high-speed multipliers, gigabit differential serial interfaces, PowerPC microprocessors with clock frequencies up to 500 MHz, and soft cores such as MicroBlaze, PicoBlaze, Nios and Nios II. They combine software requirements with hardware design and high speed with flexibility, which takes them beyond the performance and scale of ASIC devices and beyond the traditional concept of an FPGA, extending the application scope of PLDs from the single-chip level to the system level. The concept of PLD-based on-chip programmability is still evolving.

Development Tools

The development of highly complex PLD-based devices relies heavily on electronic design automation (EDA). EDA tools for PLDs are computer software that packages typical unit circuits into fixed modules and provides a standard hardware description language (HDL) for designers to use. PLD development software needs to automatically perform logic compilation, simplification, partitioning, synthesis and optimization, placement and routing, and simulation, as well as fitting and programming download for a specific target chip. A typical EDA tool must include two special software packages, namely a synthesizer and a fitter (adapter).

The function of the synthesizer is to compile, optimize, convert and synthesize the HDL, schematic or state-diagram description of a system project, completed by the designer on the EDA platform, for a given hardware target.


As design scale grows exponentially, it is essential to reduce compilation time and improve the compilation performance of PLD development software, and to provide a rich library of intellectual property (IP) cores for designers to call upon. The user-friendliness of the development interface and how complicated it is to operate are also important factors in evaluating the software. In today's PLD industry, each chip provider's development tools have become a core factor in its success or failure: only with leading chip technology, complete documentation and excellent PLD development software can a chip provider gain customer acceptance. Good PLD development software should have the following five properties.

Accurate conversion of user designs into circuit modules

Efficient use of device resources

Enables fast compilation and synthesis

Provides rich IP resources

User-friendly and easy-to-use interface

CPLD working principle and introduction based on the product-term (Product-Term) PLD structure

PLD chips using this structure include Altera’s MAX7000 and MAX3000 series (EEPROM process), Xilinx’s XC9500 series (Flash process), and most products from Lattice and Cypress (EEPROM process). Let’s look at the general structure of this kind of PLD, taking the MAX7000 as an example (the structure of other models is very similar).

Figure 1 Internal structure of PLD based on product term

This PLD can be divided into three structures: the macrocell, the programmable interconnect array (PIA) and the I/O control block. The macrocell is the basic structure of the PLD and implements the basic logic functions. The blue part in Figure 1 is a collection of several macrocells (not shown individually because of their large number). The I/O control block controls the electrical characteristics of the inputs and outputs, such as open-collector output, slew rate control and tri-state output. INPUT/GCLK1, INPUT/GCLRn, INPUT/OE1 and INPUT/OE2 are the global clock, clear and output enable signals. They are connected to every macrocell in the PLD by dedicated lines, so the delay to each macrocell is the same and is as short as possible. The specific structure of the macrocell is shown in the following figure.

Figure 2: Macro cell structure

The left side is the product term array, which is actually an “AND” array: each intersection is a programmable fuse, and a connected fuse brings that signal into the “AND” logic. The product term selection matrix behind it is an “OR” array. The two together implement the combinational logic. The right side of the figure shows a programmable D flip-flop whose clock and clear inputs are programmable: they can use the dedicated global clear and global clock, or a clock and clear generated by the internal logic (the product term array). If a flip-flop is not required, it can be bypassed and the signal fed directly to the PIA or output to the I/O pins. The following simple circuit illustrates how a PLD uses the above structure to implement logic.

Figure 3

Assume that the output of the combinational logic (the output of AND3) is f. Then f = (A+B)C(!D) = AC!D + BC!D, where !D denotes the “not” of D. The PLD implements the combinational logic f in the following way:

Figure 4

A, B, C and D are input from the pins of the PLD chip and enter the programmable interconnect array (PIA), which internally produces 8 outputs: A, B, C, D and the inverse of each. Each cross in the diagram marks a connection (programmable fuse on), so we get f = f1 + f2 = (AC!D) + (BC!D). This implements the combinational logic. The D flip-flop in the circuit of Figure 3 is simpler still: it is implemented directly with the programmable D flip-flop in the macrocell. The clock signal CLK is input from an I/O pin and enters the internal global clock channel, which is connected directly to the clock input of the programmable flip-flop. The output of the flip-flop is connected to the I/O control block, and the result is output to the chip pin. The PLD thus completes the function of the circuit shown in Figure 3.

The circuit in Figure 3 is a very simple example that needs only one macrocell. A complex circuit cannot be implemented in a single macrocell, so multiple macrocells are connected through parallel expanders and shared expanders, and the output of a macrocell can also be routed back to the programmable interconnect array and used as the input of another macrocell. This allows the PLD to implement more complex logic. PLDs based on product terms are mostly manufactured in EEPROM and Flash processes, so they work as soon as they are powered on, without the need for other chips.
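The product-term mapping described above can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical, and Python booleans stand in for the fuse-selected signals. It confirms that the two product terms f1 and f2, combined by an OR, match the original expression f = (A+B)C(!D) for every input combination.

```python
from itertools import product

def f_reference(a, b, c, d):
    """Circuit of Figure 3 as written: f = (A + B) * C * !D."""
    return (a or b) and c and (not d)

def f_product_terms(a, b, c, d):
    """Macrocell-style form: two product terms feeding one OR."""
    f1 = a and c and (not d)   # product term 1: A * C * !D
    f2 = b and c and (not d)   # product term 2: B * C * !D
    return f1 or f2

def forms_agree():
    """Compare both forms over all 16 input combinations."""
    return all(
        bool(f_reference(*bits)) == bool(f_product_terms(*bits))
        for bits in product([False, True], repeat=4)
    )

print(forms_agree())
```

Because the comparison is exhaustive, agreement here means the sum-of-products form is logically equivalent to the original circuit, which is what lets the macrocell implement it with two AND terms and one OR.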

FPGA working principle and introduction

As mentioned earlier, the FPGA is a further development of programmable devices such as PAL, GAL, EPLD and CPLD. It emerged as a semi-custom circuit in the ASIC field: it addresses the shortcomings of full-custom circuits and overcomes the limited gate count of earlier programmable devices.

Since FPGAs need to be reprogrammed repeatedly, their combinational logic cannot be built from fixed NAND gates as in an ASIC; it must use a structure that can be reconfigured easily and repeatedly. Lookup tables meet this requirement very well. Currently, mainstream FPGAs use a lookup table structure based on the SRAM process, while some military and aerospace grade FPGAs use lookup tables based on Flash or on fuse and antifuse processes. An FPGA is reconfigured by loading a programming file that changes the lookup table contents.

From basic digital circuit theory, an n-input logic operation, whether AND, OR, NOT, XOR or any other, has at most 2^n input combinations and therefore at most 2^n results to record. If these results are stored in a memory in advance, reading the memory is equivalent to evaluating the logic. FPGAs use exactly this principle: the contents of the lookup table are set by the programming file, so the same circuit can implement different logic functions.

A Look-Up-Table (LUT) is essentially a RAM, and currently most FPGAs use 4-input LUTs, so each LUT can be considered as a RAM with 4-bit address lines. The PLD/FPGA development software automatically calculates all possible results of the logic circuit and writes the truth table (i.e., the result) into RAM in advance, so that each signal input for logic operation is equivalent to inputting an address to look up the table, finding out the content corresponding to the address, and then outputting it.

A 4-input AND gate is used below to illustrate how a LUT implements a logic function.

Table 1-1 Truth table of a 4-input AND gate

As you can see, the LUT has the same functionality as the logic circuit. In fact, LUTs have faster execution speed and larger scale.
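The lookup principle can be sketched in a few lines of Python. This is a simplified model, not vendor code: `build_lut` plays the role of the development software that precomputes the truth table, and `lut_read` models the 16×1 RAM read, with the four inputs used as address bits. The function names and the bit ordering of the address are assumptions made for the illustration.

```python
def build_lut(logic_fn, n_inputs=4):
    """Precompute the truth table for every possible input combination."""
    table = []
    for address in range(2 ** n_inputs):
        # Bit i of the address is taken as the value of input i.
        inputs = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(logic_fn(*inputs))
    return table

def lut_read(table, *inputs):
    """Evaluate the logic by using the inputs as a RAM address."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

def and4(a, b, c, d):
    return a & b & c & d

AND4_LUT = build_lut(and4)
print(AND4_LUT)
print(lut_read(AND4_LUT, 1, 1, 1, 1), lut_read(AND4_LUT, 1, 0, 1, 1))
```

Only address 15 (all four inputs high) holds a 1. Passing a different function to `build_lut` produces different table contents for the same read logic, which mirrors how reloading the configuration changes the function of an unchanged circuit.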

LUT-based FPGAs are highly integrated, with device densities ranging from tens of thousands to tens of millions of gates, so they can implement extremely complex sequential and combinational logic circuits and are suited to high-speed, high-density, high-end digital logic design. Their main components are programmable input/output units, basic programmable logic units, embedded SRAM, rich wiring resources, underlying embedded functional units and embedded dedicated units. The main designers and manufacturers are Xilinx, Altera, Lattice, Actel, Atmel and QuickLogic, of which the three largest are Xilinx, Altera and Lattice.

As mentioned earlier, the operating state of an FPGA is set by the program stored in its on-chip RAM, so the on-chip RAM must be programmed before the device can work. Users can choose different programming methods according to the configuration mode. FPGAs have the following configuration modes.

Parallel mode: parallel PROM, Flash configuration FPGA.

Master-slave mode: one PROM configuring multiple FPGAs.

Serial mode: serial PROM configuring FPGAs.

Peripheral mode: the FPGA is used as a peripheral to the microprocessor, and the microprocessor programs it.

Currently, the FPGAs produced by Xilinx and Altera, the two companies with the highest FPGA market share, are based on the SRAM process and require an external memory to hold the configuration program. At power-up, the FPGA reads the data from the external memory into the on-chip RAM, completes configuration, and enters its working state; after power-down, the FPGA reverts to a blank chip and the internal logic disappears. In this way an FPGA can be reused repeatedly and does not need a special FPGA programmer, only a general-purpose EPROM or PROM programmer. Actel, QuickLogic and other companies also provide antifuse FPGAs, which can be programmed only once. They offer radiation resistance, tolerance of high and low temperatures, low power consumption and high speed, and are used mostly in the military and aerospace fields. Lattice is the inventor of in-system programming (ISP) technology, used in small-scale PLD applications. Early Xilinx products were generally not aimed at the military and aerospace markets, but several products such as the Q Pro-R have now entered this category.

FPGA Chip Architecture

Mainstream FPGAs are still based on lookup table technology, but they far exceed the basic performance of earlier generations and integrate hard-core (ASIC-type) modules for common functions such as RAM, clock management and DSP. As shown in Figure 1-1 (note that Figure 1-1 is only a schematic; each FPGA series has its own internal structure), an FPGA chip consists mainly of seven parts: programmable input/output units, basic programmable logic units, clock management, embedded block RAM, rich wiring resources, embedded underlying functional units and embedded dedicated hardware modules.

Figure 1-1 Internal structure of FPGA chip

The functions of each module are as follows.

1. Programmable input and output unit (IOB)

The programmable input/output unit, referred to as I/O unit, is the interface part between the chip and the external circuitry, completing the requirements of driving and matching input/output signals under different electrical characteristics, and its schematic structure is shown in Figure 1-2. The I/Os in the FPGA are classified by groups, and each group can support different I/O standards independently. Through flexible configuration of software, different electrical standards and I/O physical characteristics can be adapted, the size of drive current can be adjusted, and pull-up and pull-down resistors can be changed. Currently, the frequency of I/O ports is also increasing, and some high-end FPGAs can support data rates of up to 2 Gbps through DDR register technology.

Figure 1-2 Schematic diagram of a typical IOB internal structure

External input signals can enter the FPGA through the storage element (register) of the IOB module or pass directly into the FPGA. When an external input signal is registered in the IOB storage element, the hold time requirement is reduced and is usually 0 by default.

In order to facilitate management and adapt to multiple electrical standards, the IOBs of FPGAs are divided into several groups (banks), and the interface standard of each bank is determined by its interface voltage VCCO; a bank can have only one VCCO, but the VCCOs of different banks can be different. Only ports of the same electrical standard can be connected together, and the same VCCO voltage is the basic condition of the interface standard.
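The one-VCCO-per-bank rule can be expressed as a small check. The Python sketch below is illustrative only: the table lists the nominal VCCO of a few widely used single-ended standards, the function name is hypothetical, and a real design should always follow the device data sheet, which may allow additional combinations (for example, for input-only pins).

```python
# Nominal VCCO (volts) for a few common single-ended I/O standards.
# Illustrative subset; confirm against the data sheet of the actual device.
VCCO_FOR_STANDARD = {
    "LVCMOS33": 3.3,
    "LVTTL": 3.3,
    "LVCMOS25": 2.5,
    "LVCMOS18": 1.8,
}

def bank_vcco(standards):
    """Return the single VCCO a bank needs, or None if the standards conflict."""
    voltages = {VCCO_FOR_STANDARD[name] for name in standards}
    return voltages.pop() if len(voltages) == 1 else None

print(bank_vcco(["LVCMOS33", "LVTTL"]))
print(bank_vcco(["LVCMOS33", "LVCMOS18"]))
```

The first call yields a single voltage because both standards share a 3.3 V VCCO; the second yields `None`, signalling that the two standards would have to be placed in different banks.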

2. Configurable Logic Blocks (CLBs)

The actual number and characteristics of CLBs vary by device, but each CLB contains a configurable switch matrix with 4 or 6 inputs, some selection circuitry (multiplexers, etc.), and flip-flops. The switch matrix is highly flexible and can be configured to handle combinational logic, shift registers, or RAM. In Xilinx FPGA devices, the CLB consists of multiple (typically 4 or 2) identical Slices and additional logic, as shown in Figure 1-3. Each CLB can be used to implement combinational logic and sequential logic, and can also be configured as distributed RAM or distributed ROM.

Figure 1-3 Schematic diagram of a typical CLB structure

A Slice is a basic logic unit defined by Xilinx, and its internal structure is shown in Figure 1-4. A Slice consists of two 4-input function generators, carry logic, arithmetic logic, storage logic, and function multiplexers. The arithmetic logic consists of an XOR gate (XORG) and a dedicated AND gate (MULTAND): the XOR gate enables a Slice to implement a 2-bit full adder, and the dedicated AND gate improves the efficiency of multipliers. The carry logic consists of a dedicated carry signal and a function multiplexer (MUXC) and is used for fast arithmetic addition and subtraction; it includes two fast carry chains that increase the processing speed of the CLB. Each 4-input function generator can implement a 4-input LUT, distributed RAM or a 16-bit shift register. (In Virtex-5 series chips the function generators in a Slice have 6 inputs and can implement 6-input LUTs or 64-bit shift registers.)

Figure 1-4 Schematic diagram of a typical 4-input Slice structure
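To make the role of the XOR gate and the carry multiplexer concrete, here is a minimal Python sketch of one full-adder bit built the way such carry logic is commonly arranged. The exact wiring is an assumption for illustration, not taken from Figure 1-4: the LUT is assumed to compute the propagate signal A xor B, the carry multiplexer selects the carry out, and the dedicated XOR gate produces the sum. The function names are hypothetical.

```python
def full_adder_bit(a, b, carry_in):
    """One adder bit using a LUT output, a carry mux and an XOR gate."""
    propagate = a ^ b                          # assumed to come from the LUT
    carry_out = carry_in if propagate else a   # carry multiplexer
    sum_bit = propagate ^ carry_in             # dedicated XOR gate
    return sum_bit, carry_out

def ripple_add(x, y, width):
    """Chain the bits together, as a carry chain would."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry

print(ripple_add(5, 6, 4))
print(ripple_add(9, 8, 4))
```

With two such bits per Slice, the 2-bit full adder mentioned above follows directly, and the carry signal passes from bit to bit through the multiplexers rather than through general routing, which is where the speed advantage of a dedicated carry chain comes from.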

3. Digital Clock Management Module (DCM)

Most FPGAs in the industry offer digital clock management (all Xilinx FPGAs have this feature). Xilinx's most advanced FPGAs offer both digital clock management and phase-locked loops (PLLs). A PLL provides accurate clock synthesis, reduces jitter, and enables filtering.

4. Embedded Block RAM (BRAM)

Most FPGAs have embedded block RAM, which greatly expands the scope and flexibility of FPGA applications. Block RAM can be configured as single-port RAM, dual-port RAM, content-addressable memory (CAM), FIFO, and other common storage structures. RAM and FIFO are familiar concepts and will not be discussed in detail here. A CAM has comparison logic in each of its internal memory cells. In addition to block RAM, the LUTs in the FPGA can be flexibly configured as RAM, ROM and FIFO structures. In practical applications, the amount of block RAM inside the chip is also an important factor in choosing a chip.

The capacity of a single block RAM is 18k bits, i.e., a bit width of 18 bits and a depth of 1024. The bit width and depth can be changed as needed, but two rules must be satisfied: first, the modified capacity (bit width × depth) cannot be larger than 18k bits; second, the bit width cannot exceed 36 bits. Multiple block RAMs can of course be cascaded to form a larger RAM, which is limited only by the number of block RAMs in the chip and is no longer bound by the above two rules.
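The two sizing rules translate directly into arithmetic, sketched below in Python. The function names are hypothetical, and `blocks_lower_bound` only counts bits: it gives the minimum number of blocks a cascaded memory could need and ignores the aspect-ratio options of any particular device.

```python
BLOCK_BITS = 18 * 1024   # 18k bits per block RAM, as stated above
MAX_WIDTH = 36           # maximum bit width of a single block

def is_valid_config(width, depth):
    """Check a single-block configuration against both rules."""
    if width < 1 or depth < 1:
        return False
    return width <= MAX_WIDTH and width * depth <= BLOCK_BITS

def blocks_lower_bound(width, depth):
    """Minimum number of cascaded blocks, counting bits only."""
    total_bits = width * depth
    return -(-total_bits // BLOCK_BITS)   # ceiling division

print(is_valid_config(18, 1024))
print(is_valid_config(36, 512))
print(is_valid_config(36, 1024))
print(blocks_lower_bound(36, 2048))
```

For example, 36 × 512 uses exactly 18k bits and is allowed, while 36 × 1024 needs 36k bits and must be built from at least two cascaded blocks.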

5. Rich wiring resources

The wiring resources connect all the units inside the FPGA, and the length and process of a wire determine the drive capability and transmission speed of the signals it carries. Wiring resources fall into four categories. The first is global wiring, used for the global clock and global reset/set signals inside the chip. The second is long-line wiring, used for high-speed signals and secondary global clock signals between the chip's banks. The third is short-line wiring, used for logic interconnection between basic logic units. The fourth is distributed wiring, used for dedicated clock, reset and other control signal lines.

In practice the designer does not need to select wiring resources directly: the place-and-route tool automatically selects wiring resources to connect the individual module units, based on the topology and constraints of the input logic netlist. In essence, there is a close and direct relationship between the way wiring resources are used and the outcome of the design.

6. Underlying embedded functional units

Embedded functional units mainly refer to soft processing cores (Soft Core) such as DLL (Delay Locked Loop), PLL (Phase Locked Loop), DSP and CPU. The growing range of embedded functional units turns the single-chip FPGA into a system-level design platform capable of joint software and hardware design, gradually moving it toward an SOC platform.
DLLs and PLLs have similar functions and can perform clock multiplication and division with high accuracy and low jitter, as well as duty cycle adjustment and shifting, etc. DLLs are integrated on chips from Xilinx, PLLs are integrated on chips from Altera, and both PLLs and DLLs are integrated on new chips from Lattice. The structure of the DLL is shown in Figure 1-5.

Figure 1-5 Schematic diagram of a typical DLL module

7. Embedded Dedicated Hard Core

Embedded dedicated hard cores are defined in contrast to the underlying embedded soft cores. They are hard cores (Hard Core) inside the FPGA with powerful processing capability, equivalent to ASIC circuits. To improve FPGA performance, chip manufacturers integrate some dedicated hard cores inside the chip. For example, to improve multiplication speed, mainstream FPGAs integrate dedicated multipliers; to support communication bus and interface standards, many high-end FPGAs integrate serializer/deserializer (SERDES) transceivers, which can reach speeds of tens of Gbps.
Xilinx’s high-end products integrate not only PowerPC series CPUs but also embedded DSP core modules. The corresponding system-level design tools are EDK and Platform Studio, and on this basis Xilinx put forward the concept of System on Chip (SOC). Through PowerPC, MicroBlaze, PicoBlaze and other platforms, it is possible to develop standard DSP processors and their related applications for SOC development purposes.

The concept of soft cores, hard cores and solid cores

IP (Intellectual Property) core is the general term for an integrated circuit core protected by intellectual property rights: a macro module with a specific function that has been repeatedly verified, is independent of the chip manufacturing process, and can be ported to different semiconductor processes. By the SOC stage, IP core design has become an important task for ASIC design companies and FPGA providers, and a reflection of their strength. For FPGA development software, the richer the IP cores it provides, the more convenient the user’s design work and the higher its market share. At present, IP cores have become the basic unit of system design and are exchanged, transferred and sold as independent design results.

In terms of how they are delivered, IP cores are usually classified into 3 categories: soft cores, hard cores, and solid cores (more commonly called firm cores). In terms of the cost of producing an IP core, hard cores are the most expensive; in terms of flexibility, soft cores are the most reusable.

1. Soft cores


In the EDA design field, a soft core refers to the register transfer level (RTL) model before synthesis; in FPGA design specifically, it refers to the hardware description language description of the circuit, including the logic description, netlist and help files. Soft cores have only been functionally simulated and must be synthesized, placed and routed before they can be used. Their advantages are high flexibility, portability, and user configurability; their disadvantages are the low predictability of the module, the possibility of errors in later design stages, and a certain design risk. Soft cores are the most widely used form of IP core.

2. Solid cores


In the EDA design field, a solid core (firm core) refers to a netlist with floorplan information; in FPGA design specifically, it can be seen as a soft core with layout planning, usually provided as a mixture of RTL code and a netlist for a specific process. The RTL description is synthesized and optimized against a specific standard cell library to form a gate-level netlist, which can then be used by a place-and-route tool. Compared to soft cores, solid cores are slightly less flexible in design but considerably more reliable. Currently, solid cores are one of the mainstream forms of IP core.

3. Hard core


In the EDA design field, a hard core refers to a verified design layout; in FPGA design specifically, it refers to a design whose layout and process are fixed, which has passed front-end and back-end verification, and which the designer cannot modify. There are two reasons why it cannot be modified: first, the system design has very strict timing requirements for each module, which does not allow the existing physical layout to be disturbed; second, the protection of intellectual property rights does not allow designers to make any changes to it. Because a hard IP core cannot be modified, it is difficult to reuse, so it can only be used for some specific applications and has a narrow scope of use.
