Multiprocessing, DDR partitioning and sharing data.

Tim Fernandez-Hart

Zynq MPSoC devices from Xilinx have several processors built in. As a system grows in complexity, it becomes necessary to use multiple processors, each running different software, to perform several tasks at the same time. This is multiprocessing, and it is what we are going to walk through in this post.

I am using the ZCU104 board, with Vivado and Vitis 2021.2. This board contains a whole load of IO and peripherals and has a Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC at its centre. Figure 1 shows its block diagram.

Figure 1 – Zynq UltraScale+ XCZU7EV-2FFVC1156 MPSoC block diagram.

In terms of processors, it has four ARM A53 cores, two R5 real-time processors and an ARM Mali GPU. Each of these can run an application in isolation and share data by reading and writing to the same address in system memory.

Today we are going to get two A53 cores up and running, operating on the same data from two bare-metal applications. Concurrent access to shared data can obviously be an issue here, and it is up to the programmer to make sure each core only touches the data when it is safe to do so. There are several ways of coordinating this between cores. One is to use interrupts; another is to poll a flag. In this example we will use a flag, which we shall store in the on-chip memory (OCM). This is an intermediate-level tutorial, so I won’t labour over creating new projects etc.
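As a rough illustration of the flag-and-polling idea, here is a minimal host-side sketch in C. It is not the code used later in this post: the helper names wait_for_turn and hand_over are hypothetical, an ordinary variable stands in for the OCM location, and the bounded spin count is my own addition so that a stuck partner cannot hang the caller forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: poll *flag until it equals my_id, giving up
 * after max_spins polls. Returns true if the turn was obtained. */
static bool wait_for_turn(volatile uint32_t *flag, uint32_t my_id,
                          uint32_t max_spins)
{
    for (uint32_t i = 0; i < max_spins; i++) {
        if (*flag == my_id) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: pass ownership of the shared data to the
 * other core by writing its id into the flag. */
static void hand_over(volatile uint32_t *flag, uint32_t other_id)
{
    *flag = other_id;
}
```

The volatile qualifier matters: without it the compiler is free to read the flag once and spin on a stale copy. On the real hardware the pointer would refer to a fixed address rather than a local variable.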

Create a project in Vivado for the board or part you are using. Create a block diagram and add the Zynq processing system (PS). Run block automation, then double click the PS to customise it. Here you can see how the different processors connect to system DDR (Figure 2) via various interconnects.

Figure 2 – Zynq UltraScale+ Vivado block diagram

This is mainly a software-based project, so the hardware design in Vivado can be minimal. Disable the AXI HPM1 FPD interface (Figure 3), connect the PL-PL clocks (Figure 4), create a top-level HDL wrapper and generate the bitstream. Export the .xsa and make a note of where it is saved.

Figure 3 – Disable AXI HPM1 FPD interface

Figure 4 – Final block diagram

Open Vitis and create a new platform project. Navigate to the .xsa file you just exported from Vivado, then select standalone as the operating system and psu_cortexa53_0 as the processor (Figure 5). You can see which other processors are available on your system in the drop-down menu.

Figure 5 – Create a platform project

Here we have two ‘domains’ already configured in Vitis, namely psu_cortexa53_0 and psu_pmu_0 (Figure 6). Each domain describes the environment for a particular processor. Since we need a second processor, we will add a domain. To do this, click the + button at the top right and select add domain.

Figure 6 – Two auto-generated domains

Figure 7 – Create a new domain on A53 core 1

Figure 7 shows the dialog box that appears when you add a new domain. Select psu_cortexa53_1 as the processor, although you could just as easily choose another.

Finally, and optionally, I changed the display names of the two domains to core_0, running on psu_cortexa53_0, and core_1, running on psu_cortexa53_1 (Figure 8).

Figure 8 – Two defined domains setup on two separate cores and renamed

Build the platform project. Now create a new Application Project. You should see the two domains already assigned to different processors (Figure 9).

Figure 9

Name the application “core_0_app” and assign it to the processor psu_cortexa53_0. You could assign it to any defined domain (aka processor), but let’s keep things simple.
Click next, select the HelloWorld template application, then click Finish.

Now we will create another application and assign it to the second domain, so that it runs on the second A53 processor. In the Explorer window of the main view in Vitis, right click the top-level application and select create new application. Click next and name the application core_1_app. To put both applications in the same project, select core_0_app_system on the left-hand side; Vitis then fills in core_0_app against the psu_cortexa53_0 processor. Your project should now look like Figure 10.

Figure 10

Click next, select the HelloWorld template application again and click Finish.

We now have a single system project containing two pieces of software, which will run independently on two A53 cores. Your Explorer view should look like Figure 11.

Figure 11

Now we need to partition the memory so that each application has its own address space. After that we will add a further memory partition, shared by both cores, so they can pass data to each other.
The system memory is partitioned in each application’s linker script. Open the src folder of either application and double click the ldscript.ld file, e.g. core_0_app > src > ldscript.ld.
Initially, both applications use the same memory space, as shown in Figure 12.

Figure 12

Left like this, both applications would crash, as each would overwrite the other. We will assign the memory addresses as follows:

Core 0
Base Address: 0x0
Size: 0x100000

Core 1
Base Address: 0x100000
Size: 0x100000

Now we can add the shared memory partition. Click the Add Memory button on the right and add a section called shared_ddr.
Its base address is 0x100000 + 0x100000 = 0x200000, and its size is 0x100000.
The final memory partitions should look like Figure 13 for core_0_app and Figure 14 for core_1_app. Also note that the on-chip memory, psu_ocm_ram_0_MEM_0, begins at 0xFFFC0000; we will use this later.
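One way to keep these numbers from drifting apart is to collect them in a small header that both applications include. The sketch below is only a suggestion, not part of the original project: the macro names are my own, and the values have to be kept in step with the two linker scripts by hand. The compile-time checks simply confirm that no region runs into the next.

```c
/* Hypothetical shared header, e.g. memory_map.h (name is illustrative). */

/* Private DDR region for the application on core 0 */
#define CORE0_BASE       0x000000u
#define CORE0_SIZE       0x100000u

/* Private DDR region for the application on core 1 */
#define CORE1_BASE       0x100000u
#define CORE1_SIZE       0x100000u

/* DDR region visible to both applications (shared_ddr) */
#define SHARED_DDR_BASE  0x200000u
#define SHARED_DDR_SIZE  0x100000u

/* Start of the on-chip memory, used later for the flag */
#define OCM_BASE         0xFFFC0000u

/* Fail the build if one region runs into the next (C11). */
_Static_assert(CORE0_BASE + CORE0_SIZE <= CORE1_BASE,
               "core 0 region overlaps core 1 region");
_Static_assert(CORE1_BASE + CORE1_SIZE <= SHARED_DDR_BASE,
               "core 1 region overlaps shared_ddr");
_Static_assert(SHARED_DDR_BASE + SHARED_DDR_SIZE <= OCM_BASE,
               "shared_ddr runs into the OCM");
```

If someone later enlarges one region and forgets to move its neighbour, the build stops with a readable message instead of producing two applications that silently corrupt each other.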

Figure 13 – core_0_app linker script

Figure 14 – core_1_app linker script

These sizes are fine for the HelloWorld app as it’s small. If you choose a larger application, such as the lwip echo template, the build will fail because the program no longer fits in its partition. In that case, increase the size of the memory partition for that application and remember to move or resize any adjacent partitions to avoid conflicts.
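When a partition has to grow, every partition after it moves. The arithmetic is simple but easy to get wrong by hand, so here is a small host-side sketch that recomputes the base addresses. The function name pack_regions is hypothetical, and the sizes in the usage note are made-up figures, not measurements of any real template.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: place n regions back to back, starting at start.
 * sizes[i] is the size of region i; bases[i] receives its base address.
 * Returns the first address after the last region. */
static uint64_t pack_regions(uint64_t start, const uint64_t *sizes,
                             uint64_t *bases, size_t n)
{
    uint64_t next = start;
    for (size_t i = 0; i < n; i++) {
        bases[i] = next;   /* region i begins where region i-1 ended */
        next += sizes[i];
    }
    return next;
}
```

For example, growing the core 0 partition to 0x400000 while leaving the other two at 0x100000 gives bases of 0x0, 0x400000 and 0x500000. Note that the shared region has moved as well, so any address hard-coded in the source files would have to change with it.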

Now we are set up and ready to code our applications. To avoid confusion, rename the helloworld.c file in each application, to helloworld_0.c in core_0_app and helloworld_1.c in core_1_app.

The source code for each file is given below. Essentially, we define a flag called CORE_FLAG, which is mapped to the first address in the on-chip memory. In this case that’s 0xFFFC0000 (see Figures 13 and 14 above). If the flag is 0, core_0 reads and writes the shared memory; if it is 1, core_1 does. Each source file also defines SHARED_MEMORY_BASE and declares a pointer to it. Two functions from “xil_io.h”, Xil_In32() and Xil_Out32(), are used to read and write the shared memory. When a core has control, it reads the current value, adds either 4 (core_0) or 1 (core_1), writes the result back to DDR and then relinquishes control to the other processor, allowing it to operate on the shared data.

The final output is shown in Figure 15.

Figure 15
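If you want to check your own output, the sequence of numbers can be predicted with a small host-side model of the turn-taking. This is only a sketch for reasoning about the protocol: the struct and function names are hypothetical, it runs on a PC rather than on the board, and it assumes core_0 starts with the shared number at 0.

```c
#include <stdint.h>

/* Host-side model of the shared state; not target code. */
struct shared_state {
    uint32_t flag;    /* 0: core_0 owns the data, 1: core_1 owns it */
    uint32_t number;  /* the value kept in the shared DDR region */
};

/* Perform one turn for whichever core the flag selects.
 * Returns the value that core would print, i.e. the value it read
 * before adding to it. */
static uint32_t take_turn(struct shared_state *s)
{
    uint32_t seen = s->number;
    if (s->flag == 0) {
        s->number = seen + 4;  /* core_0 adds 4 */
        s->flag = 1;           /* then hands over to core_1 */
    } else {
        s->number = seen + 1;  /* core_1 adds 1 */
        s->flag = 0;           /* then hands back to core_0 */
    }
    return seen;
}
```

Starting from a flag of 0 and a number of 0, successive turns report 0, 4, 5, 9 and so on, alternating between core_0 and core_1, so the value grows by 5 for every full round.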

helloworld_0.c
#include <stdio.h>
#include <sleep.h>
#include "xil_io.h"
#include "xil_printf.h"   // for print()
#include "platform.h"
#include "xil_cache.h"

// Flag in OCM that says which core currently owns the shared number
#define CORE_FLAG (*(volatile unsigned long *)(0xFFFC0000))
// Base address of the shared_ddr region
#define SHARED_MEMORY_BASE 0x200000

int main()
{
    init_platform();

    int shared_number = 0;
    int temp_num;
    int *transmit_ptr = (int *)SHARED_MEMORY_BASE;

    print("CPU0: init_platform\n\r");

    // core_0 owns the shared number first and initialises it
    CORE_FLAG = 0;
    Xil_Out32((UINTPTR)transmit_ptr, shared_number);

    while (1) {
        // Read the current value and print it
        temp_num = Xil_In32((UINTPTR)transmit_ptr);
        printf("0: number is currently: %d \n", (int)temp_num);

        // Add 4, write the value back, then flush it out to DDR
        temp_num += 4;
        Xil_Out32((UINTPTR)transmit_ptr, temp_num);
        Xil_DCacheFlushRange((INTPTR)transmit_ptr, sizeof(int));

        sleep(1);

        // Hand over to core_1 and wait until it hands back
        CORE_FLAG = 1;
        while (CORE_FLAG == 1) {
        }
    }

    // Not reached: the loop above never exits
    cleanup_platform();
    return 0;
}

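helloworld_1.c
#include <stdio.h>
#include <sleep.h>
#include "xil_io.h"
#include "xil_printf.h"   // for print()
#include "platform.h"
#include "xil_cache.h"

// Flag in OCM that says which core currently owns the shared number
#define CORE_FLAG (*(volatile unsigned long *)(0xFFFC0000))
// Base address of the shared_ddr region
#define SHARED_MEMORY_BASE 0x200000

int main()
{
    int temp_num;
    int *transmit_ptr = (int *)SHARED_MEMORY_BASE;

    init_platform();
    print("CPU1: init_platform\n\r");

    while (1) {
        // Wait until core_0 hands over control
        while (CORE_FLAG == 0) {
        }

        // Read the current value and print it
        temp_num = Xil_In32((UINTPTR)transmit_ptr);
        printf("1: number is currently: %d \n", (int)temp_num);

        // Add 1, write the value back, then flush it out to DDR
        temp_num += 1;
        Xil_Out32((UINTPTR)transmit_ptr, temp_num);
        Xil_DCacheFlushRange((INTPTR)transmit_ptr, sizeof(int));

        sleep(1);

        // Hand control back to core_0
        CORE_FLAG = 0;
    }

    // Not reached: the loop above never exits
    cleanup_platform();
    return 0;
}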

Image by xb100 on Freepik: https://www.freepik.com/free-photo/closeup-electronic-circuit-board-with-cpu-microchip-electronic-components-background_1193001.htm#query=processor&position=9&from_view=search
