Showing posts with label cpu. Show all posts

Wednesday, October 10, 2018

How to identify CPU processor architecture on Linux


Multi-core processor architectures have become increasingly popular. This trend is accelerated by the need to support high-performance computing applications, hardware virtualization, and server consolidation in data centers. If you are a server administrator or a cloud architect, you should be fully aware of the CPU architecture of your servers so that deployed applications can take full advantage of the underlying hardware.

The trend toward high core-density hardware also guides the evolution of software development, introducing new types of parallel programming models. Multi-threaded applications developed under these models must be able to leverage parallel execution across different cores, multi-level caches, CPU/memory affinity, etc.

In this tutorial, I describe how to identify CPU processor architecture from the command line on Linux. A CPU processor architecture is characterized by the number of physical sockets/processors, the number of cores per processor, multi-level (L1/L2/L3) cache, NUMA (Non-uniform memory access) configuration, etc.

Method One
likwid (Like I Knew What I’m Doing) is a suite of command-line tools designed to help developers build and tune multi-threaded applications. likwid works with Linux kernel 2.6 and higher, and is regularly updated to support recent generations of Intel and AMD processors, such as Intel's Sandy Bridge, Ivy Bridge, Haswell, Broadwell, and Skylake, and AMD's K8, K10, and Bulldozer (Interlagos).

To install likwid on Linux, download the source tarball from the project site, then build and install it:
$ tar xvfz likwid-3.0.0.tar.gz
$ cd likwid-3.0.0
$ make
$ sudo make install
likwid comes with several command-line tools:
  • likwid-topology: Display the NUMA and cache topology.
  • likwid-perfctr: Display the hardware performance counters of processors.
  • likwid-features: Display and change hardware prefetch control bits on Intel Core 2 processors.
  • likwid-pin: Pin a multi-threaded application to a specific CPU.
  • likwid-bench: Benchmarking tool for rapid prototyping of threaded assembly kernels.
  • likwid-mpirun: Script enabling CPU pinning of MPI and MPI/threaded hybrid applications.
  • likwid-perfscope: Frontend for likwid-perfctr which allows real-time plotting of performance metrics.
  • likwid-powermeter: Tool for accessing RAPL counters and querying Turbo mode steps on Intel processors.
  • likwid-memsweeper: Tool to clean up ccNUMA (cache-coherent NUMA) memory domains.
To visualize the CPU processor architecture:
$ likwid-topology -g
-------------------------------------------------------------
CPU type:    Intel Core Westmere processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:    2
Cores per socket:    4
Threads per core:    2
-------------------------------------------------------------
HWThread    Thread        Core        Socket
0        0        0        0
1        0        0        1
2        0        10        0
3        0        10        1
4        0        1        0
5        0        1        1
6        0        9        0
7        0        9        1
8        1        0        0
9        1        0        1
10        1        10        0
11        1        10        1
12        1        1        0
13        1        1        1
14        1        9        0
15        1        9        1
-------------------------------------------------------------
Socket 0: ( 0 8 4 12 6 14 2 10 )
Socket 1: ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------

*************************************************************
Cache Topology
*************************************************************
Level:    1
Size:    32 kB
Cache groups:    ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level:    2
Size:    256 kB
Cache groups:    ( 0 8 ) ( 4 12 ) ( 6 14 ) ( 2 10 ) ( 1 9 ) ( 5 13 ) ( 7 15 ) ( 3 11 )
-------------------------------------------------------------
Level:    3
Size:    12 MB
Cache groups:    ( 0 8 4 12 6 14 2 10 ) ( 1 9 5 13 7 15 3 11 )
-------------------------------------------------------------

*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2
-------------------------------------------------------------
Domain 0:
Processors:  0 2 4 6 8 10 12 14
Relative distance to nodes:  10 20
Memory: 4207.48 MB free of total 8181.75 MB
-------------------------------------------------------------
Domain 1:
Processors:  1 3 5 7 9 11 13 15
Relative distance to nodes:  20 10
Memory: 4020.77 MB free of total 8192 MB
-------------------------------------------------------------

*************************************************************
Graphical:
*************************************************************
Socket 0:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| |  0  8 | | 4  12 | | 6  14 | | 2  10 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| |  32kB | |  32kB | |  32kB | |  32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| |                 12MB                | |
| +-------------------------------------+ |
+-----------------------------------------+
Socket 1:
+-----------------------------------------+
| +-------+ +-------+ +-------+ +-------+ |
| |  1  9 | | 5  13 | | 7  15 | | 3  11 | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| |  32kB | |  32kB | |  32kB | |  32kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------+ +-------+ +-------+ +-------+ |
| | 256kB | | 256kB | | 256kB | | 256kB | |
| +-------+ +-------+ +-------+ +-------+ |
| +-------------------------------------+ |
| |                 12MB                | |
| +-------------------------------------+ |
+-----------------------------------------+
The above is example output from an HP ProLiant DL380 G7 server. It shows two physical sockets, each holding a quad-core CPU with Hyper-Threading enabled. Every core has its own 32kB L1 cache and 256kB L2 cache, and the four cores in a socket share a 12MB L3 cache.
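If likwid is not installed, the HWThread/Core/Socket table can be approximated from sysfs. The following is only a rough sketch, not how likwid itself gathers the data: the function name list_hw_threads is my own, and the optional argument (an alternative sysfs root) exists just so the function can be tried against a copy of the directory tree.

```shell
#!/bin/sh
# Print one line per online logical CPU: "<cpu> <core_id> <socket_id>".
# list_hw_threads is a hypothetical helper name, not part of likwid.
# $1 (optional): directory to scan; defaults to the kernel's sysfs CPU tree.
list_hw_threads() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        # Offline CPUs expose no topology files, so skip them.
        [ -r "$dir/topology/core_id" ] || continue
        cpu="${dir##*/cpu}"
        core="$(cat "$dir/topology/core_id")"
        socket="$(cat "$dir/topology/physical_package_id")"
        printf '%s %s %s\n' "$cpu" "$core" "$socket"
    done | sort -n
}
```

Calling list_hw_threads with no argument reads the live system. Two logical CPUs that report the same core ID and socket ID are Hyper-Threading siblings of one physical core.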

Method Two
hwloc is a command-line suite that gathers various attributes of the underlying processor architecture, such as NUMA memory nodes, multi-level caches, processor sockets, processor cores, PCI devices/bridges, etc.

To install hwloc on Debian, Ubuntu or Linux Mint:
$ sudo apt-get install hwloc
To install hwloc on Fedora, CentOS or RHEL:
$ sudo yum install hwloc
Once the hwloc package is installed, you can use lstopo to show the processor architecture as follows.
$ lstopo --no-io
If you are running lstopo in a Linux desktop environment, it will pop up a window that visualizes the underlying processor architecture and cache hierarchy.

If lstopo is called on a headless server without a desktop environment, it will print the output in text format as follows.
Machine (16GB)
  NUMANode L#0 (P#0 8182MB) + Socket L#0 + L3 L#0 (12MB)
    L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
      PU L#2 (P#2)
      PU L#3 (P#10)
    L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
      PU L#4 (P#4)
      PU L#5 (P#12)
    L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
      PU L#6 (P#6)
      PU L#7 (P#14)
  NUMANode L#1 (P#1 8192MB) + Socket L#1 + L3 L#1 (12MB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
      PU L#8 (P#1)
      PU L#9 (P#9)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
      PU L#10 (P#3)
      PU L#11 (P#11)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
      PU L#12 (P#5)
      PU L#13 (P#13)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
You can also have lstopo export the visualization to an image file by specifying an output file name as follows.
$ lstopo --no-io topo.png
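The cache hierarchy reported by lstopo can also be cross-checked against the kernel's sysfs cache entries. Below is a minimal sketch under my own assumptions: list_caches is a made-up helper name, and the optional argument (a per-CPU sysfs directory) is there only so the function can be pointed at a different CPU or a copied tree.

```shell
#!/bin/sh
# Print one line per cache visible to a given logical CPU:
#   "L<level> <type> <size> shared_by=<cpu list>"
# list_caches is a hypothetical helper name, not part of hwloc.
# $1 (optional): per-CPU sysfs directory; defaults to cpu0.
list_caches() {
    cpu_dir="${1:-/sys/devices/system/cpu/cpu0}"
    for idx in "$cpu_dir"/cache/index[0-9]*; do
        # Skip silently if the kernel exposes no cache info here.
        [ -r "$idx/level" ] || continue
        printf 'L%s %s %s shared_by=%s\n' \
            "$(cat "$idx/level")" \
            "$(cat "$idx/type")" \
            "$(cat "$idx/size")" \
            "$(cat "$idx/shared_cpu_list")"
    done
}
```

For example, "list_caches /sys/devices/system/cpu/cpu4" would show the caches seen by logical CPU 4. The shared_by list tells you which logical CPUs share each cache, which corresponds to the nesting of PU entries under each L1, L2, and L3 line in the lstopo text output.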
Method Three
numactl is a command-line tool for controlling NUMA policy, such as binding processes or threads to specific CPUs or ccNUMA nodes.

To install numactl on Debian, Ubuntu or Linux Mint:
$ sudo apt-get install numactl
To install numactl on Fedora, CentOS or RHEL:
$ sudo yum install numactl
If you want to check available NUMA nodes with numactl, do the following:
$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 8181 MB
node 0 free: 4235 MB
node 1 cpus: 1 3 5 7 9 11 13 15
node 1 size: 8191 MB
node 1 free: 4048 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
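When you only need a quick per-node summary, the output above can be condensed with a small awk filter. This is just a sketch that assumes the "node N cpus:" and "node N size:" line layout shown above; summarize_numa is a name I made up for illustration.

```shell
#!/bin/sh
# Read `numactl --hardware` output on stdin and print, per NUMA node,
# the number of logical CPUs and the memory size.
# summarize_numa is a hypothetical helper name, not part of numactl.
summarize_numa() {
    awk '
        # "node 0 cpus: 0 2 4 ..." -> every field after the third is a CPU ID
        $1 == "node" && $3 == "cpus:" { cpus[$2] = NF - 3 }
        # "node 0 size: 8181 MB" -> the fourth field is the size in MB
        $1 == "node" && $3 == "size:" { size[$2] = $4 }
        END {
            for (n in cpus)
                printf "node %s: %d cpus, %s MB\n", n, cpus[n], size[n]
        }
    ' | sort -k2,2n
}
```

It is meant to be used as "numactl --hardware | summarize_numa". On the server above, it would print "node 0: 8 cpus, 8181 MB" and "node 1: 8 cpus, 8191 MB".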

Tuesday, October 9, 2018

How to find the number of CPU cores on Linux

Multi-core CPU processors are common nowadays, including dual-core processors (e.g., Intel Core Duo), quad-core processors (e.g., Intel Core i5), and hexa-core processors (e.g., AMD Phenom II X6). Also, many server-grade physical machines are equipped with more than one CPU processor. To find the number of CPUs and the number of cores per CPU, you can refer to /proc/cpuinfo.

A sample /proc/cpuinfo from an HP ProLiant DL380 G7 server is shown below. This server is equipped with two Intel Xeon 5600 series processors.
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping        : 2
cpu MHz         : 2399.316
cache size      : 12288 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr dca sse4_1 sse4_2 popcnt lahf_lm ida
bogomips        : 4802.28
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
. . . .
To find the number of physical CPUs:
$ cat /proc/cpuinfo | grep "^physical id" | sort | uniq | wc -l
2
To find the number of cores per CPU:
$ cat /proc/cpuinfo | grep "^cpu cores" | uniq
cpu cores       : 4
The total number of logical processors seen by Linux is the number of physical CPUs multiplied by the number of cores per CPU and by the number of hardware threads per core. To find the total number of logical processors:
$ cat /proc/cpuinfo | grep "^processor" | wc -l
16
Note that Intel Xeon 5600 series processors have Intel Hyper-Threading capability. So each core shows up as "two" processors in Linux, and thus the total processor count seen by Linux is 16 (= 2 x 4 x 2).
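The three counts above, and the threads-per-core figure that follows from them, can be gathered in a single pass over the file. The following is a rough sketch rather than a definitive tool: cpu_summary is a name I made up, it assumes x86-style "physical id" and "cpu cores" fields, and it assumes every socket holds the same number of cores.

```shell
#!/bin/sh
# Summarize sockets, cores per socket, logical processors, and hardware
# threads per core from a cpuinfo-format file.
# cpu_summary is a hypothetical helper name.
# $1 (optional): file to read; defaults to /proc/cpuinfo.
cpu_summary() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "processor"   { logical++ }         # one entry per logical CPU
        $1 == "physical id" { sockets[$2] = 1 }   # collect distinct socket IDs
        $1 == "cpu cores"   { cores = $2 }        # assumed equal on all sockets
        END {
            n = 0
            for (s in sockets) n++
            # Avoid dividing by zero when the fields are missing.
            threads = (n * cores > 0) ? logical / (n * cores) : 0
            printf "sockets=%d cores_per_socket=%d logical=%d threads_per_core=%d\n",
                   n, cores, logical, threads
        }
    ' "${1:-/proc/cpuinfo}"
}
```

On the HP ProLiant server above, cpu_summary would report sockets=2 cores_per_socket=4 logical=16 threads_per_core=2, which matches the 2 x 4 x 2 breakdown.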