How To Mangle And Demangle A C++ Method Name

When developing in C++, a common task is to demangle the name of a C++ method to pretty-print it. Sometimes one also needs the opposite conversion. This article explains several ways to do both, depending on whether you need a dynamic (runtime) or static (build-time) conversion.

Name Mangling

Before going into the details, here is a quick reminder of what name mangling is and why it is used in C++. When an executable is started, the addresses of some (often many) of its symbols are resolved during startup, at runtime, because these addresses are not known during the static link (i.e. when the executable is generated). One way to resolve them is to look them up by symbol name. This is typically what happens when "dlsym" is used, and it is also how the runtime linker resolves the addresses of functions implemented in dynamic libraries.

When developing in C, all public symbols of an executable are unique: it is not possible to have two different global variables or functions with the same name. The only exceptions are local variables and static variables/functions. However, such symbols are private to a function or a file and cannot be retrieved by name at runtime. So in C the symbol name is the name that appears in the code, and the linker can find a function by using that name as its signature.

Things are not that simple in C++. Symbol names cannot just be the names of the functions or methods, because the language allows overloading: a function can have several prototypes that operate on different kinds of data. For example, it is possible to declare "void Foo(int)" and "void Foo(double)". In this case the function name cannot be used as a signature without a conflict. This is why name mangling is used: additional information is added to the symbol name to make it unique in the executable or library. One can see C++ mangling as a concatenation of the method name with its parameter types. With gcc and clang, for instance, "void Foo(int)" becomes "_Z3Fooi" and "void Foo(double)" becomes "_Z3Food": a "_Z" prefix, the length of the name, the name itself, then one code per parameter type. This is a simplification, because other things (namespaces, classes, qualifiers, template arguments) also contribute to a mangled name, but it is the general idea. Once names are mangled, symbols can be resolved at runtime by name without conflict.

When you have a mangled name, there are several ways to demangle it.

Dynamic Demangling

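If you need to demangle a name at runtime, use the demangling function provided by your toolchain's runtime library. With gcc and clang this is "abi::__cxa_demangle", declared in <cxxabi.h>; on Windows, the DbgHelp library provides "UnDecorateSymbolName" for MSVC names. These functions are not always well documented, but they are there.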
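A minimal sketch of runtime demangling, assuming gcc or clang (Itanium C++ ABI) with <cxxabi.h> available. The wrapper name "demangle" and its fall-back behaviour (returning the input unchanged on failure) are hypothetical choices; "abi::__cxa_demangle" itself allocates its result with malloc, so the caller has to free it:

```cpp
#include <cassert>   // assert, for quick checks in a caller
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (gcc/clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer and length, __cxa_demangle mallocs the result.
    char* buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || buf == nullptr) {
        // status -1: allocation failure, -2: not a valid mangled name,
        // -3: invalid argument. Nothing was allocated in these cases.
        return mangled;
    }
    std::string result(buf);
    std::free(buf);
    return result;
}
```

For example, demangle("_Znwm") returns "operator new(unsigned long)", the same result as c++filt.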

Static Demangling

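If you need the demangled name of the current function or method, you can simply use the __PRETTY_FUNCTION__ identifier. It is a gcc and clang extension (MSVC offers __FUNCSIG__ instead); the standard __func__ only gives the unqualified name, without the class or the parameter types.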
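A small sketch comparing the two identifiers, assuming gcc or clang; the namespace "demo" and the class "Widget" are hypothetical:

```cpp
#include <cassert>  // assert, for quick checks in a caller
#include <string>

namespace demo {
struct Widget {
    // __PRETTY_FUNCTION__ (gcc/clang extension): full signature, including
    // the namespace, class, parameter types and the const qualifier.
    const char* pretty(int) const { return __PRETTY_FUNCTION__; }

    // __func__ (standard since C++11): the unqualified function name only.
    const char* plain(int) const { return __func__; }
};
}  // namespace demo
```

With gcc, pretty() returns a string like "const char* demo::Widget::pretty(int) const", while plain() returns just "plain". The spacing around the return type differs slightly between gcc and clang, but both include the qualified name and the parameter types.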

If you have a mangled name and need to find its demangled name "manually", then c++filt (part of GNU binutils) is the tool you need:

$ c++filt _Znwm
operator new(unsigned long)

Dynamic Mangling

Mangling a name is not as easy as demangling one. If you need to mangle a symbol at runtime in your software, you are out of luck: nothing seems to exist in the toolchains to do this. A naive but effective approach is to do the mangling by hand, restricted to the use case you need. This is typically what is done in the Haiku Archivable class: it instantiates an object from its class name by mangling the name of a static method of the class, searching for it in the symbol list, and calling it. As far as I know, BeOS did it the same way.

The drawback of this method is that you need to implement the mangling for every system you want to support. The good news is that gcc and clang both follow (almost) the same mangling specification, the one defined by the Itanium C++ ABI, across CPU architectures. MSVC uses its own, different scheme.
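A minimal sketch of such a hand-written mangler, assuming the Itanium C++ ABI used by gcc and clang. It only covers functions or static methods nested in plain namespaces/classes, and takes the parameter list already encoded; the helper names and that interface are hypothetical, and a real implementation would also need substitutions, templates and cv-qualified methods:

```cpp
#include <cassert>  // assert, for quick checks in a caller
#include <string>
#include <vector>

// Itanium <source-name>: the identifier's length followed by the identifier,
// e.g. "Foo" -> "3Foo".
std::string source_name(const std::string& id) {
    return std::to_string(id.size()) + id;
}

// Hypothetical helper: mangle function `name` nested in `scopes`
// (outermost first). `params` must already be encoded, e.g. "i" for int,
// "d" for double, "P8BMessage" for BMessage*; empty means no parameters.
// Not handled: substitutions (Foo::create(Foo*) really mangles to
// _ZN3Foo6createEPS_), templates, const methods, std:: abbreviations.
std::string mangle(const std::vector<std::string>& scopes,
                   const std::string& name,
                   const std::string& params) {
    std::string out = "_Z";
    if (scopes.empty()) {
        out += source_name(name);  // e.g. _Z3Foo
    } else {
        out += "N";  // <nested-name> ::= N <prefix> <unqualified-name> E
        for (const auto& scope : scopes) {
            out += source_name(scope);
        }
        out += source_name(name);
        out += "E";
    }
    // An empty parameter list is encoded as "v" (void).
    out += params.empty() ? "v" : params;
    return out;
}
```

For a Haiku-like use case, mangle({"Foo"}, "Instantiate", "P8BMessage") gives "_ZN3Foo11InstantiateEP8BMessage", the Itanium name of a static method Foo::Instantiate(BMessage*). The return type does not appear because it is not part of the name of a non-template function.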

Static Mangling

There are also cases where you have a method name and only need its mangled name as a preprocessor value or a constant. This is the case in edleak, for example, which needs the names of the C++ allocation functions to find their addresses at runtime. For this you can look at the symbol names in a binary and use them directly in your code. However, this works only if the mangled names are exactly the same on all architectures. In the case of edleak they were not: the mangled name of "operator new" differs between x86 and x86_64. This is because size_t has a different underlying type on 32-bit and 64-bit systems (unsigned int vs unsigned long, at least on Linux). So in this case we need a way to determine the mangled name for the target architecture.

The trick here is to use the compiler itself and make it generate an empty function or method with the signature we want:

g++ -x c++ -S - -o-

We tell gcc that the language is C++ with "-x c++", to stop after compilation and emit assembly instead of an object file with "-S", to read the input from stdin with "-", and to write the output to stdout with "-o-". This gives us the assembly output for a function definition. For example:

echo -e "#include <new>\n void* operator new(std::size_t) {} " | g++ -x c++ -S - -o-

This generates the following output on x86_64:

	.file	""
	.text
	.globl	_Znwm
	.type	_Znwm, @function
_Znwm:
.LFB13:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movq	%rdi, -8(%rbp)
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE13:
	.size	_Znwm, .-_Znwm
	.ident	"GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4"
	.section	.note.GNU-stack,"",@progbits

On a mipsel toolchain we have:

	.file	1 ""
	.section .mdebug.abi32
	.previous
	.gnu_attribute 4, 1
	.abicalls
	.option	pic0
	.text
	.align	2
	.globl	_Znwj
$LFB13 = .
	.cfi_startproc
	.set	nomips16
	.ent	_Znwj
	.type	_Znwj, @function
_Znwj:
	.frame	$fp,8,$31		# vars= 0, regs= 1/0, args= 0, gp= 0
	.mask	0x40000000,-4
	.fmask	0x00000000,0
	.set	noreorder
	.set	nomacro
	addiu	$sp,$sp,-8
	.cfi_def_cfa_offset 8
	sw	$fp,4($sp)
	move	$fp,$sp
	.cfi_offset 30, -4
	.cfi_def_cfa_register 30
	sw	$4,8($fp)
	move	$sp,$fp
	lw	$fp,4($sp)
	addiu	$sp,$sp,8
	j	$31
	nop

	.set	macro
	.set	reorder
	.end	_Znwj
	.cfi_endproc
$LFE13:
	.size	_Znwj, .-_Znwj
	.ident	"GCC: (Broadcom stbgcc-4.5.4-2.8) 4.5.4"

And on an arm toolchain the result is:

	.cpu arm10tdmi
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 6
	.eabi_attribute 18, 4
	.file	""
	.text
	.align	2
	.global	_Znwj
	.type	_Znwj, %function
_Znwj:
	.fnstart
.LFB13:
	.cfi_startproc
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 1, uses_anonymous_args = 0
	@ link register save eliminated.
	str	fp, [sp, #-4]!
	.cfi_def_cfa_offset 4
	add	fp, sp, #0
	.cfi_offset 11, -4
	.cfi_def_cfa_register 11
	sub	sp, sp, #12
	str	r0, [fp, #-8]
	mov	r0, r3
	add	sp, fp, #0
	ldmfd	sp!, {fp}
	bx	lr
	.cfi_endproc
.LFE13:
	.cantunwind
	.fnend
	.size	_Znwj, .-_Znwj
	.ident	"GCC: (Broadcom stbgcc-4.5.4-2.8) 4.5.4"
	.section	.note.GNU-stack,"",%progbits

As you can see, the name differs between the 64-bit architecture and the 32-bit ones: _Znwm vs _Znwj ("m" and "j" being the codes for unsigned long and unsigned int). The interesting point is that all these architectures label the entry point of the function the same way, so the symbol name can be found by searching for a line that starts with an underscore and ends with a colon. A simple shell script is therefore enough to find a mangled name:

#! /bin/sh

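# Compile an empty definition of the prototype given in $1, keep the label line.
# printf is used because "echo -e" is not portable to /bin/sh (e.g. dash).
printf '#include <new>\n%s {}\n' "$1" | g++ -x c++ -S - -o- | grep '^_.*:$' | sed -e 's/:$//'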

and use it this way:

./mangle.sh "void* operator new(std::size_t)"
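If the assembly output is already captured inside a program (a build tool, for instance), the same filter can be applied in C++. This is only a sketch of the grep/sed step; the function name is hypothetical, and unlike grep it stops at the first matching line:

```cpp
#include <cassert>  // assert, for quick checks in a caller
#include <sstream>
#include <string>

// Hypothetical helper mirroring: grep '^_.*:$' | sed -e 's/:$//'
// Returns the first line that starts with '_' and ends with ':',
// without the trailing colon, or an empty string if there is none.
std::string find_mangled_label(const std::string& assembly) {
    std::istringstream in(assembly);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.front() == '_' && line.back() == ':') {
            line.pop_back();  // drop the trailing ':'
            return line;
        }
    }
    return "";
}
```

Directives such as ".globl _Znwm" and local labels such as ".LFB13:" or "$LFE13:" are skipped because they do not start with an underscore.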

If you need this from autoconf, I wrote an m4 script for edkit. It is available on GitHub.
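When only operator new matters, as in the edleak case, a pure C++ alternative is to pick the symbol at compile time from the type that std::size_t aliases. This is a sketch, not what edleak does: it assumes the Itanium C++ ABI and hardcodes its builtin type codes (j for unsigned int, m for unsigned long, y for unsigned long long), and the function name is hypothetical:

```cpp
#include <cassert>  // assert, for quick checks in a caller
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: Itanium mangled name of "operator new(std::size_t)"
// for the current target. "nw" encodes operator new; the last letter encodes
// the unsigned integer type that std::size_t is an alias of.
constexpr const char* operator_new_symbol() {
    return std::is_same<std::size_t, unsigned int>::value        ? "_Znwj"
           : std::is_same<std::size_t, unsigned long>::value      ? "_Znwm"
           : std::is_same<std::size_t, unsigned long long>::value ? "_Znwy"
                                                                  : nullptr;
}

// Fail the build on a target where std::size_t is none of these types.
static_assert(operator_new_symbol() != nullptr, "unexpected std::size_t type");
```

Unlike the compiler-based script, this covers a single symbol and relies on knowing the mangling rules in advance; it trades generality for not needing an extra build step.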

Conclusion

Now you have several ways to mangle and demangle a symbol name. If you know of other ways to do it, do not hesitate to leave a comment.

Romain.

Trackback URL : https://blog.oakbits.com/index.php?trackback/36
