I was recently looking for a way to ship an image with an executable without referring to the image as an external file. Now you can argue whether that is a good idea in general, but that's another story.
Since I found very little information on the subject and needed to piece it together from various sources, I conclude the task with this comprehensive write-up. The mechanisms described here apply to including any resource - not just images - into an executable C program.
There are various ways to embed generic fixed data within an application. Let's start with a basic example that should be familiar:
const char* mystring = "hello world";
Every application is just a collection of segments stored in a file. On the modern PC architecture, a program is commonly described in terms of three major segments: Stack, Data and Code. Without going into details, fixed data such as constant text strings resides in the read-only part of the Data segment of the application binary, and the C compiler automatically creates a memory region for it.
When you compile the above line with gcc -c -o mytext.o mytext.c
and inspect the object file with objdump -t mytext.o
you'll find the data-section in the object file.
00000000 l    d  .text   00000000 .text
00000000 l    d  .data   00000000 .data
00000000 l    d  .rodata 00000000 .rodata
00000000 g     O .data   00000004 mystring
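which basically says: "mystring" is a global (g) data object (O) of 4 bytes that can be found at offset 0x00 of the .data section in the object file. Note that this entry is the pointer variable itself; the text it points to is placed in the read-only .rodata section. For a detailed description of all the values see man objdump.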
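The 4-byte size in the listing is the size of a pointer on a 32-bit target, not of the text. As a minimal sketch of that distinction, the following contrasts the pointer declaration from above with an array declaration; `mystring_array` and the two helper functions are names made up for this illustration.

```c
#include <stddef.h>

/* A pointer object: the variable holds only an address (4 bytes on a
 * 32-bit target, 8 on most 64-bit ones); the text is stored separately. */
const char *mystring = "hello world";

/* Hypothetical array variant: here the symbol names the bytes themselves. */
const char mystring_array[] = "hello world";

/* Size of the pointer object, i.e. what the symbol table reports. */
size_t pointer_object_size(void)
{
    return sizeof mystring;
}

/* Size of the array: 11 characters plus the terminating NUL byte. */
size_t array_object_size(void)
{
    return sizeof mystring_array;
}
```

For embedding resources the array form is usually the more convenient one, because the symbol then refers directly to the data and `sizeof` yields its length.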
Now one solution would be to simply write the data in C and have the compiler take care of everything by using
const unsigned char myimage[] = "BIG BLOCK OF RESOURCE DATA ENCODED AS STRING";
To encode the data of a given file, the xxd tool that ships with the vim text editor can be used: xxd -i binary_file writes the content of the given binary file in C include file style, as a complete static array definition named after the input file:
# cat /tmp/example
Hello World
# xxd -i /tmp/example
unsigned char _tmp_example[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int _tmp_example_len = 12;
Using this variant produces portable C code and just works(TM). On the downside, the generated source text is at least five times the size of the original data (" 0xNN," for every original data byte) and the source code needs to be regenerated every time the original data changes.
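To show how such a generated array is consumed, here is a minimal sketch. The array is the xxd output from above pasted in (in a real project it would normally live in a generated header that is included); `write_embedded()` is a hypothetical helper, not part of xxd.

```c
#include <stdio.h>

/* Output of `xxd -i /tmp/example`, pasted verbatim. */
unsigned char _tmp_example[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int _tmp_example_len = 12;

/* Hypothetical helper: copy the embedded bytes to an open stream.
 * Returns the number of bytes actually written. */
size_t write_embedded(FILE *out)
{
    return fwrite(_tmp_example, 1, _tmp_example_len, out);
}
```

Calling `write_embedded(stdout)` reproduces the original file content, so no external file is needed at runtime.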
While this approach is useful in some cases, we can do better than that.
The GNU linker can create an object file with a custom .data section directly from any input file. The /magic/ flags are -r to make the output object file relocatable and -b binary to tell the linker that the input file is raw binary data rather than an object file.
# ld -r -b binary -o example.o example.jpg
The resulting object file example.o can be linked with any application, simply with gcc itself, e.g. gcc -o myapp myapp.c example.o. Now the last missing link is to access the data in example.o from the C code in myapp.c. Have a look at the symbol table of the object file and compare it with the above output for the mystring object.
# objdump -t example.o

SYMBOL TABLE:
00000000 l    d  .data  00000000 .data
00004fc0 g       .data  00000000 _binary_example_jpg_end
00004fc0 g       *ABS*  00000000 _binary_example_jpg_size
00000000 g       .data  00000000 _binary_example_jpg_start
The "[..] g .data [..] <name>" part should ring a bell. This data section can be referenced from the C code simply by using:
extern const unsigned char _binary_example_jpg_start[];
and the length can be calculated from
extern const unsigned char _binary_example_jpg_start[];
extern const unsigned char _binary_example_jpg_end[];
size_t len = _binary_example_jpg_end - _binary_example_jpg_start;
or simply by referencing the address of _binary_example_jpg_size, which is an absolute symbol whose address is the size of the data.
The linker will resolve the extern reference when linking the application and the application will use the data just like a fixed blob in the source-code.
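The start/end pair can be wrapped into a pointer-plus-length value so the rest of the program does not need to know about linker symbols. The following is only a sketch: `struct blob` and both functions are made-up names, and the linker-provided symbols appear in a comment only so that the block compiles without example.o. The JPEG check relies on the fact that JPEG streams begin with the bytes 0xFF 0xD8.

```c
#include <stddef.h>

/* Hypothetical view of an embedded resource: pointer plus length. */
struct blob {
    const unsigned char *data;
    size_t len;
};

/* Build a blob from a start/end pair, for example
 *   blob_from_range(_binary_example_jpg_start, _binary_example_jpg_end)
 * once those symbols are declared extern and example.o is linked in. */
struct blob blob_from_range(const unsigned char *start,
                            const unsigned char *end)
{
    struct blob b = { start, (size_t)(end - start) };
    return b;
}

/* Cheap sanity check: JPEG streams begin with the marker 0xFF 0xD8. */
int blob_looks_like_jpeg(struct blob b)
{
    return b.len >= 2 && b.data[0] == 0xFF && b.data[1] == 0xD8;
}
```

A check like this is a cheap way to notice at startup that the wrong file was passed to ld.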
So far so good.
Now for the tricky part: the GNU linker behaves differently depending on platform and architecture. The implementations of interest to me are GNU/Linux, OSX and mingw (cross-compiling Windows binaries on a GNU/Linux host).
The mingw cross-compiler behaves almost exactly like gnu-ld with one minor difference: the symbol name does not include the leading underscore, i.e. binary_example_jpg_start instead of _binary_example_jpg_start. Fine, there goes some of the elegance of the solution, but that case is easily handled with an #ifdef.
However, Mac/OSX is different. The ld which is shipped with Xcode comes from llvm version 2.7svn and does not support the -b input-format feature. Furthermore, universal executables on OSX may comprise binary formats for several architectures, with the .data section format being different for each of them. The alignment of the data may differ between 32-bit and 64-bit architectures and the endianness may differ as well. Thus the data section needs to be created during compilation instead of at the linking stage.
On OSX, ld's binary linking feature has been moved into Apple's customized gcc and is available via the -sectcreate option:
gcc -sectcreate __DATA __example_jpg example.jpg -o myapp myapp.c
To create a universal build for Intel architectures, add -arch i386 -arch x86_64 to the above command line. objdump is also a GNU tool and is not available on OSX. You can inspect the data section using otool -s __DATA __example_jpg /path/to/executable instead; see man otool for details.
Due to the nature of OSX binaries, referencing the data section in the C code is not possible with a simple extern unsigned char declaration. The linker does not know which architecture will be used and cannot provide an address. The Mach-O binary format used by OSX needs to be inspected at runtime, when the architecture is known, and the relevant data is mapped after the application has started. Apple provides an API for doing that, which is defined in the mach-o/getsect.h header file. If you have Xcode installed you can read its documentation with man getsectbyname.
Resolving the section can only be done at runtime, after the data section has been relocated, by calling getsectbyname(). However, there is a trick that makes this implicit: the meta-variable _section$ is recognized by the gcc compiler on OSX. It produces the same result as calling getsectbyname()→addr. Short of reading the actual code, information about OSX linker internals is not easy to come by. getsectbyname() actually opens the executable file and searches for the relevant data section while the application is running. _section$ may or may not already be resolved at link time for a given architecture 1).
Update (Oct 2016 - Thanks to Eugene Gershnik): On newer versions of OSX/macOS that run executables with ASLR, the call to `getsectbyname` needs to be replaced with `getsectiondata` 2). However, this API is only available from OS X 10.7 onwards.
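A minimal sketch of what the `getsectiondata` variant could look like, assuming the section was created in the `__DATA` segment with -sectcreate as shown earlier. `find_embedded_section()` is a hypothetical wrapper; `_mh_execute_header` is the main executable's Mach-O header as declared in mach-o/ldsyms.h. On other platforms the function is stubbed out to return NULL so the block compiles everywhere.

```c
#include <stddef.h>

#ifdef __APPLE__
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>   /* declares _mh_execute_header */
#endif

/* Hypothetical wrapper: look up a section of the __DATA segment in the
 * running executable. Returns a pointer to the data and stores its length
 * in *len, or returns NULL (and sets *len to 0) if it is not found. */
const unsigned char *find_embedded_section(const char *sectname, size_t *len)
{
#ifdef __APPLE__
    unsigned long size = 0;
    const unsigned char *p =
        getsectiondata(&_mh_execute_header, "__DATA", sectname, &size);
    *len = p ? (size_t)size : 0;
    return p;
#else
    /* Not a Mach-O platform: nothing to look up. */
    (void)sectname;
    *len = 0;
    return NULL;
#endif
}
```

With the build command from above, the call would be `find_embedded_section("__example_jpg", &len)`.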
Long story short, one can use a macro abstraction that works cross-platform. LDVAR() and LDLEN() are defined to access the data and its size; EXTLD() is used for the actual external declaration of the symbol:
#ifdef __APPLE__
#include <mach-o/getsect.h>

#define EXTLD(NAME) \
  extern const unsigned char _section$__DATA__ ## NAME [];
#define LDVAR(NAME) _section$__DATA__ ## NAME
#define LDLEN(NAME) (getsectbyname("__DATA", "__" #NAME)->size)

#elif (defined __WIN32__)  /* mingw */

#define EXTLD(NAME) \
  extern const unsigned char binary_ ## NAME ## _start[]; \
  extern const unsigned char binary_ ## NAME ## _end[];
#define LDVAR(NAME) \
  binary_ ## NAME ## _start
#define LDLEN(NAME) \
  ((binary_ ## NAME ## _end) - (binary_ ## NAME ## _start))

#else /* gnu/linux ld */

#define EXTLD(NAME) \
  extern const unsigned char _binary_ ## NAME ## _start[]; \
  extern const unsigned char _binary_ ## NAME ## _end[];
#define LDVAR(NAME) \
  _binary_ ## NAME ## _start
#define LDLEN(NAME) \
  ((_binary_ ## NAME ## _end) - (_binary_ ## NAME ## _start))
#endif
Example usage in a C Program:
// define the external variable
EXTLD(example_jpg)

void some_function() {
  // access the data; the pointer is const because the symbol is declared const
  size_t length = LDLEN(example_jpg);
  const unsigned char *data = LDVAR(example_jpg);
}
As a final note, some care must be taken when choosing the variable identifier.
ld uses the filename to generate the symbol name. Characters in the filename that are not valid in a C identifier are transformed to underscores.
e.g. ld -r -b binary -o example.o ../images/example.jpg will create a region _binary____images_example_jpg. The ../ as well as the slash and the dot are transformed to underscores.
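The naming rule can be mimicked in a few lines, which is handy for predicting the symbol name for a given path. This is a sketch of the rule as described above, not ld's actual code: `ld_binary_symbol()` is a made-up helper, and it assumes that every character other than a letter or digit becomes an underscore.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "_binary_<mangled path>_<suffix>" in out, where every character of
 * path that is not a letter or digit is replaced by '_'.
 * Returns 0 on success, -1 if the buffer is too small. */
int ld_binary_symbol(const char *path, const char *suffix,
                     char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "_binary_%s_%s", path, suffix);
    if (n < 0 || (size_t)n >= outsz)
        return -1;

    /* Mangle only the path part, which follows the 8-character prefix. */
    size_t plen = strlen(path);
    for (size_t i = 0; i < plen; i++) {
        unsigned char c = (unsigned char)out[8 + i];
        if (!isalnum(c))
            out[8 + i] = '_';
    }
    return 0;
}
```

For the path `../images/example.jpg` and the suffix `start` this yields `_binary____images_example_jpg_start`, matching the region name given above.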
This is not an issue on OSX, where the identifier is specified explicitly with the -sectcreate option. However, identifiers on OSX are limited to 16 characters.
So in order to use the above approach cross-platform, the path of the file passed to ld must be shorter than 16 characters and the same identifier needs to be specified on the OSX compile command.
A complete project that uses this approach to include a JPEG image file and a JavaScript text file is harvid. It also outlines how to use a cross-platform Makefile for creating the object files and adding the relevant flags to the OSX gcc command.
User ColaEuphoria points out that an alternative for the x86 architecture is to use assembly:
section .rodata
global _my_data;
_my_data: incbin "my_data.file"
_my_data_size: dd $-_my_data
and compile it with `nasm -felf64 resource.s -o resource.o` (note that -felf64 here selects 64-bit Linux; the option needs to be replaced with the one matching the target architecture and OS).
The data can then be referenced using
extern const unsigned char _my_data[];
with gcc. MSVC does not need the leading underscore in C code, so there the data can be referenced using my_data.