Source http://ilinuxkernel.com/files/5/Linux_Kernel_Source_Code.htm

1 Preface

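The Linux kernel source is written mostly in C, with a small amount of assembly, and is compiled with GCC. When you first read kernel source you run into code that is hard to understand and obscure, and it is precisely this obscure code that appears most often. Once you understand these common but obscure constructs, reading kernel code gets steadily easier.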

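This article uses the Linux 2.6.32-71.el6 (RHEL 6) source for the x86_64 architecture as its example and explains a selection of constructs that appear frequently and are hard to read. Although the examples come from 2.6.32-71.el6, much of the material applies equally to other kernel versions. The main topics are GCC's extensions to the C language, plus some miscellaneous items.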

2 GCC Extensions to C

2.1 attribute

When studying file systems or page cache management, you will frequently encounter the struct address_space data structure, defined in include/linux/fs.h:

struct address_space {
    struct inode *host;                 /* owner: inode, block_device */
    struct radix_tree_root page_tree;   /* radix tree of all pages */
    spinlock_t tree_lock;               /* and lock protecting it */
    unsigned int i_mmap_writable;       /* count VM_SHARED mappings */
    struct prio_tree_root i_mmap;       /* tree of private and shared mappings */
    struct list_head i_mmap_nonlinear;  /* list VM_NONLINEAR mappings */
    spinlock_t i_mmap_lock;             /* protect tree, count, list */
    unsigned int truncate_count;        /* Cover race condition with truncate */
    unsigned long nrpages;              /* number of total pages */
    pgoff_t writeback_index;            /* writeback starts here */
    const struct address_space_operations *a_ops;   /* methods */
    unsigned long flags;                /* error bits/gfp mask */
    struct backing_dev_info *backing_dev_info;      /* device readahead, etc */
    spinlock_t private_lock;            /* for use by the address_space */
    struct list_head private_list;      /* ditto */
    struct address_space *assoc_mapping;            /* ditto */
} __attribute__((aligned(sizeof(long))));

Note the __attribute__((aligned(sizeof(long)))) at the end of the structure definition. What does it do, and how does it affect the definition?

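The keyword __attribute__ does not exist in standard C; it is a GCC extension. __attribute__ sets attributes on a function, variable, or type definition. The main purpose of setting attributes on a function is to let the compiler optimize it where possible. Function attributes are set in the function's declaration (prototype), as in the following example: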

#include <stdio.h>
#include <stdlib.h>

void fatal_error(char *message) __attribute__ ((noreturn));

/* ... */

void fatal_error(char *message)
{
    fprintf(stderr, "FATAL ERROR: %s\n", message);
    exit(1);
}

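Here the noreturn attribute tells the compiler that the function never returns to its caller, so the compiler does not need to generate any code for the path that would follow a return from the call.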

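Several attributes can be set in one declaration by separating them with commas. For example, the following declaration tells the compiler that the function has no side effects such as modifying global variables (pure) and that it must not be expanded inline (noinline):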

int getlim() __attribute__ ((pure,noinline));

Attributes can also be applied to variables and structure members. For example, to give one member of a structure a specific alignment, you can write:

struct mong {
    char id;
    int code __attribute__ ((aligned(4)));
};

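In struct address_space, the __attribute__ clearly applies to the structure type itself: it sets an attribute on struct address_space. Which attribute? aligned(sizeof(long)), which makes struct address_space aligned on a sizeof(long)-byte boundary (8 bytes on x86_64).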

The aligned attribute specifies how an object's memory address is aligned. For example:

int alivalue __attribute__ ((aligned(32)));

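Here the address of alivalue is aligned on a 32-byte boundary. Besides aligned there are many other attributes, such as deprecated, packed, and unused. Note also that the attributes available for variables and types differ from those available for functions.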

For more on GCC's extensions to C, see http://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html#C-Extensions

Here is another example, taken from include/linux/module.h:

00083: #ifdef MODULE
00084: #define MODULE_GENERIC_TABLE(gtype,name)            \
00085: extern const struct gtype##_id mod_##gtype##_table  \
00086:   __attribute__ ((unused, alias(__stringify(name))))
00087:
00088: extern struct module __this_module;
00089: #define THIS_MODULE (&__this_module)
00090: #else  /* !MODULE */
00091: #define MODULE_GENERIC_TABLE(gtype,name)
00092: #define THIS_MODULE ((struct module *)0)
00093: #endif

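Note the __attribute__ ((unused, alias(__stringify(name)))) on line 86. As mentioned above, several attributes can be set on one variable or function, separated by commas; this macro sets two: unused and alias. unused tells GCC that the item may go unused, so no unused-variable warning is emitted for it. alias makes this declaration another name for a different symbol, which must be defined in the same translation unit. For example: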

void __f () { /* Do something. */; }

void f () __attribute__ ((weak, alias ("__f")));

This defines f as a weak alias for __f.

2.2 Alternate Keywords

First, look at this code from include/linux/compiler-gcc.h:

/* Optimization barrier */
/* The "volatile" is due to gcc bugs */
#define barrier() __asm__ __volatile__("": : :"memory")

And here is another piece of code, from arch/x86/include/asm/msr.h:

static inline unsigned long long native_read_msr_safe(unsigned int msr,
                                                      int *err)
{
    DECLARE_ARGS(val, low, high);

    asm volatile("2: rdmsr ; xor %[err],%[err]\n"
                 "1:\n\t"
                 ".section .fixup,\"ax\"\n\t"
                 "3:  mov %[fault],%[err] ; jmp 1b\n\t"
                 ".previous\n\t"
                 _ASM_EXTABLE(2b, 3b)
                 : [err] "=r" (*err), EAX_EDX_RET(val, low, high)
                 : "c" (msr), [fault] "i" (-EIO));
    return EAX_EDX_VAL(val, low, high);
}

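Both snippets use inline assembly, but they spell the keyword differently: the first uses __asm__, the second asm. The two mean exactly the same thing. The difference shows up at compile time: with -ansi or a strict ISO -std= option (for example -std=c99), the keyword asm is disabled, whereas its alternate spelling __asm__ remains available.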

Similar alternate keywords include __typeof__ and __inline__, which are equivalent to typeof and inline.

2.3 typeof

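The following macro, from include/linux/kernel.h, is used heavily by the kernel's doubly linked lists. Its meaning is explained in a later section; here we focus on the keyword typeof.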

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:    the pointer to the member.
 * @type:   the type of the container struct this is embedded in.
 * @member: the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

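As the name suggests, typeof yields the type of something, and that is exactly what it does. typeof returns the type of an expression. It is used much like sizeof, except that the result is a type rather than a size. Some examples: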

char *chptr;                  // A char pointer
typeof (*chptr) ch;           // A char
typeof (ch) *chptr2;          // A char pointer
typeof (chptr) chparray[10];  // Ten char pointers
typeof (*chptr) charray[10];  // Ten chars
typeof (ch) charray2[10];     // Ten chars

2.4 asmlinkage

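asmlinkage appears very frequently in the kernel source. It tells the compiler to pass the function's arguments on the local stack; its counterpart is fastcall, which tells the compiler to pass arguments in general-purpose registers. Reading arguments straight from registers at run time is generally faster than loading them from the stack in memory. Functions marked asmlinkage, such as system call handlers, are typically called from assembly code that relies on this fixed convention. For example: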

/*
 * sys_execve() executes a new program.
 */
asmlinkage
long sys_execve(char __user *name, char __user * __user *argv,
                char __user * __user *envp, struct pt_regs *regs)
{
    long error;
    char *filename;

    filename = getname(name);
    error = PTR_ERR(filename);
    if (IS_ERR(filename))
        return error;
    error = do_execve(filename, argv, envp, regs);
    putname(filename);
    return error;
}

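How arguments are passed is platform-specific. On x86 these linkage macros are defined in arch/x86/include/asm/linkage.h. The excerpt below is the 32-bit branch: asmlinkage forces stack passing with regparm(0), while asmregparm passes up to three arguments in registers with regparm(3). On x86_64 regparm is not used, because the 64-bit ABI already passes the first six integer arguments in registers.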

#ifdef CONFIG_X86_32
#define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
/*
 * For 32-bit UML - mark functions implemented in assembly that use
 * regparm input parameters:
 */
#define asmregparm __attribute__((regparm(3)))

2.5 UL

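The suffix UL after a constant marks it as unsigned long. It tells the compiler to treat the constant as an unsigned long, which avoids overflow on some platforms. For example, a 16-bit signed integer can only represent -32,768 to +32,767, whereas a 16-bit unsigned integer reaches 65,535. Using UL helps you write platform-independent code when you work with large numbers or long bit masks. The following code is from include/linux/hash.h: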

#include types.h>

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL
/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL

2.6 const and volatile

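The keyword const should be read not as "constant" but as "read-only". For example, int const *x is a pointer to a const int: the pointer can change, but the integer cannot be modified through it. By contrast, int *const x is a const pointer to an int: the integer can change, but the pointer cannot. The following code is from fs/ext4/inode.c: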

static int ext4_block_to_path(struct inode *inode,
                              ext4_lblk_t i_block,
                              ext4_lblk_t offsets[4], int *boundary)
{
    int ptrs = EXT4_ADDR_PER_BLOCK(inode->i_sb);
    int ptrs_bits = EXT4_ADDR_PER_BLOCK_BITS(inode->i_sb);
    const long direct_blocks = EXT4_NDIR_BLOCKS,
        indirect_blocks = ptrs,
        double_blocks = (1 << (ptrs_bits * 2));

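The keyword volatile marks a variable whose value may change in ways the compiler cannot see. It tells the compiler that the variable must be reloaded from memory on every access, rather than read from a copy kept in a register. Typical uses are memory-mapped hardware registers, variables modified by interrupt handlers, and variables shared between concurrently executing code.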

struct task_struct, shown below, contains both of these special keywords, volatile and const.

struct task_struct {
    volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
    void *stack;
    atomic_t usage;
    unsigned int flags;     /* per process flags, defined below */
    unsigned int ptrace;

    int lock_depth;         /* BKL lock depth */

#ifdef CONFIG_SMP
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
    int oncpu;
#endif
#endif

    int prio, static_prio, normal_prio;
    unsigned int rt_priority;
    const struct sched_class *sched_class;

3 Miscellaneous

3.1 volatile

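The __volatile__ qualifier appears often in inline assembly. As noted in section 2.2, __volatile__ and volatile are equivalent, so we will not dwell on that here. The qualifier matters a great deal for assembly code: it tells the compiler not to optimize the inline assembly away. Without it, if the compiler decides an asm statement is redundant (for example, its outputs are never used, or it always produces the same result inside a loop), it may delete the statement or move it out of the loop.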

3.2 likely() and unlikely()

unlikely() and likely() are also very common. First look at __alloc_pages_nodemask() in mm/page_alloc.c, the core function the memory manager uses to allocate physical pages:

02100: /*
02101:  * This is the 'heart' of the zoned buddy allocator.
02102:  */
02103: struct page *
02104: __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
02105:             struct zonelist *zonelist, nodemask_t *nodemask)
02106: {
02107:     enum zone_type high_zoneidx = gfp_zone(gfp_mask);
02108:     struct zone *preferred_zone;
02109:     struct page *page;
02110:     int migratetype = allocflags_to_migratetype(gfp_mask);
02111:
02112:     gfp_mask &= gfp_allowed_mask;
02113:
02114:     lockdep_trace_alloc(gfp_mask);
02115:
02116:     might_sleep_if(gfp_mask & __GFP_WAIT);
02117:
02118:     if (should_fail_alloc_page(gfp_mask, order))
02119:         return NULL;
02120:
02121:     /*
02122:      * Check the zones suitable for the gfp_mask contain at least one
02123:      * valid zone. It's possible to have an empty zonelist as a result
02124:      * of GFP_THISNODE and a memoryless node
02125:      */
02126:     if (unlikely(!zonelist->_zonerefs->zone))
02127:         return NULL;
02128:

Note the unlikely() on line 2126. What do unlikely() and likely() mean?

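In the Linux kernel, likely() and unlikely() are macros that give the compiler a hint about which way a condition usually goes. Modern CPUs use branch prediction to guess which instructions will execute next. With these hints, GCC (through __builtin_expect) lays out the machine code so that the expected path falls straight through and the unexpected path is moved out of line, which helps the CPU's prediction and instruction cache. likely() marks a condition that is usually true; unlikely() marks one that is usually false. They are defined in include/linux/compiler.h. The definitions below are the ones used when CONFIG_TRACE_BRANCH_PROFILING is enabled; in the normal configuration they reduce to __builtin_expect(!!(x), 1) and __builtin_expect(!!(x), 0).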

# ifndef likely
#  define likely(x)   (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
# endif
# ifndef unlikely
#  define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
# endif

3.3 IS_ERR and PTR_ERR

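Many internal kernel functions return a pointer to the caller, and many of those functions can fail. In most cases failure is signaled by returning a NULL pointer. This technique works, but it cannot convey the exact nature of the problem. Some interfaces really need to return an actual error code so that the caller can make the right decision based on what went wrong.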

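Many kernel interfaces return error information by encoding the error value in the pointer itself. Such functions must be used with care, because their return values cannot simply be compared against NULL. A small set of functions is provided to help create and use interfaces of this kind: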

void *ERR_PTR(long error);

Here error is the usual negative error code. The caller can use IS_ERR to test whether a returned pointer is actually an error code:

long IS_ERR(const void* ptr);

If the actual error code is needed, it can be extracted with:

long PTR_ERR(const void* ptr);

Use PTR_ERR on a value only when IS_ERR returns true for it, because any other value is a valid pointer.

3.4 __init, __initdata, __exit, __exitdata

First, look at some code that runs when the Linux kernel boots, taken from init/main.c:

asmlinkage void __init start_kernel(void)
{
    char * command_line;
    extern struct kernel_param __start___param[], __stop___param[];

    smp_setup_processor_id();

    /*
     * Need to run as early as possible, to initialize the
     * lockdep hash:
     */
    lockdep_init();
    debug_objects_early_init();

    /*
     * Set up the the initial canary ASAP:
     */
    boot_init_stack_canary();

    cgroup_init_early();

    local_irq_disable();
    early_boot_irqs_off();
    early_init_irq_lock_class();

    /*
     * Interrupts are still disabled. Do necessary setups, then
     * enable them
     */

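start_kernel() carries the __init qualifier. __init is actually a macro, used only on functions that run during Linux kernel initialization. Code marked __init is placed in a special section (.init.text), and the kernel frees the memory holding that section once initialization is complete.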

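Similarly, __initdata marks data used only during kernel initialization, while __exit and __exitdata mark routines and data used on exit or shutdown. These are typically used when a device driver module is unloaded; when the driver is built into the kernel rather than as a module, __exit code can never run and is discarded.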

3.5 Static Checking of Kernel Source

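When studying process management, the source of do_fork() is essential reading. Notice that the last two parameters of do_fork() both carry the __user qualifier. What does this qualifier mean, and what is it for? The code is from kernel/fork.c: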

long do_fork(unsigned long clone_flags,
             unsigned long stack_start,
             struct pt_regs *regs,
             unsigned long stack_size,
             int __user *parent_tidptr,
             int __user *child_tidptr)
{
    struct task_struct *p;
    int trace = 0;
    long nr;

    /*
     * Do some preliminary argument and permissions checking before we
     * actually start allocating stuff
     */
    if (clone_flags & CLONE_NEWUSER) {
        if (clone_flags & CLONE_THREAD)
            return -EINVAL;
        /* hopefully this check will go away when userns support is
         * complete
         */
        if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SETUID) ||
                !capable(CAP_SETGID))
            return -EPERM;
    }

First, look at the definition of __user in include/linux/compiler.h:

#ifdef __CHECKER__
# define __user           __attribute__((noderef, address_space(1)))
# define __kernel         /* default address space */
# define __safe           __attribute__((safe))
# define __force          __attribute__((force))
# define __nocast         __attribute__((nocast))
# define __iomem          __attribute__((noderef, address_space(2)))
# define __acquires(x)    __attribute__((context(x,0,1)))
# define __releases(x)    __attribute__((context(x,1,0)))
# define __acquire(x)     __context__(x,1)
# define __release(x)     __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
#else
# define __user
# define __kernel
# define __safe
# define __force
# define __nocast
# define __iomem
# define __chk_user_ptr(x) (void)0
# define __chk_io_ptr(x) (void)0
# define __builtin_warning(x, y...) (1)
# define __acquires(x)
# define __releases(x)
# define __acquire(x) (void)0
# define __release(x) (void)0
# define __cond_lock(x,c) (c)
#endif

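The definition shows that GCC itself does not support these annotations: in a normal build, __user and its companions expand to nothing. As the name suggests, __user marks a pointer to user-space data. Although GCC ignores it, a suitable checking tool can use it to find bugs in the kernel source at build time. With __user, for example, the tool warns if a pointer that is not a user-space pointer is passed where a user-space pointer is expected.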

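Alongside __user, the definitions include __kernel, __safe, __force, and __iomem, all of which exist for static checking of the kernel source; __iomem is especially common in driver code.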

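The kernel community currently uses the Sparse tool for these checks. Sparse is a semantic parser: acting as a compiler front end, it analyzes the source to find problems. It can check ANSI C as well as many GCC extensions. Sparse provides a set of annotations that convey semantic information, such as the address space a pointer belongs to and the locks a function acquires or releases.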

Source https://blog.csdn.net/leeshuheng/article/details/5800377
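__pure: a function that does nothing beyond computing a result (strlen(...), for example) and has no other effects; its return value depends only on its arguments and on memory it reads, such as global variables.
__const: a stricter form of __pure; the return value depends only on the values of the arguments, and the function may not read global memory.
__noreturn: a function that always terminates the process, for example by calling something like exit(int), and never returns. Use this extension with care.
__deprecated: the function is deprecated or should be used with caution. The compiler issues a warning wherever a function marked __deprecated is called.
__must_check: callers must use the function's return value; otherwise the compiler issues a warning.
__used: tells the compiler to emit the function even if it is never called anywhere in the visible code.
The remaining attributes are easy to understand, so I won't go into them.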

簡單.減嘆

Wednesday, February 6, 2019

Special Usages in the Linux Kernel Source
