Thursday, August 30, 2018

UBIFS study

Source: Wikipedia

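The Unsorted Block Image File System (UBIFS) is a file system for raw (unmanaged) flash memory devices. It competes with LogFS as one of the successors to JFFS2. Serious development began in 2007, and UBIFS first shipped in a stable release with Linux kernel 2.6.27 in October 2008.
UBIFS was originally designed in 2006 by engineers from IBM and Nokia, Thomas Gleixner and Artem Bityutskiy, specifically to address the bottlenecks of MTD (Memory Technology Device) devices. As NAND flash capacities surged, file systems such as YAFFS could no longer manage the flash space. UBIFS relies on the UBI subsystem to handle interaction with the MTD device. Like JFFS2, UBIFS is built on top of MTD devices and is therefore incompatible with ordinary block devices.
In both design and performance, UBIFS is better suited to MLC NAND flash than YAFFS2 and JFFS2. [1] For example, UBIFS supports write-back: written data is cached and only goes to flash when necessary, which greatly reduces the number of scattered small writes and improves I/O efficiency. The UBIFS file system index is stored on flash, so mounting does not require scanning the whole flash to rebuild it. UBIFS supports on-the-fly compression of file data, and compression can be applied selectively to some files. UBIFS also keeps a journal, which reduces how often the on-flash index is updated.
UBIFS is the default file system on the Nokia N900 smartphone. [2]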


Source: http://shyuanliang.blogspot.com/2011/05/ubifs-flash-file-system.html

Wednesday, May 18, 2011

UBIFS

For a detailed introduction to UBIFS, see:
http://www.linux-mtd.infradead.org/doc/ubi.html 
http://www.linux-mtd.infradead.org/doc/ubifs.html 

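Linux has supported UBIFS since kernel 2.6.27.

UBIFS is a flash memory file system developed by Nokia engineers.
UBIFS can be seen as the next-generation JFFS2.
Like JFFS2, UBIFS is built on top of MTD devices and is incompatible with ordinary block devices.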

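1. Architecture and limitations of JFFS2
When mounting, JFFS2 scans all the data on the flash and then keeps the file system index in system memory.
The drawback is that mounting takes a long time,
and both the mount time and the system memory required grow linearly with flash size.

JFFS2 has no write-back mechanism. (Write-back: cache written data until a certain amount accumulates, then write it out in one operation.)
When an application writes data, JFFS2 writes it to the physical flash almost synchronously.
"Almost" because JFFS2 does keep a buffer the size of one NAND page that holds the most recently written data.
The drawback of having no write-back is frequent flash I/O.

The time JFFS2 needs to access a file grows linearly with file size.

After many small partial modifications to files, JFFS2 gradually becomes less efficient.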

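2. Improvements brought by UBIFS

UBIFS relies on a subsystem, UBI, to handle interaction with the MTD device.
UBIFS stores the file system index on flash, so mounting does not require scanning the whole flash to rebuild it.
Mount time is therefore a few hundred milliseconds and does not grow with flash size.

UBIFS supports write-back.
Written data stays in the cache until it has to be written to flash.
This reduces the number of scattered small writes and improves I/O efficiency.

However, because write-back is asynchronous, applications must handle synchronization carefully when writing files.
For important files, call fsync to force UBIFS to write the data to flash.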
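
As a minimal sketch of the fsync advice above, the following Python helper writes a file and waits for the kernel to push it to storage. The function name write_durably and the file name are made up for illustration, and a temporary directory stands in for a directory on a mounted UBIFS volume.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and block until the kernel has written it out."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush its cached pages for this file


if __name__ == "__main__":
    # Assumption: this directory stands in for a mounted UBIFS volume.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "settings.conf")
        write_durably(target, b"mode=safe\n")
        with open(target, "rb") as f:
            print(f.read())
```

Without the fsync call, the data may sit in the write-back cache for a while, so a power cut shortly after the write could lose it.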

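UBIFS supports on-the-fly compression.
UBIFS can compress file data, and compression can be applied selectively to some files.

UBIFS keeps a journal, which reduces how often the file system index is updated.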

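* Usage

1. Create the ubifs image first

a. Convert the source directory into ubifs format

mkfs.ubifs -r <root-dir> -m <min-io-size> -e <leb-size> -c <max-leb-cnt> -o <output> -x <compr>   (fill in the parameters for your flash)
mkfs.ubifs -m 2048 -e 126976 -c 1872 -r ./mtd -o mtd.ubifs -x lzo
mkfs.ubifs -m 2048 -e 126976 -c 1872 -r ./mtd -o mtd.ubifs -x zlib
Compression is lzo or zlib; the default is lzo.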

PS: 

What is the purpose of the -F (--space-fixup) mkfs.ubifs option?

Because of subtle ECC errors that can arise when programming NAND flash (see here), ubiformat is the recommended way of flashing a UBI image which contains a UBIFS file system. However, this is not always possible - for example, some embedded devices are manufactured using an industrial NAND flash programmer which has no knowledge of UBI or UBIFS.
The -F option causes mkfs.ubifs to set a special flag in the superblock, which triggers a "free space fixup" procedure in the kernel the very first time the filesystem is mounted. This fixup procedure involves finding all empty pages in the UBIFS file system and re-erasing them. This ensures that NAND pages which contain all 0xFF data get fully erased, which removes any problematic non-0xFF data from their OOB areas.
Of course it is not possible to re-erase individual NAND pages, and entire PEBs are erased. UBIFS performs this procedure by reading the useful (non 0xFF'ed) contents of LEBs and then invoking the atomic LEB change UBI operation. Obviously, this means that UBIFS has to read and write a lot of LEBs which takes time. But this happens only once, and the "free space fixup" procedure then unsets the "fixup" UBIFS superblock flag.
This option is supported if you are running a kernel version 3.0 or higher, or if you have pulled the changes from a UBIFS back-port tree. Note that ubiformat is still the preferred flashing method if the image is not being flashed for the first time, since it preserves existing erase counters (while using nandwrite or its equivalent does not).

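b. Turn the file above into a flashable image

ubinize -o <output> -m <min-io-size> -p <peb-size> -s <sub-page-size> -O <vid-hdr-offset> ubinize.cfg

ubinize reads ubinize.cfg to build the image:
ubinize -o ubifs.img -m 2048 -p 128KiB -s 512 -O 2048 ubinize.cfg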

ubinize.cfg
[ubifs]
mode=ubi
image=mtd.ubifs
vol_id=0
vol_size=230MiB
vol_type=dynamic
vol_name=APPS
vol_flags=autoresize

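PS: In mkfs.ubifs -m 2048 -e 126976 -c 1888, -e is the logical eraseblock size (124KiB), not the physical eraseblock size of 128KiB. It sometimes appears as 129024 (126KiB); that is consistent with a 512-byte sub-page letting both UBI headers share the first 2048-byte page, so only one page per PEB is lost instead of two.
PS: vol_size=230MiB must be smaller than the actual mtd size.

2. Mount the ubifs image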

flash_eraseall /dev/mtdX
ubiformat /dev/mtdX -O 2048 -s 512 -f ubifs.img
ubiattach /dev/ubi_ctrl -O 2048 -m X
mount -t ubifs ubi0_0 /mnt/mtd
or
mount -t ubifs ubi0:APPS /mnt/mtd

----------------------------------------------------------------------------------------------------------------------
X           refers to the MTD partition number
ubi0:APPS   refers to vol_name in ubinize.cfg
ubi0_0      refers to vol_id in ubinize.cfg


To boot with the rootfs on ubifs, the boot command is:

setenv bootargs 'console=ttyS0,115200 ubi.mtd=2 root=ubi0:rootfs rootfstype=ubifs' 
or
setenv bootargs 'console=ttyS0,115200 ubi.mtd=2 root=ubi0_0 rootfstype=ubifs' 
---------------------------------------------------------------------------------------------------------------------- 
root=ubi0:rootfs   refers to vol_name in ubinize.cfg
root=ubi0_0        refers to vol_id in ubinize.cfg

If ubiattach fails with error 9 as shown below, the flash MTD partition is too small:
UBI error: vtbl_check: volume table check failed: record 7, error 9

Different versions of mtd-utils seem to take different parameters.
Everything above was tested with mtd-utils 1.50.

----------------------------------------------------------------------------------------------------------------------
The following is excerpted from the TI wiki.

Calculations

Usable Size Calculation
As documented here, UBI reserves a certain amount of space for management and bad PEB handling operations. Specifically:
  • 2 PEBs are used to store the UBI volume table
  • 1 PEB is reserved for wear-leveling purposes;
  • 1 PEB is reserved for the atomic LEB change operation;
  • a % of PEBs is reserved for handling bad EBs. The default for NAND is 1%
  • UBI stores the erase counter (EC) and volume ID (VID) headers at the beginning of each PEB. 1 min I/O unit is required for each of these.
To calculate the full overhead, we need the following values:
Symbol | Meaning                                                                       | Value for XO test case
SP     | PEB Size                                                                      | 128KiB
SL     | LEB Size                                                                      | 128KiB - 2 * 2KiB = 124KiB
P      | Total number of PEBs on the MTD device                                        | 200MiB / 128KiB = 1600
B      | Number of PEBs reserved for bad PEB handling                                  | 1% of P = 16
O      | The overhead related to storing EC and VID headers in bytes, i.e. O = SP - SL | 4KiB

UBI Overhead = (B + 4) * SP + O * (P - B - 4)
      = (16 + 4) * 128KiB + 4KiB * (1600 - 16 - 4)
      = 8880KiB
      = 69.375 PEBs (round to 69)
This leaves us with 1531 PEBs or 195968KiB available for user data.
Note that we used "-c 1580" in the above mkfs.ubifs command line to specify the maximum filesystem size, not "-c 1531". The reason for this is that mkfs.ubifs operates in terms of LEB size (124KiB), not PEB size (128KiB). 195968KiB / 124KiB = 1580.39 (round to 1580).
Volume size = 195968KiB (~192MiB)
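
The calculation above can be reproduced with a short Python sketch. The function name and dictionary keys are invented for illustration; the formula and the 1% bad-PEB default are the ones quoted above, and all sizes are in KiB.

```python
def ubi_usable_space(mtd_kib, peb_kib, min_io_kib, bad_peb_percent=1):
    """Apply the UBI overhead formula quoted above. All sizes are in KiB."""
    sp = peb_kib                      # SP: PEB size
    sl = peb_kib - 2 * min_io_kib     # SL: LEB size (EC and VID headers, one min I/O unit each)
    p = mtd_kib // sp                 # P: total number of PEBs
    b = p * bad_peb_percent // 100    # B: PEBs reserved for bad PEB handling
    o = sp - sl                       # O: header overhead per PEB
    overhead_kib = (b + 4) * sp + o * (p - b - 4)
    overhead_pebs = overhead_kib // sp    # rounded down, matching "round to 69" above
    usable_pebs = p - overhead_pebs
    usable_kib = usable_pebs * sp
    return {
        "leb_kib": sl,
        "overhead_kib": overhead_kib,
        "usable_pebs": usable_pebs,
        "usable_kib": usable_kib,
        "max_leb_cnt": usable_kib // sl,  # candidate value for mkfs.ubifs -c
    }


if __name__ == "__main__":
    # The 200MiB test case from the table above: 128KiB PEBs, 2KiB pages.
    for key, value in ubi_usable_space(200 * 1024, 128, 2).items():
        print(key, value)
```

For the test case this gives an overhead of 8880KiB, 1531 usable PEBs (195968KiB) and a LEB count of 1580, the same figures as in the excerpt.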


----------------------------------------------------------------------------------------------------------------------

Usage: mkfs.ubifs [OPTIONS] target
Make a UBIFS file system image from an existing directory tree

Examples:
Build file system from directory /opt/img, writting the result in the ubifs.img file
mkfs.ubifs -m 512 -e 128KiB -c 100 -r /opt/img ubifs.img
The same, but writting directly to an UBI volume
mkfs.ubifs -r /opt/img /dev/ubi0_0
Creating an empty UBIFS filesystem on an UBI volume
mkfs.ubifs /dev/ubi0_0

Options:
-r, -d, --root=DIR       build file system from directory DIR
-m, --min-io-size=SIZE   minimum I/O unit size
-e, --leb-size=SIZE      logical erase block size
-c, --max-leb-cnt=COUNT  maximum logical erase block count
-o, --output=FILE        output to FILE
-j, --jrn-size=SIZE      journal size
-R, --reserved=SIZE      how much space should be reserved for the super-user
-x, --compr=TYPE         compression type - "lzo", "favor_lzo", "zlib" or
                         "none" (default: "lzo")
-X, --favor-percent      may only be used with favor LZO compression and defines
                         how many percent better zlib should compress to make
                         mkfs.ubifs use zlib instead of LZO (default 20%)
-f, --fanout=NUM         fanout NUM (default: 8)
-F, --space-fixup        file-system free space has to be fixed up on first mount
                         (requires kernel version 3.0 or greater)
-k, --keyhash=TYPE       key hash type - "r5" or "test" (default: "r5")
-p, --orph-lebs=COUNT    count of erase blocks for orphans (default: 1)
-D, --devtable=FILE      use device table FILE
-U, --squash-uids        squash owners making all files owned by root
-l, --log-lebs=COUNT     count of erase blocks for the log (used only for
                         debugging)
-v, --verbose            verbose operation
-V, --version            display version information
-g, --debug=LEVEL        display debug information (0 - none, 1 - statistics,
                         2 - files, 3 - more details)

Usage: ubinize [-o filename] [-p ] [-m ] [-s ] [-O ] [-e ]
[-x ] [-Q ] [-v] [-h] [-V] [--output=] [--peb-size=]
[--min-io-size=] [--sub-page-size=] [--vid-hdr-offset=]
[--erase-counter=] [--ubi-ver=] [--image-seq=] [--verbose] [--help]
[--version] ini-file
Example: ubinize -o ubi.img -p 16KiB -m 512 -s 256 cfg.ini - create UBI image
         'ubi.img' as described by configuration file 'cfg.ini'

-o, --output=     output file name
-p, --peb-size=       size of the physical eraseblock of the flash
                             this UBI image is created for in bytes,
                             kilobytes (KiB), or megabytes (MiB)
                             (mandatory parameter)
-m, --min-io-size=    minimum input/output unit size of the flash
                             in bytes
-s, --sub-page-size=  minimum input/output unit used for UBI
                             headers, e.g. sub-page size in case of NAND
                             flash (equivalent to the minimum input/output
                             unit size by default)
-O, --vid-hdr-offset=   offset if the VID header from start of the
                             physical eraseblock (default is the next
                             minimum I/O unit or sub-page after the EC
                             header)
-e, --erase-counter=    the erase counter value to put to EC headers
                             (default is 0)
-x, --ubi-ver=          UBI version number to put to EC headers
                             (default is 1)
-Q, --image-seq=        32-bit UBI image sequence number to use
                             (by default a random number is picked)
-v, --verbose                be verbose


Usage: ubiformat [-s ] [-O ] [-n]
[-f ] [-S ] [-e ] [-x ] [-y] [-q] [-v] [-h] [-v]
[--sub-page-size=] [--vid-hdr-offset=] [--no-volume-table]
[--flash-image=] [--image-size=] [--erase-counter=]
[--ubi-ver=] [--yes] [--quiet] [--verbose] [--help] [--version]

Example 1: ubiformat /dev/mtd0 -y - format MTD device number 0 and do
           not ask questions.
Example 2: ubiformat /dev/mtd0 -q -e 0 - format MTD device number 0,
           be quiet and force erase counter value 0.

-s, --sub-page-size=  minimum input/output unit used for UBI
                             headers, e.g. sub-page size in case of NAND
                             flash (equivalent to the minimum input/output
                             unit size by default)
-O, --vid-hdr-offset=  offset if the VID header from start of the
                             physical eraseblock (default is the next
                             minimum I/O unit or sub-page after the EC
                             header)
-n, --no-volume-table        only erase all eraseblock and preserve erase
                             counters, do not write empty volume table
-f, --flash-image=     flash image file, or '-' for stdin
-S, --image-size=     bytes in input, if not reading from file
-e, --erase-counter=  use as the erase counter value for all
                             eraseblocks
-x, --ubi-ver=          UBI version number to put to EC headers
                             (default is 1)
-Q, --image-seq=        32-bit UBI image sequence number to use
                             (by default a random number is picked)
-y, --yes                    assume the answer is "yes" for all question
                             this program would otherwise ask
-q, --quiet                  suppress progress percentage information
-v, --verbose                be verbose


Usage: ubiattach []
[-m ] [-d ] [-p ]
[--mtdn=] [--devn=]
[--dev-path=]
UBI control device defaults to /dev/ubi_ctrl if not supplied.
Example 1: ubiattach -p /dev/mtd0 - attach /dev/mtd0 to UBI
Example 2: ubiattach -m 0 - attach MTD device 0 (mtd0) to UBI
Example 3: ubiattach -m 0 -d 3 - attach MTD device 0 (mtd0) to UBI
           and create UBI device number 3 (ubi3)

-d, --devn=   the number to assign to the newly created UBI device
                      (assigned automatically if this is not specified)
-p, --dev-path= path to MTD device node to attach
-m, --mtdn=   MTD device number to attach (alternative method, e.g
                      if the character device node does not exist)
-O, --vid-hdr-offset  VID header offset (do not specify this unless you really
                      know what you are doing, the default should be optimal)



Source: https://bootlin.com/blog/creating-flashing-ubi-ubifs-images/

Creating and flashing UBI / UBIFS images

Embedded-oriented filesystems are a scattered world. Flash-optimized filesystems are less so. JFFS2 has been widely used but has several performance issues (mount time, especially, though CONFIG_JFFS2_SUMMARY and sumtool have addressed that since 2.6.15). LogFS doesn’t seem to be actively maintained. The most active and promising flash filesystem is UBIFS. It runs on top of UBI (“Unsorted Block Images”), an abstraction layer for MTD devices.

Why flash-oriented filesystems ?

MTDs (Memory Technology Devices) are very different from block devices: instead of a sequence of writable sectors, they contain an array of writable pages, organized in so-called “erase blocks”.
To write on a page that already has data on it, you first have to erase this data. However, it is only possible to erase whole eraseblocks. Only then, you can write your new data (including what you didn’t change). Erasing causes the memory cells to wear out. At some point, they won’t be usable anymore and have to be skipped.
Because it is memory-based, random access is theoretically as fast as sequential access. So, you don’t need to keep the fragments of your files together. It makes it possible to do wear-leveling and thus, “increase” the lifetime of the chip.
A simple way of doing wear-leveling is to keep track of the number of times a block has been erased and use the block that has been the least erased when updating data.
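
The simple scheme just described can be sketched as a toy Python allocator. This is only an illustration of "pick the least-erased block"; the class name is invented, and it is not how UBI itself implements wear-leveling.

```python
class NaiveWearLeveler:
    """Toy allocator: always hand out the free block with the fewest erases."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks
        self.free = set(range(num_blocks))

    def allocate(self):
        """Pick the least-erased free block (ties go to the lowest index)."""
        if not self.free:
            raise RuntimeError("no free erase blocks")
        block = min(self.free, key=lambda i: (self.erase_counts[i], i))
        self.free.remove(block)
        return block

    def erase(self, block):
        """Erase a block: bump its counter and return it to the free pool."""
        self.erase_counts[block] += 1
        self.free.add(block)


if __name__ == "__main__":
    wl = NaiveWearLeveler(4)
    for _ in range(8):       # rewrite the same piece of data eight times
        blk = wl.allocate()  # the new copy goes to the least-worn block
        wl.erase(blk)        # the old copy is erased afterwards
    print(wl.erase_counts)   # the erases end up spread evenly over all blocks
```

Rewriting one piece of data eight times on four blocks leaves every block erased twice, instead of one block erased eight times.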
All these constraints make it hard to write a flash filesystem.
UBI intends to deal with all MTD-specific operations while still presenting random-access volumes to the upper layer. The first, and for now only, implementation using UBI is UBIFS. UBI is a “volume manager” and maps logical erase blocks (LEB) to physical erase blocks (PEB). The LEBs are smaller than the PEBs because of meta-data and headers.

How to use UBI on my board ?

There are mainly 2 ways to do that:
  • On a booted Linux system, approximately the same way you would create a partition on your desktop’s hard drive ;
  • From the bootloader, by flashing a previously prepared UBI image ;
Whatever solution you choose, you need to know the sizes of:
  • the eraseblocks (PEB) ;
  • the pages (or “minimum input/output size”) ;
  • the subpages (it may be the same as the min i/o size) ;
From these details, you can deduce another one: the size of logical erase blocks. It is the size of the PEB minus a data offset which is:
(int((Subpage_size + Page_size) / Page_size))  * Page_size
(subpage+page truncated to page size). This formula makes some assumption but should be correct if the subpage size is more than 8B and the page size more than 64B (see the source for more information). The best way to be sure of this size is to use mtdinfo on linux on the board. mtdinfo is part of the ubi-utils (part of mtd-utils). It’s probably available in your build system.
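
As a rough illustration, the formula above translates into a small Python helper. The function names are made up for this sketch, and it assumes the default VID header offset (no -O override); mtdinfo on the target remains the authoritative source.

```python
def data_offset(page_size, subpage_size):
    """Offset of the first data byte in a PEB, per the formula above (bytes)."""
    return ((subpage_size + page_size) // page_size) * page_size


def leb_size(peb_size, page_size, subpage_size):
    """Logical erase block size = PEB size minus the data offset (bytes)."""
    return peb_size - data_offset(page_size, subpage_size)


if __name__ == "__main__":
    peb = 128 * 1024
    # NAND with 2048-byte pages and 512-byte sub-pages
    print(leb_size(peb, 2048, 512))
    # Same NAND without sub-page support (sub-page size equals page size)
    print(leb_size(peb, 2048, 2048))
```

For a 128KiB PEB with 2048-byte pages this yields 129024 with 512-byte sub-pages and 126976 without them, the two -e values mentioned earlier in this post.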
In both cases, you will also need a UBIFS image. In the way of JFFS2, mkfs.ubifs comes in mtd-utils (thus, you also need them on your desktop. Warning: mtd-utils in Ubuntu 10.10 are reported to be buggy ; if you use this distribution, recompile them from their git tree). Here is an example of how you can invoke it:
# mkfs.ubifs -r <root-dir> -m <min-io-size> -e <leb-size> -c <max-leb-cnt> -o <output-file>

Solution A – On a booted Linux system

I think it’s the best method to understand how UBI is structured.
You first need to enable UBI and UBIFS in the kernel and install the mtd-utils package (for Debian and Ubuntu) on your box. You may also compile mtd-utils from its sources.
Once you have your UBIFS image at hand, let’s sing the UBI song:
# ubiformat /dev/mtdX
# ubiattach -p /dev/mtdX
# ubimkvol /dev/ubi0 -N volume_name -s 64MiB
# ubiupdatevol /dev/ubi0_0 /path/to/ubifs.img
# mount -t ubifs ubi0:volume_name /mount/point
Let’s examine each command. ubiformat erases an MTD partition but keeps its erase counters (‘X’ is the number of the partition you want to use). ubiattach creates a UBI device from the MTD partition. This UBI device is then referred to by UBI as ubi0 (if it is the first device). ubimkvol creates a volume on a UBI device; this volume is referred to as ubi0_0 (if it is the first volume on the device). ubiupdatevol puts an image on an empty volume (use ubiupdatevol -t /dev/ubi0_0 to empty a volume). At last, the well-known mount can be invoked, naming the volume as in the final command above.

Solution B – Prepare a UBI image ready to be flashed

It is more common to flash filesystem images directly from the bootloader. ubinize makes this possible by preparing a UBI device image containing one or more volumes.
ubinize reads a configuration file (in the very simple INI format) describing the volumes and their configuration. Here is an example of a device with two volumes: one, named rootfs, is read-only (static); the other, data, is read-write (dynamic); the autoresize flag makes UBI resize the volume to use the whole unused space at initialization. The name of the sections is totally arbitrary.
[rootfs_volume]
mode=ubi
image=rootfs.ubifs
vol_id=1
vol_type=static
vol_name=rootfs
vol_alignment=1

[rwdata_volume]
mode=ubi
image=data.ubifs
vol_id=2
vol_type=dynamic
vol_name=data
vol_alignment=1
vol_flags=autoresize
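
If the volume list is produced by a build script, a configuration like the one above can be generated rather than written by hand. The following is a minimal sketch using Python's standard configparser; the function name build_ubinize_cfg and the "<vol_name>_volume" section naming are assumptions for illustration (section names are arbitrary anyway).

```python
import configparser
import io


def build_ubinize_cfg(volumes):
    """Render a list of volume descriptions as ubinize-style INI text."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep key names exactly as given
    for vol in volumes:
        section = vol["vol_name"] + "_volume"  # hypothetical naming scheme
        entries = {"mode": "ubi"}
        entries.update({key: str(value) for key, value in vol.items()})
        cfg[section] = entries
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)  # emit key=value, no spaces
    return out.getvalue()


if __name__ == "__main__":
    volumes = [
        {"image": "rootfs.ubifs", "vol_id": 1, "vol_type": "static",
         "vol_name": "rootfs", "vol_alignment": 1},
        {"image": "data.ubifs", "vol_id": 2, "vol_type": "dynamic",
         "vol_name": "data", "vol_alignment": 1, "vol_flags": "autoresize"},
    ]
    print(build_ubinize_cfg(volumes))
```

The output can be saved to a file and passed to ubinize as its configuration file.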
Next is the generation of the UBI image. The ubinize utility will need the Physical Erase Block size (PEB) (option -p) and the minimum I/O size (-m):
# ubinize -vv -o <output-image> -m <min-io-size> -p <peb-size>KiB <config-file>
Your image is ready. You may now want to boot on the rootfs UBIFS partition. Keep on reading, then.

Use a UBIFS partition as root partition

Some options need to be passed to the kernel to boot on a ubi volume and on a UBIFS partition:
ubi.mtd=<mtd-partition-number>
root=<ubi-device>:<volume-name>
rootfstype=ubifs
For instance, with the previous examples and assuming the UBI device has been created/flashed on /dev/mtd1:
ubi.mtd=1 root=ubi0:rootfs rootfstype=ubifs

Conclusion

Creating and using a UBIFS filesystem is not as easy as with JFFS2, but UBI/UBIFS is designed to be more robust and UBI will ease the development of new filesystems. The authors of UBI have pointed out some memory usage scalability problems, but if a second version of UBI were written, filesystems on top of it would not need to be modified.

Troubleshooting

In case your system is missing the /dev/ubi_ctrl, /dev/ubi0 or /dev/ubi0_X device files, we advise you to recompile your kernel with DEVTMPFS and DEVTMPFS_MOUNT. This way, all the devices existing on your system will appear in /dev.
If you get write errors (code -74 or -5, probably), check that CONFIG_MTD_NAND_VERIFY_WRITE (respectively, ONENAND) is disabled : verifying subpages writes isn’t supported yet.

Sources

The primary place for information about MTD support in Linux is infradead.org. There also is a mailing list which you can also subscribe to.
The kernel sources under drivers/mtd and fs/ubifs are also very helpful.



[Yulin] A UBIFS volume needs at least 23 NAND blocks with a 2K page size, or 26 with a 4K page size.

An accounting of UBI+UBIFS overhead

First, the UBI layer takes 5 erase blocks of overhead:

  • 2 for the volume table
  • 1 reserved for the wear leveling algorithm
  • 1 reserved for the "atomic LEB change" feature, which allows for reliable in-place updates of a logical erase block
  • 1 (ideally more, as you mentioned) reserved for handling bad physical erase blocks.

Next, the UBIFS layer has a minimum number of erase blocks for the filesystem metadata:

  • 1 for the filesystem superblock, which identifies the volume as a valid UBIFS and stores the filesystem parameters
  • 2 for the master node area (redundant copies), which are the roots of the tree used for filesystem lookups
  • 2 or more for the log area (which counts towards usable space)
  • 2 for the LEB properties tree, which tracks how each logical erase block is used
  • 1 or more for the orphan area (for tracking deleted files, so they are cleaned up correctly after an unclean unmount)
  • 8 reserved for filesystem metadata (garbage collection, deletions, buds, index)
  • 1 or more for committed data not in the log (usable space).
$ mkfs.ubifs -r fwupac -o fwupac.ubifs -m 4096 -e 253952 -c 2146 -F -c 14
Error: too low max. count of LEBs, minimum is 17
$ mkfs.ubifs -r fwupac -o fwupac.ubifs -m 4096 -e 253952 -c 2146 -F -c 17
Error: too many log LEBs, maximum is 0
...
$ mkfs.ubifs -r fwupac -o fwupac.ubifs -m 4096 -e 253952 -c 2146 -F -c 21
Error: too many log LEBs, maximum is 4
$ mkfs.ubifs -r fwupac -o fwupac.ubifs -m 4096 -e 253952 -c 2146 -F -c 22
$
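
The itemized minimums above can be tallied with a few lines of Python. The dictionary names are invented for this sketch and the counts are copied from the two lists; the UBIFS total comes to 17, which matches the "minimum is 17" message in the transcript. The tally does not explain why mkfs.ubifs only accepted -c 22 in that transcript.

```python
# Erase blocks taken by the UBI layer, as listed above.
UBI_OVERHEAD = {
    "volume table": 2,
    "wear leveling": 1,
    "atomic LEB change": 1,
    "bad PEB handling": 1,
}

# Minimum erase blocks for the UBIFS layer, as listed above.
UBIFS_MINIMUM = {
    "superblock": 1,
    "master node area": 2,
    "log area": 2,
    "LEB properties tree": 2,
    "orphan area": 1,
    "reserved for metadata": 8,
    "committed data": 1,
}

ubi_total = sum(UBI_OVERHEAD.values())
ubifs_total = sum(UBIFS_MINIMUM.values())

if __name__ == "__main__":
    print("UBI overhead:", ubi_total)
    print("UBIFS minimum:", ubifs_total)
```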
