Thursday Oct 09, 2008

Since this is a 101, let's start with the concepts.

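Solid State Disks (SSDs) come in roughly two kinds: flash-based and DRAM-based. Flash-based SSDs almost all use NAND flash, which is non-volatile: it keeps its data after power is lost and needs no battery protection. DRAM is the opposite. It is volatile memory, so an internal battery and backup storage are essential; otherwise a sudden power loss will lose or corrupt data. Today's mainstream SSDs are based on NAND flash.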

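NAND flash comes in just two types: SLC and MLC. SLC stands for single-level cell. According to Anand's write-up, a cell is an N-channel MOSFET transistor. SLC flash stores one bit of data per cell. MLC stands for multi-level cell, and it stores two bits per cell. Since MLC and SLC cells use the same kind of transistor, MLC effectively stores twice as much data as SLC for the same number of cells.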

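How does data get written into a cell? Put simply, by putting different voltage levels on the cell. MLC stores two bits, so it has to control four voltage levels; SLC only has to control two. That is why SLC is faster than MLC. (The idea that simpler means faster seems to be spreading in storage too; traditional parallel SCSI, for example, is now being replaced by SAS.)

Data on flash is organized like this: a group of cells forms a page, and the page is the smallest unit of a write operation on a NAND flash device. On most MLC parts the page size is 4KB. A group of pages forms a block, and the block is the smallest unit of an erase operation on flash. On Intel's MLC, a block has 128 pages, so a block is 512KB.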
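
The geometry above boils down to two small calculations: an n-bit cell has to distinguish 2^n voltage levels, and an erase block is page size times pages per block. Here is a minimal Python sketch, using the 4KB page and 128-page block figures quoted above for Intel MLC purely as example parameters; the function names are my own, hypothetical ones.

```python
# Minimal sketch of the NAND arithmetic described above.
# PAGE_KB and PAGES_PER_BLOCK are the Intel MLC figures quoted in this post,
# used here only as example parameters.

PAGE_KB = 4            # smallest unit of a write (program) operation
PAGES_PER_BLOCK = 128  # pages grouped into one erase block


def voltage_levels(bits_per_cell: int) -> int:
    """Number of distinct voltage levels needed to store bits_per_cell bits."""
    return 2 ** bits_per_cell


def block_size_kb(page_kb: int, pages_per_block: int) -> int:
    """Size of an erase block, the smallest unit of an erase operation."""
    return page_kb * pages_per_block


if __name__ == "__main__":
    print("SLC levels:", voltage_levels(1))      # 2
    print("MLC levels:", voltage_levels(2))      # 4
    size = block_size_kb(PAGE_KB, PAGES_PER_BLOCK)
    print("Block size (KB):", size)              # 512
    print("Pages per erase:", size // PAGE_KB)   # 128
```

Note the 128:1 ratio between erase and write granularity; that asymmetry is what drives the invalidate-then-erase behavior SSDs use.
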
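Because of this asymmetry between the minimum write unit and the minimum erase unit (4KB vs. 512KB), SSDs generally avoid frequent erases. When data is deleted, the SSD only sets a flag (marking the page invalid in some way). Only when the number of invalid pages in a block exceeds some threshold does it copy the still-valid data to new pages and then erase the block.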
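
To make "mark invalid now, erase later" concrete, here is a toy Python model of erase blocks. It is a sketch under stated assumptions, not how any particular SSD controller works: the class and function names are hypothetical, and INVALID_LIMIT is an arbitrary threshold chosen for illustration, since the real threshold is controller-specific.

```python
# Toy model of NAND erase blocks: deletes only set an invalid flag, and a block is
# reclaimed (valid pages copied out, then erased) once too many pages are invalid.
# INVALID_LIMIT is a hypothetical threshold; real controllers pick their own.

PAGES_PER_BLOCK = 128
INVALID_LIMIT = 96  # hypothetical: reclaim once more than 96 pages are invalid


class Block:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None marks a free (erased) page
        self.invalid = [False] * PAGES_PER_BLOCK
        self.erase_count = 0

    def write(self, data):
        """Program the next free page (data must not be None); no in-place overwrite."""
        idx = self.pages.index(None)  # raises ValueError when the block is full
        self.pages[idx] = data
        return idx

    def delete(self, idx):
        """Deleting only flags the page as invalid; nothing is erased yet."""
        self.invalid[idx] = True

    def needs_reclaim(self):
        return sum(self.invalid) > INVALID_LIMIT

    def valid_data(self):
        return [d for d, bad in zip(self.pages, self.invalid)
                if d is not None and not bad]

    def erase(self):
        """Erase the whole block, the smallest unit an erase can work on."""
        self.pages = [None] * PAGES_PER_BLOCK
        self.invalid = [False] * PAGES_PER_BLOCK
        self.erase_count += 1


def reclaim(old, fresh):
    """Copy the still-valid pages of `old` into `fresh`, then erase `old`."""
    for data in old.valid_data():
        fresh.write(data)
    old.erase()


if __name__ == "__main__":
    blk = Block()
    for i in range(PAGES_PER_BLOCK):
        blk.write(f"data-{i}")
    for i in range(100):       # "delete" 100 pages: flags only, no erase
        blk.delete(i)
    print(blk.needs_reclaim())  # True
    fresh = Block()
    reclaim(blk, fresh)
    print(len(fresh.valid_data()), blk.erase_count)  # 28 1
```

Deleting 100 pages costs no erases at all; the single erase happens only when the block is reclaimed.
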
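Flash can only be written and erased a limited number of times: about 10,000 erase cycles for MLC and 100,000 for SLC. Past that limit, you can no longer write or erase. Although the limit is a physical property, algorithms such as wear leveling can still extend the life of a flash device as a whole. Wear leveling simply makes all the cells on the flash share the erases. Without wear leveling, some cells could quickly hit the 10,000-cycle limit, and even though most of the other cells had barely worn, the flash would become unusable. Wear leveling extends flash life by "evening out the rich and the poor."

That covers the basic SSD concepts. My upcoming blog posts will cover Solaris support for SSDs.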
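
Here is a deliberately simplified Python sketch of why wear leveling helps. It assumes an extreme hot-spot workload (without leveling, every erase hits the same block) and the simplest possible leveling policy, always erasing the least-worn block. It tracks wear per erase block, since the block is the unit that actually gets erased. The function name and block count are hypothetical, and real controllers use more elaborate schemes.

```python
# Simplified wear-leveling comparison. MLC_ERASE_LIMIT is the ~10,000-cycle
# MLC figure from this post; the workload and policy are illustrative assumptions.

MLC_ERASE_LIMIT = 10_000


def erases_until_failure(num_blocks, wear_leveling, limit=MLC_ERASE_LIMIT):
    """Count total erases the device absorbs before the next erase would exceed a block's limit."""
    counts = [0] * num_blocks
    total = 0
    while True:
        if wear_leveling:
            victim = counts.index(min(counts))  # spread erases: pick least-worn block
        else:
            victim = 0                          # hot spot: same block every time
        if counts[victim] >= limit:
            return total                        # that block is worn out
        counts[victim] += 1
        total += 1


if __name__ == "__main__":
    print(erases_until_failure(8, wear_leveling=False))  # 10000
    print(erases_until_failure(8, wear_leveling=True))   # 80000
```

With 8 blocks, leveling lets the device absorb 8 times as many erases before a block wears out, because no block reaches its limit until all of them do.
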


References:
  • http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403
    • This one is very informative despite its pro-Intel stance. If the data shown is accurate, the Intel X25-M is a really good buy as a consumer-grade MLC SSD.

  • http://en.wikipedia.org/wiki/Solid_state_disk
    • One of the interesting things mentioned is that StorageTek (now part of Sun) developed the first modern type of solid state drive back in 1978. That was 30 years ago, and it had a mere 1MB of capacity.

Thursday Sep 18, 2008

In case you haven't noticed, we've integrated a nice little feature into OpenSolaris Build 97: a more user-friendly [s]sd-config-list. The detailed feature description can be found in PSARC 2008/465.


What is sd-config-list?


sd-config-list has been a private property in [s]sd.conf for the [s]sd driver since 1999. As the word "private" suggests, the property was originally intended only for internal development and field diagnosis. Over time, however, many people inside and outside Sun started using it to tune their disk systems. A Google search for sd-config-list turns up a long list of storage vendors explaining how to use it with their disk storage.


But using sd-config-list is painful and error-prone. Here is an example:



  ssd-config-list= "SUN     T4", "t4-data";
t4-data=1,0x20000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1;

The format is based on a bitmask, with each bit representing a different tunable. Over the years the number of bits has grown to 18!!! I don't think anybody can figure out what the ssd-config-list above is trying to configure without looking up the bitmap table. In case you are curious, it enables LOGICAL LUN RESET.
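
To see why the bitmask form is hard to read, here is a small Python sketch that decodes the t4-data value above. It assumes a layout inferred from this one example, not taken from the sd driver: a version number, then the flags bitmask, then one positional value per bit, where bit N of the flags says whether the Nth value is in effect. The function name is hypothetical, and the sketch can only report bit positions; mapping them to tunable names still requires the bitmap table.

```python
# Hypothetical decoder for the legacy "version, flags, value0, value1, ..." form.
# The layout is inferred from the t4-data example; this is not the sd driver's code.

def decode_legacy(prop: str):
    """Return (version, {bit_position: value}) for every bit set in the flags."""
    nums = [int(tok, 0) for tok in prop.rstrip(";").split(",")]  # base 0 accepts 0x...
    version, flags, values = nums[0], nums[1], nums[2:]
    enabled = {bit: values[bit]
               for bit in range(len(values))
               if flags & (1 << bit)}
    return version, enabled


if __name__ == "__main__":
    # Same shape as t4-data: version 1, flags 0x20000, then 18 values (17 zeros and a 1).
    t4_data = "1,0x20000," + "0," * 17 + "1;"
    print(decode_legacy(t4_data))  # (1, {17: 1})
```

Only bit 17 is set, and its value is 1; nothing in the string itself hints that this is the LOGICAL LUN RESET setting.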

So what are the changes?


A picture is worth a thousand words, and an example should count for at least five hundred. The new sd-config-list looks like this:


  ssd-config-list =
"SUN T4", "delay-busy:600, retries-timeout:6",
"SUN StorEdge_3510", "retries-timeout:3";


Much more elegant than the previous one, right? Furthermore, the new sd-config-list is JSON-compliant. Using JSON gives us not only a language-independent format but also familiar C-style conventions. Backward compatibility is maintained: the old bitmask way still works, but I am sure you will prefer the new format.
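
For contrast, the new name:value form can be read with a few lines of code. The following Python sketch is illustrative only, not the driver's actual parser: parse_tunables is a hypothetical helper that assumes names never contain ':' or ',' and that values are either integers or bare words. It also shows how each tunable string maps onto a JSON object.

```python
import json

# Hypothetical parser for "name:value, name:value" tuning strings like the ones above.
# Illustration only; assumes names contain no ':' or ',' and values are ints or words.


def parse_tunables(spec: str) -> dict:
    tunables = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        name, _, value = item.partition(":")
        value = value.strip()
        # Store numeric values as ints, anything else as a plain string.
        tunables[name.strip()] = int(value) if value.lstrip("-").isdigit() else value
    return tunables


if __name__ == "__main__":
    config = [
        ("SUN T4", "delay-busy:600, retries-timeout:6"),
        ("SUN StorEdge_3510", "retries-timeout:3"),
    ]
    for device, spec in config:
        print(device, json.dumps(parse_tunables(spec)))
    # SUN T4 {"delay-busy": 600, "retries-timeout": 6}
    # SUN StorEdge_3510 {"retries-timeout": 3}
```

Each tunable is named explicitly, so the configuration documents itself without a bitmap table.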

One more thing: the sd-retry-count tunable is back in Solaris. Of course, you have to configure it through sd-config-list on a per-device-type basis, under its new name "retries-timeout" (the number of retries to perform on an I/O timeout).



Future work?



There are two fronts for future work:



  • Providing generic interfaces in Solaris libnvpair to convert between nvlist and JSON text

  • Moving more driver properties to the JSON format. An easy first target would be tape-config-list in st.conf.


Kudos to Chris and Nikko, who made this happen!

Thursday Mar 20, 2008


  • ATA: AT Attachment, AT 嵌入式接口

  • SCSI: Small Computer System Interface, 小型计算机系统接口

  • SATA: Serial ATA, 串行ATA

  • eSATA: External SATA, 扩展型SATA

  • NAS: Network Attached Storage, 网络附加存储

  • DAS: Direct Attached Storage, 直接附加存储

  • SAN: Storage Area Network, 存储局域网络

  • Array: 阵列

  • Cache Policy: 高速缓存策略

  • Capacity Expansion: 容量扩展

  • Format: 格式化

  • SAS: Serial Attached SCSI, 串行SCSI

  • iSCSI: Internet SCSI

  • iSNS: Internet Storage Name Service, Internet 存储名称服务

  • iBFT:

  • EFI: Extensible Firmware Interface, 扩展固件接口

  • UEFI: Unified EFI

  • VTOC:

  • Fibre Channel: 光纤通道

  • Data De-duplication: 重复数据删除

  • CDP: Continuous Data Protection, 持续数据保护

  • Virtual Tape Library: 虚拟带库

  • COMSTAR:

  • VDI: Virtual Desktop Infrastructure

  • Thin-provisioning:

  • Single-instance storage:


I plan to update this list regularly. Some of the translations come from an online dictionary.



This blog copyright 2009 by tauger