Yukun Zhang(张玉昆)@Sun Microsystems

I am a software engineer in Solaris core technologies, working on network virtualization.
This weblog is dedicated to Solaris adoption.


20061104 Saturday November 04, 2006

Solaris network virtualization: the firewall part

  The Solaris Containers (AKA zones) model first appeared in Solaris 10. Virtualization of CPU, memory, and disk space already provides a basic OS-level virtual environment, but the networking part is still incomplete.

  A Solaris 10 zone has its own name space (logical IP addresses, TCP/UDP/SCTP ports), but basically all zones on one system share a single TCP/IP stack. The following are all shared:

    - Routing
    - ARP
    - Firewall policy
    - Statistics

  This means:

    - Security holes between zones: the network configuration and information of one zone may (or certainly will) be visible to other zones.
    - Configuration problems: configuring one zone (routing, for example) may affect other zones or even stop them from working.

The Old Solaris 10 Network Model

        Zone A              Zone B
  +-----------------++-----------------+
  |   APPLICATION   ||   APPLICATION   |
  |                 ||                 |
  |    TCP   UDP    ||    TCP   UDP    |
  +-----------------++-----------------+
           |                  |
  +------------------------------------+
  |       IP/ARP/IPSEC/Firewall        |
  +------------------------------------+
          |                  |
    +------------+      +----------+
    |    NIC1    |      |   NIC2   |
    +------------+      +----------+

The New Solaris 10 Network Model

        Zone A                Zone B
  +-----------------+  +-----------------+
  |   APPLICATION   |  |   APPLICATION   |
  |                 |  |                 |
  |    TCP   UDP    |  |    TCP   UDP    |
  +-----------------+  +-----------------+
           |                    |
  +-----------------+  +-----------------+
  |     IP/ARP/     |  |     IP/ARP/     |
  | IPSEC/Firewall  |  | IPSEC/Firewall  |
  +-----------------+  +-----------------+
           |                    |
    +------------+        +----------+
    |    NIC1    |        |   NIC2   |
    +------------+        +----------+

  I designed and implemented the virtualization of the Solaris firewall. The Solaris firewall consists of the pfhook framework (neti and hook) and the firewall engine, ipf.

Architecture:

   IP       module: neti                              Firewall
+------+    +----------------------------------+    +----------+
|      |--->| net_register/unregister          |    |          |
|      |    | net_lookup                       |<---|          |
|      |    | net_release                      |    |          |
|      |    | net_walk                         |    |          |
|      |--->| net_register_family/unregister   |    |          |
|      |--->| net_register_event/unregister    |    |          |
|      |    | net_register_hook/unregister     |<---|          |
|      |    | net_getifname                    |<---|          |
|      |    | net_getmtu                       |<---|          |
|      |    | net_getpmtuenabled               |<---|          |
|      |    | net_lifaddr                      |<---|          |
|      |    | net_phygetnext                   |<---|          |
|      |    | net_phylookup                    |<---|          |
|      |    | net_lifgetnext                   |<---|          |
|      |    | net_inject                       |<---|          |
|      |    | net_routeto                      |<---|          |
|      |    | net_ispartialchecksum            |<---|          |
|      |    | net_isvalidchecksum              |<---|          |
|      |    +----------------------------------+    |          |
|      |      ^  (neti, hook interaction)           |          |
|      |      v  module: hook                       |          |
|      |    +----------------------------------+    |          |
|      |--->| hook_run                         |    |          |
|      |    +----------------------------------+    +----------+
+------+

Note: A few external functions in module hook are called by neti:

    hook_family_add/remove
    hook_event_add/remove
    hook_register/unregister

The steps:

1. neti and hook initialization

2. ip stack, in ip_ddi_init:
   - call ip_neti_init()
     - call net_register(&ipv4info)
     - call net_register(&ipv6info)
     - create task queues for eventq_queue_out, eventq_queue_in, eventq_queue_nic
   - call ipv4_hook_init()
     - call net_register_family(ipv4, &ipv4root)
     - call net_register_event(ipv4, &ip4_physical_in_event)
     - call net_register_event(ipv4, &ip4_physical_out_event)
     - call net_register_event(ipv4, &ip4_physical_forwarding_event)
     - call net_register_event(ipv4, &ip4_loopback_in_event)
     - call net_register_event(ipv4, &ip4_loopback_out_event)
     - call net_register_event(ipv4, &ip4_nic_events)
   - call ipv6_hook_init()
     - call net_register_family(ipv6, &ipv6root)
     - call net_register_event(ipv6, &ip6_physical_in_event)
     - call net_register_event(ipv6, &ip6_physical_out_event)
     - call net_register_event(ipv6, &ip6_physical_forwarding_event)
     - call net_register_event(ipv6, &ip6_loopback_in_event)
     - call net_register_event(ipv6, &ip6_loopback_out_event)
     - call net_register_event(ipv6, &ip6_nic_events)

3. arp:
   - call arp_hook_init()
     - call net_register_family(arp, &arproot)
     - call net_register_event(arp, &arp_physical_in_event)
     - call net_register_event(arp, &arp_physical_out_event)
     - call net_register_event(arp, &arp_nic_events)

4. ipf, in iplattach:
   - ipf_ipv4 = net_lookup(NHF_INET);
   - net_register_hook(ipf_ipv4, NH_NIC_EVENTS, &ipfhook_nicevents)
   - net_register_hook(ipf_ipv4, NH_PHYSICAL_IN, &ipfhook_in)
   - net_register_hook(ipf_ipv4, NH_PHYSICAL_OUT, &ipfhook_out)
   - net_register_hook(ipf_ipv4, NH_LOOPBACK_IN, &ipfhook_in)
   - net_register_hook(ipf_ipv4, NH_LOOPBACK_OUT, &ipfhook_out)
   - ipf_ipv6 = net_lookup(NHF_INET6);
   - net_register_hook(ipf_ipv6, NH_NIC_EVENTS, &ipfhook_nicevents)
   - net_register_hook(ipf_ipv6, NH_PHYSICAL_IN, &ipfhook_in)
   - net_register_hook(ipf_ipv6, NH_PHYSICAL_OUT, &ipfhook_out)
   - net_register_hook(ipf_ipv6, NH_LOOPBACK_IN, &ipfhook_in)
   - net_register_hook(ipf_ipv6, NH_LOOPBACK_OUT, &ipfhook_out)

5. When packets come in or go out, or a NIC event happens, IP calls hook_run().

The relationship between the data structures looks like this:

  neti
  netd_head --> net_data_t (ip4) --> net_data_t (ip6) --> net_data_t (arp)
                +------------+       +------------+       +------------+
                | net_info   |       | net_info   |       | net_info   |
                | netd_hooks |       | netd_hooks |       | netd_hooks |
                +-----|------+       +-----|------+       +-----|------+
                      |                    |                    |
  hook                v                    v                    v
  familylist -> hook_family_int_t -> hook_family_int_t -> hook_family_int_t
                      |
                      +--> hook_event_int_t --> hook_int_t
                      +--> hook_event_int_t --> hook_int_t
                      +--> hook_event_int_t --> hook_int_t

After virtualization, the process is:

1. Kernel module neti has neti_stack_t and neti_stack_init(), which allocates its local storage.

2. Kernel module hook has hook_stack_t and hook_stack_init(), which allocates its local storage.

3. arp module changes: move the following from arp_ddi_init to arp_stack_init:
   - call arp_neti_init(as)
     - call net_register(&arpinfo)
   - call arp_hook_init()
     - call net_register_family(arp, &arproot)
     - call net_register_event(arp, &arp_physical_in_event)
     - call net_register_event(arp, &arp_physical_out_event)
     - call net_register_event(arp, &arp_nic_events)

4. ip module changes: move the following from ip_ddi_init to ip_stack_init:
   - call ip_neti_init(pfs)
     - call net_register(&ipv4info)
     - call net_register(&ipv6info)
     - create task queues for eventq_queue_out, eventq_queue_in, eventq_queue_nic
   - call ipv4_hook_init()
     - call net_register_family(ipv4, &ipv4root)
     - call net_register_event(ipv4, &ip4_physical_in_event)
     - call net_register_event(ipv4, &ip4_physical_out_event)
     - call net_register_event(ipv4, &ip4_physical_forwarding_event)
     - call net_register_event(ipv4, &ip4_loopback_in_event)
     - call net_register_event(ipv4, &ip4_loopback_out_event)
     - call net_register_event(ipv4, &ip4_nic_events)
   - call ipv6_hook_init()
     - call net_register_family(ipv6, &ipv6root)
     - call net_register_event(ipv6, &ip6_physical_in_event)
     - call net_register_event(ipv6, &ip6_physical_out_event)
     - call net_register_event(ipv6, &ip6_physical_forwarding_event)
     - call net_register_event(ipv6, &ip6_loopback_in_event)
     - call net_register_event(ipv6, &ip6_loopback_out_event)
     - call net_register_event(ipv6, &ip6_nic_events)

The following function interfaces gained a netstack_t * argument:

    net_register(..., netstack_t *)
    net_lookup(..., netstack_t *)
    net_walk(..., netstack_t *)

The other functions remain unchanged:

    net_unregister(...)
    net_release(...)
    net_register_family(...)
    net_unregister_family(...)
    net_register_event(...)
    net_unregister_event(...)
    net_register_hook(...)
    net_unregister_hook(...)
    net_getifname(...)
    net_getmtu(...)
    net_getpmtuenabled(...)
    net_lifaddr(...)
    net_phygetnext(...)
    net_phylookup(...)
    net_lifgetnext(...)
    net_inject(...)
    net_routeto(...)
    net_ispartialchecksum(...)
    net_isvalidchecksum(...)
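
The effect of the extra netstack_t * argument can be illustrated with a small user-space sketch. Everything named demo_* here is hypothetical and is not the Solaris kernel code; it only assumes that each stack instance keeps its own list of registered protocols, so that registering or looking up a protocol in one zone's stack never touches another zone's stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one registered protocol (compare net_data). */
typedef struct demo_net_data {
    const char *protocol;          /* e.g. "inet", "inet6", "arp" */
    void *netstack;                /* back pointer, compare netd_netstack */
    struct demo_net_data *next;
} demo_net_data_t;

/* Hypothetical stand-in for one stack instance (compare netstack_t). */
typedef struct demo_netstack {
    int stackid;
    demo_net_data_t *head;         /* per-stack list, not one global list */
} demo_netstack_t;

/* Register a protocol with one specific stack; NULL if allocation fails. */
demo_net_data_t *
demo_net_register(const char *protocol, demo_netstack_t *ns)
{
    demo_net_data_t *nd;

    assert(protocol != NULL && ns != NULL);
    nd = malloc(sizeof (*nd));
    if (nd == NULL)
        return (NULL);
    nd->protocol = protocol;
    nd->netstack = ns;
    nd->next = ns->head;
    ns->head = nd;
    return (nd);
}

/* Look a protocol up in one specific stack only; NULL if not registered. */
demo_net_data_t *
demo_net_lookup(const char *protocol, demo_netstack_t *ns)
{
    demo_net_data_t *nd;

    assert(protocol != NULL && ns != NULL);
    for (nd = ns->head; nd != NULL; nd = nd->next) {
        if (strcmp(nd->protocol, protocol) == 0)
            return (nd);
    }
    return (NULL);
}
```

With two demo_netstack_t instances, a protocol registered only in the first is not found by demo_net_lookup() on the second, which is the isolation the per-stack model is after.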
The corresponding data structures and prototypes changed as well:

-------------------------------------------------------------------------------------
Old:

    typedef struct net_info {
        int       neti_version;
        char      *neti_protocol;
        int       (*neti_getifname)(phy_if_t, char *, const size_t);
        int       (*neti_getmtu)(phy_if_t, lif_if_t);
        int       (*neti_getpmtuenabled)(void);
        int       (*neti_getlifaddr)(phy_if_t, lif_if_t, size_t,
                      net_ifaddr_t [], void *);
        phy_if_t  (*neti_phygetnext)(phy_if_t);
        phy_if_t  (*neti_phylookup)(const char *);
        lif_if_t  (*neti_lifgetnext)(phy_if_t, lif_if_t);
        int       (*neti_inject)(inject_t, net_inject_t *);
        phy_if_t  (*neti_routeto)(struct sockaddr *);
        int       (*neti_ispartialchecksum)(mblk_t *);
        int       (*neti_isvalidchecksum)(mblk_t *);
    } net_info_t;

New:

    typedef struct net_info {
        int       neti_version;
        char      *neti_protocol;
        int       (*neti_getifname)(phy_if_t, char *, const size_t, netstack_t *);
        int       (*neti_getmtu)(phy_if_t, lif_if_t);
        int       (*neti_getpmtuenabled)(netstack_t *);
        int       (*neti_getlifaddr)(phy_if_t, lif_if_t, size_t,
                      net_ifaddr_t [], void *);
        phy_if_t  (*neti_phygetnext)(phy_if_t, netstack_t *);
        phy_if_t  (*neti_phylookup)(const char *, netstack_t *);
        lif_if_t  (*neti_lifgetnext)(phy_if_t, lif_if_t);
        int       (*neti_inject)(inject_t, net_inject_t *, netstack_t *);
        phy_if_t  (*neti_routeto)(struct sockaddr *, netstack_t *);
        int       (*neti_ispartialchecksum)(mblk_t *);
        int       (*neti_isvalidchecksum)(mblk_t *);
    } net_info_t;

-------------------------------------------------------------------------------------
Old:

    struct net_data {
        LIST_ENTRY(net_data)  netd_list;
        net_info_t            netd_info;
        int                   netd_refcnt;
        hook_family_int_t     *netd_hooks;
    };

New:

    struct net_data {
        LIST_ENTRY(net_data)  netd_list;
        net_info_t            netd_info;
        int                   netd_refcnt;
        hook_family_int_t     *netd_hooks;
        void                  *netd_netstack;
    };

-------------------------------------------------------------------------------------
Old:

    typedef struct injection_s {
        net_inject_t  inj_data;
        boolean_t     inj_isv6;
    } injection_t;

New:

    typedef struct injection_s {
        net_inject_t  inj_data;
        boolean_t     inj_isv6;
        void          *inj_ptr;
    } injection_t;

-------------------------------------------------------------------------------------
Old:

    extern net_data_t net_register(const net_info_t *);

New:

    extern net_data_t net_register(const net_info_t *, netstack_t *);

-------------------------------------------------------------------------------------
Old:

    extern net_data_t net_lookup(const char *);

New:

    extern net_data_t net_lookup(const char *, netstack_t *);

-------------------------------------------------------------------------------------
Old:

    extern net_data_t net_walk(net_data_t);

New:

    extern net_data_t net_walk(net_data_t, netstack_t *);

( Nov 04 2006, 05:18:37 AM EST ) Permalink

20060916 Saturday September 16, 2006

IP instances for Solaris

  Exclusive IP instances for Solaris Zones.

( Sep 16 2006, 03:07:35 AM EDT ) Permalink

20060613 Tuesday June 13, 2006

Solaris packet filtering hooks (pfhooks)

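  The Solaris 10 firewall, IP Filter, is based on the open source ipfilter. Sun made some necessary and useful Solaris-specific optimizations and added features such as full IPv6 support, IPv4/IPv6 pools, and IPv6 fragment support.

  In the STREAMS-based networking framework of Solaris 10, the Solaris firewall is implemented by two kernel modules: pfil + ipf.

This brings two main problems:

  1. Poor performance.
  2. Loopback traffic cannot be filtered. This problem has become quite prominent, because communication between Solaris containers goes over loopback.


  pfhooks are embedded in the TCP, IP, and ARP protocol stack, which solves both problems nicely:

    pfil is removed, which improves system performance; loopback traffic passing through IP can also go via pfhooks to ipf for filtering, which solves the second problem.


  For a more detailed explanation, see the pfhook white paper.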
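
The idea of hooks embedded in the stack can be sketched in a few lines of user-space C. This is an illustration only: the demo_* names, the fixed-size table, and the return-value convention are assumptions made for the sketch and are not the real pfhooks interfaces. It shows a filter callback registered for a loopback event being run from the same dispatch routine that serves physical-interface events, which is why loopback traffic becomes filterable once the hook point lives inside IP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event identifiers, loosely modelled on the NH_* names. */
enum demo_event {
    DEMO_PHYSICAL_IN,
    DEMO_PHYSICAL_OUT,
    DEMO_LOOPBACK_IN,
    DEMO_LOOPBACK_OUT,
    DEMO_EVENT_MAX
};

typedef struct demo_packet {
    const char *payload;
} demo_packet_t;

/* A hook returns 0 to let the packet pass and nonzero to drop it. */
typedef int (*demo_hook_fn)(const demo_packet_t *);

#define DEMO_MAX_HOOKS 4

typedef struct demo_hook_table {
    demo_hook_fn hooks[DEMO_EVENT_MAX][DEMO_MAX_HOOKS];
    size_t count[DEMO_EVENT_MAX];
} demo_hook_table_t;

/* Attach a callback to one event; returns -1 if the event is invalid or full. */
int
demo_register_hook(demo_hook_table_t *t, enum demo_event ev, demo_hook_fn fn)
{
    assert(t != NULL && fn != NULL);
    if ((int)ev < 0 || ev >= DEMO_EVENT_MAX || t->count[ev] == DEMO_MAX_HOOKS)
        return (-1);
    t->hooks[ev][t->count[ev]++] = fn;
    return (0);
}

/* Called by the "stack" for every packet; returns 1 if any hook drops it. */
int
demo_hook_run(const demo_hook_table_t *t, enum demo_event ev,
    const demo_packet_t *pkt)
{
    size_t i;

    assert(t != NULL && pkt != NULL);
    if ((int)ev < 0 || ev >= DEMO_EVENT_MAX)
        return (0);
    for (i = 0; i < t->count[ev]; i++) {
        if (t->hooks[ev][i](pkt) != 0)
            return (1);
    }
    return (0);
}

/* Example filter for the sketch: drop packets whose payload starts with 'X'. */
int
demo_filter_drop_x(const demo_packet_t *pkt)
{
    return (pkt->payload[0] == 'X');
}
```

Registering demo_filter_drop_x for DEMO_LOOPBACK_IN makes demo_hook_run() drop matching loopback packets while leaving events with no registered hooks untouched.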

( Jun 13 2006, 11:56:36 AM EDT ) Permalink

20051101 Tuesday November 01, 2005

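Virtualization technology

  Virtualization technology is poised to become the next big thing in computing. Traditionally, one computer system (platform) corresponds to one physical computer (server/workstation/PC, possibly including the software on it).
Basically, virtualization means providing the computing environment (platform) virtually. For example, one server can provide n computer systems that are logically completely independent of one another, or several servers can be presented as one computer system (not the focus here).

  A few analogies:
    1. Multiprogramming virtualizes one computer system to support multitasking;
    2. One physical channel can be split into multiple channels by time division or frequency division;
    3. One person at a company works on several projects at once (or several people work on one project).

  I will try to write a series of blogs introducing the history, uses, and techniques of virtualization, as well as Solaris virtualization.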

( Nov 01 2005, 10:41:09 AM EST ) Permalink

20050602 Thursday June 02, 2005

Apache on AMD64 + Solaris 10

  I have learned that quite a few customers are very interested in running Apache on the AMD64 + Solaris 10 platform. That is great!
  I am doing an investigation. I am mainly concerned with robustness, performance, and compatibility. What are your concerns? Please tell me your ideas.
  Apache on AMD64 + Solaris 10: do you have any ideas? :) Let's discuss it together.

( Jun 02 2005, 11:57:10 PM EDT ) Permalink

20050523 Monday May 23, 2005

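To Solaris novices -- notes from experience on Solaris learning materials

 Without noticing it, I have been using Unix (Solaris/Linux) for n years. Looking back on how I learned over those years, I do have some takeaways. Today I will (boldly, and a little nervously) talk about materials for learning Solaris.

1. Most Solaris books and materials are garbage, especially those written by some authors in China. (Throw bricks at me and I will still say so.) They either copy from one another or ramble on with only half an understanding.

2. My view is that reading a good book several times beats reading many books.

3. To Solaris novices and intermediate programmers
  "Advanced Programming in the Unix Environment"    by Richard Stevens
   This classic from 10 years ago is somewhat dated, but it still delivers most of the essence of Unix programming. What is especially valuable is Mr. Stevens's outstanding ability to explain and analyze, which lets us grasp the key points and the subtleties easily. Sadly the master has passed away, so we will never see a second edition of APUE written by him.
   (I also strongly recommend all of Richard's other books; they are all Unix/networking classics.)

   Fortunately we still have "Solaris Systems Programming"      by Rich Teer
   Rich is an independent UNIX consultant and one of the five members of the OpenSolaris CAB. His best seller offers a more up-to-date and more Solaris-specific reference. Look at the author and the Unix heavyweights on the acknowledgements list: with Unix networking/security experts such as Casper Dik as reviewers, you can have full confidence in the book's authority.
   Rich Teer's homepage: http://www.rite-group.com/rich/
   These two books are the must-have general-purpose references for Solaris programming. Is there a Solaris C/C++ programming guru who does not own both? You must be joking. :)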

4. 你要GUI programming? Kernel programming? Networking programming?
   Multithreaded programming? Performance tuning for large scale middleware?
   ...

   These are all specialized topics. Hopefully the following helps:
 
   GUI programming:     Sorry, I have no idea about it
   Kernel programming: 
                       Are you serious?
                       Try to be an OS guru first?
   STREAMS programming:
                       "UNIX System V Network Programming" by Stephen Rago
                        Outdated but informative
   Device driver:                            
     The most useful material is available for download.
     http://developers.sun.com/prodtech/solaris/reference/docs/index.html

  I think the fastest way is to take a training course. Of course you can also teach yourself; there is plenty to learn. :)

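5. Compilers and Tools
  You need adequate tools:
  a. GNU offers a toolchain as one option. It is on the (free) Solaris Companion CD, though it is rather second-rate. You have surely had the experience of error/warning messages that leave you in a fog? Not to mention efficiency.
  b. The Solaris installation disc installs all the tools other than the compiler (as, ld, ...), but there is no compiler. Good things cost money, but it is worth it.
     You have two choices:
     * The compilers alone: C, C++, and Fortran 95 are all available
     * Sun's integrated development suite, called Sun Studio 10 (which replaced the earlier Sun WorkShop and Sun Forte),
       available for Solaris on SPARC, Solaris x86 on AMD64/IA32, Linux on IA32, and so on.
       It contains the compilers,
            the traditional IDE components: editor/debugger/project manager/...
            test/performance analysis: a bonus worth more than the price
  c. Also, you should learn to use two tools: DTrace and MDB
        These two are required learning for a Solaris guru. They are very useful. I won't go into them here.

6. Script programming/web programming/Java programming?
   I do not know. I only know C/C++. :(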

 

( May 23 2005, 05:16:27 AM EDT ) Permalink Comments [1]

