Introduction to the Linux kernel TCP/IP protocol stack

2011-01-28

A talk I gave to my colleagues at work today.

Introduction to the Linux kernel TCP/IP protocol stack

Read more

How nginx processes an HTTP request

2011-01-25

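This time we look at how nginx handles an HTTP request: it receives the request, parses it, finishes receiving, and then starts sending data. The question is how this whole sequence is driven. From the previous two posts we know that once nginx has finished initializing, it sleeps in epoll (or kqueue, and so on).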
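To make "sleeping in epoll" concrete, here is a minimal sketch of the wait-then-dispatch pattern on Linux. It is not nginx code: the helper names `wait_and_read` and `demo_event_loop` are hypothetical, and a pipe stands in for a client connection.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Block in epoll_wait() until the registered descriptor is readable,
 * then read from it. Returns bytes read, or -1 on timeout or error. */
ssize_t wait_and_read(int epfd, char *buf, size_t len, int timeout_ms)
{
    struct epoll_event ev;

    assert(buf != NULL);
    if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1 || !(ev.events & EPOLLIN)) {
        return -1;
    }
    return read(ev.data.fd, buf, len);
}

/* Register the read end of a pipe, make it readable, and let the
 * "event loop" pick the data up. Returns bytes delivered, or -1. */
ssize_t demo_event_loop(const char *msg, char *out, size_t outlen)
{
    int fds[2], epfd;
    ssize_t got = -1;
    size_t n = strlen(msg);
    struct epoll_event ev;

    if (pipe(fds) != 0) {
        return -1;
    }
    epfd = epoll_create1(0);
    if (epfd >= 0) {
        memset(&ev, 0, sizeof(ev));
        ev.events = EPOLLIN;          /* we only care about "readable" */
        ev.data.fd = fds[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0
            && write(fds[1], msg, n) == (ssize_t) n)
        {
            got = wait_and_read(epfd, out, outlen, 1000);
        }
        close(epfd);
    }
    close(fds[0]);
    close(fds[1]);
    return got;
}
```

A real server would call `epoll_wait` in a loop and hand each ready descriptor to a handler; the single wait here only shows where the process sleeps and what wakes it.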

Below is the flow chart of nginx's event handling.

Read more

Analysis of the nginx startup process (part 2)

2011-01-21

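Continuing from the previous post, this one looks mainly at the initialization of nginx's networking code.

First comes ngx_http_optimize_servers, which is called from ngx_http_block. Its main job is to create the listening structures and then initialize them. Here ngx_listening_t represents a listening handle together with its context.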

  
    static ngx_int_t
    ngx_http_optimize_servers(ngx_conf_t *cf, ngx_http_core_main_conf_t *cmcf,
        ngx_array_t *ports)
    {
        ngx_uint_t             p, a;
        ngx_http_conf_port_t  *port;
        ngx_http_conf_addr_t  *addr;

        if (ports == NULL) {
            return NGX_OK;
        }

        port = ports->elts;
        for (p = 0; p < ports->nelts; p++) {

            ......

            /* initialize the listening structure */
            if (ngx_http_init_listening(cf, &port[p]) != NGX_OK) {
                return NGX_ERROR;
            }
        }

        return NGX_OK;
    }
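As a rough illustration of what "a listening handle plus its context" can mean at the socket level, the sketch below keeps a descriptor together with its bound address and backlog and opens one listener per configured port. It is not nginx code: `demo_listening_t`, `demo_open_listening`, and `demo_open_all` are hypothetical names, and the loopback address and backlog value are assumptions chosen for the example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a listening handle and its context. */
typedef struct {
    int                 fd;       /* listening socket, -1 if not open */
    struct sockaddr_in  addr;     /* address actually bound */
    int                 backlog;  /* backlog passed to listen() */
} demo_listening_t;

/* Open one listener on 127.0.0.1:port (port 0 lets the kernel choose).
 * Returns 0 on success and -1 on failure. */
int demo_open_listening(demo_listening_t *ls, unsigned short port, int backlog)
{
    int on = 1;
    socklen_t len = sizeof(ls->addr);

    assert(ls != NULL);
    memset(ls, 0, sizeof(*ls));
    ls->backlog = backlog;
    ls->fd = socket(AF_INET, SOCK_STREAM, 0);
    if (ls->fd < 0) {
        return -1;
    }

    ls->addr.sin_family = AF_INET;
    ls->addr.sin_port = htons(port);
    ls->addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(ls->fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0
        || bind(ls->fd, (struct sockaddr *) &ls->addr, sizeof(ls->addr)) != 0
        || listen(ls->fd, backlog) != 0
        || getsockname(ls->fd, (struct sockaddr *) &ls->addr, &len) != 0)
    {
        close(ls->fd);
        ls->fd = -1;
        return -1;
    }
    return 0;
}

/* Same loop shape as the excerpt: one listener per configured port.
 * On failure, listeners opened so far are closed again. */
int demo_open_all(demo_listening_t *ls, const unsigned short *ports, size_t n)
{
    size_t p;

    for (p = 0; p < n; p++) {
        if (demo_open_listening(&ls[p], ports[p], 16) != 0) {
            while (p-- > 0) {
                close(ls[p].fd);
            }
            return -1;
        }
    }
    return 0;
}
```

Keeping the bound address next to the descriptor is the point of the structure: later code can report or compare listeners without asking the kernel again.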

Read more

Analysis of the nginx startup process (part 1)

2011-01-06

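This post mainly analyzes the initialization related to the configuration file. The next post will analyze the HTTP-related parts and the socket initialization in detail.

The most important part of nginx startup happens in ngx_init_cycle, so next we analyze this function and the functions related to it in detail.

Below is the flow chart of ngx_init_cycle.

Read more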

Goodbye, 2010

2010-12-31

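In the blink of an eye 2010 is over, and it is time once again to go through the books I read, the films I watched, and the songs I listened to this year.

Books I read in 2010. I am rather ashamed this year: I read too few books, and I will work harder next year. The two I thought were best this year are 天朝的崩溃 (The Collapse of the Heavenly Dynasty) and 罗马帝国衰亡史 (The History of the Decline and Fall of the Roman Empire). From now on I will give more of my time to reading.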

<tr>
  <td>
    <a title="HTTP" target="_blank" href="http://book.douban.com/subject/1440226/"><img border="0" src="http://img3.douban.com/spic/s4255750.jpg" /></a>
  </td>

  <td>
    <a title="Assembly Language Step-by-Step edition 3" target="_blank" href="http://book.douban.com/subject/3781682/"><img border="0" src="http://img3.douban.com/spic/s3825074.jpg" /></a>
  </td>

  <td>
    <a title="停滞的帝国:两个世界的撞击" target="_blank" href="http://book.douban.com/subject/2173201/"><img border="0" src="http://img5.douban.com/spic/s2900135.jpg" /></a>
  </td>

  <td>
    <a title="天朝的崩溃" target="_blank" href="http://book.douban.com/subject/1675478/"><img border="0" src="http://img3.douban.com/spic/s1681072.jpg" /></a>
  </td>

  <td>
    <a title="C 陷阱与缺陷" target="_blank" href="http://book.douban.com/subject/2778632/"><img border="0" src="http://img3.douban.com/spic/s2870233.jpg" /></a>
  </td>

  <td>
    <a title="深入理解LINUX内核(第三版 涵盖2.6版)" target="_blank" href="http://book.douban.com/subject/2287506/"><img border="0" src="http://img3.douban.com/spic/s2749490.jpg" /></a>
  </td>
</tr>

<tr>
  <td>
    <a title="独唱团（第一辑）" target="_blank" href="http://book.douban.com/subject/4886245/"><img border="0" src="http://img3.douban.com/spic/s4436571.jpg" /></a>
  </td>

  <td>
    <a title="罗马帝国衰亡史（第二卷）" target="_blank" href="http://book.douban.com/subject/3018351/"><img border="0" src="http://img3.douban.com/spic/s3118156.jpg" /></a>
  </td>

  <td>
    <a title="源泉" target="_blank" href="http://book.douban.com/subject/1431870/"><img border="0" src="http://img3.douban.com/spic/s1451416.jpg" /></a>
  </td>

  <td>
    <a title="实战Nginx：取代Apache的高性能Web服务器" target="_blank" href="http://book.douban.com/subject/4251875/"><img border="0" src="http://img3.douban.com/spic/s4149106.jpg" /></a>
  </td>

  <td>
    <a title="罗马帝国衰亡史（第一卷）" target="_blank" href="http://book.douban.com/subject/3018347/"><img border="0" src="http://img3.douban.com/spic/s3119972.jpg" /></a>
  </td>

  <td>
    <a title="软件随想录" target="_blank" href="http://book.douban.com/subject/4163938/"><img border="0" src="http://img3.douban.com/spic/s4073980.jpg" /></a>
  </td>
</tr>

<tr>
  <td>
    <a title="软件优化技术——IA-32平台的高性能手册（第2版）" target="_blank" href="http://book.douban.com/subject/2068755/"><img border="0" src="http://img3.douban.com/spic/s2522202.jpg" /></a>
  </td>

  <td>
    <a title="红色骑兵军" target="_blank" href="http://book.douban.com/subject/3902733/"><img border="0" src="http://img3.douban.com/spic/s3934830.jpg" /></a>
  </td>

  <td>
    <a title="自由的伦理" target="_blank" href="http://book.douban.com/subject/3237711/"><img border="0" src="http://img3.douban.com/spic/s3284559.jpg" /></a>
  </td>

  <td>
    <a title="计算机组成与设计：硬件/软件接口" target="_blank" href="http://book.douban.com/subject/2110638/"><img border="0" src="http://img3.douban.com/spic/s2506234.jpg" /></a>
  </td>

  <td>
    <a title="米格尔街" target="_blank" href="http://book.douban.com/subject/3902734/"><img border="0" src="http://img3.douban.com/spic/s3947258.jpg" /></a>
  </td>

  <td>
    <a title="TCP/IP Architecture, Design and Implementation in Linux" target="_blank" href="http://book.douban.com/subject/3397220/"><img border="0" src="http://img5.douban.com/spic/s3518315.jpg" /></a>
  </td>
</tr>

<tr>
  <td>
    <a title="中國近代史（下冊）" target="_blank" href="http://book.douban.com/subject/1476218/"><img border="0" src="http://img3.douban.com/spic/s1493866.jpg" /></a>
  </td>

  <td>
    <a title="中國近代史（上冊）" target="_blank" href="http://book.douban.com/subject/1476213/"><img border="0" src="http://img3.douban.com/spic/s1493864.jpg" /></a>
  </td>
</tr>

Read more

Implementation of the slab allocator in nginx

2010-12-20

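nginx's slab allocator is mainly used for allocating memory inside shared memory; the code lives in core/ngx_slab.c and core/ngx_slab.h. The slab handles allocations smaller than one page. Its overall idea matches the one described in Jeff Bonwick's slab paper, so it is worth reading that paper first. The advantages of slab allocation are also covered there, so I will not repeat them here.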
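One common way to serve sub-page requests is to round each request up to a power-of-two chunk size and carve pages into chunks of that size. The sketch below shows only that size-class computation. It is an illustration, not the nginx source: the 4096-byte page, the 8-byte minimum chunk, and the names `demo_slot_shift` and `demo_chunks_per_page` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12   /* assumed 4096-byte pages */
#define DEMO_MIN_SHIFT  3    /* assumed smallest chunk: 8 bytes */

/* Return the shift s such that (1 << s) is the smallest power-of-two chunk
 * that fits size, or 0 if the request is too large for a sub-page chunk. */
unsigned demo_slot_shift(size_t size)
{
    unsigned shift = DEMO_MIN_SHIFT;

    assert(size > 0);
    while (((size_t) 1 << shift) < size) {
        if (++shift >= DEMO_PAGE_SHIFT) {
            return 0;
        }
    }
    return shift;
}

/* Number of chunks of size (1 << shift) that fit in one page. */
size_t demo_chunks_per_page(unsigned shift)
{
    return (size_t) 1 << (DEMO_PAGE_SHIFT - shift);
}
```

With these assumptions a 100-byte request lands in the 128-byte class, and one page holds 32 such chunks.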

Below is the memory layout diagram of nginx's slab.

Read more

Linux kernel network stack: the XPS feature in detail

2010-12-12

XPS stands for Transmit Packet Steering. It is another patch submitted by Tom Herbert, the author of RFS/RPS, and is expected to enter the kernel in 2.6.37.

Read more

Linux kernel network stack: GRO (Generic Receive Offload)

2010-11-26

GRO (Generic Receive Offload) was merged into the kernel in 2.6.29. Its author is Herbert Xu. An introduction to GRO can be found here:

http://lwn.net/Articles/358910/

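First, what GRO does. GRO applies to the receive path and only to NAPI drivers, so supporting GRO requires not just kernel support: the driver must also call the corresponding interfaces. It is enabled with `ethtool -K <device> gro on`; if that command reports an error, the NIC driver itself does not support GRO.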

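GRO is similar to TSO, but TSO only covers transmission: a large segment from the TCP layer is cut into packets in the NIC and then sent to the peer. Without GRO, those small segments are delivered to the protocol stack one by one on the receiving side. With GRO, the receiver performs the reverse operation (relative to TSO): the packets that TSO cut apart are merged back into large packets before being passed to the protocol stack.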
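The merging idea can be shown with a toy model: consecutive packets of the same flow whose sequence numbers line up are folded into one larger packet. This is only a sketch of the concept, not the kernel's GRO logic; the `demo_pkt_t` fields and the `demo_coalesce` function are hypothetical and ignore headers, checksums, flags, and size limits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy packet descriptor; the fields are hypothetical, not sk_buff. */
typedef struct {
    int       flow;   /* identifies the connection */
    unsigned  seq;    /* sequence number of the first payload byte */
    unsigned  len;    /* payload length in bytes */
} demo_pkt_t;

/* Merge in place: a packet is appended to the previous output packet when it
 * belongs to the same flow and starts exactly where that packet ends.
 * Returns the number of packets left after merging. */
size_t demo_coalesce(demo_pkt_t *pkts, size_t n)
{
    size_t in, out = 0;

    assert(pkts != NULL || n == 0);
    for (in = 0; in < n; in++) {
        if (out > 0
            && pkts[out - 1].flow == pkts[in].flow
            && pkts[out - 1].seq + pkts[out - 1].len == pkts[in].seq)
        {
            pkts[out - 1].len += pkts[in].len;   /* grow the held packet */
        } else {
            pkts[out++] = pkts[in];              /* start a new packet */
        }
    }
    return out;
}
```

Three back-to-back 1448-byte segments of one flow come out as a single 4344-byte packet, so the layers above see one packet instead of three.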

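A driver that supports GRO handles data as follows: in the NAPI poll callback it reads the packets and then calls one of the GRO interfaces, napi_gro_receive or napi_gro_frags, to feed them into the protocol stack. The actual GRO work is done in these two functions, and both end up calling __napi_gro_receive. Below is napi_gro_receive, which ultimately calls napi_skb_finish and __napi_gro_receive.

Read more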

Linux kernel TCP congestion handling: the CUBIC algorithm

2010-11-19

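This continues the previous post (see the PDF of my protocol stack analysis). The kernel version I use here is 2.6.36.

This time we look at the implementation of the kernel's CUBIC congestion control algorithm. The Linux kernel implements many congestion control algorithms, but newer kernels (2.6.19 and later) default to CUBIC. To find out which algorithm the running kernel uses, check the value of /proc/sys/net/ipv4/tcp_congestion_control. Below is the congestion control algorithm on the latest Red Hat 6 (RHEL 5 still uses BIC):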

  
[root@rhel6 ~]# cat /proc/sys/net/ipv4/tcp_congestion_control
  
cubic
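The same check can be done from a program by reading that file. The sketch below is a minimal example; the helper names `demo_read_first_line` and `demo_current_congestion_control` are hypothetical, and it assumes the file holds the algorithm name on a single line, as in the output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of path into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int demo_read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp;

    assert(buf != NULL && len > 0);
    fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fgets(buf, (int) len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */
    return 0;
}

/* Convenience wrapper for the file named in the text. */
int demo_current_congestion_control(char *buf, size_t len)
{
    return demo_read_first_line("/proc/sys/net/ipv4/tcp_congestion_control",
                                buf, len);
}
```

On a system without that file the wrapper simply returns -1, so callers should treat the result as optional information.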

The paper describing this algorithm is here:

http://netsrv.csc.ncsu.edu/export/cubic_a_new_tcp_2008.pdf

Read more

How the Linux kernel guarantees the atomicity of append writes

2010-11-02

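First, the rough flow of the write system call. The kernel obtains the current file offset and then calls the VFS write operation. The VFS write in turn calls the write method of the underlying filesystem, which goes through that filesystem's aio_write method and eventually reaches the underlying driver. One thing to note is that the kernel takes a lock while writing a file (some devices are not locked, for example block devices and raw devices). This means that at any moment only one process can be writing to a given file. Also, block devices do not support append writes.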

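Given this, the atomicity of append is simple to achieve. Only one process can be writing to a file at a time, but for ordinary writes the file offset is computed outside the lock: the writer obtains pos first and only then enters the critical section, and in between another process may already have changed pos. An append always writes at the end of the file, and its pos is computed inside the lock, so append is naturally atomic.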
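The difference is visible from user space. The sketch below opens the same file through two descriptors, each with its own file offset, and writes four bytes through each. It is a single-process illustration with a hypothetical helper name, `demo_two_writers`; it shows where the data lands, not the kernel locking itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path twice with the given extra flag, write "aaaa" through the first
 * descriptor and "bbbb" through the second, and return the final file size,
 * or -1 on error. */
long demo_two_writers(const char *path, int extra_flags)
{
    struct stat st;
    long size = -1;
    int fd1, fd2;

    assert(path != NULL);
    fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    fd2 = open(path, O_WRONLY | extra_flags);
    if (fd1 >= 0 && fd2 >= 0
        && write(fd1, "aaaa", 4) == 4
        && write(fd2, "bbbb", 4) == 4   /* lands at offset 0 unless O_APPEND */
        && fstat(fd1, &st) == 0)
    {
        size = (long) st.st_size;
    }
    if (fd1 >= 0) {
        close(fd1);
    }
    if (fd2 >= 0) {
        close(fd2);
    }
    return size;
}
```

Called with `O_APPEND`, the second write goes to the current end of the file and the file ends up 8 bytes long. Called with 0, the second descriptor still has offset 0, so it overwrites the first four bytes and the file stays at 4 bytes.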

Read more

pagefault