Hash table

Posted on Sun 28 November 2010 in misc0 Comments

Hash table定义(摘自wikidpedia)\ In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number).\ hash table简单来说就是一种数据结构,可以通过一个key来查询对应的value值(一般情况下效率是很高的)。

记得以前大学时,老师曾出过这样的一道题目 ...

Continue reading

最大流与最小割

Posted on Fri 26 November 2010 in misc0 Comments

最大流与最小割定义(摘自wikipedia)\ In optimization theory, the max-flow min-cut theorem states that in a flow network, the maximum amount of flow passing from the source to the sink is equal to the minimum capacity which when removed in a specific way from the network causes the situation that no flow ...

Continue reading

二分图最大匹配

Posted on Tue 23 November 2010 in misc0 Comments

二分图定义(摘自wiki)\ In the mathematical field of graph theory, a bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; that is, U and V are independent sets ...

Continue reading

python解析html

Posted on Wed 10 November 2010 in misc0 Comments

python自带有一个html的解析库,但这个库的功能有限,而且对网页中异常情况的处理不好。\ 后来在网上找到一个叫[BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/)的网页解析库,这个库利用了正则表达式对网页进行处理,能比较完美地处理异常情况,还支持unicode。\ 除此之外还有lxml等python库。\ 下面是BeautifulSoup的一些例子,是从官网摘过来的。更多详细信息可以看[官方文档](http://www.crummy.com/software/BeautifulSoup/documentation.html),有[中文版](http://www.crummy.com/software/BeautifulSoup/documentation.zh.html)\

from BeautifulSoup import BeautifulSoup
import re
doc = ['<html><head><title>Page title</title></head>',
'<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
'<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
'</html>']
doc = ''.join(doc)
soup = BeautifulSoup(doc)
print soup.prettify()#这里相当于把网页标准化。如<b>some thing后忘记了</b>,prettify会补上。
#下面是打印的结果
# <html>
# <head>
# <title>
# Page title
# </title>
# </head>
# <body>
# <p id="firstpara" align="center">
# This is paragraph
# <b>
# one
# </b>
# .
# </p>
# <p id="secondpara" align="blah">
# This is paragraph
# <b>
# two
# </b>
# .
# </p>
# </body>
# </html>
soup.contents[0].name#可以使用dot的方式访问节点,十分方便。
# u'html'
soup.contents[0].contents[0].name
# u'head'
head = soup.contents[0].contents[0]
head.parent.name
# u'html'
head.next
# <title>Page title</title>
head.nextSibling.name
# u'body'
head.nextSibling.contents[0]
# <p id="firstpara" align="center">This is paragraph <b>one</b>.</p>
head.nextSibling.contents[0].nextSibling
# <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>
soup.findAll('p', align="center")#可以使用筛选器,有点像css selectors
# [<p id="firstpara" align="center">This is paragraph <b>one</b>. </p>]
soup.find('p', align="center")
# <p id="firstpara" align="center">This is paragraph <b>one</b>. </p>
soup('p', align="center")[0]['id']
# u'firstpara'
soup.find('p', align=re.compile('^b.*'))['id']
# u'secondpara'
soup.find('p').b.string
# u'one'
soup('p')[1].b.string
# u'two'

Continue reading

判断两个路径(文件或文件夹)所指的内容是否相同

Posted on Mon 25 October 2010 in misc0 Comments

思路:\ 1.判断文件是否都存在\ 2.判断类型是否相同(是否都是文件夹,是否都是文件)\ 3.如果是文件, 比较文件的二进制内容\ 4.如果是文件夹:\ 4.1 列出文件夹下的所有文件,并按文件名排序\ 4.2 依次递归比较所有文件

代码:\

#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
#include <windows.h>
#include <fstream>
#include <cstring>
#include <cassert>
using namespace std;
#define BUFFER_SIZE 4096
//判断是否为dir
bool is_dir(const char *file_name)
{
HANDLE hFile;
WIN32_FIND_DATA FileInformation;
hFile = ::FindFirstFile(file_name, &FileInformation);
if (hFile != INVALID_HANDLE_VALUE)
{
::FindClose(hFile);
if (FileInformation.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
return true;
}
}
return false;
}
//判断文件是否存在
bool file_exist(const char *file_path)
{
HANDLE hFile;
WIN32_FIND_DATA FileInformation;
hFile = ::FindFirstFile(file_path, &FileInformation);
if (hFile != INVALID_HANDLE_VALUE)
{
::FindClose(hFile);
return true;
}
return false;
}
//比较两个文件内容是否相同
//没有判断文件是否存在
bool file_compare(const char *src, const char *dest)
{
//判断内容是否相同
char src_buf[BUFFER_SIZE];
char dest_buf[BUFFER_SIZE];
fstream src_fs, dest_fs;
int len;
src_fs.open(src, ios_base::in);
dest_fs.open(dest, ios_base::in);
assert(src_fs.is_open());
assert(dest_fs.is_open());
while (true)
{
if (src_fs.eof() && dest_fs.eof())
{
src_fs.close();
dest_fs.close();
return true;
}
else if (src_fs.eof() || dest_fs.eof())
{
break;
}
src_fs.read(src_buf, BUFFER_SIZE);
dest_fs.read(dest_buf, BUFFER_SIZE);
if ((len = src_fs.gcount()) != dest_fs.gcount())
{
break;
}
if(memcmp(src_buf, dest_buf, len) != 0) break;
}
src_fs.close();
dest_fs.close();
return false;
}
//列出dir目录下的所有文件
//dir不能以\\结尾
void list_dir(const char *dir, vector<string> &fs)
{
string strDir = dir;
string strPattern;
HANDLE hFile;
WIN32_FIND_DATA FileInformation;
strPattern = strDir + "\\*.*";
hFile = ::FindFirstFile(strPattern.c_str(), &FileInformation);
if (hFile != INVALID_HANDLE_VALUE)
{
do
{
//去掉..和.目录
if (FileInformation.cFileName[0] != '.')
{
fs.push_back(FileInformation.cFileName);
}
} while (::FindNextFile(hFile, &FileInformation) == TRUE);
::FindClose(hFile);
}
}
//比较两个文件或两个文件夹是否相同
//src和dest不能以\\结尾
bool diff(const char *src, const char *dest)
{
string src_path = src;
string dest_path = dest;
//判断是否存在
if (file_exist(src_path.c_str()) == false)
{
#ifdef DEBUG
cout << src_path << " no exist" << endl;
#endif
return false;
}
if (file_exist(dest_path.c_str()) == false)
{
#ifdef DEBUG
cout << dest_path << " no exist" << endl;
#endif
return false;
}
bool dir = is_dir(src_path.c_str());
//是否同为dir或file
if (dir == is_dir(dest_path.c_str()))
{
//同为dir
if (dir)
{
#ifdef DEBUG
cout << "dir compare" << endl;
#endif
vector<string> src_files, dest_files;
//找到目录下所有文件,包括文件夹
list_dir(src_path.c_str(), src_files);
list_dir(dest_path.c_str(), dest_files);
//按照字符串排序
sort(src_files.begin(), src_files.end());
sort(dest_files.begin(), dest_files.end());
vector<string>::iterator src_it = src_files.begin();
vector<string>::iterator dest_it = dest_files.begin();
for (; src_it != src_files.end() && dest_it != dest_files.end(); src_it++, dest_it++)
{
//如果文件名不相同或文件内容不同,则返回false
#ifdef DEBUG
cout << *src_it << " " << *dest_it << endl;
#endif
if ((*src_it) != (*dest_it) || diff((src_path + "\\" + src_it->c_str()).c_str(), (dest_path + "\\" + dest_it->c_str()).c_str()) == false)
{
return false;
}
}
//所有内容都相同,返回true
if (src_it == src_files.end() && dest_it == dest_files.end())
{
return true;
}
return false;
}
else //同为file
{
#ifdef DEBUG
cout << "file compare" << endl;
#endif
return file_compare(src_path.c_str(), dest_path.c_str());
}
}
#ifdef DEBUG
else {
cout << src_path << " and " << dest_path << " not the same type!" << endl;
}
#endif
return false;
}
int main(int argc, char *argv[])
{
vector<string> fs;
list_dir(".", fs);
for (vector<string>::iterator it = fs.begin(); it != fs.end(); it++)
{
cout << *it << endl;
}
cout << file_compare("main.cpp", "main.cpp_cp") << endl;//1
cout << diff("main.cpp", "main.cpp_cp") << endl;//1
cout << diff("main.cpp", "not_main.cpp") << endl;//0
cout << is_dir(".") << endl;//1
cout << is_dir("..") << endl;//1
cout << is_dir("main.cpp") << endl;//1
cout << is_dir("nothing") << endl;//0
cout << file_exist(".") << endl;//1
cout << file_exist("nothing") << endl;//0
cout << diff("dir", "dir_cp") << endl;//1
cout << diff("dir", "not_dir") << endl;//0
getchar();
return 0;
}

Continue reading

python实现daemon程序

Posted on Thu 21 October 2010 in miscLeave a comment

为了实现python程序的daemon运行,找到了如下的代码。通过继承这个类,就能实现daemon程序。\ 除此之外还可以用screen,或nohup来实现。\ 代码出处:http://www.jejik.com/articles/2007/02/a_simple_unix_linux_daemon_in_python/

import sys, os, time, atexit
from signal import SIGTERM

class Daemon:
    """
   A generic daemon class.

 Usage: subclass the Daemon class and override the run() method
    """
   def __init__(self, pidfile, stdin='/dev/null', stdout='/dev/null ...
Continue reading

获取自增长id(mysql)

Posted on Wed 20 October 2010 in misc0 Comments

java

PreparedStatement ps =conn.prepareStatement("insert into table (col) values (?)", Statement.RETURN_GENERATED_KEYS);
ps.getGeneratedKeys();

sql

最后插入的id

select last_insert_id();

下一个id

SHOW TABLE STATUS LIKE "table_name";//auto_increment列
Continue reading

ttserver相关

Posted on Wed 20 October 2010 in misc0 Comments

ttserver在32位机器上支持超过4g的数据库,编译时加上如下的选项:\ --enbale-off64

Continue reading

当前目录和程序所在目录

Posted on Wed 20 October 2010 in miscLeave a comment

1.python\ 当前目录:

import os
os.getcwd()

程序所在目录:

import os
#os.path.split( os.path.realpath( sys.argv[0] ) )[0]
os.path.dirname(os.path.realpath(__file__))

2.vc/mfc\ 当前目录:

TCHAR path[MAX_PATH];
GetCurrentDirectory(path, MAX_PATH);

程序所在目录:

TCHAR path[MAX_PATH];
GetModuleFileName(NULL, path, MAX_PATH);

3.c/c++(linux)\ 当前目录 ...

Continue reading

确保FileOutputStream将内容写入硬盘

Posted on Thu 14 October 2010 in misc0 Comments

最近写了一个程序,需要在数据写入文件后立即读取该文件内容。这一简单的功能,却会时不时抛异常,令我十分不解。

原程序如下:\

FileOutputStream os = new FileOutputStream("some_file");\ os.write(data);\ os.close();

在查了jdk中关于flush()函数的介绍后,发现了一些问题。flush只保证数据已交付给了操作系统,无法确定已写入硬盘。

以下是jdk中关于flush()函数的介绍:\

Flushes this output stream and forces any buffered output bytes to be written out. The general contract of flush is that calling it is an indication that ...

Continue reading