(转)字符编码转换


有时候你从外部文件读进来的字符串是MBCS编码(如GB2312),而你程序里面都是统一用Unicode处理字符串,这时候要进行字符编码转换。 Windows为我们提供了很好用的API函数 MultiByteToWideChar 和 WideCharToMultiByte 帮我们轻松实现转换。 代码如下:

GB2312 转换成 Unicode:
CODE:

wchar_t* GB2312ToUnicode(const char* szGBString)
{
        UINT nCodePage = 936; //GB2312
        int nLength=MultiByteToWideChar(nCodePage,0,szGBString,-1,NULL,0);
        wchar_t* pBuffer = new wchar_t[nLength+1];
        MultiByteToWideChar(nCodePage,0,szGBString,-1,pBuffer,nLength);
        pBuffer[nLength]=0;
        return pBuffer;
}

BIG5 转换成 Unicode:
CODE:

wchar_t* BIG5ToUnicode(const char* szBIG5String)
{
        UINT nCodePage = 950; //BIG5
        int nLength=MultiByteToWideChar(nCodePage,0,szBIG5String,-1,NULL,0);
        wchar_t* pBuffer = new wchar_t[nLength+1];
        MultiByteToWideChar(nCodePage,0,szBIG5String,-1,pBuffer,nLength);
        pBuffer[nLength]=0;
        return pBuffer;
}

Unicode 转换成 GB2312:
CODE:

char* UnicodeToGB2312(const wchar_t* szUnicodeString)
{
        UINT nCodePage = 936; //GB2312
        int nLength=WideCharToMultiByte(nCodePage,0,szUnicodeString,-1,NULL,0,NULL,NULL);
        char* pBuffer=new char[nLength+1];
        WideCharToMultiByte(nCodePage,0,szUnicodeString,-1,pBuffer,nLength,NULL,NULL);
        pBuffer[nLength]=0;
        return pBuffer;
}

Unicode 转换成 BIG5:
CODE:

char* UnicodeToBIG5(const wchar_t* szUnicodeString)
{
        UINT nCodePage = 950; //BIG5
        int nLength=WideCharToMultiByte(nCodePage,0,szUnicodeString,-1,NULL,0,NULL,NULL);
        char* pBuffer=new char[nLength+1];
        WideCharToMultiByte(nCodePage,0,szUnicodeString,-1,pBuffer,nLength,NULL,NULL);
        pBuffer[nLength]=0;
        return pBuffer;
}

繁体和简体的相互转换

利用Unicode作为媒介,还可以做出很有意思的应用。在处理中文过程中,一个经常用到的功能就是繁体和简体的互相转换。 代码如下:

繁体中文BIG5 转换成 简体中文 GB2312
CODE:

char* BIG5ToGB2312(const char* szBIG5String)
{
        LCID lcid = MAKELCID(MAKELANGID(LANG_CHINESE,SUBLANG_CHINESE_SIMPLIFIED),SORT_CHINESE_PRC);

        wchar_t* szUnicodeBuff = BIG5ToUnicode(szBIG5String);
        char* szGB2312Buff = UnicodeToGB2312(szUnicodeBuff);

        int nLength = LCMapString(lcid,LCMAP_SIMPLIFIED_CHINESE, szGB2312Buff,-1,NULL,0);
        char* pBuffer = new char[nLength + 1];
        LCMapString(0x0804,LCMAP_SIMPLIFIED_CHINESE,szGB2312Buff,-1,pBuffer,nLength);
        pBuffer[nLength] = 0;
       
        delete[] szUnicodeBuff;
        delete[] szGB2312Buff;
        return pBuffer;
}

简体中文 GB2312 转换成 繁体中文BIG5
CODE:

char* GB2312ToBIG5(const char* szGBString)
{
        LCID lcid = MAKELCID(MAKELANGID(LANG_CHINESE,SUBLANG_CHINESE_SIMPLIFIED),SORT_CHINESE_PRC);

        int nLength = LCMapString(lcid,LCMAP_TRADITIONAL_CHINESE,szGBString,-1,NULL,0);
        char* pBuffer=new char[nLength+1];
        LCMapString(lcid,LCMAP_TRADITIONAL_CHINESE,szGBString,-1,pBuffer,nLength);
        pBuffer[nLength]=0;

        wchar_t* pUnicodeBuff = GB2312ToUnicode(pBuffer);
        char* pBIG5Buff = UnicodeToBIG5(pUnicodeBuff);

        delete[] pBuffer;
        delete[] pUnicodeBuff;
        return pBIG5Buff;
}

文章作者: 2356
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 2356 !