protobuf的动态编译      
allen2660 +
    

protobuf的动态编译

动态编译

使用场景

通常情况下,使用Protobuf的人会首先写好.proto文件,再使用protoc编译器编译成想要的目标语言文件。然后将这些自动生成的代码和应用程序一起编译。

在某些场景下,人们无法或者说不想预先知道.proto文件(比如Hive的Protobuf Serde)。这就需要动态编译.proto文件。

C++

通过前面分析protobuf源码知道,Protobuf提供了compiler包来编译.proto文件,protoc就是这么用的。

创建Message

const Descriptor * GetDescriptorFromProtoByTypeName (const string & input_file, const string &type_name) {
    DiskSourceTree *source_tree = new DiskSourceTree;
    ErrorPrinter *error_collector = new ErrorPrinter(ERROR_FORMAT_GCC, source_tree);
    Importer *importer = new Importer(source_tree, error_collector);    

    source_tree->MapPath("","./");  

    //下面这两行,至少需要执行一行,不然DescriptorPool找不到Message。DescriptorPool会首先去Tables的symbols_找,找不到会去database中找Symbol
    //这个和SourceTreeDescriptorDatabase::FindFileContainingSymbol 直接返回false有关系。
    //所以需要先FindFileByName或者Import将.proto文件 load到内存中
    //const FileDescriptor *fd = importer.Import(input_file);
    const FileDescriptor *fd = importer->pool()->FindFileByName(input_file);    

    const Descriptor *desc = importer->pool()->FindMessageTypeByName(type_name);
    cout << desc->DebugString();
    return desc;
}

Message *createMessage(const Descriptor *descriptor) {
    Message *message = NULL;
    //再一次踩坑,这个factory中也有一个scoped_ptr
    //一开始是使用的栈上的,总是core
    DynamicMessageFactory *factory = new DynamicMessageFactory;
    if (descriptor) {
        // 创建default message(prototype)
        const Message *prototype = factory->GetPrototype(descriptor);
        if (NULL != prototype) {
            // 创建一个可修改的message
            message = prototype->New();
        }   
    }       
    return message;
}

使用反射设置Message

/**
* @brief generateMessage 
 * 实际类型是tutorial.AddressBook, 但是我们没有类型的定义(动态Parse的,不是静态生成)
 * 这里使用反射接口来设置Message的Field,最后打出Message
 *
 * @param descriptor
 */
void generateMessage(const Descriptor *descriptor, ostream *output) {
    Message *message = createMessage(descriptor);
    if(message == NULL) {
        cout<< "message is NULL" << endl;
    }
    const Reflection *reflection = NULL;
    reflection = message->GetReflection();  

    const FieldDescriptor * field = NULL;
    field = descriptor->FindFieldByName("name");
    reflection->SetString(message, field, "allen"); 

    const FieldDescriptor * field_id = NULL;
    field_id = descriptor->FindFieldByName("id");
    reflection->SetInt32(message, field_id, 12345); 

    cout << message->DebugString();
    message->SerializeToOstream(output);    

    //delete message;   

}

使用反射解析Message

当用动态接口得到FileDescriptor,以及Descriptor后,要想解析一个二进制pbMsg,可以这么做:

void parseMessage(const Descriptor *descriptor, istream *in_stream) {
    Message *message = createMessage(descriptor);   

    message->ParseFromIstream(in_stream);
    const Reflection *reflection = message->GetReflection();    

    const FieldDescriptor * field = NULL;
    field = descriptor->FindFieldByName("name");
    string name = reflection->GetString(*message, field);   

    const FieldDescriptor * field_id = NULL;
    field_id = descriptor->FindFieldByName("id");
    int id = reflection->GetInt32(*message, field_id);  

    cout << message->DebugString(); 

    delete message; 

}

Java

Java目前没有找到直接编译.proto文件的方法,这篇文章中讲解可以使用desc文件来build,但是生成desc文件还是要使用protoc二进制。这个待研究。

Demo

上面的完整代码如下:

/**
* @file dynamic_demo.cc
 * @brief 
 * @author liwei
 * @version 
 * @date 2014-07-14
 */ 


#include <iostream>
#include <fstream>
#include <string>
#include <google/protobuf/compiler/importer.h>
#include <google/protobuf/compiler/command_line_interface.h>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/dynamic_message.h>    

using namespace std;
using namespace google::protobuf::compiler;
using namespace google::protobuf::io;
using namespace google::protobuf;   


enum ErrorFormat {
    ERROR_FORMAT_GCC,   // GCC error output format (default).
    ERROR_FORMAT_MSVS   // Visual Studio output (--error_format=msvs).
  };    

class ErrorPrinter : public MultiFileErrorCollector,
                                           public ErrorCollector {
 public:
  ErrorPrinter(ErrorFormat format, DiskSourceTree *tree = NULL)
    : format_(format), tree_(tree) {}
  ~ErrorPrinter() {}    

  // implements MultiFileErrorCollector ------------------------------
  void AddError(const string& filename, int line, int column,
                const string& message) {    

    // Print full path when running under MSVS
    string dfile;
    if (format_ == ERROR_FORMAT_MSVS &&
        tree_ != NULL &&
        tree_->VirtualFileToDiskFile(filename, &dfile)) {
      cerr << dfile;
    } else {
      cerr << filename;
    }   

    // Users typically expect 1-based line/column numbers, so we add 1
    // to each here.
    if (line != -1) {
      // Allow for both GCC- and Visual-Studio-compatible output.
      switch (format_) {
        case ERROR_FORMAT_GCC:
          cerr << ":" << (line + 1) << ":" << (column + 1);
          break;
        case ERROR_FORMAT_MSVS:
          cerr << "(" << (line + 1) << ") : error in column=" << (column + 1);
          break;
      }
    }   

    cerr << ": " << message << endl;
  } 

  // implements io::ErrorCollector -----------------------------------
  void AddError(int line, int column, const string& message) {
    AddError("input", line, column, message);
  } 

 private:
  const ErrorFormat format_;
  DiskSourceTree *tree_;    

};  

const Descriptor * GetDescriptorFromProtoByTypeName (const string & input_file, const string &type_name) {
    DiskSourceTree *source_tree = new DiskSourceTree;
    ErrorPrinter *error_collector = new ErrorPrinter(ERROR_FORMAT_GCC, source_tree);
    Importer *importer = new Importer(source_tree, error_collector);    

    source_tree->MapPath("","./");  

    //下面这两行,至少需要执行一行,不然DescriptorPool找不到Message。DescriptorPool会首先去Tables的symbols_找,找不到会去database中找Symbol
    //这个和SourceTreeDescriptorDatabase::FindFileContainingSymbol 直接返回false有关系。
    //所以需要先FindFileByName或者Import将.proto文件 load到内存中
    //const FileDescriptor *fd = importer.Import(input_file);
    const FileDescriptor *fd = importer->pool()->FindFileByName(input_file);    

    const Descriptor *desc = importer->pool()->FindMessageTypeByName(type_name); 
    cout << desc->DebugString();
    return desc;
}   


Message *createMessage(const Descriptor *descriptor) {
    Message *message = NULL;
    //再一次踩坑,这个factory中也有一个scoped_ptr
    //一开始是使用的栈上的,总是core
    DynamicMessageFactory *factory = new DynamicMessageFactory;
    if (descriptor) {
        // 创建default message(prototype)
        const Message *prototype = factory->GetPrototype(descriptor);
        if (NULL != prototype) {
            // 创建一个可修改的message
            message = prototype->New();
        }
    }   

    return message;
}   

/**
 * @brief generateMessage 
 * 实际类型是tutorial.AddressBook, 但是我们没有类型的定义(动态Parse的,不是静态生成)
 * 这里使用反射接口来设置Message的Field,最后打出Message
 *
 * @param descriptor
 */
void generateMessage(const Descriptor *descriptor, ostream *output) {
    Message *message = createMessage(descriptor);
    if(message == NULL) {
        cout<< "message is NULL" << endl;
    }
    const Reflection *reflection = NULL;
    reflection = message->GetReflection();  

    const FieldDescriptor * field = NULL;
    field = descriptor->FindFieldByName("name");
    reflection->SetString(message, field, "allen"); 

    const FieldDescriptor * field_id = NULL;
    field_id = descriptor->FindFieldByName("id");
    reflection->SetInt32(message, field_id, 12345); 

    cout << message->DebugString();
    message->SerializeToOstream(output);    

    //delete message;   

}   

void parseMessage(const Descriptor *descriptor, istream *in_stream) {
    Message *message = createMessage(descriptor);   

    message->ParseFromIstream(in_stream);
    const Reflection *reflection = message->GetReflection();    

    const FieldDescriptor * field = NULL;
    field = descriptor->FindFieldByName("name");
    string name = reflection->GetString(*message, field);   

    const FieldDescriptor * field_id = NULL;
    field_id = descriptor->FindFieldByName("id");
    int id = reflection->GetInt32(*message, field_id);  

    cout << message->DebugString(); 

    delete message; 



}   

int main(int argc, char * args[]) {
    //buildFromProtoByImporter("addressbook.proto");
    //buildFromProto("tutorial.AddressBook");

    const Descriptor *descriptor = GetDescriptorFromProtoByTypeName("addressbook.proto", "tutorial.Person");
    cout << "-----------Generate use reflection" <<endl;
    ofstream output;
    output.open("file.pb");
    generateMessage(descriptor, &output);
    cout << "-----------Parseing use reflection" <<endl;
    output.flush();
    output.close();
    ifstream input;
    input.open("file.pb");
    //cerr<< input <<endl;
    parseMessage(descriptor, &input);   

}

反射

这一段主要包含下面几个类:

Message

Message位于 google/protobuf/message.h。Message继承自MessageLite类。MessageLite已经可以处理protobuf的绝大部分日常使用了,Message在其基础上加上了descriptor和reflection的功能,也就是动态类型特征。可以使用在.proto文件中加上选项使compiler生成的代码是Lite的:

option optimize_for = LITE_RUNTIME;

Message的主要方法:

struct Metadata {
    const Descriptor* descriptor;
    const Reflection* reflection;
};

//返回一个新的实例,内存将由调用者管理
virtual Message* New() const = 0;

//得到Reflection指针
virtual const Reflection* GetReflection() const {
    return GetMetadata().reflection;
}

//一般静态生成的代码中都会有这个方法的实现,将Descriptor和Reflection通过protobuf_AssignDescriptorsOnce();方法来生成xxx_descriptor_和xxx_reflection_,并将其赋给metadata,返回
virtual Metadata GetMetadata() const  = 0;

在静态代码中生成Reflection:(以addressbook.proto生成的代码举例)

namespace {
const ::google::protobuf::Descriptor* Person_descriptor_ = NULL;
const ::google::protobuf::internal::GeneratedMessageReflection*
  Person_reflection_ = NULL;
const ::google::protobuf::Descriptor* Person_PhoneNumber_descriptor_ = NULL;
const ::google::protobuf::internal::GeneratedMessageReflection*
  Person_PhoneNumber_reflection_ = NULL;
const ::google::protobuf::EnumDescriptor* Person_PhoneType_descriptor_ = NULL;
const ::google::protobuf::Descriptor* AddressBook_descriptor_ = NULL;
const ::google::protobuf::internal::GeneratedMessageReflection*
  AddressBook_reflection_ = NULL;       
}  // namespace

void protobuf_AssignDesc_addressbook_2eproto() {
  // 这个方法用于注册静态信息
  protobuf_AddDesc_addressbook_2eproto();
  const ::google::protobuf::FileDescriptor* file =
    ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName(
      "addressbook.proto");
  GOOGLE_CHECK(file != NULL);
  Person_descriptor_ = file->message_type(0);
  static const int Person_offsets_[4] = {
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, name_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, id_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, email_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, phone_),
  };
  Person_reflection_ =
    new ::google::protobuf::internal::GeneratedMessageReflection(
      Person_descriptor_,
      Person::default_instance_,
      Person_offsets_,
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, _has_bits_[0]),
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person, _unknown_fields_),
      -1,  
      ::google::protobuf::DescriptorPool::generated_pool(),
      ::google::protobuf::MessageFactory::generated_factory(),
      sizeof(Person));
  Person_PhoneNumber_descriptor_ = Person_descriptor_->nested_type(0);
  static const int Person_PhoneNumber_offsets_[2] = {
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person_PhoneNumber, number_),
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person_PhoneNumber, type_),
  };
  Person_PhoneNumber_reflection_ =
    new ::google::protobuf::internal::GeneratedMessageReflection(
      Person_PhoneNumber_descriptor_,
      Person_PhoneNumber::default_instance_,
      Person_PhoneNumber_offsets_,
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person_PhoneNumber, _has_bits_[0]),
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(Person_PhoneNumber, _unknown_fields_),
      -1,
      ::google::protobuf::DescriptorPool::generated_pool(),
      ::google::protobuf::MessageFactory::generated_factory(),
      sizeof(Person_PhoneNumber));
  Person_PhoneType_descriptor_ = Person_descriptor_->enum_type(0);
  AddressBook_descriptor_ = file->message_type(1);
  static const int AddressBook_offsets_[1] = {
    GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(AddressBook, person_),
  };
  AddressBook_reflection_ =
    new ::google::protobuf::internal::GeneratedMessageReflection(
      AddressBook_descriptor_,
      AddressBook::default_instance_,
      AddressBook_offsets_,
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(AddressBook, _has_bits_[0]),
      GOOGLE_PROTOBUF_GENERATED_MESSAGE_FIELD_OFFSET(AddressBook, _unknown_fields_),
      -1,
      ::google::protobuf::DescriptorPool::generated_pool(),
      ::google::protobuf::MessageFactory::generated_factory(),
      sizeof(AddressBook));
}   

namespace { 

// 这里让protobuf_AssignDesc_addressbook_2proto方法有且被执行一遍
GOOGLE_PROTOBUF_DECLARE_ONCE(protobuf_AssignDescriptors_once_);
inline void protobuf_AssignDescriptorsOnce() {
  ::google::protobuf::GoogleOnceInit(&protobuf_AssignDescriptors_once_,
                 &protobuf_AssignDesc_addressbook_2eproto);
}
void protobuf_RegisterTypes(const ::std::string&) {
  protobuf_AssignDescriptorsOnce();
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedMessage(
    Person_descriptor_, &Person::default_instance());
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedMessage(
    Person_PhoneNumber_descriptor_, &Person_PhoneNumber::default_instance());
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedMessage(
    AddressBook_descriptor_, &AddressBook::default_instance());
}   

}  // namespace

void protobuf_ShutdownFile_addressbook_2eproto() {
  delete Person::default_instance_;
  delete Person_reflection_;
  delete Person_PhoneNumber::default_instance_;
  delete Person_PhoneNumber_reflection_;
  delete AddressBook::default_instance_;
  delete AddressBook_reflection_;
}   

void protobuf_AddDesc_addressbook_2eproto() {
  static bool already_here = false;
  if (already_here) return;
  already_here = true;
  GOOGLE_PROTOBUF_VERIFY_VERSION;   

  ::google::protobuf::DescriptorPool::InternalAddGeneratedFile(
    "\n\021addressbook.proto\022\010tutorial\"\332\001\n\006Person"
    "\022\014\n\004name\030\001 \002(\t\022\n\n\002id\030\002 \002(\005\022\r\n\005email\030\003 \001("
    "\t\022+\n\005phone\030\004 \003(\0132\034.tutorial.Person.Phone"
    "Number\032M\n\013PhoneNumber\022\016\n\006number\030\001 \002(\t\022.\n"
    "\004type\030\002 \001(\0162\032.tutorial.Person.PhoneType:"
    "\004HOME\"+\n\tPhoneType\022\n\n\006MOBILE\020\000\022\010\n\004HOME\020\001"
    "\022\010\n\004WORK\020\002\"/\n\013AddressBook\022 \n\006person\030\001 \003("
    "\0132\020.tutorial.PersonB)\n\024com.example.tutor"
    "ialB\021AddressBookProtos", 342);
  ::google::protobuf::MessageFactory::InternalRegisterGeneratedFile(
    "addressbook.proto", &protobuf_RegisterTypes);
  Person::default_instance_ = new Person();
  Person_PhoneNumber::default_instance_ = new Person_PhoneNumber();
  AddressBook::default_instance_ = new AddressBook();
  Person::default_instance_->InitAsDefaultInstance();
  Person_PhoneNumber::default_instance_->InitAsDefaultInstance();
  AddressBook::default_instance_->InitAsDefaultInstance();
  ::google::protobuf::internal::OnShutdown(&protobuf_ShutdownFile_addressbook_2eproto);
}

在动态代码中生成Reflection:(这个就拿上面的动态信息Demo举例)

//生成代码
DynamicMessageFactory *factory = new DynamicMessageFactory;
const Message *prototype = factory->GetPrototype(descriptor);
const Reflection *reflection = message->GetReflection();

//下面就要看看DynamicMessageFactory->GetPrototype(descriptor),这个方法会返回一个Message对象。dynamic_message.cc line 471

const Message* DynamicMessageFactory::GetPrototypeNoLock( const Descriptor* type) {

    //1. 看看是不是已经编译进来的类型:
    if (delegate_to_generated_factory_ &&
        type->file()->pool() == DescriptorPool::generated_pool()) {
        return MessageFactory::generated_factory()->GetPrototype(type);
    }

    //2. 看看map_中有没有

    //3. 构建DynamicMessage::TypeInfo
    // 需要三个方面的数据来构造GeneratedMessageReflection: memory, offsets of each field, bitfield


    //4. 构建GeneratedMessageReflection对象

    //5. 构建DynamicMessage prototype

    //6. prototype->CrossLinkPrototypes(); 这个方法主要功能是将Message中的依然为Messge的属性也调用GetProtoType

    //7. 返回prototype
}

Descriptor

descriptor.h

DynamicMessage

dynamic_message.cc

从任意的Descriptor构造Message,DynamicMessage提供了这个功能。该类中使用了placement new

重要函数:

//构造函数 DynamicMessage::DynamicMessage(const TypeInfo type_info){ 分配type_info中的offsets[i]所代表的FileDescriptor 所指向的内存。 }

DynamicMessage::~DynamicMessage() {}

DynamicMessageFactory

google/protobuf/dynamic_message.h

主要方法是MessageFactory的主要方法,上面讲Message的时候已经描述了:

const Message GetPrototype(const Descriptor type);

Reflection

google/protobuf/message.h

Reflection接口包含用于动态地访问和修改protocol消息的方法。大部分的Message的GetReflection()方法,都用的GeneratedMessageReflection实现(generated_message_reflection.h)。由于是反射接口,肯定不如生成的类那么安全。比如说:

主要接口有Get/Set/Add,具体可以参考message.h。

GeneratedMessageReflection

generated_message_reflection.h

#define DEFINE_PRIMITIVE_ACCESSORS(TYPENAME, TYPE, PASSTYPE, CPPTYPE)

MessageFactory

//单例模式,所有的compiled-in的Message,都会往MessageFactory中注册自己。 static MessageFactory* generated_factory();

//下面两个函数用于在自动化生成的类中,往MessageFactory中注册 static void InternalRegisterGeneratedFile( const char filename, void (register_messages)(const string&)); static void InternalRegisterGeneratedMessage(const Descriptor descriptor, const Message prototype);

点击查看评论