I am analyzing a scientific text format like:
keyword { 1.0 22.2 59.6 'cm' 'yes' }
I am new to Spirit, and after studying the documentation, I can use Spirit to parse fixed-format keywords.
But I don't know how to build a grammar for the following format. In the scientific keywords I've met, some items of data can be defaulted to a built-in default value. The keyword description indicates when defaults can be applied. There are two ways of setting quantities to their default values. Firstly, by ending the data record prematurely with the closing '}', any quantities remaining unspecified will be set to their default values. Secondly, selected quantities positioned before the '}' can be defaulted by entering n*, where n is the number of consecutive quantities to be defaulted. For example, 3* causes the next 3 quantities in the keyword data to be given their default values.
For example,
person { 'tom' 188 80 'male' 32 }
Say 'male' and 32 are the default values, so the equivalent forms are:
person { 'tom' 188 80 2* }
or
person { 'tom' 188 80 'male' 1* }
or
person { 'tom' 188 80 }
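The two defaulting rules described above can be sketched in plain C++, without Spirit, to pin down the expected result before writing a grammar. This is only an illustration of the semantics: the helper names `repeat_count` and `expand_defaults` are hypothetical, the input is assumed to be already split into tokens, and marking a field without a default by an empty string is an assumption of this sketch, not part of the format.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns n if `tok` has the form "<n>*" (e.g. "3*"), otherwise 0.
static std::size_t repeat_count(const std::string& tok) {
    if (tok.size() < 2 || tok.back() != '*') return 0;
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < tok.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return 0;
        n = n * 10 + static_cast<std::size_t>(tok[i] - '0');
    }
    return n;
}

// Expands the tokens found between '{' and '}' into one value per field.
// defaults[i] is the default for field i; an empty string marks a field
// that has no default (assumption of this sketch).
std::vector<std::string> expand_defaults(const std::vector<std::string>& tokens,
                                         const std::vector<std::string>& defaults) {
    std::vector<std::string> out;
    // Appends the default of the next unfilled field.
    auto take_default = [&]() {
        const std::size_t i = out.size();
        if (i >= defaults.size()) throw std::invalid_argument("too many items");
        if (defaults[i].empty()) throw std::invalid_argument("field has no default");
        out.push_back(defaults[i]);
    };
    for (const std::string& tok : tokens) {
        if (std::size_t n = repeat_count(tok)) {
            // Rule 2: "n*" defaults the next n consecutive fields.
            for (std::size_t k = 0; k < n; ++k) take_default();
        } else {
            if (out.size() >= defaults.size()) throw std::invalid_argument("too many items");
            out.push_back(tok);
        }
    }
    // Rule 1: a record that ends early defaults every remaining field.
    while (out.size() < defaults.size()) take_default();
    return out;
}
```

With the defaults 180, 80, 'male' and 32 for the four fields after the name, the tokens `'tom' 188 3*` expand to `'tom' 188 80 'male' 32`, and so do `'tom' 188 80 2*` and `'tom' 188 80`.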
I've searched past posts, and this one gave me an idea, but how can I write a rule for n*?
The parser you're asking about is complex, because it has to solve several tasks:
- handle missing elements at the end
- handle the "2*" syntax as a replacement for missing elements at the end
- properly parse all valid inputs and fill the given data structure with the matched values
The trick here is to utilize qi::attr in different ways.
To supply default values for missing elements:
    qi::int_ | qi::attr(180)
i.e. either match an integer or use the default value 180.
To supply all remaining values for the "2*" syntax (as @vines suggested):
    "2*" >> qi::attr(attr1)
i.e. if 2* is matched, use the default value attr1 (which is a fusion::vector holding the last two defaults).
Overall, I came up with this solution, which seems to parse and return the default values just fine (even if it looks complex):
#include <string>
#include <iostream>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/io.hpp>   // operator<< for fusion sequences

int main()
{
    namespace qi = boost::spirit::qi;
    namespace fusion = boost::fusion;

    // The attribute passed to the parser has to match (in structure) the
    // parser, which requires creating nested fusion::vector's.
    typedef fusion::vector<std::string, int> attribute1_type;
    typedef fusion::vector<int, attribute1_type> attribute2_type;
    typedef fusion::vector<int, attribute2_type> attribute3_type;

    // overall attribute type
    typedef fusion::vector<std::string, attribute3_type> attribute_type;

    // initialize the attributes with the default values
    attribute1_type attr1("male", 32);
    attribute2_type attr2(80, attr1);
    attribute3_type attr3(180, attr2);

    qi::rule<std::string::iterator, std::string()> quoted_string =
        "'" >> *~qi::char_("'") >> "'";

    // Each nesting level either matches "n*" and takes all remaining
    // defaults at once, or matches one value (or its default) and recurses.
    qi::rule<std::string::iterator, attribute_type(), qi::space_type> data =
        qi::lit("person") >> "{"
            >>  quoted_string
            >> -(   ("4*" >> qi::attr(attr3))
                |   (qi::int_ | qi::attr(180))
                    >> -(   ("3*" >> qi::attr(attr2))
                        |   (qi::int_ | qi::attr(80))
                            >> -(   ("2*" >> qi::attr(attr1))
                                |   (quoted_string | qi::attr("male"))
                                    // "1*" also has to yield the default,
                                    // otherwise the age is left unset
                                    >> -(   ("1*" >> qi::attr(32))
                                        |   qi::int_
                                        |   qi::attr(32)
                                        )
                                )
                        )
                )
            >> "}";

    std::string in1 = "person\n{ 'tom' 188 80 'male' 32 }";
    attribute_type fullattr1;
    if (qi::phrase_parse(in1.begin(), in1.end(), data, qi::space, fullattr1))
        std::cout << fullattr1 << std::endl;

    std::string in2 = "person\n{ 'tom' 188 80 'male' }";
    attribute_type fullattr2;
    if (qi::phrase_parse(in2.begin(), in2.end(), data, qi::space, fullattr2))
        std::cout << fullattr2 << std::endl;

    std::string in3 = "person\n{ 'tom' 188 3* }";
    attribute_type fullattr3;
    if (qi::phrase_parse(in3.begin(), in3.end(), data, qi::space, fullattr3))
        std::cout << fullattr3 << std::endl;

    return 0;
}
Splitting the rule into separate rules (as @vines suggests) would require the input to be parsed more than once, which is why I used this nested structure of sequences and alternatives.