MS>> Comment extraction is not as easy as one might first think.
AR> ...
WM> Some pseudocode:
WM> Read source file until '/' or EOF
WM> If '/', check next character:
WM> If '/', read comment (C++) to EOL (optional)
WM> If '*', read comment (C) to '*/'
WM> Loop
WM> What have I missed?
AR> String constants. E.g.:
AR> char foo[] = " /* comment ??? ";
AR> char bar[] = " end comment */ ";
AR> After that is handled: char baz = '"';
If one can get all the states of a C-parser into one's head, eliminate 3/4 of
them due to being irrelevant, and put them down into a state-graph, it
quickly becomes simple. Tedious, but simple.
I recently went through the task of finding links in web pages this way. C
is a little (anyone sense a bit of understatement here?) more complex, but
not that much more so when you're limited to finding only certain things,
such as comments.
Your baz example, for instance, would be handled simply by knowing when
you've entered, and when you've left, a single-quote "string" (I can't recall
the term - it can't be "character" because you can have more than one
character in it). Same for string constants - in and out of double quotes.
Then you have to ignore the character after a backslash. Finally, you
disregard ALL of the above when you're in a comment.
read next byte until EOF
  if inComment
    if is end of comment
      inComment := false
      append a cr to output
    else
      add byte to output
    endif
  else if backslash
    skip next byte
  else if inSingleQuote
    if is single quote
      inSingleQuote := false
    endif
  else if inDoubleQuote
    if is double quote
      inDoubleQuote := false
    endif
  else if is single quote
    inSingleQuote := true
  else if is double quote
    inDoubleQuote := true
  else if start of comment
    inComment := true
  endif
loop
There is some obvious expanding to do in the "start of comment" and "end of
comment", but the basics are there. This can quickly grow, however, to
difficult-to-manage code where you end up being better off using yacc/bison
or some similar tool. I'm gonna have to learn one of these so I can improve
my HTML parser... :-) However, I won't just be able to pick one up off the
shelf - I need it to produce Java code. :-/
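For what it's worth, here is one way the "start of comment" and "end of comment" expanding might look in C. Treat it as a rough sketch, not a finished tool. The function name extract_comments, the state names, and the string-in/buffer-out interface are all made up for illustration. It differs from the pseudocode in one respect: it only honours a backslash inside quotes. It also assumes no trigraphs, no backslash-newline splicing inside // comments, and no C++11 raw strings.

```c
#include <stddef.h>

/* Scanner states; the names are illustrative only. */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, SQUOTE, DQUOTE };

/*
 * Copy the text of every comment in src into out, writing a newline
 * after each comment. out is always NUL-terminated (if outsz > 0) and
 * silently truncated when too small. Returns the bytes written, not
 * counting the terminator.
 */
size_t extract_comments(const char *src, char *out, size_t outsz)
{
    enum state st = CODE;
    size_t n = 0;

    if (outsz == 0)
        return 0;

#define PUT(ch) do { if (n + 1 < outsz) out[n++] = (ch); } while (0)

    for (; *src; src++) {
        char c = *src;
        switch (st) {
        case CODE:
            if (c == '/') st = SLASH;
            else if (c == '\'') st = SQUOTE;
            else if (c == '"') st = DQUOTE;
            break;
        case SLASH:                       /* seen one '/', not yet a comment */
            if (c == '/') st = LINE_COMMENT;
            else if (c == '*') st = BLOCK_COMMENT;
            else if (c == '\'') st = SQUOTE;
            else if (c == '"') st = DQUOTE;
            else st = CODE;               /* it was just a division sign */
            break;
        case LINE_COMMENT:
            if (c == '\n') { PUT('\n'); st = CODE; }
            else PUT(c);
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR; /* hold the '*' until we know more */
            else PUT(c);
            break;
        case BLOCK_STAR:
            if (c == '/') { PUT('\n'); st = CODE; }
            else if (c == '*') PUT('*');   /* earlier '*' was text; keep waiting */
            else { PUT('*'); PUT(c); st = BLOCK_COMMENT; }
            break;
        case SQUOTE:
        case DQUOTE:
            if (c == '\\') {
                if (src[1]) src++;         /* ignore the byte after a backslash */
            } else if (c == (st == SQUOTE ? '\'' : '"')) {
                st = CODE;
            }
            break;
        }
    }
    if (st == LINE_COMMENT)
        PUT('\n');                         /* file ended inside a // comment */
#undef PUT

    out[n] = '\0';
    return n;
}
```

Fed the foo/bar/baz lines quoted earlier, this should come back with nothing, since none of them contains a real comment. The extra SLASH and BLOCK_STAR states are the price of looking at one byte at a time: a lone '/' or '*' can't be classified until the next byte arrives.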
Just my 2 cents.
---
---------------
* Origin: Tanktalus' Tower BBS (1:250/102)