AltME: Parse

Messages

PeterWood
A web-public Parse group.
GrahamC
And Peter's first message is redacted in the web mirror!
Bo
As a show of support, this is my first post to any Parse group, not just the web-public one. :-)
My favorite use of parse:  parse some-str { }
Breaks apart any string into individual words.
sqlab
This looks like Rebol 3; Rebol 2 uses parse some-str none. In both cases it breaks a string into strings, not into words.
Andreas
Works in both, R2 and R3.
But for this particular use, Rebol 3 now has a _much_ better tool: SPLIT.
Bo
I learned something already. :-)

sqlab
It works in R2, as even
>> parse "a b c" {,}
== ["a" "b" "c"]
breaks at whitespace.
Gabriele
Don't forget that PARSE str "," (or any other delimiter), in R2 at least, is meant to parse CSV lines, so it has some built-in magic that may surprise you if you're not aware of it.
>> parse {a,b,c} ","
== ["a" "b" "c"]
>> parse {a,b,c d} ","
== ["a" "b" "c" "d"]
>> parse {a,b,"c d"} ","
== ["a" "b" "c d"]
>> parse {a,b,"c d,e"} ","
== ["a" "b" "c d,e"]
note also:
>> parse/all {a,b,c d} ","
== ["a" "b" "c d"]
>> parse/all {a,b,c,d} ","
== ["a" "b" "c" "d"]
>> parse/all {a,b,"c,d"} ","
== ["a" "b" "c,d"]
sqlab
Just what I wanted to point out.

Geomol
Would it make sense to let
    parse "abc" [3 char!]
be the same as
    parse "abc" [3 skip]
Geomol
Maybe letting this be true is better:
    parse "abc" [word!]
Like this is true:
    parse "123" [integer!]

DocKimbel
Your opinion is welcome: COMMENT in PARSE
https://github.com/red/red/issues/724
Andreas
Related CureCode issue for COMMENT in PARSE:
http://issue.cc/r3/1966
Gregg
Opinion posted.

Rondon
Hi folks, I'm having a problem parsing the '&' (ampersand) symbol. I'm using web-to-plain.r from rebol.org, but I have trouble parsing the 'Inc names. Any clues?
My problem is that I have some HTML entities starting with &, but what I need is to find just the company names such as AT&T, A&E, Film&Arts and transform that lone '&' into '&amp;'.

NickA
Rondon, so, the company names are always sandwiched between two other characters -is that correct?  Are the html entities always characterized by a different matching pattern (not sandwiched the same way)?
Arnold
We have a company/restaurant in Holland that is called Keuk& (translation Kitchen). Otoh the amp is not allowed in urls is it?
An idea might be to hardcode these few examples and transform them in an extra parse step, or just before returning the value from the db.
(forget the remark about url and &).
Rondon
yes.. Nick
I will have to scan every "&" and compare it against the HTML entities (&amp;, &acute;); if there are words on both sides of the '&', I have to keep those words and replace the '&' with "&amp;".
Rondon
I was trying to make a patch to web-to-plain.r from rebol.org

Tomc
@Rondon   I would love to see what you come up with. web-to-plain.r changes web-encoded chars to ASCII plain text, so emitting an "&amp" in place of an "&" in the input would be counter to its nature. Maybe describe the problem a bit more: what the input is and what needs to change in the output. Just looking at it now (after a decade), I think I would at least change the call from parse to parse/all.

Endo
How do I use NOT in PARSE in R2?
;On R3
>> parse "a" [not "b" skip]
== true
On R2?
Pekr
Is 'not available in terms of R2 parse at all?
Endo
Better to ask: how do I do something like this on R2:
PARSE/all "abc" ["a" not "x" "c"]  ;==true
@Pekr: No, unfortunately not.
>> parse "x" [not "a"]
** Script Error: Invalid argument: ?native?
Endo
Normally I won't be comparing against a single char, so using a complemented charset is not useful for me.
not-a: complement charset "a"
parse/all "x" [not-a] ; == true
Geomol
Won't this work?
>> not-x: complement charset "x"
>> parse "abc" ["a" not-x "c"]
== true
>> parse "axc" ["a" not-x "c"]
== false
Pekr
Geomol just beat me to that :-)
Endo
:-) I beat you both :P
Geomol
Why is complemented charset not useful?
Endo
Because I need to NOT a word.
Something like:
>> parse/all "this" [some ["this" (print "ok" halt) | skip] ]
ok
>> parse/all "this" [some ["that" (print "ok" halt) | skip] ]
== true
But I don't know how to stop PARSE in the first example.
Instead of HALTing.
Geomol
to end 1 skip
Which will always return false, I think.
or just: to end skip
Endo
Sorry, I think I was confusing. How do I write an "except this one" kind of rule?
>> not-four: ["four" to end skip]
== ["four" to end skip]
>> parse/all "one two three" ["one " not-four " three"]
== false
Geomol
>> parse/all "one two three" ["one " [not-four | to " "] " three"]
== true
Or use block parsing, if that's an option.
argh :)
Doesn't work.
I guess, you need to include all the ok possibilities?
Endo
Block parsing could be useful but I'm parsing huge SQL files which are not LOADable (and different formats from each other).
Switching to R3 is easier.
I think I need to play with index positions during the parse.
Thank you for your time Geomol.
Geomol
welcome
Endo
I'll go with a regular expression: ^((?!four).)*$
It matches the lines that do not contain "four".
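For readers who want to try that pattern outside Rebol, here is a minimal Python sketch. The sample lines are borrowed from the examples earlier in this thread; the names NOT_FOUR, lines and kept are made up for illustration, and nothing in the log says which regex engine Endo actually used.

```python
import re

# Endo's pattern: at every position, assert that "four" does not start here
# (negative lookahead), then consume one character. The ^ and $ anchors make
# the whole line qualify or fail as a unit.
NOT_FOUR = re.compile(r"^((?!four).)*$")

# Hypothetical sample input, taken from the strings used in this thread.
lines = ["one two three", "one four three", "one five three"]

# Keep only the lines that do not contain "four".
kept = [line for line in lines if NOT_FOUR.match(line)]
print(kept)
```

The same lookahead should work in most Perl-style regex engines, but that is an assumption to check against the engine in use.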

sqlab
not-four: [[(not-four/2: [])  "four" (not-four/2:  [thru end skip] ) | to " "]  []]
>> parse/all "one two three" ["one " not-four  " three"  ]
== true
>> parse/all "one four three" ["one " not-four  " three"  ]
== false
Gabriele
A variation of sqlab's approach:
>> space: [some #" "]
== [some #" "]
>> parse/all "one two three" [(fail?: none) "one" space ["four" space (fail?: [end skip]) | to #" " space] fail? "three"]
== true
>> parse/all "one four three" [(fail?: none) "one" space ["four" space (fail?: [end skip]) | to #" " space] fail? "three"]
== false
>> parse/all "one five three" [(fail?: none) "one" space ["four" space (fail?: [end skip]) | to #" " space] fail? "three"]
== true
Gabriele
If your case is more specific maybe it can be done in a different way, like checking for the condition after parsing, or filtering out the input you don't want in advance, and so on.
In Topaz you could do something like:
>> parse [one two three] ['one either 'four [(false)] [skip 'three (true)]]
== true
>> parse [one four three] ['one either 'four [(false)] [skip 'three (true)]]  
== false
(no string parsing yet so I used a block to illustrate)

Ladislav
Endo, you can try my parse enhancements at
http://www.rebol.org/view-script.r?script=parseen.r&sid=ypnz89xc

Arnold
I have an SQL text with arguments in the form ":argument_1". How do I get a list of the arguments used in this SQL using parse?

Endo
Something like this?
>> digit: charset [#"0" - #"9"]
>> alpha: charset [#"a" - #"z" #"A" - #"Z"]
>> alphanum: union alpha digit
>> validchars: union alphanum charset [#"_"] ;put any other valid chars here
>> sql: {select * from table where a = :param1 and b=:param2     or x=3}
>> parse/all sql [some [thru ":" copy p some validchars (print p) | skip]]
param1
param2
You may need to clean up comments; you can use something like:
remove-sql-comments: has [m n] [
    parse/all read clipboard:// [
        some [
            m: "--" to newline n: (remove/part m n) :m
        |
            m: "/*" thru "*/" n: (remove/part m n) :m
        |
            skip
        ]
    ]
]

Arnold
Thank you Endo. This is very useful.
Endo
It is not complete, it doesn't care about comments in strings, but you get the idea.
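For comparison, the same two steps (strip comments, then collect the ":name" arguments) can be sketched in Python with regular expressions. This is an illustration only, not a port of the Rebol rules: the function names and the sample SQL are made up, block comments are removed before line comments rather than in order of position, and like the Rebol version it ignores comment markers and colons inside string literals.

```python
import re

def remove_sql_comments(sql):
    # Drop "/* ... */" block comments (non-greedy, may span lines).
    sql = re.sub(r"/\*.*?\*/", "", sql, flags=re.DOTALL)
    # Drop "-- ..." comments up to, but not including, the newline.
    return re.sub(r"--[^\n]*", "", sql)

def find_params(sql):
    # A parameter is ":" followed by one or more letters, digits or "_".
    return re.findall(r":([A-Za-z0-9_]+)", remove_sql_comments(sql))

# Hypothetical input, based on the SQL above with two comments added.
sql = "select * from table /* :old */ where a = :param1 and b=:param2 or x=3 -- :ignored"
print(find_params(sql))
```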

szeng
Can anybody help me replace every "on-init" that matches the pattern "space on-init non-word" with "abcd" in a string?
I've tried
space: charset [#" " #"^-"]
word: charset [#"a" - #"z" #"A" - #"Z" #"-"]
non-word: complement word
on-init-rule: [
    space mark: "on-init" non-word (
            remove/part mark 7 ;remove on-init
            insert mark "abcd"
            )
]
parse/all inp: {abcasdfasdf on-init
a on-init
b
} [
    any [
        thru on-init-rule
    ]
]
it failed with:
** Script error: PARSE - invalid rule or usage of rule: make bitset! #{0040000080}
** Where: parse do either either either -apply-
** Near: parse/all inp: {abcasdfasdf on-init
a on-init
b
} [
    any ...
>> q
DocKimbel
Here is a working version:
space: charset [#" " #"^-"]
word: charset [#"a" - #"z" #"A" - #"Z" #"-"]
non-word: complement word
on-init-rule: [
    space mark: "on-init" non-word (
        remove/part mark 7 ;remove on-init
        mark: insert mark "abcd"
    ) :mark
]
parse/all inp: {abcasdfasdf on-init
a on-init
b
} [some [on-init-rule | skip]]
szeng
Thanks Doc, I'll give it a try
Yes, it works. Thanks!
DocKimbel
You're welcome.

Endo
parse/all #{010203} [thru #{03} (print ".")] ;works on R3 and Red but fails on R2, any workaround for this?
Arnold
There is no refinement all. Leave that out and the output is like the output for R3 with all refinement.
Rebolek
Arnold, there is /all refinement in R2.
sqlab
@Endo
parse/all  to-string #{010203}  compose [thru (to-string #{03})  ([(print ".")]) ]
Endo
So the only workaround for parsing binary! is converting to string!?
Although this one works, so it looks like parsing with binary! works but TO / THRU doesn't.
R2> parse/all #{010203} [#{010203}]
== true
Arnold
Sorry, I only use a Red version from before the libRed changes. That is why I got the message
red>> parse/all #{010203} [thru #{03} (print ".")]
*** Script Error: parse has no refinement called all
*** Where: parse
Endo
Sure, there is no /all in Red, it is default. I meant the difference of TO with binary!.
DocKimbel
Using a char! or string! as matching target works on R2:
parse/all #{010203} [thru #"^(03)" (print ".")]
parse/all #{010203} [thru "^(03)" (print ".")]
Gabriele
You can also use AS-STRING instead of TO-STRING so that there is no conversion really going on.
>> bin: #{010203}
== #{010203}
>> str: as-string bin
== "^A^B^C"
>> append str "A"
== {^A^B^CA}
>> bin
== #{01020341}

Endo
Thank you all, using char/string or as-string both look like good solutions.

Last message posted 48 weeks ago.