Wikipedia:Reference desk/Archives/Computing/2020 October 12

= October 12 =

Java regex help
I am making a Java program that will parse PGN files. I need to make a regex that will identify a valid move in Algebraic notation. The way I understand it, it will need to match Strings that meet the following criteria in this order:


 * 1) (Optional) One of the following uppercase letters:
 * 2) (Optional) One of the following lowercase letters:
 * 3) (Optional) The lowercase letter
 * 4) A valid location on a chess board, such as   or  . This is not optional
 * 5) (Optional) An equals sign  followed by one of the following uppercase letters:

I am new to Java regex and I'm having some difficulty making this. I would appreciate if someone can point me in the right direction. Keep in mind, I intend to first trim the following trailing symbols from each token: !, ?, +, #, =, and others, so there is no need to test for those. -- Puzzledvegetable Is it teatime already?  01:24, 12 October 2020 (UTC) + minor edit -- Puzzledvegetable  Is it teatime already?  14:38, 12 October 2020 (UTC)


 * I don't know anything about Java, but I don't think you should use a regex for this
 * Maybe you should use a separate function instead? (deleted lots of text here, about examples: "e8, xe5, R1c4, Qa1xc3")


 * I found something from google that might help you: https://stackoverflow.com/questions/40007937/regex-help-for-chess-moves-san
 *  AltoStev Talk 06:20, 12 October 2020 (UTC)
 * You could, but it's not LL(1) because of the disambiguation part (#2), so a recursive-descent parser would need lookahead or backtracking, or an ugly conversion. fiveby(zero) 06:49, 12 October 2020 (UTC)
 * How about:
 * '''(K|Q|R|B|N)?((?:[a-h](?:[1-8])?)|[1-8])?(x)?([a-h][1-8])(=(?:Q|R|B|N))?
 * or
 * (K|Q|R|B|N)?((?:[a-h](?:[1-8])?)|[1-8])?(x)?([a-h][1-8])(=(?:Q|R|B|N))?(\+|#)?(!!|\?\?|!\?|\?!|!|\?)?
 * if decide not to trim the trailing chars. Paste into https://regexr.com to test and mouse over for explanation. You can click on details and enter moves into the text box to see what the capturing groups are doing. That's my reading of a SAN move from the PGN reference, added rank and file+rank disambiguation to your description. Didn't see '=' as a valid move suffix, and there isn't a stalemate indicator. Depending how you parsing you might want separate regex's for castles (O-O-O) rather than extend the above. By the way someone already converted the formal grammar to ANTLR. fiveby(zero) 06:30, 12 October 2020 (UTC)
 * Thanks for all your help. While I have you here, how are people learning regex? It seems so complicated. Where should I start if I want to learn this? I can't just post a question here every time I need to use one. -- Puzzledvegetable Is it teatime already?  14:37, 12 October 2020 (UTC)
 * The reference for java regex is in the javadoc for java.util.regex.Pattern. That site https://regexr.com/ is great for testing and experimenting with regular expressions. I suspect the above would look pretty straightforward if not for the capturing groups. Start with a simple expression such as [a-h][1-8], then try it out in the expression field at regexr.com and look at the explanation provided. Then gradually add more to the expression: (x)?([a-h][1-8]), and try it out. You usually don't need the full power of regular expressions to accomplish what you need, the simple constructs go a long way. By all means, feel free to ask more questions. fiveby(zero) 15:25, 12 October 2020 (UTC)
 * Regexes are great, but they're not the solution to every text processing problem. You state that you're writing a parser. Regexes are not what you use to parse a language. You need to read a theoretical computer science book or two on formal languages and parsers. This is assuming you're doing this as self-teaching. If you just have a problem you want to solve, then what you want is to use existing code (there are numerous software packages for handling PGN, many of which are linked in the article), or feed the formal language definition to a parser generator like ANTLR (as mentioned) and have it generate one for you. --05:46, 16 October 2020 (UTC)